[bars] [mmra] 911 outage root cause revealed
Greg Troxel
gdt at lexort.com
Thu Jun 20 08:27:24 CDT 2024
"Larry Banks via groups.io" <larryb.w1dyj=verizon.net at groups.io> writes:
> On 6/19/2024 22:47, JWAHAR BAMMI via groups.io wrote:
>> Hmmm…. so the 911 system does not have a redundant system it can
>> fail over to if the primary system goes down? Even way more modest
>> enterprise systems (like the ones i sell) have 100% redundancy and
>> fail over to the disaster recovery site in real time. Does MA not
>> treat 911 as a mission critical system? Will be interesting to know
>> from someone who has intimate knowledge of the setup.
>>
>> 73 de k1jbd
>> bammi
>
> As the media has been pointing out, it's very old infrastructure.
>
> Larry / W1DYJ

It is hard to believe it is older than 15 years, and the concepts of
high-reliability system design are not novel. AT&T did it quite well for
years. Even Verizon VOIP over FiOS and IP over FiOS are very solid.
I'm with bammi here: good design of a critical system involves full
redundancy (in a different physical location, with insight into the
actual fiber locations), a staging system for testing updates, and a
managed update process in which only half of the system is updated and
then shaken out for a while before the other half is updated.
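A minimal sketch of that half-at-a-time update, assuming two redundant
halves and some health check. The `Site` class, the `healthy` callback,
and the fixed number of soak checks are hypothetical illustrations, not
anything the state or its vendor has described:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Site:
    """One of two redundant halves of a hypothetical critical service."""
    name: str
    version: str
    history: list = field(default_factory=list)

    def deploy(self, version: str) -> None:
        self.history.append(self.version)  # remember what to roll back to
        self.version = version

    def rollback(self) -> None:
        self.version = self.history.pop()


def staged_update(first: Site, second: Site, new_version: str,
                  healthy: Callable[[Site], bool],
                  soak_checks: int = 3) -> bool:
    """Update one half, shake it out, and only then update the other.

    Returns True if both halves end up on new_version. Returns False if a
    health check failed; the failing half is rolled back.
    """
    first.deploy(new_version)
    # Shake-out period (hypothetical: a fixed count of checks) while the
    # second half keeps serving calls on the old version.
    for _ in range(soak_checks):
        if not healthy(first):
            first.rollback()
            return False  # second half was never touched
    second.deploy(new_version)
    if not healthy(second):
        second.rollback()
        return False
    return True
```

With a check that rejects version "1.2", `staged_update(Site("a", "1.0"),
Site("b", "1.0"), "1.2", lambda s: s.version != "1.2")` returns False and
leaves both halves on "1.0", so calls keep flowing through the untouched
half the whole time.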
Even for some stuff I run at home, I have a staging setup to qualify
software updates. Most software can just be rolled back, but updates
that do database schema upgrades are much harder.
Firewall configuration is hard, but they were a little disingenuous to
say that the cause was a firewall while avoiding saying whether it was a
hardware failure, a latent vendor bug that surfaced, a new config that
was loaded, operator error, etc.
I wonder if this system handles VZ-VOIP and VZ-copper calls too, or if
it's for cell only and the state is acting like that's all there is.
(No, I didn't dial 911 to test this during the outage!)