[bars] [mmra] 911 outage root cause revealed

Greg Troxel gdt at lexort.com
Thu Jun 20 08:27:24 CDT 2024


"Larry Banks via groups.io" <larryb.w1dyj=verizon.net at groups.io> writes:

> On 6/19/2024 22:47, JWAHAR BAMMI via groups.io wrote:
>> Hmmm…. so the 911 system does not have a redundant system it can
>> fail over to if the primary system goes down? Even way more modest
>> enterprise systems (like the ones i sell) have 100% redundancy and
>> fail over to the disaster recovery site in real time. Does MA not
>> treat 911 as a mission critical system? Will be interesting to know
>> from someone who has intimate knowledge of the setup.
>>
>> 73 de k1jbd
>> bammi
>
> As the media has been pointing out, it's very old infrastructure.
>
> Larry / W1DYJ

It is hard to believe it is older than 15 years, and the concepts of
high-reliability system design are not novel.  AT&T did it quite well for
years.  Even Verizon VOIP over FiOS and IP over FiOS is very solid.

I'm with bammi here; good design of a critical system involves full
redundancy (in a different physical location, with insight into the
actual fiber locations), a staging system for testing updates, and a
managed update process where only half of it is updated and then is
shaken out for a while before the other half is updated.
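
As a rough illustration of the half-then-half update process described
above, here is a minimal Python sketch.  The names Node, staged_update,
and healthy are hypothetical, made up for this example; nothing here
reflects how the actual 911 system is built or updated, and the health
check stands in for what would really be a shake-out period of days or
weeks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """One redundant site or server (hypothetical model)."""
    name: str
    version: str


def staged_update(
    nodes: List[Node],
    new_version: str,
    healthy: Callable[[Node], bool],
) -> bool:
    """Update the first half, check it, then update the second half.

    Returns True if every node ends up on new_version.  Returns False
    if the first half failed its check and was rolled back, in which
    case the second half was never touched.
    """
    half = len(nodes) // 2
    first, second = nodes[:half], nodes[half:]

    # Remember the old versions so the first half can be rolled back.
    old_versions = {n.name: n.version for n in first}

    for n in first:
        n.version = new_version

    # Stand-in for the shake-out period: a real process would watch
    # the updated half under live load for a while before going on.
    if not all(healthy(n) for n in first):
        for n in first:
            n.version = old_versions[n.name]
        return False

    for n in second:
        n.version = new_version
    return True


if __name__ == "__main__":
    sites = [Node("site-a", "1.0"), Node("site-b", "1.0")]
    ok = staged_update(sites, "1.1", healthy=lambda n: True)
    print(ok, [(n.name, n.version) for n in sites])
```

The point of the structure is that a bad update only ever reaches half
of the system, so the untouched half keeps carrying traffic while the
updated half is rolled back.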

Even for some stuff I run at home, I have a staging setup to qualify
software updates.  Most software can just be rolled back, but things
that do database schema upgrades are much harder.

Firewall configuration is hard; they were a little disingenuous to say
that the cause was a firewall, avoiding saying whether there was a
hardware failure, a latent vendor bug that was expressed, a new config
that was loaded, operator error, etc.

I wonder if this system handles VZ-VOIP and VZ-copper calls too, or if
it's for cell only and the state is acting like that's all there is.
(No, I didn't dial 911 to test this during the outage!)


