How four rotten packets broke CenturyLink's network for 37 hours, knackering 911 calls, VoIP, broadband
A handful of bad network packets triggered a massive chain reaction that crippled the entire network of US telco CenturyLink for roughly a day and a half.
This is according to the FCC's official probe into the December 2018 super-outage, during which CenturyLink's broadband internet and VoIP services fell over and stayed down for a total of 37 hours. This meant subscribers couldn't, among other things, call 911 over VoIP at the time – which is a violation of FCC rules, and triggered a formal investigation.
"This outage was caused by an equipment failure catastrophically exacerbated by a network configuration error," America's communications regulator said in its summary of its inquiry, published yesterday.
"It affected communications service providers, business customers, and consumers who directly or indirectly relied upon CenturyLink’s transport services, which route communications traffic from various providers to locations across the country, resulting in extensive disruptions to phone service, including 911 calling."
CenturyLink has six long-haul networks that make up the backbone of its digital empire, interconnecting regions of America. These networks use Infinera-built nodes to switch packets over high-speed optic fiber: data flowing into each node is directed to other nodes, ultimately pumping VoIP, regular internet traffic, and more, across the nation as needed.
Each dodgy packet would arrive at a node, get rejected and be passed along a chain of filters until it was injected into a management channel and handed to all connecting nodes. Here's a flow diagram, courtesy of the FCC, showing how the corrupted packets ended up being forwarded on to all neighbouring nodes, and so on and so on, producing a growing chain reaction of corrupted packets.
"Due to the packets’ broadcast destination address, the malformed network management packets were delivered to all connected nodes. Consequently, each subsequent node receiving the packet retransmitted the packet to all its connected nodes, including the node where the malformed packets originated," the FCC said in its report.
"Each connected node continued to retransmit the malformed packets across the proprietary management channel to each node with which it connected because the packets appeared valid and did not have an expiration time. This process repeated indefinitely."
As you might imagine, the exponentially growing storm of packets soon overwhelmed CenturyLink's optic-fibre backbone, causing regular traffic to stop flowing: VoIP phones stopped working, internet access slowed to a halt, and so on. Folks in New Orleans were first to spot their connections stalling, at roughly 0356 EST on December 27.
Here is where things went from really, really bad to terrible: the nodes along the fiber network were so flooded, they could not be reached by their administrators to troubleshoot the issue. It wasn't until some 15 hours later the techies could finally track down the single errant node in Colorado responsible for sparking the deluge, not that replacing it helped. The packet tsunami was still washing back and forth, knocking nodes over.
"At 2102 on December 27, CenturyLink network engineers identified and removed the module that had generated the malformed packets," the report noted. "The outage, however, did not immediately end; the malformed packets continued to replicate and transit the network, generating more packets as they echoed from node to node."
It would be another three hours before CenturyLink's network admins could begin to get through to the other nodes and get them to kill off the spread of bad packets. It took until 1130 on December 28 to get visibility of the network back, and it wasn't until 2336 that all nodes had been restored. On December 29, just after midday, CenturyLink finally declared the crisis over.
"The event caused a nationwide voice, IP, and transport outage on CenturyLink’s fiber network. CenturyLink estimates that 12,100,108 calls were blocked or degraded due to the incident," the FCC said.
"Where long-distance voice callers experienced call quality issues, some customers received a fast-busy signal, some received an error message, and some just had a terrible connection with garbled words."
The outage also knackered local governments and telcos that relied on the CenturyLink network for portions of their services. State governments in Illinois, Kansas, Minnesota, and Missouri all had portions of their networks down for roughly 36 hours thanks to CenturyLink, and phone services sold by Comcast, Verizon, TeleCommunication Systems, General Dynamics IT, and West Safety Services – including 911 call centers – saw connectivity interrupted for some or all of the outage period.
As to what can be done to prevent similar failures, the FCC is recommending CenturyLink and other backbone providers take some basic steps, such as disabling unused features on network equipment, installing and maintaining alarms that warn admins when memory or processor use is reaching its peak, and having backup procedures in the event networking gear becomes unreachable.
"Currently, CenturyLink is in the process of updating its nodes’ Ethernet policer to reduce the chance of the transmission of a malformed packet in the future," the report notes. "The improved ethernet policer quickly identifies and terminates invalid packets, preventing propagation into the network. This work is expected to be completed in Fall, 2019."
Industry: Unified Communications & Telecommunications
- Security Analyst, End User. Level 1 Service Provider.
CH7843 Security Analyst, End User. Level 1 Service Provider. £50,000 London Security Analyst needed to join an Level 1 Service Provider. The Security Analyst will be responsible monitoring, configuring, fine tuning and generally improving the security tool capability. Specific experinece with firewall, phishing, AV, vulnerability management, IAM is needed. Specific tools suce as CyberArk, Tripwire are highly desirable. You will be working within a specialist security team reporting to the CISO. Expeirnece working within an end user environment within financial services is highly desirable. Flexible location. This is an exclusive role to DCL Search & Selection. https://calendly.com/chris-holt/arranged-call-with-chris-holt-soc-role-clone
- Internal Security Auditor, Level 1 Service Provider (ISO27001)
- Upto 65,000 plus benefits
Internal Security Auditor ISO 27001, PCI, needed to join a Cyber team within this expanding Fintech business. The Internal Security Auditor will have end to end responsibility for planning, delivering, remediating any findings etc. Experience working within financial services is highly desirable. This Is a great time to join a newly formed and growing Cyber team within a rapidly expanding fintech, that is taking a major share of its market. We are looking for someone with experience, (but not to be limited to) a mix of Information Security standards, frameworks, audit principles, controls / policies and the management and use of the technical tooling etc. ISO 22301, ISO 27001, NIST Cybersecurity Framework etc An ideal candidate will be working within an end user environment with a cyber consultancy background. Experience taking a company through accreditation is highly desirable Experience managing internal stakeholders, technical teams and external third parties essential Flexible working, but with the ability to get into London. This is an exclusive role to DCL Search & Selection.
- DevOps Engineer with IdAM
- Upto £80,000 plus benefits
We are ooking for an DevOps engineer, idealy with IdAM (identity access Management) experience, this is a senior role for someone that can be the lead hands on person on a project. Your role will be to work on the deployment project implementing the solution into the exsiting application so will be used to connect an applications into mulipe 3rd party appliactions. We Would look at someone who has done DevOps with Security and can cross train into IdaM, but preference would be given to someone with the IdAM experience this is a great opportunity to join a consultancy that work on some truely amazing and differnet solutions
- Senior SOC Analyst. Level 3 Palo Alto Wildfire, Rapid 7, Fortify, Splunk.
REF CH7840 Senior SOC analyst (Palo Alto Wildfire, Rapid 7, Fortify AND Splunk) Flexible location £55,000 + Senior SOC analyst needed (Level 3) that can achieve SC clearance for a permanent role. We are looking for Level 3 SOC Analysts with two or more of the following; Palo Alto Networks Wildfire (#malware) Rapid7 Nexpose Micro Focus Fortify (#automated #applicationsecurity) AND ideally Splunk. The role will include, but not be limited to; managing and handling incidents end to end, supporting and mentoring level 1 / level 2 staff, supporting the SOC manager in the delivery of the SOC roadmap, engaging with the client stakeholders (other technical teams) as and where needed, use case development, advanced search and reporting etc. Flexible location, commutable in the future to London or Birmingham This role will sit within a public sector client so the individual must be able to achieve SC clearance. To arrange a call with Chris Holt use this calendy link https://calendly.com/chris-holt/arranged-call-with-chris-holt-remote-soc-role Chris.Holt@dclsearch.com 07884666351