Last week Comcast suffered a major broadband outage. The worst imaginable set of events occurred when there were two simultaneous fiber cuts on major legs of their backbone: one between Chicago and New York and one between Ashburn, Virginia and South Carolina. In case you don’t know, Ashburn is the home of the major Internet POP serving Washington DC and surrounding cities.
This is a network planner’s worst nightmare. Planners always try to build redundancy into fiber routes so that the network won’t crash from a single fiber cut. Modern backbone electronics can be set to automatically forward traffic in both directions around a ring so that service isn’t interrupted in the case of a fiber cut or failure of ring electronics somewhere along the ring. But rings using this technology can’t withstand two simultaneous cuts.
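To see why a ring tolerates one cut but not two, here is a minimal sketch that treats a ring as a simple graph and checks whether every node can still reach every other node after some links are removed. The node names and the helper functions (`ring_links`, `is_connected`, `survives_cuts`) are made up for illustration and don't represent any real network or vendor software.

```python
from collections import deque

def ring_links(nodes):
    """Return the links of a ring that visits the nodes in order."""
    count = len(nodes)
    return {frozenset((nodes[i], nodes[(i + 1) % count])) for i in range(count)}

def is_connected(nodes, links):
    """Breadth-first search: can the first node still reach all the others?"""
    neighbors = {node: set() for node in nodes}
    for link in links:
        a, b = tuple(link)
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(nodes)

def survives_cuts(nodes, cuts):
    """True if the ring stays in one piece after the given links are cut."""
    remaining = ring_links(nodes) - {frozenset(cut) for cut in cuts}
    return is_connected(nodes, remaining)

# Hypothetical five-node ring: A-B-C-D-E-A
nodes = ["A", "B", "C", "D", "E"]
print(survives_cuts(nodes, [("A", "B")]))              # True: traffic goes the other way around
print(survives_cuts(nodes, [("A", "B"), ("C", "D")]))  # False: B and C are cut off from the rest
```

With one cut, the ring becomes a line and every node is still reachable. With two cuts, the ring splits into two pieces that can't talk to each other.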
What was a bit surprising to me was that fiber cuts so far apart caused the failure of a large part of the Comcast network. It seems unlikely that the company has a fiber ring that sends all Internet traffic in such a large circle. It’s more likely that the company has centralized one or more of its routing functions, such as DNS routing, in one place on the network, and the fiber cuts might have isolated that key function, which would shut down its Internet product.
Redundancy is a big concern for most smaller network owners. Lack of redundancy was one of the major issues that drove Cook County, Minnesota, to build their own fiber network. There is no cable provider in the county and their entire telecom network was provided by CenturyLink. Tourism is the major driver of the economy and a decade ago there was a cut in the CenturyLink fiber from Duluth that isolated the county during peak tourist season. That meant that the Internet, telephones, and cell phones didn’t work. Businesses couldn’t take credit cards, restaurants and hotels couldn’t take reservations, and family members on vacation couldn’t communicate with each other. This prompted the County to pursue a fiber network that included creating redundant routes for traffic in and out of the county. The network was ultimately built and operated by the local power cooperative, and today there is a greatly reduced chance of a major telecom outage.
Even where there is redundancy there can be outages. One of my clients operates a large statewide fiber network that stretches for hundreds of miles. They followed good engineering practice and scheduled an upgrade of the ring electronics after midnight. While one of the nodes was being upgraded the fiber was cut on a different part of the network when a truck knocked down a telephone pole, and the whole network went dark. Fiber cuts in the middle of the night are somewhat rare, but they happen.
Whenever possible fiber engineers also build redundancy into a local fiber network. They might build a ring connecting the fiber huts serving neighborhoods so that the network keeps functioning with a cut along the ring. It’s nearly impossible to design such redundancy in the last-mile loop, but fiber cuts in the last mile only isolate homes associated with the specific fiber.
But just like with Comcast and Cook County, many local networks have a hard time creating redundancy outside of their immediate network. In geographically remote areas it’s often impossible to find a second secure route to the Internet, leaving a network, or whole communities, vulnerable to a fiber cut somewhere outside their area.
Unfortunately, it’s getting easier for fiber providers to run into the same kind of issue that hit Comcast. We are migrating numerous functions to the cloud, and having redundant fiber routing does not always mean that there is an automatic redundant connection to a key cloud server. I have clients that are now relying on the cloud for all sorts of services such as VoIP, cable TV programming, DNS routing for the Internet, the use of cloud-based operational software, etc. These ISPs may have a redundant path to the Internet, but still have only one path to get to the company providing their cable TV signal or DNS routing.
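One way an ISP might audit this is to list every path it has to each cloud service and flag any service that lacks two paths with no link in common. The following is a rough sketch under that assumption. The service names, link names, and functions (`has_diverse_paths`, `single_path_services`) are all hypothetical, and a real audit would need actual route data from the carriers involved.

```python
from itertools import combinations

def has_diverse_paths(paths):
    """True if at least two of the paths share no link at all."""
    return any(set(a).isdisjoint(b) for a, b in combinations(paths, 2))

def single_path_services(service_paths):
    """Return the services that a single link failure could isolate."""
    return sorted(
        name for name, paths in service_paths.items()
        if not has_diverse_paths(paths)
    )

# Hypothetical inventory: each service maps to the paths that reach it,
# and each path is the list of links it rides on.
service_paths = {
    "internet": [["north-fiber"], ["south-fiber"]],
    "dns": [["north-fiber", "dns-provider-link"]],
    "cable-tv": [
        ["north-fiber", "tv-headend-link"],
        ["south-fiber", "tv-headend-link"],
    ],
}

print(single_path_services(service_paths))  # ['cable-tv', 'dns']
```

In this made-up inventory the Internet connection is redundant, but DNS has only one path, and both cable TV paths ride the same final link, so neither service would survive a cut in the wrong place.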
The Comcast outage should prompt companies to look again at redundancy. Don’t assume that every function in the cloud is redundant even if you have a redundant connection to the Internet.