Broadband Resiliency

It’s almost impossible to talk about broadband at the community level without talking about resiliency and redundancy. It’s hard to find rural communities that haven’t experienced a broadband outage due to a fiber being cut somewhere. The issue hits the news when there are reports of regional or national broadband outages.

It’s something that regulators talk about a lot. The FCC cites resiliency goals in many different dockets, and the FCC, NTIA, and USDA require that grant proposals promise to promote resiliency. But for all the talk, there are no national, state, or regional plans or standards for broadband resiliency, and no specific resiliency standards for large networks.

We don’t have to look far to find a similar industry that dealt with the issue. In August 2003, a high-voltage power line in rural Ohio brushed against an untrimmed tree and set off a huge blackout that lasted for two days, and that spread to cover 50 million people in the Midwest, Northeast, and into Canada.

This was a wake-up call for the power industry. At that time, local electric grids were set up to interface automatically with neighboring grids to make it easy to pass excess power from region to region. While that was beneficial to electric companies, it also meant that a problem on one grid could quickly spread to neighboring grids. Electric companies nationwide responded to this and similar large outages by dividing the national electric grid into regional grids that could block rolling blackouts or brownouts. There are still occasional regional electric outages, but even within regions, electric grids have been modified to hinder widespread outages.

When the only communication grids in the country were provided by telephone companies, the communications networks had some similar protections built in. The Bell companies and a few other large telephone companies operated a nationwide network comprised of regional hubs centered around large tandem switching centers. While a local community could lose voice traffic if wires were cut, it was not possible for any local event to knock out multiple regions or have a nationwide impact.

Today’s communications networks are configured differently, both in terms of fiber resiliency and electronics resiliency.

The long-haul fiber grid is comprised of fiber routes built by numerous fiber builders like Lumen (Level 3), Zayo, AT&T, Verizon, large cable companies, and many other regional fiber providers. There is no nationwide planning or coordination for the placement of long-haul fiber. Companies build fiber routes in places and along routes where they think they can make money. The first major fiber long-haul routes were built to connect major and regional Internet POPs. Today, the routes have been extended to reach numerous data centers, which, more often than not, are off the beaten path. From a fiber routing perspective, the long-haul fiber network mostly has route redundancy. If a major fiber gets cut, most of the traffic can be rerouted in different directions.

Redundancy is far more hit-and-miss regionally, or inside a state. Any state with large rural areas can still point out communities that have only one option for fiber backhaul (and even some with none). Redundancy at the regional and local level is often lacking, and a single fiber cut can knock out a lot of customers or even a whole town or region. There is also no consistency in pricing, with local middle-mile fiber often priced at exorbitant rates. We know how to create more route resiliency – by building the needed middle-mile fiber network. This is obviously not a national priority since the IIJA legislation provided only $1 billion in grants to address this issue nationwide.

Electronic resiliency has gotten worse over time. The big carriers that operate the Internet have consolidated network operations so that there are only a handful of hubs and a few technicians that monitor the big nationwide networks. This consolidation has greatly increased the risk of large-scale outages. A hardware or software failure at one of these hubs can spread and affect networks all over the country. We’ve unfortunately been seeing more of these big outages, which were not possible a few decades ago.

There are no easy solutions for creating the resiliency needed to prevent widespread broadband outages. I fear that the advent of AI could make things worse before it might make things better since it is going to encourage even more consolidation of network operation and monitoring.

Electric companies fixed their grids by getting all of the major electric companies in each region in a room to hammer out a plan to improve resiliency. I’m not sure what it might take to lead the big fiber carriers to have that same conversation – maybe it will take a catastrophic multi-day broadband outage like the one that hit the electric grid in 2003.

How Safe is Your Network?

Last week Comcast suffered a major broadband outage. The worst imaginable set of events occurred when there were two simultaneous fiber cuts on major legs of their backbone – one between Chicago and New York and one between Ashburn, Virginia and South Carolina. In case you don’t know, Ashburn is the home of the major Internet POP serving Washington DC and surrounding cities.

This is a network planner’s worst nightmare. Planners always try to build redundancy into fiber routes so that the network won’t crash from a single fiber cut. Modern backbone electronics can be set to automatically forward traffic in both directions around a ring so that service isn’t interrupted in the case of a fiber cut or failure of ring electronics somewhere along the ring. But rings using this technology can’t withstand two simultaneous cuts.
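To see why a ring shrugs off one cut but not two, here is a minimal sketch in Python. It is only an illustration under simple assumptions: the five node names are made up, every link is treated as equal, and "surviving" just means every node can still reach every other node. It says nothing about how any real carrier's ring electronics behave.

```python
from itertools import combinations


def reachable(nodes, links, start):
    """Return the set of nodes reachable from start over the given links."""
    neighbors = {n: set() for n in nodes}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def ring_links(nodes):
    """Connect each node to the next one, and the last back to the first."""
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]


def survives(nodes, links, cuts):
    """True if every node can still reach every other node after the cuts."""
    remaining = [link for link in links if link not in cuts]
    return reachable(nodes, remaining, nodes[0]) == set(nodes)


# Hypothetical five-node ring; the names are placeholders, not real sites.
NODES = ["A", "B", "C", "D", "E"]
LINKS = ring_links(NODES)

# Try every possible single cut, then every possible pair of cuts.
single_cut_ok = all(survives(NODES, LINKS, {cut}) for cut in LINKS)
double_cut_ok = any(
    survives(NODES, LINKS, set(pair)) for pair in combinations(LINKS, 2)
)

print(single_cut_ok)  # True: the ring stays whole after any one cut
print(double_cut_ok)  # False: any two cuts split the ring into two pieces
```

In this toy model the ring stays connected after any single cut, while every pair of cuts leaves some nodes stranded on the wrong side.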

What was a bit surprising to me was the failure of a large part of the Comcast network with fiber cuts that were so far apart. It seems unlikely that the company has a fiber ring that sends all Internet traffic in such a large circle. It’s more likely that the company has centralized one or more of their routing functions, such as DNS routing, in one place on the network, and the fiber cuts might have isolated that key function, which would shut down their Internet product.

Redundancy is a big concern for most smaller network owners. Lack of redundancy was one of the major issues that drove Cook County, Minnesota to build their own fiber network. There is no cable provider in the county and their entire telecom network was provided by CenturyLink. Tourism is the major driver of the economy and a decade ago there was a cut in the CenturyLink fiber from Duluth that isolated the county during peak tourist season. That meant that the Internet, telephones, and cell phones didn’t work. Businesses couldn’t take credit cards, restaurants and hotels couldn’t take reservations, and family members on vacation couldn’t communicate with each other. This prompted the County to pursue a fiber network that included redundant routes for traffic in and out of the county. The network was ultimately built and operated by the local power cooperative, and today there is a greatly reduced chance of a major telecom outage.

Even where there is redundancy there can be outages. One of my clients operates a large statewide fiber network that stretches for hundreds of miles. They followed good engineering practice and scheduled an upgrade of the ring electronics after midnight. While one of the nodes was being upgraded the fiber was cut on a different part of the network when a truck knocked down a telephone pole, and the whole network went dark. Fiber cuts in the middle of the night are somewhat rare, but they happen.

Whenever possible fiber engineers also build redundancy into a local fiber network. They might build a ring connecting the fiber huts serving neighborhoods so that the network keeps functioning with a cut along the ring. It’s nearly impossible to design such redundancy in the last-mile loop, but fiber cuts in the last mile only isolate homes associated with the specific fiber.

But just like with Comcast and Cook County, many local networks have a hard time creating redundancy outside of their immediate network. In geographically remote areas it’s often impossible to find a second secure route to the Internet, leaving a network, or whole communities, vulnerable to a fiber cut somewhere outside their area.

Unfortunately, it’s getting easier for fiber providers to run into the same kind of issue that hit Comcast. We are migrating numerous functions to the cloud and having redundant fiber routing does not always mean that there is an automatic redundant connection made to a key cloud server. I have clients that are now relying on the cloud for all sorts of services such as VoIP, cable TV programming, DNS routing for the Internet, the use of cloud-based operational software, etc. These ISPs may have a redundant path to the Internet, but still have only one path to get to the company providing their cable TV signal or DNS routing.
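One way to check for this kind of hidden weak spot is to ask, for each cloud service, whether any single link sits on every path to it. The Python sketch below does that for a tiny made-up network. All of the names here (isp, pop_east, pop_west, dns_vendor) are hypothetical placeholders, and a real network map would have far more links, but the idea is the same.

```python
def connected(links, src, dst):
    """True if dst can be reached from src over the given links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, stack = {src}, [src]
    while stack:
        for nxt in neighbors.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return dst in seen


def single_points_of_failure(links, src, dst):
    """List every link whose loss, on its own, cuts src off from dst."""
    return [
        link
        for link in links
        if not connected([other for other in links if other != link], src, dst)
    ]


# Hypothetical ISP with two routes to the Internet, but a DNS vendor
# that can only be reached through one of its two upstream POPs.
LINKS = [
    ("isp", "pop_east"),
    ("isp", "pop_west"),
    ("pop_east", "internet"),
    ("pop_west", "internet"),
    ("pop_east", "dns_vendor"),
]

internet_risks = single_points_of_failure(LINKS, "isp", "internet")
dns_risks = single_points_of_failure(LINKS, "isp", "dns_vendor")

print(internet_risks)  # []
print(dns_risks)       # [('pop_east', 'dns_vendor')]
```

In this example the ISP really does have a redundant path to the Internet, yet one cut on the link to the DNS vendor takes that service away. Running the same check against each cloud dependency, rather than only against "the Internet," is what surfaces the problem.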

The Comcast outage should prompt companies to look again at redundancy. Don’t assume that every function in the cloud is redundant even if you have a redundant connection to the Internet.

The Need for Fiber Redundancy

I just read a short article that mentioned that 30,000 customers in Corvallis, Oregon lost broadband and cable service when a car struck a utility pole and cut a fiber. It took Comcast 23 hours to restore service. There is nothing unusual about this outage and such outages happen every day across the country. I’m not even sure why this incident made the news other than that the number of customers that lost service from a single incident was larger than normal.

But this incident points to the issue of network redundancy – the ability of a network to keep working after a fiber gets cut. Since broadband is now becoming a necessity and not just a nice-to-have thing we are going to be hearing a lot more about redundancy in the future.

Lack of redundancy can strike anywhere, in big cities or small – but the effects in rural areas can be incredibly devastating. A decade ago I worked with Cook County, Minnesota, which is a county in the far north of the state. The economy of the county is driven by recreation and they were interested in getting better broadband. But what drove them to get serious about finding a solution was an incident that knocked out broadband and telephone to the entire county for several days. The County has now built their own fiber network that includes redundant route diversity to the rest of the world.

We used to have this same concern about the telephone networks and smaller towns often got isolated from making or receiving calls when there was a cable cut. But as cellphones have become prevalent the cries about losing landline telephone have diminished. But the cries about lack of redundancy are back after communities suffer the kinds of outages just experienced by Corvallis. Local officials and the public want to know why our networks can’t be protected against these kinds of outages.

The simple answer is money. It often means building more fiber, and at a minimum it takes a lot more expensive electronics to create network redundancy. The way that redundancy works is simple – there must be separate fiber or electronic paths into an area so that it has two broadband feeds. This can be created in two ways. On larger networks it’s created with fiber rings. In a ring configuration two sets of electronics are used to send every signal in both directions around the ring. In that configuration, when a fiber is cut the signal is still being received from the opposite direction. The other (and even more expensive) way to create diversity is to lay two separate fiber networks to reach a given location.

Route redundancy tends to diminish as a network gets closer to customers. In the US we have many different types of fiber networks. The long-haul fiber networks that connect the NFL cities are largely on rings. From the major cities there are then regional fiber networks that are built to reach surrounding communities. Some of these networks are also on fiber rings, but a surprising number are not and face the same kind of outages that Cook County had. Finally, there are local networks built of fiber, telephone copper, or coaxial cable that are built to get to customers. It’s rare to see route diversity at the local level.

But redundancy can be added anywhere in the network, at a cost. For example, it is not unusual for large businesses to seek local route diversity. They most often achieve this by buying broadband from more than one provider. But sometimes this doesn’t work if those providers are sharing the same poles to reach the business. I’ve also seen fiber providers create a local ring for large businesses willing to pay the high price for redundancy. But most of the last mile that we all live and work on has no protection. We are always one local disaster away from losing service like what happened in Corvallis.

But the Corvallis outage was not an outage where a cut wire knocked out a dozen homes on a street. The fiber that got cut was obviously one that was being used to provide coverage to a wide area. A lot of my clients would not design a network where an outage could affect so many customers. If they served a town the size of Corvallis they would build some local rings to significantly reduce the number of customers that could be knocked out by an outage.
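To put rough numbers on that design choice, the Python sketch below compares the worst single fiber cut on a daisy-chained feed with the same huts closed into a ring. Everything in it is an assumption for illustration: the three huts, the 10,000 customers per hut, and the layout are invented and are not the actual Corvallis network. It also models only the fiber between huts, not the unprotected last-mile drops.

```python
def reachable(links, start):
    """Return the set of nodes reachable from start over the given links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def customers_cut_off(links, customers, headend, cut):
    """Count customers on huts that lose their path to the headend."""
    remaining = [link for link in links if link != cut]
    still_fed = reachable(remaining, headend)
    return sum(n for hut, n in customers.items() if hut not in still_fed)


def worst_single_cut(links, customers, headend):
    """Largest number of customers any one fiber cut can knock out."""
    return max(customers_cut_off(links, customers, headend, cut) for cut in links)


# Invented customer counts per neighborhood hut (illustration only).
CUSTOMERS = {"hut_1": 10_000, "hut_2": 10_000, "hut_3": 10_000}

# Daisy chain: one fiber leaves the headend and feeds each hut in turn.
CHAIN = [("headend", "hut_1"), ("hut_1", "hut_2"), ("hut_2", "hut_3")]

# Ring: the same chain plus a return leg from the last hut to the headend.
RING = CHAIN + [("hut_3", "headend")]

chain_worst = worst_single_cut(CHAIN, CUSTOMERS, "headend")
ring_worst = worst_single_cut(RING, CUSTOMERS, "headend")

print(chain_worst)  # 30000
print(ring_worst)   # 0
```

In this toy layout one extra leg of fiber takes the worst-case single cut from every customer down to none at the hut level. A real design would also have to pay for the ring electronics and would still leave individual drops exposed.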

But the big ISPs like Comcast have taken shortcuts over the years and have not spent the money to build local rings. I am not singling out Comcast here, because I think this is largely true of all of the big ISPs.

The consequences of a fiber cut like the one in Corvallis are huge. That outage had to include numerous businesses that lost their broadband connection for a day – and many businesses today cannot function without broadband. Businesses that are run out of homes lost service. And the cut disrupted homework, training, shopping, medical monitoring, security alarms, banking – you name it – for 30,000 homes and businesses.

There is no easy fix for this, but as broadband continues to become essential in our lives these kinds of outages are going to become less acceptable. We are going to start to hear people, businesses, and local governments shouting for better network redundancy, just as Cook County did a decade ago. And that clamor is going to drive some of these communities to seek their own fiber solution to protect from the economic devastation that can come with even moderate network outages. And to some degree, if this happens the carriers will have brought this upon themselves due to pinching pennies and not making redundancy a higher priority in network design.