Relying on Cellular Broadband (Part II)

One of my recent blogs talked about the reliability of cellular data as a substitute for wireline broadband. Almost immediately I had an example of a wireless outage shoved in my face. I was in Phoenix at an all-day meeting. When I left at about 4:00 I tried my Uber app and it wasn’t working. The app cycled through but would not find a driver. This was inconvenient because I was standing in the 100-degree sun, so I immediately looked for shade. I tried a few more times. Giving up on Uber I tried Lyft and got the same results. Now I’m figuring a data outage, but since Android phones are sometimes squirrelly, to be safe I rebooted my phone.

That didn’t work, and I was still standing in the heat waiting for a ride to my hotel, which was 20 miles away. Uber, Lyft and taxis were out of the question. Luckily my voice service was still working, so I called my wife, who ordered an Uber for me. But had she not been available I’m not sure how I would have gotten to my hotel. I’m picturing the huge number of other people this also inconvenienced. How many people landed at an airport and couldn’t get a ride? How many people were driving and suddenly lost access to their mapping software? How many businessmen were traveling and couldn’t read or respond to email?

When I got back to a landline connection I looked at the AT&T outage website and it was lit up like a Christmas tree. It looked like the east coast was totally out, but almost every other NFL city also showed an outage. Phoenix, which I knew to be out, didn’t even show on the map as having a problem, and it’s possible that the whole nationwide AT&T network had a data outage. A few days later I checked and AT&T had said nothing about the cause of the outage. Their outage website shows a 17-hour outage that day, without specifying the extent or the reason for the outage.

There is obviously something shoddy in the AT&T national network if an event of any kind can knock out the whole nationwide data network for that long. It’s hard to believe that the company would not have redundant backup for every critical system that is needed to keep the network functioning. There are only a few possible explanations. Possibly some critical component of the data network failed, such as the DNS system that resolves Internet addresses for cellphones. The company might also have gone too far with software defined networking and created some new points of failure that could affect the whole network. Or the company had a major cut in the fiber that feeds the site of one of those key network systems. There is no excuse for any of these possibilities, and a company with nearly 160 million customers ought to have redundancy for every critical component of their wireless network.

I contrast this to the hundreds of companies I know with landline broadband networks. All of my clients worry about total network failure and they work hard to avoid it. Unless they are geographically isolated, most of my clients have redundant routes between their network and the Internet. They generally have redundancy of key routers and switches to keep critical functions operational. Most of my clients have almost no outages that are not caused in the last mile. Local broadband networks are always susceptible to cable cuts in the last mile. But those cuts, by design, only knock out customers who are ‘downstream’ from the cut. It’s becoming extremely rare for my clients to have a total network outage, and if they do they usually take steps to stop it from happening a second time.

The press is in love with wireless right now and there are dozens of articles every month declaring how wireless is our future. Cellphones are going to become blazingly fast and 5G will fill in the gaps where cellular isn’t good enough. I’ve written enough blogs about this that you probably know that I think we are still a number of years away from seeing such wireless technologies.

But this outage makes me wonder whether people will ever fully trust wireless technologies if they are operated by the big ISPs. The big ISPs are cavalier about network outages and they seem to suppose that their customers will just accept them. If my ISP clients had a 17-hour outage they would have taken steps after the outage to make amends with customers. They would have explained the cause of the outage and talked about their plans to make sure that it didn’t happen again. They likely would have given every customer a day’s credit on their bill for the downtime.

It astounds me that something like this outage could happen. If I were the head of AT&T, heads would have rolled after this was fixed. There is no excuse for a company with a $23 billion annual capital budget to have a network that is vulnerable to a widespread outage. The only reason the company could have such outages is that they don’t place value on redundancy. Until the big ISPs can make their wireless networks as reliable as landline networks I will never consider using them for broadband. I can’t see customers sticking with a 5G network that has a 17-hour outage. Broadband is now critical to many of us and I expect outages to be measured in minutes, not in hours or days.
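To put "minutes, not hours" into numbers, here is a rough back-of-the-envelope sketch. It assumes the 17-hour outage was the only downtime in a full year, and it compares that against the "five nines" (99.999%) availability target that is commonly quoted in telecom. That target is my benchmark for illustration, not something AT&T has published, and the function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def availability(outage_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the network was up."""
    return 1 - outage_hours / period_hours


def allowed_downtime_minutes(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Minutes of downtime per period that a given availability target permits."""
    return (1 - target) * period_hours * 60


# Assumption: one 17-hour outage and no other downtime in the year.
print(f"Availability with one 17-hour outage: {availability(17):.4%}")

# Assumption: 99.999% is used here only as a familiar benchmark.
print(f"Downtime allowed at 99.999%: {allowed_downtime_minutes(0.99999):.1f} minutes per year")
```

Under those assumptions a single 17-hour outage leaves the network at roughly 99.8% availability for the year, while a five-nines network is allowed only a little over five minutes of downtime in the same period.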

Why No Redundancy?

I usually load a blog every morning between 7:00 and 8:00 eastern. But today my Internet was down. I first noticed this when I woke up around 2:30. Don’t even ask why I was up then, but that is not unusual for me. My Internet outage was also not that unusual. I have Comcast as my ISP and they seem to go out a few times per month. I’ve always given them the benefit of the doubt and assumed that a few of the late night outages are due to routine network maintenance.

So I grab my cell phone to turn on my mobile hot spot. Most of the outages here last an hour or two and that is the easy way to get through outages. But bam! – AT&T is out too. I have no bars on my LTE network. So my first thought is cable cut. The only realistic way that both carriers go out in this area is if the whole area is isolated by a downed fiber.

I check back and hit a few web sites and I find at about 3:00 that I have a very slow Facebook connection, but that it’s working. I can get Facebook updates and I can post to Facebook, but none of the links outside of Facebook work. And nothing else seems to be working. This tells me that Facebook has a peering arrangement of some kind with Comcast and must come into the area by a different fiber than the one that was cut.

So I start looking around. The first thing I find is that Netflix is working normally, just as fast as ever. So now I have a slow Facebook feed and fast Netflix and still nothing else. After a while Google starts working. It wasn’t working earlier, but it seems that I can search Google, although none of the links work. This tells me that Comcast peers with Google but that the Google links use the open Internet. I force a few links back through the Google URL just to see if that will work and I find that I can read links through Google. No other search engines seem to be working.

The only other thing I found that worked was the NFL highlight films, and I was able to see the walk-off blocked punt in last night’s Ravens – Browns game. It’s highly unlikely that the NFL has a peering relationship with anybody and they must have a deal with Google.

So now I know a bit about the Comcast Network. They peer with Netflix, Google and Facebook – and since these are three of the largest traffic producers on the web that is not unusual. And at least in my area the peering comes into the area on a different fiber path than the normal Internet backbone that has knocked out both Comcast and AT&T.

But I also now know that in my area Comcast has no redundancy in the network. I find this interesting because most of my small clients insist on having redundancy in their networks. Of course, most of them operate in rural areas that are used to getting isolated when cables get cut – it happened for many years with telephone lines and now with the Internet.

But I can see that Comcast hasn’t bothered creating a redundant network. This particular outage went for 7 or 8 hours which is a bit long, so this must be from a major fiber cut. But I look at a map of Florida and it is a natural candidate to have rings. Everybody lives on one of the two coasts and there are several major east-west connector roads. This makes for natural rings. And if our backbone was on a ring we wouldn’t even know there was an outage. But with all of their billions of dollars of profits, neither Comcast nor AT&T wireless cares enough about redundancy to have put our area backbone on a ring.
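The difference between a ring and a single route can be sketched in a few lines. This is only a toy model: the node names ("hub", "town", "A", "B") and both topologies are hypothetical, and I have no knowledge of how Comcast or AT&T actually lay out the backbone here. The check simply cuts each link in turn and asks whether the town can still reach the hub.

```python
from collections import deque


def reachable(links, start):
    """Return the set of nodes reachable from start over undirected links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def survives_any_single_cut(links, hub, town):
    """True if town can still reach hub after any one link is cut."""
    return all(
        hub in reachable([link for link in links if link != cut], town)
        for cut in links
    )


# Hypothetical topologies, for illustration only.
spur = [("hub", "A"), ("A", "town")]
ring = [("hub", "A"), ("A", "town"), ("town", "B"), ("B", "hub")]

print("spur survives a single cut:", survives_any_single_cut(spur, "hub", "town"))
print("ring survives a single cut:", survives_any_single_cut(ring, "hub", "town"))
```

On the spur, either cut isolates the town. On the ring, traffic can always go around the other way, which is why customers on a ring usually never notice a single fiber cut.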

And I also don’t understand why they don’t have automatic alternate routing to bypass a fiber cut. If Netflix, Facebook and Google were connected everything else could have been routed along those same other fibers. That is something else my clients would have done to minimize outages for customers.

This is honestly unconscionable and perhaps it’s time we start clamoring to the FCC to require the big companies to plow some of their profits back into a better network. This same sort of outage happened a few times to the power grid a decade ago and the federal response was that the electric companies had to come up with a better network that could stop rolling outages. I know some of my clients that are electric companies spent some significant dollars towards that effort, and it seems to have worked. Considering how important the Internet has become for our daily lives and for commerce perhaps it’s time for the FCC to do the same thing.

Our Internet Infrastructure

Paul Barford, a professor at the University of Wisconsin, led an effort to map the major routes used by the Internet in the U.S. He believes that knowledge of the map can help us plan better to make the Internet less susceptible to natural disasters, accidents, or intentional sabotage.

I can remember two times when the Internet backbone took a serious hit in this country and they were both in 2001. First, a 60-car CSX train derailed in the Howard Street tunnel in Baltimore, and the resulting fire melted a lot of fiber cables that were on the east coast north-south route. Then later that year on 9/11, the twin towers of the World Trade Center collapsed, taking out the main carrier hotel and data center in Manhattan.

And there is no reason to think that we won’t have more disasters. When I look at the map, my first reaction is how few routes there are in the main backbone.

[Figure: map of the Internet’s backbone]

Professor Barford hopes the map will spur conversation about the need for more route diversity. The Department of Homeland Security agrees and is publishing the map and making the details of the routes available to government, public, and private researchers.

Some might say that publishing such a map makes us more vulnerable. I don’t think it does. Everybody in the industry knows the addresses of the main Internet POPs since those are the end points of the data connections that ISPs buy to connect to the Internet. And I didn’t really need this map to know that the major routes of fiber mostly follow the Interstate highways. In Florida, where I live, there is a route on I-95 on one side of the state and I-75 on the other with a spur to Orlando. I doubt that anybody here in the industry didn’t already know that.

The one thing that strikes me about the map is that once you get off the major big city routes, many of the smaller US markets only have one route into and out of their hub; it doesn’t look that hard to isolate some markets with a couple of fiber cuts. I know that some of the carriers involved in the backbone have contingency plans that don’t show up on this map, and there are other fiber routes that can pick up the slack fairly soon after a major Internet outage in most places.
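Finding those single routes is a standard graph exercise: a link whose loss splits the network is called a bridge. The sketch below uses a depth-first search with low-link values to list the bridges in a small made-up network. The city and town names are placeholders, not routes taken from Professor Barford's map, and the code assumes there is at most one link between any pair of places.

```python
def find_bridges(links):
    """Return the links whose loss would split the network (bridges)."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    order = {}    # discovery index of each node
    low = {}      # lowest discovery index reachable from the node's subtree
    bridges = []

    def visit(node, parent):
        order[node] = low[node] = len(order)
        for nxt in graph[node]:
            if nxt == parent:
                continue  # assumes no parallel links between the same pair
            if nxt in order:
                # Back edge: there is another way around to an earlier node.
                low[node] = min(low[node], order[nxt])
            else:
                visit(nxt, node)
                low[node] = min(low[node], low[nxt])
                if low[nxt] > order[node]:
                    # Nothing below nxt loops back past node, so this link is a bridge.
                    bridges.append((node, nxt))

    for node in graph:
        if node not in order:
            visit(node, None)
    return bridges


# Hypothetical network: three cities on a ring, two towns hanging off a spur.
backbone = [
    ("CityA", "CityB"),
    ("CityB", "CityC"),
    ("CityC", "CityA"),
    ("CityC", "TownD"),
    ("TownD", "TownE"),
]

for a, b in find_bridges(backbone):
    print(f"single point of failure: {a} - {b}")
```

In this toy network the three ring links never show up as bridges, while both spur links do. A carrier with accurate route data could run the same kind of check to see which markets hang off a single fiber.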

The other thing you realize about this network is that it wasn’t really designed—it grew organically. The network takes the shortest path between major markets using major roads and thus follows the routes built by the first fiber pioneers in the 80s and early 90s.

Hopefully this map spurs the carriers to get together and plan a more robust backbone going into the future. It’s very easy to get complacent about a network that is functioning, but this map highlights a number of vulnerable points in the network that could be improved. This kind of planning was undertaken by the large electric grids after a number of power outages a decade ago. Let’s not wait for major Internet outages to get us to pay attention to making the network safer and more redundant.