The FCC’s Public Safety and Homeland Security Bureau just released a list of recommended network practices. These recommendations are not a comprehensive list of good network practices, but rather were compiled by analyzing the actual network outages reported to the FCC over the last five years. Telcos are required to notify the FCC of significant network outages, and every item on this list represents multiple actual network outages. It’s easy to look at some of the items on the list and think they are common sense, but there obviously are regulated telcos that had outages because they ignored each of these network practices.
Following are some of the more interesting recommendations on the list:
Network Operators, Service Providers and Property Managers together with the Power Company and other tenants in the location, should verify that aerial power lines are not in conflict with hazards that could produce a loss of service during high winds or icy conditions. This speaks to having a regular inspection and tree trimming process to minimize damage from bad storms.
Network Operators and Property Managers should consider pre-arranging contact information and access to restoral information with local power companies. This seems like common sense, but I’ve been involved in outages where the technicians did not know how to immediately contact other utilities.
Network Operators, Service Providers and Public Safety should establish a routing plan so that in the case of lost connectivity or disaster impact affecting a Public Safety Answering Point (PSAP), 9-1-1 calls are routed to an alternate PSAP answering point. Many of the recommendations on the FCC’s list concern 9-1-1 and call for contingency plans that keep 9-1-1 working during network failures.
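The routing plan described above can be thought of as an ordered list of answering points with a last-resort fallback. The following is only a minimal sketch of that idea. The `Psap` class, the `reachable` flag, and `select_psap` are hypothetical names for illustration and do not reflect how any real 9-1-1 routing system is configured.

```python
from dataclasses import dataclass


@dataclass
class Psap:
    name: str
    # Hypothetical health flag; in practice this would come from
    # trunk or connectivity monitoring.
    reachable: bool


def select_psap(routing_plan, default):
    """Return the first reachable PSAP in priority order.

    routing_plan lists the primary PSAP first, then alternates.
    If none is reachable, fall back to the default answering point.
    """
    for psap in routing_plan:
        if psap.reachable:
            return psap
    return default
```

The value of writing the plan down in this form is that the order of alternates is decided before a disaster, not during one.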
Network Operators, Public Safety, and Property Managers should consider conducting physical site audits after a major event (e.g., weather, earthquake, auto wreck) to ensure the physical integrity and orientation of hardware has not been compromised. It’s easy to assume that sites that look undamaged after big storms are okay. But damage often doesn’t manifest as outages until days, weeks or months later.
Network Operators and Service Providers should verify both local and remote alarms and remote network element maintenance access on all new critical equipment installed in the network, before it is placed into service. I’ve seen outages where equipment was installed but the alarms were not tested. You don’t want to find out that an alarm isn’t working when it’s needed.
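One way to enforce this practice is a simple pre-service gate that refuses to mark equipment ready until every check has been recorded as passed. This is a rough sketch under assumed names: the three check labels are taken from the recommendation above, but `REQUIRED_CHECKS`, `missing_checks`, and `ready_for_service` are hypothetical and not part of any FCC or vendor tool.

```python
# Checks named in the recommendation; the identifiers are illustrative.
REQUIRED_CHECKS = (
    "local_alarm",
    "remote_alarm",
    "remote_maintenance_access",
)


def missing_checks(results):
    """Return the required checks that are absent or failed.

    results maps a check name to True (verified) or False (failed).
    A check that was never run counts as missing.
    """
    return [check for check in REQUIRED_CHECKS if not results.get(check, False)]


def ready_for_service(results):
    """Equipment is ready only when every required check passed."""
    return not missing_checks(results)
```

Treating an untested alarm the same as a failed alarm is the point of the design: equipment cannot slip into service simply because nobody ran the test.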
Network Operators, Service Providers, Public Safety and Property Managers should engage in preventative maintenance programs for network site support systems including emergency power generators, UPS, DC plant (including batteries), HVAC units, and fire suppression systems. This might easily be the biggest cause of network outages. ISPs get busy and don’t test all of the components critical to maintaining systems. A lot of outages I’ve been involved with were due to failures of minor components like fans or air conditioning compressors.
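A preventative maintenance program mostly comes down to knowing what is overdue. The sketch below flags site support systems whose last test is older than an allowed interval. The function name, the system labels, and the intervals are all hypothetical placeholders; actual intervals should come from manufacturer guidance and company policy.

```python
from datetime import date, timedelta


def overdue_items(last_tested, intervals_days, today):
    """Return the systems that are overdue for testing, sorted by name.

    last_tested maps a system name to the date of its last test.
    intervals_days maps a system name to the maximum days between tests.
    A system with no recorded test is treated as overdue.
    """
    overdue = []
    for system, interval in intervals_days.items():
        last = last_tested.get(system)
        if last is None or today - last > timedelta(days=interval):
            overdue.append(system)
    return sorted(overdue)
```

A call might look like `overdue_items(last_tested, {"generator": 30, "hvac": 90}, date.today())`, where the intervals shown are made up for illustration.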
Network Operators, Service Providers, Public Safety, and Equipment Suppliers should consider the development of a vital records program to protect vital records that may be critical to restoration efforts. Today there are often software, databases and other vital records that must be restored first in order to get equipment up and functioning. Electronic records of this type need to be kept in a secure system that is separate and doesn’t rely on the network to be functioning, but that also can be accessed easily when needed.
Network Operators, Service Providers, Public Safety and Property Managers should take appropriate precautions to ensure that fuel supplies and alternate sources of power are available for critical installations in the event of major disruptions in a geographic area (e.g., hurricane, earthquake, pipeline disruption). Consider contingency contracts in advance with clear terms and conditions (e.g., Delivery time commitments, T&Cs). This lesson was reinforced after the recent hurricanes, when local gasoline supplies dried up and several utilities without their own private fuel supply were stranded along with the rest of the public.
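A quick sanity check on a fuel contract is to compare how long a generator can run on the fuel already on site against the delivery time the supplier commits to. The arithmetic is just usable fuel divided by burn rate. The sketch below assumes a hypothetical `usable_fraction` to account for fuel that cannot be drawn from the tank; every figure passed in would have to come from the actual site and contract.

```python
def runtime_hours(tank_gallons, burn_gal_per_hour, usable_fraction=0.9):
    """Estimate generator runtime from on-site fuel.

    usable_fraction is an assumed allowance for fuel that cannot be
    drawn from the tank; the default is illustrative only.
    """
    if burn_gal_per_hour <= 0:
        raise ValueError("burn rate must be positive")
    return tank_gallons * usable_fraction / burn_gal_per_hour


def covers_delivery_window(tank_gallons, burn_gal_per_hour,
                           delivery_hours, usable_fraction=0.9):
    """True if on-site fuel lasts at least until the committed delivery time."""
    hours = runtime_hours(tank_gallons, burn_gal_per_hour, usable_fraction)
    return hours >= delivery_hours
```

If the check fails for a site, either the tank is too small or the delivery commitment is too loose, and that is far better to learn before a regional fuel shortage than during one.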
This FCC list is a great reminder that it’s always a good idea to periodically assess your disaster and outage readiness. You don’t want to discover gaps in your processes during the middle of an outage.