Mitigating the Six Factors that Cause Unplanned Outages
Business growth is a primary focus for any data center. Operators pursue it by increasing capacity, boosting server density, and retrofitting server rooms into modern facilities with more efficient cooling.
These changes are great for business, but it is important to keep your disaster recovery and backup plan current with each change you make. It is critical to test this plan and to make sure it has provisions for every factor, so that it is effective when you need it.
According to the Ponemon Institute, the average data center outage costs a business $740,357. In extreme cases, such as the Delta Air Lines data center crash, the loss reached $150 million.
Many times, you hear of an outage caused by something that could easily have been prevented.
These stories strike fear in the hearts of data center operators, but by focusing on these six common factors that cause data center outages, you can prevent downtime in your data center.
1. UPS system failure causes 25% of outages.
This problem is commonly caused by battery failure. It is important to monitor the ambient temperatures and voltages of the UPS batteries and to test the batteries regularly.
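The battery monitoring described above can be sketched as a simple threshold check. This is a minimal illustration, not a vendor integration: the `BatteryReading` type, the `check_battery` function, and the temperature and voltage limits are all hypothetical, and the real limits should come from the battery manufacturer's documentation.

```python
from dataclasses import dataclass

# Hypothetical limits for illustration only; use the manufacturer's figures.
MAX_AMBIENT_C = 25.0
MIN_VOLTS = 13.2
MAX_VOLTS = 13.8


@dataclass
class BatteryReading:
    battery_id: str
    ambient_c: float  # ambient temperature near the battery, in Celsius
    volts: float      # measured battery voltage


def check_battery(reading):
    """Return a list of alert messages for one reading (empty if healthy)."""
    alerts = []
    if reading.ambient_c > MAX_AMBIENT_C:
        alerts.append(
            f"{reading.battery_id}: ambient temperature {reading.ambient_c} C "
            f"is above {MAX_AMBIENT_C} C"
        )
    if not (MIN_VOLTS <= reading.volts <= MAX_VOLTS):
        alerts.append(
            f"{reading.battery_id}: voltage {reading.volts} V is outside "
            f"{MIN_VOLTS}-{MAX_VOLTS} V"
        )
    return alerts


if __name__ == "__main__":
    # Example readings; in practice these would come from the UPS itself.
    for r in [BatteryReading("string-1", 22.0, 13.5),
              BatteryReading("string-2", 31.0, 12.9)]:
        for alert in check_battery(r):
            print(alert)
```

Running a check like this on a schedule, and alerting on any non-empty result, turns the monitoring advice into something that does not depend on someone remembering to look.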
2. Cyber crime causes 22% of outages.
While cyber crime sounds like a difficult issue to mitigate, a 2015 IBM study shows that 55% of cyber crime is caused by people with insider access.
Inadvertent actors account for 23.5% of cyber crime. In these instances, someone with unrestricted system authorization unwittingly performs an action that causes an outage, such as accidentally rebooting a system.
Malicious insiders account for 31.5% of cyber crime. In these instances, the actions are performed deliberately to harm the company.
It is important to maintain appropriate levels of system authorization and security. Grant unrestricted authorization to only a few people, and make sure they are thoroughly trained and trusted.
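One way to picture limited authorization is a role-to-permission table with an extra confirmation step for disruptive actions, which addresses both the malicious and the inadvertent insider cases. The sketch below is an assumption for illustration: the role names, action names, and the `request_action` function are hypothetical and not taken from any particular product.

```python
# Hypothetical roles and actions, for illustration only.
ROLE_PERMISSIONS = {
    "operator": {"view_status"},
    "engineer": {"view_status", "restart_service"},
    "admin": {"view_status", "restart_service", "reboot_system"},
}

# Actions that can cause an outage require an explicit confirmation,
# which guards against accidental use by an authorized person.
DISRUPTIVE_ACTIONS = {"reboot_system"}


def request_action(role, action, confirmed=False):
    """Return 'denied', 'needs_confirmation', or 'allowed'."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return "denied"
    if action in DISRUPTIVE_ACTIONS and not confirmed:
        return "needs_confirmation"
    return "allowed"


if __name__ == "__main__":
    print(request_action("operator", "reboot_system"))
    print(request_action("admin", "reboot_system"))
    print(request_action("admin", "reboot_system", confirmed=True))
```

The design choice here is that only one role can reboot at all, and even that role has to confirm the action before it runs.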
38% of cyber crime in a data center is committed by outsiders. One way to defend your business from these attacks is to leverage DDoS security solutions and perform regular system audits.
3. Accidental/human error causes 22% of outages.
Human errors that can cause outages range from accidentally activating the Emergency Power Off to planning too many changes during a single equipment maintenance window.
Make sure that your Emergency Power Off button is properly covered and that employees are trained in Emergency Power Off procedures.
In terms of equipment maintenance, make changes systematically. If too many changes are made at once, then it is difficult to troubleshoot post-change problems that may arise.
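Making changes systematically can be sketched as applying one change at a time and checking system health after each one, so that a failure points directly at the change that caused it. The `apply_changes` function and its arguments are hypothetical names used only for this illustration.

```python
def apply_changes(changes, health_check):
    """Apply changes one at a time, verifying health after each.

    changes: list of (name, apply_fn, rollback_fn) tuples.
    health_check: callable returning True when the system is healthy.

    Returns (applied_names, failed_name). failed_name is None when every
    change passed its health check.
    """
    applied = []
    for name, apply_fn, rollback_fn in changes:
        apply_fn()
        if not health_check():
            # Only one change happened since the last good check,
            # so the culprit is known. Undo it and stop.
            rollback_fn()
            return applied, name
        applied.append(name)
    return applied, None


if __name__ == "__main__":
    # Toy system state standing in for real equipment.
    state = {"firmware": 1, "fans_ok": True}

    changes = [
        ("update firmware",
         lambda: state.update(firmware=2),
         lambda: state.update(firmware=1)),
        ("swap fan tray",
         lambda: state.update(fans_ok=False),  # simulated bad change
         lambda: state.update(fans_ok=True)),
    ]

    applied, failed = apply_changes(changes, lambda: state["fans_ok"])
    print("applied:", applied)
    print("failed:", failed)
```

Had both changes been applied before any check, the failed health check would not show which of the two was responsible.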
4. Water, heat, or CRAC failure causes 11% of outages.
Cooling is vital to data center uptime. It is essential to have an efficient cooling system with unobstructed air ducts, and to have a backup cooling plan in place so that your equipment does not overheat if the primary cooling system fails.
As for water, it is crucial to repair any leaks in the fire suppression system immediately and to use single- or double-interlock pre-action sprinkler systems.
5. Generator failure causes 6% of outages.
It is crucial to maintain and test generators regularly to ensure that you have sufficient redundant power in the event of a power outage.
6. IT equipment failure causes 4% of outages.
Aging equipment in your server room will eventually fail. To mitigate this risk, refresh legacy IT equipment as soon as possible.
Weather is a seventh factor, causing 10% of all data center outages.
Unfortunately, nobody can control the weather.
But by focusing on the other six factors, you can prevent most of the simple issues that would otherwise cause an outage in your data center.
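As a quick check on the figures cited in this article, the shares can be tallied to see how much of the total the six addressable factors cover. The variable names below are only for illustration; the percentages are the ones listed above.

```python
# Share of outages by cause, in percent, as cited in this article.
OUTAGE_SHARES = {
    "UPS system failure": 25,
    "Cyber crime": 22,
    "Accidental/human error": 22,
    "Water, heat, or CRAC failure": 11,
    "Weather": 10,
    "Generator failure": 6,
    "IT equipment failure": 4,
}

total = sum(OUTAGE_SHARES.values())
addressable = total - OUTAGE_SHARES["Weather"]

print(f"All listed causes: {total}%")             # All listed causes: 100%
print(f"Six addressable causes: {addressable}%")  # Six addressable causes: 90%
```

By these figures, the six factors other than weather account for 90% of outages.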