The most important aspect of nearly every network is availability. Performance, scalability, management, agility, etc. all require the network to actually be online. In conversations with Gartner clients, availability comes up often, and that is echoed in surveys we’ve done:
I would argue that availability should rate higher than 20%, but clients don’t score it that way because a) it is assumed to be foundational for all vendors and hence is not perceived as a major differentiator and/or b) all the hype around SDN has people focused on agility and orchestration. Availability is relatively boring compared to other “cool” stuff like SDN and disaggregation, unless you’re talking about the Netflix Chaos Monkey (more on that below).
When talking to clients, availability often comes up after an outage. In some cases, network outages drive significant investment from IT, particularly in the DDI market (“…we could never cost justify DDI until we fat-fingered our public website A-record…”). The undisputed #1 cause of network outages is human error, with estimates as high as 32% according to Dimension Data’s 2014 Network Barometer report, not to mention a study from Avaya indicating 82% of folks experienced network downtime due to human error. In my 16+ years running large corporate networks, there was no feeling worse than the post-mortem meeting after a big outage. I’ll never forget one particular meeting in which my CIO said “…well that was just plain stupid…”. Fortunately, this happened to me only a very small number of times. We also have research that can help you avoid network outages, including:
Take A Four-Step NCCM Approach to Stem Disasters (Vivek Bhalla)
Summary: While businesses invest in their networks to gain a competitive edge, they often fail to ensure adequate steps are taken to reduce outages. Gartner’s four-step NCCM approach enables network staff to minimize infrastructure failure.
And, for those interested in designing WANs for availability:
Summary: In the developed world, the marginal cost of bandwidth is so low that rightsizing capacity has little impact on WAN cost. However, the cost of improving availability remains high and downtime is less acceptable, making rightsizing network availability the key goal for enterprise network designers.
And a little more about antifragility (and the Netflix Chaos Monkey):
How Antifragile Practices Can Make Your I&O Stronger (Ian Head)
Summary: Antifragile systems turn stress and adversity into advantage. Certain practices of Web-scale IT enterprises may be emulated by other IT organizations to enhance their antifragility, especially as part of their continual improvement, DevOps and digital business initiatives.
PS – If you have a really good and/or funny outage story, feel free to include it in the comments. A prize will be sent to the best one…