———————————————————————————————————–
At a recent Business Continuity Management / Disaster Recovery Planning (BCM/DRP) workshop, IT executives and program leaders expressed concerns about the usefulness of recovery “testing” to prove readiness. While the group’s consensus was that testing is a best practice, many of the practitioners concluded that “testing approaches alone do not generate complete confidence” that their systems and processes could recover successfully in a real-world disaster. Their experience indicated that even when systems passed unit, system, integration, performance, volume and stress testing, “real world” stresses often created new issues and unforeseen problems.
———————————————————————————————————–
When asked why they had such low levels of confidence, the executive group listed these reasons from their experiences:
• Testing was often approached as a ‘check-off’ step, rather than an integral and important part of their culture – tests were often simplified and were therefore of limited utility.
• Tests may not reflect realistic, high-probability, or high-impact operating conditions or failure scenarios, so an untested or under-tested high-probability function could result in major problems.
• Use of ‘sample data’, ‘conference room conditions’ and infrastructure that does not match production creates test conditions that are not realistic.
• Their own testing managers and teams are often not fully confident after passing tests but, due to time and resource constraints, discontinue testing.
Another concern raised was that the responsible team perceives testing as an ‘evaluation of performance’ instead of a process to ensure preparedness and successful implementation. One CIO stated he “agreed that testing was not the best approach but he felt it would be extremely difficult to change this situation, given constant and growing pressures on financials and delivery of technology.”
There was a consensus that organizations could be more successful and better prepared if they focused on running ‘exercises’ rather than just executing tests. They defined an exercise as practicing real-world scenarios, based on high-probability and high-impact conditions, to see what works and what does not. In an exercise, the team does not expect everything to work perfectly and, as one client leader noted, “in fact, failure is a measure of success in an exercise.” Failure provides the team with needed, useful information on how to improve the system, process or procedure. Most importantly, in an exercise, participants do not feel the same pressure to “pass” and can focus on improving performance iteratively.
CIO CALL TO ACTION
CIOs must combine training, exercises and selective testing if they want to be confident their systems and processes will meet real-world business continuity needs. Best-practice CIOs recommend that their peers ask the following questions to assess and optimize their business continuity readiness procedures:
How well do the tests represent real world situations?
Do the tests focus on high probability and high impact areas?
Has the team simplified the tests or process so they can pass and keep work moving?
Does successfully passing the tests generate high confidence among system owners?
How can an exercise approach promote learning and successful execution, and increase confidence?
Who can serve as an effective coach to change culture, lead, guide and encourage the process?
How can the results of exercises be used to promote open exchange on what needs to be improved, avoiding pass or fail assessments?
BOTTOM LINE
Traditional software design and functionality testing and DRP testing often do not generate confidence among CIOs, business leads or system owners. A paradigm shift from pass/fail testing to running “exercises” can change enterprise culture and result in better systems design and implementation. Market-leading CIOs develop confidence in their systems’ ability to meet business requirements, and to recover effectively and efficiently in disasters, by using an exercise-driven testing approach.
Business Impact:
Successful BCM/DRP programs are a high priority for the board and senior management. Testing results alone are not sufficient to demonstrate readiness and identify recovery gaps. Recovering seamlessly from IT system failures is increasingly a brand reputation issue and thus a risk management imperative. Regular “exercises”, with results reported to the board (probabilities, insurance, costs, etc.), should be part of regular CIO reviews.
Additional Insights:
“Business Continuity Management Defined” (Research)
“How to Conduct a Disaster Recovery Management Self Assessment” (Research)
“How to Organize for Disaster Recovery Management” (Research)
“Key Issues for Software Quality and Testing” (Research)
“Standardize Definitions and Expectations for Testing Activities” (Research)
Please e-mail the authors with your comments and suggestions. We also invite you to participate in a case study.
Irving Tyler: irving.tyler@gartner.com
Marc Andonian: marc.andonian@gartner.com
