Roberta Witty

A member of the Gartner Blog Network

Roberta J. Witty
Research VP
11 years at Gartner
33 years IT industry

Roberta Witty is a research VP in Gartner Research, where she is part of the Compliance, Risk and Leadership group. Her primary area of focus is business continuity management and disaster recovery. Ms. Witty is the role specialty lead for…

Closing the Recovery Gap (from John Morency)

by Roberta J. Witty  |  February 22, 2010

John Morency here again. For many organizations, simply completing the recovery of a set of mission-critical applications and data during a test exercise is enough to declare success. The elapsed time to complete recovery, if it is measured at all, is often a secondary consideration. However, this definition of success is often at odds with the expectations of the business, which requires ever shorter and more predictable recovery times. Gartner refers to this difference between expected and actual recovery times as the recovery gap.

A common misconception is that using disk-to-disk replication instead of disk-to-tape for production data will, by itself, close the gap. Disk-to-disk replication is certainly necessary, especially when the production data to be recovered is measured in terabytes, but it is by no means sufficient. Web applications are often distributed across several different types of computing platforms, and this distribution creates increasingly complex software and data dependencies. Recovering and restarting these applications in a timely way can therefore be an even more significant hurdle during test execution. The challenge becomes more daunting still for web services-based applications, whose execution dependencies are determined dynamically on a transaction-by-transaction basis.
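As a rough illustration of why these dependencies matter for restart time, here is a minimal sketch that derives a restart order from a static dependency map using Python's standard-library `graphlib` module (Python 3.9+). The component names and dependencies are hypothetical. A static map like this cannot capture the per-transaction dependencies of web services-based applications described above; it only covers the simpler case where dependencies are known in advance.

```python
from graphlib import TopologicalSorter

# Hypothetical application components and their restart dependencies:
# each key lists the components that must be running before it can start.
DEPENDENCIES = {
    "web_frontend": {"app_server", "auth_service"},
    "app_server": {"orders_db", "message_queue"},
    "auth_service": {"directory_db"},
    "orders_db": set(),
    "message_queue": set(),
    "directory_db": set(),
}


def restart_waves(deps):
    """Group components into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are up
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as restarted, unlocking dependents
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(restart_waves(DEPENDENCIES), start=1):
        print(f"Wave {i}: {', '.join(wave)}")
```

With the sample map, the three databases and queues can be restarted in parallel first, then the application and authentication tiers, and only then the web front end. Components within a wave can be recovered concurrently, so the number of waves, not the number of components, bounds how far restarts can be parallelized.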

Unfortunately, there are no simple solutions to this problem. However, Gartner has seen an increasing number of client organizations focus their recovery testing on ensuring that recovery test completion times are consistent with stated recovery time objective (RTO) and recovery point objective (RPO) targets for a more limited set of mission-critical applications, typically some or all of those that make up Recovery Tier 1. Depending on the steps required to close these recovery gaps, related changes to technology, test processes and/or staffing may be needed.
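A minimal sketch of the kind of check this focus implies: compare each Tier 1 application's measured recovery time and data loss in a test against its RTO and RPO targets, and report any shortfall as a recovery gap. The `RecoveryTestResult` fields and the sample figures are hypothetical, and how "actual data loss" is measured will depend on the replication and backup technology in use.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTestResult:
    """Outcome of one application's recovery test (hypothetical fields, all in minutes)."""
    app: str
    tier: int
    rto_minutes: float               # target recovery time objective
    rpo_minutes: float               # target recovery point objective (tolerable data loss)
    actual_recovery_minutes: float   # measured time from test start to application restart
    actual_data_loss_minutes: float  # measured age of the newest data recovered

    @property
    def rto_gap(self) -> float:
        """Minutes by which recovery time exceeded the RTO (0 if the target was met)."""
        return max(0.0, self.actual_recovery_minutes - self.rto_minutes)

    @property
    def rpo_gap(self) -> float:
        """Minutes by which data loss exceeded the RPO (0 if the target was met)."""
        return max(0.0, self.actual_data_loss_minutes - self.rpo_minutes)


def recovery_gaps(results, tier=1):
    """Return (app, rto_gap, rpo_gap) for each app in `tier` that missed either target."""
    return [
        (r.app, r.rto_gap, r.rpo_gap)
        for r in results
        if r.tier == tier and (r.rto_gap > 0 or r.rpo_gap > 0)
    ]


if __name__ == "__main__":
    results = [
        RecoveryTestResult("order_entry", 1, 240.0, 15.0, 310.0, 10.0),
        RecoveryTestResult("payments", 1, 120.0, 5.0, 95.0, 5.0),
        RecoveryTestResult("hr_portal", 2, 1440.0, 240.0, 1600.0, 60.0),
    ]
    for app, rto_gap, rpo_gap in recovery_gaps(results):
        print(f"{app}: RTO missed by {rto_gap:.0f} min, RPO missed by {rpo_gap:.0f} min")
```

With these sample figures only `order_entry` is reported, 70 minutes over its RTO; `payments` met both targets, and `hr_portal` is excluded because it is not in Tier 1. Restricting the check to one tier mirrors the narrower scope described above.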

Before making any changes, a rigorous critical path analysis of the test process may be needed to determine accurately where bottlenecks do and do not exist. Although an increasing number of recovery management software products support this analysis, a well-known tool such as Microsoft Project may often be all that is required, at least for a first-pass analysis.
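For a first pass, the critical path is simply the longest chain of dependent tasks through the test plan, and it can be found with a forward pass over the tasks in dependency order. The sketch below shows that calculation on a hypothetical recovery test plan; the task names, durations and dependencies are all assumptions. It is not how Microsoft Project or any recovery management product implements the analysis, only the underlying arithmetic.

```python
from graphlib import TopologicalSorter

# Hypothetical recovery test plan: task -> (duration in minutes, prerequisite tasks).
PLAN = {
    "declare_test":      (15, []),
    "restore_storage":   (120, ["declare_test"]),
    "build_servers":     (90, ["declare_test"]),
    "restore_database":  (60, ["restore_storage", "build_servers"]),
    "configure_network": (45, ["build_servers"]),
    "start_app_tier":    (30, ["restore_database", "configure_network"]),
    "validate_app":      (40, ["start_app_tier"]),
}


def critical_path(plan):
    """Return (total_minutes, critical tasks) via earliest-finish times."""
    order = TopologicalSorter({t: set(pre) for t, (_, pre) in plan.items()}).static_order()
    finish = {}  # earliest finish time of each task
    via = {}     # prerequisite that determines each task's earliest start
    for task in order:  # prerequisites are always visited before their dependents
        duration, prereqs = plan[task]
        start, via[task] = max(((finish[p], p) for p in prereqs), default=(0, None))
        finish[task] = start + duration
    # Walk back from the task that finishes last to recover the critical path.
    task = max(finish, key=finish.get)
    total = finish[task]
    path = []
    while task is not None:
        path.append(task)
        task = via[task]
    return total, path[::-1]


if __name__ == "__main__":
    total, path = critical_path(PLAN)
    print(f"{total} min: {' -> '.join(path)}")
```

For the sample plan this reports 265 minutes along declare_test, restore_storage, restore_database, start_app_tier and validate_app. Shortening `build_servers` or `configure_network` would not reduce the total, because neither is on the critical path; effort spent on storage restore or database recovery would.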

If your recovery team is facing the challenge of closing one or more recovery gaps, Gartner believes it is in the best interest of the business to communicate clearly that these gaps exist, along with a recommended approach for closing some or all of them within a specifically defined time frame. In these challenging economic times, the recovery team cannot be all things to all people, so it is in the team's own interest to define, communicate, exercise and evolve an approach that is practical, achievable and sustainable.

Category: BCM and IT DRM Research Coverage

  • Martin B. Beurling (asensus), April 12, 2010 at 3:30 am

    John,
    Really interesting and valuable input to the Recovery Assurance debate. You highlight the need to discuss recovery gaps, recovery expectations and requirements with the business. Roberta Witty has also devoted blog posts to the issue of management attention within the Recovery Assurance space.
    In my experience, crucial bottlenecks surface when IT management shifts from gap planning to execution, i.e. actually testing whether servers are recoverable within expected RTO/RPO targets.
    In most of our enterprise engagements, it comes down to “How can our overworked IT staff increase test productivity on a smaller budget?”:
    A) Executing recoverability tests for more applications at a higher frequency (an annual test of a few systems proves insufficient in compliance-driven organizations)
    B) Expanding recovery assurance testing beyond a limited set of Tier 1 servers to Tier 2 and Tier 3 servers as well
    C) Continuously tracking and documenting how gaps develop over the course of the year
    Again, management attention and software automation for recovery assurance are key: they increase operational efficiency and eliminate manual recovery tests while mitigating the business risks associated with insufficient testing.

    Best regards
    Martin B. Beurling