Attend Your Local ACP Tabletop Exercise! You Will Learn A Lot (and in the safety of a non-workplace environment)
On Tuesday, September 20, 2011, I attended my local CT ACP chapter’s annual tabletop exercise. Hosted at Northeast Utilities, the ACP management team, in conjunction with NU’s BCM team, conducted a great two-hour exercise filled with changing conditions, misinformation, and even an Elvis impersonator! Also in attendance were an EMNS vendor providing real-time emergency notification services, two recovery services firms for the data center and workforce, two physical security services firms, and one BCM advisory firm that provided overall exercise planning support.
The Scenario: The Can-Do organization is a heavy construction equipment leasing company headquartered in Madison, CT, with additional operating facilities in Virginia and Nevada. Its data centers are located in Madison, CT and Richmond, VA. To support the leasing business, the company also has a credit arm that has evolved into a full-service bank and insurance underwriter and broker, operating out of the firm’s 600 retail centers. Banking applications were housed in the Richmond, VA data center.
The Plan: The plan we were given was called the “Emergency Response Plan” dated July 15, 1998 and prepared by the VP of Training, Cafeteria Services and Vehicle Maintenance. HA-HA.
Recovery teams included Operations, HR, Finance, IT and physical security. I was on the IT recovery team.
Although the plan stated that Can-Do subscribes to the U.S. National Incident Management System (NIMS), no one knew who was in charge of the incident or how to communicate with the incident management team. Some even questioned why a command center was stood up on the first day of the hurricane warning; that foresight proved fortuitous as it turned out. The first thing we all needed to do was elect an overall incident commander (assigned to the Operations recovery team) and a leader for each recovery team. All recovery teams also assigned coordinating responsibilities to their sister recovery teams. The IT recovery team was smart (wink wink) and assigned a supply chain/vendor management coordinator and a scribe – guess who took that role (me).
We started the exercise with a hurricane warning for Madison, CT, along with a warning that a mild flu pandemic was possible. The first step the IT team performed was topping off the fuel tanks for the CT data center generators. We then sent a message instructing the IT staff to test their work-at-home capability. We also checked with the VPN service provider on our ability to add bandwidth if needed – 60% of the workforce was able to work from home, but looking ahead, we wanted to ensure that if more staff needed to work remotely, the extra bandwidth would be available to us.
The scenario intensified throughout the exercise and we received additional information including:
- flu shot availability in the cafeteria (a red herring);
- Category 4 Hurricane Mary was off the coast of Bermuda – important!;
- a fuel tanker accident at the Richmond, VA data center which closed down the facility – major crisis and also on the evening news in VA;
- the CT data center lost all power because a security guard had an adverse reaction to some over-the-counter flu medication, passed out and hit his head on the emergency power shutoff button. Physical security was automatically notified, and the police were called to secure the disabled data center;
- Richmond, VA schools were closing at 1 pm due to the hurricane warning.
The IT recovery team notified the Richmond, VA data center recovery service provider that recovery was required; the turnkey arrangement ensured that the data center was up and running within a few hours. The Madison, CT data center had an active-active arrangement with a data center service provider (which the 1998 plan did not identify), so operations were automatically cut over once the power went out due to the untimely power shutoff. Before the power went out at the CT facility, the incident commander asked IT to update the employee and franchisee crisis portal with information about how to communicate with Can-Do if the hurricane hit either location. This activity was not completed because of the power outage in CT.
The power outage resulted in an IT recovery ETA of two days. That estimate was based on the very old emergency response plan, which contained neither the current IT configuration at either data center nor recovery procedures for the current configurations. In reality, there was no loss of IT – even through the power outage – thanks to the prior arrangements made with the DR service providers.
Finance issued a communication requesting that they be notified of any needed hotel and travel arrangements and that all expenses incurred needed to be justified correctly as storm-related or fuel spill-related. Due to the escalating conditions, they subsequently upped the credit limits on all corporate credit cards.
An interesting twist to the exercise was that someone from the FDIC somehow got into the facility and was snooping around asking about Can-Do’s recovery ability. As it turned out, the FDIC examiner misrepresented her identity to the IT recovery team. Though the team was smart enough to ask who she was, it was a bit disjointed, and she was given the information she asked for.
Multiple emergency messages were sent to employees by the various recovery teams throughout the exercise, but not everyone was getting them. As it turned out, some cell phone towers were down in Richmond, VA after Hurricane Mary made landfall.
Finally, staff in CT were sent home at 7 pm that day in preparation for the hurricane’s arrival.
Hopefully you can sense the problems that arose during this exercise. Some observations:
- The plan needed to be updated ASAP! An outdated plan is worthless; you waste a lot of time trying to figure out what the current production processing practices are and how to recover them.
- Some exercise participants weren’t aware that they needed to DO something – they sat at the team table and just talked about what they would do in an actual recovery rather than going over to the table of the team with whom they needed to communicate.
- The exercise showed that internal recovery team roles need to be defined in advance so that when an incident occurs, everyone knows their role.
- Communications between the teams were strained at first – no one knew who the coordinators were. Although Can-Do subscribed to NIMS, no crisis command or crisis communication procedures were part of the plan.
- It was not clear that an emergency/mass notification tool was available. It took a good 30 minutes for the tool to start being used consistently. AND the first time recovery teams wanted to send a message, they were informed that it needed to be approved by the incident command center – a step also not in the recovery plan and obviously not tested. This was rather a thorn in the side of IT, because we thought we should be able to send out our own IT messages – a WAKE-UP CALL that external communications must be reviewed by the right people internally AND that certain kinds of messages should be drafted in advance, so that when an interruption occurs, much of the messaging is already in place and approved.
- Releasing recovery status information to someone who turned out to be from the FDIC exposed a loophole in the crisis communications process.
- Command center check-in calls with all team coordinators need to be scheduled on a regular basis.
- There was no conference bridge capability for the incident command center to use so everyone had to physically be present at the command center.
- An activity log of actions and tasks and their status needs to be created and updated throughout the incident.
- Finally, a best practice for recovery exercising is to bring in outside observers to identify things during the exercise that you can’t see for yourself.
We had a great event, and we all learned a lot about how important it is to have a current plan, clear recovery role assignments, an understanding of how we personally respond in a crisis, and recognition that communication is KEY to recovery success.
Tags: availability-risk backup-and-recovery banking bcm bcp bia business-continuity-management business-continuity-planning business-impact-analysis emergency-notification emergency-preparedness governance incident-management it-disaster-recovery mass-notification operational-risk-management pandemic-planning recovery-planning resiliency risk-assessment roberta-witty social-media supply-chain-risk-management workforce-continuity