We often talk about what you should do before disaster strikes. All your preparations and testing determine how well you’ll weather the unexpected natural disaster, cyberattack, power outage or other disruption.
While a key part of your success depends on the work you do before a disaster, what you do during and after one matters just as much.
Beyond the thorough preparation needed beforehand, the execution of a successful recovery also involves communication and coordination throughout the incident. Additionally, once you’ve returned to business as usual, you’ll need to take a hard look at your plans to see what adjustments you can make to ensure a smoother recovery process next time.
Here’s what you should be thinking about and doing before, during and after a disaster.
Before a disaster: How to make sure your DR plan is ready
Your goal is a constant state of readiness. If you’re sitting at home and a friend knocks on your door, would you invite them in? If you were expecting company, you might have made a little extra effort and tidied up a bit. But is your house ready at any moment for guests? Other priorities can get in the way of keeping a clean house, and an unexpected visitor inevitably exposes any unpreparedness.
Disasters act exactly the same way. If you’ve neglected your disaster recovery (DR) planning, an unexpected natural disaster or cyberattack will reveal all your vulnerabilities and lack of readiness for the “unexpected visitor.”
On the other hand, if you’re in a constant state of readiness, it doesn’t matter when a disaster crosses your path. However, this can be tough to achieve, in part because the target keeps moving.
One of the biggest challenges to achieving a state of disaster readiness is dealing with change. Changes in production, changes in applications, changes in hardware and changes in cloud providers are just some of the issues that can trip up your DR plan.
Change management is immensely important to keep up with and should be part of your ongoing DR plan hygiene. Adjust your plan based on any changes, ensure configurations are up to date and have the latest data, and test any implemented changes to make sure your plan will work as intended.
While keeping in mind a good change management posture, there are several steps to take on an ongoing basis to ensure the readiness of your DR plan.
- Keep an eye on backups. You understand the need to back up data. But many organizations have assumed backups were running correctly when they weren’t, only to discover during recovery that a backup doesn’t exist or won’t restore. Double-check that backups are happening on schedule and that you can actually recover from them.
- Keep an eye on procedures. You have to create a culture and mindset of preparation so that your DR plan becomes like muscle memory, and everyone is ready to go no matter when disaster hits. If you’re trying to define or update procedures when a hurricane is about to make landfall, it’s probably too late. Make sure everyone knows their roles, you’ve done tests and mock exercises, you’ve identified any gaps in your plan, and you’ve updated your plan based on any changes in your environment.
- Prepare employees. First, you need to know which employees will execute your DR plan and ensure each employee understands their roles and responsibilities well in advance of a disaster. Second, you need to know what employees will do in the event of disaster: Where will they work if your offices are shut down for an extended period of time?
- Test. How often you test your DR plan depends on the business criticality of applications, your level of maturity as an organization and other factors. You should test your DR plan at least twice a year. For your most critical business applications, you should ideally test more frequently.
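The backup check described above can be automated. Here’s a minimal sketch that assumes backups land as timestamped files in a local directory; the directory path and freshness threshold are hypothetical and would need to be adapted to your actual backup tooling, which may store backups remotely or in a catalog.

```python
from datetime import datetime, timedelta
from pathlib import Path

def verify_backups(backup_dir: str, max_age_hours: int = 24) -> list[str]:
    """Return a list of problems found with the most recent backup.

    Assumes backups are written as files into backup_dir (a hypothetical
    layout); adapt the checks to your own backup system.
    """
    problems = []
    # Sort backup files oldest-to-newest by modification time.
    files = sorted(Path(backup_dir).glob("*"), key=lambda p: p.stat().st_mtime)
    if not files:
        problems.append("no backups found")
        return problems
    latest = files[-1]
    # Flag backups that are older than the allowed window.
    age = datetime.now() - datetime.fromtimestamp(latest.stat().st_mtime)
    if age > timedelta(hours=max_age_hours):
        problems.append(f"latest backup {latest.name} is {age} old")
    # An empty file almost certainly means the backup job failed mid-run.
    if latest.stat().st_size == 0:
        problems.append(f"latest backup {latest.name} is empty")
    return problems
```

A freshness-and-size check like this is only half the story: the other half, as noted above, is periodically restoring from a backup to prove it actually works.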
By performing these steps regularly throughout the year, your DR plan will be ready when you need it. If you’re in an area impacted by a particular disaster season, whether it’s hurricanes, wildfires or something else, take time a month or two before the season begins to review your plan, test it and make sure all employees are on the same page. Disaster seasons bring higher risk exposure, but they also offer a deadline for making sure your systems are resilient.
During a disaster: How to handle decision-making and communication
There are two pieces to weathering a disaster effectively:
- Deciding whether to declare and set your DR plan in motion
- Communicating with employees throughout the incident
Whether to declare is a big decision to make in the midst of an actual or impending disaster. As part of your DR planning, make sure you have clear criteria and have determined who makes that call. Don’t forget to assign a backup decision-maker as well in case the primary one is unavailable. With the overhead of failing back after the disaster, declaring is a decision that shouldn’t be taken lightly.
Communication is critical and can be difficult in the middle of a natural disaster. The regular means of communication don’t always work, and there can be chaotic events to handle, but you need a way to make sure all employees are notified of updates.
One approach is to send messages via multiple communication channels – such as e-mail, text and phone – all at once. Employees might not have access to e-mail, but they still have a phone on them, or vice versa. At the risk of over-communicating, this keeps everyone up to speed on what’s happening and whether any actions need to be taken. Obviously, you need to make a list of people to notify and their relevant contact details before the disaster hits.
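The fan-out approach above can be sketched as follows. This is a minimal sketch with hypothetical send functions standing in for your actual e-mail, SMS and voice providers; the key property is that every channel is attempted for every contact, and a failure on one channel doesn’t block the others.

```python
def notify_all(contacts, message, channels):
    """Send a message to every contact over every channel.

    contacts: list of dicts with at least a "name" key (hypothetical shape).
    channels: maps a channel name to a send function. A failure on one
    channel is recorded but does not stop the remaining attempts.
    """
    failures = []
    for contact in contacts:
        for channel_name, send in channels.items():
            try:
                send(contact, message)
            except Exception as exc:
                # Record the failure and keep going with other channels.
                failures.append((contact["name"], channel_name, str(exc)))
    return failures
```

Returning the failure list lets you retry unreachable people by other means instead of silently assuming everyone got the message.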
At the end of the day, your people need to be the most important focus during a disaster. Not only to make sure they’re OK, but also to ensure they have a way to return to work once the danger has passed. If you have your business applications up and running but no employees, you’re still out of luck.
After a disaster: Failing back to production and assessing your plan
If you’ve done your DR planning and execution correctly, you’ll either be up and running quickly with minimal downtime and data loss, or you’ll have never missed a beat in the aftermath of a natural disaster, cyberattack or other disruption.
If you’re running in DR mode, the question becomes how to fail back. We often don’t talk about failback, but planning for it is important and something to think through before disaster strikes.
One of the reasons failback isn’t often talked about is that it can be complicated, and there are many variations in how it needs to be performed. The common mentality is to worry about it when it happens, since there are so many permutations.
How it works depends on what has happened in your production systems. If your office building is rubble, you have a big challenge: how to procure a new production setup, where to put it and so on – a much longer process.
On the other end of the spectrum, you might have faced a power outage, shut down your systems, and now you can just bring them back up and do a fairly straightforward failback. There are automated technologies today that can handle the complexities of data synchronization, but failback still requires careful planning and execution.
Think through and prepare for various scenarios. If your building is destroyed, do you have procurement contracts in place? Is insurance helping out? If you need to replace servers, how long will it take to get them? If you’re using a regional supplier and you’re recovering from a hurricane, for example, everyone else in your area might be looking to them for the same thing, creating delays.
After you’ve failed back
Once your production systems are back up and running as usual, take stock of how your DR plan played out. Did you hit your RPOs and RTOs? Did everything work as designed or did you have to create workarounds to meet your objectives? What could you improve?
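The RPO/RTO question above comes down to simple timestamp arithmetic. Here’s a minimal sketch; the function name and the idea of comparing actuals against targets from three timestamps are illustrative, not a standard API.

```python
from datetime import datetime, timedelta

def assess_objectives(outage_start: datetime, recovery_complete: datetime,
                      last_good_backup: datetime,
                      rto: timedelta, rpo: timedelta) -> dict:
    """Compare what actually happened against RTO/RPO targets.

    Actual RTO: how long you were down (outage start to recovery).
    Actual RPO: how much data was at risk (last good backup to outage).
    """
    actual_rto = recovery_complete - outage_start
    actual_rpo = outage_start - last_good_backup
    return {
        "actual_rto": actual_rto,
        "actual_rpo": actual_rpo,
        "rto_met": actual_rto <= rto,
        "rpo_met": actual_rpo <= rpo,
    }
```

For example, an outage at 10:00 recovered by 13:00, with a last good backup at 09:00, gives an actual RTO of three hours and an actual RPO of one hour; whether you “hit” your objectives depends on the targets you set before the disaster.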
Another consideration is whether your business can run effectively in DR mode. If you have an online transactional application, for example, you might not be running at exactly the same level of performance as production, but did you still meet your objectives?
If you were running in DR mode for an extended period of time, did you have backups in place for regulatory reasons? Were you monitoring systems, and did you have everything working as it should in production?
How close were you running to production? Was it so seamless you didn’t even know you were in DR mode? Or was it clear you were only running at 80 percent?
Taking a hard look at your DR plan
A DR plan should be an ever-evolving process. It changes before a disaster strikes based on updates to your environment and testing that can uncover gaps or issues. It changes as you assess what happened during the disaster and how well the process went.
If you treat it as an evolving process, focusing on change management, risk mitigation, testing, communication and your RPO and RTO measures of success, you can create a DR plan that carries your business’s most critical applications seamlessly through any disaster.
Joseph George is a highly experienced technology product management leader with a strong understanding of technology and extensive business management experience. He serves as vice president of global recovery services product management at Sungard AS.