Practically speaking, high availability (HA) and disaster recovery (DR) are two distinct elements of resilience that make up a greater whole. The key differentiating factor is that one is pre-emptive, while the other is reactive.

While high availability focuses on preventing downtime, disaster recovery is a reactive process intended to restore normal operations as quickly as possible following a significant failure. It presumes disruptions will happen and prepares for swift recovery when needed.

Together, they form a comprehensive approach to IT service continuity management.

High availability – pre-empting disruption

High availability refers to a system’s ability to remain operational and always-on, leveraging built-in redundancy and fault tolerance. HA is pre-emptive, involving strategies and technologies that detect potential failures and take immediate action to prevent downtime.

Key components of high availability:

  • Hardware redundancy: Redundant storage, error-correcting memory, backup power sources and other technologies ensure a single hardware failure doesn’t bring down the entire system.
  • Software redundancy: Techniques such as clustering, load balancing and self-healing systems distribute workloads across multiple servers, enhancing the system’s ability to withstand failures.
  • Environmental redundancy: Utilizing data centers that are geographically or virtually dispersed ensures localized issues don’t affect the entire system.

By incorporating these redundancies, HA systems provide the resilience needed to keep critical applications and services running without interruption.
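The effect of redundancy on availability can be illustrated with some simple arithmetic. The sketch below is not drawn from any particular product. It assumes each redundant component has the same availability and that failures are independent, which real systems only approximate. The function names are hypothetical and chosen for illustration.

```python
def parallel_availability(component_availability: float, replicas: int) -> float:
    """Availability of a group of redundant components where any one suffices.

    Assumption: failures are independent and every replica has the same
    availability. Shared dependencies (power, network, software bugs)
    make real-world figures lower than this estimate.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The group is down only when every replica is down at the same time.
    return 1.0 - (1.0 - component_availability) ** replicas


def annual_downtime_hours(availability: float) -> float:
    """Expected downtime over a 365-day year for a given availability."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = parallel_availability(0.99, n)
        print(f"{n} replica(s): availability {a:.6f}, "
              f"about {annual_downtime_hours(a):.2f} hours of downtime per year")
```

Under these assumptions, a single component that is available 99% of the time is down for roughly 87.6 hours a year, while two such components in parallel reach 99.99%. Note that the calculation says nothing about failures that hit every replica at once, which is where disaster recovery comes in.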

Disaster recovery – responding to disruption

Disaster recovery encompasses the tools and procedures designed to enable the recovery or continuation of vital systems and infrastructure. As the final port of call in a disaster, some form of DR should always be in place – no matter the availability of existing systems.

DR strategies typically involve maintaining secondary systems, in either another data center or the cloud, to which production workloads can be failed over.

Core concepts of disaster recovery:

  • Recovery time objective (RTO): The maximum acceptable delay between the interruption of service and its restoration. It represents the targeted duration for recovery after a disruption (e.g. an RTO of 15 minutes means service should be restored within 15 minutes of the interruption).
  • Recovery point objective (RPO): The maximum acceptable amount of data loss, measured in time. It determines how far back in time data recovery may go (e.g. an RPO of 4 hours means recovering to a point no more than 4 hours before the disruption).
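To make the two objectives concrete, here is a minimal sketch that checks a single recovery against its RTO and RPO. The `RecoveryOutcome` class, the `assess_recovery` function and the timestamps are hypothetical, invented purely for illustration. It assumes the time of the last recovery point, the start of the outage and the moment service was restored are all known.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryOutcome:
    downtime: timedelta   # outage start until service was restored
    data_loss: timedelta  # last recovery point until outage start
    met_rto: bool
    met_rpo: bool


def assess_recovery(last_recovery_point: datetime,
                    outage_start: datetime,
                    service_restored: datetime,
                    rto: timedelta,
                    rpo: timedelta) -> RecoveryOutcome:
    """Compare the achieved downtime and data loss with the objectives."""
    if not last_recovery_point <= outage_start <= service_restored:
        raise ValueError("expected recovery point <= outage start <= restoration")
    downtime = service_restored - outage_start
    data_loss = outage_start - last_recovery_point
    return RecoveryOutcome(downtime, data_loss,
                           met_rto=downtime <= rto,
                           met_rpo=data_loss <= rpo)


if __name__ == "__main__":
    # Illustrative timestamps only.
    outcome = assess_recovery(
        last_recovery_point=datetime(2024, 1, 1, 9, 0),
        outage_start=datetime(2024, 1, 1, 12, 0),
        service_restored=datetime(2024, 1, 1, 12, 10),
        rto=timedelta(minutes=15),
        rpo=timedelta(hours=4),
    )
    print(outcome)
```

In this example the service is down for 10 minutes and 3 hours of data are lost, so both the 15-minute RTO and the 4-hour RPO are met. A recovery point taken at 07:00 instead would breach the RPO even though the RTO is still met, which is why the two objectives need to be set and tested separately.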

Two halves of resilience

To minimize interruption – be it from a minor fault or major disaster – organizations must first ensure their infrastructure meets all of their needs for availability. This ensures the impact of minor faults or disruptions is not felt by end users or customers.

But in the event of more significant disruption, procedures for disaster recovery should be in place to meet exact requirements for both RTO and RPO.

Changing priorities

Historically, architecting a system for high availability was an expensive undertaking, but it solved most causes of IT downtime. Forward-thinking, uptime-focused organizations invested heavily in HA and, as a result, reduced their spend on DR.

In many instances, DR for these organizations simply meant recovering from backups, rather than failing over to replicated systems with recovery points prior to infection. But emerging cyber threats have highlighted the flaws in this approach.
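As an illustration of what a recovery point prior to infection means in practice, the sketch below picks the most recent recovery point taken strictly before an estimated infection time. The function name is hypothetical, and it assumes the infection time is known, which in a real incident is often the hardest part to establish.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Optional, Sequence


def latest_clean_recovery_point(recovery_points: Sequence[datetime],
                                infection_time: datetime) -> Optional[datetime]:
    """Return the newest recovery point taken strictly before infection_time.

    Returns None when every available recovery point is at or after the
    (assumed) infection time, meaning no clean copy exists.
    """
    ordered = sorted(recovery_points)
    # bisect_left gives the index of the first point >= infection_time,
    # so everything before that index predates the infection.
    idx = bisect_left(ordered, infection_time)
    return ordered[idx - 1] if idx > 0 else None


if __name__ == "__main__":
    # Illustrative six-hourly recovery points.
    points = [datetime(2024, 1, 1, h) for h in (0, 6, 12, 18)]
    print(latest_clean_recovery_point(points, datetime(2024, 1, 1, 7, 30)))
```

The further back the clean recovery point lies, the more data is lost, so the retention period and frequency of recovery points directly limit how well an organization can recover from an attack that went undetected for some time.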

Today, cyber attacks are the leading cause of downtime and data loss, and organizations that have invested disproportionately in HA are unprotected against system-wide attacks and breaches.

Security, availability and recoverability are not absolutes – there is a risk-cost balance to be struck. With even the best-designed HA systems subject to long recovery times in the event of an attack, the question is how best to allocate resources to maintain business continuity.

To this end, IT service continuity budgets must either be rebalanced or increased to accommodate the adoption or improvement of dedicated solutions for disaster recovery.

ABOUT THE AUTHOR

James Watts

James Watts is the managing director of Databarracks, the business and technology resilience specialists.
