Companies depend on technology more every year, a fact that is both exciting and worrisome. Secure access and robust, minute-to-minute performance from IT are both an expectation and a competitive necessity. Downtime is not an option, operationally or economically, for any company that wants to provide a consistently seamless customer experience.
So, with cloud adoption booming, one of the key objectives of a transformation strategy is to become as close to 100% virtualized as possible and construct an active-active environment for as much of your IT estate as possible. After all, your peers are trending in that direction: managing hybrid workloads and selecting the right ‘as a service’ eco-partners are today’s IT fundamentals.
And, if we start to selectively outsource key services and virtualize our own IT estate, then we have solved the challenge of being both highly available and resilient in production, and that automatically solves our disaster recovery requirements!
This is an interesting perspective. However, before we construct an ‘ideal’ future state in more detail, let’s delve a little deeper into the current state.
Having a resilient operational posture is becoming mandatory for a growing segment of companies across disparate verticals. Previously, only banks, healthcare providers, and some utilities would spend the dollars necessary to construct an active-active failover capability for mission-critical workloads.
Today, the pursuit of high availability for operational resiliency carries a false narrative: that ‘recoverability’ is automatically built in. Workloads can just fail over and then fail back. Simply ‘hit the button and we are done!’ Not only that, but we also have very effective SLAs to boot.
Not quite…
AWS, Google, Azure, and other providers offer similar capabilities: disparate regions and availability zones (infrastructure, security, storage, network) connected by high-speed, low-latency, secure networks. Workloads can be balanced so that an Availability Zone (AZ) outage causes degradation rather than a hard-down scenario. Provided the customer has architected auto-scaling and elastic load balancing to reinstate affected workloads, the recovery effort is automated. If auto-scaling within the provider’s AZ configuration is not proactively architected, the customer retains responsibility for initiating runbook execution to recover workloads in the AZs.
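To make the difference between degradation and automated recovery concrete, here is a minimal Python sketch of a workload spread across three zones. It is a toy model, not any provider's API: the `Zone` class, the `rebalance` function, the zone names, and the instance counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical stand-in for one availability zone."""
    name: str
    instances: int
    healthy: bool = True


def serving_capacity(zones):
    """Count instances that sit in healthy zones and can take traffic."""
    return sum(z.instances for z in zones if z.healthy)


def rebalance(zones, desired):
    """Toy auto-scaling step: add instances round-robin to the
    healthy zones until the desired capacity is restored."""
    healthy = [z for z in zones if z.healthy]
    if not healthy:
        # Nothing left in this region; a runbook or cross-region
        # recovery would have to take over at this point.
        raise RuntimeError("no healthy zone available")
    shortfall = desired - serving_capacity(zones)
    i = 0
    while shortfall > 0:
        healthy[i % len(healthy)].instances += 1
        shortfall -= 1
        i += 1
    return zones


# Assumed starting point: 12 instances balanced over three zones.
zones = [Zone("az-a", 4), Zone("az-b", 4), Zone("az-c", 4)]
desired = serving_capacity(zones)

# One zone fails: the service is degraded, not hard down.
zones[0].healthy = False
degraded = serving_capacity(zones)

# With auto-scaling architected in advance, capacity is reinstated
# without anyone executing a runbook.
rebalance(zones, desired)
recovered = serving_capacity(zones)

print(desired, degraded, recovered)
```

In this model, skipping the `rebalance` step leaves the workload running at reduced capacity until someone intervenes, which is the runbook scenario described above.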