Given the current climate, the last thing most of us want to think about is another disaster, but we can’t afford not to. Working from home is “flattening the curve,” but it cannot prevent the multitude of natural and man-made misfortunes that can take down a network. Network downtime is extremely expensive even under normal circumstances. According to Gartner, one minute of downtime costs $5,600, or about $93.33 per second.
According to recent data, almost seven out of 10 knowledge workers are working from home right now, which magnifies the importance of network uptime. Companies are focused on keeping their employees on the payroll and serving customers, and budgets are stretched. Significant downtime can damage the business and could cost someone like a CTO their job.
To mitigate this, many savvy businesses are moving to the cloud for enhanced redundancy, reliability, and improved uptime. While this is a smart move, it’s important to recognize that not all clouds are created equal.
How Many 9s Do You Need?
There are two basic cloud architectures for disaster recovery: active-active, which can deliver >99.999% uptime, and active-passive, which struggles to offer 99.99% uptime. When I was in school, 99.99% was a fantastic grade. However, in terms of business uptime, the difference between four 9s of reliability and five 9s could be millions in revenue and untold losses in brand reputation and customer loyalty.
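To put numbers on those 9s, here is a rough Python sketch that converts an availability percentage into the downtime it permits per year and prices that downtime at the Gartner figure quoted above. The function names are made up for illustration, and a flat per-minute cost is an assumption; your own cost per minute may be far higher or lower.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year
COST_PER_MINUTE = 5_600           # USD per minute, the Gartner figure cited above


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at a given availability level."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def annual_downtime_cost(availability_percent: float) -> float:
    """Yearly downtime cost, assuming a flat cost per minute (a simplification)."""
    return downtime_minutes_per_year(availability_percent) * COST_PER_MINUTE


for label, pct in [("four 9s", 99.99), ("five 9s", 99.999)]:
    minutes = downtime_minutes_per_year(pct)
    cost = annual_downtime_cost(pct)
    print(f"{label}: {minutes:.2f} min/year, about ${cost:,.0f}")
```

Four 9s allows roughly 52.6 minutes of downtime a year, while five 9s allows roughly 5.3 minutes. At the flat rate used here, that gap is worth a few hundred thousand dollars a year, and it grows quickly for businesses whose cost per minute is above that figure.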
In this article, we’ll explore the basic how-tos and the benefits of implementing an active-active cloud architecture. In short, we’ll look at an architecture that helps prevent disasters rather than recover from them. Keep in mind that this is one piece of the puzzle when it comes to deploying a complete disaster recovery plan. Nothing is fail-safe so no matter how robust your architecture is or how well you’ve planned for disasters, you still may have to recover from one. It’s a good practice to have your team run disaster recovery drills at least once a quarter.
When Disaster Strikes
We often take for granted that our data is available to us at all times, via our iPhone for example. Everything is available to us from the cloud, but there is a datacenter somewhere that is hosting and mirroring that data. What happens when a natural disaster strikes and the datacenter housing that data is affected by a flood or a tornado? In these cases, you could lose all your information, including data, photos, videos, emails, etc.
Now think of this in a business sense. Would you be able to complete your job without access to all that data and content on your laptop? It’s crucial to have at least two redundant data centers available in completely different locations in the case of a disaster so you can continue to remain productive and get work done without any downtime. This can be achieved by implementing an active-active cloud architecture.
Active-Active vs. Active-Passive Architecture
Most clouds do have redundancy, so when one datacenter goes down (in the case of a natural disaster, for example), you can still access the mirrored datacenter to stay productive. However, these connections are usually passive, meaning someone has to manually move all users to the alternate site, which causes more downtime for the customer. In an active-active architecture, both sites are live and serving users at the same time, so if one site goes down, users are automatically moved to the other site without any downtime. This is a much more modern approach, where the network is self-healing and self-monitoring without the need for manual attention. An active-active architecture also suits mobile users, who can stay online and productive without experiencing the latency associated with accessing distant datacenters around the world. When customers travel, they are automatically connected to the nearest healthy datacenter for a seamless experience.
An active-active architecture also affects the recovery time objective (RTO), which is the targeted duration of time within which a defined service level must be restored after a disaster or disruption, as well as the recovery point objective (RPO), which is the maximum period of time for which data may be lost in the event of an incident. In an active-active scenario, both metrics are driven to zero seconds, meaning any time or data loss after an incident or a disruption is eliminated and end users are unaffected.
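To make RTO and RPO concrete, the sketch below measures both for an incident from three timestamps and checks them against a target. It is only an illustration: the `Incident` class, the `meets_objectives` helper, and the two example incidents (including the 10-minute replication lag and 45-minute manual failover) are invented for this example, not figures from any real outage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    last_replicated_write: datetime  # newest write already safe on a surviving site
    failure: datetime                # moment the serving site went down
    service_restored: datetime       # moment users could work again

    def recovery_time(self) -> timedelta:
        """Observed downtime, to be compared against the RTO."""
        return self.service_restored - self.failure

    def data_loss_window(self) -> timedelta:
        """Span of writes that never reached another site, compared against the RPO."""
        return self.failure - self.last_replicated_write


def meets_objectives(incident: Incident, rto: timedelta, rpo: timedelta) -> bool:
    """True when the incident stayed within both the RTO and the RPO."""
    return incident.recovery_time() <= rto and incident.data_loss_window() <= rpo


failure = datetime(2020, 5, 1, 9, 0, 0)

# Hypothetical active-passive incident: replication ran 10 minutes behind and
# operators needed 45 minutes to move users to the standby site.
passive = Incident(failure - timedelta(minutes=10), failure, failure + timedelta(minutes=45))

# Hypothetical active-active incident: every write was already on the second
# site and users were rerouted immediately.
active = Incident(failure, failure, failure)

zero = timedelta(0)
print("active-passive meets zero RTO/RPO:", meets_objectives(passive, rto=zero, rpo=zero))
print("active-active meets zero RTO/RPO:", meets_objectives(active, rto=zero, rpo=zero))
```

With a zero-second target for both metrics, the first check fails and the second passes. The same helper can be reused with more forgiving targets for workloads that can tolerate some downtime.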
Steps to Building an Active-Active Cloud Architecture
To build an active-active cloud architecture, you must create at least two physically separated networks that mirror each other. Then, implement content and data synchronization between the sites:
1. Data Replication
In any organization, there are large amounts of data being created constantly. Continuous access to this data is crucial for keeping the business up and running. Data replication ensures redundancy, so users can be automatically rerouted to the functioning network(s) if one site is down.
2. Application-Aware Architecture
In addition to data replication, the applications must be redesigned to be aware of the active-active cloud architecture so that they respond appropriately during any disruption. IT teams should carefully evaluate which aspects of the cloud architecture should be designed as active-active and which are more fault-tolerant and appropriate for an active-passive architecture.
3. Geo Proximity
Geo proximity means a user is automatically connected to the nearest healthy data center, no matter where they are. If one site goes down, the user is automatically connected to the next-nearest healthy site for a seamless experience, much as a cell phone moves between towers. The result is highly reliable, low-latency performance.
4. Testing 1, 2, 3…
Always test your cloud architecture with a simulated outage to determine whether the failover process works. Don’t wait for that first hurricane to hit before you know what you’re dealing with.
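The steps above can be sketched as a toy model in Python: two sites that each receive every write, a router that picks the nearest healthy site, and a simulated outage that checks the user still reaches their data. Everything here is an assumption made for illustration: the site names and coordinates are invented, health is a simple flag rather than a real health check, and replication is modeled as a synchronous in-memory copy. Resynchronizing a site after it comes back is deliberately left out.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    lat: float
    lon: float
    healthy: bool = True
    data: dict = field(default_factory=dict)


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, using the haversine formula."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_healthy(sites: list, lat: float, lon: float) -> Site:
    """Geo proximity: pick the closest site that is currently healthy."""
    candidates = [s for s in sites if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy site available")
    return min(candidates, key=lambda s: distance_km(lat, lon, s.lat, s.lon))


def write(sites: list, key: str, value: str) -> None:
    """Data replication: copy the write to every healthy site before returning."""
    for site in sites:
        if site.healthy:
            site.data[key] = value


def read(sites: list, lat: float, lon: float, key: str) -> str:
    """Serve the read from whichever healthy site is nearest to the user."""
    return nearest_healthy(sites, lat, lon).data[key]


# Hypothetical sites and a user located in New York.
sites = [Site("us-east", 39.0, -77.5), Site("eu-west", 53.3, -6.3)]
user = (40.7, -74.0)

write(sites, "report.docx", "v1")
assert nearest_healthy(sites, *user).name == "us-east"

# Simulated outage: take the nearest site offline and confirm that the user
# is rerouted and can still read the replicated data.
sites[0].healthy = False
assert nearest_healthy(sites, *user).name == "eu-west"
assert read(sites, *user, "report.docx") == "v1"
print("outage drill passed: user rerouted to", nearest_healthy(sites, *user).name)
```

A real drill would exercise the production routing layer and replication pipeline rather than an in-memory model, but the shape of the test is the same: write data, fail a site on purpose, then verify that users land on a healthy site with nothing missing.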
Why Implement an Active-Passive Architecture
While an active-active architecture is the best option for preventing downtime and data loss, the reality is that many organizations default to active-passive for one reason: it’s easier. An active-active architecture is difficult to maintain and requires extensive coding on the backend, along with large amounts of time, money, and resources, which will vary by architecture. The truth is that many organizations choose the easy road by implementing an active-passive approach because they believe it’s “good enough” to get by and serve their customers adequately.
However, if an organization’s real goal is to provide a seamless, always-on experience for the customer, it must take a hit in terms of time and resources to make this a reality. After all, the real success of a company comes down to the success of its customers, and in the end, every second of downtime matters.
The best way to achieve >99.999% availability is to deploy an active-active cloud architecture. But in reality, an active-active architecture is only as good as the quality of its software. Building two datacenters is the easy part, but the best user experience really comes down to the software. This is the real secret sauce and will determine the long-term success of any solution. As businesses strive to find their new normal, keep employees engaged while working remotely, and maintain customers, the last thing they need is unreliable technology, data loss, or downtime. Determine which areas of your network are mission-critical and begin moving them to an active-active architecture as soon as possible. As we’ve learned all too well, the world can change in an instant and you need to be prepared.