Not long ago, some prominent figures in the tech world dismissed cloud computing as a fad. That opinion seems absurd today, when nearly 95% of organizations rely on the cloud to run their IT services and infrastructure and, of those, Gartner says 81% operate multi-cloud environments. The reasons are many: cloud resources minimize the cost of buying, managing, and maintaining traditional IT infrastructure, and they make accessing services and compute capacity faster and easier than on-premises buildouts.

But offloading the responsibility for IT management to a third party is not without its risks. As more organizations become dependent on the cloud and other third parties for their IT services and infrastructure, the risk of a service outage affecting operations is one that must be acknowledged and managed. This was made clear when a distributed denial of service (DDoS) attack affected Azure in July 2024, knocking some services offline. The resulting disruption was a hard lesson for organizations without a disaster recovery plan, with effects measured in lost customer confidence and damage to the bottom line.

The lesson from that experience is that cloud-dependent organizations must take steps to mitigate the (some will argue inevitable) risk of a cloud outage by adopting a high-availability failover strategy that keeps business-critical operations online even if one vendor’s cloud infrastructure goes down. The same approach can also accommodate failures of on-premises and private cloud services.

Not Someone Else’s Problem

No one should be surprised that cloud services occasionally go offline. If you think of the cloud as “someone else’s computer,” then you recognize there are servers and software behind it all. Someone else is doing their best to keep the lights on in the face of events like human error, natural disasters, and DDoS and other types of cyberattacks. Someone else is executing their disaster response and recovery plan. While the cloud may well be someone else’s computer, when there is a cloud outage that affects your operations, it is your problem. You are at the mercy of someone else to restore services so you can get back online.

It doesn’t have to be that way. Cloud-dependent organizations can adopt strategies that minimize the risk that someone else’s outage will knock them offline. One such strategy is to use a hybrid or multi-cloud architecture to achieve operational resiliency and high availability, with service redundancy provided by SANless clustering.

Traditionally, clustered nodes rely on shared storage in a storage area network (SAN), whether the nodes are on-premises, in the cloud, or at a disaster recovery site. It’s a proven approach, but because it is hardware dependent, it is costly in terms of dollars and computing resources, and it comes with additional management demands. In contrast, SANless clustering is a software-based approach that uses real-time (synchronous or asynchronous), block-level replication to keep each node’s local storage synchronized, enabling seamless failover for critical applications across clouds. SANless clustering is more cost-efficient and easier to manage, while functioning in the same way traditional SAN-based clusters do.
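
To make the difference between synchronous and asynchronous replication concrete, here is a minimal sketch in Python. It is a toy model, not how any particular clustering product is implemented: the Volume and ReplicatedVolume classes and their methods are hypothetical names invented for illustration, and a real replication engine works at the storage driver level with network transport, ordering guarantees, and error handling that are omitted here.

```python
from collections import deque


class Volume:
    """A toy block device: a mapping from block number to bytes."""

    def __init__(self):
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data


class ReplicatedVolume:
    """Hypothetical model that mirrors writes from a primary to a replica."""

    def __init__(self, synchronous):
        self.primary = Volume()
        self.replica = Volume()
        self.synchronous = synchronous
        self.pending = deque()  # writes not yet applied to the replica

    def write(self, block_no, data):
        self.primary.write(block_no, data)
        if self.synchronous:
            # Synchronous: the replica is updated before the write returns.
            self.replica.write(block_no, data)
        else:
            # Asynchronous: return immediately and ship the write later.
            self.pending.append((block_no, data))

    def drain(self):
        """Apply queued writes to the replica in their original order."""
        while self.pending:
            self.replica.write(*self.pending.popleft())

    def in_sync(self):
        return self.primary.blocks == self.replica.blocks


sync_vol = ReplicatedVolume(synchronous=True)
sync_vol.write(0, b"order-1")
print(sync_vol.in_sync())   # True

async_vol = ReplicatedVolume(synchronous=False)
async_vol.write(0, b"order-1")
print(async_vol.in_sync())  # False until the queue is drained
async_vol.drain()
print(async_vol.in_sync())  # True
```

The trade-off the sketch illustrates is the familiar one: synchronous replication keeps the replica current at the cost of waiting on every write, while asynchronous replication returns faster but leaves a window in which the replica lags the primary.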

Standing Out for Flexibility

Where SANless clusters stand out is in giving IT management the flexibility to configure nodes based on the organization’s needs. Whether you operate geographically distributed data centers or cloud availability zones, SANless clustering allows you to protect single-site, multi-site, cloud, or mixed environments with immediate access to current operational data during failover. By distributing mission-critical workloads across multiple providers using SANless clustering, you introduce system redundancy and high availability, eliminating the risk of a single point of failure. This means that, in the event your cloud services provider experiences an outage, those services immediately shift to a secondary cloud service, keeping you up and running.
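
The failover behavior described above can be sketched as a simple priority rule: run on the first healthy node, and move to the next one when it fails. The Python below is an illustrative assumption only. The Node class, the select_active function, and the provider labels are hypothetical, and real cluster software decides node health through heartbeats and quorum rather than a flag set by hand.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    provider: str   # hypothetical label, e.g. "cloud-a"
    healthy: bool = True


def select_active(nodes):
    """Return the first healthy node in priority order, or None if all are down."""
    for node in nodes:
        if node.healthy:
            return node
    return None


# Priority order: primary cloud first, secondary cloud second.
cluster = [Node("node-1", "cloud-a"), Node("node-2", "cloud-b")]
print(select_active(cluster).name)  # node-1

# Simulate an outage at the primary provider.
cluster[0].healthy = False
print(select_active(cluster).name)  # node-2
```

Placing the two nodes with different providers is what removes the single point of failure: an outage at one provider leaves a healthy node for the rule to select.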

Of course, this high-availability strategy assumes your organization is among the 81% of enterprises that have already adopted a multi-cloud approach to IT, but the odds are in my favor. For the 19% that have not yet made the decision to go multi-cloud, this may help bolster your argument to take the plunge. Whether you are already multi-cloud or still considering it, you should take stock of what your incumbent and secondary providers do best when planning your move to SANless clustering.

Best Practices for Maximal Performance

Aligning those strengths to your needs and priorities can help you maximize performance and minimize costs, while also giving you the opportunity to eliminate any lock-in leverage your primary provider may have when it comes time to renegotiate your agreement. Here are a few best practices to consider when conducting that evaluation:

  • Work with your legal/compliance team to address how changes to data management and communication policies affect security, existing service level agreements (SLAs), and other compliance considerations before implementation.
  • Identify and calculate the likely impact of applicable egress fees charged by cloud providers for data moving out of their cloud in the event of a failover/failback or manual switchover/switchback for testing.
  • Be sure elements like SLAs, networking architecture, and patch schedules are defined and can be aligned with current operations before choosing a secondary cloud provider, and determine whether synchronous or asynchronous replication works better in the context of region, availability zones, disaster recovery, etc.
  • After establishing new configurations (active-passive or active-active), processes, and policies, test, evaluate, and retest the implementation to ensure systems fail over as intended and cross-cloud redundancy is established.
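
The egress fee calculation mentioned in the list above can be roughed out with simple arithmetic. The Python sketch below is a back-of-the-envelope aid, not a pricing tool: the function name is hypothetical, and the data volume, per-GB rate, and number of events are placeholder values, not any provider’s actual pricing. Substitute the figures from your own agreements.

```python
def estimate_egress_cost(data_gb, rate_per_gb, events_per_year):
    """Estimate yearly egress fees for moving data_gb out of a cloud
    events_per_year times (failovers, failbacks, and test switchovers)."""
    if data_gb < 0 or rate_per_gb < 0 or events_per_year < 0:
        raise ValueError("inputs must be non-negative")
    return data_gb * rate_per_gb * events_per_year


# Placeholder inputs, chosen only to show the calculation.
yearly_cost = estimate_egress_cost(
    data_gb=500,         # assumed data moved per event
    rate_per_gb=0.10,    # assumed rate, NOT a real provider price
    events_per_year=4,   # e.g. quarterly switchover tests
)
print(f"Estimated yearly egress cost: {yearly_cost:.2f}")
```

Counting each direction of a switchover and switchback as its own event gives a more conservative estimate, since data leaves a different cloud on each leg.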

It is important to remember not all clouds are the same. Even your most seasoned cloud management personnel will need training to ensure they become as familiar with the operational nuances of the new platform as they are with the incumbent. Additionally, there are tools available to support staff by automating functions like failover policy enforcement. I strongly recommend investing in these complementary technologies.

Conclusion

With nearly 95% cloud adoption and 81% of those organizations running multi-cloud environments, it’s safe to say cloud computing is here to stay. While the cloud you use may well be someone else’s computers, that doesn’t absolve you of the responsibility to invest in a disaster response and recovery plan that considers the likelihood of a cloud outage. That plan must not rely on your cloud provider; instead, it must take the necessary steps to be ready for a worst-case contingency. That is why achieving high availability using SANless clustering for service failover should be part of that plan.

ABOUT THE AUTHOR

Dave Bermingham

Dave Bermingham is the senior technical evangelist at SIOS Technology. He is recognized within the technology community as a high availability expert and has been honored by his peers by being elected a Microsoft MVP in Clustering six times and a Cloud and Datacenter MVP seven times. Bermingham is a frequent speaker at technical conferences, including SQL Saturdays, PASS Summit, and MSSQL Tips, and is the author of the Clustering for Mere Mortals blog. Bermingham holds numerous technical certifications and has more than 30 years of IT experience, including in finance, healthcare, and education.
