Disaster Recovery Journal
Creating Cloud Applications That Survive the Big One

by Jon Seals | January 10, 2023

By Andrew Oliver, Senior Director of Product Marketing at MariaDB

The world is changing and becoming more perilous. Hurricanes can hit New York. California faces yearly wildfires in addition to earthquakes. In 2022, both Germany and Seoul flooded, while the UK experienced a record heatwave that caused fatalities, not to mention power grid disruption. There is hope on the horizon, as deploying renewable energy has become more affordable than even maintaining coal-fired plants, but regardless of progress, data infrastructure must be more resilient than ever.

In disaster recovery or high availability, redundancy is the whole ball game. Modern applications, especially in the cloud, are no different. What has changed is the volume, the expectations of businesses and users, and the boundary of what is possible. The tape backups of yore could never keep up. For high-volume applications, even cross-region replication (WAN replication) usually does not keep up with the transaction rate. Fortunately, a bevy of new technologies and capabilities is changing the game.

Field of Expectations

While the world has become more perilous, user expectations for reliability have grown. In the late 1990s and early 2000s, website outages were rather frequent and accepted. Now, if a major retail site goes down, it is not only a major business event for that company; it makes headlines. To some degree, outages are like crime: while they have actually gone down, awareness and, in many cases, the impacts have gone up. Users expect absolute reliability even if a hurricane has taken out a major data center – which, with multiple power and network redundancies, is not as likely as it once was.

Natural disasters are less common than more immediate problems caused by human error. When Facebook went down in October 2021, the cause was not a fire, hurricane, flood, or earthquake but a network misconfiguration. Meanwhile, software developers are under increased pressure to deliver early and often. Fast development cycles mean less time for quality assurance and more bugs.

If all of that were not enough, business on the Internet is “spikey.” One day can be a total lull, and the next can require almost unbounded capacity. What does a user who sees an hourglass (or spinning orb) do? Why, refresh of course, and double the number of failed requests. Luckily, the cloud and modern virtualization technologies make it possible to scale hardware on demand, at least up to a point.

Availability Zones

One of the best things about modern cloud providers is that they provide a sophisticated framework for high availability, simplified into the concept of availability zones. Finding facilities that are physically separate, but not far apart, with redundant networking between them used to be a complicated matter involving a series of agreements. Now it is just a matter of choosing different availability zones in the same region. This has greatly simplified facility, network, and hardware concerns.

With virtualization and orchestration technologies like Kubernetes, it is possible to develop and deploy services that spin up seamlessly across multiple zones. However, storage is always the biggest issue. An Amazon Web Services EBS volume, for example, lives in a single availability zone and is not striped across zones. This requires storage reliability to be handled at a higher level.

The most obvious examples of this are databases. Until recently, the market had a choice between a client-server relational database that handled data integrity well but could only scale vertically, and so-called “NoSQL” databases that often lacked the features, sophistication, and transactional reliability of an Oracle or DB2. This has changed. NoSQL databases often do supply transactional idioms. Relational databases can now handle unstructured and JSON data, and there are now distributed SQL offerings (including MariaDB Xpand) that meet all of the common expectations of a relational database while scaling horizontally, including across availability zones (rack awareness).

It is now possible to deploy a distributed database that scales by just adding nodes and can even “scale back” by removing nodes. These databases keep redundant copies of data and enable virtually unlimited scale both in data size and user capacity. When a node or availability zone is lost, these databases keep running. If the lost nodes return quickly, they are reintegrated and caught up; if they do not, the database automatically restores redundancy on the remaining nodes. Combined with load balancers, routing protocols, and redundancies over DNS, it is possible to deploy applications and infrastructure that tolerate multiple faults across a cloud region, even at the data layer.
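
The zone-aware redundancy described above can be illustrated with a small sketch. This is not how any particular database (MariaDB Xpand included) places data; the function names, the round-robin strategy, and the zone and node labels are all assumptions made for illustration. The point it shows is simply that if every copy of a slice of data lands in a different zone, losing any one zone leaves at least one copy available.

```python
from itertools import cycle


def place_replicas(slices, nodes_by_zone, copies=2):
    """Assign each data slice to `copies` nodes, each in a different zone.

    Hypothetical placement strategy for illustration only: zones are chosen
    round-robin, and nodes within a zone are also used round-robin.
    """
    zones = sorted(nodes_by_zone)
    if copies > len(zones):
        raise ValueError("need at least as many zones as copies")
    node_iters = {zone: cycle(nodes_by_zone[zone]) for zone in zones}
    placement = {}
    for i, data_slice in enumerate(slices):
        # Consecutive zones (modulo the zone count) are always distinct.
        chosen = [zones[(i + k) % len(zones)] for k in range(copies)]
        placement[data_slice] = [(zone, next(node_iters[zone])) for zone in chosen]
    return placement


def survives_zone_loss(placement, lost_zone):
    """True if every slice still has a replica outside the lost zone."""
    return all(
        any(zone != lost_zone for zone, _node in replicas)
        for replicas in placement.values()
    )


# Made-up cluster layout: three zones, four nodes.
cluster = {"zone-a": ["a1", "a2"], "zone-b": ["b1"], "zone-c": ["c1"]}
layout = place_replicas(range(6), cluster, copies=2)
zone_loss_ok = {zone: survives_zone_loss(layout, zone) for zone in cluster}
```

With two copies per slice, the layout survives the loss of any single zone but not of two zones at once; tolerating more simultaneous faults means more copies, and therefore more storage and replication traffic.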

Global Regions

Surviving the loss of a data center or two in a region is nice. However, when the big mega hurricane takes out us-east-1a, us-east-1b, and us-east-1c (aka Virginia), it would be nice to be able to fail over to us-east-2 (aka Ohio). Again, with modern routing protocols and DNS, that is not so hard. However, the database is once again the linchpin.

In order to maintain some level of failure tolerance, the database must replicate in real or near-real time. Fortunately, most databases support some form of cross-region replication. This capability is usually asynchronous with eventual consistency to avoid impacting regular operational performance. For financial and other applications with a high write volume, a key issue is whether the source database (Virginia) can keep writing to the replica cluster (Ohio) even at peak load. To answer this requirement, distributed SQL databases such as MariaDB Xpand have added parallel replication to utilize the full processing and abundant network capacity of all the nodes on the source and target clusters, enabling even distant replication to scale with the workload.

For some applications, performance is more important than whether every last drop of data made it over during “the big one.” Data can be recovered or restored by other means. In other applications, the cost of latency is worth paying to ensure both sides are up to date. Some distributed SQL databases can span clusters across regions. While this level of data redundancy and replication comes with a continuous performance impact, it can be critical for applications that need to fail over with no potential for data loss and minimal impact.

When deciding on cross-region failover, it is important to understand the costs associated with ongoing operations. Cloud providers charge for data transfer. Data ingress is usually free, but egress usually costs money. The cost of replicating to a second region is not just the cost of the storage or compute but also IOPS and egress.
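
Whether asynchronous replication keeps up at peak load comes down to simple arithmetic: if the source commits more transactions per interval than the replication path can ship and apply, a backlog builds, and that backlog is the data at risk if the source region is lost. The sketch below models this under loudly simplified assumptions: a constant per-stream apply rate, and throughput that scales linearly with the number of parallel streams. The function name and all numbers are hypothetical, not measurements of any product.

```python
def replication_backlog(write_rates, apply_rate_per_stream, streams=1):
    """Return the replication backlog (in transactions) after each interval.

    write_rates: transactions committed on the source in each interval.
    apply_rate_per_stream: transactions one replication stream can ship and
        apply per interval (assumed constant for this sketch).
    streams: number of parallel replication streams (assumed to scale
        linearly, which real systems only approximate).
    """
    capacity = apply_rate_per_stream * streams
    backlog = 0
    history = []
    for written in write_rates:
        # The backlog grows when writes exceed capacity and drains otherwise.
        backlog = max(0, backlog + written - capacity)
        history.append(backlog)
    return history


# Made-up load profile with a two-interval spike.
load = [500, 500, 3000, 3000, 500, 500]
single = replication_backlog(load, apply_rate_per_stream=1000, streams=1)
parallel = replication_backlog(load, apply_rate_per_stream=1000, streams=4)
# single   -> [0, 0, 2000, 4000, 3500, 3000]
# parallel -> [0, 0, 0, 0, 0, 0]
```

In this toy model the single stream is still thousands of transactions behind well after the spike ends, while four streams never fall behind. That is the gap parallel replication is meant to close.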
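
A rough estimate of the egress portion of that cost can be worked out from the sustained write rate. The sketch below is only a back-of-the-envelope calculation: the function name is hypothetical, the per-gigabyte price is a placeholder rather than any provider's actual rate, and it assumes that replication traffic is roughly equal to the write volume (compression and protocol overhead are ignored).

```python
def monthly_replication_egress_cost(avg_write_mb_per_sec, price_per_gb,
                                    seconds_per_month=30 * 24 * 3600):
    """Estimate the monthly cross-region egress cost of replication.

    avg_write_mb_per_sec: sustained replicated write volume in MB/s.
    price_per_gb: inter-region transfer price; look this up for your
        provider and region pair, the value used below is a placeholder.
    """
    gigabytes = avg_write_mb_per_sec * seconds_per_month / 1024
    return gigabytes * price_per_gb


# 5 MB/s sustained at a placeholder price of 0.02 per GB.
estimate = monthly_replication_egress_cost(5, 0.02)
# About 12,656 GB per month, so roughly 253 per month at that price.
```

Storage, compute, and IOPS on the replica side come on top of this figure and need their own estimates.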

Online Backup

Despite the best security, multiple redundancies, and best practices, bad things can still happen. Consider that human folly could change data in a way that was not intended. This change would replicate throughout the infrastructure, and nothing about the multiple redundancies would necessarily fix it. Despite all of our best efforts toward fail-proof infrastructure, online backups are still necessary.

It is no longer practical to take the system down for backups. High-volume databases need a backup solution that does not interrupt the system while ensuring the data is captured correctly. Distributed databases offer parallel backup and restore. This capability captures the current state and ensures full transactional integrity in an active system under load, while shortening recovery time in the event a restore is required.

Configuration Management

Remember that the Facebook outage was not caused by a natural disaster, a security failure, or possibly even a change to source code. Instead, it was a configuration mistake. While modern DevOps has made administration and change management easier, it has also made it more complex. Where there is complexity, there is the opportunity for human error.

There are new tools available to reduce the chance of this error. One of the most promising is GitOps. GitOps deploys an agent-based architecture to the various components of a system. Administrators check in changes to a revision control repository. Changes to the configuration are not pushed to the various nodes of the system but pulled by the agents and applied. Moreover, changes can be automatically rolled back in the event communication is lost.
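
The pull-and-roll-back behavior can be sketched in a few lines. This is a toy model, not the API of any real GitOps tool: the `Agent` class, its `reconcile` method, the list standing in for a Git repository, and the `mtu` setting are all hypothetical. The health check stands in for "can this node still communicate after the change?"

```python
class Agent:
    """Toy GitOps-style agent: pulls the desired config and applies it."""

    def __init__(self, repo, health_check):
        self.repo = repo                  # list of revisions, newest last
        self.health_check = health_check  # callable(config) -> bool
        self.applied = None               # last known-good configuration

    def reconcile(self):
        """Pull the newest revision; keep it only if the node stays healthy."""
        desired = self.repo[-1]
        if desired == self.applied:
            return "in-sync"
        previous = self.applied
        self.applied = desired            # apply the pulled change
        if self.health_check(desired):
            return "applied"
        self.applied = previous           # automatic rollback
        return "rolled-back"


# Hypothetical network setting; values above 9000 break connectivity here.
repo = [{"mtu": 1500}]
agent = Agent(repo, health_check=lambda cfg: cfg["mtu"] <= 9000)
first = agent.reconcile()                 # "applied"
second = agent.reconcile()                # "in-sync"
repo.append({"mtu": 90000})               # a mistaken change is checked in
third = agent.reconcile()                 # "rolled-back"
```

Because the agent pulls rather than being pushed to, a bad change that cuts a node off from the network cannot strand it: the node itself notices the failed check and returns to the last known-good revision. In this sketch the agent would retry the bad revision on the next cycle; a real system would also flag it so that the revision is fixed or reverted in the repository.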

The Big One

Surviving the big one is about preparation: multiple redundancies, backups, and configuration management, including at the database layer. For cloud applications, it means deploying new technologies, including distributed databases that span multiple availability zones and replicate across regions. However, the “Big One” is often no act of God but a human error. Surviving that kind of event requires both backups and a way to manage and apply configuration, as well as to roll back changes that do not work. Those that apply these practices can not only perform and scale, but carry on even when bad things happen.

Author Bio

Andrew C. Oliver

Andrew C. Oliver is a columnist and software developer with a long history in open source, database, and cloud computing. He founded Apache POI and served on the board of the Open Source Initiative. Oliver also helped with marketing in startups including JBoss, Lucidworks, and Couchbase. He is currently the Senior Director of Product Marketing for MariaDB Corporation.

