DR Object Stores Evolve Beyond the “Cold Data Tier”

The cost-efficient scalability of object storage makes it an ideal resource for data protection, especially when delivered through a cloud-based service. However, current approaches to data protection with object storage typically employ the object store as a “cold data tier” that is tightly coupled to the disaster recovery (DR) platform’s datastore in the recovery site. It’s essentially an archive layer for the DR platform, which continues to run with redundant compute, storage and network resources, waiting idly for a failover operation.

A New Role for Object Storage in DR

An alternative approach is to make the object store the primary target for all replicated data. As data is written to primary storage in the protected environment, it is concurrently replicated directly to the object store. There are two critical requirements here: First, data must be continuously intercepted and replicated to the object store from the moment it is written to primary storage in the protected site; and second, data must be presented to the appropriate container in the object store directly, as a continuous stream of objects rather than blocks or files. In this manner, the object store can serve as the exclusive data repository for a cloud-based continuous data protection (CDP) platform.

For an actual disaster recovery incident, the objects in the object store container must be self-describing, so that their “rehydration” (the extraction and deployment of systems and data into a recovery environment) depends on neither the protected site, which may not be accessible at all, nor the destination recovery environment, which may be selected on the fly during the failover. In other words, everything required to re-create the protected cluster must be accessible from the object store container. This includes virtual machines (VMs) along with their operating systems, applications and data, as well as configuration information such as network configurations, virtual resource allocations and access permissions.
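The idea of streaming self-describing objects and rehydrating from the container alone can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from any particular product: the `InMemoryContainer` class stands in for a real object store container, and the key layout, the JSON header and the `(offset, data)` write format are hypothetical. Configuration objects are omitted for brevity.

```python
import json
import zlib
from dataclasses import dataclass, field


@dataclass
class InMemoryContainer:
    """Hypothetical stand-in for an object store container: key -> bytes."""
    objects: dict = field(default_factory=dict)

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def keys(self) -> list:
        return sorted(self.objects)


def pack_object(seq: int, vm_id: str, writes: list) -> bytes:
    """Pack intercepted writes into one compressed, self-describing object.

    Each write is an (offset, data) pair. The header travels inside the
    object, so no external catalog is needed to interpret it later.
    """
    payload = {
        "header": {"format": 1, "seq": seq, "vm_id": vm_id},
        "writes": [{"offset": off, "data": data.hex()} for off, data in writes],
    }
    return zlib.compress(json.dumps(payload).encode("utf-8"))


def replicate(container: InMemoryContainer, vm_id: str, seq: int, writes: list) -> str:
    """Send one batch of writes to the container as a new object."""
    key = f"{vm_id}/data/{seq:012d}"  # zero-padded so keys sort in write order
    container.put(key, pack_object(seq, vm_id, writes))
    return key


def rehydrate(container: InMemoryContainer, vm_id: str, disk_size: int) -> bytes:
    """Rebuild a disk image using only what is stored in the container."""
    disk = bytearray(disk_size)
    prefix = f"{vm_id}/data/"
    for key in container.keys():
        if not key.startswith(prefix):
            continue
        payload = json.loads(zlib.decompress(container.get(key)))
        for w in payload["writes"]:
            data = bytes.fromhex(w["data"])
            disk[w["offset"]:w["offset"] + len(data)] = data
    return bytes(disk)


# Usage: two batches of writes, then recovery with no access to the source site.
container = InMemoryContainer()
replicate(container, "vm-1", 1, [(0, b"AAAA"), (8, b"BBBB")])
replicate(container, "vm-1", 2, [(2, b"CC")])
image = rehydrate(container, "vm-1", 12)
```

Because later objects are replayed over earlier ones, the rebuilt image reflects the most recent write to each offset, and the recovery side needs nothing beyond read access to the container.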

Integrated and Decoupled

When DR software in the protected environment can connect directly to a remote cloud object store and stream data to it in compressed objects, taking full advantage of the object store’s security, scalability and multi-tenancy features, we say that the DR software is integrated with the object store. When the object store does not need to receive data through intermediary DR software in its own environment, and the protected systems, data and configuration information can be extracted wholly from the object store for deployment into any recovery environment with no additional metadata required, we say that the DR software is decoupled from the object store. The relationship between the DR software and the object store is therefore one in which the two are simultaneously integrated and yet decoupled.

In an integrated/decoupled DR service offering based on an object storage platform, the service provider can maintain recovery copies of its customers’ systems and data in a secure, highly scalable, multi-tenant and cost-efficient object store, and rehydrate protected systems and data into a recovery runtime environment “on demand,” that is, at the time of a failover operation. Importantly, the object store and the recovery runtime environment can be in physically separate locations and can even be maintained by different service providers. This is a true multi-cloud model for DR services, capable of supporting a variety of deployment models.

Garbage Collection in the Object Store

Two key requirements of this operational model are garbage collection and protection domain management. Let’s look at garbage collection first. In the primary storage environment, data is constantly being overwritten and deleted. But in the object store, data is continuously appended in new objects. Without some means of deleting objects containing obsolete or invalidated data, the object store containers would grow without bound. However, our model calls for no DR-specific intelligence coupled to the object store. So how can the growth of data in the object store be managed? The solution is to monitor the data that is overwritten or deleted in the primary storage environment and request deletion of the corresponding objects. In cases where an object’s data is “mostly” obsolete, the still-valid data may be written into a new object that is sent to the object store before the “stale” object is deleted. In this manner, garbage collection in the object store can be driven by the DR software running in the protected site.
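The garbage collection scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of any vendor's implementation: the `GarbageCollector` class, the plain `dict` used as a container, the per-block bookkeeping and the `COMPACT_THRESHOLD` value that defines “mostly” obsolete are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical policy: compact an object once under half its blocks are live.
COMPACT_THRESHOLD = 0.5


@dataclass
class ObjectRecord:
    """What the protected site remembers about one replicated object."""
    blocks: dict                            # block address -> data in the object
    live: set = field(default_factory=set)  # addresses that are still current


class GarbageCollector:
    """Runs in the protected site; the store only sees puts and deletes."""

    def __init__(self, store: dict):
        self.store = store    # stand-in container: key -> {address: data}
        self.records = {}     # key -> ObjectRecord
        self.latest = {}      # block address -> key holding the current copy
        self.next_seq = 0

    def replicate(self, blocks: dict) -> str:
        """Append a new object; older copies of its blocks become stale."""
        key = f"obj-{self.next_seq:08d}"
        self.next_seq += 1
        self.store[key] = dict(blocks)
        self.records[key] = ObjectRecord(dict(blocks), set(blocks))
        for addr in blocks:
            old = self.latest.get(addr)
            if old is not None:
                self.records[old].live.discard(addr)
            self.latest[addr] = key
        return key

    def delete_block(self, addr) -> None:
        """Record that a block was deleted in primary storage."""
        key = self.latest.pop(addr, None)
        if key is not None:
            self.records[key].live.discard(addr)

    def collect(self) -> list:
        """Delete obsolete objects, rewriting survivors of mostly stale ones."""
        deleted = []
        for key in sorted(self.records):  # snapshot of keys before this pass
            rec = self.records[key]
            if rec.blocks and len(rec.live) / len(rec.blocks) >= COMPACT_THRESHOLD:
                continue
            if rec.live:
                # Still-valid data goes into a new object before deletion.
                self.replicate({a: rec.blocks[a] for a in rec.live})
            del self.store[key]
            del self.records[key]
            deleted.append(key)
        return deleted


# Usage: three of four blocks are overwritten, so the first object is compacted.
gc = GarbageCollector(store={})
gc.replicate({0: b"a", 1: b"b", 2: b"c", 3: b"d"})
gc.replicate({0: b"A", 1: b"B", 2: b"C"})
removed = gc.collect()
```

All of the bookkeeping lives with the DR software; the container is only ever asked to store or delete whole objects, which is consistent with keeping DR-specific intelligence out of the object store.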

Protection Domains and Recovery

A protection domain (or just “domain”) is a set of VMs that are protected together. The VMs that share a domain typically have the same level of criticality, may be connected (e.g., vApps), and share a common datastore. Additionally, all VMs and data in a protection domain are replicated to a single, dedicated container within the object store. In the event of a disaster, the protection domain is the unit of failover granularity: VMs in the same domain all fail over together.

A key objective for the DR solution is that following a failover, the continuous protection of the VMs must remain uninterrupted, regardless of the condition of the protected site, including a “whole site failure,” in which the protected site cannot be reached in any way. This introduces several requirements. First, the DR software should also be available in the recovery environment and should be able to continue replication into the object store. Second, the protection domain should obtain the information necessary to locate and authenticate access to its container from special “domain information objects” that are obtained from the container along with the objects containing the VMs and their data, etc. Finally, following failover, the recovery site should take over exclusive ownership of the protection domain and its access to the container.

Domain Ownership

Changing domain ownership is straightforward if the protected site fails completely and the rehydration into the recovery environment is not interrupted. However, conflicts can occur in which the ownership of the protection domain is contested. For example, if the failure is partial, the domain in the protected site may continue to try to update the container. Also, if the protected site is recovered quickly, before the failover has completed, it may attempt to reclaim ownership while the recovery site is requesting it. The domain should be owned by only one site at any time, but in the decoupled model, sites do not communicate with each other, only with the container. Therefore, domain ownership and status are included in the domain information objects in the container. When ownership or status changes, the object is updated accordingly. When a change in ownership is requested, the request is granted or denied based on the metadata obtained from the domain information objects.
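One way to picture this arbitration is the Python sketch below, in which sites coordinate only through a domain information object. It rests on several assumptions that the article does not specify: the `DomainInfo` fields, the two status values, the rule that a request is denied while another site's takeover is in progress, and a container that supports a conditional (compare-and-swap style) update keyed on a version number. The `Container` class is an in-memory stand-in, not a real object store API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DomainInfo:
    """Hypothetical contents of a domain information object."""
    owner: str
    status: str   # "active" or "failover_in_progress"
    version: int  # incremented on every update


class Container:
    """Stand-in container holding one domain information object."""

    def __init__(self, info: DomainInfo):
        self._info = info

    def read_info(self) -> DomainInfo:
        return self._info

    def write_info_if_version(self, new_info: DomainInfo, expected: int) -> bool:
        """Conditional update: succeeds only if nobody else wrote first."""
        if self._info.version != expected:
            return False
        self._info = new_info
        return True


def request_ownership(container: Container, site: str) -> bool:
    """Grant or deny a takeover based only on metadata in the container."""
    info = container.read_info()
    if info.owner == site:
        return True
    if info.status == "failover_in_progress":
        return False  # another site's takeover is already under way
    new_info = DomainInfo(site, "failover_in_progress", info.version + 1)
    return container.write_info_if_version(new_info, info.version)


def complete_failover(container: Container, site: str) -> bool:
    """Mark the takeover finished so the new owner is fully active."""
    info = container.read_info()
    if info.owner != site or info.status != "failover_in_progress":
        return False
    new_info = replace(info, status="active", version=info.version + 1)
    return container.write_info_if_version(new_info, info.version)


def may_replicate(container: Container, site: str) -> bool:
    """A site checks ownership before it updates the container."""
    return container.read_info().owner == site


# Usage: the recovery site takes over while the protected site is partly alive.
container = Container(DomainInfo("protected", "active", 1))
granted = request_ownership(container, "recovery")
reclaim = request_ownership(container, "protected")
still_writing = may_replicate(container, "protected")
finished = complete_failover(container, "recovery")
```

In this sketch the recovery site's request is granted, the protected site's attempt to reclaim during the failover is denied, and the protected site's ownership check tells it to stop updating the container. A real system would also have to close the gap between checking ownership and writing data, which this example does not attempt.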

Conclusion

Object storage has proved to be a secure, scalable and cost-effective resource for data protection based on traditional backups. With advances in DR software, it can also bring these advantages to solutions for continuous data protection and disaster recovery. By applying an approach in which the DR software is integrated with yet decoupled from the object store, service providers have increased options for providing DR services for their customers.

ABOUT THE AUTHOR

Serge Shats

Serge Shats, Ph.D., is co-founder and CTO of JetStream Software. He has more than 25 years’ experience in system software development, storage virtualization and data protection. Previously co-founder and CTO of FlashSoft, acquired by SanDisk in 2012, Shats has served as a chief architect at Veritas, Virsto and Quantum. He earned his Ph.D. in computer science at the Russian Academy of Sciences in Moscow, Russia. For more information, please visit www.jetstreamsoft.com/, and follow the company at www.linkedin.com/company/jetstream-software-inc/ and @JetStreamSoft.
