In today’s cloud era, the ability to bounce back after downtime can make or break your business. Disaster recovery (DR) capabilities should therefore be a key consideration when choosing a cloud platform. Leveraging the cloud as a secondary data center for DR is often the first step in cloud adoption, and disaster recovery as a service (DRaaS) offerings from various cloud service providers underline this fact.
Azure packs a punch with multiple DR options for services like VMs, storage, databases, and containers. In this blog post, I’ll explore these options and discuss how you can develop a robust business continuity and disaster recovery (BCDR) strategy for your workloads hosted in Azure.
What to Consider When Creating Your DR Plan in Azure
Contrary to popular belief, applications hosted in the cloud are not foolproof—failures happen. Since application downtime can be disastrous for your business, you need a well-defined DR strategy to be prepared to handle failures. This strategy should cover the entire application stack, not just the services you think are important.
You might need to trigger the DR process manually so that you can differentiate between transient failures and actual downtime. Once triggered, however, the failover process in Azure should be automated as much as possible. Configure alerts so you stay informed about failures and can take the necessary actions to trigger your DR plan.
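To make the distinction between a transient failure and real downtime more concrete, here is a minimal sketch of one common approach: only declare an outage after several consecutive failed health probes. The `OutageDetector` class and its threshold of three are my own illustrative assumptions, not an Azure API; in practice the probe results would come from whatever monitoring you already have.

```python
from dataclasses import dataclass


@dataclass
class OutageDetector:
    """Flags an outage only after several consecutive failed health probes."""

    failure_threshold: int = 3  # hypothetical value; tune it per workload
    consecutive_failures: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True once an outage should be declared."""
        if probe_ok:
            # Any success resets the streak, so brief blips never escalate.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold


detector = OutageDetector(failure_threshold=3)
# One isolated failure, a recovery, then three failures in a row.
results = [detector.record(ok) for ok in (False, True, False, False, False)]
print(results)  # [False, False, False, False, True]
```

The isolated failure at the start never raises an alert; only the sustained run at the end does, which is the point at which a person would be asked to confirm and trigger the DR plan.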
With Azure, you can choose to deploy application components across Azure regions to protect against regional failures. If applications are regional, you can deploy them across availability zones (physically separated locations within a region) to protect against data center failures. Your choice will depend on the type of resiliency you want to deliver for your application. In addition to a DR strategy that protects against catastrophic failures, you should have a backup strategy to guard against unavailability caused by data corruption or application misconfiguration.
Your DR strategy should also clearly define the DR process: which activities will be completed when the plan is triggered, and who is responsible for executing each of them. Even a detailed DR strategy won’t help much unless you test and fine-tune it regularly. This is where services that offer non-disruptive DR testing, such as Azure Site Recovery, come into play. Similarly, regularly restoring backups into a test environment will help you avoid surprises during a real incident.
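One way to keep the "what" and the "who" of a DR plan explicit is to record the runbook as data rather than prose. The sketch below is purely illustrative: the `RunbookStep` and `DrRunbook` types, the step names, and the team names are all hypothetical and would be replaced by your own.

```python
from dataclasses import dataclass, field


@dataclass
class RunbookStep:
    order: int
    activity: str
    owner: str  # person or team accountable for the step


@dataclass
class DrRunbook:
    name: str
    steps: list[RunbookStep] = field(default_factory=list)

    def ordered(self) -> list[RunbookStep]:
        """Steps in the order they should be executed."""
        return sorted(self.steps, key=lambda s: s.order)

    def unassigned(self) -> list[str]:
        """Activities nobody owns yet: a gap to close before the next DR test."""
        return [s.activity for s in self.steps if not s.owner.strip()]


runbook = DrRunbook(
    "web-tier-failover",
    [
        RunbookStep(2, "Fail over VMs to the secondary region", "infra-team"),
        RunbookStep(1, "Confirm the outage and declare a disaster", "incident-lead"),
        RunbookStep(3, "Redirect user traffic", ""),
    ],
)
print([s.order for s in runbook.ordered()])  # [1, 2, 3]
print(runbook.unassigned())  # ['Redirect user traffic']
```

A check like `unassigned()` is cheap to run before every DR drill, so a step without an owner shows up during the test and not during an outage.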
Disaster Recovery in Azure: What Are Your Options?
Before developing a DR strategy, you should be clear about the recovery point objective (RPO) and recovery time objective (RTO) for your workloads. For example, if some downtime is acceptable (e.g., for non-production and test environments), a complete redeployment of the application is a reasonable choice. You can also adopt an active/passive or warm-spare approach, where a scaled-down secondary deployment is ready to take over in the event of a failure. For the lowest RTO, use an active/active or hot-spare architecture, where instances of the application run in multiple regions and all accept production traffic.
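The relationship between recovery objectives and architecture can be sketched as a simple decision function. The thresholds below (five minutes and four hours) are arbitrary assumptions I picked for illustration; your own cut-offs will depend on cost and business requirements.

```python
from datetime import timedelta


def choose_dr_strategy(rto: timedelta, rpo: timedelta) -> str:
    """Map recovery objectives to one of the three approaches described above.

    The cut-off values are illustrative assumptions, not Azure guidance.
    """
    tightest = min(rto, rpo)  # the stricter objective drives the design
    if tightest <= timedelta(minutes=5):
        return "active/active (hot spare)"
    if tightest <= timedelta(hours=4):
        return "active/passive (warm spare)"
    return "redeploy on demand"


print(choose_dr_strategy(timedelta(minutes=1), timedelta(minutes=1)))
print(choose_dr_strategy(timedelta(hours=1), timedelta(minutes=30)))
print(choose_dr_strategy(timedelta(days=2), timedelta(hours=24)))
```

The three calls return the hot-spare, warm-spare, and redeploy options respectively, in that order.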
Azure offers native capabilities built into most of its services, which can be leveraged to develop a well-rounded DR strategy. Note that it’s important to start from the ground up (i.e., covering infrastructure, if applicable, as well as data and application layers) in order to develop a comprehensive solution. Below I’ll explore the DR options for common Azure services.
Virtual Machines
Azure Site Recovery, Azure’s DRaaS offering, helps protect your VMs from outages by continuously replicating them to a secondary Azure region. In the event of a disaster, the VMs can be failed over to the secondary region, and you can enable access from there. You can also fail back to the primary region once the outage is over. Organizations often use Azure Site Recovery to make Azure their DR site, as it also supports replicating VMware and Hyper-V VMs, as well as physical machines, to Azure.
Azure Backup is another solution you can include in the BCDR strategy for your VMs. You can use this cloud-based backup service to take point-in-time copies of data in the VMs. The backup copies can then be restored to bring your application back online in the event of data loss or corruption. For the highest level of availability and resiliency from failure, use a multi-region architecture, in which both primary and secondary regions are factored into the design.
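When you restore after data corruption, the key decision is which point-in-time copy to use. As a minimal sketch, assuming you already have the list of restore point timestamps and an estimate of when the corruption happened, the hypothetical helper below picks the newest copy taken before the corruption. It does not call Azure Backup; it only illustrates the selection logic.

```python
from datetime import datetime
from typing import Optional


def pick_restore_point(
    restore_points: list[datetime], corrupted_at: datetime
) -> Optional[datetime]:
    """Return the newest restore point taken strictly before the corruption time."""
    candidates = [p for p in restore_points if p < corrupted_at]
    # No candidate means every backup already contains the corrupted data.
    return max(candidates, default=None)


# Example timestamps (made up): nightly backups at 02:00.
points = [datetime(2024, 1, 1, 2), datetime(2024, 1, 2, 2), datetime(2024, 1, 3, 2)]
print(pick_restore_point(points, datetime(2024, 1, 2, 15)))  # 2024-01-02 02:00:00
print(pick_restore_point(points, datetime(2023, 12, 31)))  # None
```

The gap between the chosen restore point and the corruption time is the data you lose, which is why the backup frequency has to be set from your RPO.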
Azure Storage
An Azure Storage account can be deployed as geo-redundant, allowing data in the storage account to be replicated to the secondary region asynchronously. In case of an outage that renders the primary endpoint unavailable, you can initiate an account failover for Azure Storage. The failover process will cause the secondary endpoint to become the primary one so that applications can continue to use the storage.
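Because geo-replication is asynchronous, writes that have not yet reached the secondary region can be lost when you fail over. Geo-redundant accounts report a last sync time for this reason. The sketch below assumes you have already read that timestamp (it is passed in as a plain `datetime`, and both function names are hypothetical) and checks whether the potential data loss fits within your RPO before you initiate the failover.

```python
from datetime import datetime, timedelta, timezone


def failover_data_loss_window(last_sync_time: datetime, now: datetime) -> timedelta:
    """Upper bound on how much recent data a failover could discard."""
    return max(now - last_sync_time, timedelta(0))


def failover_within_rpo(last_sync_time: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if failing over now would stay inside the recovery point objective."""
    return failover_data_loss_window(last_sync_time, now) <= rpo


# Example values (made up): the secondary is 20 minutes behind.
last_sync = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
now = datetime(2024, 5, 1, 12, 20, tzinfo=timezone.utc)
print(failover_within_rpo(last_sync, now, rpo=timedelta(minutes=15)))  # False
print(failover_within_rpo(last_sync, now, rpo=timedelta(minutes=30)))  # True
```

If the check fails, you have to weigh the data you would lose against the cost of waiting for the primary region to recover.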
For Azure Blob storage, you can use snapshots to create read-only point-in-time copies of the data. Azure Files shares can be protected through scheduled backups with Azure Backup, and you can also take share snapshots to create point-in-time copies, similar to blob snapshots. If your application uses Azure Table storage, you can copy the data to a storage account in another Azure region for DR purposes using the AzCopy tool (note that table copy is supported by the legacy AzCopy v7.3, not by the current v10 release).
Databases
Your DR strategy for databases will depend on whether you are using IaaS or PaaS as the deployment approach. For SQL Server and SAP HANA databases hosted in VMs, you can use the integrated Azure Backup feature to discover and configure regular backup without deploying any additional infrastructure.
There are also managed databases like Azure SQL, MySQL, PostgreSQL, and Cosmos DB, delivered as PaaS services. For those databases, Azure offers an automated backup service that takes regular snapshot-based backups of the database to a separate storage account. If you need the backups to be retained for a longer period of time, Azure SQL offers a long-term backup retention feature that allows you to store your backup copies in a storage account for up to 10 years.
Containers
Azure provides a robust ecosystem of services to support container-based workloads, including:
- Azure Kubernetes Service (AKS)
- Azure Container Instances (ACI)
- Azure App Service
- Azure Container Registry (ACR)
AKS uses VM scale sets that can protect your workloads from node failures. However, to protect from regional outages, you should consider multi-region deployments that leverage Azure Traffic Manager to route traffic to available regions.
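The routing behavior in a multi-region setup can be illustrated with a small priority-based selection, similar in spirit to Traffic Manager’s priority routing method, where traffic goes to the most preferred endpoint that is still healthy. This is a simplified local model, not a call to Traffic Manager itself; the `Endpoint` type and the region names are just example values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    region: str
    priority: int  # lower number = more preferred
    healthy: bool


def route(endpoints: list[Endpoint]) -> Optional[str]:
    """Return the region of the most preferred healthy endpoint, if any."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None  # every region is down; nothing to route to
    return min(healthy, key=lambda e: e.priority).region


clusters = [
    Endpoint("eastus", priority=1, healthy=False),  # primary is down
    Endpoint("westus", priority=2, healthy=True),
]
print(route(clusters))  # westus
```

Once the primary endpoint passes its health checks again, the same logic sends traffic back to it without any configuration change.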
It’s also important to separate the recovery of your application from the recovery of its data. You can leverage Azure Storage solutions like disks and file shares to create persistent volumes for applications hosted in containers, then protect that data using Azure Backup. ACR’s geo-replication feature allows you to pull your container images from a secondary region should the primary endpoint go down due to a regional outage.
In addition, you should have a well-defined DevOps process for redeploying infrastructure to a different region through infrastructure as code (IaC), and for redeploying applications through a CI/CD pipeline, should there be downtime due to a cloud outage.
Azure App Service
For Azure App Service, multi-region deployment is the best way to minimize application downtime. You can also leverage the backup and restore feature of Azure App Service, which automatically creates a backup of your application configuration, file content, and databases connected to the app. In case of regional outages, applications hosted in Azure App Service will be placed in DR mode. In this mode, you can restore your app contents to a destination app in a different Azure region.
With a mature DevOps practice in place, you can also restore the application by redeploying the code targeting the new destination app. For serverless apps like Azure Functions and microservices-based deployments, it’s best to separate the configuration from the code in cloud-scale deployments. You can use Azure App Configuration to store configuration information that can be accessed during runtime. This approach also helps fast-track the redeployment process of applications during a disaster.
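To show why externalized configuration speeds up redeployment, here is a minimal sketch of a lookup that prefers a region-specific value and falls back to a shared default. An in-memory dictionary stands in for the configuration store, keyed by a setting name and an optional label; the fallback rule, the `get_setting` helper, and the host names are my own assumptions for illustration, not the behavior of the Azure App Configuration SDK.

```python
from typing import Mapping, Optional

# (key, label) -> value; a label of None marks the shared default.
ConfigStore = Mapping[tuple[str, Optional[str]], str]


def get_setting(store: ConfigStore, key: str, region: Optional[str] = None) -> str:
    """Prefer a region-labeled value; fall back to the unlabeled default."""
    if (key, region) in store:
        return store[(key, region)]
    return store[(key, None)]  # raises KeyError if the key is not defined at all


store = {
    ("db_host", None): "db.primary.example.internal",
    ("db_host", "westus"): "db.secondary.example.internal",
    ("page_size", None): "50",
}
print(get_setting(store, "db_host", "westus"))  # db.secondary.example.internal
print(get_setting(store, "page_size", "westus"))  # 50
```

With this split, redeploying the same code to another region only requires the region name to change; the code itself stays untouched.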
Conclusion
Azure offers multiple DR options for modern cloud-scale applications. Be it end-to-end replication using Azure Site Recovery for VMs, leveraging CI/CD pipelines for redeployment, or the more traditional backup/restore approach for services like Azure App Service, databases, and containers, the best solution for you will depend on your RPO and RTO. In most cases, you can create an effective solution using Azure-native tools and services and by integrating elements of DR into your application architecture.