Data disasters come in a variety of forms – from natural events like fires or floods that damage your data centre to system failures that disrupt operations. Disaster Recovery (DR) is all about preparing for and effectively recovering from disasters.
Relying on traditional IT infrastructure means procuring, duplicating, and managing hardware and software so that you have enough spare capacity if something goes wrong. The AWS cloud gives you the flexibility to recover faster and optimize resources during a DR event. With AWS, you can scale up or down on AWS’ fast, reliable, and secure infrastructure on an as-needed basis, which helps you maintain continuity before, during, and after a disaster.
1. DO Choose the Right Tools and Techniques to Streamline Disaster Recovery
When deciding what AWS services and components to use, keep in mind two things:
- Recovery Time Objective (RTO): This is the time between when services are lost and when they are restored. If, for example, a disaster happens at 12pm (noon) and your RTO is 8 hours, the DR process should restore service to acceptable levels by 8pm.
- Recovery Point Objective (RPO): This is the acceptable amount of data loss, measured in time. If your RPO is 1 hour and a disaster occurs at 12pm, the DR process should recover all data that was in the system before 11am.
Remember, your RTO and RPO should reflect the needs of your organization. If your data changes constantly, as it does in an e-commerce store, you’ll likely need a low RPO. If your data is relatively static, however, a higher RPO with less frequent backups may be enough.
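The RTO and RPO examples above come down to simple date arithmetic. Here is a minimal sketch that works through the same numbers; the function names and the calendar date are arbitrary choices for illustration, not part of any AWS tooling.

```python
from datetime import datetime, timedelta


def recovery_targets(disaster_at, rto, rpo):
    """Return (restore deadline, earliest point data may be rolled back to)."""
    restore_deadline = disaster_at + rto  # service must be back by this time
    recovery_point = disaster_at - rpo    # data from before this time must survive
    return restore_deadline, recovery_point


def backup_meets_rpo(last_backup_at, disaster_at, rpo):
    """True if the most recent backup is recent enough to satisfy the RPO."""
    return disaster_at - last_backup_at <= rpo


# Disaster at 12pm (noon) with an 8-hour RTO and a 1-hour RPO; the date is arbitrary.
disaster = datetime(2016, 7, 29, 12, 0)
deadline, point = recovery_targets(disaster, timedelta(hours=8), timedelta(hours=1))
print("Restore service by:", deadline.strftime("%H:%M"))
print("Recover data up to:", point.strftime("%H:%M"))
```

A backup taken at 11:30am would satisfy the 1-hour RPO in this scenario, while one taken at 10:30am would not.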
It’s important to create modular application recovery groups or stacks. Not all applications are created equal. DR best practice is to define and implement recovery groups or application stacks so that you can recover business-critical applications before less important application services.
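One way to picture recovery groups is as ordered "waves" of applications. The sketch below assumes a hypothetical inventory in which each application has a criticality tier (1 is the most critical); the application names and tiers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical inventory: (application name, tier). Lower tier = more critical.
APPLICATIONS = [
    ("reporting", 3),
    ("checkout", 1),
    ("product-catalog", 2),
    ("payments", 1),
    ("internal-wiki", 3),
]


def recovery_waves(applications):
    """Group applications by tier and return the groups in recovery order."""
    groups = defaultdict(list)
    for name, tier in applications:
        groups[tier].append(name)
    # Most critical tier first; names sorted so the plan is deterministic.
    return [sorted(groups[tier]) for tier in sorted(groups)]


for wave_number, apps in enumerate(recovery_waves(APPLICATIONS), start=1):
    print(f"wave {wave_number}: {', '.join(apps)}")
```

In practice each wave would map to its own stack or template, so that wave 1 can be brought up and verified before wave 2 starts.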
AWS offers services to streamline deployments and DR processes including:
- AWS Import/Export uses portable storage devices to move data into and out of AWS, accelerating the movement of large data loads. AWS Import/Export uses Amazon’s high-speed internal network to bypass the internet and transfer data directly onto/off of storage devices.
- Amazon Elastic Compute Cloud (Amazon EC2) gives you elastic capacity in the AWS cloud. You can create EC2 instances within minutes and retain complete control before, during, and after a DR event.
- Amazon Simple Storage Service (Amazon S3) is engineered for primary and mission critical data storage. Objects are stored on multiple devices across a number of facilities, providing 99.999999999% durability. AWS ensures further protection through Multi-Factor Authentication (MFA), Identity and Access Management (IAM), versioning, and bucket policies.
- Amazon Elastic Block Store (Amazon EBS) lets you create point-in-time snapshots of your data volumes. Snapshots are stored in Amazon S3, giving your data long-term protection. EBS volumes give you off-instance storage that persists independently of the instance and is replicated across multiple servers, preventing data loss caused by the failure of a single component.
- Amazon Relational Database Service (RDS) makes it easy to set up, operate, and scale a relational database in the AWS cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business. Amazon RDS can be used as a replication target for your business critical databases, providing cost-effective assurance for your critical business data.
- AWS Direct Connect helps you establish a dedicated connection from your on-premises data center to the AWS cloud. This can increase bandwidth throughput, provide a consistent network experience, and drive down costs.
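To put the 99.999999999% durability figure quoted for Amazon S3 into perspective, here is a rough back-of-the-envelope calculation. It assumes the figure is an annual, per-object probability and that losses are independent, which is a simplification for illustration; the object count is an arbitrary example.

```python
from decimal import Decimal

# Durability figure quoted above, read here as an annual per-object probability.
DURABILITY = Decimal("0.99999999999")


def expected_annual_losses(object_count):
    """Average number of objects expected to be lost per year."""
    return Decimal(object_count) * (1 - DURABILITY)


# Arbitrary example: ten million stored objects.
losses = expected_annual_losses(10_000_000)
years_per_loss = 1 / losses
print(f"Expected objects lost per year: {losses:f}")
print(f"Roughly one object lost every {int(years_per_loss):,} years")
```

Durability only describes the chance of the storage layer losing an object. It does not protect against accidental deletes or overwrites, which is where versioning and access controls come in.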
2. DO Back Up Everything Before a Disaster Occurs
Data isn’t the only thing you need to worry about recovering after a disaster. You also need to be able to restore application settings, infrastructure modules, and more.
Amazon provides services that enable easy, reliable backup including:
- Amazon S3 lets you transfer data over the network and is accessible from any location. It’s especially good for storing primary data.
- Amazon Glacier gives you cost-effective storage options for data backup and archiving. Glacier optimizes objects (or archives) for infrequent access and is as durable as Amazon S3.
- AWS Storage Gateway connects on-premises software appliances with cloud-based storage, providing secure and seamless integration. AWS Storage Gateway supports gateway-cached volumes (providing cost savings and low-latency access to data you access often), gateway-stored volumes (offering durable, inexpensive off-site backups that can be recovered locally or from Amazon EC2), and gateway virtual tape libraries (which give you a virtually unlimited collection of virtual tapes).
- Amazon Virtual Private Cloud (Amazon VPC) lets you set up an isolated section of the AWS cloud and create a VPN connection between your VPC and your data center. This can help you recover enterprise applications that are typically housed on your internal network.
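A common way to combine Amazon S3 and Amazon Glacier for backups is a lifecycle rule that moves older objects into archive storage. The sketch below only builds the rule as a Python dictionary and makes no AWS calls. The structure follows the S3 lifecycle configuration format as I understand it (the shape accepted by boto3’s `put_bucket_lifecycle_configuration`), so check it against the current documentation before relying on it. The `backups/` prefix and the day counts are hypothetical.

```python
import json


def archive_rule(prefix, days_until_archive, days_until_expiry):
    """Build one lifecycle rule: archive after N days, delete after M days."""
    if days_until_expiry <= days_until_archive:
        raise ValueError("expiry must come after the archive transition")
    return {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},  # only objects under this key prefix
        "Status": "Enabled",
        "Transitions": [{"Days": days_until_archive, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": days_until_expiry},
    }


# Hypothetical policy: archive backups after 30 days, delete after a year.
lifecycle = {"Rules": [archive_rule("backups/", 30, 365)]}
print(json.dumps(lifecycle, indent=2))
```

Pick the retention periods based on your RPO and any compliance requirements; the numbers here are placeholders.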
3. DO Test Your Data Recovery and Disaster Response Frequently
After you’ve set up your DR solution, you need to test it. One of the advantages of deploying on AWS is that you can test your DR response as frequently as you need. Use AWS CloudFormation to deploy full AWS environments using a template, which describes the resources, dependencies, and parameters required to create a complete environment.
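To give a feel for what such a template contains, here is a minimal sketch that builds one as a Python dictionary and checks that every `Ref` points at a declared parameter or resource. The resource name `RecoveryInstance`, the parameters, and the default instance type are assumptions made for illustration; a real DR template would describe far more than a single instance.

```python
import json


def build_dr_template():
    """Return a minimal CloudFormation template describing one EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal DR test environment (illustrative only)",
        "Parameters": {
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            "InstanceType": {"Type": "String", "Default": "t2.micro"},
        },
        "Resources": {
            "RecoveryInstance": {  # hypothetical logical name
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": {"Ref": "InstanceType"},
                },
            },
        },
        "Outputs": {"InstanceId": {"Value": {"Ref": "RecoveryInstance"}}},
    }


def find_refs(node):
    """Yield every name used in a {"Ref": name} expression, recursively."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                yield value
            else:
                yield from find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item)


def undefined_refs(template):
    """Return Ref targets that are neither parameters nor resources."""
    defined = set(template.get("Parameters", {})) | set(template.get("Resources", {}))
    # Names starting with "AWS::" are pseudo parameters such as AWS::Region.
    return sorted(
        ref for ref in set(find_refs(template))
        if ref not in defined and not ref.startswith("AWS::")
    )


template = build_dr_template()
print(json.dumps(template, indent=2))
print("Undefined references:", undefined_refs(template))
```

A quick local check like this can catch typos before you launch a test environment, although it is no substitute for validating the template with CloudFormation itself.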
Varying your tests is crucial to make sure you’re covered against a range of disasters, including:
- Power cuts to a set of servers or sites.
- Losing ISP connectivity to single or multiple sites.
- Viruses impacting core services or affecting multiple sites.
- User errors causing data loss and requiring point-in-time recovery.
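It helps to record how long each drill took and compare it against your RTO. The sketch below uses the 8-hour RTO from the earlier example; the scenario names mirror the list above, and the recovery times are made-up figures for illustration.

```python
from datetime import timedelta

RTO = timedelta(hours=8)  # example target from the RTO definition

# Hypothetical drill results: scenario -> measured time to restore service.
DRILLS = {
    "power cut": timedelta(hours=3, minutes=30),
    "ISP connectivity loss": timedelta(hours=1),
    "virus outbreak": timedelta(hours=9, minutes=15),
    "user error (point-in-time restore)": timedelta(hours=2),
}


def failed_drills(results, rto):
    """Return the scenarios whose measured recovery time exceeded the RTO."""
    return sorted(name for name, took in results.items() if took > rto)


for name, took in DRILLS.items():
    status = "OK" if took <= RTO else "MISSED RTO"
    print(f"{name}: {took} -> {status}")
print("Needs attention:", failed_drills(DRILLS, RTO))
```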
4. DO Set Up Disaster Monitors and Alerts
If your DR environment is affected by a server failure or application issue, you need to know right away. Amazon CloudWatch gives you access to pre-provisioned and custom metrics about AWS resources. You can set up Amazon Simple Notification Service (SNS) to alert you if unusual behaviour is detected and continue to use any existing alerting and monitoring tools.
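As an illustration, the sketch below builds the settings for a CloudWatch alarm that notifies an SNS topic when an EC2 instance fails its status checks. It only constructs a dictionary and makes no AWS calls. The keys follow the parameters of CloudWatch’s `PutMetricAlarm` operation as I understand them, so verify them against the current API reference. The instance ID, topic ARN, and thresholds are placeholders.

```python
def status_check_alarm(instance_id, topic_arn):
    """Build alarm settings: alert when status checks fail 2 minutes in a row."""
    return {
        "AlarmName": f"dr-status-check-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,             # seconds per evaluation window
        "EvaluationPeriods": 2,   # consecutive windows that must breach
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # SNS topic that receives the alert
    }


# Placeholder identifiers, not real resources.
alarm = status_check_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:dr-alerts",
)
for key, value in alarm.items():
    print(f"{key}: {value}")
```

With an SDK such as boto3, a dictionary like this could be passed as keyword arguments to the CloudWatch client’s `put_metric_alarm` call.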
5. DO Automate Disaster Recovery
Automating the deployment of services and applications onto AWS lets you manage changes and interruptions with ease:
- AWS CloudFormation works with other tools to automatically provision services as necessary. Use AWS OpsWorks or AWS Elastic Beanstalk to achieve even greater levels of abstraction, automating instances as much as you can.
- Use Auto Scaling to keep your pool of instances appropriately sized to meet demand. During a DR event, the solution can dynamically scale up to meet increased demand and scale back down as usage decreases.
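The idea behind scaling to meet demand can be sketched in a few lines. This is a deliberately simplified model and not how AWS Auto Scaling is implemented: it assumes load can be expressed as a single number (requests per second, say) and that every instance handles the same amount. The function name and all figures are hypothetical.

```python
import math


def desired_capacity(current_load, load_per_instance, min_size, max_size):
    """Return how many instances are needed, clamped to the group's limits."""
    needed = math.ceil(current_load / load_per_instance)
    return max(min_size, min(max_size, needed))


# Hypothetical group: each instance handles 100 requests/s, sized 2 to 20.
print(desired_capacity(950, 100, 2, 20))   # failover traffic arrives
print(desired_capacity(50, 100, 2, 20))    # quiet period, floor applies
print(desired_capacity(5000, 100, 2, 20))  # spike, ceiling applies
```

The minimum keeps a small footprint running so the DR environment is ready to take traffic, and the maximum caps cost if demand spikes.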
6. DON’T Wait for Disaster to Happen
The worst possible time to discover your system’s vulnerabilities is during or after a disaster. Not being able to restore your systems quickly and easily can stall operations and negatively impact your bottom line.
To make sure you’re prepared, work with AWS experts to architect, implement, and manage disaster recovery solutions. Investing early in tools and technologies will protect your company’s assets if something goes wrong.
- 6 Dos and Don’ts of Disaster Recovery in AWS - July 29, 2016