Production Readiness in the AWS Cloud
As more customers start to understand the potential for quickly and cost-effectively developing applications for the AWS public cloud, there is increasing demand for best practices associated with production readiness, i.e. getting your application ready to be deployed in AWS. In this blog post, I will outline some of the key considerations for getting your application production ready, along with some links to relevant components of the AWS platform.
It starts with the VPC
Effective production deployment on AWS means gaining a measure of control over your network design. The Amazon Virtual Private Cloud (VPC) is a private, logically isolated section of the AWS cloud with a virtual network topology that you can deploy and customize. This gives you complete control of your networking architecture, network security controls, and data security and privacy.
Amazon VPC essentially gives you a private data center that you can build out and control on AWS. Some customers create and operate multiple VPCs for a variety of business and technical reasons, while others opt for a single VPC for all of their applications. There are a number of considerations for either approach, many of them rooted in requirements for security boundaries or administrative controls.
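To make the network design point concrete, here is a minimal sketch of planning subnets for a VPC before you create anything. It uses only the Python standard library. The CIDR range, the zone names, and the public / private tier split are assumptions for illustration, not a recommended layout.

```python
import ipaddress


def plan_subnets(vpc_cidr, zones, new_prefix=24):
    """Carve one public and one private subnet per zone out of a VPC CIDR block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields equal-sized, non-overlapping blocks in address order.
    blocks = vpc.subnets(new_prefix=new_prefix)
    plan = []
    for zone in zones:
        for tier in ("public", "private"):
            block = next(blocks, None)
            if block is None:
                raise ValueError("VPC range is too small for the requested subnets")
            plan.append({"zone": zone, "tier": tier, "cidr": str(block)})
    return plan


if __name__ == "__main__":
    # Hypothetical range and zone names.
    for subnet in plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"]):
        print(subnet)
```

Working the plan out up front keeps ranges from overlapping, which matters if you later operate multiple VPCs and want to connect them.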
Scaling and Availability
The classic principles of scaling up and out still apply on AWS, but how you do it is completely different. For example, a relational database is usually a focal point for scaling up (sharding techniques aside): once a production workload starts to test the boundaries of the system design, whether in transactional performance or data capacity, you may need to change the underlying system type.
In the traditional on-premises world, this might entail a time-consuming migration to newer or more expensive hardware; in AWS, changing the EC2 instance type or the type of EBS storage is just a console action or an API call away. This means that system architects can stop guessing about capacity and instead focus on system reliability and availability, and / or consider some of the very capable NoSQL options on AWS, including MongoDB or DynamoDB.
High availability, on the other hand, implies planning for failure in the cloud. Compared to their on-premises equivalents, certain AWS platform components are more susceptible to failure. For example, EC2 instances can fail, network connections can get congested, and Availability Zones can go down, necessitating the use of failover / resiliency techniques across multiple AZs and, in some cases, across multiple Regions. For EC2 instances, you might consider a feature such as Auto Recovery; for RDS databases, Multi-AZ deployment; and for front-end web servers, ELB / Auto Scaling.
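As a rough illustration of the "API call away" point, the sketch below resizes an instance. It assumes an EBS-backed instance (the type can only be changed while the instance is stopped) and a boto3 EC2 client passed in as `ec2`. The function name, instance ID and target type are hypothetical.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an EBS-backed instance, change its type, and start it again.

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # Block until the instance has fully stopped before modifying it.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )
    ec2.start_instances(InstanceIds=[instance_id])


# Example usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "m5.large")
```

Passing the client in, rather than creating it inside the function, makes the routine easy to exercise against a stub before pointing it at a real account.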
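Planning for failure can start with simple arithmetic. The sketch below spreads a fleet evenly across Availability Zones and reports how many instances survive the loss of any single zone. It is a planning aid under assumed zone names and fleet sizes, not a model of how Auto Scaling actually places instances.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin and return the count per zone."""
    placement = Counter()
    for _, zone in zip(range(instance_count), cycle(zones)):
        placement[zone] += 1
    return dict(placement)


def worst_case_capacity(placement):
    """Return the number of instances left if the busiest zone goes down."""
    if not placement:
        return 0
    return sum(placement.values()) - max(placement.values())


if __name__ == "__main__":
    # Hypothetical fleet of 7 web servers across three zones.
    fleet = spread_across_zones(7, ["az-a", "az-b", "az-c"])
    print(fleet)
    print(worst_case_capacity(fleet))
```

If the worst-case figure falls below what the workload needs at peak, the fleet is too small to tolerate a zone outage.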
Scripting and Automation
Moving applications to AWS implies that you will be able to simplify your operations and decrease the costs of deploying and maintaining your application fleet. Nothing defeats this concept more than not taking advantage of the programmable nature of AWS. Just like robots dragged car manufacturing out of the dark ages of hand-built cars, scripting and automation have the potential to:
- Make your IT operations more efficient
- Reduce human errors
- Let you experiment without incurring significant business or technical risks
However significant these advances may be, the programmable aspect of AWS is not a panacea. It still takes knowledge of “how to fly the plane”, i.e. deep and broad hands-on experience, to enable continuous advances. Typically, improvements are informed by meticulous root cause analysis when outages or significant failures occur.
Security
Security is a shared responsibility between the cloud provider and the customer. The National Institute of Standards and Technology (NIST) publication NIST Cloud Computing Reference Architecture provides a high-level overview of the shared responsibility approach; see Section 2.7 of the document at the following link: http://www.nist.gov/customcf/get_pdf.cfm?pub_id=909505.
Prime customer considerations for AWS cloud security include multi-tenancy, API security, business continuity, reliability / denial of service, and encryption. Ultimately it comes down to a level of trust, which is established initially by reputation and due diligence. While technology (e.g. encryption) can reduce how much trust is required, it is critical to understand and document the division of security responsibilities.
Increasingly, customers are examining various Security as a Service offerings to help mitigate risk and decrease the costs of maintaining and managing security operations and infrastructure.
Backup and Recovery
Traditional enterprise backup and recovery strategies typically take an agent-based approach whereby the entire contents of a server are backed up over either the local area network (LAN) or the storage area network (SAN). Traditional architectures have required this approach because replacing failed components is complex, time consuming, and operationally intensive. This has, in turn, created a backup environment that is complex to manage and resource intensive to operate—requiring technologies such as data de-duplication and virtual tape libraries to cope with ever-increasing workloads.
The AWS platform enables a far more lightweight approach to backup and recovery due, in part, to the following characteristics:
- Computers are now virtual abstract resources instantiated via code rather than being hardware-based.
- Capacity is available at incremental cost rather than up-front cost.
- Resource provisioning takes place in minutes, lending itself to real-time configuration.
- Server “images” are available on-demand, can be maintained by an organization, and can be activated immediately.
These characteristics offer you opportunities to recover deleted or corrupted data with less infrastructure overhead.
The Amazon Elastic Compute Cloud (Amazon EC2) service enables the backup and recovery of a standard server, such as a web server or application server, so that you can focus on protecting configuration and stateful data—rather than the server itself. This set of data is much smaller than the aggregate set of server data, which typically includes various application files, operating system files, temporary files, and so on. This change of approach means that regular nightly incremental or weekly full backups can take far less time and consume less storage space.
[Figure: Traditional Backup Approach compared with Amazon EC2 Backup Approach]
When a compute instance is started in Amazon EC2, it is based upon an Amazon Machine Image (AMI) and can also connect to existing storage volumes, for example Amazon Elastic Block Store (Amazon EBS) volumes. In addition, when launching a new instance, it is possible to pass “user data” to the instance that can be accessed internally as dynamic configuration parameters, a process known as bootstrapping.
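To illustrate bootstrapping, here is a minimal sketch that turns a handful of configuration parameters into a user data shell script. The setting names and the /etc/app.env path are hypothetical. The EC2 API expects user data to be base64-encoded; some SDKs and tools perform that encoding for you, so check before encoding twice.

```python
import base64
import shlex


def build_user_data(settings):
    """Render settings as a shell script that writes them to an env file at boot."""
    lines = ["#!/bin/bash", "set -euo pipefail"]
    for key, value in sorted(settings.items()):
        # Quote each assignment so unusual characters cannot break the script.
        assignment = shlex.quote(f"{key}={value}")
        lines.append(f"echo {assignment} >> /etc/app.env")  # hypothetical path
    return "\n".join(lines) + "\n"


def encode_user_data(script):
    """Base64-encode a user data script for callers that need it pre-encoded."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    script = build_user_data({"APP_ENV": "production", "APP_ROLE": "web"})
    print(script)
    print(encode_user_data(script))
```

Because the configuration is supplied at launch, the AMI itself can stay generic, and only this small set of parameters needs to be protected as part of your backup strategy.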
Disaster Recovery
The new capabilities and rapid innovation in AWS allow IT architects to fundamentally alter how they view Disaster Recovery (DR) for mission-critical workloads. AWS allows you to alter many of the underlying fundamentals of Disaster Recovery:
- Unprecedented capabilities to implement DR sites
- Easily set up DR sites in different geographic regions
- Cut down DR site cost by up to 70%
- Substantial savings on software licenses
There are three major approaches to implementing disaster recovery as a service (DRaaS) on AWS:
|Approach / Method|RPO|RTO|Cost|
|Backup / Recovery|30 minutes|1 hour|Low|
|Pilot Light|< 5 minutes|15 minutes|Moderate|
|Warm Standby|< 5 minutes|5 minutes|Higher|
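The table can also be read as a simple decision rule: pick the least expensive approach whose RPO and RTO fit within what the business can tolerate. The sketch below encodes the figures from the table. It treats "< 5 minutes" as an upper bound of 5 minutes and ranks cost as 1 to 3, both of which are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrApproach:
    name: str
    rpo_minutes: int   # upper bound on data loss
    rto_minutes: int   # upper bound on time to recover
    cost_rank: int     # 1 = low, 2 = moderate, 3 = higher (assumed ranking)


APPROACHES = [
    DrApproach("Backup / Recovery", 30, 60, 1),
    DrApproach("Pilot Light", 5, 15, 2),
    DrApproach("Warm Standby", 5, 5, 3),
]


def cheapest_approach(max_rpo_minutes, max_rto_minutes):
    """Return the lowest-cost approach meeting both targets, or None if none does."""
    candidates = [
        a for a in APPROACHES
        if a.rpo_minutes <= max_rpo_minutes and a.rto_minutes <= max_rto_minutes
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda a: a.cost_rank).name


if __name__ == "__main__":
    print(cheapest_approach(60, 120))  # Backup / Recovery
    print(cheapest_approach(10, 20))   # Pilot Light
    print(cheapest_approach(5, 5))     # Warm Standby
```

A result of None means the stated targets are tighter than any of the three approaches in the table, which is a signal to revisit either the requirements or the architecture.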
iTMethods helps customers understand which DR technique is appropriate depending on the business requirements from a business continuity perspective.
By Taylor Graham