What is the AWS Well-Architected Framework?

A management prof I knew had a habit of using a semantic trick to extract a definition from any phrase — flip the words around and hope for a eureka moment. Let’s see if it works out.

The Well-Architected Framework is a framework to architect… well.

That was the first, and last, management class I took.

To get a bit more in depth, let’s explore what the Well-Architected Framework isn’t. It’s not a step-by-step guide to becoming an AWS infrastructure guru. It’s also not a guide that tells cloud developers which services they should be using. You won’t get implementation details or architectural patterns.

What you will get is a set of questions and practices organized into five pillars. They’re meant to be kept in mind while developing and architecting for AWS, and to serve as a benchmark against which you can evaluate your infrastructure. It’s the AWS-prescribed way of making sure your cloud complies with best practices.

Let’s go over the pillars, and preview a few of the practices found within each.

Security

The Security pillar encompasses the ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.

  • AWS Security Pillar White Paper, May 2017

Pretty self-explanatory. You don’t want unauthorized users gaining access to your infrastructure. You don’t want to unwittingly leak thousands (millions?) of users’ data. You don’t want your company in the news, having to lobby Congress to pass a law preventing consumers from suing you.

Example best practices:

Identity and Access Management (IAM) — Do use an assumed identity for daily operations, don’t use the root account. If you really need to use the root account, always protect it with Multi-Factor Authentication (MFA).
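For illustration, here’s a minimal sketch (using boto3, and assuming your AWS credentials are already configured) that checks whether the root account has MFA enabled and whether it still has access keys:

```python
import boto3

iam = boto3.client("iam")

# GetAccountSummary returns account-wide IAM counters, including root-account flags.
summary = iam.get_account_summary()["SummaryMap"]

if summary.get("AccountMFAEnabled", 0) != 1:
    print("Warning: the root account does not have MFA enabled.")
if summary.get("AccountAccessKeysPresent", 0) != 0:
    print("Warning: the root account still has active access keys.")
```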

Simple Storage Service (S3) — Do restrict access to your buckets to only the resources that need it. Don’t grant unrestricted access to the public or to any authenticated AWS user.
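Along the same lines, here’s a hedged sketch that turns on S3 Block Public Access for a bucket (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# "my-example-bucket" is a placeholder; substitute your own bucket name.
s3.put_public_access_block(
    Bucket="my-example-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```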

Reliability

The Reliability pillar encompasses the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.

  • AWS Reliability Pillar White Paper, November 2016

AWS is pretty reliable. Most services carry SLAs of 99.9% uptime or better, and AWS generally delivers on them. However, for mission-critical resources, even a few hours of downtime a year can be devastating. Following the rules of this pillar will have you design your infrastructure to limit the impact “Act of God” failures can have on your workloads and users, and to keep your apps available even when individual components fail.

Example best practices:

Relational Database Service (RDS) — Do set up automated backup of your RDS instances. Don’t let admin errors or catastrophes affect your data availability.
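Automated backups are controlled by the backup retention period, so enabling them can look roughly like this (a boto3 sketch; the instance identifier and seven-day retention are example values):

```python
import boto3

rds = boto3.client("rds")

# "my-db-instance" is a placeholder; a retention period of 0 would disable automated backups.
rds.modify_db_instance(
    DBInstanceIdentifier="my-db-instance",
    BackupRetentionPeriod=7,
    ApplyImmediately=True,
)
```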

Elastic Load Balancer (ELB) — Do associate a minimum of two EC2 instances per ELB. Don’t let downtime on an EC2 instance affect user experience.
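Assuming an Application Load Balancer rather than a Classic ELB, that means registering at least two targets in the target group. A rough sketch (the ARN and instance IDs are placeholders):

```python
import boto3

elbv2 = boto3.client("elbv2")

# The target group ARN and instance IDs below are placeholders.
elbv2.register_targets(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-app/abc123",
    Targets=[
        {"Id": "i-0123456789abcdef0"},
        {"Id": "i-0fedcba9876543210"},
    ],
)
```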

Elastic Compute Cloud (EC2) — Do spread your EC2 instances across multiple Availability Zones. Don’t put all your eggs in one basket, on the off chance AWS drops it. 🐣
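A quick way to see how your instances are currently distributed is to count running instances per Availability Zone; a minimal boto3 sketch:

```python
from collections import Counter

import boto3

ec2 = boto3.client("ec2")
az_counts = Counter()

# Count running instances per Availability Zone.
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            az_counts[instance["Placement"]["AvailabilityZone"]] += 1

# A single dominant zone suggests your eggs are mostly in one basket.
print(dict(az_counts))
```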

Performance Efficiency

The Performance Efficiency pillar focuses on the efficient use of computing resources to meet requirements, and maintaining that efficiency as demand changes and technologies evolve.

  • AWS Performance Efficiency Pillar White Paper, November 2016

The AWS platform changes extremely frequently, and at an ever-increasing rate (see the graph below). This means the services you know and love are constantly improving. An infrastructure created in 2011 will not be as efficient as one built in 2017, even if you were following best practices at the time. This pillar is about making sure you deliver a consistently speedy experience to your users, by upgrading your resources to the most up-to-date offerings released by AWS and by following performance best practices.

![Graph showing the accelerating pace of AWS platform changes](https://lh5.googleusercontent.com/hxod4bLj9PatZoDGDmPcCAurcc1tpdeR7TiYgQ5xd4sYbIdjQLmkTV2Vl7zn267_BJGjRjXHhns3WmCsGmiGpxeT3KLR8Rty1M7JCJVJVUou1HgiV15fv_1DyrV7aVTmuztMP0RS "Graphic borrowed from: @acloudguru AWS Developer Certification Course")

Example best practices:

Elastic Compute Cloud (EC2) — Do make sure to upgrade EC2 instances that are overutilized. Don’t let CPU utilization average above 90%.
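Here’s a hedged sketch that checks an instance’s average CPU utilization over the past week via CloudWatch (the instance ID is a placeholder, and the 90% threshold mirrors the guideline above):

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")
instance_id = "i-0123456789abcdef0"  # placeholder

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
    Period=3600,  # one datapoint per hour
    Statistics=["Average"],
)

datapoints = stats["Datapoints"]
if datapoints:
    average = sum(dp["Average"] for dp in datapoints) / len(datapoints)
    if average > 90:
        print(f"{instance_id} averages {average:.1f}% CPU; consider a larger instance type.")
```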

Elastic Compute Cloud (EC2) — Do ensure that all of your servers are running the latest generation of EC2 instance types. Don’t run servers on legacy instance families.
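One way to spot them is to flag instances whose type belongs to a previous-generation family; the family list below is just an example and will change over time:

```python
import boto3

ec2 = boto3.client("ec2")

# Example previous-generation families; adjust to whatever AWS currently lists as legacy.
LEGACY_FAMILIES = {"t1", "m1", "m2", "m3", "c1", "c3", "r3", "i2"}

paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            family = instance["InstanceType"].split(".")[0]
            if family in LEGACY_FAMILIES:
                print(f"{instance['InstanceId']} is running legacy type {instance['InstanceType']}")
```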

Cost Optimization

The Cost Optimization pillar is used to assess your ability to avoid or eliminate unneeded costs or suboptimal resources, and use those savings on differentiated benefits for your business.

  • AWS Cost Optimization Pillar White Paper, November 2016

We consistently see AWS infrastructures running unused or underused resources, which can add up to wasted spend of 50% or more. You’d be surprised by how common it is for AWS developers to spin up a resource for testing purposes, and then forget about it for months while it accumulates hundreds of dollars in useless spend. By following the principles and practices of this pillar, you’ll reduce your monthly bill and clean up the unnecessary resources found within your infrastructure.

Example best practices:

Elastic Block Store (EBS) — Do ensure you’re keeping only EBS volumes that you’re actually using. Don’t let idle, unattached volumes balloon your costs unnecessarily.
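Idle volumes show up in the "available" state (created or detached, but not attached to any instance); a minimal sketch to list them:

```python
import boto3

ec2 = boto3.client("ec2")

# Volumes in the "available" state are not attached to anything but still incur charges.
response = ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)

for volume in response["Volumes"]:
    print(f"Unattached volume {volume['VolumeId']} ({volume['Size']} GiB)")
```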

Elastic Compute Cloud (EC2) — Do downsize underutilized EC2 instances to match your actual usage. Don’t needlessly pay for compute capacity that you’re not using.
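Resizing requires a stop/start cycle; here’s a rough sketch (the instance ID and target type are placeholders, and stopping an instance briefly interrupts whatever it’s serving):

```python
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # placeholder

# The instance type can only be changed while the instance is stopped.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# "t3.small" is an example target size; pick one that matches your observed usage.
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "t3.small"},
)

ec2.start_instances(InstanceIds=[instance_id])
```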

Operational Excellence

The Operational Excellence pillar includes operational practices and procedures used to manage production workloads. This includes how planned changes are executed, as well as responses to unexpected operational events. Change execution and responses should be automated. All processes and procedures of operational excellence should be documented, tested, and regularly reviewed.

  • AWS Well-Architected Framework White Paper, November 2016

Large enterprises aren’t managed like startups, and large enterprise infrastructures shouldn’t be managed like startup infrastructure. Processes should be documented and rigorously applied wherever necessary. Resource provisioning should be automated, developer activity should be monitored and logged for auditing purposes, and infrastructure design should be dummy-proof. If you’re relying on manual work to push deployments to production environments, you’re inevitably going to run into issues.

Example best practices:

AWS Certificate Manager (ACM) — Do delete expired ACM certificates. Don’t risk deploying expired certificates to resources that will complain and cause your app to crash.
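A minimal sketch that lists certificates ACM already reports as expired (deletion is left commented out, since it’s irreversible):

```python
import boto3

acm = boto3.client("acm")

# List certificates that ACM reports as expired.
response = acm.list_certificates(CertificateStatuses=["EXPIRED"])

for cert in response["CertificateSummaryList"]:
    print(f"Expired certificate: {cert['DomainName']} ({cert['CertificateArn']})")
    # Only delete once you've confirmed nothing still references the certificate.
    # acm.delete_certificate(CertificateArn=cert["CertificateArn"])
```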

CloudFormation — Do use CloudFormation templates to manage your infrastructure automatically. Don’t waste time manually configuring your infrastructure in the AWS console.
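For example, launching a stack from a template file can be as simple as the sketch below (the stack name and template path are placeholders):

```python
import boto3

cloudformation = boto3.client("cloudformation")

# "my-stack" and "template.yaml" are placeholders for your own stack name and template.
with open("template.yaml") as f:
    template_body = f.read()

cloudformation.create_stack(
    StackName="my-stack",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_NAMED_IAM"],  # only needed if the template creates IAM resources
)

# Block until the stack has finished creating.
cloudformation.get_waiter("stack_create_complete").wait(StackName="my-stack")
```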

Implementing all of the AWS Well-Architected Framework best practices in your infrastructure is a monumental task. There will always be more improvements than you and your team can tackle, and you will never have enough resources on hand to follow the Well-Architected Framework to the letter. Instead, prioritize the remediation of your configuration failures, starting with the services that are most critical to your business. Focus on one pillar at a time, one service at a time. This makes infrastructure compliance an achievable goal for a team of any size. In many cases, we have seen that S3 is the critical service that organisations are failing to secure. Stay tuned for the next edition of this series, where you’ll learn how each pillar of the Well-Architected Framework applies to S3.

Cloud Conformity can help you implement the AWS Well-Architected Framework through a simple native integration with your accounts using read-only access. Cloud Conformity runs over 500 checks against your environments and lists the checks that have failed in order of risk severity. Users can filter these checks further by service, making it easier to get compliant.

Understanding that each business has its own unique processes and requirements, Cloud Conformity also offers extensive customization on all of its rules. Users have full control over severity levels when they differ from the best-practice norm, and over the communication alerts related to each rule.

Try out this continuous assurance tool, with its roots in the AWS Well-Architected Framework, using the free 14-day trial that gives full access to all features, including the API, auto-remediation, and real-time monitoring.