AWS Summit 2014
Architecting Highly Available Applications on AWS
Alex Sinner Solutions Architect @alexsinner
• ME: Alex Sinner – AWS Solutions Architect
• YOU: Here to learn more about running highly available, scalable applications on AWS
• TODAY: about best practices and things to think about when building for large scale
Going from 1 User to >10 Million
So how do we scale?
Hi, I have NO IDEA what I am doing!!
a lot of things to read
not where we want to start
Auto Scaling is a tool. It’s not the single thing that
fixes everything.
What do we need first?
Some basics…
Regions US-WEST (Oregon)
EU-WEST (Ireland) ASIA PAC (Tokyo)
US-WEST (N. California)
SOUTH AMERICA (Sao Paulo)
US-EAST (Virginia)
AWS GovCloud (US)
ASIA PAC (Sydney)
ASIA PAC (Singapore)
CHINA (Beijing)
Availability Zones US-WEST (Oregon)
EU-WEST (Ireland) ASIA PAC (Tokyo)
US-WEST (N. California)
SOUTH AMERICA (Sao Paulo)
US-EAST (Virginia)
AWS GovCloud (US)
ASIA PAC (Sydney)
ASIA PAC (Singapore)
CHINA (Beijing)
Compute Storage & Content Delivery
AWS Global Infrastructure
Database
App Services
Deployment & Administration
Networking
Service Reference Model
Amazon CloudSearch
Amazon SQS
Amazon SNS
Amazon Elastic Transcoder
Amazon SWF
Amazon SES
Amazon DynamoDB
Amazon RDS
Amazon ElastiCache
Amazon Redshift
AWS Storage Gateway
Amazon S3
Amazon Glacier
Amazon CloudFront
Amazon CloudWatch
AWS IAM
AWS CloudFormation
AWS Elastic Beanstalk
AWS Data Pipeline
AWS OpsWorks
AWS CloudTrail
Amazon EC2
Amazon EMR
Amazon VPC
Amazon Route 53
AWS Direct Connect
Amazon Kinesis
So let’s start from day one, user one ( you )
Day One, User One
• A single EC2 Instance
– With full stack on this host: web app, database, management, etc.
• A single Elastic IP
• Route 53 for DNS
EC2 Instance
Elastic IP
Amazon Route 53
User
“We’re gonna need a bigger box”
• Simplest approach
• Can now leverage PIOPS (Provisioned IOPS)
• High I/O instances
• High memory instances
• High CPU instances
• High storage instances
• Easy to change instance sizes
• Will hit a ceiling eventually (there is always a largest instance size)
i2.4xlarge
m3.xlarge
m1.small
Day One, User One:
• We could potentially get to a few hundred to a few thousand users, depending on application complexity and traffic
• No failover
• No redundancy
• Too many eggs in one basket
EC2 instance
Elastic IP address
Amazon Route 53
User
Day Two, User >1:
First, let’s separate out our single host into more than one:
• Web
• Database
– Make use of a database service?
Web instance
Database instance
Elastic IP address
Amazon Route 53
User
Database Options
Self-Managed
• Database server on Amazon EC2
– Your choice of database running on Amazon EC2
– Bring Your Own License (BYOL)
Fully-Managed
• Amazon DynamoDB
– Managed NoSQL database service using SSD storage
– Seamless scalability
– Zero administration
• Amazon RDS
– Microsoft SQL Server, Oracle, MySQL or PostgreSQL as a managed service
– Flexible licensing: BYOL or License Included
• Amazon Redshift
– Massively parallel, petabyte-scale data warehouse service
– Fast, powerful and easy to scale
But how do I choose what DB technology I need? SQL? NoSQL?
Some people won’t like this. But…
Start with SQL databases
Why start with SQL?
• Established and well-worn technology
• Lots of existing code, communities, books, background, tools, etc.
• You aren’t going to break SQL DBs in your first 10 million users. No really, you won’t*.
• Clear patterns to scalability
* Unless you are manipulating data at MASSIVE scale; even then, SQL will have a place in your stack
AH HA! You said “massive amounts”, I will have massive amounts!
If your usage is such that you will be generating several TB of data in the
first year OR have an incredibly data-intensive workload… you might
need NoSQL
Regardless, why NoSQL?
• Super low latency applications
• Metadata driven datasets
• Highly non-relational data
• Need schema-less data constructs*
• Massive amounts of data (again, in the TB range)
• Rapid ingest of data (thousands of records/sec)
• Already have skilled staff
* Need != “it is easier to do dev without schemas”
But back to the main path… Let’s see how far SQL at the core
can grow
User >100
First let’s separate out our single host into more than one
• Web
• Database
– Use RDS to make your life easier
Web Instance
Elastic IP
RDS DB Instance
Amazon Route 53
User
User > 1000
Next let’s address our lack of failover and redundancy
• Elastic Load Balancing
• Another web instance
– In another Availability Zone
• Enable Amazon RDS Multi-AZ
Web Instance
RDS DB Instance Active (Multi-AZ) Availability Zone Availability Zone
Web Instance
RDS DB Instance Standby (Multi-AZ)
Elastic Load Balancing
Amazon Route 53
User
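A minimal sketch of what the load balancer buys us at this stage, assuming two web instances in different Availability Zones. The class, the instance names (web-1, web-2) and the zone labels are hypothetical and only illustrate the idea; this is not how Elastic Load Balancing is implemented.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: rotates over backends and skips unhealthy ones."""

    def __init__(self, backends):
        # backends: list of (instance_name, availability_zone) tuples (hypothetical names)
        self.backends = list(backends)
        self.healthy = {name: True for name, _ in self.backends}
        self._ring = cycle(self.backends)

    def mark(self, name, healthy):
        # Record the result of a health check for one backend.
        self.healthy[name] = healthy

    def pick(self):
        # Look at each backend at most once per request.
        for _ in range(len(self.backends)):
            name, zone = next(self._ring)
            if self.healthy[name]:
                return name, zone
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer([("web-1", "zone-a"), ("web-2", "zone-b")])
before_failure = [lb.pick()[0] for _ in range(4)]
lb.mark("web-1", False)  # simulate losing the instance in zone-a
after_failure = [lb.pick()[0] for _ in range(3)]
```

With one instance per zone, losing a zone halves capacity but the site keeps answering, which is the point of this step.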
Scaling this horizontally and vertically
will get us pretty far ( 10s-100s of thousands )
User >10k–100k
RDS DB Instance Active (Multi-AZ)
Availability Zone Availability Zone
RDS DB Instance Standby (Multi-AZ)
Elastic Load Balancing
RDS DB Instance Read Replica
RDS DB Instance Read Replica
RDS DB Instance Read Replica
RDS DB Instance Read Replica
Web Instance
Web Instance
Web Instance
Web Instance
Web Instance
Web Instance
Web Instance
Web Instance
Amazon Route 53
User
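One way the application could use the read replicas in this picture is to send writes to the primary and spread reads over the replicas. The sketch below assumes a naive "starts with SELECT" rule and hypothetical endpoint names; a real application would also have to tolerate replication lag, since replicas can trail the primary.

```python
import itertools


class ReplicaRouter:
    """Toy read/write splitter: reads rotate over replicas, writes go to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Without replicas, every statement falls back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        # Naive classification (an assumption): anything starting with SELECT is a read.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicaRouter("primary-db", ["replica-1", "replica-2"])
targets = [
    router.endpoint_for("SELECT * FROM users WHERE id = 1"),
    router.endpoint_for("UPDATE users SET name = 'x' WHERE id = 1"),
    router.endpoint_for("  select count(*) from orders"),
]
```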
Shift some load around:
Let’s lighten the load on our web and database instances:
• Move static content from the web instance to Amazon S3 and CloudFront
• Move dynamic content from the load balancer to CloudFront
• Move session/state and DB caching to ElastiCache or Amazon DynamoDB
Web instance
RDS DB Instance Active (Multi-AZ) Availability Zone
Elastic Load Balancer
Amazon S3
Amazon CloudFront
Amazon Route 53
User
ElastiCache
Amazon DynamoDB
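The DB caching bullet can be illustrated with the cache-aside pattern. In this sketch a plain dict stands in for ElastiCache and `load_from_db` is a hypothetical loader function; both are assumptions for illustration only.

```python
class CacheAside:
    """Toy cache-aside reader: check the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # hypothetical function that queries the database
        self._cache = {}           # stand-in for an ElastiCache cluster
        self.db_reads = 0          # counts how often we had to hit the database

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this after a write so the next read refreshes from the database.
        self._cache.pop(key, None)


store = CacheAside(lambda key: f"row-for-{key}")
first = store.get("user:1")
second = store.get("user:1")      # served from the cache
reads_before_invalidate = store.db_reads
store.invalidate("user:1")
third = store.get("user:1")       # goes back to the database
```

Every cache hit is a query the database never sees, which is where the load reduction on the RDS instance comes from.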
Now let’s revisit the beginning of our talk…
Auto Scaling!
Automatic resizing of compute clusters based on demand
Trigger auto-scaling policy
Feature: Details
• Control: Define minimum and maximum instance pool sizes and when scaling and cooldown occur
• Integrated with Amazon CloudWatch: Use metrics gathered by CloudWatch to drive scaling
• Instance types: Run Auto Scaling for On-Demand and Spot Instances; compatible with VPC
aws autoscaling create-auto-scaling-group --auto-scaling-group-name MyGroup --launch-configuration-name MyConfig --min-size 4 --max-size 200 --availability-zones us-west-2c
Auto Scaling Amazon
CloudWatch
Auto Scaling can scale from one instance to thousands
and back down
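The decision a scaling policy makes can be sketched as a small function. The minimum and maximum below mirror the `--min-size 4 --max-size 200` values from the command on the slide; the CPU thresholds and the step size are assumptions chosen for illustration, not Auto Scaling defaults.

```python
def desired_capacity(current, cpu_percent, min_size=4, max_size=200,
                     scale_out_at=70.0, scale_in_at=30.0, step=2):
    """Return the instance count a simple threshold policy would ask for."""
    if cpu_percent >= scale_out_at:
        target = current + step   # busy: add instances
    elif cpu_percent <= scale_in_at:
        target = current - step   # idle: remove instances
    else:
        target = current          # inside the comfort band: do nothing
    # Never leave the configured pool boundaries.
    return max(min_size, min(max_size, target))


busy = desired_capacity(current=4, cpu_percent=90)
idle_at_floor = desired_capacity(current=4, cpu_percent=10)
busy_at_ceiling = desired_capacity(current=200, cpu_percent=95)
steady = desired_capacity(current=10, cpu_percent=50)
```

A real policy would also wait out a cooldown period between adjustments so the group does not oscillate.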
User >500k+:
Availability Zone
Amazon Route 53
User
Amazon S3
Amazon CloudFront
Availability Zone
Elastic Load Balancing
Amazon DynamoDB RDS DB Instance
Read Replica
Web instance
Web instance
Web instance
ElastiCache RDS DB Instance Read Replica
Web instance
Web instance
Web instance
ElastiCache RDS DB Instance Standby (Multi-AZ)
RDS DB Instance Active (Multi-AZ)
ARCHITECTING DATA-DRIVEN MASS PRODUCED VIDEO
AWS SUMMIT 2014 | JUNE 10, 2014
JASPER JAGER SENIOR DEVELOPER AND AWS ARCHITECT REDNUN, AMSTERDAM
THIS IS WHAT WE DO
‣ Automatically mass produce data-driven, personalised or profiled video
‣ ING, KLM, Essent België, T-Mobile
‣ Run everything in AWS
‣ Small campaign, 25.000 personalised videos
‣ Self hosted 3x 8 core Xserves with 96GB RAM
‣ 300 videos an hour
HOW WE STARTED
PROBLEMS WITH THE OLD IN-HOUSE SETUP
‣ 250.000 videos would take us 35 days
‣ Or we would have to buy more hardware
‣ Systems which would idle most of the time
‣ Storing and serving all videos - HELP
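The 35-day figure follows directly from the 300 videos an hour quoted for the in-house setup. A quick check, assuming rendering runs around the clock at that constant rate:

```python
VIDEOS_PER_HOUR = 300  # throughput of the in-house setup, from the slides


def render_days(videos, videos_per_hour=VIDEOS_PER_HOUR):
    """Days needed to render `videos` at a constant hourly rate, 24 hours a day."""
    hours = videos / videos_per_hour
    return hours / 24


small_campaign_days = render_days(25_000)    # roughly 3.5 days
large_campaign_days = render_days(250_000)   # roughly 34.7 days, i.e. about 35
```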
REBUILD REDNUN IN THE CLOUD
‣ Ability to start 100’s of machines, based on preconfigured AMI
‣ High availability for our campaign sites, behind load balancers
‣ Big campaign, big Dutch lottery, 1.200.000
‣ Batched, pre-rendered videos, stored on S3
‣ Took us just a couple of days
SECOND INFRASTRUCTURE
AUTOSCALE EVERYTHING
‣ Automated daily flows, welcome video, birthday video etc.
‣ API, videos can be produced on the fly
‣ Autoscaling based on Cloudwatch metrics for web and app servers
‣ Custom autoscaling scripts for video rendering
‣ Use spot instances when available
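The slides do not show the custom autoscaling scripts, so the following is only a guess at the kind of calculation such a script might make: size the render fleet from the backlog and a deadline. The function name, the per-worker rate and the fleet cap are all hypothetical.

```python
import math


def workers_needed(queued_videos, videos_per_worker_hour, deadline_hours, max_workers):
    """Render workers required to clear the backlog before the deadline."""
    if queued_videos <= 0:
        return 0  # nothing to render: scale the fleet to zero
    capacity_per_worker = videos_per_worker_hour * deadline_hours
    needed = math.ceil(queued_videos / capacity_per_worker)
    return min(max_workers, needed)


# Hypothetical numbers: 100 videos per worker per hour, 48 hours to finish.
lottery_campaign = workers_needed(1_200_000, 100, 48, max_workers=500)
small_batch = workers_needed(250, 100, 1, max_workers=500)
```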
LOOSE COUPLING
‣ Decoupled components
‣ Use SQS as a buffer
‣ Continuous monitoring and adjusting
AUTOMATE EVERYTHING
‣ Cloudformation and Opsworks
‣ Flexibility to start environment in different region
‣ Dev and QA environments
CURRENT INFRASTRUCTURE
THE THINGS WE’VE LEARNED
‣ AWS service limits
‣ Autoscale on Cloudwatch or custom metrics
‣ Automate your infrastructure
‣ AWS can help you scale with ease
WWW.REDNUN.NL
On Tools:
Managing your infrastructure will become an increasingly important part of your time. Use tools to automate repetitive tasks.
• Tools to manage AWS resources
– AWS CloudFormation
• Tools to manage software and configuration on your instances
– AWS OpsWorks
• Automated data analysis of logs and user actions
User >500k+:
You’ll potentially start to run into issues with speed and performance of your applications:
• Have monitoring/metrics/logging in place
– If you can’t build it internally, outsource it! (3rd party SaaS)
• Pay attention to what customers are saying works well
• Squeeze as much performance as you can out of each service/component
HOST LEVEL METRICS • AGGREGATE LEVEL METRICS • LOG ANALYSIS • EXTERNAL SITE PERFORMANCE
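To make the difference between host-level and aggregate-level metrics concrete, here is a small sketch with made-up latency samples. The host names and numbers are invented, and the nearest-rank percentile is just one common definition.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Invented request latencies in milliseconds, grouped per web instance.
host_latency_ms = {
    "web-1": [20, 22, 25, 30],
    "web-2": [21, 24, 400, 26],
}

# Host-level view: the median of each instance looks healthy.
per_host_p50 = {host: percentile(values, 50) for host, values in host_latency_ms.items()}

# Aggregate view: the fleet-wide tail exposes the slow request.
fleet_samples = [ms for values in host_latency_ms.values() for ms in values]
fleet_p95 = percentile(fleet_samples, 95)
```

Both views are needed: the per-host medians hide the 400 ms outlier that the fleet-wide 95th percentile surfaces.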
Not having proper monitoring/metrics is like flying a plane
with an eye mask on in a thunderstorm.
Oh, and your wing is on fire.
AWS Marketplace & Partners Can Help
• Customers can find, research, and buy software
• Simple pricing, aligns with Amazon EC2 usage model
• Launch in minutes
• AWS Marketplace billing integrated into your AWS account
• 1300+ products across 20+ categories
Learn more at: aws.amazon.com/marketplace
There are further improvements to be
made in breaking apart our web/app layer
SOA = Service Oriented Architecture
SOA’ing
Move services into their own tiers/modules. Treat each of these as 100% separate pieces of your infrastructure and scale them independently. Amazon.com and AWS do this extensively! It offers flexibility and greater understanding of each component.
Loose coupling sets you free!
• The looser they're coupled, the bigger they scale
– Independent components
– Design everything as a black box
– Decouple interactions
– Favor services with built-in redundancy and scalability rather than building your own
[Diagram: Tight coupling: Controller A calls Controller B directly. Loose coupling: Controller A and Controller B exchange work through queues (Q). Use Amazon SQS for buffers.]
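A minimal sketch of the queue-as-buffer idea, using Python's in-process `queue.Queue` as a stand-in for Amazon SQS. The function names echo the Controller A and Controller B labels, and the job names are invented. Unlike this toy, a standard SQS queue may deliver a message more than once and does not guarantee ordering, so real consumers should be idempotent.

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for an SQS queue
results = []


def controller_a(items):
    # Producer: hands work to the queue and returns immediately.
    for item in items:
        jobs.put(item)


def controller_b():
    # Consumer: drains the queue at its own pace until it sees the stop marker.
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        results.append(job.upper())
        jobs.task_done()


# The producer can run before any consumer exists; the queue absorbs the work.
controller_a(["encode-1", "encode-2", "encode-3"])

worker = threading.Thread(target=controller_b)
worker.start()
jobs.put(None)   # stop marker for this demo
jobs.join()      # wait until every queued item has been processed
worker.join()
```

Because neither side calls the other directly, either one can be restarted or scaled without the other noticing.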
Loose coupling + SOA = winning
Examples: • Email • Queuing • Transcoding • Search • Databases • Monitoring • Metrics • Logging
Amazon CloudSearch
Amazon SQS Amazon SNS
Amazon Elastic Transcoder
Amazon SWF Amazon SES
In the early days, if someone has a service for it already, opt to use that instead of building it yourself. DON’T RE-INVENT THE WHEEL
On re-inventing the wheel… If you find yourself writing
your own: queue, DNS server, database, storage system,
monitoring tool
Take a deep breath and stop it. Now.
Back to SOA
Users > 1 Million
RDS DB Instance Active (Multi-AZ)
Availability Zone
Elastic Load Balancer
RDS DB Instance Read Replica
RDS DB Instance Read Replica
Web Instance
Web Instance
Web Instance
Web Instance
Amazon Route 53
User
Amazon S3
Amazon CloudFront
Amazon DynamoDB
Amazon SQS
ElastiCache
Worker Instance
Worker Instance
Amazon CloudWatch
Internal App Instance
Internal App Instance
Amazon SES
The next big steps
From 5 to 10 Million Users
You may start to run into issues with your database around contention on the write master. How can you solve it?
• Federation - splitting into multiple DBs based on function
• Sharding - splitting one data set up across multiple hosts
• Moving some functionality to other types of DBs (NoSQL)
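The first two options can be sketched as routing functions. Federation picks a database by function; sharding picks a host by hashing a key. All database names below are hypothetical, and hash-modulo is only one of several possible sharding schemes.

```python
import hashlib

# Federation: one database per function (hypothetical names).
FEDERATED = {"users": "users-db", "orders": "orders-db", "forums": "forums-db"}

# Sharding: one data set spread across several hosts (hypothetical names).
SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]


def database_for(function):
    """Federation: route by what the data is used for."""
    return FEDERATED[function]


def shard_for(user_id, shards=SHARDS):
    """Sharding: route by a stable hash of the key, so a user always lands on the same host."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


orders_db = database_for("orders")
home_shard = shard_for(42)
```

One design caveat: with plain modulo hashing, changing the number of shards moves most keys to a different host, so the shard count is worth choosing with growth in mind.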
…and there you have it. 10 Million
A Quick Review
Review
• Multi-AZ your infrastructure
• Make use of self-scaling services
– Elastic Load Balancing, Amazon S3, Amazon SNS, Amazon SQS, Amazon SWF, Amazon SES, etc.
• Build in redundancy at every level
• Most likely start with SQL
• Cache data both inside and outside your infrastructure
• Use automation tools in your infrastructure
Review (cont)
• Make sure you have good metrics/monitoring/logging tools in place
• Split tiers into individual services (SOA)
• Use Auto Scaling when you’re ready for it
• Don’t reinvent the wheel
• Move to NoSQL when it really makes sense but do your best not to administer it
Putting all this together means we should now
easily be able to handle 10+ million users!
To infinity…..
Thank You!
AWS EXPERT? GET CERTIFIED! aws.amazon.com/certification
Alex Sinner Solutions Architect @alexsinner