Architecting for the Cloud: Cloud Providers

description

This is a lecture on cloud providers from the course "Architecting for the Cloud"

© Matthew Bass 2013

Architecting for the Cloud

Len and Matt Bass

Cloud Providers

IaaS Providers

• There are several primary providers

– Amazon: Amazon Web Services (AWS)

– Microsoft: Azure

– Google: Google Compute Engine

– …

• Each of these is set up a bit differently, with slightly different internal design decisions and associated services

Goals

• The goal of this talk is not to give you a definitive how-to for each provider

• It’s meant to give you just an introduction

• The idea is that you’ll see how the concepts that we talked about in the course map to specific providers

• We’ll look primarily at Amazon (with some details from others thrown in)

• We’ll go through both the overall structure and look at specific services

Amazon Elastic Compute Cloud

• Amazon EC2 provides compute capacity in the cloud

• You can select the machine image with a given OS and specified capability

• You can resize the capacity as needed

• Takes minutes to spin up a new VM

• You can specify multiple instances and select where they will run – Region & availability zones

• You pay per hour of usage, depending on the capability of the instance and on whether it’s a reserved instance (capacity you commit to in advance at a discounted rate)

Regions

• Amazon has divided their cloud offerings into multiple regions. Each region should be thought of as a separate cloud

– I.e. there is no automatic copying of data from one region to another.

Current AWS Regions

• North America: – US East (5 availability zones) – US West Oregon (3 availability zones) – US West Northern California (3 availability zones) – GovCloud (US) (2 availability zones)

• South America – Sao Paulo (2 availability zones)

• Europe – Ireland (2 availability zones)

• Asia Pacific – Sydney (2 availability zones) – Singapore (2 availability zones) – China (1 availability zone) – Tokyo (3 availability zones)

AWS and Services

• Amazon Web Services offers a number of services

• These services are things like: – Storage

– Database

– Network capabilities

– Monitoring

– …

• Not all services are available in all regions – https://aws.amazon.com/about-aws/globalinfrastructure/regional-product-services/

Amazon Availability Zones

• Amazon has a notion of availability zones

• Engineered to be insulated from failures in other availability zones

• Availability zones are locations within a region

• Amazon has not announced the details of an availability zone, but presumably they are – Physically separate data centers

– Have independent networks

– Have independent power delivery

– …

Amazon Service Level Agreement

• Amazon guarantees 99.95% availability for each region

• IaaS consumers are free to deploy their applications: – Within an availability zone

– Across availability zones but within a region

– Across regions

• Amazon does not make any claim about the availability of their availability zones (that I could find)
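
To make the 99.95% figure concrete, here is a small sketch that converts an availability percentage into the downtime it permits. The function name and the 8760-hour year are illustrative choices, not part of Amazon's SLA.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted by an availability percentage over a period."""
    return (1 - availability_percent / 100) * period_hours


# 99.95% availability leaves roughly 4.38 hours of downtime per year.
print(round(allowed_downtime_hours(99.95), 2))
```

In other words, a 99.95% commitment still leaves room for a bit more than four hours of outage per year.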

All-in-one Single Server

Basic 4-server Setup

Multiple Availability Zones

Multiple Regions

Elastic Compute Cloud (EC2) & Redundancy

• EC2 supports different levels of redundancy

– It is up to the customer to determine how much redundancy they wish to have and how much they wish to pay for it

• Redundant elements can be:

– Within an availability zone

– Across availability zones

– Across regions
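
A rough way to reason about how much redundancy to buy is to estimate the combined availability of several redundant copies. The sketch below assumes the copies fail independently, which is a strong assumption (cascading failures break it), and the 0.99 figure is hypothetical, not a published number for any zone or region.

```python
def combined_availability(single, copies):
    """Availability of `copies` redundant deployments, assuming independent failures.

    The system is down only when every copy is down at the same time.
    """
    return 1 - (1 - single) ** copies


# Hypothetical per-deployment availability of 0.99:
for n in (1, 2, 3):
    print(n, round(combined_availability(0.99, n), 6))
```

Each additional copy adds cost, so the customer has to weigh the improvement against the price of running it.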

Microsoft Azure Regions

• North America – US Central (Iowa) – US East (Virginia) – US East 2 (Virginia) – US North Central (Illinois) – US South Central (Texas) – US West (California)

• Europe – Europe North (Ireland) – Europe West (Netherlands)

• Asia Pacific – East (Hong Kong) – Southeast (Singapore)

• Japan – Japan East (Saitama) – Japan West (Osaka)

• Brazil – Sao Paulo

Fault Domains in Azure

• In Azure there is the concept of Fault Domains

• A Fault Domain is essentially a rack in a given datacenter

• A consumer is not able to define which fault domains the application is distributed to

– Unlike an availability zone

• As a result the fault domain is really an internal structure

Upgrade Domains in Azure

• An upgrade domain is similar to a fault domain

• Essentially, everything in an upgrade domain is upgraded at the same time

– When Microsoft upgrades their internal infrastructure they do so one domain at a time

• In order to guard against both failures within a fault domain and outages during upgrades, you need to replicate across both fault and upgrade domains

• This is called an availability set
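
The idea of spreading replicas across both kinds of domain can be sketched as follows. The round-robin placement and the function names are assumptions made for illustration; this is not Azure's actual placement algorithm.

```python
def place_instances(n_instances, n_fault_domains, n_upgrade_domains):
    """Assign instance i to a (fault domain, upgrade domain) pair, round-robin."""
    return [(i % n_fault_domains, i % n_upgrade_domains) for i in range(n_instances)]


def survivors(placement, lost_fault=None, lost_upgrade=None):
    """Count instances that are in neither the lost fault domain nor the lost upgrade domain."""
    return sum(1 for fd, ud in placement if fd != lost_fault and ud != lost_upgrade)


placement = place_instances(4, n_fault_domains=2, n_upgrade_domains=5)
print(placement)                             # [(0, 0), (1, 1), (0, 2), (1, 3)]
print(survivors(placement, lost_fault=0))    # 2 instances survive a rack failure
print(survivors(placement, lost_upgrade=1))  # 3 instances stay up during an upgrade
```

As long as there are at least two instances and at least two domains of each kind, no single rack failure or upgrade step takes down every instance.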

Azure Availability Sets

Amazon Auto Scaling

• Auto Scaling works in conjunction with CloudWatch (Amazon’s monitoring service)

• The idea is the monitoring service monitors the metrics – CPU utilization – Latency – Memory consumption

• The Auto Scaling solution establishes the rules – Add instances when utilization exceeds 70% – Remove instances when utilization falls below 10%

• You can specify things like a “cooling off” period – Where no action is taken until the system has a chance to stabilize
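
The rules above can be sketched as a small decision function. This is an illustration of the logic only, not the Auto Scaling API; the 70% and 10% thresholds come from the slide, while the 300-second cooling-off period and the instance limits are assumed values.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    scale_out_above: float = 70.0  # add an instance above this CPU utilization (%)
    scale_in_below: float = 10.0   # remove an instance below this CPU utilization (%)
    cooldown_s: float = 300.0      # assumed cooling-off period in seconds
    min_instances: int = 1         # assumed lower bound
    max_instances: int = 10        # assumed upper bound


def decide(policy, cpu_percent, instances, now_s, last_action_s):
    """Return the instance count the group should have after this evaluation."""
    # During the cooling-off period no action is taken, so the system can stabilize.
    if last_action_s is not None and now_s - last_action_s < policy.cooldown_s:
        return instances
    if cpu_percent > policy.scale_out_above and instances < policy.max_instances:
        return instances + 1
    if cpu_percent < policy.scale_in_below and instances > policy.min_instances:
        return instances - 1
    return instances


policy = ScalingPolicy()
print(decide(policy, 85.0, 2, now_s=1000, last_action_s=None))  # 3: scale out
print(decide(policy, 85.0, 3, now_s=1100, last_action_s=1000))  # 3: still cooling off
print(decide(policy, 5.0, 3, now_s=2000, last_action_s=1000))   # 2: scale in
```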

Amazon Elastic Load Balancer

• This is Amazon’s load balancing solution – Recall the push/pull architecture discussion

• It tracks the status and location of instances

• Routes requests to healthy instances based on criteria that you establish

• Can be used in conjunction with Auto Scaling – When new instances are added or removed they are registered with the ELB

• Can be used in conjunction with Amazon’s DNS service (Route 53) – You can use DNS failover to move from one region to another

– The DNS will route traffic to the ELB in the target region
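
The behavior described above (tracking instances, routing only to healthy ones, and registering instances as they come and go) can be sketched with a toy balancer. The class and method names are hypothetical, and round-robin selection is an assumption; the slides do not say which routing criteria the ELB applies.

```python
import itertools


class RoundRobinBalancer:
    """Toy load balancer that routes requests to healthy instances in turn."""

    def __init__(self):
        self._instances = {}              # instance id -> healthy flag
        self._counter = itertools.count()

    def register(self, instance_id):
        """Called when an instance is added (for example by a scaling action)."""
        self._instances[instance_id] = True

    def deregister(self, instance_id):
        """Called when an instance is removed."""
        self._instances.pop(instance_id, None)

    def report_health(self, instance_id, healthy):
        """Record the result of a health check for a registered instance."""
        if instance_id in self._instances:
            self._instances[instance_id] = healthy

    def route(self):
        """Pick the next healthy instance, or fail if there is none."""
        healthy = sorted(i for i, ok in self._instances.items() if ok)
        if not healthy:
            raise RuntimeError("no healthy instances")
        return healthy[next(self._counter) % len(healthy)]


lb = RoundRobinBalancer()
for name in ("a", "b", "c"):
    lb.register(name)
lb.report_health("b", False)
print([lb.route() for _ in range(4)])  # ['a', 'c', 'a', 'c']
```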

Amazon Simple Queue Service

• SQS is Amazon’s queuing service

– Again recall the push/pull architecture discussion

• It’s a service that supports message queues

• Recall it can be used in conjunction with Auto Scaling to manage the elasticity of your application

• Pricing is per million requests handled
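
One way a queue can drive elasticity is to size the worker pool from the backlog. The sketch below is an assumption about how such a rule might look; the function, its parameters, and the limits are hypothetical and are not part of SQS or Auto Scaling.

```python
import math


def workers_needed(queue_depth, msgs_per_worker_per_min, target_drain_min,
                   min_workers=1, max_workers=20):
    """Workers required to drain the current backlog within the target time."""
    if queue_depth <= 0:
        return min_workers
    needed = math.ceil(queue_depth / (msgs_per_worker_per_min * target_drain_min))
    # Clamp to the configured pool limits.
    return max(min_workers, min(max_workers, needed))


# 1200 queued messages, 50 messages per worker per minute, drain within 5 minutes.
print(workers_needed(1200, 50, 5))  # 5
```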

Amazon Storage Solutions

• Amazon has several storage solutions – Elastic Block Store (EBS) – Simple Storage Service (S3) – Glacier

• These provide raw unmanaged storage

• This is useful for: – Disaster recovery – Backup – Archiving – Persistence for your own database solution

Amazon Elastic Block Store

Amazon Elastic Block Store (EBS) is Amazon’s block-level storage service for EC2 instances. Some of its features are:

• Data is persisted independently from instances

• EBS data is placed in a specific availability zone and can be attached to instances in the same availability zone

• EBS data is automatically replicated within its availability zone

• There are two networks that connect EBS instances – A high speed network to provide coordination among instances and move data between instances.

– A lower speed network used as backup for coordination.

• $0.05 per million I/O requests

Amazon Simple Storage Service (S3)

• S3 is a scalable storage solution

• Good for content storage and distribution

• Good for backup, archiving, and disaster recovery

• Costs $0.03 per GB of data per month

• More expensive but faster than Glacier

• Not as fast for I/O as EBS

Amazon Glacier

• Low cost storage solution

• Good for off-site archival of enterprise data

• Good for backup and data archiving

• Good for large volumes of data

• Costs $0.01 per GB of data per month
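
To compare the two storage prices quoted in these slides, here is a small worked calculation. It assumes the per-GB figures are charged per month and ignores request and retrieval charges, which the slides do not cover; the dictionary keys and function name are illustrative.

```python
# Per-GB prices as quoted in the slides (assumed to be monthly rates).
PRICE_PER_GB = {"s3": 0.03, "glacier": 0.01}


def storage_cost(service, gigabytes):
    """Storage-only cost for holding `gigabytes` for one billing period."""
    return PRICE_PER_GB[service] * gigabytes


# Holding 5000 GB (about 5 TB):
for service in ("s3", "glacier"):
    print(service, round(storage_cost(service, 5000), 2))
```

At these rates the same data costs three times as much in S3 as in Glacier, which is the trade-off against Glacier's slower access.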

Amazon Database Solutions

• Amazon has a number of fully managed database solutions

• These are built on top of one of Amazon’s storage solutions

• They include:

– DynamoDB

– Relational Database Service (RDS)

– Redshift

– ElastiCache

DynamoDB

• Key Value data store

• Uses a throughput oriented pricing model (rather than a storage oriented model)

• Uses solid state drives

• Designed for single-digit millisecond read latencies

• You pay a flat hourly rate based on capacity that you reserve

– Costs $0.0065 per hour for every 10 units of write capacity

– Costs $0.0065 per hour for every 10 units of read capacity
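
The flat hourly rate can be worked through with a short calculation. The rate is the one quoted in the slide; rounding reserved capacity up to whole blocks of 10 units is an assumption made for this sketch, and the function name is illustrative.

```python
import math

RATE_PER_10_UNITS_PER_HOUR = 0.0065  # figure quoted in the slide, for reads and writes


def hourly_cost(read_units, write_units):
    """Hourly cost of reserved capacity, billed in blocks of 10 units (assumed)."""
    blocks = math.ceil(read_units / 10) + math.ceil(write_units / 10)
    return blocks * RATE_PER_10_UNITS_PER_HOUR


# Reserving 100 read units and 50 write units is 15 blocks in total.
print(round(hourly_cost(100, 50), 4))  # 0.0975
```

Because the charge depends on reserved throughput rather than on stored bytes, an idle table with high reserved capacity still costs the full hourly rate.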

Relational Database Service (RDS)

• A web service that provides a managed relational database for use in applications

• It provides access to MySQL, Oracle, SQL Server, or PostgreSQL

• It simplifies installation, patching, and backup related issues

• Priced per hour according to database engine, instance size, and number of instances

Redshift

• Redshift is Amazon’s data warehousing solution

• Integrates with other storage solutions

• Priced from $0.25 per hour on the low end

• Or about $1000 per terabyte per year

ElastiCache

• A web service that provides an in-memory data cache

• Supports:

– Memcached

– Redis

• Improves latency and throughput for read-heavy applications

• Prices are per Cache node/hour
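
A common way to use such a cache is the cache-aside pattern: check the cache first and go to the database only on a miss. The sketch below uses a plain dictionary as a stand-in for Memcached or Redis, so it shows the access pattern rather than the ElastiCache API; the class name and the loader callback are hypothetical.

```python
class CacheAside:
    """Read-through helper: serve from the cache, fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # function that fetches a value from the database
        self._cache = {}           # stand-in for a Memcached or Redis client
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load(key)    # the slow path, taken only on a miss
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry after the underlying record changes."""
        self._cache.pop(key, None)


db_calls = []


def slow_lookup(key):
    db_calls.append(key)
    return key.upper()


cache = CacheAside(slow_lookup)
print(cache.get("user:1"), cache.get("user:1"))  # second read is served from the cache
print(db_calls)                                  # ['user:1']
```

The benefit grows with the read-to-write ratio, since every repeated read avoids a database round trip.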

Amazon CloudFront

• Amazon’s content delivery network

• Provides edge services

– Competes with companies such as Akamai

• This service will allow you to locate content closer to users

– Reduces latency

• You specify the edge location and point it to the origin

• You can route DNS to the edge location if you want

Amazon Elastic IP Addressing

• Amazon provides elastic IP addressing

• The IP address is associated with your account – not with an instance

• You can programmatically map the elastic IP to any instance in your account

• In this way you make the deployment configuration transparent to the user/application

– Remember the virtual network discussion?
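
The indirection that an elastic IP provides can be sketched as a mapping owned by the account rather than by any instance. The class, the instance identifiers, and the address (taken from a documentation-only range) are all hypothetical; this is not the EC2 API.

```python
class ElasticIpTable:
    """Account-level mapping from a stable public address to whichever instance serves it."""

    def __init__(self):
        self._map = {}

    def associate(self, elastic_ip, instance_id):
        """Point the address at an instance, replacing any earlier association."""
        self._map[elastic_ip] = instance_id

    def resolve(self, elastic_ip):
        return self._map[elastic_ip]


table = ElasticIpTable()
table.associate("203.0.113.10", "instance-primary")
print(table.resolve("203.0.113.10"))  # instance-primary

# After a failure, remap the same address to a standby; clients keep using the same IP.
table.associate("203.0.113.10", "instance-standby")
print(table.resolve("203.0.113.10"))  # instance-standby
```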

Many Other Services Available

• Authentication services

• Analytics

• Elastic Map Reduce

• Real time data streaming and processing

• Business process automation services

• Email services

• Notification services

• …

Comparison to Other Providers

• Other major providers (Google, Microsoft, Rackspace) offer similar services

• Google doesn’t have as many services but has a different pricing model

– Charges in 10-minute increments rather than one-hour increments

• Microsoft has similar services

• Rackspace also provides comparable options
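
The effect of the billing increment is easy to see with a short calculation. This sketch only rounds run time up to the next increment; it ignores any minimum charge and the per-minute rates themselves, which the slides do not give.

```python
import math


def billed_minutes(runtime_min, increment_min):
    """Round a run time up to the next whole billing increment."""
    return math.ceil(runtime_min / increment_min) * increment_min


# A job that runs for 65 minutes:
print(billed_minutes(65, 60))  # 120 minutes billed with one-hour increments
print(billed_minutes(65, 10))  # 70 minutes billed with 10-minute increments
```

Finer increments matter most for short-lived or bursty workloads, where rounding up to a full hour is a large share of the bill.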

Outages

• In Amazon (and others) there are some kinds of outages that are specific to the structure of the provider

• We will now look at some of these outages

Zone Failure

• All of the IaaS providers have some notion of an “availability zone”

• An availability zone (or fault domain in Azure) has its own switch, router, and rack

• These availability zones are isolated from each other in a way that nodes within an availability zone are not

Zone Failure Modes

• A zone can fail in different ways

[Diagram: a region containing Zone 1, Zone 2, and Zone 3]

Complete Failure

• If for example you have a power outage you’ll have a complete failure

• If you try to route traffic to any of these machines you’ll get a “no route to host”

– This happens quickly – fast fail

• You’ll know the zone is out

• You can then spin up replacement capacity in another zone

Zone Failure Modes

• You could have a network failure

[Diagram: a region containing Zone 1, Zone 2, and Zone 3]

Network Failure

• If you have a network failure it’s typically not a complete failure

• The machines are still working but the network is having trouble

• There is often still a route to host but your data isn’t reaching the host

• As a result you don’t get a fast fail

– You’ll get long timeouts

Network Failure

• With the long timeouts your system will start to back up

• It’s difficult to tell the difference between this issue and other issues that result in latency lags

• This problem can be intermittent as some of the routers might be down but not all
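
The difference between a fast fail and a long timeout shows up directly in client code. The sketch below sets an explicit connection timeout and classifies the outcome; the function names, the category labels, and the 2-second default are illustrative assumptions.

```python
import socket


def classify(exc):
    """Map a connection outcome to one of the failure styles discussed above."""
    if exc is None:
        return "reachable"
    # A timeout means packets went out but nothing came back in time.
    if isinstance(exc, socket.timeout):
        return "slow-fail"
    # Errors such as "connection refused" or "no route to host" return immediately.
    return "fast-fail"


def probe(host, port, timeout_s=2.0):
    """Try to open a TCP connection, bounding how long a slow failure can block."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return classify(None)
    except OSError as exc:
        return classify(exc)


print(classify(ConnectionRefusedError()))  # fast-fail
print(classify(socket.timeout()))          # slow-fail
```

Without an explicit timeout, a caller can block for a long time on each request, which is how the backlog described above builds up.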

Zone Failure Modes

• You could have a failure of some zone service

[Diagram: a region containing Zone 1, Zone 2, and Zone 3]

Zone Service Failure

• This is when a service that the zone depends on fails

– It could be something that is part of the provider’s platform (e.g. EBS)

– It could also be a central service in your application

• This causes cascading failures

• Difficult to figure out what is going on

Region Failure

• It’s rare but a Region can fail as well

• Both complete and partial failures have happened

• Typically this starts with isolated issues that cascade

• There might be an issue with a few nodes or with a single availability zone

• Other zones become impacted (often due to additional traffic) and fail

– It can be difficult to determine the scope of the issue while it’s occurring

Regional Failure Modes

• You could lose network access to a region

[Diagram: a region containing Zone 1, Zone 2, and Zone 3]

Regional Outage

• This is often caused by

– a DNS issue

– Router issues

– Network capacity overload

• Causes you to lose access to a region

Regional Failure Modes

• Local failures can cause a control plane overload

[Diagram: a region containing Zone 1, Zone 2, and Zone 3]

Data Store Failure

• As with the other portions of the system the data store can become unresponsive

• The remedy for this is typically to mark this node as bad and attempt to bring a new node online

• If the issue is more pervasive it can result in:

– Disrupted availability

– Loss of persistent data

Backup Failure

• Systems will often have a backup data mechanism

• This is often a key component in disaster recovery

• This can also fail

– It can become temporarily or permanently unavailable

Upgrades

• Cloud providers need to upgrade their software as well

• When they do this the nodes that are being upgraded experience an outage

• If your software is running on these nodes you might experience an outage as well

Utilizing AWS

• You can utilize AWS in many ways

– You can host your entire application in the cloud

– You can host a specific portion of your application in the cloud

– You can use the cloud for a specialized need

Hosting Your Application

• You can have a system that is fully deployed in the cloud

• You’ll need to figure out how to structure the application to achieve both functional and quality attribute needs

• You’ll want to first consider quality attribute concerns such as: – Scalability – Availability – Security – …

• Utilize the techniques we talked about to determine the needs – Fault modeling (considering the cloud specific faults) – Threat modeling – Understanding the anticipated load and desired throughput and latency

• Come up with a gross structure that achieves your objectives – Think about partitioning of the system to support testing, degraded modes of operation and independent deployment

Partial Hosting

• You might want to leverage the cloud for a specific portion of your system e.g. – Supporting mobile applications

– Databases

– Analytics

– Delivery of particular content

– Hosting your front end

– …

• This is typically going to be driven by cost and quality attribute needs (e.g. scalability)

Backup and Recovery

• Many organizations utilize the cloud for bulk storage, archiving, or backup and recovery

• In the past external services were used for such needs

– They often stored data on tape in separate physical locations

• It can be cheaper and more convenient to utilize cloud services

• As a result many organizations use the cloud for such storage needs

Summary

• Many services are available in the cloud

– Storage

– Network

– Compute related services

– …

• These services provide different levels of service at different pricing levels

• Utilizing the cloud appropriately and efficiently takes an explicit understanding of both your needs and the services available