Scalability, Caching and Elasticity
Microsoft Corporation


Outline

Scalability
- Achieving linear scale
- Scale up vs. scale out in Windows Azure
- Choosing VM sizes

Caching
- Approaches to caching
- Cache storage

Elasticity
- Scale out, scale back
- Automation of scaling

A Primer on Scale

Scalability is the ability to add capacity to a computing system to allow it to process more work

A Primer On Scalability

Vertical scale (scale up)
- Add more resources to a single computation unit, i.e. buy a bigger box
- Move a workload to a computation unit with more resources, e.g. Windows Azure Storage moving a partition

Horizontal scale (scale out)
- Add additional computation units and have them act in concert
- Split the workload across multiple computation units
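One common way to split a workload across computation units is to hash a partition key to a unit index. A minimal sketch in Python (illustrative only; the function name is made up, and this is not Windows Azure SDK code):

```python
import hashlib

def assign_unit(work_key: str, unit_count: int) -> int:
    """Deterministically map a work item to one of `unit_count` units.

    Hashing the key keeps the distribution roughly even and requires
    no shared state or coordination between the units.
    """
    digest = hashlib.md5(work_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % unit_count
```

Because the mapping is deterministic, every node agrees on which unit owns a given key without consulting any central authority.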

Vertical vs. Horizontal

- For small scenarios scale up is cheaper, and code 'just works'
- For larger scenarios scale out is the only solution
- Massive diseconomies of scale: one 64-way server costs vastly more than sixty-four 1-way servers
- Shared resource contention becomes a problem
- Scale out offers the promise of linear, infinite scale

[Chart: throughput vs. number of computation units]

- Roughly linear scale, i.e. the additional throughput achieved by each additional unit remains constant
- Non-linear scale, i.e. the additional throughput achieved by each additional unit decreases as more are added

Scalability != Performance

Often you will sacrifice raw speed for scalability. For example, ASP.NET session state:

- In-process ASP.NET session state: fastest, but ties a user's session to one server
- SQL Server ASP.NET session state: slower per request, but lets any server handle any request

Achieving Linear Scale Out

Reduce or eliminate shared resources
- Minimize reliance on transactions or transactional-type behaviour
- Homogeneous, stateless computation nodes

We can then use simple work distribution methods
- Load balancers, queue distribution
- Less reliance on expensive hardware high availability (H/A)
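Queue distribution across homogeneous, stateless workers can be sketched in a few lines; a toy in-process model (threads stand in for role instances, and the names are illustrative):

```python
import queue
import threading

def run_workers(jobs, worker_count, handle):
    """Distribute jobs to identical stateless workers via a shared queue.

    Any worker can take any job, so adding workers adds capacity
    without any coordination beyond the queue itself.
    """
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            out = handle(job)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In Windows Azure the queue would be a storage queue and the workers would be worker role instances, but the shape of the pattern is the same.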

Units of Scale

[Diagram: Clean Up Role, WCF Role, Web Site Role, Cache Build Role; queue-driven role vs. web-driven role]

- Create as many roles as you need 'knobs' to adjust scale
- With separate two-instance roles, loss of an instance results in a 50% capacity loss in the web site
- With roles consolidated into four instances, loss of an instance results in just a 25% capacity loss in the web site
- Consolidation of roles provides more redundancy for the same cost

VM Size in Windows Azure

Windows Azure
- Supports various VM sizes
- ~800 Mb/s NIC shared across the machine
- Set in the Service Definition (*.csdef)
- All instances of a role will be equi-sized

<WorkerRole name="myRole" vmsize="ExtraLarge">

Size         CPU Cores  Network    RAM     Local Storage  Cost
Small        1          Shared     1.7GB   250GB          1x
Medium       2          Shared     3.5GB   500GB          2x
Large        4          Shared     7GB     1000GB         4x
Extra Large  8          Dedicated  15GB    2000GB         8x

Remember: if it doesn't run faster on multiple cores on your desktop, it's not going to run faster on multiple cores in the cloud!

Choosing Your VM Size

- Don't just throw big VMs at every problem
- Scale-out architectures have natural parallelism
- Test various configurations under load

Some scenarios will benefit from more cores:
- Where the cost of moving the data exceeds the parallel overhead, e.g. video processing
- Stateful services
- A database server requiring full network bandwidth

Caching

Caching can improve both performance and scalability:

- Moving data closer to the consumer (web/worker) improves performance
- Reducing load on the hard-to-scale data tier improves scalability

Caching Is The Easiest Way To Add Performance and Scalability To Your Application

In Windows Azure: Caching Will Save You Money!

Caching Scenario: Website UI Images

Website UI images:
- Largely static data
- Included in every page

Goal: A Better UI
- Serve content once
- Avoid round trips unless content changes
- Minimise traffic over the wire
- Fewer storage transactions
- Lower load on web roles

Caching Scenario: RSS Feeds

A regular RSS feed:
- Data delivered from database/storage
- Large content payload, >1 MB
- Data changes irregularly
- Cost determined by client voracity

Goal: A Better RSS Feed
- Minimise traffic over the wire
- Fewer storage transactions
- Fewer hits on the database

Caching Strategies

- Client-side caching
- Static content generation

Client Side Caching

[Diagram: client ↔ web roles / worker roles ↔ BLOBs, queues, tables, SQL Azure]

Client Caching - ETags

ETag == soft caching
- Header added on the HTTP response: ETag: "ABCDEFG"
- Client does a conditional HTTP GET with If-None-Match: "ABCDEFG"
- Server returns content only if the ETag no longer matches

Implemented natively by Windows Azure Storage:
- Supports client-side caching
- Also used for optimistic concurrency control

Client Caching - ETags

Benefits
- Prevents the client downloading unnecessary data
- Out-of-the-box support for simple 'static content' scenarios

Problems
- Still requires a round trip to the server
- May require execution of server-side code to re-create the ETag before checking

string etag = Request.Headers["If-None-Match"];
if (String.Compare(etag, GetLastBlogPostIDAzTable()) == 0)
{
    // ETag still matches: content unchanged, tell the client
    // to use its cached copy
    Response.StatusCode = 304; // Not Modified
    return;
}
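The client side of the same exchange can be sketched in Python; a caller-supplied `fetch` transport stands in for a real HTTP client, and a 304 Not Modified response means the cached copy is still valid:

```python
def conditional_get(url, cached_etag, fetch):
    """Issue a conditional GET using a previously cached ETag.

    `fetch(url, headers)` is any transport returning
    (status, headers, body). On 304 Not Modified the server sends
    no body, so the caller keeps using its cached copy.
    """
    headers = {}
    if cached_etag is not None:
        headers["If-None-Match"] = cached_etag
    status, resp_headers, body = fetch(url, headers)
    if status == 304:
        return None, cached_etag  # cached copy still valid
    return body, resp_headers.get("ETag")
```

The first call downloads the content and remembers the ETag; subsequent calls pay only for the round trip, not the download, until the content changes.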

Client Caching – Cache-Control

Cache-Control: max-age == hard caching
- Header added on the HTTP response: Cache-Control: max-age=2592000
- Client may cache the file without any further request for 30 days
- Client will not re-check on every request
- Very useful for static files, e.g. header_logo.png
- Also used to determine TTL on CDN edge nodes
- Set this on a blob using x-ms-blob-cache-control

Client Caching – Cache-Control

Benefits
- Prevents unnecessary HTTP requests
- Prevents unnecessary downloads

Problems
- What if files do change within the 30 days?

Windows Azure technique: put static files in blob storage and use Cache-Control plus URL flipping:

<img src="http://*.blob.*/Container/header_logo.png?random=<rnd>" />

<img src="http://*.blob.*/Containerv1.0/header_logo.png" />
<img src="http://*.blob.*/Containerv2.0/header_logo.png" />

<img src="http://*.blob.*/Container/header_logo.png?snapshot=<DT1>" />
<img src="http://*.blob.*/Container/header_logo.png?snapshot=<DT2>" />

- Simple randomization: simple, but no versioning
- Container-level flipping: simple, but more expensive
- Snapshot-level flipping: more complex, but lower cost
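Container-level flipping can be as simple as baking a version into the URL your pages emit; a hypothetical helper (the account and container names are made up for illustration):

```python
def versioned_blob_url(account, container, blob_name, version):
    """Build a container-level 'flipped' blob URL.

    Bumping `version` changes the URL, so clients holding a
    hard-cached copy of the old URL fetch the new content
    immediately instead of waiting for max-age to expire.
    """
    return (f"http://{account}.blob.core.windows.net/"
            f"{container}v{version}/{blob_name}")
```

Deploying a changed file then means uploading it to the new versioned container and regenerating pages with the new URL; old cached copies simply age out.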

Static Content Generation

[Diagram: web roles / worker roles generating content into BLOBs, queues, tables, SQL Azure]

Static Content Generation

Generate content periodically in a worker role
- Can spin up workers just for generation
- Generate as a triggered async operation

Content may be:
- Full pages
- Resources (CSS sprites, PDF/XPS, images etc.)
- Content fragments

Push static content into blob storage
- Serve directly out of blob storage
- May also be able to use persistent local storage

Static Content Generation

Benefits
- Reduced load on web roles
- Potentially reduced load on the data tier
- Improved response times
- Can combine with Cache-Control and ETags

Problems
- Need to deal with stale data: manage/refresh it, or ignore it

A Better RSS Feed?

Build a standard RSS feed in a web role
- Generate content dynamically from storage
- Serialize as RSS using feed formatters
- Place on an obfuscated (hidden) URL

Build a worker role to poll the hidden RSS feed
- Retrieve the RSS content at certain intervals or on an event
- Push the content into a blob if it changed

Serve RSS to users from blob storage
- Take advantage of ETags
- Zero load on the database or RSS tables to serve content
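The worker role's "push into a blob if changed" step reduces to a hash comparison; a sketch where `upload` stands in for a blob-client PUT (the function names are illustrative):

```python
import hashlib

def push_if_changed(feed_xml, last_hash, upload):
    """Push the generated feed to blob storage only when it changed.

    `upload` is a caller-supplied function (e.g. a blob PUT).
    Returns the content hash to remember for the next poll.
    """
    current = hashlib.sha256(feed_xml.encode("utf-8")).hexdigest()
    if current != last_hash:
        upload(feed_xml)
    return current
```

Skipping unchanged uploads also preserves the blob's ETag, so downstream clients doing conditional GETs keep getting 304s.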

BLOBs vs. Compute Instances

BLOB storage
- Disk based
- 15c/GB/month
- 1c/10,000 requests

Compute instances
- RAM and disk based
- 12c/hr per instance (1.7GB RAM, 250GB disk for a Small)

Dedicated compute cache roles must serve at least 120,000 cache requests per hour to be cheaper than Windows Azure storage

Outside USA and Europe: use CDN for caching due to much lower bandwidth costs
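The 120,000 break-even figure follows directly from the prices quoted above:

```python
# A dedicated cache role costs 12c per hour; blob storage charges
# 1c per 10,000 requests (2010 prices from the slide). The compute
# role is cheaper only once it absorbs more requests per hour than
# its 12c would have bought from storage.
compute_cents_per_hour = 12
storage_cents_per_10k = 1
requests_per_cent = 10_000 / storage_cents_per_10k

break_even_requests_per_hour = compute_cents_per_hour * requests_per_cent
print(break_even_requests_per_hour)  # 120000.0
```

Below that request rate, simply serving the cached content from blob storage is the cheaper option.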

Elastic Scale Out

Elastic Cloud Workflow Patterns

[Charts: compute provisioned vs. average usage over time for each workload pattern]

"On and Off"
- On & off workloads (e.g. batch jobs)
- Over-provisioned capacity is wasted
- Time to market can be cumbersome

"Unpredictable Bursting"
- Unexpected/unplanned peaks in demand
- A sudden spike impacts performance
- Can't over-provision for extreme cases

"Growing Fast"
- Successful services need to grow/scale
- Keeping up with growth is a big IT challenge
- Cannot provision hardware fast enough

"Predictable Bursting"
- Services with micro-seasonality trends
- Peaks due to periodic increased demand
- IT complexity and wasted capacity

Dealing with Variable Load

Dealing with variable load takes two forms:

1. Maintaining excess capacity or headroom
   - Costs: paying for unused capacity
   - Faster availability
   - An async work pattern can provide a buffer

2. Adding/removing capacity
   - Takes time to spin up
   - Requires management, human or automated
   - Pre-emptive or metric driven

Head Room in Windows Azure

Web roles
- Run additional web roles
- Handle additional load before performance degrades

Worker roles
- If possible, just buffer into queues
- Will be driven by the tolerable level of latency
- Start additional roles only if queues are not clearing
- Use generic workers to pool resources
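The "start additional roles only if queues are not clearing" rule reduces to simple arithmetic once you know your drain rate and tolerable latency; a sketch with hypothetical parameter names:

```python
import math

def workers_needed(queue_length, drain_rate, tolerable_intervals):
    """Worker instances needed for the backlog to clear in time.

    Each worker processes `drain_rate` messages per interval, and
    the backlog must clear within `tolerable_intervals` intervals.
    Keeps a floor of one worker so the queue is always serviced.
    """
    required = math.ceil(queue_length / (drain_rate * tolerable_intervals))
    return max(1, required)
```

For example, a 1,000-message backlog with workers that each drain 50 messages per interval, and a latency budget of 4 intervals, calls for 5 workers.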

Head Room in Windows Azure Services

Windows Azure Storage
- Storage nodes serve many partitions
- A partition is served by a single storage node
- The fabric can move a partition to a different storage node
- Opaque to the Windows Azure customer

SQL Azure
- A non-deterministic throttle gives little indication of remaining headroom
- Run extra instances; requires DB sharding

Adding Capacity in Windows Azure

Web roles / worker roles
- Enable more instances (API or *.config)
- Editing the instance count in config leaves existing instances running
- Changing to larger VMs will require a redeploy

Windows Azure Storage
- Opaque to the user
- Partition aggressively
- Can 'heat up' a partition to encourage scale up

Adding Capacity in SQL Azure

SQL Azure
- Add more databases (more partitions)
- Very difficult to achieve mid-stream:
  - Requires moving hot data
  - Maintaining consistency across multiple DBs without DTC
  - Will depend on the partitioning strategy

Rule Based Scaling

Use the Service Management and Diagnostics APIs

On/off and predictable bursting
- Time-based rules

Unpredictable demand and fast growth
- Monitor metrics and react accordingly

Diagnostics & Management APIs

Monitor inputs
- Historical data
- Transactions
- Perf counters
- Business KPIs

Evaluate business rules
- Is latency too high/low?
- How much $ have we spent?
- Are we at a limit?
- Predicted load

Action
- +/- instance count
- Deploy a new service
- Increase queues
- Send notifications

Monitor metrics

Primary metrics (actual work done)
- Requests per second
- Queue messages processed / interval

Secondary metrics
- CPU utilization
- Queue length
- Response time

Derivative metrics
- Rate of change of queue length
- Use 'historical' data to help predict requirements
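Smoothing and derivative metrics are easy to compute once samples are collected; a minimal sketch (exponential smoothing here is one reasonable choice, not a prescribed Azure technique):

```python
def ewma(samples, alpha=0.3):
    """Exponentially weighted moving average: smooths a noisy metric
    so a single spike does not trigger a scaling decision."""
    smoothed, out = None, []
    for s in samples:
        smoothed = s if smoothed is None else alpha * s + (1 - alpha) * smoothed
        out.append(smoothed)
    return out

def rate_of_change(samples):
    """Derivative metric: per-interval change, e.g. of queue length.
    A persistently positive value means the queue is not clearing."""
    return [b - a for a, b in zip(samples, samples[1:])]
```

Feeding the smoothed series, rather than the raw one, into the rate-of-change calculation gives a steadier trend signal for the rules engine.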

Gathering Metrics

Use Microsoft.WindowsAzure.Diagnostics.*
- Capture various metrics via the Management API
- Diagnostics infrastructure logs, event logs, performance counters, IIS logs

May need to smooth/average some measures

Remember the cost of gathering data, both performance and financial:
- Would you use perf counters 24/7 on a production system? http://tinyurl.com/perfmon-overhead

Evaluating Business Rules

- Are requests taking too long?
- Do I have too many jobs in my queue?
- How much money have I spent this month?

You could write these rules directly into code, build some sort of rules engine, or use the WF rules engine.
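Written directly into code, the rules above might look like the following sketch; the metric and limit names are illustrative, not from any Azure API:

```python
def evaluate_rules(metrics, limits):
    """Evaluate simple business rules over gathered metrics and
    return a scaling action: 'scale_out', 'scale_back', or 'hold'."""
    if metrics["monthly_spend"] >= limits["budget"]:
        return "hold"  # at the spending limit: never add instances
    if metrics["response_ms"] > limits["max_response_ms"]:
        return "scale_out"  # requests taking too long
    if metrics["queue_length"] > limits["max_queue_length"]:
        return "scale_out"  # too many jobs in the queue
    if metrics["cpu_percent"] < limits["idle_cpu_percent"]:
        return "scale_back"  # paying for idle capacity
    return "hold"
```

Note the budget check comes first: a spending limit should veto scale-out even when latency rules would otherwise fire.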

Take Action

Add/remove instances
- Use the Service Management API
- Don't forget the billing window is 1 hour

Change role size
- Requires a change to *.csdef
- Most suited to worker roles

Send notifications
- Email
- IM

Manage momentum
- Be careful not to overshoot
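Managing momentum can be enforced by capping how far each adjustment moves the instance count; a sketch (the step limit of 2 is an arbitrary choice for illustration):

```python
def next_instance_count(current, desired, max_step=2):
    """Move toward the desired instance count in bounded steps so
    the controller does not overshoot when metrics swing quickly.
    Keeps a floor of one instance."""
    delta = max(-max_step, min(max_step, desired - current))
    return max(1, current + delta)
```

Combined with the one-hour billing window, bounded steps also avoid paying for instances that a jittery metric would have torn down minutes later.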

Summary

Designing for multiple instances provides
- Scale out
- Availability
- Elasticity options

Caching should be a key component of any Windows Azure application

Various options for variable load
- Spare capacity
- Scale out/back
- Automation possible

© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.