Predictable Data Centers · Job-based pricing [Bazaar, SOCC 12] Request N VMs. Each VM can send...

Predictable Data Centers Hitesh Ballani, Paolo Costa, Fahad Dogar, Keon Jang, Thomas Karagiannis, and Ant Rowstron Systems & Networking Microsoft Research, Cambridge http://research.microsoft.com/datacenters/


Page 1

Predictable Data Centers

Hitesh Ballani, Paolo Costa, Fahad Dogar, Keon Jang, Thomas Karagiannis, and Ant Rowstron

Systems & Networking, Microsoft Research, Cambridge

http://research.microsoft.com/datacenters/

Page 2

Predictable Data Centers

Goal: Enable predictable application performance in multi-tenant data centers

A multi-tenant data center is a data center with multiple (possibly competing) tenants.

Private data centers
► Run by organizations like Facebook, Microsoft, Google, etc.
► Tenants: Product groups and applications

Cloud data centers
► Amazon EC2, Microsoft Azure, Rackspace, etc.
► Tenants: Users renting virtual machines

Page 3

Unpredictability

Often cited as a key hindrance to cloud adoption

Root cause: Shared resources

In multi-tenant data centers, resources like the network and storage are shared amongst users

Variable resource performance

Unpredictable performance for applications and services

Page 4

Dimensions of unpredictability

Performance
► No throughput or latency guarantees
► Private data centers: SLA violations, starvation
► Public data centers: Impossible to provide SLAs

Costs
► Absence of performance guarantees implies unpredictable costs
► Location-dependent

Fairness
► Same payment may not always translate to same performance

Page 5

Outline

Public cloud
► Dealing with performance & cost unpredictability

Private data centers
► Meeting SLAs
► Reducing completion time

Page 6

Performance Unpredictability

Data analytics on an isolated cluster
[Figure: Ken (enterprise) submits a MapReduce job and receives the results]
Completion time: 4 hours

Data analytics in the public cloud
[Figure: Ken submits the MapReduce job to an Azure data center and receives the results]
Completion time: 4-8 hours

Page 7

Performance Unpredictability


Variable tenant costs
► Expected cost (based on a 4-hour completion time) = $100
► Actual cost = $100-200
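A quick check of this arithmetic, assuming (as with per-hour VM billing) that the bill scales linearly with completion time:

```latex
\frac{\$100}{4\,\text{h}} = \$25/\text{h}
\quad\Rightarrow\quad
\$25/\text{h} \times [4\,\text{h},\ 8\,\text{h}] = [\$100,\ \$200]
```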

Page 8

Cost unpredictability

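CPU-bound jobs

$$\text{Job cost} = k \cdot N \cdot T$$

where k is the price per VM per hour (e.g., k = $0.085/hour), N is the number of VMs and T is the completion time.
► Simple and intuitive

Network-bound jobs

$$\text{Job cost} = k \cdot N \cdot T, \quad \text{but } T = \frac{L}{B}, \quad \text{hence} \quad \text{Job cost} = k \cdot N \cdot \frac{L}{B}$$

where L is the data each VM must transfer and B is the bandwidth it receives.
► Cost depends on B (high variability)
► No incentive for the provider to offer more bandwidth: a higher B shortens T and so lowers revenue

[Figure: Today's price: VM cost per unit time stays at k regardless of bandwidth (B)]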
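The two cost formulas can be made concrete. A minimal sketch using the slide's example rate k = $0.085 per VM-hour; the job size (100 VMs, each moving 1.44 Tbit) and the bandwidths tried are hypothetical. Because T = L/B, halving the bandwidth a VM happens to receive doubles the bill, the same 2x spread as the $100-200 example.

```python
def job_cost_cpu_bound(k: float, n_vms: int, hours: float) -> float:
    """Today's occupancy pricing: each VM is billed k dollars per hour."""
    return k * n_vms * hours


def job_cost_network_bound(k: float, n_vms: int, data_mbit: float, bw_mbps: float) -> float:
    """Network-bound job: completion time T = L / B, so cost = k * N * L / B."""
    hours = data_mbit / bw_mbps / 3600  # seconds -> hours
    return job_cost_cpu_bound(k, n_vms, hours)


if __name__ == "__main__":
    k = 0.085                  # $/VM-hour, the example rate from the slide
    n, data = 100, 1_440_000   # hypothetical: 100 VMs, each moving 1.44 Tbit (in Mbit)
    for bw in (100, 50):       # hypothetical per-VM bandwidth actually received, Mbps
        print(bw, round(job_cost_network_bound(k, n, data, bw), 2))
```

At 100 Mbps the job takes 4 hours and costs $34.00; at 50 Mbps it takes 8 hours and costs $68.00, although the tenant asked for exactly the same work.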

Page 9

Cost unpredictability


Location-dependent pricing!

Page 10

Towards a predictable cloud

Performance

► Guarantee network throughput

► Virtual Network Abstractions [Oktopus, SIGCOMM 11]

Extend the tenant-provider interface to account for the network

Tenant request: # of VMs and network demands

Page 11

Towards a predictable cloud


[Figure: The tenant's VMs (VM1, VM2, …, VMN) connected by a virtual network]

Key idea: Tenants are offered a virtual network with bandwidth guarantees.

This decouples tenant performance from provider infrastructure.

Page 12

Oktopus

Two main components
► Management plane: Allocation of tenant requests
  ► Allocates tenant requests to physical infrastructure
  ► Accounts for tenant network bandwidth requirements
► Data plane: Enforcement of virtual networks
  ► Enforces tenant bandwidth requirements
  ► Achieved through rate limiting at end hosts
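The slide says enforcement happens through rate limiting at end hosts but does not say which limiter is used. As a hedged illustration, a token bucket is one standard way a host could cap a VM at its guaranteed B Mbps; the class name, the one-packet burst size and the traffic pattern below are hypothetical.

```python
class TokenBucket:
    """Token-bucket limiter: sustained rate of `rate` bits/us, bursts up to `burst` bits."""

    def __init__(self, rate_bits_per_us: int, burst_bits: int):
        self.rate = rate_bits_per_us
        self.burst = burst_bits
        self.tokens = burst_bits  # bucket starts full
        self.last_us = 0          # time of the previous refill, in microseconds

    def allow(self, packet_bits: int, now_us: int) -> bool:
        # Add tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now_us - self.last_us) * self.rate)
        self.last_us = now_us
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True   # within the guarantee: send now
        return False      # over the guarantee: the caller queues or delays the packet


if __name__ == "__main__":
    # Hypothetical VM with B = 100 Mbps (= 100 bits per microsecond), one-packet burst.
    bucket = TokenBucket(rate_bits_per_us=100, burst_bits=12_000)
    # Offer a 1500-byte (12,000-bit) packet every 10 us, i.e. 1.2 Gbps, for 10 ms.
    sent = sum(bucket.allow(12_000, t * 10) for t in range(1000))
    print(sent)  # 84 packets, roughly 100 Mbps over the 10 ms window
```

Integer microseconds keep the token arithmetic exact, so the admitted rate converges to the configured B regardless of how fast the VM tries to send.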

Request <N, B>: N VMs. Each VM can send and receive at B Mbps.
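For the <N, B> request, the Oktopus paper (SIGCOMM 11) bounds the bandwidth needed on any link: if m of the tenant's N VMs sit on one side of it, traffic across the link cannot exceed min(m, N − m) · B. A minimal sketch of that check for a management plane deciding whether a placement fits; the rack names, residual capacities and two-rack layout are hypothetical, and the real allocator's search over the topology tree is not reproduced.

```python
def uplink_demand(m: int, n: int, b: float) -> float:
    """Bandwidth a <N, B> request needs on a link with m of its N VMs below it.

    The m VMs below can send at most m*B, and the N-m VMs above can receive
    at most (N-m)*B, so the smaller side bounds the traffic crossing the link.
    """
    return min(m, n - m) * b


def placement_fits(placement: dict, residual_bw: dict, n: int, b: float) -> bool:
    """placement: rack -> number of the tenant's VMs placed there (hypothetical layout).
    residual_bw: rack -> spare uplink bandwidth in Mbps."""
    if sum(placement.values()) != n:
        raise ValueError("every VM must be placed")
    return all(uplink_demand(m, n, b) <= residual_bw[rack] for rack, m in placement.items())


if __name__ == "__main__":
    # Hypothetical request <N=10, B=100 Mbps> split 7/3 across two racks.
    print(uplink_demand(7, 10, 100), uplink_demand(3, 10, 100))                 # 300 300
    print(placement_fits({"r1": 7, "r2": 3}, {"r1": 500, "r2": 250}, 10, 100))  # False
```

Both racks need 300 Mbps of uplink even though one hosts seven VMs and the other three, which is why the bound uses the smaller side rather than m · B.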

Page 13

Towards a predictable cloud

Performance
► Guarantee network throughput
► Virtual Network Abstractions [Oktopus, SIGCOMM 11]

Pricing
► Dominant resource pricing [HotNets 11]

Request <N, B>: N VMs. Each VM can send and receive at B Mbps.

Page 14

How to combine the two pricing models?

[Figure: VM cost per unit time vs. bandwidth (B) under two pricing models. Occupancy: flat at k. Usage: grows with B at rate k_b. Each panel marks where CPU-bound and network-bound jobs fall.]

Page 15

How to combine the two pricing models?

Dominant Resource Pricing (DRP)

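[Figure: VM cost per unit time vs. bandwidth (B) under DRP: flat at k_v until B = k_v/k_b, then rising along k_b · B]

$$\text{VM cost per unit time} = \max(k_v,\ k_b \cdot B) = \begin{cases} k_v & \text{if } B < k_v/k_b \\ k_b \cdot B & \text{if } B \ge k_v/k_b \end{cases}$$

$$\text{Job cost} = \begin{cases} k_v \cdot N \cdot T & \text{if } B < k_v/k_b \quad \text{(occupancy)} \\ k_b \cdot N \cdot L & \text{if } B \ge k_v/k_b \quad \text{(usage)} \end{cases}$$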
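The DRP formula can be checked numerically. A minimal sketch in arbitrary but consistent units, with hypothetical prices k_v = 1 and k_b = 0.01 (so the threshold is B = k_v/k_b = 100), 10 VMs and L = 1000 units of data per VM: below the threshold the job is billed on occupancy, above it the cost settles at k_b · N · L regardless of B.

```python
def drp_rate(kv: float, kb: float, bw: float) -> float:
    """Dominant Resource Pricing: VM cost per unit time = max(kv, kb * B)."""
    return max(kv, kb * bw)


def drp_job_cost(kv: float, kb: float, n_vms: int, bw: float, data: float) -> float:
    """Job cost for N VMs at bandwidth B, each moving `data` units (T = L / B)."""
    t = data / bw
    return n_vms * drp_rate(kv, kb, bw) * t


if __name__ == "__main__":
    # Hypothetical prices: kv = 1, kb = 0.01, so the occupancy/usage switch is at B = 100.
    for bw in (50, 200, 400):
        print(bw, drp_job_cost(1, 0.01, 10, bw, 1000))  # 200.0, then 100.0, 100.0
```

Above the threshold the provider earns the same amount however fast it moves the data, while the tenant pays for what it uses rather than for how long the job happens to run.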

Page 16

Towards a predictable cloud

Performance
► Guarantee network throughput
► Virtual Network Abstractions [Oktopus, SIGCOMM 11]

Pricing
► Dominant resource pricing [HotNets 11]

Change the cloud model!
► Job-based pricing [Bazaar, SOCC 12]

Request <N, B>: N VMs. Each VM can send and receive at B Mbps.

Page 17

Bazaar: Enables predictable performance and cost

[Figure: The tenant sends a job request with performance/cost constraints to Bazaar, which works with the provider to determine the resources required (VMs and network); diagram labels: resource utilization, job cost]

Today’s pricing: Resource-based
Bazaar enables job-based pricing!
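Bazaar's central step is turning a tenant's job-level constraint into one of several resource combinations that would satisfy it. A minimal sketch under loudly stated assumptions: the completion-time model `predicted_hours`, the candidate menus, and the rule of picking the feasible <N, B> with the least reserved bandwidth N·B are all hypothetical stand-ins, not Bazaar's actual predictor or selection policy.

```python
from itertools import product

VM_COUNTS = (50, 100, 200)    # hypothetical menu of cluster sizes
BANDWIDTHS = (100, 250, 500)  # hypothetical per-VM guarantees, Mbps


def predicted_hours(n_vms: int, bw_mbps: int) -> float:
    """Made-up model: 400 VM-hours of compute plus a shuffle phase that
    speeds up with both more VMs and more bandwidth per VM."""
    return 400 / n_vms + 40_000 / (n_vms * bw_mbps)


def choose_resources(deadline_hours: float):
    """Return (N, B, predicted hours) that meets the deadline with the least
    total reserved bandwidth N*B, or None if no candidate is fast enough."""
    feasible = [(n, b, predicted_hours(n, b))
                for n, b in product(VM_COUNTS, BANDWIDTHS)
                if predicted_hours(n, b) <= deadline_hours]
    if not feasible:
        return None
    # Tie-break on fewer VMs; a cost constraint would filter this same list.
    return min(feasible, key=lambda c: (c[0] * c[1], c[0]))


if __name__ == "__main__":
    print(choose_resources(5))  # (200, 100, 4.0)
    print(choose_resources(1))  # None
```

With a 5-hour deadline four combinations qualify, from 100 VMs at 500 Mbps to 200 VMs at 100 Mbps; the tenant sees only the job-level guarantee while the provider chooses the mix that suits its infrastructure.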

Page 18

Outline

Public cloud
► Dealing with performance & cost unpredictability

Private data centers
► Meeting SLAs
► Reducing completion time

Page 19

Application SLAs

Component SLAs: SLAs for components at each level of the hierarchy

Network SLAs: Deadlines on communications between components

Today’s transport protocols: Deadline-agnostic and strive for fairness

SLA violations: User-facing online services

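[Figure: Partition-aggregate tree for a user request: an aggregator fans out to mid-level aggregators (Agg.), which fan out to workers, and responses flow back up. Deadlines of 200 ms, 100 ms and 45 ms annotate the hierarchy, and individual flows carry deadlines between 4 ms and 35 ms.]

Flow deadlines: A flow is useful if and only if it meets its deadline.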

Page 20

Limitations of Fair Sharing

Case for unfair sharing: flow f1 has a 20 ms deadline, flow f2 a 40 ms deadline.

Status quo: Flows f1 and f2 get a fair share of bandwidth. Flow f1 misses its deadline (incomplete response to user).

Deadline aware: Flows get bandwidth in accordance with their deadlines. Deadline awareness ensures both flows meet their deadlines.

[Figure: Timelines of f1 and f2 over 0 to 40 ms under each scheme; an X marks f1 missing its deadline under fair sharing]
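The slide gives deadlines but not flow sizes, so the sketch below assumes hypothetical ones: f1 carries 15 units and f2 10 units over a link of 1 unit/ms. A small fluid simulation compares fair sharing with an earliest-deadline-first allocation in which each flow asks for the rate that finishes it exactly at its deadline and leftover capacity is split evenly; this is an illustrative policy, not D3's exact algorithm.

```python
from fractions import Fraction


def fair_share(remaining, deadlines, now, capacity):
    """Status quo: every active flow gets an equal slice of the link."""
    share = Fraction(capacity) / len(remaining)
    return {f: share for f in remaining}


def deadline_aware(remaining, deadlines, now, capacity):
    """Earliest deadline first: each flow asks for size_left / time_left,
    then any leftover capacity is split evenly."""
    rates, left = {}, Fraction(capacity)
    for f in sorted(remaining, key=lambda f: deadlines[f]):
        time_left = deadlines[f] - now
        want = remaining[f] / time_left if time_left > 0 else left
        rates[f] = min(want, left)
        left -= rates[f]
    bonus = left / len(remaining)
    return {f: r + bonus for f, r in rates.items()}


def simulate(sizes, deadlines, capacity, policy):
    """Fluid simulation of one link; returns each flow's completion time."""
    remaining = {f: Fraction(s) for f, s in sizes.items()}
    now, finish = Fraction(0), {}
    while remaining:
        rates = policy(remaining, deadlines, now, capacity)
        # Advance to the next moment some active flow completes.
        step = min(remaining[f] / rates[f] for f in remaining if rates[f] > 0)
        now += step
        for f in list(remaining):
            remaining[f] -= rates[f] * step
            if remaining[f] <= 0:
                finish[f] = now
                del remaining[f]
    return finish


if __name__ == "__main__":
    sizes = {"f1": 15, "f2": 10}      # hypothetical sizes (units of data)
    deadlines = {"f1": 20, "f2": 40}  # ms, from the slide
    for policy in (fair_share, deadline_aware):
        finish = simulate(sizes, deadlines, 1, policy)
        missed = [f for f, t in finish.items() if t > deadlines[f]]
        print(policy.__name__, {f: int(t) for f, t in finish.items()}, "missed:", missed)
```

With these sizes, fair sharing finishes f2 at 20 ms and f1 at 25 ms, past its 20 ms deadline; the deadline-aware allocation finishes f1 at 20 ms and f2 at 25 ms, so both flows make it.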

Page 21

Limitations of Fair Sharing

Case for flow quenching: 6 flows with a 30 ms deadline

Insufficient bandwidth to satisfy all deadlines
► With fair share, all flows miss the deadline (empty response)
► With deadline awareness, one flow can be quenched; all other flows meet their deadline (partial response)

[Figure: Timelines of the six flows up to 30 ms; under fair share every flow is marked X (missed)]
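A back-of-the-envelope version of the quenching argument, with hypothetical numbers since the slide gives none: six equal flows of 6 units each on a 1 unit/ms link with a 30 ms deadline. The admission rule (admit the smallest flows first while they still fit before the deadline) is an illustrative choice, not necessarily the rule D3 uses.

```python
def quench(sizes, deadline, capacity):
    """Admit flows, smallest first, while the link can still finish all admitted
    flows by the deadline; the rest are quenched instead of making everyone late."""
    budget = capacity * deadline  # data the link can carry before the deadline
    admitted, quenched = [], []
    for flow, size in sorted(sizes.items(), key=lambda item: item[1]):
        if size <= budget:
            admitted.append(flow)
            budget -= size
        else:
            quenched.append(flow)
    return admitted, quenched


def fair_share_finish(sizes, capacity):
    """With equal-size flows under fair sharing, all flows finish at the same instant."""
    return sum(sizes.values()) / capacity


if __name__ == "__main__":
    sizes = {f"f{i}": 6 for i in range(1, 7)}  # six hypothetical 6-unit flows
    deadline, capacity = 30, 1                  # 30 ms deadline, 1 unit/ms link
    print(fair_share_finish(sizes, capacity))   # 36.0 > 30: every flow misses
    admitted, quenched = quench(sizes, deadline, capacity)
    print(quenched, fair_share_finish({f: sizes[f] for f in admitted}, capacity))  # ['f6'] 30.0
```

Dropping one flow turns an empty response (all six late at 36 ms) into a partial response with five flows done at exactly 30 ms.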

Page 22

Predictability in private data centers

Deadline-driven flow scheduling: D3 [SIGCOMM 11]

► Prioritize flows based on deadlines

► Expose flow deadlines to the network

► Explicit rate control
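In D3 each flow tells the network the rate it needs, roughly its remaining size divided by the time left to its deadline, and routers grant rates explicitly. A simplified single-router sketch in that spirit: requests are granted first come, first served, and any spare capacity is shared equally. It leaves out the per-RTT rate updates and other details of the real protocol, and the function names are hypothetical.

```python
from fractions import Fraction


def desired_rate(size_left, time_to_deadline):
    """Rate a flow must sustain to finish exactly at its deadline."""
    return Fraction(size_left) / time_to_deadline


def allocate(requests, capacity):
    """requests: (flow, desired_rate) pairs in arrival order.
    Grant requests first come, first served; share spare capacity equally."""
    if not requests:
        return {}
    demand = sum(rate for _, rate in requests)
    spare_share = max(capacity - demand, Fraction(0)) / len(requests)
    left, grants = Fraction(capacity), {}
    for flow, rate in requests:
        grants[flow] = min(rate + spare_share, left)  # never grant more than is left
        left -= grants[flow]
    return grants


if __name__ == "__main__":
    # Hypothetical flows: 15 units due in 20 ms and 10 units due in 40 ms.
    reqs = [("f1", desired_rate(15, 20)), ("f2", desired_rate(10, 40))]
    print(allocate(reqs, 2))               # ample link: both get their rate plus spare
    print(allocate(reqs, Fraction(1, 2)))  # overloaded link: f2 gets nothing
```

On an overloaded link the later request receives nothing, which is exactly the situation where quenching that flow is preferable to letting it consume bandwidth it cannot use in time.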

Task-aware data centers
► Reducing task completion times
► Amazon: an extra 100 ms of latency costs 1% in sales

Page 23

Task Oriented Applications

Typical DC apps perform “tasks”
► Unit of work that can be linked to a waiting user
► Examples: answering a user’s search query, generating a user’s wall

From the network’s perspective, tasks generate rich workflows

Page 24

Task Oriented Applications


Flows of tasks traverse different parts of the network at different times

To reduce task completion times, it is important to optimize at the task level.

Page 25

Flow vs. Task aware optimizations

Goal: Minimize Task Completion Times

[Figure: Two tasks share links A and B. Task 1 = (3A, 6B): a flow of size 3 on link A and one of size 6 on link B; Task 2 = (6A, 3B). Timelines compare Shortest Flow First (SFF) with task-aware scheduling on each link.]
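Reading (3A, 6B) as a flow of size 3 on link A and one of size 6 on link B, the two schedules can be computed directly. A minimal sketch, assuming all flows start at time 0 and each link serves one flow at a time; ordering tasks by ID is a hypothetical stand-in for a task-aware policy such as serving tasks in arrival order.

```python
def task_completion_times(tasks, priority):
    """tasks: task -> {link: flow size}. Each link serves its flows one at a time
    in ascending priority(task, size); a task finishes when its last flow does."""
    finish = {task: 0 for task in tasks}
    links = sorted({link for flows in tasks.values() for link in flows})
    for link in links:
        clock = 0
        queue = sorted((t for t in tasks if link in tasks[t]),
                       key=lambda t: priority(t, tasks[t][link]))
        for t in queue:
            clock += tasks[t][link]
            finish[t] = max(finish[t], clock)
    return finish


if __name__ == "__main__":
    tasks = {"task1": {"A": 3, "B": 6}, "task2": {"A": 6, "B": 3}}
    sff = task_completion_times(tasks, lambda t, size: size)     # shortest flow first per link
    task_aware = task_completion_times(tasks, lambda t, size: t)  # task1 ahead of task2 everywhere
    print(sff, task_aware)
```

Under SFF each link favors a different task, so both tasks finish at time 9 (mean 9). Serving task 1 first on both links finishes it at 6 without delaying task 2 (mean 7.5): the gain comes from every link agreeing on the same task order.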

Page 26

Flow vs. Task aware optimizations


Page 27

Task aware data centers

Designing a practical task-aware scheduling system
► Policy: order in which tasks should be processed
► Decentralized mechanisms to prioritize tasks

Benefits
► 65% reduction in task completion time

Page 28

Summary

Unpredictability

► Key hindrance to cloud adoption

► Root cause: Shared resources

► Several challenges: performance, cost, fairness

http://research.microsoft.com/datacenters/