Capsule Placement in the Service Platform
Bhuvan Urgaonkar
Timothy Roscoe
Systems Group, Sprint ATL
Service Platform: an overview
[Diagram labels: Processors, High-speed interconnect, Internet, Management/Control Unit]
Service Platform: Goals
• Sell the platform’s resources
• Manage the resources efficiently
• Provide performance guarantees to customers
• Start or stop services within minutes
Services and Capsules
• Services
– web/game/streaming servers
– service provider pays the platform
• Capsules
– Def: component of a service that should run on a single node
– e.g., consider a replicated web server
Nucleus
• Node-specific control/management software:
– Capsule creation, destruction
– Health information (process liveness)
– Resource parameters (memory, CPU, network bandwidth, etc.)
Control Plane
• Capsule Placement
• Flow Placement
• Node, network, service monitoring
• Deployed Service Database
• Billing
Outline of this talk
• Service Platform: an overview
• Quality of Service
• Capsule Placement
• Design of the Placement unit
• Conclusions and future work
QoS Representations
• Application level
– e.g., 50 transactions per sec
• Contract level
– e.g., “something like a 300 MHz Pentium II”
• Platform level
– e.g., ?
• Node level
– e.g., weights, priorities, etc.
Translation between QoS levels
• Application level => Contract level
– Application specific, customer’s problem
• Contract level => Platform level
– More a business problem
• Platform level => Node level
– OS dependent
Capsule Placement: Desirables
• Maximize revenue!
• Aware of the “importance” of services.
• Overbooking.
• Exploit known workload characteristics.
• Adapt to changes in workload?
• Fast.
Stages in hosting a service
• Requirement specification
• Placement
• Deployment
• Activation
Requirement Specification
• Contract level representation
– Many possibilities: 300 MHz PII, best effort, or a CPU instruction token bucket.
• Platform level representation
– Must be uniform across the platform.
– (rate, burst, overbooking tolerance, arch, OS)
Translation to Node level
• Reservation based scheduler
– map (rate, burst) to (period, slice)
– bigger burst => bigger period
• Proportional share scheduler
– burst ?
– weight in proportion to rate
• Priority based scheduler
– no easy mapping
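The first two mappings above can be sketched in a few lines. This is only an illustration under assumptions the slides do not confirm: rate is taken to be a CPU fraction in (0, 1], burst is taken to be the CPU time (in ms) a capsule may use in one stretch, and the reservation is chosen as slice = burst, period = burst / rate. The function names are hypothetical.

```python
def reservation_params(rate, burst_ms):
    """Map a platform-level (rate, burst) pair to (period, slice) in ms.

    Assumed semantics (not from the talk): rate is a CPU fraction in (0, 1],
    burst_ms is the CPU time the capsule may consume in one stretch.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    if burst_ms <= 0:
        raise ValueError("burst_ms must be positive")
    slice_ms = burst_ms
    period_ms = burst_ms / rate  # keeps slice / period equal to rate
    return period_ms, slice_ms


def proportional_weights(rates, scale=1000):
    """Give each capsule an integer weight in proportion to its rate."""
    return {name: max(1, round(rate * scale)) for name, rate in rates.items()}
```

With this choice a bigger burst at the same rate gives a bigger period, which matches the slide; the burst has no counterpart in the proportional-share weights, which is the open question the slide marks with "?".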
Placement
• Find the set of feasible nodes
– Compatible architecture and OS
– No overbooking tolerances violated
• Pick one node from this set
– Best Fit
– Worst Fit
– Random Select
– Close Overbooking
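The two steps above can be sketched as a feasibility filter followed by a selection heuristic. The data model here is an assumption for illustration: each capsule carries its own overbooking tolerance, a node's overbooking level is taken as (total reserved rate / capacity) - 1, and a placement is feasible when that level does not exceed the tolerance of any capsule on the node. All class and function names are hypothetical. Close Overbooking is left out because the slides do not define it.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Capsule:
    name: str
    rate: float           # CPU demand, same units as Node.capacity
    ovb_tolerance: float  # largest overbooking fraction this capsule accepts
    arch: str
    os: str


@dataclass
class Node:
    name: str
    capacity: float
    arch: str
    os: str
    capsules: list = field(default_factory=list)

    def load_with(self, capsule):
        """Total reserved rate if `capsule` were added to this node."""
        return sum(c.rate for c in self.capsules) + capsule.rate

    def feasible(self, capsule):
        """Compatible arch/OS and no overbooking tolerance violated."""
        if (self.arch, self.os) != (capsule.arch, capsule.os):
            return False
        overbooking = max(0.0, self.load_with(capsule) / self.capacity - 1.0)
        tolerances = [c.ovb_tolerance for c in self.capsules]
        tolerances.append(capsule.ovb_tolerance)
        return overbooking <= min(tolerances)


def place(capsule, nodes, heuristic="best_fit", rng=random):
    """Pick a feasible node, record the capsule on it, and return the node.

    Returns None when no node is feasible.
    """
    candidates = [n for n in nodes if n.feasible(capsule)]
    if not candidates:
        return None

    def slack(node):
        # Capacity left after placement; negative when overbooked.
        return node.capacity - node.load_with(capsule)

    if heuristic == "best_fit":
        chosen = min(candidates, key=slack)
    elif heuristic == "worst_fit":
        chosen = max(candidates, key=slack)
    elif heuristic == "random":
        chosen = rng.choice(candidates)
    else:
        raise ValueError(f"unknown heuristic: {heuristic}")
    chosen.capsules.append(capsule)
    return chosen
```

Best Fit picks the feasible node left with the least slack, Worst Fit the one left with the most, and Random Select any feasible node.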
Placement: Example
[Figure: capsules a, b, c and nodes N1, N2, N3, N4, annotated with the values 10, 30, 20 and 30, 10, 20, 10]
One possible placement: (a, N1), (b, N2), (c, N3)
Deployment and Activation
• Deployment: the process of preparing a capsule for execution on a node.
– Why? e.g., need to download some files before starting
– The control plane sends all information needed to deploy the capsule
• Activation: starting a deployed service
Capsule State Diagram
[State diagram with states: undeployed, deploying, deployed, activating, active, deactivating, undeploying]
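A minimal sketch of the capsule life cycle as a state machine. The state names come from the diagram; the transitions between them are inferred from the names and are an assumption, since the arrows did not survive in this transcript. Failure paths (for example, a deployment that never completes) are not modelled.

```python
from enum import Enum


class CapsuleState(Enum):
    UNDEPLOYED = "undeployed"
    DEPLOYING = "deploying"
    DEPLOYED = "deployed"
    ACTIVATING = "activating"
    ACTIVE = "active"
    DEACTIVATING = "deactivating"
    UNDEPLOYING = "undeploying"


# Assumed transitions: each stable state is reached via a transient "-ing" state.
TRANSITIONS = {
    CapsuleState.UNDEPLOYED: {CapsuleState.DEPLOYING},
    CapsuleState.DEPLOYING: {CapsuleState.DEPLOYED},
    CapsuleState.DEPLOYED: {CapsuleState.ACTIVATING, CapsuleState.UNDEPLOYING},
    CapsuleState.ACTIVATING: {CapsuleState.ACTIVE},
    CapsuleState.ACTIVE: {CapsuleState.DEACTIVATING},
    CapsuleState.DEACTIVATING: {CapsuleState.DEPLOYED},
    CapsuleState.UNDEPLOYING: {CapsuleState.UNDEPLOYED},
}


def advance(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Under these assumptions a capsule cannot jump from undeployed straight to active; it has to pass through deploying, deployed and activating first.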
Example Message Exchange
[Message sequence diagram between Control Plane and Nucleus]
Messages shown: deployed svc cap; state svc cap deployed; deployed svc cap; activated svc cap
• Instruct nucleus to deploy a capsule, start timer
• No response! Send again
• Starts deploying the capsule
• Still deploying
• Done deploying, send status message
• Deployed before timeout, instruct nucleus to activate
• Starts activating the capsule
• . . .
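The control-plane side of this exchange (send, start a timer, resend on silence, move on once the nucleus reports the expected state) can be sketched as a small retry loop. The callables `send` and `wait_for_state` are hypothetical stand-ins for the real messaging layer, which the slides do not describe; the timeout and attempt limit are illustrative values.

```python
def request_until(send, wait_for_state, wanted, timeout_s=5.0, max_attempts=3):
    """Send a request and resend it until the nucleus reports `wanted`.

    send:            callable that transmits the request to the nucleus
    wait_for_state:  callable (wanted, timeout_s) -> bool; blocks for up to
                     timeout_s and returns True once the state is reported
    Returns the number of attempts used; raises TimeoutError if all fail.
    """
    for attempt in range(1, max_attempts + 1):
        send()
        if wait_for_state(wanted, timeout_s):
            return attempt
    raise TimeoutError(f"no '{wanted}' report after {max_attempts} attempts")
```

Deployment followed by activation is then two calls in a row, for example `request_until(send_deploy, wait, "deployed")` and then `request_until(send_activate, wait, "active")`, where both senders are again placeholders.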
Placement Unit Architecture
[Diagram components: Listen for new requests, Listen to nuclei, Event Queue, Message Queue, Dispatch Events]
[Diagram flows: Events due to new requests, Events due to msgs from nuclei, Messages from nuclei]
Database Consistency
• Transactions and exceptions
– e.g.:
    try:
        transaction_begin()
        deploy_service(svc)
        transaction_commit()
    except Exception:
        transaction_abort()
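One way to avoid repeating the begin/commit/abort pattern at every call site is a context manager. This is a sketch only: the `db` object and its `transaction_begin`, `transaction_commit` and `transaction_abort` methods are assumed to mirror the function names on the slide, and `RecordingDB` is a hypothetical in-memory stand-in used purely to show the call order.

```python
from contextlib import contextmanager


@contextmanager
def transaction(db):
    """Commit if the body succeeds; abort and re-raise if it fails."""
    db.transaction_begin()
    try:
        yield db
    except Exception:
        db.transaction_abort()
        raise
    else:
        db.transaction_commit()


class RecordingDB:
    """Stand-in database that only records which calls were made."""

    def __init__(self):
        self.log = []

    def transaction_begin(self):
        self.log.append("begin")

    def transaction_commit(self):
        self.log.append("commit")

    def transaction_abort(self):
        self.log.append("abort")
```

A call site then reads `with transaction(db): deploy_service(svc)`. Unlike the slide's version, the exception is re-raised after the abort so the caller can still see why the deployment failed.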
Performance
• Time to compute placement: 1-2 sec => time to deploy usually much larger
• Comparison of heuristics
– Experiments with the following workload: 1-3 capsules, CPU requirement 0-10%, wide range of overbooking tolerances
– Random Select admitted the most services, Best Fit admitted the fewest
– But … more investigation needed
Summary
• QoS representation for CPU requirements of services.
• Implementation of placement unit.
• Some simple experiments to deploy and activate services.
Unfinished ...
• Experiments:
– Heuristics better suited to specific workloads.
– Scalability and efficiency of the system.
• Integration of placement unit with rest of the Control Plane
• Handling various failures
• Extend to multiple resources - much harder than a single resource!