ElastMan: Autonomic Elasticity Manager for Cloud-Based Key...
Transcript of ElastMan: Autonomic Elasticity Manager for Cloud-Based Key...
ElastMan: Autonomic Elasticity Manager forCloud-Based Key-Value Stores
Ahmad Al-Shishtawy
KTH Royal Institute of TechnologyStockholm, Sweden
Doctoral School Day in Cloud ComputingLouvain-la-Neuve, Belgium, 20 Nov 2012
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Outline
1 Introduction
2 Elasticity Controller for Cloud-Based Key-Value Stores
3 Evaluation
4 Conclusions and Future Work
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 2/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Outline
1 IntroductionAutonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
2 Elasticity Controller for Cloud-Based Key-Value Stores
3 Evaluation
4 Conclusions and Future Work
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 3/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Dealing with Complexity
ProblemAll computing systems need to be managed
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 4/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Dealing with Complexity
ProblemAll computing systems need to be managed
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 4/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Dealing with Complexity
ProblemComputing systems are getting more and more complex
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 4/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Dealing with Complexity
ProblemComplexity means higher administration overheads
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 4/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Dealing with Complexity
ProblemComplexity poses a barrier on further development
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 4/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Dealing with Complexity
SolutionThe Autonomic Computing initiative by IBM
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 4/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Dealing with Complexity
SolutionSelf-Management: Systems capable of managing themselves
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 4/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Dealing with Complexity
SolutionUse Autonomic Managers
Monitor
Analyze Plan
Execute
Autonomic Manager
Knowledge Monitor
Analyze Plan
Execute
Autonomic Manager
Knowledge Monitor
Analyze Plan
Execute
Autonomic Manager
Knowledge Monitor
Analyze Plan
Execute
Autonomic Manager
Knowledge
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 4/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
The Autonomic Computing Architecture
A Generic SolutionManaged Resource
Touchpoint (Sensors & Actuators)Autonomic Manager
MonitorAnalyzePlanExecute
Knowledge Source
Communication
Manager Interface
Monitor
Analyze Plan
Execute
Touch Point
Autonomic Manager
Managed Resource
Knowledge
Managed Resource
Touch Point
Manager
Interface
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 5/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Self-* Properties
Inspired by the autonomic nervous systemof the human body
Feedback control from Control TheorySelf-management along four main axes(self-* properties):
self-configurationself-optimizationself-healingself-protection
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 6/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Web 2.0 Applications
Increasing popularity of Web 2.0 applications (e.g., wikis, socialnetworks, blogs)Pose new challenges on the underlying provisioninginfrastructure
ScaleHighly dynamic workloadData centric
Traditional solutions does not work anymore
Vertical vs. horizontal scalability
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 7/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Cloud Computing
Provide the illusion of infinite resources (e.g., VMs)
Pay-as-you-go pricing modelElastic services (Horizontal scalability)
High load: Allocate more VMs to improve performanceLow load: Release VMs to save money
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 8/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
The Need for Elasticity
Web 2.0 applications frequently experience high workloadsA service can become popular in just an hour
The high level load does not last for long and keeping resourcesin the Cloud costs moneyThis has led to Elastic Computing
Ability of a system to grow and shrink at run-time in response tochanges in workloadCloud computing allows on-the-fly requesting and releasing VMsMeet SLOs at a minimal cost
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 9/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Elasticity versus Static Provisioning
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 10/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Automation of Elasticity
Elasticity can be controlled either manually (by the sys-admin) orautomatically (by a autonomic manager)Automation of elasticity can be achieved by providing an ElasticityController
Helps to avoid SLO violations while keeping the cost lowAutomatically adds/removes VMs (servers, service instances) inresponse to changes in some SLO metrics, e.g., request latency,caused by changes in workload
Can be built using elements of Control Theory and/or MachineLearning
Feedback-loop (a.k.a. closed-loop) controlModel Predictive Control (MPC)
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 11/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Automation of Elasticity
Pay-as-you-go + Elastic services + Web 2.0 dynamic workload==> Elasticity Controller
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 12/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Feedback Control
SystemFeedbackController
+-SystemOutput
ControlInput
ErrorDesiredValue
The system’s output (e.g., response time) is being monitored
Controller calculates the control error
Controller changes the control input (e.g., number of servers toadd or remove) according to the amount and sign of the controlerror
Advantage: controller can adapt to disturbance
Disadvantages: oscillation, overshoot, possible instability
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 13/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Autonomic ComputingCloud Computing and Elastic ServicesAutomation of ElasticityFeedback versus Feedforward Control
Feedforward Control
SystemFeedforwardController
SystemOutput
ControlInput
SystemState
The system’s output is not monitored
Other system states and variables are monitored
Controller relies on a model of the system to calculate necessarychange
Advantages: faster and avoids oscillation and overshoot
Disadvantages: sensitive to unexpected disturbances that are notmodeled
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 14/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Problem DefinitionChallengesElastMan
Outline
1 Introduction
2 Elasticity Controller for Cloud-Based Key-Value StoresProblem DefinitionChallengesElastMan
3 Evaluation
4 Conclusions and Future Work
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 15/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Problem DefinitionChallengesElastMan
Target System
P
A
DElasticityController
PP P
A A
DDD D
Presentation Tier
Application Tier
Data Tier
P
P
P
A
A
A
D
D D
D
D
C
P
A
A
D
D
AA
Multi-Tier Web 2.0 Application
Each server is a Virtual Machine running ona physical machine in a Cloud environment
Horizontal Scalability(add more servers)
Deployed in aCloud Environment
Public / Private Cloud Environment
PhysicalMachine
VirtualMachineHostinga Server
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 16/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Problem DefinitionChallengesElastMan
Minimum Requirements
Horizontal scalability
Sensors to monitor workload and performance (e.g., read latency)
Actuators to add/remove storage servers
Rebalancing
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 17/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Problem DefinitionChallengesElastMan
Elasticity Controller
Objective
Control the elasticity of Cloud-based key-value stores byadding/removing resources in order to meet SLOs at a minimal cost
SLO examples
Average read latency in one minute interval is less than 10ms
99% of reads in one minute interval are performed in less than10ms per read
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 18/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Problem DefinitionChallengesElastMan
Challenges to Design an Elasticity Controller for Storage
Actuator delays due to data movement (rebalancing)
Interference with applications and sensor measurements
Discrete storage units
Nonlinearity due to diminishing reward of adding a storage unitwith increasing scale
99th percentile is a relatively noisy signal
VM performance is difficult to model and predict
Highly dynamic workload that is composed of both gradual(diurnal) and sudden (spikes) variations
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 19/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Problem DefinitionChallengesElastMan
ElastMan: Combining Feedback and Feedforward Control
FeedbackMonitor 99th percentile
Classical PI controller
We use it to handle diurnalworkloads
Can tolerate and adapt tomodeling errors
FeedforwardMonitor workload
A binary classifier usinglogistic regression
We use it to handle spikes inworkload
Allows us to smooth the noisy99th percentile signal
Changes in workload can be:
Slow day-night changes (diurnal)
Rapid changes (spikes)
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 20/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Problem DefinitionChallengesElastMan
Modeling the Store
Typical approach
Key-ValueStore
System Output99th percentile of reads
Control InputNumber of Servers
Workload(noise)
Non linear (1+1 vs. 100+1)
Workload treated as noise
ElastMan approach
Key-ValueStore
System Output99th percentile of reads
Control InputAverage Workload
per Server
Control the number of servers indirectly by controlling theaverage workload per server
Relies on near linear scalability of key-value stores
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 21/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Problem DefinitionChallengesElastMan
Feedback and Feedforward Controllers
+ _
SLO (Desired
99th Percentile
of Read Latency)
Error PI
Controller
New Average
Throughput
per ServerActuator
Key-Value
Store
New Number
of Nodes
Smoothing
Filter
Measured 99th Percentile
of Read Latency
Target System
FF
Binary
Classifier
New Average
Throughput
per ServerActuator Key-Value
Store
New Number
of NodesMeasured 99th
Percentile of
Read Latency
Measured Average
Throughput
per Server
Target System
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 22/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Problem DefinitionChallengesElastMan
Binary Classifier
0
500
1000
1500
2000
0 200 400 600 800 1000 1200 1400 1600 1800 2000 2200
Write
Thro
ughput (o
ps/s
econd/s
erv
er)
Read Throughput (ops/second/server)
Training Data and Model
Violate SLOSatisfy SLO
Model
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 23/33
Combined Feedback and Feedforward Flow Chart
Measure
Average Throughput per server (tp)
and 99 percentile of read latency (R99p)
Error in
dead-zone
Yes
Rebalancing
No
Large change
in Throughput
Use Feedforward
FF(tp)
Binary Classifier
Yes
No
Use Feedback
FB(fp99)
PID Controller
No
Filter rp99
fR99p=f(R99p)
Do nothing!
If storage supports
rebalance restart then
use Feedforward controller
designed for the store
in rebalance mode FFR(tp)
Yes
Start Controller
Calculate number of
new VMs
new_VMs = total_throughput/new_throughput_per_server
subject to:
replication_degree <= new_VMs <= max_VMs
Start rebalance instance
rebalance(n)
End
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Outline
1 Introduction
2 Elasticity Controller for Cloud-Based Key-Value Stores
3 Evaluation
4 Conclusions and Future Work
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 25/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Evaluation
Implemented ElastMan Elasticity Controller
Evaluated with LinkedIn’s Voldemort key-value store
Deployed in our private Cloud based on OpenStack
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 26/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Elasticlity Controller for the Voldemort Key-Value Store
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 27/33
Voldemont performance with fixed number of servers (18)
0
2
4
6
8
10
0 200 400 600 800 1000 1200 1400
Elasticity Controller
Read p99 (ms)Desired
0
10
20
30
40
50
60
70
0 200 400 600 800 1000 1200 1400
VMs
Total Throughput (K requests/sec)Number of Servers
Voldemont performance with fixed number of servers (18)
0
2
4
6
8
10
0 200 400 600 800 1000 1200 1400
Elasticity Controller
Read p99 (ms)Desired
0
10
20
30
40
50
60
70
0 200 400 600 800 1000 1200 1400
VMs
Total Throughput (K requests/sec)Number of Servers
Diurnal Workload Spikes
ElastMan controller performance under gradual (diurnal)workload
0
2
4
6
8
10
0 100 200 300 400 500 600 700 800 900
Elasticity Controller
Read p99 (ms)Desired
0
10
20
30
40
50
60
70
80
0 100 200 300 400 500 600 700 800 900
VMs
Total Throughput (K requests/sec)Number of Servers
ElastMan controller performance with rapid changes (spikes)in workload
0
2
4
6
8
10
900 1000 1100 1200 1300 1400 1500
Elasticity Controller
Read p99 (ms)Desired
0
10
20
30
40
50
60
70
80
900 1000 1100 1200 1300 1400 1500
VMs
Total Throughput (K requests/sec)Number of Servers
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Outline
1 Introduction
2 Elasticity Controller for Cloud-Based Key-Value Stores
3 Evaluation
4 Conclusions and Future Work
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 31/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Conclusions
ElastMan addresses the challenges of the variable performanceof Cloud VMs, dynamic workload, and stringent performancerequirements
ElastMan combines and leverages the advantages of bothfeedback and feedforward control
feedforward control quickly respond to rapid changes in workload
feedback controller handle diurnal workload and to correctmodeling errors in the feedforward control
Evaluation results show the feasibility of our approach
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 32/33
IntroductionElasticity Controller for Cloud-Based Key-Value Stores
EvaluationConclusions and Future Work
Future Work
Investigate the controllers needed to control all tiers of a Web 2.0application and the orchestration of the controllers in order tocorrectly achieve their goals
Provide a feedforward controller for the store during rebalancing
Distributed version of ElastMan
ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores (A. Al-Shishtawy) 33/33