Friendly Virtual Machines. Zhang, Bestavros, et al., Boston Univ. ACM/USENIX VEE 2005
CSE 598c April 17, 2006
Bhuvan Urgaonkar
Problem Setting
• Growing trend of hosting applications on third-party platforms
• Two challenges
  • Isolation and security for co-located applications
  • Efficient and fair resource allocation
• Virtualization is seen as a promising approach for isolation
• What about resource allocation?
Challenge - Resource Allocation in Hosting Environments
• Traditional solutions
  • Over-provisioning => wasteful
  • Fair schedulers in the OS, dynamic provisioning, admission control
    • Complex
    • Deprive the application of the chance to meaningfully adapt its behavior to match available resources
    • Against the famous end-to-end argument developed in the networking community
End-to-end Argument
• Saltzer, Reed, and Clark
  • Functionality should be pushed to the higher layer whenever possible
  • The IP network implements packet forwarding, leaving congestion control to end systems
• When applied to hosting platforms
  • Let the applications decide how many resources they need
How do VMs make the end-to-end idea realizable?
• In a traditional hosting system, applications would have to be modified
  • Always undesirable, often impossible
• In a virtualized hosting system
  • VMM is like the OS, guest OS is like the application
  • Guest OS modification is not so unacceptable
    • E.g., Xen, Denali
• Main idea: It is possible to achieve good efficiency and fairness using “friendly” virtual machines
Outline
• Motivation
• Approach
• Implementation
• Evaluation
• Conclusions
Friendly Virtual Machine
• Not malicious
• Dynamically adapts its resource needs to system conditions
• Inspiration: AIMD (additive increase, multiplicative decrease) congestion control in TCP
  • Gradually increase resource requirements, back off when resource contention increases
• How a TCP researcher would approach the resource management problem in data centers
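The AIMD idea above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's model: the names `aimd_step` and `simulate`, the parameters `alpha` and `beta`, the single shared `capacity`, and the synchronous rounds are all hypothetical. It shows why AIMD tends toward fairness: additive increase leaves the gap between two VMs unchanged, while each multiplicative decrease shrinks it.

```python
def aimd_step(demands, capacity, alpha=1.0, beta=0.5):
    """One synchronous AIMD round for all VMs (hypothetical model).

    If total demand exceeds capacity (contention), every VM multiplies
    its demand by beta; otherwise every VM adds alpha.
    """
    if sum(demands) > capacity:
        return [d * beta for d in demands]
    return [d + alpha for d in demands]


def simulate(demands, capacity, rounds):
    """Apply aimd_step repeatedly and return the final demands."""
    for _ in range(rounds):
        demands = aimd_step(demands, capacity)
    return demands


if __name__ == "__main__":
    # Two VMs start with very unequal demands on one shared resource.
    final = simulate([1.0, 60.0], capacity=100.0, rounds=200)
    print([round(d, 2) for d in final])
```

With these assumed parameters, the two demands end up within one unit of each other even though they started 59 units apart.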
System Goals
• Efficiency
  • Resources should not be overloaded
    • E.g., heavy paging during overload => low throughput
  • Resources should not be unnecessarily underutilized
• Fairness
  • Each VM is allocated a proportional share of the bottleneck resource for that VM
Overload Detection
• Unlike TCP, there are multiple resources to consider
  • CPU, virtual memory, network bandwidth
• Resource utilization metrics are not reliable
  • E.g., CPU utilization may be high but the bottleneck may be the memory subsystem
• Use application-centric metrics like response time or throughput
Overload Detection
• Virtual Clock Time (VCT)
  • Real-time interval between consecutive virtual clock cycles
• Bottleneck resource
  • The resource that is the first to trigger a significant increase in VCT
  • Bottleneck-equivalence classes
• Detection: Measure the ratio of the current VCT to the minimum VCT observed
  • Compare with a threshold (2)
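The detection rule on this slide can be sketched in a few lines. This is a minimal illustration, not the paper's code: the class name `OverloadDetector`, its `update` method, and the choice of a strict `>` comparison are assumptions; only the ratio test and the threshold of 2 come from the slide.

```python
class OverloadDetector:
    """Flags overload when current VCT / minimum VCT exceeds a threshold."""

    def __init__(self, threshold=2.0):
        self.threshold = threshold  # the slide quotes a threshold of 2
        self.min_vct = None         # smallest VCT sample seen so far

    def update(self, vct):
        """Record one VCT sample; return True if it signals overload."""
        if vct <= 0:
            raise ValueError("VCT must be positive")
        if self.min_vct is None or vct < self.min_vct:
            self.min_vct = vct
        # Assumption: strict comparison; the slide does not say > or >=.
        return vct / self.min_vct > self.threshold


if __name__ == "__main__":
    detector = OverloadDetector()
    for sample in [10.0, 15.0, 25.0, 8.0, 17.0]:  # hypothetical VCT samples
        print(sample, detector.update(sample))
```

Because the baseline is the minimum VCT observed, a single unusually small sample lowers the baseline for all later comparisons; a real system might want to age it out.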
Adaptation Mechanisms
• Control the number of processes/threads
  • In practice, suspending running processes may not be a good idea
  • Alternatives
    • Suspend less important (e.g., younger) processes
    • Don’t allow new processes instead of suspending existing ones
• Rate control by forcing the VM to sleep
• Follow an AIMD-style adaptation that converges to a fair/efficient allocation
• Paper presents a control-theoretic model to prove convergence/stability properties
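The two knobs on this slide (a cap on admitted processes, and sleep-based rate control) could be driven by one AIMD-style controller roughly as follows. This is a sketch under assumptions: the `Adapter` class, its field names, the step sizes, and in particular the ordering (shrink the process limit first, throttle the run rate only once the limit is 1) are hypothetical choices for illustration, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Adapter:
    limit: int = 1           # max processes admitted (hypothetical knob)
    rate: float = 1.0        # fraction of each period the VM may run
    alpha: int = 1           # additive increase of the process limit
    beta: float = 0.5        # multiplicative decrease factor
    rate_step: float = 0.25  # additive recovery of the run rate
    min_rate: float = 0.1    # never sleep 100% of the time

    def on_feedback(self, overloaded: bool) -> None:
        """Adjust the knobs after one overload/no-overload signal."""
        if not overloaded:
            if self.rate < 1.0:
                # Assumed order: restore the run rate before growing the limit.
                self.rate = min(1.0, self.rate + self.rate_step)
            else:
                self.limit += self.alpha
        elif self.limit > 1:
            self.limit = max(1, int(self.limit * self.beta))
        else:
            self.rate = max(self.min_rate, self.rate * self.beta)

    def admits(self, running: int) -> bool:
        """Refuse new processes at the limit instead of suspending old ones."""
        return running < self.limit

    def sleep_time(self, period: float) -> float:
        """Seconds the VM should sleep in each period of the given length."""
        return period * (1.0 - self.rate)
```

Refusing admission rather than suspending follows the slide's second alternative; it avoids stalling work that is already in progress.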
Salient Features
• Underlying system requirements
  • Schedulers should be unbiased like round-robin, unlike multi-level feedback
  • VMM should implement resource policing to enforce AIMD behavior
• Various adaptation strategies can co-exist
  • Think TCP Reno, TCP Tahoe, …
• Suggestion: VMM could provide incentives for friendly behavior
Discussion
• Is this system practical?
  • Rate of adaptation
    • Would it be fast enough for hosted applications?
    • Applications need resources soon after overload starts
  • How would the system behave with biased schedulers?
• Can the adaptation mechanism be extended to handle different levels of importance?
  • This system might punish an application precisely when it is crucial for it to service its workload
    • E.g., an e-commerce app during Thanksgiving
• Global knowledge can be crucial for efficiency
  • E.g., LRU page replacement
• Security, isolation
  • To me it seems this would be as secure as a system with a more heavy-weight VMM
Implementation
• User-mode Linux
• Implement adaptation of number of processes and rate control
• 500 lines of code
Outline
• Motivation
• Approach
• Implementation
• Evaluation
• Conclusions
Memory-intensive benchmark - Performance metrics vs # VMs
• Linux suspends processes arbitrarily when excessive thrashing occurs; their system spreads the punishment evenly
Benchmark - Performance metrics vs # threads/VM (2 VMs)
• Graceful degradation
How not to do Evaluation
• No confidence intervals!
  • Observations for light loads are meaningless
• Pick someone your own size
  • Of course, their system is better than vanilla UML, so what?
  • Should have compared with a system that implements fair schedulers
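For reference, reporting a confidence interval over repeated runs takes only a few lines. This is a sketch with assumptions: `mean_ci` is a hypothetical helper, the sample values are made up for illustration, and it uses a normal approximation (with only a handful of runs, a Student-t critical value would give a wider and more honest interval).

```python
import math
import statistics


def mean_ci(samples, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * sem


if __name__ == "__main__":
    # Hypothetical throughput measurements from five repeated runs.
    mean, half = mean_ci([10.0, 12.0, 11.0, 13.0, 9.0])
    print(f"{mean:.2f} +/- {half:.2f}")
```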
Apache - 4 VMs
• Graceful degradation
Evolution of VCT w/ UML
• Unfairness at high load
Evolution of VCT w/ FVM
• Fair CPU allocation
Tput per VM w/ UML
• Unfairness at high load
• Unfair CPU allocations due to different paging treatment and process suspension
Per-VM Tput w/ FVM
• Fair behavior
Conclusions
• Distributed, application-driven resource allocation
• (+) Cool idea
• (-) Needs more research to be convincing
  • Experimental evaluation not satisfactory
More Discussion