Post on 02-Jan-2016
Carrying Your Environment With You, or:
Virtual Machine Migration Abstraction for Research Computing
Many research computing tools have specific environment requirements:
- Libraries
- Filesystem structure
- Kernel flags or settings
- Root access or tools
These requirements can limit the systems that an application can be run on.
Thus, it is desirable to give users a reasonably transparent way to migrate applications to remote systems so that they run on predictable virtual systems, either to increase performance or to provide a required capability.
The Problem: Environment Portability
A command-line interface that requires no more data than a normal Condor user will provide, but which allows virtual machines to be easily integrated into distributed jobs. This requires:
- A compatibility assessment tool
- A UML migration and preparation subsystem
- Test and verification capabilities
The Solution: A Virtual Machine Deployment Abstraction Framework
- Perl command-line interface
  - Flags allow control of virtual machine deployment requirements
- User Mode Linux virtual machines
  - Pre-built VM allows the user to simply specify application and data, which will be attached to the VM and deployed
  - User-space execution makes the VM very portable
  - Provides a known hardware/software configuration for applications that are sensitive to these variables
- Condor
  - Job distribution framework allows us to treat UML VMs as applications which run tasks
  - Master/Worker control or other batch systems are a viable alternative
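To make the recipe concrete: a small wrapper shipped as the Condor job executable could boot the pre-built UML image with the user's application and data attached as a second block device. This is a minimal sketch under assumptions; the file names, memory size, and wrapper itself are illustrative, not the poster's actual tool. For clarity it only prints the UML boot command it would run.

```shell
#!/bin/sh
# Hypothetical Condor job wrapper (illustrative names throughout):
# boot a pre-built UML kernel, attaching the user's application/data
# image as a second ubd block device.
VM_KERNEL=./linux          # UML kernel binary shipped with the job
ROOT_FS=root_fs.img        # pre-built VM filesystem
APP_FS=app_and_data.img    # user's application + data image

# Assemble and print the boot command; a real wrapper would exec it.
echo "$VM_KERNEL ubd0=$ROOT_FS ubd1=$APP_FS mem=256M con=null"
```

Because the wrapper is an ordinary user-space program, Condor can schedule it like any other job, which is what lets the framework treat UML VMs as applications.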
The Recipe:
Condor Overview
- Do user-mode virtual machines have reasonable performance for this purpose?
- What costs are incurred in migration?
- What elements are critical to determine if an environment is similar enough?
- How do you migrate virtual machines without serious bandwidth issues?
- How effectively can VM migration be abstracted, and how much must a user know about their requirements?
Key Issues
We have presumed that it is reasonable to expect that a researcher will know their application's general requirements (architecture, specific libraries, etc.).
There are many possible ways to measure compatibility, so a flexible syntax must be provided:
- Simple checks, such as Linux version
- More complex checks, such as specific library versions
- Kernel settings
- Memory and tmpfs
A method to specify "required" versus "report differences" checks is desirable in some circumstances.
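A simple version check of the "required" flavor can be sketched as a prefix match. This is an illustrative sketch only; the poster does not show the tool's actual check syntax, and the function name and versions below are assumptions.

```shell
#!/bin/sh
# Minimal compatibility-check sketch: does the actual version string
# begin with the required prefix? (Illustrative, not the tool's syntax.)
check_version() {
  required=$1; actual=$2
  case "$actual" in
    "$required"*) echo "OK" ;;
    *) echo "MISMATCH" ;;
  esac
}

check_version "2.6" "2.6.18-92"   # matching kernel series: prints OK
check_version "2.6" "2.4.21-47"   # too old: prints MISMATCH
```

A "report differences" variant would print the mismatch as a warning instead of treating it as fatal.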
Abstracting Virtual Machine Migration
Results: Compatibility Checks
VM Performance – synthetic workload
grep "GAATTCATTCCTACCTGGGT" M.fastq | wc -l    (M.fastq = 3.2 GB text file)
Machine          VM (s)   Native (s)   Slowdown
8-core, 64-bit   143      27.5         5.2x
2-core, 32-bit   142      65           2.2x
4-core, 64-bit   366      82           4.5x
VM Performance – research workload
[Chart: per-run averages, VM 463 s and 320 s vs. native 128 s and 194 s]
Research workload: 32-bit application comparing an 8 kB file against ~5 GB of data
Machine          VM (s)   Native (s)   Slowdown
8-core, 64-bit   348      215.5        1.6x
2-core, 32-bit   548      400          1.4x
4-core, 64-bit   555      262          2.1x
Results: Migration and Initialization
[Chart: VM initialization costs in seconds, broken into VM initialization, VM data decompression time, and VM package transfer time]
[Chart: transfer speed in KB/s between host pairs: Darrow -> Dvorak, Grieg -> Dvorak, Comcast -> Darrow, Darrow -> Comcast]
Data and VM decompression outweigh transfer on fast networks, and vice versa on slow networks.
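To see why, compare shipping a compressed VM package (smaller transfer, but a fixed decompression cost) against shipping it raw. All numbers below are illustrative assumptions, not the poster's measurements.

```shell
#!/bin/sh
# Illustrative transfer-vs-decompression tradeoff.
# Sizes in MB, bandwidth in MB/s, times in seconds; all values assumed.
size=2000          # uncompressed VM package
ratio_pct=40       # compressed size as a percentage of the original
decompress=90      # fixed decompression time

for bw in 1 10 100; do
  comp=$(( size * ratio_pct / 100 / bw + decompress ))
  raw=$(( size / bw ))
  echo "bw=${bw}MB/s compressed=${comp}s raw=${raw}s"
done
```

At 1 MB/s compression wins (890 s vs. 2000 s), while at 100 MB/s the fixed decompression cost dominates and raw transfer wins (98 s vs. 20 s), matching the observation above.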
For performance, a job should only be run in a VM if (a) the job length exceeds the total cost of transfer and initialization after accounting for slowdown, and (b) the number of machines available is larger than the slowdown ratio for that job.
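That rule can be written down directly. The input values below are assumed examples; slowdown is stored scaled by 10 to keep the sketch in integer arithmetic.

```shell
#!/bin/sh
# Sketch of the run-in-VM decision rule. Times in seconds; slowdown is
# stored x10 so 1.6 becomes 16 (integer arithmetic only).
job=600; transfer=120; init=60; slowdown_x10=16; machines=4

if [ "$job" -gt $(( transfer + init )) ] && \
   [ $(( machines * 10 )) -gt "$slowdown_x10" ]; then
  echo "run in VM"
else
  echo "run natively"
fi
```

With these example values the job (600 s) exceeds the 180 s of transfer plus initialization, and 4 machines exceed the 1.6x slowdown ratio, so the VM route pays off.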
In general, once you have accounted for this:
Results: User Interface for VM abstraction
[Chart: host-equivalent machines vs. number of virtual machines, per slowdown]
- Improve compatibility checks
- Investigate optimized VM migration/provisioning methods (support for COW where distributed filesystems are available)
- Investigate use of the Master/Worker paradigm to improve distribution and workload capability
- Build a method to handle "ignorant" users by providing automated testing of applications in a very limited manner: test the application in available VMs, verify results for the sample, then deploy in that mode
Next Steps
Questions?
Distributed filesystems allow massive improvements in initialization speed and data distribution times:
- UML supports COW, a method that allows a single VM filesystem to be shared by many VMs.
- HostFS support means that the VM doesn't have to be aware of the network or the filesystem itself.
Advantages of Master/Worker vs. normal Condor: on job completion, Condor will remove the VM, requiring re-distribution if all tasks are not completed in a single use of the VM.
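UML's block-device syntax makes the COW arrangement concrete: each instance points a private copy-on-write file at one shared backing image (ubd0=cowfile,backingfile is UML's documented syntax, but the paths, umid values, and loop here are illustrative). The sketch prints the boot commands rather than executing them.

```shell
#!/bin/sh
# Many UML instances sharing one read-only backing filesystem via COW.
# UML syntax: ubd0=<cow file>,<backing file>. Paths are illustrative.
BACKING=root_fs            # the single shared VM filesystem image
for i in 1 2 3; do
  # each VM gets its own small COW overlay; BACKING is never written
  echo "./linux ubd0=vm$i.cow,$BACKING umid=vm$i con=null"
done
```

Only the small per-VM overlays must be created per instance, which is what makes distributed-filesystem deployment of many VMs cheap.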
VM Migration on Batch Systems