Virtual Machine Live Migration in Cloud Computing
Jie Zheng
Abstract
Cloud computing services have experienced rapid growth. Virtualization, a key technology
for cloud computing, provides an abstraction to hide the complexity of underlying hard-
ware or software. The management of a pool of virtualized resources requires the ability to
flexibly map and move applications and their data across and within pools. Live migration,
which enables such management capabilities, ushers in unprecedented flexibility for busi-
nesses. To unleash the benefits, commercial products already enable the live migration of
full virtual machines between distant cloud datacenters.
Unfortunately, two problems exist. First, no live migration progress management sys-
tem exists, leading to (1) guesswork over how long a migration might take and the inability
to schedule dependent tasks accordingly; and (2) the inability to balance application perfor-
mance and migration time – e.g., finishing migration later in exchange for less performance interference.
Second, multi-tier application architectures are widely employed in today’s virtualized
cloud computing environments. Although existing solutions can migrate a single VM
efficiently, little attention has been devoted to migrating the related VMs of multi-tier
applications. If the relatedness of VMs is ignored during migration, application components
can become split across distant cloud datacenters for arbitrary periods, causing serious
application performance degradation.
In this thesis, the first contribution is Pacer – the first migration progress management
system capable of accurately predicting the migration time and managing the progress so
that the actual migration finishing time is as close to a desired finish time as possible.
Pacer’s techniques are based on robust and lightweight run-time measurements of system
and workload characteristics, efficient and accurate analytic models for progress predic-
tions, and online adaptation to maintain user-defined migration objectives for timely mi-
grations.
The second contribution is COMMA – the first coordinated live migration system for
multi-tier applications. We formulate the multi-tier application migration problem and
present a new communication-cost-driven coordinated approach, as well as a fully imple-
mented system on KVM that realizes this approach. COMMA is based on a two-stage
scheduling algorithm for coordinating the migration of VMs that aims to minimize migra-
tion’s impact on inter-component communications. COMMA focuses on the coordination
of multiple VMs’ migrations, where each VM’s migration progress is handled by Pacer.
COMMA’s scheduling algorithm has low computational complexity; as a result, COMMA
is highly efficient.
Through extensive experiments including a real-world commercial cloud scenario with
Amazon EC2, we show that Pacer is highly effective in predicting migration time and con-
trolling migration progress and COMMA is highly effective in decreasing the performance
degradation cost and minimizing migration’s impact on inter-component communications.
We believe this thesis will have far-reaching impact. COMMA and Pacer are applicable
to all sorts of intra-datacenter and inter-datacenter VM migration scenarios. Together, they
solve some of the most vexing VM migration management problems faced by operators to-
day. The techniques underlying Pacer and COMMA are not specific to the KVM platform;
they can easily be ported to other virtualization platforms such as VMware, Xen
and Hyper-V.
Acknowledgments
My foremost thanks go to my advisor, Professor T. S. Eugene Ng. I thank him for all of
his help, inspiration and guidance in my graduate study. He is the best advisor I could ever
imagine. I thank him for his patience and encouragement that always carried me through
difficult times, and for his insights and suggestions that helped to shape my research skills.
His passion for science has influenced me a lot. His valuable feedback contributed greatly
to this thesis.
I wish to express my sincere gratitude to Dr. Kunwadee (Kay) Sripanidkulchai. I had
the fortune to work with Kay during my summer internship at IBM. She helped me on every
aspect of the research related to this thesis. She taught me a vast amount of knowledge
in the areas of cloud computing and machine virtualization and introduced me to many
advanced techniques. I really appreciate her sound advice, good company, interesting ideas
and suggestions. Without the help from Eugene and Kay, this thesis would not have been
possible.
I am grateful to my team member Zhaolei (Fred) Liu for his help in setting up the test
bed on Amazon EC2. The demonstration of our system on a commercial public cloud
elevates the system to a new level. Discussions with Fred helped my research and thesis
writing greatly and made our collaboration highly productive.
I wish to thank Professor Alan L. Cox, who helped me set up the experiment environment
and suggested that I use VMmark to explore the I/O patterns. Before this thesis, I had
the opportunity to work with Alan on another project. Alan is a very knowledgeable and
friendly professor. I am often impressed by his logical thoughts and wise solutions to
difficult research questions.
I want to thank Professors Edward W. Knightly and Christopher Jermaine for serving on
my Ph.D. thesis committee and asking many insightful questions that helped to shape this
thesis.
I also want to thank many friends in our research group. They are Bo Zhang, Guohui
Wang, Zheng Cai, Florin Dinu, Yiting Xia and Xiaoye Sun. I enjoyed all the vivid discus-
sions we had on various topics and had lots of fun being a member of this fantastic group.
They always gave me instant help when I asked.
Last but not least, I would like to thank my parents and my best friends, who have
supported me spiritually throughout my life.
Contents
Abstract ii
List of Illustrations ix
List of Tables xi
1 Introduction 1
1.1 Live Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Lack of migration progress management . . . . . . . . . . . . . . . . . . . 7
1.3 Lack of coordination for multi-tier application migration . . . . . . . . . . 9
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2 Background 18
2.1 Process Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Live Migration of Virtual Machine . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1 VM Memory/CPU Migration . . . . . . . . . . . . . . . . . . . . 20
2.2.2 Network Connection Migration . . . . . . . . . . . . . . . . . . . 20
2.2.3 Storage Migration . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.4 Full VM Migration . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Optimization of Live Migration . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.1 Compression and Deduplication . . . . . . . . . . . . . . . . . . . 24
2.3.2 Reordering Migrated Block Sequence . . . . . . . . . . . . . . . . 25
3 Migration Time Prediction and Control 27
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.1 Predicting Migration Time . . . . . . . . . . . . . . . . . . . . . . 27
3.1.2 Controlling Migration Time . . . . . . . . . . . . . . . . . . . . . 28
3.2 Predicting Migration Time . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.1 Migration Time Model . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.2 Dirty Set and Dirty Rate Estimation . . . . . . . . . . . . . . . . . 33
3.2.3 Speed Measurement . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Controlling Migration Time . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.1 Solving for Speeds in Each Phase of Migration . . . . . . . . . . . 41
3.3.2 Maximal Feasible Speed Estimation and Speed Tuning . . . . . . . 46
3.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.2 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4.3 Prediction of migration time . . . . . . . . . . . . . . . . . . . . . 51
3.4.4 Best-effort migration time control . . . . . . . . . . . . . . . . . . 56
3.4.5 Overhead of Pacer . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.4.6 Potential robustness improvements . . . . . . . . . . . . . . . . . . 63
3.5 EC2 Demonstration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.5.1 Network and Disk Speed Measurements . . . . . . . . . . . . . . . 65
3.5.2 Use Case 1: Prediction of Migration Time . . . . . . . . . . . . . . 66
3.5.3 Use Case 2: Best-effort Migration Time Control . . . . . . . . . . 66
3.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.6.1 Live Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.6.2 I/O Interference in Virtualized Environment . . . . . . . . . . . . . 67
3.6.3 Data Migration Technologies . . . . . . . . . . . . . . . . . . . . . 68
3.6.4 Performance Modeling and Measurement . . . . . . . . . . . . . . 68
4 Coordinated Migration of Multi-tier Applications 70
4.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2 Quantitative Impact of Uncoordinated Multi-tier Application Migration . . 73
4.3 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3.1 Subsystem: Pacer . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.3.2 Challenges and Solutions . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.3 Scheduling Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 80
4.3.4 Inter-group Scheduling . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3.5 Intra-group Scheduling . . . . . . . . . . . . . . . . . . . . . . . . 85
4.3.6 Adapting to changing dirty rate and bandwidth . . . . . . . . . . . 89
4.3.7 Putting it all together . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4.2 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.4.3 Migration of a 3-tier Application . . . . . . . . . . . . . . . . . . . 91
4.4.4 Manual Adjustment does not Work . . . . . . . . . . . . . . . . . 91
4.4.5 Algorithms in Inter-group Scheduling . . . . . . . . . . . . . . . . 93
4.5 EC2 demonstration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.6 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5 Conclusion and Future Work 98
5.1 Migration Progress Management with Optimization Techniques . . . . . . 99
5.2 Migration Progress Management with Migration Planning . . . . . . . . . 100
5.3 Migration Progress Management with Task Prioritization . . . . . . . . . . 101
Bibliography 102
Illustrations
1.1 Beneficial usage scenarios of HCC. . . . . . . . . . . . . . . . . . . . . . . 4
1.2 The progress of live migration . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 An example of multi-tier application migration . . . . . . . . . . . . . . . 10
3.1 Pacer design overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 An example of disk dirty set estimation. . . . . . . . . . . . . . . . . . . . 35
3.3 An example of sampling for memory dirty rate estimation . . . . . . . . . . 37
3.4 Trade-off of sampling interval . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Each round of adaption for controlling migration time . . . . . . . . . . . . 41
3.6 An example of migration speeds in different phases. . . . . . . . . . . . . . 45
3.7 The prediction of a VM (file server-30clients) migration. Pacer achieves
accurate prediction from the very beginning of the migration. . . . . . . . . 53
3.8 Migration with different desired finish times. Pacer almost matches the
ideal case when the desired time is larger than 176s. The deviation is very
small in [-2s,2s]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.9 Migration with different degrees of workload intensity. Any point in the
feasible region can be realized by Pacer. The lower bound for migration
time is limited by I/O bottleneck. Default QEMU can only follow a
narrow curve in the region. . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.10 VM migration from Rice campus to Amazon EC2. . . . . . . . . . . . . . 64
4.1 Sequential and parallel migration of a 3-tier web application across clouds. 71
4.2 An example about cost computation for 3 VMs . . . . . . . . . . . . . . . 73
4.3 Examples of multi-tier web services. . . . . . . . . . . . . . . . . . . . . . 77
4.4 An example of coordinating the migration with COMMA . . . . . . . . . . 77
4.5 An example of valid group. . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.6 An example for heuristic algorithm . . . . . . . . . . . . . . . . . . . . . . 84
4.7 Intra-group scheduling. (a) Starting VM migrations at the same time but
finishing at different times results in a long performance degradation
time. (b) Starting VM migrations at the same time and finishing at the
same time results in a long migration time due to inefficient use of
migration bandwidth. (c) Starting VM migrations at different times and
finishing at the same time yields no performance degradation and a short
migration time due to efficient use of migration bandwidth. . . . . . . . . . 86
4.8 An example of delaying the start of dirty iteration for the migration. . . . . 88
4.9 An example of scheduling algorithm (Put all together) . . . . . . . . . . . . 90
4.10 Computation time for brute-force algorithm and heuristic algorithm . . . . 95
Tables
1.1 Application moving approaches for stateless and stateful servers. . . . . . . 5
3.1 Migrated data and speed in four phases of migration . . . . . . . . . . . . . 30
3.2 Variable definitions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 VMmark workload summary. . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.4 Prediction errors for the VM size-based predictor and the progress meter
are several orders of magnitude higher than Pacer’s. . . . . . . . . . . . . . 52
3.5 Prediction with Pacer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.6 Migration time deviation for Pacer is much smaller than the controller
without dirty block prediction. . . . . . . . . . . . . . . . . . . . . . . . . 57
3.7 Deviation of migration time for Pacer with different workload intensities.
The numbers in brackets represent the worst early and late deviations in
Pacer. For example, [−1, 1] means at most 1s early and at most 1s late.
"-" means the time is beyond the feasible region. . . . . . . . . . . . . . . 60
3.8 Migration time on different types of workload. Pacer can achieve the
desired migration time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.9 Migration time for Pacer when the additional competing traffic varies.
Pacer can achieve the desired migration time with a small finish time
deviation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.10 Importance of dynamic dirty set and dirty rate prediction. Without any of
these algorithms, it is hard to achieve desired migration time. . . . . . . . . 62
3.11 Importance of speed scaling up algorithm. . . . . . . . . . . . . . . . . . . 62
3.12 Prediction accuracy with Pacer. . . . . . . . . . . . . . . . . . . . . . . . . 65
3.13 Migration time control in EC2. . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1 Example VM and workload parameters. Dirty set is defined as the data
bytes written on the VM’s virtual disk at the end of the disk image copy.
Dirty rate is defined as the speed at which the VM’s virtual disk and
memory are written. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2 Degradation time with sequential and parallel migration. INF means
migration could not converge and thus the migration time is infinite. . . . . 76
4.3 Comparisons of three approaches on 3-tier applications. {...} represents
the VM set on one physical machine. . . . . . . . . . . . . . . . . . . . . . 92
4.4 Manual adjustment of the configured speed makes it hard to achieve low
cost and short migration time. . . . . . . . . . . . . . . . . . . . . . . . . 93
4.5 Performance degradation cost (MB) with different migration approaches . . 94
4.6 Migration methods on EC2 experiment. . . . . . . . . . . . . . . . . . . . 97
Chapter 1
Introduction
Cloud computing, often referred to as simply "the cloud", is the delivery of on-demand
computing resources over the Internet on a pay-for-use basis [Def]. In both industry and
academia, cloud computing has attracted significant attention. A research report sponsored
by the enterprise-focused cloud computing firm Virtustream [Virb] shows that the cloud has
hit the mainstream: more than half of U.S. businesses now use cloud computing [Cloa].
Cloud computing services have experienced rapid growth. The public cloud services market
is expected to grow to $206.6 billion by 2016 [Gar12]. Internet and business applications
are increasingly being moved to the cloud to maximize the effectiveness of shared resources
and economies of scale. Some clouds are operated by service providers, such as Amazon
[Amaa] and IBM [IBM], that offer storage and virtual servers to customers at a low price
on demand. Others are built to deliver development environments as a service, such as
Google App Engine [Goo].
Cloud services usually run in data centers. Current data centers can contain tens or
hundreds of thousands of servers. The main purpose of a data center is to run the
applications that handle an organization's core business and operational data. Often these
applications are composed of multiple hosts, each running a single component. Common
components of such applications are databases, file servers, application servers, middleware,
and various others [Wikb]. For example, the components of a multi-tier e-commerce
application [wikd] may include web servers, application servers and database servers. The
web server works as the presentation tier, which displays information related to services
and communicates with the other tiers by delivering results to the clients. The application
server works as the logic tier, which controls the application's functionality by performing
detailed processing on the data it exchanges with the other tiers. The database server works
as the data tier, which stores data and keeps it neutral and independent from the application
servers and business logic [wikd].
The main enabling technology for cloud computing is virtualization, which abstracts
the physical infrastructure and makes it available as software components that are easy to
isolate and partition [Wika]. It hides the complexity of the underlying hardware or
software [Poe09]. The management of a pool of virtualized resources requires the ability
to flexibly map and move applications and their data across and within pools [WSKdM11].
Usually, multiple virtual machines (VMs) run on a single physical machine. Virtualization
therefore provides an effective way to consolidate hardware and get vastly higher produc-
tivity from fewer servers. Cloud computing uses virtualization for efficiency, higher avail-
ability and lower costs. Virtualization also speeds and simplifies IT management, mainte-
nance and the deployment of new applications [Vmwa]. In cloud computing, a hypervisor
or virtual machine monitor (VMM) is a piece of software that creates, runs and manages
VMs. KVM [KVM], Xen [XEN09], VMware ESX [VMwb] and Hyper-V [Mic12] are
four popular hypervisors.
As data centers continue to deploy virtualized services, there are many scenarios that
require moving VMs from one physical machine to another in the same data center or even
across different data centers. We list some examples as follows.
• Planned maintenance: To maintain high performance and availability, virtual ma-
chines (VMs) may need to be migrated from one cloud to another to leverage better
resource availability, avoid down-time caused by hardware maintenance, and save
power in the source cloud. If a physical machine requires software or hardware
maintenance, the administrator can migrate all the VMs running on that machine
to other physical machines to release the original machine [CFH+05].
• Load balancing: VMs may be rearranged across different physical machines in a
cluster to relieve load on congested hosts [CFH+05]. A workload increase on a virtual
server can be handled by increasing the resources allocated to the virtual server,
provided some idle resources are available on the physical server, or by
simply moving the virtual server to a less loaded physical server [WSVY07].
• Avoiding single-provider lock-in: While many cloud users’ early successes have
been realized using a single cloud provider [Blo08, Got08], the ability to use multiple
clouds to deliver services and the flexibility to move freely among different providers
are emerging requirements [AFGea09]. Users who implement their applications us-
ing one cloud provider ought to have the capability and flexibility to migrate their
applications back in-house or to other cloud providers in order to have control over
the business continuity and avoid fate-sharing with specific providers [ZNS11].
• Hybrid cloud computing (HCC): – where virtualizable compute and storage re-
sources from private datacenters and public cloud providers are seamlessly integrated
into one platform in which applications can migrate freely – is emerging as the most
preferred cloud computing paradigm for commercial enterprises according to recent
industry reports and surveys [Ash12, Bri11, Tof11]. It provides more scenarios that
require migration of VMs. This is not surprising since HCC combines the bene-
fits of public and private cloud computing, resulting in unprecedented flexibility
for CAPEX and OPEX savings, application adaptivity, disaster survivability, zero-
downtime maintenance, and privacy control. An impressive array of commercial
products that facilitate HCC is already available today. Figure 1.1 illustrates some
beneficial usage scenarios of migration in HCC. The ability to migrate applications
freely among private and public clouds, whether private-to-public, public-to-
private, or public-to-public, is key to maximizing the benefits of HCC. Tried-and-
true virtual machine (VM) migration technologies are therefore central to the HCC
paradigm. A migration in HCC crosses datacenters; consequently, it typically re-
quires a full migration of the VM, including the VM's storage.
• Enterprise IT Consolidation: Many enterprises working with multiple data centers
have attempted to deal with data center "sprawl" and cut costs by consolidating mul-
tiple smaller sites into a few large data centers. The ability to move a service
with minimal or no down-time is attractive due to the corresponding reduction in the
disruption seen by the business [WSKdM11].

Figure 1.1 : Beneficial usage scenarios of HCC. (1) Provider price hike. (2) Provider
discontinues service. (3) Privacy law change. (4) Provider privacy policy change.
In summary, from cloud providers' perspectives, VMs may be moved for maintenance,
resource management, disaster planning and economic reasons. From cloud users'
perspectives, they may want to move their VMs to other clouds that provide lower cost,
better reliability or better performance.
There are different approaches to moving VMs, depending on whether the VM runs a
stateless server or a stateful server, as Table 1.1 shows. Some applications contain stateless
servers that do not retain any session information [Wikc]. A stateless server treats each
request from the client as an independent transaction, in no way related to any previous
request. A typical example of a stateless server is a web server with static content: it takes
in a request as a URL that fully specifies a particular web page to display, independent of
previous requests to the server. A stateless server can be easily moved; for example, the
provider could provision a new server or perform live migration of the existing server.
However, this only applies to a small class of applications. The majority of enterprise
applications run on stateful servers. A stateful server remembers client states from one
request to the next. For example, FTP servers, database servers, mail servers and web
servers with dynamic content are all stateful servers. To move a stateful server, the old
approach is to stop the VM, copy the VM's state, and restart the VM at the destination.
This imposes a long downtime on the application. A more attractive mechanism for moving
applications is live migration, because it is completely application independent (whether
stateless or stateful) and it allows VMs to be transparently moved between physical hosts
without interrupting any running applications.

Type                  Stateless Server              Stateful Server
Feature               No state                      Keeps state
Typical examples      Web server (static)           FTP server
                                                    Database server
                                                    Mail server
                                                    Web server (dynamic)
Approach for moving   Replication (provisioning)    Stop and copy
                      Live migration                Live migration

Table 1.1 : Application moving approaches for stateless and stateful servers.
1.1 Live Migration
Live migration refers to the process of moving a running virtual machine or application
between different physical machines without disconnecting the client or application. Live
migration is controlled by the hypervisors running on the source and destination. Full
migration of a virtual machine includes the following aspects:
1. the running state of the virtual machine (i.e., CPU state, memory migration)
2. the storage or virtual disks used by the virtual machine
Figure 1.2 : The progress of live migration
3. existing client connections
Figure 1.2 shows the progress of full virtual machine migration. First, the source hy-
pervisor (Hypervisor 1) copies all the disk state of the virtual machine from the source to
the destination while the virtual machine is still running on the source hypervisor. If some
disk blocks change during this process, they are re-copied. When the number of dirty disk
blocks falls below a threshold, the source hypervisor starts copying the memory state.
Once the disk and memory state have been transferred, the source hypervisor briefly pauses
the virtual machine for the final transfer of disk, memory, processor and network states to
the destination hypervisor (Hypervisor 2). The virtual machine then resumes running
on the destination hypervisor.
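The iterative pre-copy process above can be sketched as a small simulation. This is an illustrative model, not hypervisor code: the constant dirty rate, the block-granularity accounting, and the function name `precopy_rounds` are simplifying assumptions of this sketch.

```python
def precopy_rounds(total_blocks, dirty_rate, copy_rate, threshold):
    """Toy model of the iterative pre-copy phase of live migration.

    total_blocks -- size of the VM's disk image, in blocks
    dirty_rate   -- blocks/sec the running VM keeps writing (assumed constant)
    copy_rate    -- blocks/sec the migration can transfer
    threshold    -- dirty-set size that triggers the final stop-and-copy

    Each round re-copies the blocks dirtied during the previous round.
    While a round runs, the VM dirties more blocks, so the next dirty set
    is round_time * dirty_rate. The dirty set shrinks only when
    dirty_rate < copy_rate; otherwise migration cannot converge.
    Returns (rounds, total_blocks_sent, final_dirty_set).
    """
    sent = total_blocks                      # round 0: full copy, VM running
    dirty = (total_blocks / copy_rate) * dirty_rate
    rounds = 0
    while dirty > threshold and dirty_rate < copy_rate:
        sent += dirty                        # re-copy last round's dirty set
        dirty = (dirty / copy_rate) * dirty_rate
        rounds += 1
    sent += dirty                            # final copy while VM is paused
    return rounds, sent, dirty
```

With a 1000-block disk, a dirty rate one tenth of the copy rate, and a 5-block threshold, two re-copy rounds suffice and only one block is left for the brief pause. If the dirty rate equals the copy rate, the dirty set never shrinks – exactly the non-convergence case a live migration system must guard against.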
In 2011, F5 Networks [Mur11] and VMware released the first product to enable the
live CPU/memory and storage migration of virtual machines across distant data centers.
That is, however, not to say that live migration is a solved problem. Cloud providers use
live migration in many cloud management tasks to save operating cost and improve
application performance. However, they have found that existing live migration solutions
fall short in two respects: (1) migration progress management and (2) multi-tier
application migration.
1.2 Lack of migration progress management
The use of live migration in cloud computing has exposed a fundamental weakness in
existing solutions, namely the lack of migration progress management – the ability to
predict and control migration time. Without this capability, management tasks may not
achieve the expected performance. Consider the following examples.
• Case 1: A system operator plans to take down a physical machine for maintenance.
He evicts all the running VMs to another physical machine using live migration, and
tells the maintenance group that maintenance can start at 8PM, guessing that the
migration will complete by then. Unfortunately, some VMs' migrations take much
longer than expected and all dependent tasks are delayed. The entire maintenance
work-flow might be irrecoverably disrupted, and the company must pay system
operators excess overtime.
• Case 2: Many failure detection and prediction systems have been proposed and ap-
plied to detect abnormal behavior of physical servers. Once potential failures are
reported, system operators will migrate VMs to other machines as soon as possible.
However, the operator observes that migration cannot go any faster than a static
upper bound – a configured speed in the live migration system.
• Case 3: Migration is also used in load balancing. A new IT strategy called follow-
the-sun provisioning, for project teams that span multiple continents, also leverages
live migration. The scenario assumes multiple groups in different geographies
collaborating on a common project, where each group requires low-latency access
to the project applications and data during its normal business hours [Woo11].
The operators need to decide which server to migrate and whether the migration will
finish by the expected time, before normal business hours start. Load balancing
decisions are time-sensitive: if a migration takes too long to complete, the resource
usage dynamics may have already changed, rendering the original migration
decision useless.
These scenarios expose the weakness in today's live migration solutions, which can be
summarized as the lack of a live migration progress management system. Several related
questions are frequently asked in numerous online forum discussions.
• How long does migration take? – is a popular question in live VM migra-
tion FAQs [Pad10, Tec11, Ste10]. Unfortunately, there is no simple formula for
calculating the answer because it depends on many dynamic run-time variables
that are not known a priori, including application I/O workload intensity, net-
work speed, and disk speed. As numerous online forum discussions indicate (e.g.
[VMw11a, VMw11b, VMw11c, VMw09, Xen08a, Xen11a, Xen11b, Xen08b]),
users routinely try to guess why migration is slow and whether it could be sped up,
and how long they might have to wait.
• How to avoid application components getting split between distant datacenters
during migration? – This question is of paramount importance to enterprises be-
cause their applications often consist of multiple interacting components perform-
ing different functions (e.g. content generation, custom logic, data management,
etc. [HSS+10]). Without the ability to manage migration progress, individual ap-
plication components could complete migration at very different times and become
split over distant cloud datacenters for arbitrary periods. The resulting large inter-
component communication delay guarantees detrimental performance impact. Per-
haps a stopgap method is to configure the migration speeds for different components
proportional to their virtual memory/storage sizes. Unfortunately, we already know
that migration time depends on a large number of dynamic run-time variables, so this
stopgap method is bound to fail (see Section 3.4).
• How to control the trade-off between application performance and migration
time? – is another popular question raised by users [VMw12, VMw11d]. Stud-
ies [WZ11, BKR11, ASR+10, VKKS11] have shown that live migration can degrade
application performance, and slowing migration down helps [BKR11]. Although
the administrator might be willing to slow down migration to some extent to im-
prove application performance, a migration task must still be completed by a certain
deadline or else other dependent tasks cannot proceed. Unfortunately, no solution
exists for managing the progress of a migration to finish at a desired time (neither
sooner nor later). Perhaps a stopgap method is to configure the data migration speed
to data size / desired time. But again, this method is bound to fail due to the dynamic run-time
variables (see Section 3.4).
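To make the stopgap concrete, the following sketch (all numbers are hypothetical) shows why a fixed speed of data size / desired time misses the deadline once the application keeps dirtying data during migration:

```python
def naive_speed(data_gb, desired_time_s):
    """Stopgap: a fixed speed of data size / desired time."""
    return data_gb / desired_time_s

def actual_finish_time(data_gb, speed_gbps, dirty_rate_gbps):
    """With the VM still writing, the effective progress rate is only
    (speed - dirty rate), so migration takes longer than planned."""
    effective = speed_gbps - dirty_rate_gbps
    if effective <= 0:
        return float("inf")  # migration never converges
    return data_gb / effective

speed = naive_speed(100.0, 1000.0)  # plan: 100 GB in 1000 s -> 0.1 GB/s
print(round(actual_finish_time(100.0, speed, 0.02)))  # 1250: 25% past the deadline
```

Because the dirty rate varies at run time, even this corrected calculation only holds for a snapshot of the workload, which is why a static setting cannot hit a target finish time.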
1.3 Lack of coordination for multi-tier application migration
Today's datacenters usually run applications in a multi-tier architecture in which presentation,
application processing, and data management functions are logically separated.
Multi-tier architectures are widely employed in virtualized cloud computing environments
because they provide a model by which developers can create flexible and reusable applications.
Cloud providers offer VMs and a wide range of features such as load
balancers, content-distribution networks, DNS hosting, etc., resulting in a complex ecosystem
of interdependent systems operating at multiple layers of abstraction [HFW+13]. By
segregating an application into tiers, developers acquire the option of modifying or adding
a specific layer, instead of reworking the entire application [wikd].
The goal in optimizing the migration of multi-tier applications is very different from the
goal in optimizing the migration of a single VM. Previous VM migration research focuses
on optimizing the migration of a single VM, with the goal of minimizing total migration
time and downtime; such techniques are insufficient for multi-tier migration because the
migration of multi-tier applications poses a unique problem.
Figure 1.3 : An example of multi-tier application migration

Given that the VMs running a multi-tier application are highly interactive, a serious
issue is that during migration, the application's performance can degrade significantly
if the dependent components of an application are split between the source and destination
sites by a high latency and/or congested network path. The goal in the migration of multi-
tier applications is to minimize the cost introduced by splitting the two components into
two distant sites. We will formulate the problem in more detail later and give an example
to illustrate it.
Figure 1.3 shows an example of migrating a 3-tier e-commerce application from one
cloud to another. The application has 4 VMs (shown as ovals) implementing a web server,
two application servers, and a database server. An edge between two components in the
figure indicates that those two components communicate with one another. Before migration,
the latency of web requests is small because all four VMs run in the same
datacenter. During the migration, the four VMs are migrated one by one in the sequence
of web server, application server 1, application server 2, and database server. While the
web server is being migrated, it still runs in the original datacenter and the latency remains small.
However, when the web server finishes migration and starts running at the destination datacenter,
the communication between the web server and the application servers goes across the wide
area network, resulting in a long latency. There is a time period during which communicating
components are split over the source and destination sites. When such a split
happens, certain inter-component communications must be conducted over the bandwidth-limited
and/or high-latency network, leading to degraded application performance. When the
application servers finish migration, the communication between the web server and the application
servers is again inside the same datacenter. However, some web requests to this service
still experience a long latency because the database server is still in the original datacenter,
and all the database requests from the application servers to the database server go across
the wide area network. At the end of the database server migration, the whole set of VMs
runs in the destination datacenter and the request latency is as small as before
migration.
Although existing live migration techniques [KVM, CFH+05, NLH05] are able to mi-
grate a single VM efficiently, those techniques are not optimized for migrating related VMs
in a multi-tier application. They either migrate the VMs in a sequential order or migrate
the VMs at the same time and ignore whether they could finish at the same time. Simply
migrating all related VMs in parallel is not enough to avoid such degradation. Specifically,
two existing migration strategies, sequential and parallel migration, may result in poor per-
formance. Sequential migration, which migrates each VM one by one, results in a long
performance degradation time from when the first VM finishes migration until the last VM
finishes migration. Parallel migration, which starts migration of multiple VMs at the same
time, is not able to avoid the degradation either. This is because the amount of data to
migrate for each VM is different, and therefore the VMs in general will not finish migration
simultaneously. The application will experience performance degradation until all VMs
have completed migration. Furthermore, if the bandwidth required for migrating all VMs
in parallel exceeds the actual available bandwidth, additional performance problems will
result.
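A back-of-the-envelope calculation (with hypothetical sizes and bandwidth) illustrates why parallel migration leaves components split: VMs with different amounts of data, given equal shares of the link, finish at very different times:

```python
def parallel_finish_times(vm_sizes_gb, link_gbps):
    """Even split of the migration link: each VM gets an equal share.
    (Simplified: shares are not re-divided as VMs finish.)"""
    share = link_gbps / len(vm_sizes_gb)
    return [size / share for size in vm_sizes_gb]

# Hypothetical 3-tier app: web, app1, app2, db (GB to migrate each)
times = parallel_finish_times([20.0, 40.0, 40.0, 100.0], 1.0)
split_window = max(times) - min(times)
print(times, split_window)  # [80.0, 160.0, 160.0, 400.0] 320.0
```

In this toy scenario the application runs split across sites for 320 seconds, the gap between the first and last finish times, even though all migrations started together.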
1.4 Contributions
The contribution of this thesis is in two parts. The first part is Pacer – the first migration
progress management system (MPMS). Pacer effectively addresses all the aforementioned
issues in Section 1.2. While many of the details of Pacer's constituent techniques will be
discussed in Chapter 3, they share the following key strengths:
• Robust and lightweight run-time measurements – Pacer is highly effective thanks
to its use of continuously measured application I/O workload intensity (both memory
and disk accesses) and the measured bottleneck migration speed (whether the
bottleneck is in the network or in the disk) at run-time. Pacer enhances the robustness
of the measurements by employing a novel randomized sampling technique (see Section 3.2.3).
Furthermore, these techniques are implemented to minimize run-time overhead as
shown in Section 3.4.
• Novel efficient & accurate analytic models for predictions – Such analytic models
are used to (1) predict the amount of remaining data to be migrated as a function of
the application's I/O workload characteristics and the migration progress, and (2) pre-
dict the finish time of a migration (i.e. addressing the first MPMS issue) as a function
of the characteristics of each migration stage (i.e. disk, dirty blocks, CPU/memory,
etc.).
• Online adaptation – Addressing the second and third MPMS issues requires certain
migration objectives to be met: the former requires multiple application components
to complete migration simultaneously; the latter requires a migration to complete at
a specific time. No fixed migration settings can successfully meet such objectives due
to run-time dynamics. Pacer continuously adapts to ensure the objectives are met. In
the former case, Pacer adapts the targeted migration finish time for all components
given what is measured to be feasible. In the latter case, Pacer adapts the migration
speed to maintain a targeted migration finish time in the face of application dynamics.
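As an illustration of the speed-adaptation idea (a simplified sketch, not Pacer's actual controller), one adaptation step can recompute the required speed from the measured remaining work and the time left until the deadline:

```python
def adapt_speed(remaining_gb, deadline_s, now_s, max_speed_gbps):
    """One adaptation step: pace migration so the measured remaining
    work finishes exactly at the deadline (capped at the bottleneck)."""
    time_left = deadline_s - now_s
    if time_left <= 0:
        return max_speed_gbps  # behind schedule: go as fast as possible
    return min(remaining_gb / time_left, max_speed_gbps)

# The workload dirties extra data, so remaining work shrinks slower than
# planned; the controller raises the speed on the next interval.
print(adapt_speed(50.0, 1000.0, 500.0, 1.0))  # 0.1 GB/s
print(adapt_speed(45.0, 1000.0, 600.0, 1.0))  # 0.1125 GB/s: sped up
```

Repeating this step at short intervals is what lets a controller absorb run-time dynamics that defeat any fixed setting.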
The second contribution is COMMA (Coordinated migration of Multi-tier Applica-
tions) – the first coordination system for the live migration of multi-tier applications. Note
that this inter-cloud migration example is not the only usage scenario for COMMA. In gen-
eral, any migration scenario that requires crossing a network with limited bandwidth and/or
high latency could benefit from using COMMA. The limited bandwidth scenario can arise
within a campus or even within a machine room.
We will discuss the general challenges for live VM migration first, and then describe
the unique challenges for multi-tier application migration.
• Convergence. VM migration runs in a shared resource environment (e.g. disk I/O
bandwidth and network bandwidth). In this thesis, we define the term "available
migration bandwidth" as the maximal speed that migration could achieve; the bottleneck
could lie either in network bandwidth or in disk I/O bandwidth. If the
available resource is not allocated properly, the migration could fail because the application
may generate new data that needs to be migrated at a pace faster than
the available migration bandwidth. For example, if the available migration bandwidth
is 10MBps but the VM generates new data at a speed of 30MBps,
migration will not converge in the dirty iteration stage and will fail. For
a single VM migration, the mechanism to handle non-convergence is either to set
a timeout to stop migration and report failure, or to throttle the write operations and
slow down the new data generation rate. All of these mechanisms hurt application
performance. For multiple VM migrations, the problem is more complicated but
also more interesting.
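The convergence condition can be sketched as follows (a simplified model using the numbers above; a real system must also account for disk I/O bottlenecks):

```python
def converges(avail_bw_mbps, dirty_rate_mbps):
    """Dirty iteration converges only if migration can copy dirty
    blocks faster than the application produces them."""
    return avail_bw_mbps > dirty_rate_mbps

def shrink_factor(avail_bw_mbps, dirty_rate_mbps):
    """Per-iteration shrink ratio of the dirty set: while one pass
    copies the current dirty data, new dirty data accumulates at
    dirty_rate / bandwidth of the copied volume."""
    return dirty_rate_mbps / avail_bw_mbps

print(converges(10, 30))                       # False: migration fails
print(converges(100, 30), shrink_factor(100, 30))  # True 0.3
```

When the ratio is below one, each dirty iteration shrinks the remaining work geometrically, which is what eventually lets the final copy fit within an acceptable downtime.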
• Dynamicity and interference. The migration time for different VMs is different because
migration time depends on many factors, some of which are dynamic. For
example, VM image size and memory size are static information, but the actual workload
and the available resources (e.g. disk I/O bandwidth and network bandwidth) are
dynamic. Assuming that migration has pre-allocated network bandwidth
which is not shared with other applications, we can leverage Pacer to predict
and control the migration time of a single VM migration.
Unique challenges for multi-tier application migration.
• Higher order control. Fundamentally, each individual VM migration process can
only be predicted and controlled to a certain extent (as shown by Pacer). Therefore,
if we continue to rely on an architecture where all VM migration processes act inde-
pendently, there is no way of achieving the multi-tier migration goal. It is necessary
to design a new architecture where a higher order control mechanism governs all VM
migration activities.
• Inter-VM-migration resource contention and allocation. For multiple VM migrations,
the convergence issue is more complicated but also more interesting. If the network
bandwidth is smaller than any single VM's new data generation rate, the problem
degenerates to sequential migration. If the network bandwidth is large enough to
migrate all VMs together, the problem is easily handled by parallel migration. When
the network bandwidth sits in between these two cases, we need mechanisms
to check whether it is possible to migrate multiple VMs at the same time, to
combine multiple VMs into groups that can be migrated together, and to schedule
the migration start and finish times of each group to achieve the goal of minimizing
communication cost.
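A minimal sketch of the feasibility check behind such grouping (dirty rates and bandwidth are hypothetical): a group of VMs can migrate concurrently only if the available migration bandwidth exceeds the sum of the members' new data generation rates:

```python
def group_feasible(dirty_rates_mbps, link_mbps):
    """A group of VMs can migrate concurrently only if the link can
    outpace their combined new-data generation rate."""
    return sum(dirty_rates_mbps) < link_mbps

# Hypothetical dirty rates for web, app1, app2 and db tiers (MBps)
rates = [5, 10, 10, 25]
print(group_feasible(rates, 100))  # all four together: feasible
print(group_feasible(rates, 40))   # must be split into smaller groups
print(group_feasible([25], 20))    # db alone exceeds the link: no convergence
```

The intermediate case, where some but not all subsets pass this check, is exactly where a scheduler must choose group combinations and an ordering.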
• Inter-VM-migration dynamicity and interference. When network bandwidth is reserved
for migration, Pacer can help to predict and control a single VM migration.
However, the problem for multi-tier VM migration is more complicated due to interference
between multiple VM migrations. When multiple VM migrations occur in
the same period, they share the available resources. Pacer has no knowledge of when
other VMs' migrations will start and finish, nor of how to proceed when there
is not enough available resource to migrate all VMs at the same time. Therefore,
simply leveraging Pacer to control the migration time of multiple VM migrations
without coordination is not the right solution.
• System design and efficiency. The computational complexity of an optimal solution
for coordinating a multi-tier application migration could be very high. It is important that the
coordination system is efficient and has low overhead. How to formulate the problem and
how to trade off efficiency against optimality are worth investigating for a good
system.
To tackle the above challenges, this thesis proposes an original migration coordina-
tion system for multi-tier applications. The system is based on a scheduling algorithm
for coordinating the migration of VMs that aims to minimize migration’s impact on inter-
component communications.
The system works in two stages. In the first stage, it coordinates the speeds of the migration
of the static data of different VMs, such that all VMs complete their static data migration
at nearly the same time. In the second stage, it coordinates the migration of dynamically
generated data by organizing the VMs into feasible migration groups and deciding the best
schedule for migrating these groups. Furthermore, the system schedules the migration of
VMs inside a group to fully utilize the available migration bandwidth. We have implemented
the proposed system and have conducted a number of preliminary experiments to
demonstrate its potential in optimizing the migration of a 3-tier application. The results are
very encouraging. Compared to a simple parallel migration strategy, our system is able to
reduce the number of data bytes affected by migration by up to 475 times.
While many of the details of COMMA's constituent techniques will be discussed in
Chapter 4.3, they share the following key strengths:
• Problem formulation and novel approach – We formulate the multi-tier application
migration problem and present a new communication-cost-driven coordinated
approach, as well as a fully implemented system on KVM that realizes this approach.
The multi-tier application migration problem is to minimize the performance degradation
caused by splitting the communicating components between the source and destination
sites during the migration. To quantify the performance degradation, we
define the unit of cost as the volume of traffic between VMs that needs to crisscross
between the source and destination sites during migration.
• Two-stage scheduling – The algorithm works in two stages. In the first stage, it
coordinates the migration speeds of the static data of the VMs so that all VMs complete
the precopy phase at nearly the same time. In the second stage, it coordinates the
migration of dynamically generated data by inter-group and intra-group scheduling.
COMMA iteratively computes and updates a desirable schedule for migrating VMs
based on both runtime VM workload characteristics and static configuration
information (i.e., memory and disk size) of each VM. The evaluation of COMMA shows
that it is able to greatly reduce migration's impact on inter-component communications.
We also demonstrate its potential in optimizing the migration of a 3-tier application.
Similar to Pacer's design, we also leverage online adaptation to ensure
the migration objectives are met according to the scheduling algorithm.
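The first-stage coordination can be sketched as follows (a simplified model, not COMMA's actual implementation; sizes and bandwidth are hypothetical): allocating each VM a share of the link proportional to its static data size makes all precopy phases end together:

```python
def precopy_speeds(static_sizes_gb, link_gbps):
    """First stage: split the link so every VM's static-data copy ends
    at the same time, i.e. speed_i proportional to size_i."""
    total = sum(static_sizes_gb)
    return [link_gbps * size / total for size in static_sizes_gb]

sizes = [20.0, 40.0, 40.0, 100.0]   # hypothetical web/app1/app2/db
speeds = precopy_speeds(sizes, 1.0)
finish = [s / v for s, v in zip(sizes, speeds)]  # all approximately 200 s
```

Equalizing the static-data finish times is what sets up the second stage, in which the dynamically generated data of the grouped VMs is drained together.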
• Efficiency with a heuristic algorithm – In order to minimize the performance degradation
cost, COMMA needs to compute the optimal group combination and migration
sequence. We propose two algorithms: a brute-force algorithm and a heuristic
algorithm. The brute-force algorithm can find the optimal solution but its computational
complexity is high. The heuristic algorithm achieves a sub-optimal result
with low computational complexity.
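The brute-force approach can be sketched as follows; the cost function and the traffic matrix here are hypothetical stand-ins for the communication-cost model:

```python
from itertools import permutations

def brute_force_order(groups, cost):
    """Brute force: try every migration order of the VM groups and keep
    the one with the lowest total cost. O(n!) -- only viable for a
    handful of groups, which motivates the heuristic alternative."""
    return min(permutations(groups), key=cost)

# Hypothetical inter-tier traffic volumes: heavily communicating pairs
# should migrate close together in the order.
traffic = {("web", "app"): 10, ("app", "db"): 30}

def cost(order):
    pos = {g: i for i, g in enumerate(order)}
    return sum(vol * abs(pos[a] - pos[b]) for (a, b), vol in traffic.items())

best = brute_force_order(["web", "app", "db"], cost)
print(best, cost(best))  # ('web', 'app', 'db') 40
```

A heuristic would instead build the order greedily (e.g. placing the heaviest-communicating pairs adjacently first), trading a possibly sub-optimal cost for polynomial running time.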
1.5 Thesis Organization
The rest of this thesis is organized as follows. Chapter 2 introduces the background
on live migration. Chapter 3 presents the constituent techniques in Pacer for migration
progress management, including the migration time model, key algorithms, and system
design. It also presents experimental results demonstrating the capability and benefits
of Pacer. Chapter 4 presents the techniques in COMMA for coordinating the migration
of multi-tier applications. It also shows experiments conducted to demonstrate how
COMMA performs in migrating a 3-tier application. Chapter 5 summarizes the contributions
of the thesis and discusses future work.
Chapter 2
Background
2.1 Process Migration
During the 1980s, process migration attracted significant attention in systems research
[PM83, TLC85, JLHB88, DO91, BL98]. Process migration is the relocation of
a process from the processor on which it is executing to another processor.
Process migration proved to be a difficult feature to implement in operating systems
until 1982, when Powell et al. implemented process migration in the DEMOS/MP operating
system. In that system, a process can be moved during its execution and continue on another
processor, with continuous access to all its resources. Messages are correctly delivered to the
process's new location, and message paths are quickly updated to take advantage of the
process's new location. The kernel can participate in message send and receive operations
in the same manner as a normal process [PM83].
Theimer et al. leveraged process migration with network transparency in the design of a
remote execution facility which allows a user of a workstation-based distributed system to
offload programs onto idle workstations, thereby providing the user with access to computational
resources beyond those provided by his personal workstation [TLC85].
Jul et al. proposed fine-grained mobility for small data objects (such as arrays, records,
and integers) as well as objects with processes. Thus, the unit of mobility can be
much smaller than in process migration systems, which typically move entire address
spaces [JLHB88].
Douglis et al. designed an automatic system for transparent process migration in the Sprite
operating system. It can automatically identify idle machines, use the process migration
mechanism to offload work onto idle machines, and evict migrated processes when idle
workstations are reclaimed by their owners [DO91].
Barak et al. used preemptive process migration for load balancing and memory ushering,
in order to create a convenient multi-user time-sharing execution environment for
HPC, particularly for applications written in PVM or MPI [BL98].
Process migration enables dynamic load distribution, fault resilience, eased system administration,
and data access locality. Despite these goals and ongoing research efforts,
migration has not achieved widespread use [MDP+00]. Milojicic et al. [MDP+00] give
a thorough survey of possible reasons for this, including the problem of the residual dependencies
that a migrated process retains on the machine from which it migrated. Clark
et al. [CFH+05] point out that the residual dependency problem cannot easily be solved
in any process migration scheme – even modern mobile run-times such as Java and .NET
suffer from problems when a network partition or machine crash causes class loaders to fail.
The migration of entire operating systems inherently involves fewer or zero such dependencies,
making it more resilient and robust. Virtualization thus led to techniques for virtual
machine live migration.
2.2 Live Migration of Virtual Machine
Live migration provides the capability to move VMs from one physical location to another
while the VMs are still running, without any perceived degradation. It is called "live"
migration because it incurs downtime of only tens of milliseconds. Many hypervisors
support live migration within the LAN [NLH05, CFH+05, Red09, WSVY07, JDWS09,
HG09]; this usually requires that the two physical machines have shared storage. However,
migrating across the wide area presents more challenges, specifically because of the large
amount of data that needs to be migrated under limited network bandwidth. Enabling
live migration over the wide area requires full migration of a virtual machine, which includes
the VM's storage state, CPU state, memory state, and network connections.
Memory migration and network connection migration for the wide area have been
demonstrated to work well [BKFS07, WSG+09]. However, storage migration inherently
faces significant performance challenges because of its much larger size compared
to the size of the memory. We will introduce live migration with shared storage in the
following section. Then we will discuss live storage migration under different models.
Finally, we will summarize the optimization techniques for live migration in different
aspects.
2.2.1 VM Memory/CPU Migration
In the early stage, VM migration technologies focused only on capturing and transferring
the run-time, in-memory state of a VM in a LAN. They assume that all physical machines
involved in a migration are attached to the same SAN or NAS server.
Clark et al. [CFH+05] implemented a live migration system on Xen [XEN09] for
local-area migration. It migrates the memory and CPU state of a VM without support for
migrating local block devices. During the memory migration, a pre-copy approach is used
in which pages of memory are iteratively copied from the source machine to the destination
host. Page-level protection hardware is used to ensure a consistent snapshot is transferred.
The final phase pauses the virtual machine, copies any remaining pages to the destination,
and resumes execution there.
VMware also released a live migration function called VMotion [NLH05] in their virtual
center management software. The approach is generally similar to the previous one;
it relies on storage area networks or NAS to migrate connections to SCSI devices.
2.2.2 Network Connection Migration
For live migration in the local network, a virtual Ethernet network interface card (VNIC)
is provided as part of the virtual platform. Each VNIC is associated with one or more physical
NICs. Since each VNIC has its own MAC address that is independent of the physical
NIC’s MAC address, virtual machines can be moved while they are running between ma-
chines and still keep network connections alive as long as the new machine is attached to
the same sub-net as the original machine [NLH05].
21
Wood et al. [WSKdM11] use existing VPN technologies to provide a live migration
infrastructure across the wide area network. They present a new signaling protocol that allows
endpoint reconfiguration actions, which currently take hours or days, to be performed in tens
of seconds.
When migration takes place between servers in different networks, the migrated VM
has to obtain a new IP address, and thus existing network connections break. Bradford
et al. [BKFS07] use a temporary network redirection scheme to overcome this by combining
IP tunneling with dynamic DNS.
2.2.3 Storage Migration
KVM, VMware and Xen are three popular virtualization platforms. They use different
approaches to migrate the storage of a VM.
Snapshot: The approach based on snapshots was introduced in VMware ESX
3.5 [MCGC11]. The migration begins by taking a snapshot of the base disk, and all new
writes are sent to this snapshot. Concurrently, the approach copies the base disk to the des-
tination volume. After finishing copying the base disk, another snapshot is taken, and then
the approach consolidates the first snapshot into the base disk. This process is repeated
until the amount of data in the snapshot becomes lower than a threshold. Finally the VM
is suspended and the final snapshot is consolidated into the destination disk, and the VM
is resumed on the destination volume. This approach has two major limitations. Firstly,
migration using snapshots is not atomic. Secondly, there are performance and space costs
associated with running a VM with several levels of snapshots.
Dirty Block Tracking: This approach is based on iterative copying with a dirty block
tracking mechanism. It is widely used in KVM and VMware ESX 4.0, and refined in ESX
4.1 [KVM, MCGC11]. It uses a bitmap to track modified blocks on the source disk and
iteratively copies those blocks to the destination disk. The process is repeated until the number
of dirty blocks falls below a threshold or the number remaining at each cycle stabilizes.
VMware live storage migration and live migration with shared storage are two separate functions;
at this point, VMware live storage migration suspends the VM and copies the remaining dirty
blocks, while KVM live migration starts memory migration at the end of storage migration.
Dirty block tracking overcomes the limitations of snapshots. It makes new optimizations
possible and guarantees an atomic switch-over between the source and destination
volumes [MCGC11].
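The iterative copy loop can be sketched as follows (a simplified simulation, not KVM's or VMware's actual implementation; the workload model `write_block` is hypothetical):

```python
def dirty_block_migration(num_blocks, write_block, threshold):
    """Iterative copy with a dirty bitmap (sketch). write_block(i) is a
    hypothetical workload model returning the set of blocks the VM
    dirties while block i is being transferred; dirtied blocks are
    re-queued for the next round."""
    dirty = set(range(num_blocks))      # first round copies the whole disk
    rounds = 0
    while len(dirty) > threshold:
        copying, dirty = dirty, set()
        for block in sorted(copying):   # transfer `block` ...
            dirty |= write_block(block) # ... and log concurrent writes
        rounds += 1
    return dirty, rounds  # final dirty set is copied after suspending the VM

# Hypothetical workload: the VM keeps rewriting a small hot set of blocks
hot = {0, 1, 2}
remaining, rounds = dirty_block_migration(
    1000, lambda i: hot if i == 500 else set(), threshold=5)
print(remaining, rounds)  # {0, 1, 2} 1
```

The loop terminates as soon as the dirty set fits under the threshold; a write-intensive workload whose dirty set never shrinks is exactly the non-convergence case discussed earlier.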
IO Mirroring: VMware ESX 5.0 uses synchronous IO mirroring in live storage mi-
gration. It works by mirroring all new writes from the source to the destination concurrent
with a bulk copy of the base disk [MCGC11].
Xen does not support storage migration. In order to support storage migration in Xen,
a solution is proposed in [WSKdM11] to integrate the DRBD storage migration system
into Xen. The solution employs a hybrid technique combining dirty block tracking and I/O
mirroring. Our system is implemented on KVM, but the proposed models and algorithms
are general to VMware ESX 4.0 and DRBD on Xen. For VMware ESX 5.0, I/O mirroring
would cause more traffic through the network. The traffic can be estimated by monitoring
the application workload. Adapting Pacer to I/O mirroring migration systems could be an
area of future work.
In this thesis, our migration time model is based on the dirty block tracking approach
which is the most popular approach across different virtualization platforms. However, it
is easy to adapt our time model and our system to the IO Mirroring approach. We discuss
this in Section 3.
2.2.4 Full VM Migration
For wide-area migration, common network-attached storage accessible by both the
source and destination servers is not available [BKFS07]. Therefore, live migration of the
VM's local storage state is necessary. Previous work in storage migration can be classified
into three migration models: pre-copy, post-copy and pre+post-copy. In the pre-copy model,
storage migration is performed prior to memory migration, whereas in the post-copy model,
the storage migration is performed after memory migration. The pre+post-copy model is a
hybrid of the first two models.
In the pre-copy model as implemented by KVM [KVM10] (a slightly different variant
is also found in [BKFS07]), the entire virtual disk file is copied from beginning to end
prior to memory migration. During the virtual disk copy, all write operations to the disk
are logged. The dirty blocks are retransmitted, and new dirty blocks generated during this
time are again logged and retransmitted. This dirty block retransmission process repeats
until the number of dirty blocks falls below a threshold, then memory migration begins.
During memory migration, dirty blocks are again logged and retransmitted iteratively. The
strength of the pre-copy model is that VM disk read operations at the destination have
good performance because blocks are copied over before the time when the VM starts
running at the destination. However, the pre-copy model has weaknesses. Firstly, pre-
copying may introduce extra traffic. Some transmitted blocks will become dirty and require
retransmissions, resulting in extra traffic beyond the size of the virtual disk. Secondly, if
the I/O workload on the VM is write-intensive, write-throttling is employed to slow down
I/O operations so that iterative dirty block retransmission can converge. While throttling is
useful, it can degrade application I/O performance.
In the post-copy model [HNO+09, HON+09], the storage migration is executed after
the memory migration completes and the VM is running at the destination. Two mecha-
nisms are used to copy disk blocks over: background copying and remote read. In back-
ground copying, the simplest strategy proposed by Hirofuchi et al. [HON+09] is to copy
blocks sequentially from the beginning of a virtual disk to the end. During this time if
the VM issues an I/O request, it is handled immediately. If the VM issues a write oper-
ation, the blocks are directly updated at the destination storage. If the VM issues a read
operation and the blocks have yet to arrive at the destination, then on-demand fetching is
employed to request those blocks from the source. We call such operations remote reads.
With the combination of background copying and remote reads, each block is transferred at
most once, ensuring that the total amount of data transferred for storage migration is min-
imized. However, remote reads incur extra wide-area delays, resulting in I/O performance
degradation.
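The post-copy read/write paths can be sketched as follows (a simplified model; `PostCopyDisk` and its block store are hypothetical, not the actual implementation of [HON+09]):

```python
class PostCopyDisk:
    """Post-copy storage sketch: the VM already runs at the destination;
    writes land there directly, and reads of not-yet-copied blocks
    trigger a remote read from the source -- each block moves at most once."""
    def __init__(self, source_blocks):
        self.source = source_blocks   # hypothetical source-side block store
        self.local = {}               # blocks present at the destination
        self.remote_reads = 0

    def background_copy(self, block_id):
        # sequential background copying calls this for each block in turn
        self.local.setdefault(block_id, self.source[block_id])

    def write(self, block_id, data):
        self.local[block_id] = data   # new writes update the destination only

    def read(self, block_id):
        if block_id not in self.local:           # on-demand remote read
            self.remote_reads += 1
            self.local[block_id] = self.source[block_id]
        return self.local[block_id]

disk = PostCopyDisk({0: "boot", 1: "data", 2: "log"})
disk.background_copy(0)
disk.read(0)              # already local: no remote read
disk.read(2)              # not yet copied: fetched from the source
disk.read(2)              # cached now: still only one remote read
print(disk.remote_reads)  # 1
```

The sketch shows both properties described above: total transfer is bounded by the disk size, but each cache-miss read pays a wide-area round trip.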
In the hybrid pre+post-copy model [LZW+08], the virtual disk is copied to the desti-
nation prior to memory migration. During disk copy and memory migration, a bit-map of
dirty disk blocks is maintained. After memory migration completes, the bit-map is sent to
the destination where a background copying and remote read model is employed for the
dirty blocks. While this model still incurs extra traffic and remote read delays, the amount
of extra traffic is smaller compared to the pre-copy model and the number of remote reads
is smaller compared to the post-copy model.
The post-copy and the pre+post-copy models can potentially reduce network traffic, but
they cannot recover from a network failure during the migration. Therefore, widely used
systems such as KVM [KVM], Xen [CFH+05], and VMware [NLH05, svm] are all based
on the pre-copy model.
Due to the drawbacks of the post-copy and pre+post-copy models described above,
the system design in this thesis is based on the pre-copy model. We implement a complete
system on KVM; the ideas are also easy to apply to other hypervisors.
2.3 Optimization of Live Migration
Many optimization techniques for virtual machine live migration have been proposed. We
will discuss how these techniques could be integrated into our system in the future work
(Chapter 5).
2.3.1 Compression and Deduplication
Sapuntzakis et al. [PCP+02] developed techniques to reduce the amount of data sent over
the network: copy-on-write disks track just the updates to VM disks, “ballooning” zeros
unused memory, demand paging fetches only needed blocks, and hashing avoids sending
blocks that already exist at the remote end.
Jin et al. [JDW+09] use memory compression to provide fast, stable virtual machine
migration. Based on memory page characteristics, they design an adaptive zero-aware
compression algorithm for balancing the performance and the cost of virtual machine
migration.
Hacking et al. [HH09] propose similar ideas for leveraging compression in the live
migration of large enterprise applications.
Zhang et al. [ZHMM10] introduce data deduplication into migration by utilizing the
self-similarity of the run-time memory image and hash-based fingerprints. Their system
employs run-length encoding to eliminate redundant memory data during migration.
In this thesis, we do not apply any compression or deduplication in our time model. Our
algorithm and system target migration systems without compression and deduplication.
We will adapt our time model to systems with compression and deduplication in
the future.
2.3.2 Reordering Migrated Block Sequence
Our previous research on live storage migration shows that existing solutions for wide-area
migration incur too much disruption, as they significantly slow down storage I/O
operations during migration. The resulting increase in service latency could be very costly
to a business. We proposed a novel storage migration scheduling algorithm [ZNS11] to
improve storage I/O performance during wide-area migration. Our algorithm is unique in
that it considers an individual virtual machine's storage I/O workload characteristics, such
as temporal locality, spatial locality, and popularity, to compute an efficient data transfer
schedule. Using a fully implemented system on KVM and a trace-driven framework, we
show that our algorithm provides large performance benefits across a wide range of popular
virtual machine workloads.
VMware also presents a similar optimization that detects frequently written blocks
and defers copying them [MCGC11]. It examines the distribution of disk I/O repetition
and enables a multi-stage filter for hot blocks.
As in the above discussion, we do not apply the block sequence reordering algorithm
in this thesis. Our algorithm and system target the general migration system with
the default block migration sequence. We will adapt our time model to systems with a
block reordering algorithm in future work.
Chapter 3
Migration Time Prediction and Control
3.1 Overview
Figure 3.1 presents an overview of the design of Pacer. The three ovals represent the three
management functions in Pacer. The rectangles represent key modules. Migration time
prediction and migration time control share three key modules. Migration time control is
supported by three additional modules. Coordination of concurrent migration leverages
prediction and time control.
3.1.1 Predicting Migration Time
Predicting migration time is a challenging problem as it depends on many dynamic fac-
tors such as application workload, competing resource consumption by other virtual and
physical machines, and network performance. We show in Section 3.4.3.1 that a back-of-
the-envelope estimate based on image and memory size, and network bandwidth is inaccu-
rate. Pacer addresses several significant technical challenges in order to accurately predict
migration time for full VM migration.
• Migration time model for each migration phase: To predict migration time, Pacer
models migration behavior for each individual phase of migration such as disk copy,
dirty iteration, memory migration, etc. We develop a set of equations that quantita-
tively capture the relationship between time and speed. By solving these equations
based on observed conditions, Pacer can accurately predict migration time.
• Detailed remaining work estimation: In order to predict the migration time, it is
crucial to determine the remaining number of bytes that need to be migrated, because
both memory pages and disk blocks can be written to by the VM after they have
been copied to the destination. Any of these that have changed at the source after
they have been copied over are called dirty blocks and will need to be re-copied in
the dirty iteration phase. The total number of dirty pages/blocks is not known before
migration completes as it depends on how the application is accessing memory and
storage (i.e., the application workload).

Figure 3.1 : Pacer design overview.
• Speed measurement: Prediction of migration time also depends on how fast migration
can run. It is crucial to perform smooth and robust speed measurement during mi-
gration.
3.1.2 Controlling Migration Time
To control migration time, we also need to address the following challenges.
• Speed tuning: The interference brought in by the migrated VM’s workload and other
additional competing workloads may degrade the migration speed. Solutions that do
not consider the interference and assume the real migration speed exactly matches the
configured migration speed may not finish in time due to the lower achieved speed in
reality. Pacer incorporates an algorithm that strives to reach the desired migration
speed despite the interference, and an algorithm to estimate the maximal migration
speed that can be realized under the interference.
• Adaptation: The VM’s disk I/O workload and additional competing disk I/O work-
loads can change dynamically throughout migration. For example, when the VM’s
disk writing pattern changes, the previous dirty block prediction may no longer be
accurate. For another example, when the additional competing disk I/O workloads
vary, the maximal feasible migration speed which can be realized may change. Pacer
is thus designed to be continuously adaptive to address such workload dynamics.
3.2 Predicting Migration Time
Pacer performs predictions periodically (default configuration is every 5 seconds). To pre-
dict the remaining time during the migration, three things must be known: (1) what oper-
ation is performed in each phase of migration, (2) how much data there is to migrate, (3)
how fast the migration is progressing. This section will address these three issues. In the
formulas, we use bold font for constants and regular font for variables.
3.2.1 Migration Time Model
The total migration time T can be modeled as the sum of four distinct parts: tPrecopy for the pre-copy
phase, tDirtyIteration for the period after pre-copy but before memory migration, tMemory for
the period from the beginning of the memory migration until the time the VM is suspended,
and TDowntime for a small downtime needed to copy the remaining dirty blocks and dirty
pages once they drop below a configured threshold. TDowntime is considered a constant
because the remaining data to be migrated is fixed (e.g. downtime is 30ms in KVM).

                           Phase 1            Phase 2                   Phase 3   Phase 4
Content                    Storage Precopy    Storage Dirty Iteration   Memory    Remaining Storage, Memory,
                                                                                  CPU, Network Connections
Amount of migrated data    Known (DISK SIZE)  Unknown                   Unknown   Known (Threshold)
Speed                      Measure            Measure                   Measure   -

Table 3.1 : Migrated data and speed in four phases of migration
T = tPrecopy + tDirtyIteration + tMemory + TDowntime (3.1)
For the pre-copy phase, we have:
tPrecopy = DISK SIZE / speedPrecopy    (3.2)
where DISK SIZE is the VM virtual disk size obtained directly from the VM configuration
and speedPrecopy is the migration speed for the pre-copy phase.
At the end of pre-copy, a set of dirty blocks need to be migrated. The amount is de-
fined as DIRTY SET SIZE. The variable is crucial to the prediction accuracy during the
dirty iteration. However, its exact value is unknown until the end of pre-copy phase. It
is very challenging to know the dirty set size ahead of time while the migration is still in
the pre-copy phase, and there is no previous solution. We propose a novel algorithm in
Section 3.2.2 to solve this problem.
In the dirty iteration, while dirty blocks are migrated and cleaned, the clean blocks may
be overwritten concurrently and get dirty. The number of blocks getting dirty per second
is called the dirty rate. The dirty rate depends on the number of clean blocks (fewer clean
blocks means fewer blocks can become dirty later) and the workload of the VM. Similar to
the need for dirty set size estimation, we need to predict the dirty rate while migration is
Name Description
T a given migration time
DISK SIZE the size of the VM disk storage
MEM SIZE the size of configured memory for VM
phase the phase of migration, PRE-COPY, DIRTY-ITERATION or MEMORY-MIGRATION
remain time the remaining time before migration deadline
past time the time past since the beginning of migration
remain precopy size the size of remaining disk data in the pre-copy phase
dirty dsize the actual amount of dirty blocks so far
remain msize the size of remaining memory to be copied
dirty set size the estimated size of dirty set at the end of pre-copy
dirtyrate disk the rate of blocks dirtied in the dirty iteration
dirtyrate mem the rate of memory dirtied in memory migration
speed next expected the expected speed for next round
speed expected the expected speed for this round
speed observed the observed speed in this round
speed scaleup flag the flag to indicate whether speed is scaled up
interval the time for each round, e.g. 5s
estimated max speed the estimated maximal speed for migration
FULL SPEED a configured extremely high speed to exhaust the disk I/O bandwidth
NETWORK SPEED the available network bandwidth
Table 3.2 : Variable definitions.
still in pre-copy. We propose an algorithm in Section 3.2.2 to predict the average dirty rate
(AVE DIRTY RATE). Then, we have
tDirtyIteration = DIRTY SET SIZE / (speedDirtyIteration − AVE DIRTY RATE)    (3.3)
where speedDirtyIteration is the migration speed for the dirty iteration phase.
Memory migration typically behaves similarly to the storage migration dirty itera-
tion. All memory pages are first marked dirty, then dirty pages are iteratively migrated
and cleaned, while pages can become dirty again after being written. We propose an
algorithm in Section 3.2.2 that is effective for estimating the average memory dirty rate
(AVE MEM DIRTY RATE).
During memory migration, different systems have different behaviors. For KVM, the
VM still accesses storage in the source and disk blocks could get dirty during memory mi-
gration. Thus, in KVM, memory migration and storage dirty iteration may take turns. Then,
denoting the size of memory as MEM SIZE and memory migration speed as speedMemory,
we have
tMemory = MEM SIZE / (speedMemory − AVE MEM DIRTY RATE − AVE DIRTY RATE)    (3.4)
Other variants: The previous derivation assumes some behaviors that are specific to
KVM. However, the model can readily be adapted to other systems. As an example, for
VMware, storage migration and memory migration are two separate tasks. At the end of
storage dirty iteration, the VM is suspended and the remaining dirty blocks are copied to
destination. From then on, storage I/O requests go to the destination storage so no more
dirty blocks are generated, but the VM’s memory and CPU are still at the source so stor-
age I/O accesses are remote until migration completes. The speed for memory migration
in VMware would be lower than that in KVM because the network bandwidth is shared
between migration and remote I/O requests. Therefore, for VMware, Equation 3.4 is
adjusted as follows:
tMemory = MEM SIZE / (speedMemory − AVE MEM DIRTY RATE)    (3.5)
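For concreteness, the per-phase equations can be combined into a single estimate. The following sketch is our own illustration of Equations 3.1–3.5 (the function name and argument names are ours, not part of Pacer); it assumes the speeds and dirty rates have already been measured, with all quantities in consistent units (e.g. MB and MB/s).

```python
def predict_migration_time(disk_size, mem_size, speed, net_speed,
                           dirty_set_size, ave_dirty_rate,
                           ave_mem_dirty_rate, downtime, kvm=True):
    """Estimate total migration time following Equations 3.1-3.5.

    speed     : migration speed for pre-copy and dirty iteration
    net_speed : network bandwidth available for memory migration
    """
    t_precopy = disk_size / speed                                  # Eq. 3.2
    t_dirty_iter = dirty_set_size / (speed - ave_dirty_rate)       # Eq. 3.3
    if kvm:
        # KVM: disk blocks can still get dirty during memory migration.
        t_memory = mem_size / (net_speed - ave_mem_dirty_rate
                               - ave_dirty_rate)                   # Eq. 3.4
    else:
        # VMware-style: storage is already switched to the destination.
        t_memory = mem_size / (net_speed - ave_mem_dirty_rate)     # Eq. 3.5
    return t_precopy + t_dirty_iter + t_memory + downtime          # Eq. 3.1
```

Note that the same routine covers both variants: only the memory-migration term changes between the KVM and VMware models.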
The above migration time model describes how the time is spent on each phase of live
migration. The next question to address is the amount of data to migrate.
3.2.2 Dirty Set and Dirty Rate Estimation
Migrated data consists of two parts. The first part is the original disk and memory, the size
of which is known. The second part is the generated dirty blocks and dirty pages during
migration, the size of which is unknown. We now present algorithms for predicting this
unknown.
Disk dirty set estimation: Dirty block tracking is based on the block size configured
in the migration system (1MB in KVM). For each block, we record the average write
interval, the variance of write interval (used in dirty rate estimation), and the last written
time. When a write operation is issued, Pacer updates the record for all the blocks accessed
by the operation.
The estimated dirty set consists of three subsets. SET1 contains the migrated blocks that
are already dirty. SET2 contains the migrated blocks that are clean right now but are
estimated to get dirty before the end of pre-copy. SET3 contains the non-migrated blocks
that are estimated to get dirty after their migration time and before the end of pre-copy.
FUNCTION getEstimatedDirtyBlockSet(remain precopy size,
speed expected)
SETDirty = {}
SET1 = {blocki| already migrated and marked as dirty }
Tend = current time + remain precopy size / speed expected
SET2 = {blocki | already migrated and marked as clean}
       ∩ {blocki | ∃k : tlast written(blocki) + k · ave write interval(blocki) ∈ [current time, Tend]}  // k is a positive integer
SET3 = {blocki|not migrated yet}
Estimate the expected migration time ti for each blocki ∈ SET3
SET3 = SET3 ∩ {blocki | ∃k : tlast written(blocki) + k · ave write interval(blocki) ∈ [ti, Tend]}
SETDirty = SET1 ∪ SET2 ∪ SET3
return SETDirty
An example is shown in Figure 3.2. The first 4 blocks are already migrated to the
destination. t1 is the current time when the dirty set estimation algorithm is invoked, and t2 is
the estimated pre-copy finish time. Among the migrated blocks, block 2 is known to be
dirty and is in SET1. Block 4 is migrated and is clean so far, but we estimate that it will
get dirty before t2, so block 4 is in SET2. Among the non-migrated blocks, block 5 was
accessed before, and we predict that it will be written after its migration time and before
t2. Block 5 is in SET3. Thus the dirty set is {2, 4, 5}.
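The three-set construction can be exercised on a toy model. The sketch below is our simplified rendering of getEstimatedDirtyBlockSet: it predicts whether a block's next write, extrapolated from its average write interval, falls inside the relevant window. The per-block migration times for SET3 are taken as an assumed input here rather than computed from the stripe order, and all names are ours.

```python
def will_be_written(last_written, ave_interval, start, end):
    """Predict whether some future write (extrapolated as
    last_written + k * ave_interval, k = 1, 2, ...) falls in [start, end]."""
    if not ave_interval or ave_interval <= 0:
        return False  # inactive block: average interval unknown or infinite
    t = last_written
    while t <= end:
        t += ave_interval
        if start <= t <= end:
            return True
    return False

def estimate_dirty_set(blocks, now, t_end):
    """blocks: id -> (migrated, dirty, last_written, ave_interval, mig_time).
    Returns the union SET1 | SET2 | SET3 of predicted dirty blocks."""
    result = set()
    for bid, (migrated, dirty, last_w, ivl, t_mig) in blocks.items():
        if migrated and dirty:
            result.add(bid)                                        # SET1
        elif migrated and will_be_written(last_w, ivl, now, t_end):
            result.add(bid)                                        # SET2
        elif not migrated and will_be_written(last_w, ivl, t_mig, t_end):
            result.add(bid)                                        # SET3
    return result
```

With block states chosen to mirror Figure 3.2 (block 2 dirty, block 4 predicted to be rewritten, block 5 predicted to be written after its migration time), the function returns the dirty set {2, 4, 5}.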
Disk dirty rate estimation: We develop an analytic model of the dirty iteration to
estimate disk dirty rate. Let t be the time budgeted for dirty iteration. Consider the state
of the disk at the beginning of the dirty iteration. Let N be the number of dirty blocks
in SETDirty and M be the number of clean blocks in SETClean, and let dblocki be the
i-th block in the dirty set and cblocki be the i-th block in the clean set. Abstractly, during
each time interval t′ = t/N , Pacer needs to perform the work to migrate one of the N dirty
blocks and any newly generated dirty blocks during this time interval. In the first interval t′,
dblock1 is migrated. The expected number of newly generated dirty blocks that are assumed
to be cleaned immediately during this first interval (D1) is computed as follows:
Figure 3.2 : An example of disk dirty set estimation.
D1 = Σ t′ / ave write interval(blocki),   ∀ blocki ∈ SETClean ∪ {dblock1}    (3.6)
Note that dblock1 is included because it becomes clean. In general, the expected number
of newly generated dirty blocks during the k-th interval is as follows:
Dk = Σ t′ / ave write interval(blocki),   ∀ blocki ∈ SETClean ∪ {dblock1, dblock2, ..., dblockk}    (3.7)
Thus, the average dirty rate can be computed as follows:
AVE DIRTY RATE = (Σ_{i=1}^{N} Di / t) · BLOCKSIZE
               = Σ_{i=1}^{M} BLOCKSIZE / ave write interval(cblocki)
               + Σ_{k=1}^{N} ((N + 1 − k) · BLOCKSIZE) / (N · ave write interval(dblockk))    (3.8)
Our previous research on I/O characteristics in typical virtualization workloads
[ZNS11] shows that the disk write rate is stable over long time scales. Therefore,
the disk dirty rate prediction is able to perform well. To further optimize the algorithm,
we add the following mechanism to remove inactive blocks from dirty rate calculation.
For simplicity, assume the write intervals of a block follow a normal distribution [SGM90]
∼ N(µ, σ). The probability that the next arrival time is in [µ − 2σ, µ + 2σ] is about 95%. Therefore, if the time since the last write is already longer than 2σ for a block, that block can
be safely considered inactive. The average write interval for such a block is set to infinity.
This mechanism significantly improves the accuracy of dirty rate prediction.
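Equation 3.8 together with the inactivity filter can be sketched as follows. This is our own illustration (names and the per-block tuple layout are assumptions, not Pacer's code); dirty blocks are passed in migration order so the (N + 1 − k)/N weighting applies.

```python
def ave_disk_dirty_rate(clean_blocks, dirty_blocks, block_size, now):
    """Equation 3.8 with the 2-sigma inactivity filter.

    clean_blocks / dirty_blocks: lists of (last_written, ave_interval, sigma);
    dirty_blocks must be ordered by their migration sequence."""
    def rate(block):
        last_written, ave_interval, sigma = block
        if now - last_written >= 2 * sigma:
            return 0.0  # inactive: treat the write interval as infinite
        return block_size / ave_interval

    total = sum(rate(b) for b in clean_blocks)
    n = len(dirty_blocks)
    # The k-th dirty block only contributes after it is cleaned, i.e. for
    # (N + 1 - k) of the N sub-intervals of the dirty iteration on average.
    for k, b in enumerate(dirty_blocks, start=1):
        total += (n + 1 - k) * rate(b) / n
    return total
```

Moving `now` far past the last writes makes every block inactive and drives the estimated rate to zero, which is exactly the effect of setting the average write interval to infinity.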
FUNCTION getDirtyRateDisk(dirty set)
SETDirty = dirty set
N =the cardinality of SETDirty
SETClean = all written blocks − SETDirty
For each blocki ∈ SETClean ∪ SETDirty
IF(current time − tlast written(blocki) < 2 · σ(blocki))
dirtyrate(blocki) = blocksize / ave write interval(blocki)
ELSE
dirtyrate(blocki) = 0
ENDIF
END FOR
Sort SETDirty by block id sequence
dirtyrate = (Σ dirtyrate(blocki)   ∀blocki ∈ SETClean)
          + (Σ_{k=1}^{N} (N + 1 − k) · dirtyrate(blockk) / N   ∀blockk ∈ SETDirty)
return dirtyrate

Figure 3.3 : An example of sampling for memory dirty rate estimation
Memory dirty rate estimation:
Figure 3.4 : Trade-off of sampling interval

The disk dirty rate estimation algorithm would incur high overhead if it were applied to
memory dirty rate estimation. Therefore, we propose a sampling-based algorithm to trade
precision for reduced overhead. The idea is that Pacer periodically takes a snapshot of the
dirty bitmap of memory pages, resets the dirty bitmap, and updates two types of informa-
tion. Figure 3.3 shows an example of a bitmap for 9 memory pages. In the example, 5
pages are written during the interval. Two types of information are updated. The first is a
cumulative write access counter for each page. If a page is written to during this period,
this counter is incremented. The second is the number of unique written pages u during
this period, obtained by counting the number of set bits. In the example, the write access
counters for pages 2, 4, 5, 8, and 9 are incremented by 1. With this information, we can
estimate the average dirty rate as follows. We define the access ratio for each page i as

access ratio(i) = write access counter(i) / Σ_{j ∈ all pages} write access counter(j)
Denote the sampling interval by ts; the rate at which unique written pages are
generated is then u/ts. This rate is an upper bound for the true dirty page rate, and it corresponds
to the worst case scenario where all pages were clean at the beginning of the interval. With
access ratio representing the contribution of a page to the overall dirty rate, the dirty rate
for page i can be estimated as d(i) = (u/ts) · access ratio(i). Similar to the analysis for the
disk dirty iteration, when we migrate the n-th page, the dirty rate is Σ_{i=1}^{n} d(i). The average
dirty rate is therefore (Σ_{k=1}^{N} Σ_{i=1}^{k} d(i)) / N, where N is the total number of memory pages.
The selected sampling interval affects the accuracy of the estimation. For example, if we sample at 2s and there is a page written every one second, its estimated dirty rate
is lower than the real dirty rate. A way to increase the accuracy is to reduce the sampling
interval in consecutive rounds and see whether the estimated dirty rate increases. If the
dirty rate increases, the sampling interval is reduced further until the rate stabilizes or the
interval meets a configured minimal interval. Figure 3.4 shows the tradeoff for sampling
interval between accuracy and overhead. In Pacer, the sampling interval starts at 2s and is
reduced by half if needed. To bound the overhead, we set a minimum sampling interval to
0.25s.
FUNCTION getDirtyRateMEM()
∀ page i in memory
access ratio(i) = write access counter(i) / Σ_{j ∈ all pages} write access counter(j)
d(i) = (u / ts) · access ratio(i)
ave dirty rate = (Σ_{k=1}^{N} Σ_{i=1}^{k} d(i)) / N
IF(ts > MIN SAMPLE TIME
&& ave dirty rate > previous dirty rate)
ts = ts / 2
previous dirty rate = ave dirty rate
END IF
return ave dirty rate
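The core of the sampling estimator can be sketched as below. This is our own simplification (names are assumptions): it computes the per-page rates d(i) from one sampling interval's counters and averages the partial sums over all pages, leaving the adaptive interval halving aside.

```python
def mem_dirty_rate(write_counters, unique_pages, sample_interval):
    """Sampling-based memory dirty rate estimate.

    write_counters : cumulative per-page write-access counters
    unique_pages   : u, unique pages written in the last interval
    sample_interval: ts, the sampling interval in seconds
    Returns the estimated average dirty rate in pages per second."""
    total_writes = sum(write_counters)
    if total_writes == 0:
        return 0.0
    upper_bound = unique_pages / sample_interval          # u / ts
    # Per-page dirty rate: d(i) = (u / ts) * access_ratio(i)
    d = [upper_bound * c / total_writes for c in write_counters]
    # Average over all pages of the partial sums of d (see the text above).
    running, acc = 0.0, 0.0
    for rate in d:
        running += rate          # dirty rate while migrating this page
        acc += running
    return acc / len(d)
```

In a real system the counters would come from snapshots of the dirty bitmap taken every ts seconds, as described above.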
3.2.3 Speed Measurement
Smoothing measurements: In each interval, we measure the migrated data and compute
the average actual speed during the interval. In order to smooth out short time scale vari-
ations of the measured actual speed, we apply the commonly used exponential smoothing
average method to update the measured actual speed. The smoothing weight α represents
the degree of weighting decrease, a constant smoothing factor between 0 and 1. A lower
α discounts older observations faster and does not smooth out short-term fluctuations well.
We ran some experiments to test α in [0.5, 0.9] and found 0.8 to be a reasonable choice.
speedsmooth = α · speedsmooth + (1 − α) · speedmeasured (3.9)
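A minimal sketch of this update rule (Equation 3.9); the function name is ours:

```python
def smooth_speed(prev_smooth, measured, alpha=0.8):
    """Exponentially smoothed speed update (Equation 3.9):
    new smoothed value = alpha * old + (1 - alpha) * measurement."""
    return alpha * prev_smooth + (1 - alpha) * measured
```

With alpha = 0.8, a measurement of 50 against a smoothed value of 100 moves the estimate only to 90, which is what damps the short-term variations discussed above.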
Robustness of measurements: To predict the overall migration time accurately, the
measured speed must not be biased during certain periods. However, the measured speed
can be highly biased if the application’s disk accesses are concentrated in a region of the
disk because this causes a bias in disk seek time. During migration, the disk handles the in-
terleaving I/O requests from the application and from the migration. The disk arm therefore
moves back and forth between the active region of the application and the migration. As the
migrated blocks get farther away from the application’s active region, seek time increases
and migration speed decreases. This bias in measured migration speed hurts the accuracy
of prediction. To improve the robustness of the speed measurement, instead of migrat-
ing disk blocks sequentially, we divide the virtual disk into stripes and generate a pseudo
random ordering for visiting these stripes to perform block migration. We use a pseudo
random sequence rather than a true random sequence because it allows the computation
of the expected migration time for a specific block in the dirty set estimation algorithm in
Section 3.2.2. With this optimization, the robustness of measurement is greatly improved.
Results on the benefit of this technique are presented in Section 4.4.
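The pseudo-random stripe ordering can be sketched as below. This is our own illustration (seed, stripe count, and function names are assumptions): because the shuffle is seeded deterministically, a stripe's position in the order, and hence the expected migration time of its blocks, is computable ahead of time, which is the property the dirty set estimation algorithm relies on.

```python
import random

def stripe_order(num_stripes, seed=42):
    """Deterministic pseudo-random visiting order over disk stripes.
    A fixed seed makes the order reproducible, so the position of any
    stripe can be computed before migration reaches it."""
    order = list(range(num_stripes))
    random.Random(seed).shuffle(order)
    return order

def expected_migration_time(stripe, order, stripe_size, speed, start=0.0):
    """Time at which a given stripe is reached, assuming constant speed."""
    return start + order.index(stripe) * (stripe_size / speed)
```

A true random sequence would scatter seeks just as well, but would make `expected_migration_time` impossible to evaluate in advance.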
3.3 Controlling Migration Time
Pacer divides the migration time into rounds of small intervals as Figure 3.5 shows. In
each round, Pacer adapts migration speed to maintain a targeted migration finish time. It
updates the estimation of dirty block set (dirty set size), dirty disk rate (dirtyrate disk)
and dirty memory rate (dirtyrate mem) based on the algorithms in Section 3.2.2, and then
Pacer computes the proper migration speed in the way that the following section describes. The
speed is adjusted later based on the algorithms that handle I/O interference in Section 3.3.2.
Figure 3.5 : Each round of adaptation for controlling migration time
3.3.1 Solving for Speeds in Each Phase of Migration
For a specific desired migration time T , many combinations of migration speeds in each
phase are feasible. Pacer aims to control the migration progress in a systematic and stable
way, which leads to the following speed solutions.
Migrating memory pages generally will not generate disk I/O because, for performance
reasons, the memory of the VM is usually mapped to the memory of the physical ma-
chine, and thus the speed of memory migration is limited by the available network band-
width (NETWORK SPEED which can be directly measured) and so
speedMemory = NETWORK SPEED    (3.10)
With this simplification, only two variables need to be solved: speedPrecopy and
speedDirtyIteration. There are still many combinations of such speeds that can finish mi-
gration in time T . However, to minimize the severity of disk I/O interference caused by
migration, we seek to minimize the maximum migration speed used. This policy implies
that
speedPrecopy = speedDirtyIteration (3.11)
where speedDirtyIteration is the average speed for the dirty iteration in storage migration.
Thus, the appropriate speedPrecopy can finally be solved by substituting and rearranging
terms in Eq. (3.1).
INPUT OF ALGORITHM: T,DISK SIZE,MEM SIZE
INITIALIZATION
remain precopy size = DISK SIZE
remain msize = MEM SIZE
remain time = T
phase =PRE-COPY
speed scaleup flag = FALSE
speed expected = (remain precopy size + remain msize) / remain time
Other variables are initialized to be 0
LOOP
Set storage migration speed to be speed expected
Perform migration for the time indicated by interval unless it finishes
At the end of the period
past time = past time + interval
remain time = T − past time
IF(phase changes)
update phase
ENDIF
IF(phase is PRE-COPY)
dirty set = getEstimatedDirtyBlockSet(remain precopy size,
speed expected)
dirty set size =the cardinality of dirty set
dirtyrate disk = getDirtyRateDisk(dirty set)
ELSE
dirty set size = 0
dirtyrate disk = (#blocks dirtied in previous round · blocksize) / interval
END IF
IF(phase is PRE-COPY or DIRTY-ITERATION)
dirtyrate mem = getDirtyRateMem()
ELSE
dirtyrate mem = (#pages dirtied in previous round · pagesize) / interval
END IF
speed observed = (amount of transferred traffic) / interval
estimated max speed = getEstimatedMaxSpeed(speed observed,
speed expected, estimated max speed)
speed next expected = getExpectedSpeed(phase, remain time,
remain precopy size, dirty dsize, remain msize, dirty set size,
dirtyrate disk, dirtyrate mem, estimated max speed)
speed next expected = getSpeedScaleup(speed next expected,
speed scaleup flag, speed expected, speed observed)
speed expected = speed next expected
More precisely, during the pre-copy phase, at the beginning of each interval, we solve
the following equations to obtain the migration speed (speedPrecopy or s1 for short) to use
for the interval. NETWORK SPEED is measured in the previous interval and passed into the
equations as a constant.
Solve the following equations. We use t1, t2, t3
to represent tPrecopy, tDirtyIteration, tMemory
and s1, s2 to represent speedPrecopy,speedDirtyIteration
remain time is the remaining migration time before deadline
remain precopy size is the remaining disk data in the precopy
t1 + t2 + t3 = remain time − TDowntime
t3 = remain msize / (NETWORK SPEED − dirtyrate mem − dirtyrate disk)
s1 = remain precopy size / t1
dirty set size + dirtyrate disk · t2 = s2 · t2
s1 = s2
s1, s2 ≥ 0
0 ≤ t1, t2 ≤ remain time − TDowntime − t3
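Substituting t1 = remain precopy size / s and t2 = dirty set size / (s − dirtyrate disk) into the first equation reduces the system to a quadratic in the single unknown s = s1 = s2. The sketch below (our own naming, not Pacer's code) solves it directly; the larger root is taken so that s exceeds the disk dirty rate and t2 stays positive.

```python
import math

def solve_precopy_speed(remain_time, downtime, remain_msize, net_speed,
                        dirtyrate_mem, dirtyrate_disk,
                        remain_precopy_size, dirty_set_size):
    """Solve the equation system above for s = s1 = s2.

    With t1 = P/s, t2 = D/(s - r) and budget B = remain_time - downtime - t3,
    P/s + D/(s - r) = B becomes  B*s^2 - (B*r + P + D)*s + P*r = 0."""
    t3 = remain_msize / (net_speed - dirtyrate_mem - dirtyrate_disk)
    B = remain_time - downtime - t3          # time budget for t1 + t2
    P, D, r = remain_precopy_size, dirty_set_size, dirtyrate_disk
    a, b, c = B, -(B * r + P + D), P * r
    disc = b * b - 4 * a * c
    # The larger root satisfies s > r, keeping t2 = D/(s - r) positive.
    return (-b + math.sqrt(disc)) / (2 * a)
```

With a zero dirty rate the quadratic degenerates and the result reduces to the intuitive (P + D) / B; any returned speed can be checked by substituting it back into P/s + D/(s − r) = B.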
During the dirty iteration, we have the total bytes of current dirty blocks dirty dsize.
The migration speed consists of two parts. One part is to migrate the current dirty blocks in
the remaining time before memory migration. The other part is to migrate newly generated
dirty blocks at the rate of dirtyrate disk.
speedDirtyIteration = dirty dsize / (remain time − tMemory) + dirtyrate disk    (3.12)
During the memory migration, the migration speed is set to the available network band-
width.
We apply an algorithm which will be described in Section 3.3.2 for computing the max-
imal feasible migration speed that can be realized under interference. When Pacer detects
that the computed speed is higher than the maximal feasible speed, it knows finishing by
the desired time is not feasible. Then it migrates by transferring data as fast as possible
without rate limiting, and computes a new finish time prediction and reports
it to the user. Furthermore, it employs disk I/O throttling to upper bound the disk dirty rate
to a configurable fraction of the achievable migration speed.

Figure 3.6 : An example of migration speeds in different phases.
Figure 3.6 illustrates how the migration speed might be controlled by Pacer during dif-
ferent migration phases. During pre-copy, Pacer aims to maintain a stable speed but adapts
to workload changes if necessary. During dirty iteration, the migration speed depends on
the dirty set size and the dirty rate. At the beginning of dirty iteration, the dirty set already
includes the most frequently written blocks, so few new blocks will get dirty, corresponding
46
to a low dirty rate. As more dirty blocks become clean, the dirty rate increases.
The shape of the curve in practice depends on the workload. Pacer aims to migrate
the dirty set at a stable pace. This results in a dirty iteration migration speed curve that is
parallel to the dirty rate curve. Finally, during memory migration, migration can typically
proceed at a higher speed than in the previous two phases because its bottleneck is most
likely in the network.
Other variants: Similar to the discussion in Section 3.2.1 for migration time model,
the speed control can readily be adapted to other systems. As an example, for VMware,
Equation 3.10 is adjusted as follows:
speedMemory = NETWORK SPEED − IO RATE    (3.13)
where IO RATE denotes the bandwidth consumed by remote storage I/O and can be esti-
mated by monitoring the application workload.
3.3.2 Maximal Feasible Speed Estimation and Speed Tuning
Due to interference (no matter from disk or network), the achieved migration speed may
vary. It is therefore important to estimate the true maximal feasible migration speed and
ensure the desired migration speed is realized.
We estimate the maximal feasible speed by comparing the wanted speeds as specified
by Pacer and the observed speeds in reality. When migration starts, if we detect that the
observed speed cannot reach the wanted speed, we record this pair of speed values. In sub-
sequent rounds, if the new observed speed is lower than or equal to the recorded observed
speed and the new wanted speed is higher than the recorded wanted speed, we estimate that
the maximal feasible speed has been reached. The maximal feasible speed is updated by the
current observed speed. In the future, when any observed speed is higher than the maximal
feasible speed, the maximal feasible speed is updated. In order to smooth out short time
scale variations on the maximal feasible speed, we use an exponential smoothing average
for updating the maximal feasible speed. The smoothing weight β in Pacer is set to 0.8.
FUNCTION getEstimatedMaxSpeed(speed observed,
speed expected, estimated max speed)
IF(no recorded pairs
&&speed observed < speed expected)
speed pair expected = speed expected
speed pair observed = speed observed
ELSE IF(speed observed < estimated max speed)
IF(speed observed ≤ speed pair observed
&&speed expected > speed pair expected)
estimated max speed = β · estimated max speed + (1 − β) · speed observed
speed pair observed = speed observed
speed pair expected = speed expected
ELSE IF(speed observed > speed pair observed)
speed pair observed = speed observed
speed pair expected = speed expected
END IF
ELSE
estimated max speed = β · estimated max speed + (1 − β) · speed observed
speed pair observed = speed observed
speed pair expected = speed expected
END IF
return estimated max speed
When the observed speed cannot reach the wanted speed in a round, Pacer will scale
up the wanted speed for the next round and set a scale-up flag to indicate that the speed has
been scaled up. In the next round, if the new observed speed is not higher than the previous
observed speed, that means the scaling up did not help. Pacer then does not perform scale
up for the next round.
FUNCTION getSpeedScaleup(
speed next expected, speed scaleup flag, speed expected,
speed observed)
IF(speed scaleup flag == FALSE
&&speed observed < speed expected
&&speed next expected ≥ speed observed)
speed scaleup flag = TRUE
ELSE IF(speed scaleup flag == TRUE
&&speed observed < speed record observed)
speed scaleup flag = FALSE
ENDIF
IF(speed scaleup flag == TRUE)
speed next expected = speed next expected+
(speed expected − speed observed)
speed record observed = speed observed
ENDIF
return speed next expected
3.4 Evaluation
3.4.1 Implementation
Pacer is implemented on the kernel-based virtual machine (KVM) platform. KVM consists
of a loadable kernel module, a processor specific module, and a user-space program –
a modified QEMU emulator. QEMU performs management tasks for the VM. Pacer is
implemented on QEMU version 0.12.50 with about 2500 lines of code. Two options are
added to the migration command: (1) an option to enable migration prediction and report
the predicted migration time periodically; (2) an option to specify the desired migration time
and let Pacer control the migration progress to achieve the specified desired finish time.
3.4.2 Experiment Setup
The experiments are set up on two physical machines. Each machine has a 3GHz Quad-
core AMD Phenom II X4 945 processor, 8GB RAM, a 640GB WD Caviar Black SATA
hard drive, and Ubuntu 9.10 with Linux kernel (with the KVM module) version 2.6.31.
In all experiments (unless specified), the migration speed is restricted to be no more than
32MBps to mimic the level of available bandwidth in inter-datacenter scenarios.
In our test platform, the I/O write speed on the destination disk for migration is at most
15MBps, while RAID is widely used in commercial clouds to increase the I/O speed to
be over a hundred MBps. To fully measure the prediction accuracy with a wide range of
configured speeds, and to meet the time control requirement of various desired migration
time, we modify QEMU at the destination machine not to write the received data to the disk.
To ensure that the result is not biased by the disabled writing, we run a set of experiments of
enabling and disabling writing at the destination, vary the number of clients, and compare
the average prediction error in both cases. The difference is less than 1s. We vary the
desired migration time and compare the difference between the actual migration time and
desired time in both cases. The difference is less than 1s again. The results show that
disabling writing does not bias the experiment results.
The experiment VMs run VMmark Virtualization Benchmark [VMW10]. VMmark
consists of five types of workloads: file server, mail server, database server, web server,
and java server, with each representing different types of applications. Table 3.3 shows the
configuration of those servers. We vary the number of client threads to generate different
levels of workload intensity. A simple program is used to generate competing disk I/O
traffic on the source machine to create more challenging test scenarios that are more repre-
sentative of multi-tenancy clouds. It randomly accesses the disk by generating read/write
Workload Name         VM Configuration                                 Server Application   Default # Clients
File Server (fs)      SLES 10 32-bit, 1 CPU, 256MB RAM, 8GB disk       dbench               45
Mail Server (ms)      Windows 2003 32-bit, 2 CPU, 1GB RAM, 24GB disk   Exchange 2003        1000
Java Server (js)      Windows 2003 64-bit, 2 CPU, 1GB RAM, 8GB disk    SPECjbb@2005-based   8
Web Server (ws)       SLES 10 64-bit, 2 CPU, 512MB RAM, 8GB disk       SPECweb@2005-based   100
Database Server (ds)  SLES 10 64-bit, 2 CPU, 2GB RAM, 10GB disk        MySQL                16

Table 3.3 : VMmark workload summary.
I/O requests. Three models are applied to control the I/O rate by varying the interval be-
tween two I/O requests. The static model generates I/O with a constant interval. Two
dynamic models generate I/O following an exponential distribution (λ = 10, 50 or 90) or
Pareto distribution (PAR(α, k), where α = 2 and k = 10, 50, or 90). Each experiment is
run three times with different random number seeds. The results show very little variance
(< 0.1%). We believe this is because the VMmark workload is quite stable from run to run,
as our previous research on the VMmark workload [ZNS11] shows.
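The three interval models above can be sketched as follows. This is an illustrative reconstruction, not the actual benchmark code; the function name and parameter names are our own.

```python
import random

def next_interval_ms(model, k=50, alpha=2.0):
    """Sleep interval (ms) before the next competing I/O request.

    'static'      : constant interval k
    'exponential' : exponentially distributed with mean k
    'pareto'      : PAR(alpha, k), i.e. k * Pareto(alpha); minimum value is k
    """
    if model == "static":
        return k
    if model == "exponential":
        return random.expovariate(1.0 / k)
    if model == "pareto":
        # random.paretovariate(alpha) returns values >= 1; k is the scale.
        return k * random.paretovariate(alpha)
    raise ValueError("unknown model: %s" % model)
```

A driver would sleep `next_interval_ms(...)` milliseconds between successive read/write requests to produce the desired interference intensity.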
The performance of prediction is evaluated by the prediction error. The predictor com-
putes and reports its prediction t_pi every N seconds from the beginning of migration until
the migration finishes. After the migration, we evaluate the accuracy of the prediction by
computing the absolute difference between the actual migration time t and the reported
prediction time, and then reporting the average of those absolute differences:
Σ_i |t_pi − t| / (t/N), where t/N is the number of reported predictions. We
optimize Pacer to avoid prediction spikes due to sudden temporary workload shifts
by generating a cumulative average predicted time over all past individual predicted times
and using it as the reported prediction time.
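The error metric and the cumulative-average smoothing described above can be sketched as follows (function names are ours; Pacer's actual implementation lives inside QEMU):

```python
def average_prediction_error(predictions, actual_time):
    """Average prediction error: sum_i |t_pi - t| divided by the number of
    reports (t/N, since one prediction is reported every N seconds)."""
    return sum(abs(p - actual_time) for p in predictions) / len(predictions)

def smoothed_predictions(raw_predictions):
    """Report the cumulative average of all past raw predictions, which damps
    spikes caused by sudden temporary workload shifts."""
    reported, total = [], 0.0
    for i, p in enumerate(raw_predictions, start=1):
        total += p
        reported.append(total / i)
    return reported
```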
3.4.3 Prediction of migration time
3.4.3.1 VM-size based predictor and progress meter do not work
In the following experiment, we show that the VM-size based prediction method and a
more dynamic method, the progress meter, fail to give an accurate prediction of the migration
time.
The VM-size based predictor uses the formula (storage size + memory size) / (configured
migration speed). This approach is commonly used when users want to predict the migration time.
Another dynamic predictor is also implemented for comparison. The predictor is called
progress meter, which is based on the migration progress reported by QEMU. Whenever
the migration progress increases by 1%, the predictor records the current migration time t
and the progress x%, computes the progress rate x%/t, and uses that rate to predict the finish
time as (100% × t)/x% dynamically.
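The two baseline predictors amount to the following (function names are ours; units are arbitrary but must be consistent):

```python
def vm_size_predictor(storage_bytes, memory_bytes, configured_speed_bps):
    """Static estimate: total image size over the configured migration speed.
    Ignores dirty iterations and memory re-transfers entirely."""
    return (storage_bytes + memory_bytes) / configured_speed_bps

def progress_meter_predictor(elapsed_s, progress_pct):
    """Extrapolation: if x% of the migration took t seconds, predict that
    100% takes 100 * t / x seconds."""
    return 100.0 * elapsed_s / progress_pct
```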
The experiment runs on two types of VM image sizes to represent the typical image
sizes in industrial environments. 160GB is the size of an Amazon EC2 small instance
and 8GB is the size of the VMmark file server image. We use a micro benchmark that
repeatedly writes to a data region of the VM’s virtual disk at a specified write rate. The size
of the written region and the write rate vary to create different dirty set sizes and dirty rates
during the migration.
Table 3.4 shows the results. The prediction errors for the VM-size based predictor
and the progress meter are several orders of magnitude larger than those of Pacer, mainly
because those two methods do not predict the time of the dirty iteration and memory mi-
gration. The prediction errors of those two methods scale up with higher write rates and
larger written region sizes, while Pacer always achieves small prediction errors in all cases.
(a) VM-160GB

Predictor        Vary Write Rate                  Vary Written Region Size
                 (Written Region Size 10GB)       (Write Rate 20MBps)
                 5MBps    15MBps   25MBps         5GB      15GB     25GB
VM size-based    326s     395s     519s           185s     698s     1157s
Progress meter   316s     382s     510s           169s     687s     1149s
Pacer            6s       5s       8s             8s       10s      9s

(b) VM-8GB

Predictor        Vary Write Rate                  Vary Written Region Size
                 (Written Region Size 1GB)        (Write Rate 20MBps)
                 5MBps    15MBps   25MBps         512MB    1GB      2GB
VM size-based    43s      74s      99s            46s      60s      122s
Progress meter   41s      70s      94s            45s      51s      114s
Pacer            4s       6s       5s             5s       6s       4s

Table 3.4 : Prediction errors for the VM size-based predictor and the progress meter are
several orders of magnitude higher than those of Pacer.
[Figure: Predicted Migration Time and Real Migration Time (s) vs. elapsed time (s)]
Figure 3.7 : The prediction of a VM (file server-30 clients) migration. Pacer achieves
accurate prediction from the very beginning of the migration.
3.4.3.2 Pacer in the face of uncertain dynamics
We vary multiple dimensions in the migration environment to demonstrate that Pacer per-
forms well under different scenarios. We use the file server VM with 8GB storage as the
representative workload in many experiments, because it is the most I/O intensive workload
in VMmark and it challenges Pacer the most. Pacer computes and reports a predicted time
every five seconds.
Figure 3.7 shows an example of the prediction process during migration. The experi-
ment is based on the migration of a file server with 30 clients, with additional competing
traffic on the same hypervisor that follows an exponential distribution with an average
10ms sleeping time. The actual migration time is 596s. In the first 20
seconds, Pacer predicts the migration time as 400 seconds because it does not have enough
data for an accurate prediction. From 20 seconds onwards, its prediction time is very close
to the actual migration time. The prediction error is [0s, 26s] excluding the first 20 seconds.
The average prediction error is 14s over the entire migration period and 7s for the period
excluding the first 20 seconds.
Table 3.5 shows more scenarios for evaluating Pacer under different dynamic changes.
The first three experiments have no additional competing traffic.
Vary configured speed: This experiment is based on the file server with the workload
of 15 clients. We vary the configured migration speed from 30MBps to 50MBps. As
Table 3.5 shows, the average prediction error varies from 2s to 7s.
Vary the number of clients: This experiment is based on the file server with the default
configured speed of 32MBps. We vary the number of clients from 0 to 30 to represent light
workload, medium workload, and heavy workload. The average prediction error ranges
from 2s to 6s. The results show that Pacer achieves good prediction even with heavy work-
load.
Vary workload type: We vary the workload types with the default configured speed of
32MBps. The average prediction error varies from 1s to 8s across four types of workload.
Vary additional competing traffic: This experiment is based on the file server with 15
clients. We vary the intensity of additional competing traffic based on the Pareto model of
average 50ms and 90ms sleeping time. The average prediction errors are 4s and 6s.
According to the results and observations, an advantage of Pacer is that it achieves
accurate prediction from the very beginning of the migration. We take the prediction values
in the first minute and compute the average prediction error for each experiment above. The
resulting errors are within the range of [2s, 12s], which is slightly larger than the average
prediction error of the entire migration. Pacer achieves accurate predic-
tion from the very beginning because of its effective dirty set and dirty rate prediction
algorithms. We will quantify the benefits of these algorithms in Section 3.4.4.3.
In summary, Pacer provides accurate average prediction in various scenarios. The pre-
Scenario                                     Actual      Average
                                             Migration   Prediction
                                             Time        Error
Vary configured speed (fs-15 clients)
  30 MBps                                    309s        5s
  40 MBps                                    234s        2s
  50 MBps                                    201s        7s
Vary the number of clients (configured speed 32MBps)
  0 clients                                  263s        2s
  15 clients                                 288s        2s
  30 clients                                 331s        6s
Vary workload types
  ms-200 clients                             794s        8s
  js-16 clients                              264s        1s
  ws-100 clients                             269s        2s
  ds-16 clients                              402s        8s
Vary additional competing traffic (fs-15 clients)
  Pareto 50ms                                319s        4s
  Pareto 90ms                                299s        6s

Table 3.5 : Prediction with Pacer.
diction error ranges from 1s to 8s across all the above scenarios.
3.4.4 Best-effort migration time control
3.4.4.1 Dirty block prediction is critical for effective time control
We implement an adaptive time controller without dirty block prediction. The migration
speed is computed by the formula (remaining pre-copy data + existing dirty blocks) /
(remaining time). Similar to the setup in Section 3.4.3.1, the experiment uses two types of
image size, 160GB and 8GB. The micro benchmark is leveraged to generate a dynamic write
workload on the VM. The desired migration
time is 6500s for the migration of VM (160GB) and is 400s for the migration of VM (8GB).
Table 3.6 shows the migration time deviation. The actual migration time of Pacer is
very close to the desired time, with maximal deviation of [-1s,+6s]. The migration time of
the controller without dirty block prediction exceeds the desired time by up to 1528s, and the
deviation grows larger as the workload becomes more write intensive, because the controller
lacks the capability to predict the number of remaining blocks for migration and thus selects a
wrong speed. We will show how the key components in Pacer help to reduce the deviation
later in Section 3.4.4.3.
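The contrast between the two controllers can be sketched as follows. This is an illustrative simplification with function names of our own choosing; Pacer's real controller additionally measures and tunes the achievable speed at run time.

```python
def speed_without_prediction(remaining_precopy, existing_dirty, remaining_time):
    """Naive controller: pace to move only what is currently known to remain.
    It underestimates, because blocks dirtied later must be re-sent."""
    return (remaining_precopy + existing_dirty) / remaining_time

def speed_with_prediction(remaining_precopy, existing_dirty,
                          predicted_future_dirty, remaining_time):
    """Pacer-style controller (simplified): also budget for the blocks that
    the dirty set/rate predictors expect to be rewritten before the deadline."""
    return (remaining_precopy + existing_dirty
            + predicted_future_dirty) / remaining_time
```

With a write-intensive workload, `predicted_future_dirty` grows, and the naive controller's chosen speed falls short of what is needed to meet the deadline.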
3.4.4.2 Pacer in the face of uncertain dynamics
Similar to the experiments for prediction, we vary multiple dimensions in the migration
environment to show that Pacer can perform adaptive pacing to realize the desired migration
time.
Vary desired migration time: This experiment is based on the file server with the
workload of 30 clients. We vary the desired migration time from 150s to 400s. Figure 3.8
shows that when the desired time is within the range of [200s, 400s], the migration
time in the three runs is very close to the desired time, with maximal deviation of [−2s, 2s].
When we decrease the desired migration time below what is feasible, the I/O be-
comes the bottleneck, and consequently Pacer hits its minimal migration time of 176s,
while the default QEMU with the configured speed of 32MBps can finish the migration in
362s.
(a) VM-160GB

Migration Time           Vary Write Rate                Vary Written Region Size
Controller               (Written Region Size 10GB)     (Write Rate 20MBps)
                         5MBps    15MBps   25MBps       5GB      15GB     25GB
Controller w/o dirty     282s     309s     327s         264s     1004s    1528s
block prediction
Pacer                    2s       4s       4s           5s       6s       4s

(b) VM-8GB

Migration Time           Vary Write Rate                Vary Written Region Size
Controller               (Written Region Size 1GB)      (Write Rate 20MBps)
                         5MBps    15MBps   25MBps       1GB      2GB      3GB
Controller w/o dirty     31s      47s      59s          54s      88s      110s
block prediction
Pacer                    1s       2s       -1s          1s       1s       2s

Table 3.6 : Migration time deviation for Pacer is much smaller than for the controller
without dirty block prediction.
[Figure: migration time (s) vs. desired migration time (s) for Pacer, default QEMU, and the ideal case]
Figure 3.8 : Migration with different desired finish times. Pacer almost matches the ideal
case when the desired time is larger than 176s. The deviation is very small, in [-2s, 2s].
Vary the number of clients: We vary the number of clients from 0 to 60 on the file
server. As Figure 3.9 shows, there exists a lower bound for migration time (minimal migra-
tion time) because of the I/O bottleneck. Pacer can adaptively pace the migration to achieve
any target migration time in the feasible region above the smallest possible time for migra-
tion to complete, while default QEMU can only achieve one migration time for a specific
number of clients. Moreover, when the number of clients increases above 35, QEMU can-
not converge and the migration time becomes infinite. The reason is that QEMU uses a
configured constant speed that will not increase when the I/O bandwidth becomes higher.
We choose six different desired migration times from 144s to 400s in the feasible re-
gion, and migrate the VM with different numbers of clients under those desired migra-
tion times. The results in Table 3.7 show that Pacer achieves the desired time in all cases
Figure 3.9 : Migration with different degrees of workload intensity. Any point in the
feasible region can be realized by Pacer. The lower bound for migration time is set by the
I/O bottleneck. Default QEMU can only follow a narrow curve in the region.
with maximal deviation of [−2s, 1s].
Vary workload type: We perform live migration with Pacer for five types of VMmark
workloads. In order to guarantee that the default QEMU can converge in the migration, we
decrease the number of clients. We run default QEMU first and get the migration time, and
then we set this time as Pacer's desired migration time. Table 3.8 shows that Pacer can
achieve the desired migration time with a small deviation in [−2s, +2s].
Vary additional competing traffic: To test whether Pacer can achieve desired migra-
tion time when different levels of I/O interference exist, we run the following experiment
with the program in Section 3.4.2 to generate additional competing I/O traffic. The mi-
Desired   10        20        30        40        50        60
Time      Clients   Clients   Clients   Clients   Clients   Clients
144s      [-1, 0]   [0, 0]    -         -         -         -
176s      [0, 0]    [-1, 1]   [0, 1]    -         -         -
203s      [-1, 1]   [-2, 1]   [0, 0]    [0, 1]    -         -
222s      [0, 0]    [0, 1]    [-1, 0]   [-1, 0]   [0, 1]    -
305s      [0, 0]    [-2, 1]   [-1, 0]   [-2, 0]   [0, 0]    [0, 0]
400s      [0, 0]    [-1, 0]   [-2, 0]   [-2, 0]   [-1, 1]   [-2, 0]

Table 3.7 : Deviation of migration time with Pacer under different workload intensities.
The numbers in brackets give the worst early and late deviations; for example, [-1, 1]
means at most 1s early and 1s late. "-" means the time is beyond the feasible region.
Workload         Desired Migr   Pacer Migr
                 Time (s)       Time (s)
fs-30 clients    362            360
ms-200 clients   897            899
js-16 clients    274            275
ws-100 clients   287            287
ds-16 clients    471            473

Table 3.8 : Migration time for different types of workload. Pacer achieves the desired
migration time.
Sleeping            Run 1       Run 2       Run 3
Time                MigrTime    MigrTime    MigrTime
                    Dev (s)     Dev (s)     Dev (s)
No Add. Traffic     -1          0           0
Static 50ms         0           -5          1
Expo (ave 50ms)     -5          0           -4
Pareto (ave 50ms)   0           -2          3
Static 90ms         -3          0           -5
Expo (ave 90ms)     -5          -2          1
Pareto (ave 90ms)   0           2           1

Table 3.9 : Migration time for Pacer when the additional competing traffic varies. Pacer
achieves the desired migration time with a small finish time deviation.
grated VM runs the file server with 30 clients. The desired migration time is 264s. Table 3.9
shows the results for three runs. Pacer achieves the desired time as the I/O inter-
ference varies. The deviation is [−5s, 3s], which is small compared to the desired time of
264s.
3.4.4.3 Benefits of key components in Pacer
Dirty set and dirty rate prediction: To understand the benefit of the key components
in Pacer, we design an experiment comparing the system with and without dynamic dirty
set and dirty rate prediction to evaluate the effectiveness of those algorithms. The workload
is the file server. As Table 3.10 shows, the actual migration time exceeds the desired mi-
gration time significantly when no prediction algorithm is used. When only the
dynamic dirty set prediction algorithm is added to the system, the accuracy of migration
time improves but still exceeds the desired time. When both the dirty set and dirty rate
prediction algorithms are used in Pacer, Pacer can perform adaptive pacing with very little
deviation [−2s,−1s].
Workload     Desired    Pacer without     Pacer with only   Pacer
             Time (s)   dirty set/rate    dirty set         (s)
                        prediction (s)    prediction (s)
30 clients   200        216               206               198
60 clients   400        454               431               399

Table 3.10 : Importance of dynamic dirty set and dirty rate prediction. Without these
algorithms, it is hard to achieve the desired migration time.
Desired   With speed   Without speed
Time      tuning       tuning
200s      198s         284s
300s      300s         380s
400s      399s         553s

Table 3.11 : Importance of the speed scale-up algorithm.
Speed measurement and tuning: We design an experiment to run Pacer with and
without maximal speed prediction. The VM runs the file server with 30 clients. Additional
competing traffic is generated with a constant 10ms interval. Without maximal speed predic-
tion, migration takes 697s when the desired time is 600s. With prediction, migration can
finish in time. Moreover, we design another experiment to run migration with and with-
out the speed scale-up algorithm on the file server with 30 clients, but without additional
competing traffic on the disk. We set the desired migration time to be 200s, 300s and 400s.
The results are shown in Table 3.11. Without the speed scale-up algorithm, migration will
considerably exceed the desired time in all three experiments.
3.4.5 Overhead of Pacer
In this experiment, we measure the overhead introduced by Pacer in terms of time and
space. For example, for best-effort time control, we run migration with Pacer for the file
server workload with 60 clients and a desired migration time of 400s. We measure the
computation time of Pacer in each round. We observe that the computation time is 28.24ms
at the beginning of migration. As the migration progresses and more blocks in the dirty set
are determined, the computation time drops to below 1ms in the final stage of migration.
Overall, Pacer on average only incurs 2.4ms of computation time for each 5 second interval.
The overhead is 0.05%, which is negligible. The space overhead, in terms of additional
memory required to run Pacer compared with default QEMU, is less than 1MB. Prediction
consumes fewer computation resources than best-effort time control.
We also evaluate the overhead introduced by Pacer for each disk I/O write operation
during migration. The default QEMU already has a dirty block tracking function to track
each disk write operation during migration. Pacer simply leverages the existing tracking sys-
tem and performs a simple update of the average write interval. We ran experiments to mea-
sure the disk I/O write latency with and without Pacer. The average disk I/O latency at
millisecond accuracy and throughput at MB/s accuracy are the same with and without Pacer.
We also measure the application throughput and response time on the file server during
migration with and without Pacer. The results show no side effect on application perfor-
mance with Pacer. In summary, the overhead of Pacer is small and has no impact on the
performance of the application.
3.4.6 Potential robustness improvements
Pacer could be improved further by including mechanisms to mitigate the negative im-
pact of rare cases in which the migration environment is not steady. Firstly, Pacer is an
adaptive system with a fixed adaptation interval (5s) in the current design. Instead, a flex-
ible interval can be applied when Pacer detects that the workload intensity or the network
available bandwidth varies significantly. Reducing the adaptation interval will improve the
adaptivity but it also incurs more overhead. By adjusting the adaptation interval, we can
make a trade-off between the speed of adaptation and overhead. Secondly, we can test
the migration environment, e.g. network bandwidth, against expected patterns to find out
whether any increasing or decreasing trend exists. These mechanisms will be considered
in our future work.
3.5 EC2 Demonstration
Figure 3.10 : VM migration from Rice campus to Amazon EC2.
To demonstrate the functions of Pacer in a commercial hybrid cloud environment, we
conduct a set of experiments using the Amazon EC2 cloud. In these experiments we mi-
grate VMs from the Rice campus network to EC2. On EC2, we use High-CPU Medium
instances running Ubuntu 12.04. EC2 instances do not support KVM, so we use the
“no-kvm” mode of QEMU on EC2. The downside is that without KVM’s hardware virtual-
ization support, a QEMU VM’s performance is reduced.
Workload intensity        None    Low     Medium   Heavy
Actual Migration Time     227s    240s    255s     250s
Average Prediction        6.35s   5.39s   4.71s    7.76s
Error

Table 3.12 : Prediction accuracy with Pacer.
3.5.1 Network and Disk Speed Measurements
We characterize the network and disk speed that can be achieved between Rice and EC2
and make several interesting observations. First, we use iperf to measure TCP network
throughput for 200s. We find that when transmitting data from Rice to EC2, the through-
put increases gradually and linearly for a surprisingly long 30s before it maximizes at
roughly 60MBps. More specifically, 50% of the speed samples fall between 58MBps and
62MBps. After the initial 30s, 5% of the speed samples are below 40MBps and 3.5% are
below 30MBps. Based on these findings, we cap the migration speed in the experiments to
50MBps. Second, we use scp to copy an 8GB file from Rice to EC2 to measure achievable
disk speed. We sample the scp-reported speed every 0.5s. The average speed achieved is
30.9MBps. Thus, disk speed is the most likely bottleneck for migration in these EC2 exper-
iments. To compute the degree of disk speed variation, we compute the absolute difference
between each speed sample and the average speed. We find the amount of variation to be
significant, with the average absolute difference being 5MBps.
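The variation statistic above is simply the mean absolute deviation of the speed samples; a minimal sketch:

```python
def mean_absolute_deviation(samples):
    """Average absolute difference between each sample and the sample mean."""
    mean = sum(samples) / len(samples)
    return sum(abs(s - mean) for s in samples) / len(samples)
```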
Desired time   500s        600s        700s        800s
Deviation      [-2, +2]    [-2, +2]    [-1, +2]    [-3, 0]

Table 3.13 : Migration time control in EC2.
3.5.2 Use Case 1: Prediction of Migration Time
To measure the accuracy of Pacer’s prediction, we migrate one VM that runs the file server
from Rice to EC2. We vary the number of clients to emulate different workload intensities
of the VM server. The CPU utilization rate is 30-45% for the low workload, 45-55% for
the medium workload, and 55-70 for the high workload.
For each intensity of the workload we run three sets of experiments and report the
average prediction error in Table 3.12. The first observation is that the accuracy of the
prediction does not decrease as the workload increases. Secondly, given the fact that the
network and disk speeds are quite unstable, Pacer still can predict with an average absolute
error of about 5s. We find that if disk writes at the destination are disabled to eliminate
the impact of disk speed variation, the average prediction error is reduced to 2s. Given the
disk speed typically fluctuates 16% from the average speed, the obtained average prediction
error ranging from 2% to 3% of the actual migration time is quite desirable.
3.5.3 Use Case 2: Best-effort Migration Time Control
In this experiment we migrate the 8GB file server with medium workload and vary the
desired migration time from 500s to 800s. For each desired time we run three experiments
and report the range of the deviations in Table 3.13. Although we have reported that the
network and disk speeds between Rice and EC2 are not very stable, Pacer still works very
well in controlling the migration time to within 3s of the desired time.
3.6 Related Work
3.6.1 Live Migration
While to our knowledge no previous work is directly comparable to Pacer, there exists
some related work on setting the speed or estimating the time of CPU/memory-only VM
migration. Breitgand et al. [BKR11] propose a cost function for computing the network
bandwidth allocated to CPU/memory-only migration in order to minimize the theoretical
number of application delay bound violations as given by a queuing theory model.
Akoush et al. [ASR+10] simulate the execution of the iterative data copy algorithm of
CPU/memory-only migration so as to estimate the required migration time. The simula-
tion makes certain simplifying assumptions such as fixed network bandwidth and fixed or
historically known memory page dirty rate.
Relative to these previous works, not only does Pacer address a different set of prob-
lems in migration progress management for full VM migration, but it also takes a system-
building approach based on real measurements and run-time adaptation, which we find
crucial for coping with workload and performance interference dynamics in a
complete system.
3.6.2 I/O Interference in Virtualized Environment
In a virtualized environment, multiple VMs may coexist in the same system. They share
the same underlying I/O resources, which gives rise to I/O interference: the I/O from
one VM may affect the I/O performance of other VMs. Two types of solutions
are proposed to mitigate the I/O interference: performance isolation and resource adap-
tation. I/O scheduling algorithms have been proposed for fair sharing and performance
isolation among multiple VMs [GMV10]. On the other hand, resource adaptation algo-
rithms adjust the allocation of resources among VMs when performance degradation is
detected [NKG10, PHS+09]. These performance isolation or resource adaptation solutions
could potentially assist Pacer in maintaining a desired migration speed with accuracy. With-
out such underlying support, Pacer uses simple run-time migration speed measurements
as inputs to dynamically adjust the aggressiveness of migration, attain the desired migra-
tion speed, and apply the maximal possible migration speed if it cannot reach the desired
migration time due to the I/O interference.
3.6.3 Data Migration Technologies
A related area is data migration quality of service. In [LAW02], Lu et al. present Aque-
duct, a data migration system that minimizes the impact on the application performance.
However, Aqueduct simply treats the migration as a low-priority task and does not provide
a predictable migration time.
Dasgupta et al. [DGJ+05] and Zhang et al. [ZSS06] present different rate controlling
schemes that attempt to meet a data migration time goal and evaluate them through simu-
lations. However, neither of these proposals considers the dirty data generated by write
operations during the migration, nor the need for iterative dirty block migration. The
possible reason for the above imperfections is that data migration solutions including those
commercial ones (e.g. IBM DB2 UDB [PRH+03]) are designed to migrate data from one
local storage device to another. These solutions are based on the assumption that upon the
migration of a data block, all accesses to that block are redirected to the destination device
without incurring much penalty, and thus the notion of dirty data does not exist. However,
the same idea of redirection will lead to terrible performance degradation in live VM mi-
gration if the migration is carried out over a long distance. Moreover, it is possible for
a network outage to interrupt the migration process, and thus the redirection technique also
runs the risk of losing the latest copy of the migrated data.
3.6.4 Performance Modeling and Measurement
As live migration becomes common in cloud management, its performance is im-
portant to users. Therefore, performance modeling and measurement for VM live migration
has been proposed [WZ11, BKR11, ASR+10, VBVB09, ZF07, CCS10].
Wu et al. [WZ11] conducted a series of experiments on Xen to profile the time for
migrating a DomU VM running different resource-intensive applications while Dom0 is
allocated different CPU shares for processing the migration. Regression methods are then
used to create the performance model based on the profiling data.
Breitgand et al. [BKR11] introduce a new model to quantify the trade-off between
minimizing the copy phase duration and maintaining an acceptable quality of service during
the pre-copy phase for CPU/memory-only migration.
Akoush et al. [ASR+10] characterize the parameters affecting live CPU/memory-only
migration with particular emphasis on the Xen virtualization platform. They provide two
simulation models to predict memory migration time.
Voorsluys et al. [VBVB09] present a performance evaluation of the effects of live mi-
gration of virtual machines on the performance of applications running inside Xen VMs.
They show that in most cases, migration overhead is acceptable but cannot be disregarded,
especially in systems where service availability and responsiveness are governed by strict
Service Level Agreements.
Zhao et al. [ZF07] seek to provide a model that can characterize the VM migration
process and predict its performance, based on a comprehensive experimental analysis.
Checconi et al. [CCS10] address the issue of how to meet the strict timing constraints
of (soft) real-time virtualized applications while the virtual machine hosting them is un-
dergoing a live migration. They introduce a stochastic model for the migration process and
reserve resource shares for individual VMs.
Chapter 4
Coordinated Migration of Multi-tier Applications
Although existing live migration techniques [KVM, CFH+05, NLH05] are able to migrate
a single VM efficiently, those techniques are not optimized for migrating related VMs in a
multi-tier application. When the entire virtual disk is being migrated, the amount of data
to move and the time it takes to perform such a move become non-trivial. Given the fact
that the VMs running a multi-tier application are highly interactive, a serious issue is that,
during the migration, the performance of the application can degrade significantly if the
dependent components of an application are split between the source and the destination
sites by a high latency and/or congested network path.
Figure 4.1 shows an example of migrating a 3-tier e-commerce application from one
cloud to another. Note that this inter-cloud migration example is not the only scenario that
can suffer from performance degradation. The limited bandwidth scenario can arise within
a campus or even within a machine room.
In this example, the application has 4 VMs (shown as ovals) implementing a web server,
two application servers, and a database server. An edge between two components in the
figure indicates that those two components communicate with one another. We define a
performance metric called the performance degradation time, which is the time period dur-
ing which any communicating components are split over the source and destination sites.
When such a split happens, certain inter-component communications must be conducted
over the bandwidth limited and/or high latency network, leading to degraded application
performance. Specifically, this example shows that two existing migration strategies, se-
quential and parallel migration, may result in poor performance. Sequential migration,
which migrates each VM one by one, results in a long performance degradation time from
when the first VM finishes migration until the last VM finishes migration. Parallel mi-
gration, which starts migration of multiple VMs at the same time, is not able to avoid the
degradation either. This is because the amount of data to migrate for each VM is different
and therefore the VMs in general will not finish migration simultaneously. The application
will experience performance degradation until all VMs have completed migration. Fur-
thermore, if the bandwidth required for migrating all VMs in parallel exceeds the actual
available bandwidth, additional performance problems will result (see Challenge 2 in
Section 4.3).
(a) Before migration
(b) With two migration strategies
Figure 4.1 : Sequential and parallel migration of a 3-tier web application across clouds.
In this chapter, we formulate the problem of the live migration of multi-tier applications.
At the same time, we show the quantitative impact of uncoordinated multi-tier application
migration. We present a new communication-cost-driven coordinated approach, as well as
a system called COMMA (Coordinated migration of multi-tier applications) that realizes
this approach. Experimental results show the capability and benefits of the coordi-
nation system. We also demonstrate the functions of COMMA on EC2, a popular
commercial cloud environment.
4.1 Problem Formulation
Let n be the number of VMs in the multi-tier application and the set of VMs be
{vm1, vm2, ..., vmn}. The goal of the multi-tier application migration problem is to min-
imize the performance degradation caused by splitting the communicating components
between source and destination sites during the migration. Specifically, we propose a
communication-cost driven approach. To quantify the performance degradation, we de-
fine the unit of cost as the volume of traffic between VMs that need to crisscross between
the source and destination sites during migration. More concretely, by using the traffic
volume to measure cost, components that communicate more heavily are treated as more
important. While many other metrics could be selected to evaluate the cost, e.g. the end-to-
end latency of requests, the number of affected requests, or the performance degradation time,
we do not adopt them, for different reasons. We do not adopt the end-to-end latency of requests
or the number of affected requests because they are application dependent and require extra
measurement support at the application level. We do not adopt the performance degradation
time because it ignores the communication rate between components. We define the cost
as the volume of traffic which does not require any extra support from application and is
application independent.
Let traffic matrix TM represent the communication traffic rate between any two VMs
prior to the start of migration. Figure 4.2 shows an example of how to compute the cost.
There are 3 VMs, and every pair of VMs communicates with each other.
Figure 4.2 : An example of cost computation for 3 VMs
Our cost model is based on the traffic prior to migration rather than the traffic during migration. During migration,
the traffic rate of the application may be distorted by a variety of factors such as network
congestion between the source and destination sites and I/O congestion caused by the data
copying activities. Therefore, we cannot optimize against the traffic rate during migration
because the actual importance of the interaction between components could be lost through
such distortions. Let migration finish time for vmi be ti. Our goal is to minimize the total
cost of migration, where:
cost = ∑_{i=1}^{n} ∑_{j>i} |t_i − t_j| · TM[i, j]    (4.1)
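As a sketch of this cost model (not the thesis code; the traffic matrix values below are invented for illustration), the cost of Equation 4.1 can be computed directly from the per-VM finish times:

```python
# Illustrative sketch of the Equation 4.1 cost model.

def migration_cost(finish_times, traffic_matrix):
    """cost = sum over pairs i < j of |t_i - t_j| * TM[i][j]."""
    n = len(finish_times)
    cost = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            cost += abs(finish_times[i] - finish_times[j]) * traffic_matrix[i][j]
    return cost

# 3 VMs finishing at t = 10s, 12s and 20s; rates in MBps (made-up values).
tm = [[0, 5, 10],
      [5, 0, 20],
      [10, 20, 0]]
print(migration_cost([10, 12, 20], tm))  # 2*5 + 10*10 + 8*20 = 270.0
```

If all VMs finish at the same time, every |t_i − t_j| term vanishes and the cost is zero, which is exactly the objective COMMA pursues.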
4.2 Quantitative Impact of Uncoordinated Multi-tier Application Mi-
gration
Multi-tier applications can involve many components. Amazon web service architecture
center [Amab] provides users with the necessary guidance and best practices to build appli-
cations in the cloud. It also provides architectural guidance for design and implementation
of systems that run on the Amazon web service infrastructure. There are common config-
urations from Amazon for different types of applications. Figure 4.3 shows the reference
three-tier architecture for highly-scalable and reliable web or mobile-web applications.
The three tiers are web server tier, application server tier and database server tier.
There is inter-tier communication between the web server/application server tiers and the
application server/database server tiers. HTTP requests are handled by load balancing
servers, which automatically distribute incoming application traffic across multiple web
servers. If a request requires further processing by application servers, the web servers
send requests to the load balancers between the web servers and application servers. Once
an application server gets a request, it may need to query the database servers for additional
data. Besides talking to the application servers, the database servers have intra-tier communication: database servers talk to each other in master/slave mode for greater fault tolerance.
The number of servers deployed in the application is adjusted up or down according to
user-defined conditions. For a popular application with large traffic demand, more VMs are
deployed to maintain performance. Otherwise, for a less popular application, fewer VMs
are deployed to minimize costs.
Figure 4.3 shows examples of multi-tier e-commerce applications in Amazon
AWS [Amab] with different numbers of VMs. In order to gain quantitative insights, we
perform numerical analysis to illustrate the potential performance degradation experienced
when such applications are migrated from one cloud to another.
Assume that the VMs have the characteristics in Table 4.1, and that the available migration bandwidth is 256Mbps, shared by all VMs' migrations. The parameters that we
select are image size, memory size, dirty set size and maximal dirty rate. These are four
key parameters for determining the migration time as we discussed in the migration time
model in Chapter 3. The four parameters depend on different types of configuration and
workload. We select a set of common configurations to demonstrate the problem. The
image size and memory size follow the recommendation from VMware benchmark con-
figuration [VMW10]. Dirty rate is the rate of generated dirty blocks during dirty iteration.
Dirty set is the size of dirty blocks at the end of the pre-copy stage. Different workloads
have different dirty rates and dirty sets. We use higher values for the database server to
mimic the intensive disk write operations in database servers. Degradation time is defined as the total
time when two interacting components are split over two clouds.
Two migration approaches based on existing techniques are considered in the analysis.
The first approach is sequential migration in which VMs are migrated one by one. The
migration speed is assumed to reach the available migration bandwidth. The total migration
time is the sum of each VM's migration time. Each VM's migration time is computed as
(Image Size + Mem Size) / Bandwidth + Dirty Set / (Bandwidth − Max Dirty Rate / 2)
The second approach is parallel migration, which starts concurrent migration of all VMs
at the same time. We allocate the migration bandwidth to each VM according to its maximal
dirty rate for convergence. If the available migration bandwidth is larger than the sum of
VMs’ maximal dirty rates, the remaining bandwidth is evenly distributed to each VM. The
migration time for each VM is computed by applying the VM's specific information and
available migration bandwidth to the above equation. The total migration time for all VMs is
decided by the longest migration time for a single VM. The performance degradation time
is obtained by computing the difference between the migration finish times for VMs.
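The analysis above can be sketched as follows (my reconstruction, not the thesis code): sizes in MB, rates and bandwidth in MBps, with the per-VM time formula from the sequential case.

```python
# Sketch of the sequential/parallel migration time analysis. Sizes in MB,
# rates and bandwidth in MBps; effective dirty-copy rate during the dirty
# iteration is approximated as (bandwidth - max dirty rate / 2).

def single_migration_time(image, mem, dirty_set, max_dirty_rate, bw):
    return (image + mem) / bw + dirty_set / (bw - max_dirty_rate / 2.0)

def sequential_time(vms, bw):
    # VMs are migrated one by one at the full available bandwidth.
    return sum(single_migration_time(*vm, bw) for vm in vms)

def parallel_times(vms, bw):
    # Each VM is guaranteed its maximal dirty rate; the remaining bandwidth
    # is split evenly. Returns None when convergence is impossible.
    need = sum(vm[3] for vm in vms)
    if need > bw:
        return None
    share = (bw - need) / len(vms)
    return [single_migration_time(*vm, vm[3] + share) for vm in vms]

# Parameters from Table 4.1; 256 Mbps = 32 MBps shared bandwidth.
web = (8192, 1024, 100, 2)    # web/app server
db  = (8192, 1024, 1024, 15)  # database server
print(round(sequential_time([web, db], 32)))  # ~621 s (Table 4.2 lists 620 s)
```

With six database-like VMs, the sum of maximal dirty rates (90 MBps) exceeds the 32 MBps bandwidth, and `parallel_times` returns None, matching the INF entries in Table 4.2.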
The results in Table 4.2 show that existing solutions lead to large performance degradation times. Sequential migration incurs long degradation times, which increase with the
size of the application topology. Although parallel migration has a shorter degradation
time, it is still very significant (tens of seconds). Furthermore, a serious problem is that
migration cannot converge when the number of parallel migrated VMs exceeds 5, because
the sum of the VMs’ dirty rates is greater than the available bandwidth and thus dirty data
copying cannot finish unless I/O throttling is employed to reduce the dirty rates.
In summary, quantitative analysis of the migration impact on multi-tier applications shows
that multi-tier applications can be very complex and that today's migration methods lead to
significant performance degradation.
4.3 System Design
To address the problem introduced above, we design a system called COMMA to coordi-
nate the migration of multi-tier applications. COMMA is the first migration coordination
system for multiple VMs using a series of scheduling algorithms.
Component Type                  Image Size   Mem Size   Dirty Set   Max Dirty Rate
Web/App Server, Load Balancer   8GB          1GB        100MB       2MBps
Database                        8GB          1GB        1GB         15MBps
Table 4.1 : Example VM and workload parameters. Dirty set is defined as the data bytes written on the VM's virtual disk at the end of the disk image copy. Dirty rate is defined as the speed at which the VM's virtual disk and memory are written.
       Sequential Migration         Parallel Migration
       Migration    Degradation     Migration    Degradation
       Time(s)      Time(s)         Time(s)      Time(s)
2VM    620          328             620          32
3VM    912          1238            912          58
4VM    1205         1820            1205         50
5VM    1498         2111            1498         44
6VM    1826         4042            INF          INF
7VM    2118         4624            INF          INF
8VM    2410         4915            INF          INF
9VM    2739         8158            INF          INF
Table 4.2 : Degradation time with sequential and parallel migration. INF means the migration could not converge and thus the migration time is infinite.
Figure 4.3 : Examples of multi-tier web services.
Figure 4.4 : An example of coordinating the migration with COMMA
The system consists of two parts: 1) a centralized controller running on a hypervisor
on the migration source network, and 2) a local process per VM running on each VM’s
hypervisor to govern each VM migration. The local process provides two functions: 1) it
periodically reports to the controller about the migration status to let the controller make
the progress management decision, and 2) it exposes a control interface, which receives
messages from the controller and adjusts the migration progress accordingly. The reported
migration status includes actual migration speed, actual dirty blocks, actual dirty rate, pre-
dicted dirty set at the end of pre-copy and predicted maximal dirty rate. Based on the
migration status, the controller periodically executes the scheduling algorithm to compute
the proper settings for each VM migration process in order to achieve COMMA’s perfor-
mance objective. The controller sends control messages to the local processes. The control
messages include the migration speed and when the migration speed should be set. Then
each local process would implement the controller’s decisions to achieve the overall ob-
jective of finishing the migration with minimal degradation cost. More details about the
control interface are given in Pacer's implementation (Section 3.4.1).
The full migration of multiple VMs is scheduled into small intervals. Before migra-
tion, the user provides the list of VMs to be migrated as well as the source hypervisors
and destination hypervisors to COMMA. COMMA queries the source hypervisors for each
VM’s image size and memory size. At the same time, COMMA uses iperf [ipe] to measure
the available network bandwidth between the source and destination sites and uses iptraf
[ipt05] to measure the traffic matrix for the communication rates among VMs. At the begin-
ning, the measured network bandwidth is considered as the available migration bandwidth.
However, we do not only rely on this measurement. We break the migration time into short
intervals where we update and recompute the available migration bandwidth in each inter-
val. In each interval, we assume the bandwidth is fixed, and then the system computes the
new estimated available migration bandwidth for the scheduling of next interval.
4.3.1 Subsystem: Pacer
COMMA focuses on the coordination of multiple VMs’ migration where each VM’s mi-
gration progress is handled by Pacer. Pacer provides two types of interfaces to COMMA:
query and control. Pacer is able to respond to queries for its actual migration speed,
migration progress, predicted dirty set and predicted dirty rate in the pre-copy phase, and
actual dirty set and dirty rate in the dirty iteration phase. Pacer also provides control interfaces for COMMA to start migration, stop migration and set the desired migration speed.
4.3.2 Challenges and Solutions
In Chapter 1, we discussed the challenges of multi-tier application migration. We now
discuss how we tackle these challenges with our solutions.
• Higher order control. Fundamentally, each individual VM migration process can
only be predicted and controlled to a certain extent (as shown by Pacer). It is nec-
essary to design a new architecture where a higher order control mechanism governs
all VM migration activities. COMMA designs a centralized controller to coordinate
the migration of VMs in the multi-tier application.
• Inter-VM-migration resource contention and allocation. For multiple VM migrations, the convergence issue is more complicated but also more interesting. We need a
mechanism to check whether it is possible to migrate multiple VMs at the same time,
to decide how to combine VMs into groups for convergence, and to schedule the
migration start and finish times of each group so as to minimize the communication
cost. COMMA introduces the concept of a “valid group” to decide how to combine
VMs into groups with convergence in mind. It then performs inter-group scheduling
across valid VM groups and intra-group scheduling for the VMs within each group.
Inter-group scheduling ensures feasibility given the available bandwidth and guarantees convergence. Intra-group scheduling maximizes bandwidth utilization.
• Inter-VM-migration dynamicity and interference. Interference among multiple
VM migrations exists: when multiple VM migrations occur in the same period,
they share the available resources. COMMA collects the actual migration speed
and progress from each VM and makes adjustments based on this feedback.
• System design and efficiency. The computational complexity of an optimal solution
for coordinating a multi-tier application could be very high. It is important that the
coordination system is efficient and has low overhead. COMMA, with a heuristic
scheduling algorithm, reduces the computation overhead by 99% while achieving
96% of the optimal performance in our experiments (Section 4.4).
4.3.3 Scheduling Algorithm
The algorithm works in two stages. In the first stage, it coordinates the migration speed of
the static data of VMs (Phase 1) so that all VMs complete the precopy phase at nearly the
same time. In the second stage, it coordinates the migration of dynamically generated data
(Phase 2, 3, 4) by inter-group and intra-group scheduling. The definitions of the four
migration phases are given in Chapter 3.
Phase 1 migrates static content. Thus there is no inherent minimum speed requirement.
Phase 2 and 3 migrate dynamically generated content. The content generation rate implies
a minimum migration speed which must be achieved or else throttling might become nec-
essary (which causes application performance degradation). Therefore, we should dedicate
as much of the available bandwidth to phase 2 and 3 in order to prevent application per-
formance degradation. This clearly implies that the phase 1 migration activities should not
overlap with phase 2 and 3. More discussion about adapting to changing dirty rate and
bandwidth is in Section 4.3.6.
4.3.3.1 First stage
The goal of the first stage is to migrate VMs in parallel and finish phase 1 of all VMs at the
same time. Assuming the data copying for each VM is performed over a TCP connection,
it is desirable to migrate VMs in parallel because the aggregate transmission throughput
achieved by parallel TCP connections tends to be higher than that of a single TCP connection.
In this stage, the amount of migrated data is fixed. The controller adjusts each VM's
migration speed according to its virtual disk size (see Equation 4.2).
During the migration, the controller periodically gathers and analyzes the actual available network bandwidth, the migration speeds and the progress of the VMs. Then it leverages
the maximal speed prediction and tuning algorithms of our migration progress management system Pacer to pace the migration of the whole set of VMs.
Figure 4.5 : An example of a valid group.
speed_{vm_i} = (DISK_SIZE_i × BANDWIDTH) / TOTAL_DISK_SIZE    (4.2)
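A minimal sketch of this proportional assignment (illustrative, not the actual controller code):

```python
# Sketch of the Equation 4.2 first-stage speed assignment: the available
# bandwidth is split in proportion to each VM's virtual disk size so that
# all pre-copies are expected to finish at the same time.

def first_stage_speeds(disk_sizes, bandwidth):
    total = sum(disk_sizes)
    return [bandwidth * size / total for size in disk_sizes]

# Disks of 8GB, 16GB and 8GB sharing 32 MBps:
print(first_stage_speeds([8, 16, 8], 32))  # [8.0, 16.0, 8.0] MBps
```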
Figure 4.4 shows an example of migrating 4 VMs with COMMA. The user submits a
migration request to the controller with the logical topology of the application, VM config-
uration, traffic matrix, possible network bandwidth and destination hypervisors’ addresses.
The controller coordinates the migration of 4 VMs such that their precopy phases complete
at the same time. At the end of the first stage, each VM has recorded a set of dirty blocks
which require retransmission in the next stage.
4.3.3.2 Second stage
In the second stage, we introduce the concept of “valid group” to overcome the second chal-
lenge above. COMMA performs inter-group scheduling to minimize the communication
cost and intra-group scheduling to efficiently use network bandwidth.
To satisfy the convergence constraint, the VMs in the multi-tier application are divided
into valid groups according to the following rule: the sum of the VMs' maximal dirty rates in a
group is no larger than the available network bandwidth (see Equation 4.3). The maximal
dirty rate is usually reached at the end of the dirty iteration, since at this time most blocks
are clean and have a high probability of becoming dirty again. The maximal dirty rate is
needed before the second stage but is unknown until the migration finishes; we therefore
leverage a dirty rate estimation algorithm, shown to work well in Chapter 3,
to estimate the maximal dirty rate before the second stage starts.
∑_{vm_i ∈ group} Max_dirty_rate_i ≤ BANDWIDTH    (4.3)
Figure 4.5 shows an example of how to compute valid groups for the 3 VMs of Figure 4.2.
The maximal dirty rates of VM1, VM2 and VM3 are 20MBps, 5MBps and 10MBps,
respectively. There are six valid groups. {VM1, VM2, VM3} is not a valid group because
the sum of its maximal dirty rates is larger than the available network bandwidth of
30MBps. The six valid groups yield four possible combinations of migration sequences.
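The valid-group rule can be sketched as a simple subset enumeration (illustrative code, using the Figure 4.5 numbers):

```python
# Sketch: enumerate the valid groups of Equation 4.3 for the Figure 4.5
# example (max dirty rates 20, 5 and 10 MBps; available bandwidth 30 MBps).
from itertools import combinations

def is_valid_group(group, dirty_rates, bandwidth):
    return sum(dirty_rates[vm] for vm in group) <= bandwidth

def valid_groups(dirty_rates, bandwidth):
    vms = range(len(dirty_rates))
    return [set(g)
            for k in range(1, len(dirty_rates) + 1)
            for g in combinations(vms, k)
            if is_valid_group(g, dirty_rates, bandwidth)]

groups = valid_groups([20, 5, 10], 30)
print(len(groups))  # 6 -- only {VM1, VM2, VM3} (35 MBps > 30 MBps) is invalid
```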
4.3.4 Inter-group Scheduling
In order to minimize the performance degradation cost, COMMA needs to compute the
optimal group combination and migration sequence. We propose two algorithms: a brute-
force algorithm and a heuristic algorithm. The brute-force algorithm can find the optimal
solution but its computation complexity is high. The heuristic algorithm reduces the computation overhead by 99% while achieving 96% of the optimal performance in our experiments (Section 4.4).
4.3.4.1 Brute-force algorithm
The brute-force algorithm lists all possible combinations of valid groups, permutes them
into different migration sequences, and computes the performance degradation cost of each.
It records the group combination and migration sequence that generate the minimal
cost.
Given a set of VMs, the algorithm generates all subsets first, and each subset will be
considered as a group. The algorithm eliminates the invalid groups that do not meet the
rule above. It then computes all combinations of valid groups that exactly add up to a
complete set of all VMs. Figure 4.4 shows one such combination of two valid groups that
add up to a complete set: {vm1, vm2} and {vm3, vm4}. Next the algorithm permutes each
such combination to obtain sequences of groups, and those sequences stand for different
migration orders. The algorithm then computes the communication cost of each sequence
based on the traffic matrix and the migration time reported from the intra-group scheduling
algorithm. Finally the algorithm will select the group combination and the sequence with
the minimal communication cost.
Let n be the number of VMs in the application. The time complexity of the brute-force
algorithm is O(2^n · n!), because it takes O(2^n) to compute all the subsets and O(n!)
to permute each combination.
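A compact sketch of this search (illustrative; `group_time` stands in for the intra-group scheduler's migration-time estimate, and `is_valid` encodes Equation 4.3):

```python
# Sketch of the brute-force inter-group search: enumerate every partition of
# the VM set into groups, drop partitions containing invalid groups, permute
# the migration order, and keep the minimal-cost schedule.
from itertools import combinations, permutations

def partitions(vms):
    # Yield all ways to split the VM list into disjoint groups.
    if not vms:
        yield []
        return
    first, rest = vms[0], vms[1:]
    for k in range(len(rest) + 1):
        for subset in combinations(rest, k):
            remaining = [v for v in rest if v not in subset]
            for p in partitions(remaining):
                yield [(first,) + subset] + p

def brute_force(vms, is_valid, group_time, tm):
    best = (float("inf"), None)
    for part in partitions(list(vms)):
        if not all(is_valid(g) for g in part):
            continue
        for order in permutations(part):
            finish, t = {}, 0.0
            for g in order:               # groups migrate back to back
                t += group_time(g)
                for vm in g:
                    finish[vm] = t
            cost = sum(abs(finish[i] - finish[j]) * tm[i][j]
                       for i in vms for j in vms if i < j)
            if cost < best[0]:
                best = (cost, order)
    return best
```

Enumerating all partitions and orders is what produces the O(2^n · n!) complexity; the heuristic algorithm avoids this blow-up.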
4.3.4.2 Heuristic algorithm
Our heuristic algorithm tries to estimate the minimal cost by prioritizing VMs that need
to communicate with each other the most. Given the traffic matrix, we can get a list
L of the communication rates between any two VMs. Each element in L includes
(rate, V Mi, V Mj). It represents the communication between node V Mi and node V Mj
with rate. The heuristic algorithm takes the traffic matrix as input and generates the VM
group set S as follows. Figure 4.6 shows an example of migrating 4 VMs based on heuristic
84
Figure 4.6 : An example for heuristic algorithm
algorithm.
• Step 1: Sort the communication rates in L in descending order. S is empty
at the beginning. In the example, the list L of communication rates is
{(80, VM3, VM4), (50, VM3, VM1), (20, VM1, VM2), (10, VM2, VM4)}.
• Step 2: Repeatedly take the largest-rate element (rate, VMi, VMj) from L and check
whether VMi and VMj are already in S.
– Case 1: Neither VMi nor VMj is in S. If the two VMs can be combined into
a valid group, insert a new group {VMi, VMj} into S. Otherwise, insert two
groups {VMi} and {VMj} into S.
– Case 2: Only one VM is in S. For example, VMi is in S and VMj is not.
Find the group that includes VMi and check whether VMj can be merged
into it under the convergence constraint in Equation 4.3. If the group is still
a valid group after merging, VMj is merged into it. Otherwise, a
new group {VMj} is inserted into S. The case where VMj is in S and VMi
is not is symmetric.
– Case 3: Both VMi and VMj are in S. If the two groups can be merged into one
group under the convergence constraint, merge the two groups.
In the example, we take the maximal rate (80, VM3, VM4) first. Neither VM3
nor VM4 is in S, so it matches Case 1. VM3 and VM4 can be combined into a valid
group, so we insert a new group {VM3, VM4} into S. Then we take the second rate
(50, VM1, VM3). It matches Case 2 because only VM3 is in S. VM1 cannot be
merged into the valid group {VM3, VM4}, so a new group {VM1} is inserted into S.
Next, we take the rate (20, VM1, VM2). It matches Case 2 again because
only VM1 is in S, and VM2 can be merged into the group {VM1}. For the last rate
(10, VM2, VM4), both VMs are in S, matching Case 3. The two groups cannot
be merged into one group under the convergence constraint, so they are left
unmerged.
• Step 3: At the end of Step 2, S contains the valid groups of VMs. The
algorithm then compares permutations of the groups to find the one with minimal
cost.
The time complexity of the heuristic algorithm is O(n² log n + n² + n!). In the worst case
there are O(n²) elements in the list L, i.e., every VM communicates with every other VM,
so the sorting in Step 1 takes O(n² log n). The permutation in Step 3 takes O(n!) in the
worst case, when each VM forms its own group.
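Steps 1–3 can be sketched as follows (illustrative code; the dirty rates of 10, 10, 15 and 14 MBps and the 30 MBps bandwidth are assumed for the Figure 4.6 topology, which only specifies the communication rates):

```python
# Sketch of the heuristic grouping (Steps 1-2): greedily merge the most
# heavily communicating VM pairs into valid groups under Equation 4.3.

def heuristic_groups(tm, dirty_rates, bandwidth):
    n = len(tm)
    edges = sorted(((tm[i][j], i, j)
                    for i in range(n) for j in range(i + 1, n)
                    if tm[i][j] > 0), reverse=True)
    groups = []  # the set S in the text

    def find(vm):
        return next((g for g in groups if vm in g), None)

    def fits(group):
        return sum(dirty_rates[v] for v in group) <= bandwidth

    for _, i, j in edges:
        gi, gj = find(i), find(j)
        if gi is None and gj is None:          # Case 1
            groups.extend([{i, j}] if fits({i, j}) else [{i}, {j}])
        elif gj is None:                       # Case 2
            if fits(gi | {j}):
                gi.add(j)
            else:
                groups.append({j})
        elif gi is None:                       # Case 2 (mirrored)
            if fits(gj | {i}):
                gj.add(i)
            else:
                groups.append({i})
        elif gi is not gj and fits(gi | gj):   # Case 3
            gi |= gj
            groups.remove(gj)
    for vm in range(n):                        # isolated VMs get singletons
        if find(vm) is None:
            groups.append({vm})
    return groups

# Figure 4.6 rates, 0-indexed (VM1..VM4 -> 0..3): 80, 50, 20, 10.
tm_ex = [[0, 20, 50, 0],
         [20, 0, 0, 10],
         [50, 0, 0, 80],
         [0, 10, 80, 0]]
# Expected grouping: {VM3, VM4} and {VM1, VM2} (0-indexed: {2,3} and {0,1}).
print(heuristic_groups(tm_ex, [10, 10, 15, 14], 30))
```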
4.3.5 Intra-group Scheduling
To migrate the VMs in a valid group, one possible solution is to allocate to each VM
bandwidth equal to its maximal dirty rate, and then start the migration of all VMs in
the group at the same time. The definition of a valid group guarantees that we have
enough bandwidth to support all VMs in the group migrating concurrently.
Figure 4.7 : Intra-group scheduling. (a) VM migrations start at the same time but finish at different times, resulting in long performance degradation time. (b) VM migrations start at the same time and finish at the same time, resulting in long migration time due to inefficient use of migration bandwidth. (c) VM migrations start at different times and finish at the same time: no performance degradation and short migration time due to efficient use of migration bandwidth.
However, starting the VMs’ migration at the same time is not an efficient use of avail-
able migration bandwidth. Figure 4.7 shows the migration of three VMs in the dirty it-
eration with different mechanisms to illustrate this inefficiency. Figure 4.7(a) shows that
3 VMs start the dirty iteration of the migration at the same time. Different VMs have different
migration speeds and dirty rates; therefore, without coordination they finish migration at
different times. For example, VM1 takes 5 minutes to migrate most of the dirty blocks/pages,
after which it can enter phase 4 to pause the VM and switch over to run at the destination. VM3
may take 10 minutes to finish. That results in 5 minutes of performance degradation. Recall
that the goal of COMMA is to reduce the performance degradation cost during
migration; the ideal case is therefore that the VMs in the group finish migration at the
same time. To make them finish at the same time, we could force VM1 and VM2
to hold in the dirty iteration, continuing to migrate newly generated dirty blocks until VM3
is done, as Figure 4.7(b) shows. This mechanism is not efficient because it wastes a lot of
migration bandwidth holding VM1 and VM2 in the dirty iteration.
To efficiently use the migration bandwidth, the algorithm schedules the migration of
VMs inside a group to finish at the same time, but allows them to start the dirty iteration
at different times, as Figure 4.7(c) shows.
The design is based on the following observations in practice. (1) Delaying the start
time of VMs with light workload can allow for more bandwidth to be allocated to VMs
with heavy workload. (2) At the end of the first stage, most of the VM’s frequently written
blocks are already marked as dirty blocks, and the dirty rate is low at this time. Therefore,
delaying the start time of dirty iteration will not significantly increase the number of dirty
blocks. (3) Once the dirty iteration starts, it is better to finish migration as soon as possible
to save the bandwidth.
We run migrations of a file server with 30 clients to demonstrate our observations.
Figure 4.8 shows the dirty rate for the two experiments. Figure 4.8(a) shows the migration
without any delay before the dirty iteration. From 0 to 280s, the migration is in the pre-copy
phase and its dirty rate is very stable, around 32KBps. The dirty iteration lasts from 280s to
350s; the dirty rate is very low at the beginning and increases as the dirty iteration proceeds.
Figure 4.8(b) shows the migration with a 35s delay before the start of the dirty iteration. During this
period, the dirty rate is almost zero, meaning almost no clean blocks become dirty.
Initially we assume that the minimal required speed for each VM is set equal to the
VM's maximal dirty rate; the migration time for each VM is then estimated based
on the time model of Chapter 3. The algorithm schedules different starting times for the
VMs according to their estimated migration times so that every VM is expected to finish the
migration at the same time.
Available network bandwidth may be larger than the sum of the VMs' minimal required
migration speeds. Any extra available bandwidth is further allocated to the VMs to
minimize the total migration time of the group. This allocation is done iteratively. Suppose
the group has N VMs; the extra available bandwidth is first allocated to vmN, where the
subscript indicates the VM's start-time order in the schedule. That is,
vmN is the VM that starts the latest in the schedule. The allocation of this extra bandwidth
reduces vmN's migration time, and thus its start time can be moved closer to the finish
time target in the schedule. Next, the extra available bandwidth prior to the start of vmN
is given to vmN−1, whose migration time is thus reduced as well. Then the extra available
bandwidth prior to the start of vmN−1 is given to vmN−2, and so on, until the migration
time of the first VM to start is also minimized.
Figure 4.8 : An example of delaying the start of the dirty iteration for the migration. (a) No delay. (b) Delay 35s. [Plots of dirty rate (B/s) versus migration time (s).]
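Setting aside the iterative extra-bandwidth step, the basic stagger can be sketched as follows (illustrative; `est_times` would come from the Chapter 3 time model at each VM's minimal speed):

```python
# Sketch of the intra-group stagger: each VM's dirty iteration is delayed so
# that all VMs in the group finish at the same time as the slowest one.

def staggered_starts(est_times):
    finish = max(est_times)           # align all finishes with the slowest VM
    return [finish - t for t in est_times]

# Estimated dirty-iteration times of 300s, 180s and 120s:
print(staggered_starts([300, 180, 120]))  # start offsets [0, 120, 180]
```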
4.3.6 Adapting to changing dirty rate and bandwidth
The maximal dirty rate and the migration bottleneck are the key input parameters of the
scheduling algorithm. When those parameters fluctuate, COMMA handles it through
adaptation: it periodically re-estimates the maximal dirty rate, measures the available
bandwidth and recomputes the schedule for not-yet-migrated groups. When COMMA
detects that the available bandwidth is smaller than the sum of any two VMs' maximal dirty
rates, the migration is degraded to sequential migration to ensure convergence. In the
extremely rare case that the available bandwidth is smaller than a single VM's maximal dirty
rate, throttling is applied to that VM so that its dirty rate is reduced and the migration
can converge.
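The fallback rule can be sketched as a small decision function (illustrative, not COMMA's code):

```python
# Sketch of the adaptation rule: pick a migration mode from the currently
# measured bandwidth and the estimated maximal dirty rates (MBps).

def choose_mode(bandwidth, dirty_rates):
    pair_sums = [dirty_rates[i] + dirty_rates[j]
                 for i in range(len(dirty_rates))
                 for j in range(i + 1, len(dirty_rates))]
    if pair_sums and bandwidth >= min(pair_sums):
        return "grouped"      # at least one pair can still migrate together
    if bandwidth >= max(dirty_rates):
        return "sequential"   # degrade to one VM at a time for convergence
    return "throttle"         # even a single VM needs I/O throttling
```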
4.3.7 Putting it all together
To put all the algorithms together, we show an example (see Figure 4.9) of applying
the scheduling to the migration of an application with 4 virtual machines. In the first stage, we
migrate static content; the goal is to migrate the VMs in parallel and finish phase 1 of all VMs at
the same time. Then the second stage starts. In the second stage, we introduce the concept
of the valid group to address the second challenge, concerning convergence. COMMA performs
inter-group scheduling to minimize the communication cost and intra-group scheduling
to efficiently use network bandwidth. In the example, we compute all the possible valid
groups and then select the optimal group combination {VM3, VM4} and {VM1, VM2}.
With inter-group scheduling, the valid group {VM3, VM4} is migrated before the
valid group {VM1, VM2}. To efficiently use the migration bandwidth, COMMA schedules
the migration of VMs inside a group to finish at the same time while allowing them to start the
dirty iteration at different times. The overall purpose is still to minimize the performance
degradation cost. We demonstrate a possible migration schedule for the two valid groups
in the example.
Figure 4.9 : An example of the scheduling algorithm (putting it all together)
4.4 Evaluation
4.4.1 Implementation
The implementation is based on the kernel-based virtual machine (KVM) platform. KVM
consists of a loadable kernel module, a processor specific module, and a user-space program
– a modified QEMU emulator. QEMU performs management tasks for the VM. COMMA
is implemented on QEMU version 0.12.50. A centralized controller is implemented in
C++.
4.4.2 Experiment Setup
The experiments are set up on six physical machines. Each machine has a 3GHz quad-core
AMD Phenom II X4 945 processor, 8GB RAM, a 640GB WD Caviar Black SATA
hard drive, and Ubuntu 9.10 with Linux kernel version 2.6.31 (with the KVM module).
4.4.3 Migration of a 3-tier Application
In this experiment, we migrate RUBiS [RUB], a well-known auction website benchmark.
RUBiS includes one web server, two application servers and one database server, with
the topology shown in Figure 4.3. The 4 VMs are deployed on at most 3 physical machines
with different placement setups to mimic the randomness of VM placement policies on public
clouds. The 4 VMs have the same image size of 8GB; the memory size is 2GB for the web
server and application servers, and 512MB for the database server. The workload is 300 clients.
The migration bottleneck is the I/O write speed on the destination disk, which is at most
15MBps. Table 4.3 shows the results.
Sequential migration has the longest migration time and the highest cost in all cases;
more than 2GB of data are affected by sequential migration. Parallel migration reduces the
cost to less than 1GB, but it is still much higher than the cost with COMMA. COMMA
reduces the number of data bytes affected by migration by up to a factor of 475.
COMMA has a slightly longer migration time than parallel migration. This is because
COMMA tries to make the VMs' migrations finish at the same time while parallel migration
does not: when some VMs finish earlier, the remaining VMs that share the same
resources can take advantage of the released resources and finish their migrations earlier.
4.4.4 Manual Adjustment does not Work
While the above experiment runs sequential and parallel migration, one could try to ad-
just sequential and parallel migration to better support multi-tier application migration by
reordering the migration sequence in sequential migration and manually configuring the
migration speed in parallel migration based on static migration info, e.g. VM disk size.
VM Placement             Sequential Migration    Parallel Migration     COMMA Migration
                         Time(s)   Cost(MB)      Time(s)   Cost(MB)     Time(s)   Cost(MB)
{web,app1,app2,db}       2289      2267          2155      13           2188      7
{web,db},{app1,app2}     2479      2620          918       72           1043      2
{web,app1},{db,app2}     2425      2617          1131      304          1336      2
{web}{app1,app2}{db}     2330      2273          914       950          926       2
{web,app1}{app2}{db}     2213      1920          797       717          988       4
{web}{app1}{app2,db}     2310      2151          1012      259          1244      5
Table 4.3 : Comparison of three approaches on 3-tier applications. {...} represents the VM set on one physical machine.
However, in this experiment, we show that such mechanisms cannot achieve the goal of
minimizing the degradation cost.
The experiment is based on SPECweb2005, which is another popular web service
benchmark [SPE]. It contains a frontend Apache server and a backend server that works
as a database. The image sizes for the two VMs are 8GB and 16GB respectively. The
workload has 50 clients. Table 4.4 shows the results of six migration methods. The first
two methods are sequential migrations with different VM orders. Sequential migration
causes large degradation costs of 265MB and 139MB, respectively, for the two different
migration orders.
The next three methods are based on parallel migration. In the first experiment, both
VMs are configured with the same migration speed of 32MBps; they do not finish at the
same time, and the cost is 116MB. In the second experiment, the speed for the frontend
VM (8GB) is set to 16MBps while the speed for the backend VM (16GB) is kept at
32MBps. With the configured migration speeds proportional to the image sizes, the user
might expect the two migrations to finish at the same time. However, the result is not as
expected, because the migrations cannot achieve the configured speeds most of the time
(the I/O bottleneck is 15MBps). To further reduce the time gap of the previous parallel
migration method, a conservative solution is to reduce the configured speeds for both
VMs. In the third parallel migration experiment, the configured speeds are set to 5MBps
and 10MBps. The degradation time reduces to 36s and the cost reduces to 9MB, but the
low speeds bring the side effect of a longer migration time. The three experiments show
that it is impractical for users to statically set migration speeds that achieve both low
performance degradation cost and timely migration. In a real cloud environment, with
additional competing traffic or a more intensive workload, guessing the proper speed
configurations would be even harder. With COMMA, the controller coordinates the
migration progress of the two VMs automatically. The two VMs finish their migrations
as fast as possible and incur only 0.2MB of cost.

                  Sequential Migr.       Parallel Migr.            COMMA
                  frontend   backend     32/32    16/32    5/10
                  first      first       MBps     MBps     MBps
Cost(MB)          265        139         116      122      9        0.2
Migr. Time(s)     1584       1583        1045     1043     1697     1043

Table 4.4 : Manual adjustment of the configured speed can hardly achieve both low cost and small migration time.
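The failure mode above can be illustrated with a toy simulation (a sketch, not COMMA's actual controller; the sizes, the 15MBps cap, and the pacing rule are illustrative assumptions): under an I/O bottleneck, statically configured speeds are silently capped and the finish times drift apart, while periodically re-pacing each VM against the one with the most remaining work keeps the finish times aligned.

```python
def simulate(remaining, speeds_fn, cap, step=1.0):
    """Migrate VMs until done; per-VM throughput is capped at `cap` MBps.
    `remaining` is in MB; returns the finish time (s) of each VM."""
    remaining = list(remaining)
    finish = [None] * len(remaining)
    t = 0.0
    while any(f is None for f in finish):
        speeds = speeds_fn(remaining)      # re-evaluated every step
        for i, r in enumerate(remaining):
            if finish[i] is None:
                remaining[i] = r - min(speeds[i], cap) * step
                if remaining[i] <= 0:
                    finish[i] = t + step
        t += step
    return finish

sizes = [8 * 1024, 16 * 1024]  # MB: the 8GB and 16GB SPECweb2005 images

# Static speeds proportional to image size (16/32 MBps): both are
# silently capped at the 15MBps I/O bottleneck, so the smaller VM
# finishes far earlier and the application stays split across sites.
static = simulate(sizes, lambda rem: [16, 32], cap=15)

# Coordinated pacing: every step, pace each VM relative to the VM with
# the most remaining work, so all migrations aim at one finish time.
coord = simulate(sizes, lambda rem: [15 * r / max(rem) for r in rem], cap=15)
```

In the static run the finish times differ by hundreds of seconds; in the coordinated run they coincide, without sacrificing total migration time.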
4.4.5 Algorithms in Inter-group Scheduling
In this experiment, we evaluate the brute-force algorithm and the heuristic algorithm in
terms of performance degradation cost and computation time. We run simulations to
evaluate the different migration approaches on the multi-tier web services shown in
Figure 4.3. We generate a random number between 0 and 100KBps as the communication
rate whenever there is a link between two VMs. Each experiment is run 3 times with
different random number seeds.

       Sequential   Parallel    COMMA-Bruteforce   COMMA-Heuristic
       Migration    Migration   Migration          Migration
2VM    28           3           0                  0
3VM    84           3           0                  0
4VM    114          3           0                  0
5VM    109          3           0                  0
6VM    222          INF         1                  2
7VM    287          INF         2                  2
8VM    288          INF         1                  2
9VM    424          INF         9                  13

Table 4.5 : Performance degradation cost (MB) with different migration approaches

Table 4.5 shows the average results. In the first four cases (number of VMs ≤ 5), all VMs
can be coordinated to finish at the same time and the cost is 0. At larger scales (number of
VMs ≥ 6), the coordination algorithm tries its best to schedule the VMs' migrations and
achieve the minimal cost. Coordination with the brute-force algorithm achieves a slightly
lower cost than coordination with the heuristic algorithm. Compared to sequential
migration, COMMA with the brute-force algorithm reduces the cost by 97.9% and
COMMA with the heuristic algorithm reduces the cost by 96.9%. COMMA using the
heuristic algorithm in scheduling achieves 96% of the optimal performance in our
experiments.
Figure 4.10 shows the computation time for the brute-force algorithm and the heuristic
algorithm. When the number of VMs increases to 9, the computation time for the brute-
force algorithm increases sharply to 32 seconds, while the computation time for the
heuristic algorithm remains very low at 274 microseconds. COMMA with the heuristic
algorithm in scheduling reduces the computation overhead by 99%.
[Figure: computation time in microseconds (log scale, 10 to 1e+08) versus the number of VMs (2 to 9), comparing the brute-force and heuristic algorithms]
Figure 4.10 : Computation time for brute-force algorithm and heuristic algorithm
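The cost/complexity trade-off between the two algorithms can be sketched with a toy scheduling model (the group sizes, rate matrix, cost function, and greedy rule below are illustrative assumptions, not COMMA's actual scheduler): the brute-force variant searches all n! migration orders, while a greedy heuristic builds an order in polynomial time by keeping heavily communicating groups close together in time.

```python
from itertools import permutations

def cost(order, size, rate, bw=15.0):
    """Degradation cost of migrating groups one after another in `order`:
    traffic between two groups crosses the WAN while their finish times
    differ, so cost = rate * |finish_i - finish_j| summed over pairs."""
    finish, t = {}, 0.0
    for g in order:
        t += size[g] / bw
        finish[g] = t
    n = len(size)
    return sum(rate[i][j] * abs(finish[i] - finish[j])
               for i in range(n) for j in range(i + 1, n))

def brute_force(size, rate):
    # O(n!): exhaustively try every migration order.
    return min(permutations(range(len(size))),
               key=lambda o: cost(o, size, rate))

def heuristic(size, rate):
    # Greedy sketch: start from the group with the most external traffic,
    # then repeatedly append the unscheduled group that communicates the
    # most with the groups scheduled so far.
    n = len(size)
    order = [max(range(n), key=lambda i: sum(rate[i]))]
    while len(order) < n:
        rest = [i for i in range(n) if i not in order]
        order.append(max(rest, key=lambda i: sum(rate[i][j] for j in order)))
    return tuple(order)

# A 4-group example with a fixed symmetric communication-rate matrix.
size = [8, 16, 4, 12]                 # GB per group
rate = [[0, 5, 0, 1],
        [5, 0, 2, 0],
        [0, 2, 0, 3],
        [1, 0, 3, 0]]                 # MBps between groups
best = brute_force(size, rate)        # optimal, but factorial search
fast = heuristic(size, rate)          # near-optimal, polynomial time
```

By construction the brute-force cost lower-bounds the heuristic cost, mirroring the small cost gap and large computation-time gap reported above.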
4.5 EC2 demonstration
To demonstrate COMMA in a real commercial hybrid cloud environment, we conduct an
experiment using the Amazon EC2 public cloud. We migrate two VMs from the Rice
University campus network to EC2 instances with the same settings as the experiment in
the last section, except that, in the SPECweb2005 setting, a client number of 10 is used as
the default workload. The SPECweb2005 webserver cluster consists of an 8GB frontend
webserver and a 16GB database server. For the EC2 setting, we use High-CPU Medium
instances running Ubuntu 12.04. EC2 instances do not support KVM, and thus QEMU
runs in "no-kvm" mode. The reason we decrease the application workload in this EC2
experiment is that the performance of a VM running on an EC2 instance is very slow
without KVM kernel support, and the decreased workload ensures that the dirty iteration
migration stage converges.
The results are shown in Table 4.6. There are three different migration approaches other
than COMMA. In the sequential approach, the performance degradation time equals the
time to migrate the last VM, because the two VMs stay in two different data centers from
the time the first VM finishes its migration until the second VM finishes its migration.
For the parallel approach with the same configured migration speed for both VMs, the
degradation cost is still 19MB, which is not that different from the 28MB and 17MB of
the sequential approach. The reason is that the parallel approach still has no control over
the migration progress: multiple VMs may migrate at different speeds depending on
network dynamics and thus finish their migrations at different times. The second and
third parallel approaches set the migration speed proportional to the size of the VM
image, with the expectation of finishing the migrations of all VMs at the same time. We
set the migration speeds to 32MBps/16MBps and 10MBps/5MBps in the two
experiments. The degradation cost decreases to 6MB, which is much smaller than in the
previous two approaches. However, these approaches do not fully utilize the available
bandwidth, and thus the migration time can increase, especially in the case with
migration speeds of 5/10 MBps, which can be less than the available bandwidth. For
COMMA, the degradation cost is only 0.1MB, and the migration time is the shortest
because it utilizes bandwidth more efficiently. The above results show that COMMA
successfully coordinates the migration of a multi-tier application across two data centers
with extremely low degradation cost. COMMA reduces the degradation cost by a factor
of 190 compared to parallel migration.
                  Sequential Migr.       Parallel Migr.            COMMA
                  frontend   backend     32/32    16/32    5/10
                  first      first       MBps     MBps     MBps
Cost(MB)          28         17          19       6        6        0.1
Migr. Time(s)     871        919         821      885      1924     741

Table 4.6 : Migration methods in the EC2 experiment.

4.6 Related work
To the best of our knowledge, we are the first group to address the problem of live
migration of multi-tier applications; no previous work is directly comparable. There is
related work on applying prefetching strategies [NC12] and deduplication techniques
[AKSSR11] to multiple simultaneous migrations.
Nicolae et al. [NC12] propose a hypervisor-transparent approach for efficient live
migration of I/O-intensive workloads. The focus of their work is on optimizing single-VM
migrations. It relies on a hybrid active-push, prioritized-prefetch strategy to speed up
migration and reduce migration time, which makes it highly resilient to the rapid changes
of disk state exhibited by I/O-intensive workloads.
Al-Kiswany et al. [AKSSR11] employ data deduplication in live migration to reduce the
migration traffic. They present VMFlockMS, a migration service optimized for cross-
datacenter transfer and instantiation of groups of virtual machine images. VMFlockMS is
designed to deploy a set of virtual appliances by making efficient use of the available
cloud resources to locally access and deduplicate the images and data in a distributed
fashion, with minimal requirements imposed on the cloud API to access the VM image
repository. Deduplication is orthogonal to COMMA in that it aims to reduce migration
traffic rather than to minimize performance degradation cost.
Chapter 5
Conclusion and Future Work
Cloud computing is a rapidly expanding, multi-billion-dollar business. Amazon's ability
to challenge IBM for a $600-million federal cloud project signals this new era [clob].
Smaller and more nimble cloud-based service providers will spur competition and enable
agency transformation [clob]. More than 700 IT professionals in six countries across the
globe agreed that virtualization technology contributes significantly to data center
management challenges, indicating that its impact is undeniable and vast [vira]. Live
migration of virtual machines serves as a powerful management tool for planned
maintenance, load balancing, avoiding single-provider lock-in, and enterprise IT
consolidation. Surprisingly, two problems exist in today's live migration systems.
First, as far as we know, none of the existing live migration systems can accurately
predict the migration time or control the migration time, making it hard for system
administrators to schedule migrations with reasonable time reservations. This fact led us
to the adaptive pacing approach, which makes the migration finish time predictable and
controllable. Our first contribution is Pacer – the first system capable of accurately
predicting the migration time, coordinating the migrations of multiple application
components to minimize the performance degradation, and managing the progress so that
the actual migration finishing time is as close to a desired finish time as possible.
Through extensive experiments, including a real-world commercial cloud scenario with
Amazon EC2, we show that Pacer is highly effective. The approach is crucial in many
VM migration use cases. Just as importantly, as Pacer demonstrates, the adaptive pacing
approach can be realized successfully in practice. The details of modeling the migration,
estimating the remaining work, controlling the speed, and adapting to dynamics in Pacer
are intricate, but the resulting system is highly effective and has negligible overhead.
Recently we have extended Pacer with a new function for predicting the migration time
before migration begins. The main addition is to monitor the disk I/O workload and to
measure the available network bandwidth for a short period of time, e.g. 3 minutes, and to
use these observations for migration time prediction. We have found that the prediction
accuracy is as good as that of the prediction performed during migration. The new
function helps operators plan and schedule cloud management tasks.
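As a back-of-the-envelope sketch (not Pacer's actual model from Equation (3.3); the downtime threshold and sample rates are illustrative assumptions), the observed average dirty rate and available bandwidth already bound the migration time, since each pre-copy pass shrinks geometrically whenever the dirty rate is below the bandwidth:

```python
def predict_migration_time(image_mb, dirty_rate, bandwidth, downtime_mb=50.0):
    """All rates in MBps. Returns predicted seconds, or None if the
    dirty iterations cannot converge (dirty_rate >= bandwidth)."""
    if dirty_rate >= bandwidth:
        return None  # migration can never catch up at this bandwidth
    total, to_send = 0.0, image_mb
    while to_send > downtime_mb:     # stop once the residue fits in downtime
        t = to_send / bandwidth      # time to transfer this pass
        total += t
        to_send = dirty_rate * t     # data dirtied while transferring
    return total

# e.g. a 16GB image, 3 MBps average dirty rate, 30 MBps available bandwidth
t = predict_migration_time(16 * 1024, 3.0, 30.0)
```

Each pass is a factor dirty_rate/bandwidth smaller than the last, so the total converges to roughly image_size / (bandwidth - dirty_rate); a short pre-migration measurement window supplies the two rates.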
Second, multi-tier application architectures are widely employed in today's virtualized
cloud computing environments. Although existing solutions are able to migrate a
single VM efficiently, little attention has been devoted to migrating related VMs in multi-
tier applications. Ignoring the relatedness of VMs during migration can lead to serious
application performance degradation if the dependent components of an application are
split between the source and destination sites by a high latency and/or congested network
path. Simply migrating all related VMs in parallel is not enough to avoid such degradation.
To tackle the above problem, we propose an original migration coordination system for
multi-tier applications. The system is based on a scheduling algorithm for coordinating
the migration of VMs that aims to minimize migration's impact on inter-component
communications. We formulate the problem of live migration of multi-tier applications
and introduce COMMA, a migration coordination system that minimizes the performance
degradation of the application during migration. Using a fully implemented system on
KVM, we show that the system is highly effective in decreasing the performance
degradation cost and minimizing migration's impact on inter-component communications.
We intend to make Pacer and COMMA freely available to the community. There are
many possibilities for extending Pacer and COMMA in the future.
5.1 Migration Progress Management with Optimization Techniques
In Chapter 2, we discuss many optimization techniques for live migration involving
scheduling, compression, and deduplication. The current designs of Pacer and COMMA
are based on the migration time model without optimization. It would be interesting to
extend the designs of Pacer and COMMA to support live migration with these
optimization techniques. We take migration with a reordered block sequence as an
example.
Our previous research on a workload-aware storage migration scheduling algorithm
[ZNS11] aims at improving storage I/O performance during wide-area migration by
reordering the block migration sequence. Rather than copying the storage from beginning
to end, the algorithm deliberately computes a schedule to transfer storage at an
appropriate granularity, which we call a chunk, and in an appropriate order that
minimizes performance degradation. To integrate Pacer and COMMA into a migration
system with a reordered block sequence, we only need to adjust how DIRTY_SET_SIZE
and AVE_DIRTY_RATE are computed in Equation (3.3) of the migration time model in
Section 3. The dirty set estimation and dirty rate estimation algorithms need to be
extended to take the new sequence into account. For example, suppose block1 is the first
block to be migrated without reordering; it then has a high probability of getting dirty
during the pre-copy stage. With reordering, it may instead be the last block migrated in
the pre-copy stage and may not be written during the pre-copy stage at all, in which case
it will not be included in the dirty set. Therefore, Pacer only needs to take the reordered
block migration sequence as input and run the dirty set and dirty rate estimation
algorithms to update the amount of dirty blocks and the duration of the dirty iteration.
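A hypothetical sketch of the needed extension (the Poisson write model, per-block rates, and block times below are illustrative assumptions, not the thesis's estimator): a block can only become dirty in the window between the completion of its copy and the end of the pre-copy pass, so feeding the reordered sequence into the estimator naturally shrinks the predicted dirty set when hot blocks are migrated last.

```python
import math

def estimate_dirty_set(order, write_rate, block_time):
    """order: block ids in migration order; write_rate: expected
    writes/sec per block; block_time: seconds to copy one block.
    Returns the expected number of dirty blocks after the pre-copy pass."""
    total_time = len(order) * block_time
    expected = 0.0
    for pos, blk in enumerate(order):
        copied_at = (pos + 1) * block_time   # when this block's copy is done
        window = total_time - copied_at      # remaining time to get dirtied
        # Poisson model: P(at least one write lands in the window)
        expected += 1.0 - math.exp(-write_rate[blk] * window)
    return expected

# Hot blocks first (plain head-to-tail copy) vs. hot blocks last.
rates = [0.01, 0.01, 0.0001, 0.0001]   # blocks 0 and 1 are write-hot
d_default   = estimate_dirty_set([0, 1, 2, 3], rates, block_time=10.0)
d_reordered = estimate_dirty_set([2, 3, 0, 1], rates, block_time=10.0)
```

Migrating the hot blocks last leaves them almost no window in which to be re-dirtied, so the expected dirty set, and hence DIRTY_SET_SIZE in the time model, drops.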
5.2 Migration Progress Management with Migration Planning
In this thesis, we focus on the migration process itself. However, live migration is a cloud
management operation, and before a live migration operation is issued, the cloud
administrator needs to plan for it. For example, when the administrator detects that some
machines have hardware issues, he may want to shut down those machines for
maintenance. He needs to prepare a set of new machines as the migration destination.
The placement of VMs on the destination is an interesting question: ideally it should
preserve the same performance metrics as on the source hypervisor. Moreover, migration
requires many resources, e.g. network bandwidth, disk bandwidth, and the CPU and
memory of the hypervisors at both the source and the destination. The migration will
affect other applications running at the source or destination. In order to minimize the
impact on application performance, the migration should be conducted at a proper time
when the application workload is low. In our system, the time model does not include the
time for planning destination machines. It would be helpful to consider planning,
scheduling, and migration all together to control the expected finish time.
5.3 Migration Progress Management with Task Prioritization
Cloud administrators leverage live migration in many management tasks to improve or
maintain the performance of applications running on the source site. However, migration
competes for network resources, and the prioritization of migration relative to other
management tasks should therefore be explicitly considered. Assume that there is a key
management task, provisioning, in the cloud. Provisioning requires quickly preparing the
VM image, reserving resources, and installing the required software on the guest OS for
users. If the provisioning task is more urgent than the live migration task, the cloud
administrator could assign a higher priority to the provisioning process. Live migration
could then use the remaining bandwidth for the pre-copy and dirty iteration stages. At the
end of the dirty iteration stage, if the system observes that the migration dirty rate is
higher than the available migration bandwidth, it should automatically increase the
migration task's priority. A management system with task prioritization is very helpful
for favoring urgent tasks while also guaranteeing migration's convergence.
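The priority policy described above can be sketched as a small allocation rule (hypothetical; the 20% headroom factor and the promotion trigger are illustrative assumptions, not an implemented system): migration takes whatever the high-priority provisioning task leaves over, and is promoted only when that leftover can no longer outrun the dirty rate.

```python
def allocate(link_mbps, provisioning_demand, dirty_rate,
             migration_promoted=False):
    """Split link bandwidth (MBps) between provisioning and migration.
    Returns (provisioning_share, migration_share, promoted)."""
    if not migration_promoted:
        leftover = max(link_mbps - provisioning_demand, 0.0)
        if leftover > dirty_rate:
            # Leftover bandwidth outruns the dirty rate: migration can
            # converge at low priority, so provisioning keeps its share.
            return provisioning_demand, leftover, False
        # Leftover cannot keep up with page dirtying: promote the
        # migration so the dirty iteration stage can converge.
        migration_promoted = True
    migration_share = min(link_mbps, dirty_rate * 1.2)  # 20% headroom
    return link_mbps - migration_share, migration_share, True

# Plenty of leftover: migration stays low priority.
p1, m1, promoted1 = allocate(100, 80, 10)
# Provisioning squeezes migration below the dirty rate: promote it.
p2, m2, promoted2 = allocate(100, 95, 10)
```

The rule favors the urgent task by default but guarantees convergence by ensuring the migration's bandwidth always ends up above the dirty rate.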
Bibliography
[AFGea09] Michael Armbrust, Armando Fox, Rean Griffith, and et. al. Above the clouds:
A berkeley view of cloud computing. Technical Report UCB/EECS-2009-28,
EECS Department, University of California, Berkeley, Feb 2009.
[AKSSR11] Samer Al-Kiswany, Dinesh Subhraveti, Prasenjit Sarkar, and Matei Ripeanu.
VMFlock: Virtual machine co-migration for the cloud. In HPDC, 2011.
[Amaa] Amazon Web Service. http://aws.amazon.com.
[Amab] Amazon. Aws reference architecture. http://aws.amazon.com/
architecture/.
[Ash12] Warwick Ashford. Hybrid clouds most popular with UK business, survey re-
veals. http://tinyurl.com/868pxzd, February 2012.
[ASR+10] Sherif Akoush, Ripduman Sohan, Andrew Rice, Andrew W. Moore, and Andy
Hopper. Predicting the performance of virtual machine migration. In IEEE
18th annual international symposium on modeling, analysis and simulation of
computer and telecommunication systems. IEEE, 2010.
[BKFS07] Robert Bradford, Evangelos Kotsovinos, Anja Feldmann, and Harald
Schioberg. Live wide-area migration of virtual machines including local per-
sistent state. In ACM VEE, June 2007.
[BKR11] David Breitgand, Gilad Kutiel, and Danny Raz. Cost-aware live migration of
services in the cloud. In USENIX Workshop on Hot Topics in Management of
Internet, Cloud, and Enterprise Networks and Services. USENIX, 2011.
[BL98] Amnon Barak and Oren La’adan. The mosix multicomputer operating sys-
tem for high performance cluster computing. Future Generation Computer
Systems, 13(4):361–372, 1998.
[Blo08] ”Amazon Web Services Blog”. Animoto - Scaling Through Viral Growth.
http://aws.typepad.com/aws/2008/04/animoto—scali.html, April 2008.
[Bri11] North Bridge. Future of Cloud Computing Survey. http://tinyurl.com/7f4s3c9,
June 2011.
[CCS10] Fabio Checconi, Tommaso Cucinotta, and Manuel Stein. Real-time issues
in live migration of virtual machines. In Euro-Par 2009–Parallel Processing
Workshops, pages 454–466. Springer, 2010.
[CFH+05] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul,
Christian Limpach, Ian Pratt, and Andrew Warfield. Live migration of virtual
machines. In NSDI’05: Proceedings of the 2nd conference on Symposium on
Networked Systems Design & Implementation, pages 273–286, Berkeley, CA,
USA, 2005. USENIX Association.
[Cloa] Reuven Cohen. The cloud hits the mainstream: More than half of U.S. businesses now use cloud computing. http://www.forbes.com/sites/reuvencohen/2013/04/16/the-cloud-hits-the-mainstream-more-than-half-of-u-s-businesses-now-use-cloud-computing/.
[clob] Mid-year review: 10 predictions for cloud computing.
http://gcn.com/Articles/2013/08/21/cloud-predictions.aspx?Page=3.
[Def] Cloud computing. http://www.ibm.com/cloud-computing/us/en/what-is-
cloud-computing.html.
[DGJ+05] Koustuv Dasgupta, Sugata Ghosal, Rohit Jain, Upendra Sharma, and Akshat
Verma. Qosmig: Adaptive rate-controlled migration of bulk data in storage
systems. In Proc. ICDE, 2005.
[DO91] Fred Douglis and John Ousterhout. Transparent process migration: Design al-
ternatives and the sprite implementation. Software: Practice and Experience,
21(8):757–785, 1991.
[Gar12] Gartner. Gartner says worldwide cloud services market. http://www.
gartner.com/newsroom/id/2163616/, 2012.
[GMV10] Ajay Gulati, Arif Merchant, and Peter Varman. mClock: Handling throughput
variability for hypervisor IO scheduling. In OSDI, October 2010.
[Goo] Googleappengine. https://developers.google.com/appengine/.
[Got08] Derek Gottfrid. The New York Times Archives + Amazon Web Services
= TimesMachine. http://open.blogs.nytimes.com/ 2008/05/21/the-new-york-
times-archives-amazon-web- services-timesmachine/, May 2008.
[HFW+13] Keqiang He, Alexis Fisher, Liang Wang, Aaron Gember, Aditya Akella, and
Thomas Ristenpart. Next stop, the cloud: Understanding modern web service
deployment in ec2 and azure. In IMC, 2013.
[HG09] Michael R. Hines and Kartik Gopalan. Post-copy based live virtual machine
migration using adaptive pre-paging and dynamic self-ballooning. In VEE ’09:
Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on
Virtual execution environments, 2009.
[HH09] Stuart Hacking and Benoit Hudzia. Improving the live migration process
of large enterprise applications. In VTDC’09: Proceedings of the 3rd Inter-
national Workshop on Virtualization Technologies in Distributed Computing,
2009.
[HNO+09] Takahiro Hirofuchi, Hidemoto Nakada, Hirotaka Ogawa, Satoshi Itoh, and
Satoshi Sekiguchi. A live storage migration mechanism over wan and its per-
formance evaluation. In VTDC'09: Proceedings of the 3rd International Workshop
on Virtualization Technologies in Distributed Computing, Barcelona,
Spain, 2009. ACM.
[HON+09] Takahiro Hirofuchi, Hirotaka Ogawa, Hidemoto Nakada, Satoshi Itoh, and
Satoshi Sekiguchi. A live storage migration mechanism over wan for relo-
catable virtual machine services on clouds. In CCGRID’09: Proceedings of
the 2009 9th IEEE/ACM International Symposium on Cluster Computing and
the Grid, Shanghai, China, 2009. IEEE Computer Society.
[HSS+10] M. Hajjat, X. Sun, Y.W.E. Sung, D. Maltz, S. Rao, K. Sripanidkulchai, and
M. Tawarmalani. Cloudward bound: planning for beneficial migration of en-
terprise applications to the cloud. In ACM SIGCOMM Computer Communica-
tion Review, 2010.
[IBM] IBM.
[ipe] iperf. http://sourceforge.net/projects/iperf/.
[ipt05] iptraf. http://iptraf.seul.org/, 2005.
[JDW+09] Hai Jin, Li Deng, Song Wu, Xuanhua Shi, and Xiaodong Pan. Live virtual
machine migration with adaptive memory compression. In IEEE International
Conference on Cluster Computing, 2009.
[JDWS09] Hai Jin, Li Deng, Song Wu, and Xuanhua Shi. Live virtual machine migration
integrating memory compression with precopy. In IEEE International Confer-
ence on Cluster Computing, 2009.
[JLHB88] Eric Jul, Henry Levy, Norman Hutchinson, and Andrew Black. Fine-grained
mobility in the emerald system. ACM Transactions on Computer Systems
(TOCS), 6(1):109–133, 1988.
[KVM] KVM. Kernel based virtual machine. http://www.linux-kvm.org/
page/Main_Page.
[KVM10] KVM. QEMU-KVM code. http://sourceforge.net/projects/kvm/files, January
2010.
[LAW02] Chenyang Lu, Cuillermo A. Alvarez, and John Wilkes. Aqueduct: Online data
migration with performance guarantees. In Proc. of the USENIX Conference
on File and Storage Technologies (FAST), 2002.
[LZW+08] Yingwei Luo, Binbin Zhang, Xiaolin Wang, Zhenlin Wang, Yifeng Sun, and
Haogang Chen. Live and Incremental Whole-System Migration of Virtual
Machines Using Block-Bitmap. In IEEE International Conference on Cluster
Computing, 2008.
[MCGC11] Ali Mashtizadeh, Emre Celebi, Tal Garfinkel, and Min Cai. The design and
evolution of live storage migration in vmware esx. In Proceedings of the an-
nual conference on USENIX Annual Technical Conference. USENIX Associa-
tion, 2011.
[MDP+00] Dejan S Milojicic, Fred Douglis, Yves Paindaveine, Richard Wheeler, and
Songnian Zhou. Process migration. ACM Computing Surveys (CSUR),
32(3):241–299, 2000.
[Mic12] Microsoft. Hyper-V live migration FAQ. http://technet.microsoft.com/en-
us/library/ff715313(v=ws.10).aspx, January 2012.
[Mur11] Alan Murphy. Enabling Long Distance Live Migration with F5 and VMware
vMotion. http://tinyurl.com/7pyntvq, 2011.
[NC12] Bogdan Nicolae and Franck Cappello. Towards efficient live migration of I/O
intensive workloads: A transparent storage transfer proposal. In ACM HPDC,
2012.
[NKG10] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-clouds: Managing
performance interference effects for QoS-aware clouds. In Proceedings of
EuroSys, Paris, France, 2010.
[NLH05] Michael Nelson, Beng-Hong Lim, and Greg Hutchins. Fast transparent mi-
gration for virtual machines. In USENIX’05: Proceedings of the 2005 Usenix
Annual Technical Conference, Berkeley, CA, USA, 2005. USENIX Associa-
tion.
[Pad10] Pradeep Padala. Understanding Live Migration of Virtual Machines.
http://tinyurl.com/24bdaza, June 2010.
[PCP+02] Constantine P. Sapuntzakis, Ramesh Chandra, Ben Pfaff, Jim Chow, Monica
S. Lam, and Mendel Rosenblum. Optimizing the migration of virtual computers.
In OSDI'02: Proceedings of the 5th Symposium on Operating Systems
Design and Implementation, 2002.
[PHS+09] Pradeep Padala, Kai-Yuan Hou, Kang G. Shin, Xiaoyun Zhu, Mustafa Uysal,
Zhikui Wang, Sharad Singhal, and Arif Merchant. Automated control of mul-
tiple virtualized resources. In Proceedings of EuroSys, 2009.
[PM83] Michael L Powell and Barton P Miller. Process migration in DEMOS/MP,
volume 17. ACM, 1983.
[Poe09] Chris Poelker. Why virtualization is the foundation of cloud computing.
http://tinyurl.com/cdtcyqz, 2009.
[PRH+03] S. Parekh, K. Rose, J. L. Hellerstein, S. Lightstone, M. Huras, and V. Chang.
Managing the performance impact of administrative utilities. In 14th
IFIP/IEEE International Workshop on Distributed Systems: Oper ations and
Management, 2003.
[Red09] IBM Redbooks. IBM Powervm Live Partition Mobility IBM International
Technical Support Organization. Vervante, 2009.
[RUB] RUBiS. Rice university bidding system. http://rubis.ow2.org.
[SGM90] Carl Staelin and Hector Garcia-Molina. Clustering active disk data to improve
disk performance. Technical Report CS-TR-283-90, Department of Computer
Science, Princeton University, Sep 1990.
[SPE] SPEC. Specweb2005. http://www.spec.org/web2005/.
[Ste10] Colin Steele. Virtual machine migration FAQ: Live migration, P2V and more.
http://tinyurl.com/cxavodk, August 2010.
[svm] VMWare Storage vMotion. http://www.vmware.com/products/storage-
vmotion/overview.html.
[Tec11] Dell TechCenter. Hyper-V R2 Live Migration FAQ.
http://tinyurl.com/c8rayf5, November 2011.
[TLC85] Marvin M Theimer, Keith A Lantz, and David R Cheriton. Preemptable re-
mote execution facilities for the V-system, volume 19. ACM, 1985.
[Tof11] Kevin C. Tofel. Forget public; private clouds: The future is hybrids!
http://tinyurl.com/bsmsj9p, June 2011.
[VBVB09] William Voorsluys, James Broberg, Srikumar Venugopal, and Rajkumar
Buyya. Cost of virtual machine live migration in clouds: A performance eval-
uation. In Cloud Computing, pages 254–265. Springer, 2009.
[vira] Five New Virtualization Challenges Impacting IT Pros and Data Cen-
ter Management. http://www.marketwatch.com/story/five-new-virtualization-
challenges-impacting-it-pros-and-data-center-management-2013-08-22.
[Virb] Virtustream. http://www.virtustream.com/.
[VKKS11] Akshat Verma, Gautam Kumar, Ricardo Koller, and Aritra Sen. Cosmig: mod-
eling the impact of reconfiguration in a cloud. In IEEE 19th annual inter-
national symposium on modeling, analysis and simulation of computer and
telecommunication systems. IEEE, 2011.
[Vmwa] Virtualize Your IT Infrastructure. http://www.vmware.com/virtualization/.
[VMwb] VMware ESX. http://www.vmware.com/products/vsphere/esxi-and-
esx/index.html.
[VMw09] VMware Forum. http://tinyurl.com/7gttah2, 2009.
[VMW10] VMWare. VMmark Virtualization Benchmarks.
http://www.vmware.com/products/vmmark/, January 2010.
[VMw11a] VMware Forum. http://tinyurl.com/ccwd6jg, 2011.
[VMw11b] VMware Forum. http://tinyurl.com/cr6tqnj, 2011.
[VMw11c] VMware Forum. http://tinyurl.com/bmlnjqk, 2011.
[VMw11d] VMware Forum. http://tinyurl.com/d4qr2br, 2011.
[VMw12] VMware Forum. http://tinyurl.com/7azb3xt, 2012.
[Wika] Cloud Computing. http://en.wikipedia.org/wiki/Cloud_computing.
[Wikb] Data Center. https://en.wikipedia.org/wiki/Data_center.
[Wikc] Stateless protocol. http://en.wikipedia.org/wiki/Stateless_protocol.
[wikd] wiki. Multi-tier architecture. http://en.wikipedia.org/wiki/
Multitier_architecture.
[Woo11] Timothy Wood. Improving data center resource Management, deployment,
and availability with virtualization. PhD thesis, University of Massachusetts
Amherst, 2011.
[WSG+09] Timothy Wood, Prashant Shenoy, Alexandre Gerber, K.K. Ramakrishnan,
and Jacobus Van der Merwe. The Case for Enterprise-Ready Virtual Private
Clouds. In Proc. of HotCloud Workshop, 2009.
[WSKdM11] Timothy Wood, Prashant Shenoy, K.K. Ramakrishnan, and Jacobus Van der
Merwe. Cloudnet: Dynamic pooling of cloud resources by live wan migration
of virtual machines. In ACM VEE, 2011.
[WSVY07] Timothy Wood, Prashant Shenoy, Arun Venkataramani, and Mazin Yousif.
Black-box and gray-box strategies for virtual machine migration. In NSDI,
2007.
[WZ11] Yangyang Wu and Ming Zhao. Performance modeling of virtual machine live
migration. In Proceedings of the 2011 IEEE 4th International Conference on
Cloud Computing. IEEE, 2011.
[Xen08a] Xen Forum. http://tinyurl.com/d5v8j9p, 2008.
[Xen08b] Xen Forum. http://tinyurl.com/c37he9g, 2008.
[XEN09] XEN. XEN Project. http://www.xen.org, January 2009.
[Xen11a] Xen Forum. http://tinyurl.com/d477jza, 2011.
[Xen11b] Xen Forum. http://tinyurl.com/c7tyg94, 2011.
[ZF07] Ming Zhao and Renato J Figueiredo. Experimental study of virtual machine
migration in support of reservation of cluster resources. In Proceedings of the
2nd international workshop on Virtualization technology in distributed com-
puting, page 5. ACM, 2007.
[ZHMM10] Xiang Zhang, Zhigang Huo, Jie Ma, and Dan Meng. Exploiting data dedu-
plication to accelerate live virtual machine migration. In IEEE International
Conference on Cluster Computing, 2010.
[ZNS11] Jie Zheng, T.S.Eugene Ng, and Kunwadee Sripanidkulchai. Workload-aware
live storage migration for clouds. In ACM VEE, April 2011.
[ZSS06] Jianyong Zhang, Prasenjit Sarkar, and Anand Sivasubramaniam. Achieving
completion time guarantees in an opportunistic data migration scheme. In
ACM SIGMETRICS Performance Evaluation Review, 2006.