Virtual Machine Live Migration in Cloud Computing
Jie Zheng
Abstract
Cloud computing services have experienced rapid growth. Virtualization, a key technology
for cloud computing, provides an abstraction to hide the complexity of underlying hard-
ware or software. The management of a pool of virtualized resources requires the ability to
flexibly map and move applications and their data across and within pools. Live migration,
which enables such management capabilities, ushers in unprecedented flexibility for busi-
nesses. To unleash the benefits, commercial products already enable the live migration of
full virtual machines between distant cloud datacenters.
Unfortunately, two problems exist. First, no live migration progress management sys-
tem exists, leading to (1) guesswork over how long a migration might take and the inability
to schedule dependent tasks accordingly; and (2) the inability to balance application perfor-
mance and migration time – e.g., finishing migration later in exchange for less performance interference.
Second, multi-tier application architectures are widely employed in today’s virtualized
cloud computing environments. Although existing solutions can migrate a single VM
efficiently, little attention has been devoted to migrating the related VMs of multi-tier
applications. If the relatedness of VMs is ignored during migration, application components
can become split across distant cloud datacenters for arbitrary periods, causing serious
application performance degradation.
In this thesis, the first contribution is Pacer – the first migration progress management
system capable of accurately predicting the migration time and managing the progress so
that the actual migration finishing time is as close to a desired finish time as possible.
Pacer’s techniques are based on robust and lightweight run-time measurements of system
and workload characteristics, efficient and accurate analytic models for progress predic-
tions, and online adaptation to maintain user-defined migration objectives for timely mi-
grations.
The second contribution is COMMA – the first coordinated live migration system for
multi-tier applications. We formulate the multi-tier application migration problem and
present a new communication-cost-driven coordinated approach, as well as a fully imple-
mented system on KVM that realizes this approach. COMMA is based on a two-stage
scheduling algorithm for coordinating the migration of VMs that aims to minimize migra-
tion’s impact on inter-component communications. COMMA focuses on the coordination
of multiple VMs’ migrations, where each VM’s migration progress is handled by Pacer.
COMMA’s scheduling algorithm has low computational complexity; as a result, COMMA
is highly efficient.
Through extensive experiments including a real-world commercial cloud scenario with
Amazon EC2, we show that Pacer is highly effective in predicting migration time and con-
trolling migration progress and COMMA is highly effective in decreasing the performance
degradation cost and minimizing migration’s impact on inter-component communications.
We believe this thesis will have far-reaching impact. COMMA and Pacer are applicable
to all sorts of intra-datacenter and inter-datacenter VM migration scenarios. Together, they
solve some of the most vexing VM migration management problems faced by operators to-
day. The techniques underlying Pacer and COMMA are not specific to the KVM platform;
they can easily be ported to other virtualization platforms such as VMware, Xen
and Hyper-V.
Acknowledgments
My foremost thanks go to my advisor, Professor T. S. Eugene Ng. I thank him for all of
his help, inspiration and guidance in my graduate study. He is the best advisor I could ever
imagine. I thank him for his patience and encouragement that always carried me through
difficult times, and for his insights and suggestions that helped to shape my research skills.
His passion for science has influenced me a lot. His valuable feedback contributed greatly
to this thesis.
I wish to express my sincere gratitude to Dr. Kunwadee (Kay) Sripanidkulchai. I had
the fortune to work with Kay during my summer internship at IBM. She helped me on every
aspect of the research related to this thesis. She taught me a vast amount of knowledge
in the areas of cloud computing and machine virtualization and introduced me to many
advanced techniques. I really appreciate her sound advice, good company, interesting ideas
and suggestions. Without the help from Eugene and Kay, this thesis would not have been
possible.
I am grateful to my team member Zhaolei (Fred) Liu for his help in setting up the test
bed on Amazon EC2. The demonstration of our system on a commercial public cloud
elevates the system to a new level. Discussions with Fred helped my research and thesis
writing greatly and made our collaboration highly productive.
I wish to thank Professor Alan L. Cox, who helped me set up the experiment environment
and suggested that I use VMmark to explore the I/O patterns. Before this thesis, I had
the opportunity to work with Alan on another project. Alan is a very knowledgeable and
friendly professor. I am often impressed by his logical thoughts and wise solutions to
difficult research questions.
I want to thank Professors Edward W. Knightly and Christopher Jermaine for serving on
my Ph.D. thesis committee and asking many insightful questions that helped to shape this
thesis.
I also want to thank many friends in our research group. They are Bo Zhang, Guohui
Wang, Zheng Cai, Florin Dinu, Yiting Xia and Xiaoye Sun. I enjoyed all the vivid discus-
sions we had on various topics and had lots of fun being a member of this fantastic group.
They always gave me instant help when I asked.
Last but not least, I would like to thank my parents and my best friends, who have
supported me spiritually throughout my life.
Contents
Abstract ii
List of Illustrations ix
List of Tables xi
1 Introduction 1
1.1 Live Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Lack of migration progress management . . . . . . . . . . . . . . . . . . . 7
1.3 Lack of coordination for multi-tier application migration . . . . . . . . . . 9
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2 Background 18
2.1 Process Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Live Migration of Virtual Machine . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1 VM Memory/CPU Migration . . . . . . . . . . . . . . . . . . . . 20
2.2.2 Network Connection Migration . . . . . . . . . . . . . . . . . . . 20
2.2.3 Storage Migration . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.4 Full VM Migration . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Optimization of Live Migration . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.1 Compression and Deduplication . . . . . . . . . . . . . . . . . . . 24
2.3.2 Reordering Migrated Block Sequence . . . . . . . . . . . . . . . . 25
3 Migration Time Prediction and Control 27
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.1 Predicting Migration Time . . . . . . . . . . . . . . . . . . . . . . 27
3.1.2 Controlling Migration Time . . . . . . . . . . . . . . . . . . . . . 28
3.2 Predicting Migration Time . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.1 Migration Time Model . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.2 Dirty Set and Dirty Rate Estimation . . . . . . . . . . . . . . . . . 33
3.2.3 Speed Measurement . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Controlling Migration Time . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.1 Solving for Speeds in Each Phase of Migration . . . . . . . . . . . 41
3.3.2 Maximal Feasible Speed Estimation and Speed Tuning . . . . . . . 46
3.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.2 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4.3 Prediction of migration time . . . . . . . . . . . . . . . . . . . . . 51
3.4.4 Best-effort migration time control . . . . . . . . . . . . . . . . . . 56
3.4.5 Overhead of Pacer . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.4.6 Potential robustness improvements . . . . . . . . . . . . . . . . . . 63
3.5 EC2 Demonstration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.5.1 Network and Disk Speed Measurements . . . . . . . . . . . . . . . 65
3.5.2 Use Case 1: Prediction of Migration Time . . . . . . . . . . . . . . 66
3.5.3 Use Case 2: Best-effort Migration Time Control . . . . . . . . . . 66
3.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.6.1 Live Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.6.2 I/O Interference in Virtualized Environment . . . . . . . . . . . . . 67
3.6.3 Data Migration Technologies . . . . . . . . . . . . . . . . . . . . . 68
3.6.4 Performance Modeling and Measurement . . . . . . . . . . . . . . 68
4 Coordinated Migration of Multi-tier Applications 70
4.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2 Quantitative Impact of Uncoordinated Multi-tier Application Migration . . 73
4.3 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3.1 Subsystem: Pacer . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.3.2 Challenges and Solutions . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.3 Scheduling Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 80
4.3.4 Inter-group Scheduling . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3.5 Intra-group Scheduling . . . . . . . . . . . . . . . . . . . . . . . . 85
4.3.6 Adapting to changing dirty rate and bandwidth . . . . . . . . . . . 89
4.3.7 Putting it all together . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4.2 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.4.3 Migration of a 3-tier Application . . . . . . . . . . . . . . . . . . . 91
4.4.4 Manual Adjustment does not Work . . . . . . . . . . . . . . . . . 91
4.4.5 Algorithms in Inter-group Scheduling . . . . . . . . . . . . . . . . 93
4.5 EC2 demonstration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.6 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5 Conclusion and Future Work 98
5.1 Migration Progress Management with Optimization Techniques . . . . . . 99
5.2 Migration Progress Management with Migration Planning . . . . . . . . . 100
5.3 Migration Progress Management with Task Prioritization . . . . . . . . . . 101
Bibliography 102
Illustrations
1.1 Beneficial usage scenarios of HCC. . . . . . . . . . . . . . . . . . . . . . . 4
1.2 The progress of live migration . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 An example of multi-tier application migration . . . . . . . . . . . . . . . 10
3.1 Pacer design overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 An example of disk dirty set estimation. . . . . . . . . . . . . . . . . . . . 35
3.3 An example of sampling for memory dirty rate estimation . . . . . . . . . . 37
3.4 Trade-off of sampling interval . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Each round of adaption for controlling migration time . . . . . . . . . . . . 41
3.6 An example of migration speeds in different phases. . . . . . . . . . . . . . 45
3.7 The prediction of a VM (file server-30clients) migration. Pacer achieves
accurate prediction from the very beginning of the migration. . . . . . . . . 53
3.8 Migration with different desired finish times. Pacer almost matches the
ideal case when the desired time is larger than 176s. The deviation is very
small in [-2s,2s]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.9 Migration with different degrees of workload intensity. Any point in the
feasible region can be realized by Pacer. The lower bound for migration
time is limited by I/O bottleneck. Default QEMU can only follow a
narrow curve in the region. . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.10 VM migration from Rice campus to Amazon EC2. . . . . . . . . . . . . . 64
4.1 Sequential and parallel migration of a 3-tier web application across clouds. 71
4.2 An example about cost computation for 3 VMs . . . . . . . . . . . . . . . 73
4.3 Examples of multi-tier web services. . . . . . . . . . . . . . . . . . . . . . 77
4.4 An example of coordinating the migration with COMMA . . . . . . . . . . 77
4.5 An example of valid group. . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.6 An example for heuristic algorithm . . . . . . . . . . . . . . . . . . . . . . 84
4.7 Intra-group scheduling. (a) Starting VM migrations at the same time but
finishing at different times results in a long performance degradation
time. (b) Starting VM migrations at the same time and finishing at the
same time results in a long migration time due to inefficient use of
migration bandwidth. (c) Starting VM migrations at different times and
finishing at the same time yields no performance degradation and a short
migration time due to efficient use of migration bandwidth. . . . . . . . . . 86
4.8 An example of delaying the start of dirty iteration for the migration. . . . . 88
4.9 An example of scheduling algorithm (Put all together) . . . . . . . . . . . . 90
4.10 Computation time for brute-force algorithm and heuristic algorithm . . . . 95
Tables
1.1 Application moving approaches for stateless and stateful servers. . . . . . . 5
3.1 Migrated data and speed in four phases of migration . . . . . . . . . . . . . 30
3.2 Variable definitions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 VMmark workload summary. . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.4 Prediction errors for the VM size-based predictor and the progress meter
are several orders of magnitude higher than Pacer’s. . . . . . . . . . . . . . 52
3.5 Prediction with Pacer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.6 Migration time deviation for Pacer is much smaller than the controller
without dirty block prediction. . . . . . . . . . . . . . . . . . . . . . . . . 57
3.7 Deviation of migration time for Pacer with different workload intensities.
The numbers in brackets represent the worst early and late deviations in
Pacer. For example, [−1, 1] means at most 1s early and at most 1s late.
"-" means the time is beyond the feasible region. . . . . . . . . . . . . . . 60
3.8 Migration time on different types of workload. Pacer can achieve the
desired migration time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.9 Migration time for Pacer when the additional competing traffic varies.
Pacer can achieve the desired migration time with a small finish time
deviation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.10 Importance of dynamic dirty set and dirty rate prediction. Without any of
these algorithms, it is hard to achieve desired migration time. . . . . . . . . 62
3.11 Importance of speed scaling up algorithm. . . . . . . . . . . . . . . . . . . 62
3.12 Prediction accuracy with Pacer. . . . . . . . . . . . . . . . . . . . . . . . . 65
3.13 Migration time control in EC2. . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1 Example VM and workload parameters. Dirty set is defined as the data
bytes written on the VM’s virtual disk at the end of the disk image copy.
Dirty rate is defined as the speed at which the VM’s virtual disk and
memory are written. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2 Degradation time with sequential and parallel migration. INF means
migration could not converge and thus the migration time is infinite. . . . . 76
4.3 Comparisons of three approaches on 3-tier applications. {...} represents
the VM set on one physical machine. . . . . . . . . . . . . . . . . . . . . . 92
4.4 Manual adjustment of the configured speed makes it hard to achieve low
cost and short migration time. . . . . . . . . . . . . . . . . . . . . . . . . 93
4.5 Performance degradation cost (MB) with different migration approaches . . 94
4.6 Migration methods on EC2 experiment. . . . . . . . . . . . . . . . . . . . 97
Chapter 1
Introduction
Cloud computing, often referred to as simply "the cloud", is the delivery of on-demand
computing resources over the Internet on a pay-for-use basis [Def]. In both industry and
academia, cloud computing has attracted significant attention. A research report sponsored
by the enterprise-focused cloud computing firm Virtustream [Virb] shows that the cloud has
hit the mainstream: more than half of U.S. businesses now use cloud computing [Cloa].
Cloud computing services have experienced rapid growth. The public cloud services market
is expected to grow to $206.6 billion by 2016 [Gar12]. Internet and business applications
are increasingly being moved to the cloud to maximize the effectiveness of shared resources
and economies of scale. Some clouds are operated by service providers, such as Amazon
[Amaa] and IBM [IBM], that offer storage and virtual servers to customers at a low price
on demand. Others are built to deliver development environments as a service, such as
Google App Engine [Goo].
Cloud services usually run in data centers. Current data centers can contain tens or
hundreds of thousands of servers. The main purpose of a data center is to run the
applications that handle an organization's core business and operational data. Often these
applications are composed of multiple hosts, each running a single component. Common
components of such applications are databases, file servers, application servers, middleware,
and various others [Wikb]. For example, the components of a multi-tier e-commerce
application [wikd] may include web servers, application servers and database servers. The
web server works as the presentation tier, which displays information related to services
and communicates with the other tiers by delivering results to the clients. The application
server works as the logic tier, which controls the application's functionality by performing
detailed processing on the data it exchanges with the other tiers. The database server works
as the data tier, which stores data and keeps it neutral and independent from the application
servers and business logic [wikd].
The main enabling technology for cloud computing is virtualization, which abstracts
the physical infrastructure and makes it available as software components that are easy to
isolate and partition [Wika]. It hides the complexity of the underlying hardware or
software [Poe09]. The management of a pool of virtualized resources requires the ability
to flexibly map and move applications and their data across and within pools [WSKdM11].
Usually, multiple virtual machines (VMs) run on a single physical machine. Virtualization
therefore provides an effective way to consolidate hardware and get vastly higher produc-
tivity from fewer servers. Cloud computing uses virtualization for efficiency, higher avail-
ability and lower costs. Virtualization also speeds and simplifies IT management, mainte-
nance and the deployment of new applications [Vmwa]. In cloud computing, a hypervisor
or virtual machine monitor (VMM) is a piece of software that creates, runs and manages
VMs. KVM [KVM], Xen [XEN09], VMware ESX [VMwb] and Hyper-V [Mic12] are
four popular hypervisors.
As data centers continue to deploy virtualized services, there are many scenarios that
require moving VMs from one physical machine to another in the same data center or even
across different data centers. We list some examples as follows.
• Planned maintenance: To maintain high performance and availability, virtual ma-
chines (VMs) may need to be migrated from one cloud to another to leverage better
resource availability, avoid down-time caused by hardware maintenance, and save
power in the source cloud. If a physical machine requires software or hardware
maintenance, the administrator can migrate all the VMs running on that machine
to other physical machines to release the original machine [CFH+05].
• Load balancing: VMs may be rearranged across different physical machines in a
cluster to relieve load on congested hosts [CFH+05]. A workload increase on a virtual
server can be handled by increasing the resources allocated to the virtual server,
provided some idle resources are available on the physical server, or by
simply moving the virtual server to a less loaded physical server [WSVY07].
• Avoiding single-provider lock-in: While many cloud users’ early successes have
been realized using a single cloud provider [Blo08, Got08], the ability to use multiple
clouds to deliver services and the flexibility to move freely among different providers
are emerging requirements [AFGea09]. Users who implement their applications us-
ing one cloud provider ought to have the capability and flexibility to migrate their
applications back in-house or to other cloud providers in order to have control over
the business continuity and avoid fate-sharing with specific providers [ZNS11].
• Hybrid cloud computing (HCC): – where virtualizable compute and storage re-
sources from private datacenters and public cloud providers are seamlessly integrated
into one platform in which applications can migrate freely – is emerging as the most
preferred cloud computing paradigm for commercial enterprises according to recent
industry reports and surveys [Ash12, Bri11, Tof11]. It provides more scenarios that
require migration of VMs. This is not surprising since HCC combines the bene-
fits of public and private cloud computing, resulting in unprecedented flexibility
for CAPEX and OPEX savings, application adaptivity, disaster survivability, zero-
downtime maintenance, and privacy control. An impressive array of commercial
products that facilitate HCC is already available today. Figure 1.1 illustrates some
beneficial usage scenarios of migration in HCC. The ability to migrate applications
freely among private and public clouds, whether private-to-public, public-to-
private, or public-to-public, is key to maximizing the benefits of HCC. Tried-and-
true virtual machine (VM) migration technologies are therefore central to the HCC
paradigm. A migration in HCC crosses datacenters; consequently, it typically re-
quires a full migration of the VM, including the VM's storage.
• Enterprise IT Consolidation: Many enterprises working with multiple data centers
have attempted to deal with data center "sprawl" and cut costs by consolidating mul-
tiple smaller sites into a few large data centers. The ability to move a service
with minimal or no down-time is attractive due to the corresponding reduction in the
disruption seen by the business [WSKdM11].

Figure 1.1 : Beneficial usage scenarios of HCC. (1) Provider price hike. (2) Provider
discontinues service. (3) Privacy law change. (4) Provider privacy policy change.
In summary, from cloud providers' perspectives, VMs may be moved for maintenance,
resource management, disaster planning and economic reasons. From cloud users'
perspectives, they may want to move their VMs to other clouds that provide lower cost,
better reliability or better performance.
There are different approaches to moving VMs, depending on whether the VM runs a
stateless server or a stateful server, as Table 1.1 shows. Some applications contain stateless
servers that do not retain any session information [Wikc]. A stateless server treats each
request from the client as an independent transaction, in no way related to any previous
request. A typical example of a stateless server is a web server with static content: it takes
in a request as a URL that fully specifies a particular web page to display, independent of
previous requests to the server. A stateless server can be easily moved; for example, the
provider could provision a new server or perform live migration of the existing server.
However, this only applies to a small class of applications. The majority of enterprise
applications run on stateful servers. A stateful server remembers client states from one
request to the next. For example, FTP servers, database servers, mail servers and web
servers with dynamic content are all stateful servers. To move a stateful server, the old
approach is to stop the VM, copy the VM's state, and restart the VM at the destination.
This imposes a long downtime on the application. A more attractive mechanism for moving
applications is live migration, because it is completely application independent (whether
stateless or stateful) and it allows VMs to be transparently moved between physical hosts
without interrupting any running applications.

Type                  Stateless Server              Stateful Server
Feature               No state                      Keeps state
Typical examples      Web server (static)           FTP server
                                                    Database server
                                                    Mail server
                                                    Web server (dynamic)
Approach for moving   Replication (provisioning)    Stop and copy
                      Live migration                Live migration

Table 1.1 : Application moving approaches for stateless and stateful servers.
1.1 Live Migration
Live migration refers to the process of moving a running virtual machine or application
between different physical machines without disconnecting the client or application. Live
migration is controlled by the hypervisors running on the source and destination. Full
migration of a virtual machine includes the following aspects:
1. the running state of the virtual machine (i.e., CPU state, memory migration)
2. the storage or virtual disks used by the virtual machine
Figure 1.2 : The progress of live migration
3. existing client connections
Figure 1.2 shows the progress of full virtual machine migration. First, the source hy-
pervisor (Hypervisor 1) copies all the disk state of the virtual machine from the source to
the destination while the virtual machine is still running on the source hypervisor. If some
disk blocks change during this process, they are re-copied. When the number of dirty disk
blocks falls below a threshold, the source hypervisor starts copying the memory state.
Once the disk and memory state have been transferred, the source hypervisor briefly pauses
the virtual machine for the final transfer of disk, memory, processor and network states to
the destination hypervisor (Hypervisor 2). The virtual machine then resumes running
on the destination hypervisor.
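The iterative pre-copy process above can be sketched as a small simulation. This is an illustrative model, not hypervisor code: the constant dirty rate, the block-granularity accounting, and the function name `precopy_rounds` are simplifying assumptions of this sketch.

```python
def precopy_rounds(total_blocks, dirty_rate, copy_rate, threshold):
    """Toy model of the iterative pre-copy phase of live migration.

    total_blocks -- size of the VM's disk image, in blocks
    dirty_rate   -- blocks/sec the running VM keeps writing (assumed constant)
    copy_rate    -- blocks/sec the migration can transfer
    threshold    -- dirty-set size that triggers the final stop-and-copy

    Each round re-copies the blocks dirtied during the previous round.
    While a round runs, the VM dirties more blocks, so the next dirty set
    is round_time * dirty_rate. The dirty set shrinks only when
    dirty_rate < copy_rate; otherwise migration cannot converge.
    Returns (rounds, total_blocks_sent, final_dirty_set).
    """
    sent = total_blocks                      # round 0: full copy, VM running
    dirty = (total_blocks / copy_rate) * dirty_rate
    rounds = 0
    while dirty > threshold and dirty_rate < copy_rate:
        sent += dirty                        # re-copy last round's dirty set
        dirty = (dirty / copy_rate) * dirty_rate
        rounds += 1
    sent += dirty                            # final copy while VM is paused
    return rounds, sent, dirty
```

With a 1000-block disk, a dirty rate one tenth of the copy rate, and a 5-block threshold, two re-copy rounds suffice and only one block is left for the brief pause. If the dirty rate equals the copy rate, the dirty set never shrinks – exactly the non-convergence case a live migration system must guard against.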
In 2011, F5 Networks [Mur11] and VMware released the first product to enable the
live CPU/memory and storage migration of virtual machines across distant data centers.
That is, however, not to say that live migration is a solved problem. Cloud providers use
live migration in many cloud management tasks to save operating cost and improve
application performance. However, they have found that existing live migration solutions
fall short in two respects: (1) migration progress management and (2) multi-tier
application migration.
1.2 Lack of migration progress management
The use of live migration in cloud computing has exposed a fundamental weakness in
existing solutions, namely the lack of migration progress management – the ability to
predict and control migration time. Without this capability, management tasks may not
achieve the expected performance. Consider the following examples.
• Case 1: A system operator plans to take down a physical machine for maintenance.
He evicts all the running VMs to another physical machine using live migration, and
tells the maintenance group that maintenance can start at 8PM, guessing that the
migration will complete by then. Unfortunately, some VMs' migrations take much
longer than expected and all dependent tasks are delayed. The entire maintenance
work-flow might be irrecoverably disrupted, and the company must pay system
operators excess overtime.
• Case 2: Many failure detection and prediction systems have been proposed and ap-
plied to detect abnormal behavior of physical servers. Once potential failures are
reported, system operators will migrate VMs to other machines as soon as possible.
However, the operator observes that migration cannot go any faster than a static
upper bound – a configured speed in the live migration system.
• Case 3: Migration is also used in load balancing. A new IT strategy called follow-
the-sun provisioning, for project teams that span multiple continents, also leverages
live migration. The scenario assumes multiple groups in different geographies
collaborating on a common project, where each group requires low-latency access
to the project applications and data during its normal business hours [Woo11].
The operators need to decide which server to migrate and whether the migration will
finish by the expected time, before normal business hours start. Load balancing
decisions are time-sensitive: if a migration takes too long to complete, the resource
usage dynamics may have already changed, rendering the original migration
decision useless.
These scenarios expose the weakness in today's live migration solutions, which can be
summarized as the lack of a live migration progress management system. Several related
questions are frequently asked in numerous online forum discussions.
• How long does migration take? – is a popular question in live VM migra-
tion FAQs [Pad10, Tec11, Ste10]. Unfortunately, there is no simple formula for
calculating the answer because it depends on many dynamic run-time variables
that are not known a priori, including application I/O workload intensity, net-
work speed, and disk speed. As numerous online forum discussions indicate (e.g.
[VMw11a, VMw11b, VMw11c, VMw09, Xen08a, Xen11a, Xen11b, Xen08b]),
users routinely try to guess why migration is slow and whether it could be sped up,
and how long they might have to wait.
• How to avoid application components getting split between distant datacenters
during migration? – This question is of paramount importance to enterprises be-
cause their applications often consist of multiple interacting components perform-
ing different functions (e.g. content generation, custom logic, data management,
etc. [HSS+10]). Without the ability to manage migration progress, individual ap-
plication components could complete migration at very different times and become
split over distant cloud datacenters for arbitrary periods. The resulting large inter-
component communication delay guarantees detrimental performance impact. Per-
haps a stopgap method is to configure the migration speeds for different components
proportional to their virtual memory/storage sizes. Unfortunately, we already know
that migration time depends on a large number of dynamic run-time variables, so this
stopgap method is bound to fail (see Section 3.4).
• How to control the trade-off between application performance and migration
time? – is another popular question raised by users [VMw12, VMw11d]. Stud-
ies [WZ11, BKR11, ASR+10, VKKS11] have shown that live migration can degrade
application performance, and slowing migration down helps [BKR11]. Although
the administrator might be willing to slow down migration to some extent to im-
prove application performance, a migration task must still be completed by a certain
deadline or else other dependent tasks cannot proceed. Unfortunately, no solution
exists for managing the progress of a migration to finish at a desired time (neither
sooner nor later). Perhaps a stopgap method is to configure the data migration speed
to data size / desired time. But again, this method is bound to fail due to the dynamic run-time
variables (see Section 3.4).
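To make the stopgap concrete, the following sketch (all numbers are hypothetical) shows why a fixed speed of data size / desired time misses the deadline once the application keeps dirtying data during migration:

```python
def naive_speed(data_gb, desired_time_s):
    """Stopgap: a fixed speed of data size / desired time."""
    return data_gb / desired_time_s

def actual_finish_time(data_gb, speed_gbps, dirty_rate_gbps):
    """With the VM still writing, the effective progress rate is only
    (speed - dirty rate), so migration takes longer than planned."""
    effective = speed_gbps - dirty_rate_gbps
    if effective <= 0:
        return float("inf")  # migration never converges
    return data_gb / effective

speed = naive_speed(100.0, 1000.0)  # plan: 100 GB in 1000 s -> 0.1 GB/s
print(round(actual_finish_time(100.0, speed, 0.02)))  # 1250: 25% past the deadline
```

Because the dirty rate varies at run time, even this corrected calculation only holds for a snapshot of the workload, which is why a static setting cannot hit a target finish time.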
1.3 Lack of coordination for multi-tier application migration
Today's datacenters usually run applications in a multi-tier architecture in which presentation,
application processing, and data management functions are logically separated.
Multi-tier architectures are widely employed in virtualized cloud computing environments
because they provide a model by which developers can create flexible and reusable applications.
Cloud providers offer VMs and a wide range of features such as load
balancers, content-distribution networks, DNS hosting, etc., resulting in a complex ecosystem
of interdependent systems operating at multiple layers of abstraction [HFW+13]. By
segregating an application into tiers, developers acquire the option of modifying or adding
a specific layer, instead of reworking the entire application [wikd].
The goal in optimizing the migration of multi-tier applications is very different from the
goal in optimizing the migration of a single VM. Previous VM migration research focuses
on optimizing the migration of a single VM, with the goal of minimizing total migration
time and downtime; such techniques are insufficient for multi-tier migration because the
migration of multi-tier applications poses a unique problem.
Figure 1.3 : An example of multi-tier application migration

Given that the VMs running a multi-tier application are highly interactive, a serious
issue is that during migration, the application's performance can degrade significantly
if the dependent components of an application are split between the source and destination
sites by a high latency and/or congested network path. The goal in the migration of multi-
tier applications is to minimize the cost introduced by splitting the two components into
two distant sites. We will formulate the problem in more detail later and give an example
to illustrate it.
Figure 1.3 shows an example of migrating a 3-tier e-commerce application from one
cloud to another. The application has 4 VMs (shown as ovals) implementing a web server,
two application servers, and a database server. An edge between two components in the
figure indicates that those two components communicate with one another. Before migration,
the latency of web requests is small because all four VMs run in the same
datacenter. During the migration, the four VMs are migrated one by one in the sequence
of web server, application server 1, application server 2, and database server. While the
web server is being migrated, it still runs in the original datacenter and the latency remains small.
However, when the web server finishes migration and starts running at the destination datacenter,
the communication between the web server and the application servers goes across the wide
area network, resulting in a long latency. There is a time period during which communicating
components are split over the source and destination sites. When such a split
happens, certain inter-component communications must be conducted over the bandwidth-limited
and/or high-latency network, leading to degraded application performance. When the
application servers finish migration, the communication between the web server and the application
servers is again inside the same datacenter. However, some web requests to this service
still experience a long latency because the database server is still in the original datacenter,
and all the database requests from the application servers to the database server go across
the wide area network. At the end of the database server migration, the whole set of VMs
runs in the destination datacenter and the request latency is as small as before
migration.
Although existing live migration techniques [KVM, CFH+05, NLH05] are able to mi-
grate a single VM efficiently, those techniques are not optimized for migrating related VMs
in a multi-tier application. They either migrate the VMs in a sequential order or migrate
the VMs at the same time and ignore whether they could finish at the same time. Simply
migrating all related VMs in parallel is not enough to avoid such degradation. Specifically,
two existing migration strategies, sequential and parallel migration, may result in poor per-
formance. Sequential migration, which migrates each VM one by one, results in a long
performance degradation time from when the first VM finishes migration until the last VM
finishes migration. Parallel migration, which starts migration of multiple VMs at the same
time, is not able to avoid the degradation either. This is because the amount of data to
migrate for each VM is different, and therefore the VMs in general will not finish migration
simultaneously. The application will experience performance degradation until all VMs
have completed migration. Furthermore, if the bandwidth required for migrating all VMs
in parallel exceeds the actual available bandwidth, additional performance problems will
result.
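A back-of-the-envelope calculation (with hypothetical sizes and bandwidth) illustrates why parallel migration leaves components split: VMs with different amounts of data, given equal shares of the link, finish at very different times:

```python
def parallel_finish_times(vm_sizes_gb, link_gbps):
    """Even split of the migration link: each VM gets an equal share.
    (Simplified: shares are not re-divided as VMs finish.)"""
    share = link_gbps / len(vm_sizes_gb)
    return [size / share for size in vm_sizes_gb]

# Hypothetical 3-tier app: web, app1, app2, db (GB to migrate each)
times = parallel_finish_times([20.0, 40.0, 40.0, 100.0], 1.0)
split_window = max(times) - min(times)
print(times, split_window)  # [80.0, 160.0, 160.0, 400.0] 320.0
```

In this toy scenario the application runs split across sites for 320 seconds, the gap between the first and last finish times, even though all migrations started together.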
1.4 Contributions
The contribution of this thesis is in two parts. The first part is Pacer – the first migration
progress management system (MPMS). Pacer effectively addresses all the aforementioned
issues in Section 1.2. While many of the details of Pacer's constituent techniques will be
discussed in Chapter 3, they share the following key strengths:
• Robust and lightweight run-time measurements – Pacer is highly effective thanks
to its use of continuously measured application I/O workload intensity (both memory
and disk accesses) and the measured bottleneck migration speed (whether the
bottleneck is in the network or in the disk) at run-time. Pacer enhances the robustness
of the measurements by employing a novel randomized sampling technique (see Section 3.2.3).
Furthermore, these techniques are implemented to minimize run-time overhead as
shown in Section 3.4.
• Novel efficient & accurate analytic models for predictions – Such analytic models
are used to (1) predict the amount of remaining data to be migrated as a function of
the application's I/O workload characteristics and the migration progress, and (2) pre-
dict the finish time of a migration (i.e. addressing the first MPMS issue) as a function
of the characteristics of each migration stage (i.e. disk, dirty blocks, CPU/memory,
etc.).
• Online adaptation – Addressing the second and third MPMS issues requires certain
migration objectives to be met: the former requires multiple application components
to complete migration simultaneously; the latter requires a migration to complete at
a specific time. No fixed migration settings can successfully meet such objectives due
to run-time dynamics. Pacer continuously adapts to ensure the objectives are met. In
the former case, Pacer adapts the targeted migration finish time for all components
given what is measured to be feasible. In the latter case, Pacer adapts the migration
speed to maintain a targeted migration finish time in the face of application dynamics.
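As an illustration of the speed-adaptation idea (a simplified sketch, not Pacer's actual controller), one adaptation step can recompute the required speed from the measured remaining work and the time left until the deadline:

```python
def adapt_speed(remaining_gb, deadline_s, now_s, max_speed_gbps):
    """One adaptation step: pace migration so the measured remaining
    work finishes exactly at the deadline (capped at the bottleneck)."""
    time_left = deadline_s - now_s
    if time_left <= 0:
        return max_speed_gbps  # behind schedule: go as fast as possible
    return min(remaining_gb / time_left, max_speed_gbps)

# The workload dirties extra data, so remaining work shrinks slower than
# planned; the controller raises the speed on the next interval.
print(adapt_speed(50.0, 1000.0, 500.0, 1.0))  # 0.1 GB/s
print(adapt_speed(45.0, 1000.0, 600.0, 1.0))  # 0.1125 GB/s: sped up
```

Repeating this step at short intervals is what lets a controller absorb run-time dynamics that defeat any fixed setting.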
The second contribution is COMMA (Coordinated migration of Multi-tier Applica-
tions) – the first coordination system for the live migration of multi-tier applications. Note
that this inter-cloud migration example is not the only usage scenario for COMMA. In gen-
eral, any migration scenario that requires crossing a network with limited bandwidth and/or
high latency could benefit from using COMMA. The limited bandwidth scenario can arise
within a campus or even within a machine room.
We will discuss the general challenges for live VM migration first, and then describe
the unique challenges for multi-tier application migration.
• Convergence. VM migration runs in a shared resource environment (e.g. disk I/O
bandwidth and network bandwidth). In this thesis, we define the term "available
migration bandwidth" as the maximal speed that migration could achieve; the bottleneck
could lie either in network bandwidth or in disk I/O bandwidth. If the
available resource is not allocated properly, the migration could fail because the application
may generate new data that needs to be migrated at a pace faster than
the available migration bandwidth. For example, if the available migration bandwidth
is 10MBps but the VM generates new data at a speed of 30MBps,
migration will not converge in the dirty iteration stage and will fail. For
a single VM migration, the mechanism to handle non-convergence is either to set
a timeout to stop migration and report failure, or to throttle the write operations and
slow down the new data generation rate. All of these mechanisms hurt application
performance. For multiple VM migrations, the problem is more complicated but
also more interesting.
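The convergence condition can be sketched as follows (a simplified model using the numbers above; a real system must also account for disk I/O bottlenecks):

```python
def converges(avail_bw_mbps, dirty_rate_mbps):
    """Dirty iteration converges only if migration can copy dirty
    blocks faster than the application produces them."""
    return avail_bw_mbps > dirty_rate_mbps

def shrink_factor(avail_bw_mbps, dirty_rate_mbps):
    """Per-iteration shrink ratio of the dirty set: while one pass
    copies the current dirty data, new dirty data accumulates at
    dirty_rate / bandwidth of the copied volume."""
    return dirty_rate_mbps / avail_bw_mbps

print(converges(10, 30))                       # False: migration fails
print(converges(100, 30), shrink_factor(100, 30))  # True 0.3
```

When the ratio is below one, each dirty iteration shrinks the remaining work geometrically, which is what eventually lets the final copy fit within an acceptable downtime.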
• Dynamicity and interference. The migration time for different VMs is different because
migration time depends on many factors, some of which are dynamic. For
example, VM image size and memory size are static information, but the actual workload
and the available resources (e.g. disk I/O bandwidth and network bandwidth) are
dynamic. Assuming that migration has pre-allocated network bandwidth
which is not shared with other applications, we can leverage Pacer to predict
and control the migration time of a single VM migration.
Unique challenges for multi-tier application migration.
• Higher order control. Fundamentally, each individual VM migration process can
only be predicted and controlled to a certain extent (as shown by Pacer). Therefore,
if we continue to rely on an architecture where all VM migration processes act inde-
pendently, there is no way of achieving the multi-tier migration goal. It is necessary
to design a new architecture where a higher order control mechanism governs all VM
migration activities.
• Inter-VM-migration resource contention and allocation. For multiple VM migrations,
the convergence issue is more complicated but also more interesting. If the network
bandwidth is smaller than any single VM's new data generation rate, the problem
degenerates to sequential migration. If the network bandwidth is large enough to
migrate all VMs together, the problem is easily handled by parallel migration. When
the network bandwidth sits in between these two cases, we need mechanisms
to check whether it is possible to migrate multiple VMs at the same time, to
combine multiple VMs into groups that can be migrated together, and to schedule
the migration start and finish times of each group to achieve the goal of minimizing
communication cost.
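A minimal sketch of the feasibility check behind such grouping (dirty rates and bandwidth are hypothetical): a group of VMs can migrate concurrently only if the available migration bandwidth exceeds the sum of the members' new data generation rates:

```python
def group_feasible(dirty_rates_mbps, link_mbps):
    """A group of VMs can migrate concurrently only if the link can
    outpace their combined new-data generation rate."""
    return sum(dirty_rates_mbps) < link_mbps

# Hypothetical dirty rates for web, app1, app2 and db tiers (MBps)
rates = [5, 10, 10, 25]
print(group_feasible(rates, 100))  # all four together: feasible
print(group_feasible(rates, 40))   # must be split into smaller groups
print(group_feasible([25], 20))    # db alone exceeds the link: no convergence
```

The intermediate case, where some but not all subsets pass this check, is exactly where a scheduler must choose group combinations and an ordering.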
• Inter-VM-migration dynamicity and interference. When network bandwidth is reserved
for migration, Pacer can help to predict and control a single VM migration.
However, the problem for multi-tier VM migration is more complicated due to interference
between multiple VM migrations. When multiple VM migrations occur in
the same period, they share the available resources. Pacer has no knowledge of when
other VMs' migrations will start and finish, nor of how to proceed when there
is not enough available resource to migrate all VMs at the same time. Therefore,
simply leveraging Pacer to control the migration time of multiple VM migrations
without coordination is not the right solution.
• System design and efficiency. The computational complexity of an optimal solution
for coordinating a multi-tier application migration could be very high. It is important that the
coordination system is efficient and has low overhead. How to formulate the problem and
how to trade off efficiency against optimality are worth investigating for a good
system.
To tackle the above challenges, this thesis proposes an original migration coordina-
tion system for multi-tier applications. The system is based on a scheduling algorithm
for coordinating the migration of VMs that aims to minimize migration’s impact on inter-
component communications.
The system works in two stages. In the first stage, it coordinates the speeds of the migration
of the static data of different VMs, such that all VMs complete their static data migration
at nearly the same time. In the second stage, it coordinates the migration of dynamically
generated data by organizing the VMs into feasible migration groups and deciding the best
schedule for migrating these groups. Furthermore, the system schedules the migration of
VMs inside a group to fully utilize the available migration bandwidth. We have implemented
the proposed system and have conducted a number of preliminary experiments to
demonstrate its potential in optimizing the migration of a 3-tier application. The results are
very encouraging. Compared to a simple parallel migration strategy, our system is able to
reduce the number of data bytes affected by migration by up to 475 times.
While many of the details of COMMA's constituent techniques will be discussed in
Chapter 4.3, they share the following key strengths:
• Problem formulation and novel approach – We formulate the multi-tier application
migration problem and present a new communication-cost-driven coordinated
approach, as well as a fully implemented system on KVM that realizes this approach.
The multi-tier application migration problem is to minimize the performance degradation
caused by splitting the communicating components between the source and destination
sites during the migration. To quantify the performance degradation, we
define the unit of cost as the volume of traffic between VMs that needs to crisscross
between the source and destination sites during migration.
• Two-stage scheduling – The algorithm works in two stages. In the first stage, it
coordinates the migration speeds of the static data of the VMs so that all VMs complete
the precopy phase at nearly the same time. In the second stage, it coordinates the
migration of dynamically generated data by inter-group and intra-group scheduling.
COMMA iteratively computes and updates a desirable schedule for migrating VMs
based on both runtime VM workload characteristics and static configuration
information (i.e., memory and disk size) of each VM. The evaluation of COMMA shows
that it is able to greatly reduce migration's impact on inter-component communications.
We also demonstrate its potential in optimizing the migration of a 3-tier application.
Similar to Pacer's design, we also leverage online adaptation to ensure
the migration objectives are met according to the scheduling algorithm.
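The first-stage coordination can be sketched as follows (a simplified model, not COMMA's actual implementation; sizes and bandwidth are hypothetical): allocating each VM a share of the link proportional to its static data size makes all precopy phases end together:

```python
def precopy_speeds(static_sizes_gb, link_gbps):
    """First stage: split the link so every VM's static-data copy ends
    at the same time, i.e. speed_i proportional to size_i."""
    total = sum(static_sizes_gb)
    return [link_gbps * size / total for size in static_sizes_gb]

sizes = [20.0, 40.0, 40.0, 100.0]   # hypothetical web/app1/app2/db
speeds = precopy_speeds(sizes, 1.0)
finish = [s / v for s, v in zip(sizes, speeds)]  # all approximately 200 s
```

Equalizing the static-data finish times is what sets up the second stage, in which the dynamically generated data of the grouped VMs is drained together.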
• Efficiency with a heuristic algorithm – In order to minimize the performance degradation
cost, COMMA needs to compute the optimal group combination and migration
sequence. We propose two algorithms: a brute-force algorithm and a heuristic
algorithm. The brute-force algorithm can find the optimal solution but its computational
complexity is high. The heuristic algorithm achieves a sub-optimal result
with low computational complexity.
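The brute-force approach can be sketched as follows; the cost function and the traffic matrix here are hypothetical stand-ins for the communication-cost model:

```python
from itertools import permutations

def brute_force_order(groups, cost):
    """Brute force: try every migration order of the VM groups and keep
    the one with the lowest total cost. O(n!) -- only viable for a
    handful of groups, which motivates the heuristic alternative."""
    return min(permutations(groups), key=cost)

# Hypothetical inter-tier traffic volumes: heavily communicating pairs
# should migrate close together in the order.
traffic = {("web", "app"): 10, ("app", "db"): 30}

def cost(order):
    pos = {g: i for i, g in enumerate(order)}
    return sum(vol * abs(pos[a] - pos[b]) for (a, b), vol in traffic.items())

best = brute_force_order(["web", "app", "db"], cost)
print(best, cost(best))  # ('web', 'app', 'db') 40
```

A heuristic would instead build the order greedily (e.g. placing the heaviest-communicating pairs adjacently first), trading a possibly sub-optimal cost for polynomial running time.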
1.5 Thesis Organization
The rest of this thesis is organized as follows. Chapter 2 introduces the background
on live migration. Chapter 3 presents the constituent techniques in Pacer for migration
progress management, including the migration time model, key algorithms, and system
design. It also presents experimental results demonstrating the capability and benefits
of Pacer. Chapter 4 presents the techniques in COMMA for coordinating the migration
of multi-tier applications. It also shows experiments conducted to demonstrate how
COMMA performs in migrating a 3-tier application. Chapter 5 summarizes the contributions
of the thesis and discusses future work.
Chapter 2
Background
2.1 Process Migration
During the 1980s, process migration attracted significant attention in systems research
[PM83, TLC85, JLHB88, DO91, BL98]. Process migration is the relocation of
a process from the processor on which it is executing to another processor.
Process migration proved to be a difficult feature to implement in operating systems
until 1982, when Powell et al. implemented process migration in the DEMOS/MP operating
system. In that system, a process can be moved during its execution and continue on another
processor, with continuous access to all its resources. Messages are correctly delivered to the
process's new location, and message paths are quickly updated to take advantage of the
process's new location. The kernel can participate in message send and receive operations
in the same manner as a normal process [PM83].
Theimer et al. leveraged process migration with network transparency in the design of a
remote execution facility which allows a user of a workstation-based distributed system to
offload programs onto idle workstations, thereby providing the user with access to computational
resources beyond those provided by his personal workstation [TLC85].
Jul et al. proposed fine-grained mobility for small data objects (such as arrays, records,
and integers) as well as objects with processes. Thus, the unit of mobility can be
much smaller than in process migration systems, which typically move entire address
spaces [JLHB88].
Douglis et al. designed an automatic system for transparent process migration in the Sprite
operating system. It can automatically identify idle machines, use the process migration
mechanism to offload work onto idle machines, and evict migrated processes when idle
workstations are reclaimed by their owners [DO91].
Barak et al. used preemptive process migration for load balancing and memory ushering,
in order to create a convenient multi-user time-sharing execution environment for
HPC, particularly for applications written in PVM or MPI [BL98].
Process migration enables dynamic load distribution, fault resilience, eased system administration,
and data access locality. Despite these goals and ongoing research efforts,
migration has not achieved widespread use [MDP+00]. Milojicic et al. [MDP+00] give
a thorough survey of possible reasons for this, including the problem of the residual dependencies
that a migrated process retains on the machine from which it migrated. Clark
et al. [CFH+05] point out that the residual dependency problem cannot easily be solved
in any process migration scheme – even modern mobile run-times such as Java and .NET
suffer from problems when a network partition or machine crash causes class loaders to fail.
The migration of entire operating systems inherently involves fewer or zero such dependencies,
making it more resilient and robust. Virtualization thus led to techniques for virtual
machine live migration.
2.2 Live Migration of Virtual Machine
Live migration provides the capability to move VMs from one physical location to another
while the VMs are still running, without any perceived degradation. It is called "live"
migration because it incurs downtime of only tens of milliseconds. Many hypervisors
support live migration within the LAN [NLH05, CFH+05, Red09, WSVY07, JDWS09,
HG09]; this usually requires that the two physical machines have shared storage. However,
migrating across the wide area presents more challenges, specifically because of the large
amount of data that needs to be migrated under limited network bandwidth. Enabling
live migration over the wide area requires full migration of a virtual machine, which includes
the VM's storage state, CPU state, memory state, and network connections.
Memory migration and network connection migration for the wide area have been
demonstrated to work well [BKFS07, WSG+09]. However, storage migration inherently
faces significant performance challenges because of its much larger size compared
to the size of the memory. We will introduce live migration with shared storage in the
following section. Then we will discuss live storage migration under different models.
Finally, we will summarize the optimization techniques for live migration in different
aspects.
2.2.1 VM Memory/CPU Migration
In the early stage, VM migration technologies focused only on capturing and transferring
the run-time, in-memory state of a VM in a LAN. They assume that all physical machines
involved in a migration are attached to the same SAN or NAS server.
Clark et al. [CFH+05] implemented a live migration system on Xen [XEN09] for
local-area migration. It migrates the memory and CPU state of a VM without support for
migrating local block devices. During the memory migration, a pre-copy approach is used
in which pages of memory are iteratively copied from the source machine to the destination
host. Page-level protection hardware is used to ensure a consistent snapshot is transferred.
The final phase pauses the virtual machine, copies any remaining pages to the destination,
and resumes execution there.
VMware also released a live migration function called VMotion [NLH05] in their virtual
center management software. The approach is generally similar to the previous one;
it relies on storage area networks or NAS to migrate connections to SCSI devices.
2.2.2 Network Connection Migration
For live migration in the local network, a virtual Ethernet network interface card (VNIC)
is provided as part of the virtual platform. Each VNIC is associated with one or more physical
NICs. Since each VNIC has its own MAC address that is independent of the physical
NIC’s MAC address, virtual machines can be moved while they are running between ma-
chines and still keep network connections alive as long as the new machine is attached to
the same sub-net as the original machine [NLH05].
21
Wood et al. [WSKdM11] use existing VPN technologies to provide a live migration
infrastructure across the wide area network. They present a new signaling protocol that allows
endpoint reconfiguration actions, which currently take hours or days, to be performed in tens
of seconds.
When migration takes place between servers in different networks, the migrated VM
has to obtain a new IP address, and thus existing network connections break. Bradford
et al. [BKFS07] use a temporary network redirection scheme to overcome this by combining
IP tunneling with dynamic DNS.
2.2.3 Storage Migration
KVM, VMware and Xen are three popular virtualization platforms. They use different
approaches to migrate the storage of a VM.
Snapshot: The approach based on snapshots was introduced in VMware ESX
3.5 [MCGC11]. The migration begins by taking a snapshot of the base disk, and all new
writes are sent to this snapshot. Concurrently, the approach copies the base disk to the des-
tination volume. After finishing copying the base disk, another snapshot is taken, and then
the approach consolidates the first snapshot into the base disk. This process is repeated
until the amount of data in the snapshot becomes lower than a threshold. Finally the VM
is suspended and the final snapshot is consolidated into the destination disk, and the VM
is resumed on the destination volume. This approach has two major limitations. Firstly,
migration using snapshots is not atomic. Secondly, there are performance and space costs
associated with running a VM with several levels of snapshots.
Dirty Block Tracking: This approach is based on iterative copying with a dirty block
tracking mechanism. It is widely used in KVM and VMware ESX 4.0, and refined in ESX
4.1 [KVM, MCGC11]. It uses a bitmap to track modified blocks on the source disk and
iteratively copies those blocks to the destination disk. The process is repeated until the number
of dirty blocks falls below a threshold or the number remaining at each cycle stabilizes.
VMware live storage migration and live migration with shared storage are two separate functions;
at this point, VMware live storage migration suspends the VM and copies the remaining dirty
blocks, while KVM live migration starts memory migration at the end of storage migration.
Dirty block tracking overcomes the limitations of snapshots. It makes new optimizations
possible and guarantees an atomic switch-over between the source and destination
volumes [MCGC11].
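The iterative copy loop can be sketched as follows (a simplified simulation, not KVM's or VMware's actual implementation; the workload model `write_block` is hypothetical):

```python
def dirty_block_migration(num_blocks, write_block, threshold):
    """Iterative copy with a dirty bitmap (sketch). write_block(i) is a
    hypothetical workload model returning the set of blocks the VM
    dirties while block i is being transferred; dirtied blocks are
    re-queued for the next round."""
    dirty = set(range(num_blocks))      # first round copies the whole disk
    rounds = 0
    while len(dirty) > threshold:
        copying, dirty = dirty, set()
        for block in sorted(copying):   # transfer `block` ...
            dirty |= write_block(block) # ... and log concurrent writes
        rounds += 1
    return dirty, rounds  # final dirty set is copied after suspending the VM

# Hypothetical workload: the VM keeps rewriting a small hot set of blocks
hot = {0, 1, 2}
remaining, rounds = dirty_block_migration(
    1000, lambda i: hot if i == 500 else set(), threshold=5)
print(remaining, rounds)  # {0, 1, 2} 1
```

The loop terminates as soon as the dirty set fits under the threshold; a write-intensive workload whose dirty set never shrinks is exactly the non-convergence case discussed earlier.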
IO Mirroring: VMware ESX 5.0 uses synchronous IO mirroring in live storage mi-
gration. It works by mirroring all new writes from the source to the destination concurrent
with a bulk copy of the base disk [MCGC11].
Xen does not support storage migration. In order to support storage migration in Xen,
a solution is proposed in [WSKdM11] to integrate the DRBD storage migration system
into Xen. The solution employs a hybrid technique combining dirty block tracking and I/O
mirroring. Our system is implemented on KVM, but the proposed models and algorithms
are general to VMware ESX 4.0 and DRBD on Xen. For VMware ESX 5.0, I/O mirroring
would cause more traffic through the network. The traffic can be estimated by monitoring
the application workload. Adapting Pacer to I/O mirroring migration systems could be an
area of future work.
In this thesis, our migration time model is based on the dirty block tracking approach
which is the most popular approach across different virtualization platforms. However, it
is easy to adapt our time model and our system to the IO Mirroring approach. We discuss
this in Section 3.
2.2.4 Full VM Migration
For wide-area migration, common network-attached storage accessible by both the
source and destination servers is not available [BKFS07]. Therefore, live migration of the
VM's local storage state is necessary. Previous work in storage migration can be classified
into three migration models: pre-copy, post-copy and pre+post-copy. In the pre-copy model,
storage migration is performed prior to memory migration, whereas in the post-copy model,
the storage migration is performed after memory migration. The pre+post-copy model is a
hybrid of the first two models.
In the pre-copy model as implemented by KVM [KVM10] (a slightly different variant
is also found in [BKFS07]), the entire virtual disk file is copied from beginning to end
prior to memory migration. During the virtual disk copy, all write operations to the disk
are logged. The dirty blocks are retransmitted, and new dirty blocks generated during this
time are again logged and retransmitted. This dirty block retransmission process repeats
until the number of dirty blocks falls below a threshold, then memory migration begins.
During memory migration, dirty blocks are again logged and retransmitted iteratively. The
strength of the pre-copy model is that VM disk read operations at the destination have
good performance because blocks are copied over before the time when the VM starts
running at the destination. However, the pre-copy model has weaknesses. Firstly, pre-
copying may introduce extra traffic. Some transmitted blocks will become dirty and require
retransmissions, resulting in extra traffic beyond the size of the virtual disk. Secondly, if
the I/O workload on the VM is write-intensive, write-throttling is employed to slow down
I/O operations so that iterative dirty block retransmission can converge. While throttling is
useful, it can degrade application I/O performance.
In the post-copy model [HNO+09, HON+09], the storage migration is executed after
the memory migration completes and the VM is running at the destination. Two mecha-
nisms are used to copy disk blocks over: background copying and remote read. In back-
ground copying, the simplest strategy proposed by Hirofuchi et al. [HON+09] is to copy
blocks sequentially from the beginning of a virtual disk to the end. During this time if
the VM issues an I/O request, it is handled immediately. If the VM issues a write oper-
ation, the blocks are directly updated at the destination storage. If the VM issues a read
operation and the blocks have yet to arrive at the destination, then on-demand fetching is
employed to request those blocks from the source. We call such operations remote reads.
With the combination of background copying and remote reads, each block is transferred at
most once, ensuring that the total amount of data transferred for storage migration is min-
imized. However, remote reads incur extra wide-area delays, resulting in I/O performance
degradation.
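The post-copy read/write paths can be sketched as follows (a simplified model; `PostCopyDisk` and its block store are hypothetical, not the actual implementation of [HON+09]):

```python
class PostCopyDisk:
    """Post-copy storage sketch: the VM already runs at the destination;
    writes land there directly, and reads of not-yet-copied blocks
    trigger a remote read from the source -- each block moves at most once."""
    def __init__(self, source_blocks):
        self.source = source_blocks   # hypothetical source-side block store
        self.local = {}               # blocks present at the destination
        self.remote_reads = 0

    def background_copy(self, block_id):
        # sequential background copying calls this for each block in turn
        self.local.setdefault(block_id, self.source[block_id])

    def write(self, block_id, data):
        self.local[block_id] = data   # new writes update the destination only

    def read(self, block_id):
        if block_id not in self.local:           # on-demand remote read
            self.remote_reads += 1
            self.local[block_id] = self.source[block_id]
        return self.local[block_id]

disk = PostCopyDisk({0: "boot", 1: "data", 2: "log"})
disk.background_copy(0)
disk.read(0)              # already local: no remote read
disk.read(2)              # not yet copied: fetched from the source
disk.read(2)              # cached now: still only one remote read
print(disk.remote_reads)  # 1
```

The sketch shows both properties described above: total transfer is bounded by the disk size, but each cache-miss read pays a wide-area round trip.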
In the hybrid pre+post-copy model [LZW+08], the virtual disk is copied to the desti-
nation prior to memory migration. During disk copy and memory migration, a bit-map of
dirty disk blocks is maintained. After memory migration completes, the bit-map is sent to
the destination where a background copying and remote read model is employed for the
dirty blocks. While this model still incurs extra traffic and remote read delays, the amount
of extra traffic is smaller compared to the pre-copy model and the number of remote reads
is smaller compared to the post-copy model.
The post-copy and the pre+post-copy models can potentially reduce network traffic, but
they cannot recover from a network failure during the migration. Therefore, widely used
systems such as KVM [KVM], Xen [CFH+05], and VMware [NLH05, svm] are all based
on the pre-copy model.
Due to the drawbacks of the post-copy and pre+post-copy models described above,
the system design in this thesis is based on the pre-copy model. We implement a complete
system on KVM; the ideas are also easy to apply to other hypervisors.
2.3 Optimization of Live Migration
Many optimization techniques for virtual machine live migration have been proposed. We
will discuss how these techniques could be integrated into our system in the future work
(Chapter 5).
2.3.1 Compression and Deduplication
Sapuntzakis et al. [PCP+02] developed techniques to reduce the amount of data sent over
the network: copy-on-write disks track just the updates to VM disks, “ballooning” zeros
unused memory, demand paging fetches only needed blocks, and hashing avoids sending
blocks that already exist at the remote end.
Jin et al. [JDW+09] use memory compression to provide fast, stable virtual machine
migration. Based on memory page characteristics, they design an adaptive zero-aware
compression algorithm for balancing the performance and the cost of virtual machine
migration.
Hacking et al. [HH09] propose similar ideas for leveraging compression in the live
migration of large enterprise applications.
Zhang et al. [ZHMM10] introduce data deduplication into migration by utilizing the
self-similarity of the run-time memory image and hash-based fingerprints. Their system
employs run-length encoding to eliminate redundant memory data during migration.
In this thesis, we do not apply any compression or deduplication in our time model. Our
algorithm and system target migration systems without compression and deduplication.
We will adapt our time model to systems with compression and deduplication in
the future.
2.3.2 Reordering Migrated Block Sequence
Our previous research on live storage migration shows that existing solutions for wide-area
migration incur too much disruption, as they significantly slow down storage I/O
operations during migration. The resulting increase in service latency could be very costly
to a business. We proposed a novel storage migration scheduling algorithm [ZNS11] to
improve storage I/O performance during wide-area migration. Our algorithm is unique in
that it considers an individual virtual machine's storage I/O workload characteristics, such
as temporal locality, spatial locality, and popularity, to compute an efficient data transfer
schedule. Using a fully implemented system on KVM and a trace-driven framework, we
show that our algorithm provides large performance benefits across a wide range of popular
virtual machine workloads.
VMware also presents a similar optimization that detects frequently written blocks
and defers copying them [MCGC11]. It examines the distribution of disk I/O repetition
and enables a multi-stage filter for hot blocks.
As in the above discussion, we do not apply the block sequence reordering algorithm
in this thesis. Our algorithm and system target the general migration system with
the default block migration sequence. We will adapt our time model to systems with a
block reordering algorithm in future work.
Chapter 3
Migration Time Prediction and Control
3.1 Overview
Figure 3.1 presents an overview of the design of Pacer. The three ovals represent the three
management functions in Pacer. The rectangles represent key modules. Migration time
prediction and migration time control share three key modules. Migration time control is
supported by three additional modules. Coordination of concurrent migration leverages
prediction and time control.
3.1.1 Predicting Migration Time
Predicting migration time is a challenging problem as it depends on many dynamic fac-
tors such as application workload, competing resource consumption by other virtual and
physical machines, and network performance. We show in Section 3.4.3.1 that a back-of-
the-envelope estimate based on image and memory size, and network bandwidth is inaccu-
rate. Pacer addresses several significant technical challenges in order to accurately predict
migration time for full VM migration.
• Migration time model for each migration phase: To predict migration time, Pacer
models migration behavior for each individual phase of migration such as disk copy,
dirty iteration, memory migration, etc. We develop a set of equations that quantita-
tively capture the relationship between time and speed. By solving these equations
based on observed conditions, Pacer can accurately predict migration time.
• Detailed remaining work estimation: In order to predict the migration time, it is
crucial to determine the remaining number of bytes that need to be migrated, because
both memory pages and disk blocks can be written to by the VM after they have
been copied to the destination. Any of these that have changed at the source after
they have been copied over are called dirty blocks and will need to be re-copied in
the dirty iteration phase. The total number of dirty pages/blocks is not known before
migration completes as it depends on how the application is accessing memory and
storage (i.e., the application workload).

Figure 3.1 : Pacer design overview.
• Speed measurement: Prediction of migration time also depends on how fast migration
can run. It is crucial to perform smooth and robust speed measurement during mi-
gration.
3.1.2 Controlling Migration Time
To control migration time, we also need to address the following challenges.
• Speed tuning: The interference brought in by the migrated VM’s workload and other
additional competing workloads may degrade the migration speed. Solutions that do
not consider the interference and assume the real migration speed exactly matches the
configured migration speed may not finish in time due to the lower achieved speed in
reality. Pacer incorporates an algorithm that strives to reach the desired migration
speed despite the interference, and an algorithm to estimate the maximal migration
speed that can be realized under the interference.
• Adaptation: The VM’s disk I/O workload and additional competing disk I/O work-
loads can change dynamically throughout migration. For example, when the VM’s
disk writing pattern changes, the previous dirty block prediction may no longer be
accurate. For another example, when the additional competing disk I/O workloads
vary, the maximal feasible migration speed which can be realized may change. Pacer
is thus designed to be continuously adaptive to address such workload dynamics.
3.2 Predicting Migration Time
Pacer performs predictions periodically (default configuration is every 5 seconds). To pre-
dict the remaining time during the migration, three things must be known: (1) what oper-
ation is performed in each phase of migration, (2) how much data there is to migrate, (3)
how fast the migration is progressing. This section will address these three issues. In the
formulas, we use bold font for constants and regular font for variables.
3.2.1 Migration Time Model
The total migration time T can be modeled as the sum of four distinct parts: tPrecopy for the pre-copy
phase, tDirtyIteration for the period after pre-copy but before memory migration, tMemory for
the period from the beginning of the memory migration until the time the VM is suspended,
and TDowntime for a small downtime needed to copy the remaining dirty blocks and dirty
pages once they drop below a configured threshold. TDowntime is considered a constant
because the remaining data to be migrated is fixed (e.g. downtime is 30ms in KVM).

                           Phase 1            Phase 2                   Phase 3   Phase 4
Content                    Storage Precopy    Storage Dirty Iteration   Memory    Remaining Storage, Memory,
                                                                                  CPU, Network Connections
Amount of migrated data    Known (DISK SIZE)  Unknown                   Unknown   Known (Threshold)
Speed                      Measure            Measure                   Measure   -

Table 3.1 : Migrated data and speed in four phases of migration
T = tPrecopy + tDirtyIteration + tMemory + TDowntime (3.1)
For the pre-copy phase, we have:
tPrecopy = DISK SIZE / speedPrecopy    (3.2)
where DISK SIZE is the VM virtual disk size obtained directly from the VM configuration
and speedPrecopy is the migration speed for the pre-copy phase.
At the end of pre-copy, a set of dirty blocks need to be migrated. The amount is de-
fined as DIRTY SET SIZE. The variable is crucial to the prediction accuracy during the
dirty iteration. However, its exact value is unknown until the end of pre-copy phase. It
is very challenging to know the dirty set size ahead of time while the migration is still in
the pre-copy phase, and there is no previous solution. We propose a novel algorithm in
Section 3.2.2 to solve this problem.
In the dirty iteration, while dirty blocks are migrated and cleaned, the clean blocks may
be overwritten concurrently and get dirty. The number of blocks getting dirty per second
is called the dirty rate. The dirty rate depends on the number of clean blocks (fewer clean
blocks means fewer blocks can become dirty later) and the workload of the VM. Similar to
the need for dirty set size estimation, we need to predict the dirty rate while migration is
Name Description
T a given migration time
DISK SIZE the size of the VM disk storage
MEM SIZE the size of configured memory for VM
phase the phase of migration, PRE-COPY, DIRTY-ITERATION or MEMORY-MIGRATION
remain time the remaining time before migration deadline
past time the time past since the beginning of migration
remain precopy size the size of remaining disk data in the pre-copy phase
dirty dsize the actual amount of dirty blocks so far
remain msize the size of remaining memory to be copied
dirty set size the estimated size of dirty set at the end of pre-copy
dirtyrate disk the rate of blocks dirtied in the dirty iteration
dirtyrate mem the rate of memory dirtied in memory migration
speed next expected the expected speed for next round
speed expected the expected speed for this round
speed observed the observed speed in this round
speed scaleup flag the flag to indicate whether speed is scaled up
interval the time for each round, e.g. 5s
estimated max speed the estimated maximal speed for migration
FULL SPEED a configured extremely high speed to exhaust the disk I/O bandwidth
NETWORK SPEED the available network bandwidth
Table 3.2 : Variable definitions.
still in pre-copy. We propose an algorithm in Section 3.2.2 to predict the average dirty rate
(AVE DIRTY RATE). Then, we have
tDirtyIteration = DIRTY SET SIZE / (speedDirtyIteration − AVE DIRTY RATE)    (3.3)
where speedDirtyIteration is the migration speed for the dirty iteration phase.
Memory migration typically behaves similarly to the storage migration dirty itera-
tion. All memory pages are first marked dirty, then dirty pages are iteratively migrated
and cleaned, while pages can become dirty again after being written. We propose an
algorithm in Section 3.2.2 that is effective for estimating the average memory dirty rate
(AVE MEM DIRTY RATE).
During memory migration, different systems have different behaviors. For KVM, the
VM still accesses storage in the source and disk blocks could get dirty during memory mi-
gration. Thus, in KVM, memory migration and storage dirty iteration may take turns. Then,
denoting the size of memory as MEM SIZE and memory migration speed as speedMemory,
we have
tMemory = MEM SIZE / (speedMemory − AVE MEM DIRTY RATE − AVE DIRTY RATE)    (3.4)
Other variants: The previous derivation assumes some behaviors that are specific to
KVM. However, the model can readily be adapted to other systems. As an example, for
VMware, storage migration and memory migration are two separate tasks. At the end of
storage dirty iteration, the VM is suspended and the remaining dirty blocks are copied to
destination. From then on, storage I/O requests go to the destination storage so no more
dirty blocks are generated, but the VM’s memory and CPU are still at the source so stor-
age I/O accesses are remote until migration completes. The speed for memory migration
in VMware would be lower than that in KVM because the network bandwidth is shared
between migration and remote I/O requests. Therefore, for VMware, Equation 3.4 is
adjusted as follows:
tMemory = MEM SIZE / (speedMemory − AVE MEM DIRTY RATE)    (3.5)
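For concreteness, the per-phase equations can be combined into a single estimate. The following sketch is our own illustration of Equations 3.1–3.5 (the function name and argument names are ours, not part of Pacer); it assumes the speeds and dirty rates have already been measured, with all quantities in consistent units (e.g. MB and MB/s).

```python
def predict_migration_time(disk_size, mem_size, speed, net_speed,
                           dirty_set_size, ave_dirty_rate,
                           ave_mem_dirty_rate, downtime, kvm=True):
    """Estimate total migration time following Equations 3.1-3.5.

    speed     : migration speed for pre-copy and dirty iteration
    net_speed : network bandwidth available for memory migration
    """
    t_precopy = disk_size / speed                                  # Eq. 3.2
    t_dirty_iter = dirty_set_size / (speed - ave_dirty_rate)       # Eq. 3.3
    if kvm:
        # KVM: disk blocks can still get dirty during memory migration.
        t_memory = mem_size / (net_speed - ave_mem_dirty_rate
                               - ave_dirty_rate)                   # Eq. 3.4
    else:
        # VMware-style: storage is already switched to the destination.
        t_memory = mem_size / (net_speed - ave_mem_dirty_rate)     # Eq. 3.5
    return t_precopy + t_dirty_iter + t_memory + downtime          # Eq. 3.1
```

Note that the same routine covers both variants: only the memory-migration term changes between the KVM and VMware models.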
The above migration time model describes how the time is spent on each phase of live
migration. The next question to address is the amount of data to migrate.
3.2.2 Dirty Set and Dirty Rate Estimation
Migrated data consists of two parts. The first part is the original disk and memory, the size
of which is known. The second part is the generated dirty blocks and dirty pages during
migration, the size of which is unknown. We now present algorithms for predicting this
unknown.
Disk dirty set estimation: Dirty block tracking is based on the block size configured
in the migration system (1MB in KVM). For each block, we record the average write
interval, the variance of write interval (used in dirty rate estimation), and the last written
time. When a write operation is issued, Pacer updates the record for all the blocks accessed
by the operation.
The estimated dirty set consists of three subsets. SET1 contains the migrated blocks that
are already dirty. SET2 contains the migrated blocks that are clean right now but are
estimated to get dirty before the end of pre-copy. SET3 contains the non-migrated blocks
that are estimated to get dirty after their migration time and before the end of pre-copy.
FUNCTION getEstimatedDirtyBlockSet(remain precopy size,
speed expected)
SETDirty = {}
SET1 = {blocki| already migrated and marked as dirty }
Tend = current time + remain precopy size / speed expected
SET2 = {blocki | already migrated and marked as clean}
       ∩ {blocki | ∃k : tlast written(blocki) + k · ave write interval(blocki) ∈ [current time, Tend]}  // k is a positive integer
SET3 = {blocki|not migrated yet}
Estimate the expected migration time ti for each blocki ∈ SET3
SET3 = SET3 ∩ {blocki | ∃k : tlast written(blocki) + k · ave write interval(blocki) ∈ [ti, Tend]}
SETDirty = SET1 ∪ SET2 ∪ SET3
return SETDirty
An example is shown in Figure 3.2. The first 4 blocks are already migrated to the
destination. t1 is the current time when the dirty set estimation algorithm is invoked, and t2 is
the estimated pre-copy finish time. Among the migrated blocks, block 2 is known to be
dirty and is in SET1. Block 4 is migrated and is clean so far, but we estimate that it will
get dirty before t2, so block 4 is in SET2. Among the non-migrated blocks, block 5 was
accessed before, and we predict that it will be written after its migration time and before
t2. Block 5 is in SET3. Thus the dirty set is {2, 4, 5}.
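The three-set construction can be exercised on a toy model. The sketch below is our simplified rendering of getEstimatedDirtyBlockSet: it predicts whether a block's next write, extrapolated from its average write interval, falls inside the relevant window. The per-block migration times for SET3 are taken as an assumed input here rather than computed from the stripe order, and all names are ours.

```python
def will_be_written(last_written, ave_interval, start, end):
    """Predict whether some future write (extrapolated as
    last_written + k * ave_interval, k = 1, 2, ...) falls in [start, end]."""
    if not ave_interval or ave_interval <= 0:
        return False  # inactive block: average interval unknown or infinite
    t = last_written
    while t <= end:
        t += ave_interval
        if start <= t <= end:
            return True
    return False

def estimate_dirty_set(blocks, now, t_end):
    """blocks: id -> (migrated, dirty, last_written, ave_interval, mig_time).
    Returns the union SET1 | SET2 | SET3 of predicted dirty blocks."""
    result = set()
    for bid, (migrated, dirty, last_w, ivl, t_mig) in blocks.items():
        if migrated and dirty:
            result.add(bid)                                        # SET1
        elif migrated and will_be_written(last_w, ivl, now, t_end):
            result.add(bid)                                        # SET2
        elif not migrated and will_be_written(last_w, ivl, t_mig, t_end):
            result.add(bid)                                        # SET3
    return result
```

With block states chosen to mirror Figure 3.2 (block 2 dirty, block 4 predicted to be rewritten, block 5 predicted to be written after its migration time), the function returns the dirty set {2, 4, 5}.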
Disk dirty rate estimation: We develop an analytic model of the dirty iteration to
estimate disk dirty rate. Let t be the time budgeted for dirty iteration. Consider the state
of the disk at the beginning of the dirty iteration. Let N be the number of dirty blocks
in SETDirty and M be the number of clean blocks in SETClean, and let dblocki be the
i-th block in the dirty set and cblocki be the i-th block in the clean set. Abstractly, during
each time interval t′ = t/N , Pacer needs to perform the work to migrate one of the N dirty
blocks and any newly generated dirty blocks during this time interval. In the first interval t′,
dblock1 is migrated. The expected number of newly generated dirty blocks that are assumed
to be cleaned immediately during this first interval (D1) is computed as follows:
Figure 3.2 : An example of disk dirty set estimation.
D1 = Σ t′ / ave write interval(blocki),   ∀ blocki ∈ SETClean ∪ {dblock1}    (3.6)
Note that dblock1 is included because it becomes clean. In general, the expected number
of newly generated dirty blocks during the k-th interval is as follows:
Dk = Σ t′ / ave write interval(blocki),   ∀ blocki ∈ SETClean ∪ {dblock1, dblock2, ..., dblockk}    (3.7)
Thus, the average dirty rate can be computed as follows:
AVE DIRTY RATE = (Σ_{i=1}^{N} Di / t) · BLOCKSIZE
               = Σ_{i=1}^{M} BLOCKSIZE / ave write interval(cblocki)
               + Σ_{k=1}^{N} ((N + 1 − k) · BLOCKSIZE) / (N · ave write interval(dblockk))    (3.8)
Our previous research on I/O characteristics in typical virtualization workloads
[ZNS11] shows that the disk write rate is stable over long time scales. Therefore,
the disk dirty rate prediction is able to perform well. To further optimize the algorithm,
we add the following mechanism to remove inactive blocks from dirty rate calculation.
For simplicity, assume the write intervals of a block follow a normal distribution [SGM90]
∼ N(µ, σ). The probability that the next arrival time is in [µ − 2σ, µ + 2σ] is about 95%. Therefore, if the time since the last write is already longer than 2σ for a block, that block can
be safely considered inactive. The average write interval for such a block is set to infinity.
This mechanism significantly improves the accuracy of dirty rate prediction.
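Equation 3.8 together with the inactivity filter can be sketched as follows. This is our own illustration (names and the per-block tuple layout are assumptions, not Pacer's code); dirty blocks are passed in migration order so the (N + 1 − k)/N weighting applies.

```python
def ave_disk_dirty_rate(clean_blocks, dirty_blocks, block_size, now):
    """Equation 3.8 with the 2-sigma inactivity filter.

    clean_blocks / dirty_blocks: lists of (last_written, ave_interval, sigma);
    dirty_blocks must be ordered by their migration sequence."""
    def rate(block):
        last_written, ave_interval, sigma = block
        if now - last_written >= 2 * sigma:
            return 0.0  # inactive: treat the write interval as infinite
        return block_size / ave_interval

    total = sum(rate(b) for b in clean_blocks)
    n = len(dirty_blocks)
    # The k-th dirty block only contributes after it is cleaned, i.e. for
    # (N + 1 - k) of the N sub-intervals of the dirty iteration on average.
    for k, b in enumerate(dirty_blocks, start=1):
        total += (n + 1 - k) * rate(b) / n
    return total
```

Moving `now` far past the last writes makes every block inactive and drives the estimated rate to zero, which is exactly the effect of setting the average write interval to infinity.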
FUNCTION getDirtyRateDisk(dirty set)
SETDirty = dirty set
N =the cardinality of SETDirty
SETClean = all written blocks − SETDirty
For each blocki ∈ SETClean ∪ SETDirty
IF(current time − tlast written(blocki) < 2 · σ(blocki))
dirtyrate(blocki) = blocksize / ave write interval(blocki)
ELSE
dirtyrate(blocki) = 0
ENDIF
END FOR
Sort SETDirty by block id sequence
dirtyrate = (Σ dirtyrate(blocki)   ∀blocki ∈ SETClean)
          + (Σ_{k=1}^{N} (N + 1 − k) · dirtyrate(blockk) / N   ∀blockk ∈ SETDirty)
return dirtyrate

Figure 3.3 : An example of sampling for memory dirty rate estimation
Memory dirty rate estimation:
Figure 3.4 : Trade-off of sampling interval

The disk dirty rate estimation algorithm would incur high overhead if it were applied to
memory dirty rate estimation. Therefore, we propose a sampling-based algorithm to trade
precision for reduced overhead. The idea is that Pacer periodically takes a snapshot of the
dirty bitmap of memory pages, resets the dirty bitmap, and updates two types of informa-
tion. Figure 3.3 shows an example of a bitmap for 9 memory pages. In the example, 5
pages are written during the interval. Two types of information are updated. The first is a
cumulative write access counter for each page. If a page is written to during this period,
this counter is incremented. The second is the number of unique written pages u during
this period, obtained by counting the number of set bits. In the example, the write access
counters for pages 2, 4, 5, 8, and 9 are incremented by 1. With this information, we can
estimate the average dirty rate as follows. We define the access ratio for each page i as

access ratio(i) = write access counter(i) / Σ_{j ∈ all pages} write access counter(j)
Denote the sampling interval by ts; the rate at which unique written pages are
generated is then u/ts. This rate is an upper bound for the true dirty page rate, and it corresponds
to the worst case scenario where all pages were clean at the beginning of the interval. With
access ratio representing the contribution of a page to the overall dirty rate, the dirty rate
for page i can be estimated as d(i) = (u/ts) · access ratio(i). Similar to the analysis for the
disk dirty iteration, when we migrate the n-th page, the dirty rate is Σ_{i=1}^{n} d(i). The average
dirty rate is therefore (Σ_{k=1}^{N} Σ_{i=1}^{k} d(i)) / N, where N is the total number of memory pages.
The selected sampling interval affects the accuracy of the estimation. For example, if we sample at 2s and there is a page written every one second, its estimated dirty rate
is lower than the real dirty rate. A way to increase the accuracy is to reduce the sampling
interval in consecutive rounds and see whether the estimated dirty rate increases. If the
dirty rate increases, the sampling interval is reduced further until the rate stabilizes or the
interval meets a configured minimal interval. Figure 3.4 shows the tradeoff for sampling
interval between accuracy and overhead. In Pacer, the sampling interval starts at 2s and is
reduced by half if needed. To bound the overhead, we set a minimum sampling interval to
0.25s.
FUNCTION getDirtyRateMEM()
∀ page i in memory
access ratio(i) = write access counter(i) / Σ_{j ∈ all pages} write access counter(j)
d(i) = (u / ts) · access ratio(i)
ave dirty rate = (Σ_{k=1}^{N} Σ_{i=1}^{k} d(i)) / N
IF(ts > MIN SAMPLE TIME
&& ave dirty rate > previous dirty rate)
ts = ts / 2
previous dirty rate = ave dirty rate
END IF
return ave dirty rate
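The core of the sampling estimator can be sketched as below. This is our own simplification (names are assumptions): it computes the per-page rates d(i) from one sampling interval's counters and averages the partial sums over all pages, leaving the adaptive interval halving aside.

```python
def mem_dirty_rate(write_counters, unique_pages, sample_interval):
    """Sampling-based memory dirty rate estimate.

    write_counters : cumulative per-page write-access counters
    unique_pages   : u, unique pages written in the last interval
    sample_interval: ts, the sampling interval in seconds
    Returns the estimated average dirty rate in pages per second."""
    total_writes = sum(write_counters)
    if total_writes == 0:
        return 0.0
    upper_bound = unique_pages / sample_interval          # u / ts
    # Per-page dirty rate: d(i) = (u / ts) * access_ratio(i)
    d = [upper_bound * c / total_writes for c in write_counters]
    # Average over all pages of the partial sums of d (see the text above).
    running, acc = 0.0, 0.0
    for rate in d:
        running += rate          # dirty rate while migrating this page
        acc += running
    return acc / len(d)
```

In a real system the counters would come from snapshots of the dirty bitmap taken every ts seconds, as described above.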
3.2.3 Speed Measurement
Smoothing measurements: In each interval, we measure the migrated data and compute
the average actual speed during the interval. In order to smooth out short time scale vari-
ations of the measured actual speed, we apply the commonly used exponential smoothing
average method to update the measured actual speed. The smoothing weight α represents
the degree of weighting decrease, a constant smoothing factor between 0 and 1. A lower
α discounts older observations faster and does not smooth out short-term fluctuations well.
We ran some experiments to test α in [0.5, 0.9] and found 0.8 to be a reasonable choice.
speedsmooth = α · speedsmooth + (1 − α) · speedmeasured (3.9)
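A minimal sketch of this update rule (Equation 3.9); the function name is ours:

```python
def smooth_speed(prev_smooth, measured, alpha=0.8):
    """Exponentially smoothed speed update (Equation 3.9):
    new smoothed value = alpha * old + (1 - alpha) * measurement."""
    return alpha * prev_smooth + (1 - alpha) * measured
```

With alpha = 0.8, a measurement of 50 against a smoothed value of 100 moves the estimate only to 90, which is what damps the short-term variations discussed above.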
Robustness of measurements: To predict the overall migration time accurately, the
measured speed must not be biased during certain periods. However, the measured speed
can be highly biased if the application’s disk accesses are concentrated in a region of the
disk because this causes a bias in disk seek time. During migration, the disk handles the in-
terleaving I/O requests from the application and from the migration. The disk arm therefore
moves back and forth between the active region of the application and the migration. As the
migrated blocks get farther away from the application’s active region, seek time increases
and migration speed decreases. This bias in measured migration speed hurts the accuracy
of prediction. To improve the robustness of the speed measurement, instead of migrat-
ing disk blocks sequentially, we divide the virtual disk into stripes and generate a pseudo
random ordering for visiting these stripes to perform block migration. We use a pseudo
random sequence rather than a true random sequence because it allows the computation
of the expected migration time for a specific block in the dirty set estimation algorithm in
Section 3.2.2. With this optimization, the robustness of measurement is greatly improved.
Results on the benefit of this technique are presented in Section 4.4.
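The pseudo-random stripe ordering can be sketched as below. This is our own illustration (seed, stripe count, and function names are assumptions): because the shuffle is seeded deterministically, a stripe's position in the order, and hence the expected migration time of its blocks, is computable ahead of time, which is the property the dirty set estimation algorithm relies on.

```python
import random

def stripe_order(num_stripes, seed=42):
    """Deterministic pseudo-random visiting order over disk stripes.
    A fixed seed makes the order reproducible, so the position of any
    stripe can be computed before migration reaches it."""
    order = list(range(num_stripes))
    random.Random(seed).shuffle(order)
    return order

def expected_migration_time(stripe, order, stripe_size, speed, start=0.0):
    """Time at which a given stripe is reached, assuming constant speed."""
    return start + order.index(stripe) * (stripe_size / speed)
```

A true random sequence would scatter seeks just as well, but would make `expected_migration_time` impossible to evaluate in advance.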
3.3 Controlling Migration Time
Pacer divides the migration time into rounds of small intervals as Figure 3.5 shows. In
each round, Pacer adapts migration speed to maintain a targeted migration finish time. It
updates the estimation of dirty block set (dirty set size), dirty disk rate (dirtyrate disk)
and dirty memory rate (dirtyrate mem) based on the algorithms in Section 3.2.2, and then
Pacer computes the proper migration speed in the way that the following section describes. The
speed is adjusted later based on the algorithms that handle I/O interference in Section 3.3.2.
Figure 3.5 : Each round of adaptation for controlling migration time
3.3.1 Solving for Speeds in Each Phase of Migration
For a specific desired migration time T , many combinations of migration speeds in each
phase are feasible. Pacer aims to control the migration progress in a systematic and stable
way, which leads to the following speed solutions.
Migrating memory pages generally will not generate disk I/O because, for performance
reasons, the memory of the VM is usually mapped to the memory of the physical ma-
chine, and thus the speed of memory migration is limited by the available network band-
width (NETWORK SPEED which can be directly measured) and so
speedMemory = NETWORK SPEED    (3.10)
With this simplification, only two variables need to be solved: speedPrecopy and
speedDirtyIteration. There are still many combinations of such speeds that can finish mi-
gration in time T . However, to minimize the severity of disk I/O interference caused by
migration, we seek to minimize the maximum migration speed used. This policy implies
that
speedPrecopy = speedDirtyIteration (3.11)
where speedDirtyIteration is the average speed for the dirty iteration in storage migration.
Thus, the appropriate speedPrecopy can finally be solved by substituting and rearranging
terms in Eq. (3.1).
INPUT OF ALGORITHM: T,DISK SIZE,MEM SIZE
INITIALIZATION
remain precopy size = DISK SIZE
remain msize = MEM SIZE
remain time = T
phase =PRE-COPY
speed scaleup flag = FALSE
speed expected = (remain precopy size + remain msize) / remain time
Other variables are initialized to be 0
LOOP
Set storage migration speed to be speed expected
Perform migration for the time indicated by interval unless it finishes
At the end of the period
past time = past time + interval
remain time = T − past time
IF(phase changes)
update phase
ENDIF
IF(phase is PRE-COPY)
dirty set = getEstimatedDirtyBlockSet(remain precopy size,
speed expected)
dirty set size =the cardinality of dirty set
dirtyrate disk = getDirtyRateDisk(dirty set)
ELSE
dirty set size = 0
dirtyrate disk = (#blocks dirtied in previous round · blocksize) / interval
END IF
IF(phase is PRE-COPY or DIRTY-ITERATION)
dirtyrate mem = getDirtyRateMem()
ELSE
dirtyrate mem = (#pages dirtied in previous round · pagesize) / interval
END IF
speed observed = (amount of transferred traffic) / interval
estimated max speed = getEstimatedMaxSpeed(speed observed,
speed expected, estimated max speed)
speed next expected = getExpectedSpeed(phase, remain time,
remain precopy size, dirty dsize, remain msize, dirty set size,
dirtyrate disk, dirtyrate mem, estimated max speed)
speed next expected = getSpeedScaleup(speed next expected,
speed scaleup flag, speed expected, speed observed)
speed expected = speed next expected
More precisely, during the pre-copy phase, at the beginning of each interval, we solve
the following equations to obtain the migration speed (speedPrecopy or s1 for short) to use
for the interval. NETWORK SPEED is measured in the previous interval and passed into the
equations as a constant.
Solve the following equations. We use t1, t2, t3
to represent tPrecopy, tDirtyIteration, tMemory
and s1, s2 to represent speedPrecopy,speedDirtyIteration
remain time is the remaining migration time before deadline
remain precopy size is the remaining disk data in the precopy
t1 + t2 + t3 = remain time − TDowntime
t3 = remain msize / (NETWORK SPEED − dirtyrate mem − dirtyrate disk)
s1 = remain precopy size / t1
dirty set size + dirtyrate disk · t2 = s2 · t2
s1 = s2
s1, s2 ≥ 0
0 ≤ t1, t2 ≤ remain time − TDowntime − t3
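Substituting t1 = remain precopy size / s and t2 = dirty set size / (s − dirtyrate disk) into the first equation reduces the system to a quadratic in the single unknown s = s1 = s2. The sketch below (our own naming, not Pacer's code) solves it directly; the larger root is taken so that s exceeds the disk dirty rate and t2 stays positive.

```python
import math

def solve_precopy_speed(remain_time, downtime, remain_msize, net_speed,
                        dirtyrate_mem, dirtyrate_disk,
                        remain_precopy_size, dirty_set_size):
    """Solve the equation system above for s = s1 = s2.

    With t1 = P/s, t2 = D/(s - r) and budget B = remain_time - downtime - t3,
    P/s + D/(s - r) = B becomes  B*s^2 - (B*r + P + D)*s + P*r = 0."""
    t3 = remain_msize / (net_speed - dirtyrate_mem - dirtyrate_disk)
    B = remain_time - downtime - t3          # time budget for t1 + t2
    P, D, r = remain_precopy_size, dirty_set_size, dirtyrate_disk
    a, b, c = B, -(B * r + P + D), P * r
    disc = b * b - 4 * a * c
    # The larger root satisfies s > r, keeping t2 = D/(s - r) positive.
    return (-b + math.sqrt(disc)) / (2 * a)
```

With a zero dirty rate the quadratic degenerates and the result reduces to the intuitive (P + D) / B; any returned speed can be checked by substituting it back into P/s + D/(s − r) = B.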
During the dirty iteration, we have the total bytes of current dirty blocks dirty dsize.
The migration speed consists of two parts. One part is to migrate the current dirty blocks in
the remaining time before memory migration. The other part is to migrate newly generated
dirty blocks at the rate of dirtyrate disk.
speedDirtyIteration = dirty dsize / (remain time − tMemory) + dirtyrate disk    (3.12)
During the memory migration, the migration speed is set to the available network band-
width.
We apply an algorithm which will be described in Section 3.3.2 for computing the max-
imal feasible migration speed that can be realized under interference. When Pacer detects
that the computed speed is higher than the maximal feasible speed, it knows finishing by
the desired time is not feasible. Then it migrates by transferring data as fast as possible
without rate limiting, and computes a new finish time prediction and reports
it to the user. Furthermore, it employs disk I/O throttling to upper bound the disk dirty rate
to a configurable fraction of the achievable migration speed.

Figure 3.6 : An example of migration speeds in different phases.
Figure 3.6 illustrates how the migration speed might be controlled by Pacer during dif-
ferent migration phases. During pre-copy, Pacer aims to maintain a stable speed but adapts
to workload changes if necessary. During dirty iteration, the migration speed depends on
the dirty set size and the dirty rate. At the beginning of dirty iteration, the dirty set already
includes the most frequently written blocks, so few new blocks will get dirty, corresponding
46
to a low dirty rate. As more dirty blocks become clean, the dirty rate increases.
The shape of the curve in practice depends on the workload. Pacer aims to migrate
the dirty set at a stable pace. This results in a dirty iteration migration speed curve that is
parallel to the dirty rate curve. Finally, during memory migration, migration can typically
proceed at a higher speed than in the previous two phases because its bottleneck is most
likely in the network.
Other variants: Similar to the discussion in Section 3.2.1 for migration time model,
the speed control can readily be adapted to other systems. As an example, for VMware,
Equation 3.10 is adjusted as follows:
speedMemory = NETWORK SPEED − IO RATE    (3.13)
where IO RATE denotes the bandwidth consumed by remote storage I/O and can be esti-
mated by monitoring the application workload.
3.3.2 Maximal Feasible Speed Estimation and Speed Tuning
Due to interference (no matter from disk or network), the achieved migration speed may
vary. It is therefore important to estimate the true maximal feasible migration speed and
ensure the desired migration speed is realized.
We estimate the maximal feasible speed by comparing the wanted speeds as specified
by Pacer and the observed speeds in reality. When migration starts, if we detect that the
observed speed cannot reach the wanted speed, we record this pair of speed values. In sub-
sequent rounds, if the new observed speed is lower than or equal to the recorded observed
speed and the new wanted speed is higher than the recorded wanted speed, we estimate that
the maximal feasible speed has been reached. The maximal feasible speed is updated by the
current observed speed. In the future, when any observed speed is higher than the maximal
feasible speed, the maximal feasible speed is updated. In order to smooth out short time
scale variations on the maximal feasible speed, we use an exponential smoothing average
for updating the maximal feasible speed. The smoothing weight β in Pacer is set to 0.8.
FUNCTION getEstimatedMaxSpeed(speed observed,
speed expected, estimated max speed)
IF(no recorded pairs
&&speed observed < speed expected)
speed pair expected = speed expected
speed pair observed = speed observed
ELSE IF(speed observed < estimated max speed)
IF(speed observed ≤ speed pair observed
&&speed expected > speed pair expected)
estimated max speed = β · estimated max speed + (1 − β) · speed observed
speed pair observed = speed observed
speed pair expected = speed expected
ELSE IF(speed observed > speed pair observed)
speed pair observed = speed observed
speed pair expected = speed expected
END IF
ELSE
estimated max speed = β · estimated max speed + (1 − β) · speed observed
speed pair observed = speed observed
speed pair expected = speed expected
END IF
return estimated max speed
When the observed speed cannot reach the wanted speed in a round, Pacer will scale
up the wanted speed for the next round and set a scale-up flag to indicate that the speed has
been scaled up. In the next round, if the new observed speed is not higher than the previous
observed speed, that means the scaling up did not help. Pacer then does not perform scale
up for the next round.
FUNCTION getSpeedScaleup(
speed next expected, speed scaleup flag, speed expected,
speed observed)
IF(speed scaleup flag == FALSE
&&speed observed < speed expected
&&speed next expected ≥ speed observed)
speed scaleup flag = TRUE
ELSE IF(speed scaleup flag == TRUE
&&speed observed < speed record observed)
speed scaleup flag = FALSE
ENDIF
IF(speed scaleup flag == TRUE)
speed next expected = speed next expected+
(speed expected − speed observed)
speed record observed = speed observed
ENDIF
return speed next expected
3.4 Evaluation
3.4.1 Implementation
Pacer is implemented on the kernel-based virtual machine (KVM) platform. KVM consists
of a loadable kernel module, a processor specific module, and a user-space program –
a modified QEMU emulator. QEMU performs management tasks for the VM. Pacer is
implemented on QEMU version 0.12.50 with about 2500 lines of code. Two options are
added to the migration command: (1) an option to enable migration prediction and report
the predicted migration time periodically; (2) an option to specify the desired migration time
and let Pacer control the migration progress to achieve the specified desired finish time.
3.4.2 Experiment Setup
The experiments are set up on two physical machines. Each machine has a 3GHz Quad-
core AMD Phenom II X4 945 processor, 8GB RAM, a 640GB WD Caviar Black SATA
hard drive, and Ubuntu 9.10 with Linux kernel (with the KVM module) version 2.6.31.
In all experiments (unless specified), the migration speed is restricted to be no more than
32MBps to mimic the level of available bandwidth in inter-datacenter scenarios.
In our test platform, the I/O write speed on the destination disk for migration is at most
15MBps, while RAID is widely used in commercial clouds to increase the I/O speed to
be over a hundred MBps. To fully measure the prediction accuracy with a wide range of
configured speeds, and to meet the time control requirement of various desired migration
time, we modify QEMU at the destination machine not to write the received data to the disk.
To ensure that the result is not biased by the disabled writing, we run a set of experiments of
enabling and disabling writing at the destination, vary the number of clients, and compare
the average prediction error in both cases. The difference is less than 1s. We vary the
desired migration time and compare the difference between the actual migration time and
desired time in both cases. The difference is less than 1s again. The results show that
disabling writing does not bias the experiment results.
The experiment VMs run VMmark Virtualization Benchmark [VMW10]. VMmark
consists of five types of workloads: file server, mail server, database server, web server,
and java server, with each representing different types of applications. Table 3.3 shows the
configuration of those servers. We vary the number of client threads to generate different
levels of workload intensity. A simple program is used to generate competing disk I/O
traffic on the source machine to create more challenging test scenarios that are more repre-
sentative of multi-tenancy clouds. It randomly accesses the disk by generating read/write
Workload Name         VM Configuration                                 Server Application   Default # Clients
File Server (fs)      SLES 10 32-bit, 1 CPU, 256MB RAM, 8GB disk       dbench               45
Mail Server (ms)      Windows 2003 32-bit, 2 CPU, 1GB RAM, 24GB disk   Exchange 2003        1000
Java Server (js)      Windows 2003 64-bit, 2 CPU, 1GB RAM, 8GB disk    SPECjbb@2005-based   8
Web Server (ws)       SLES 10 64-bit, 2 CPU, 512MB RAM, 8GB disk       SPECweb@2005-based   100
Database Server (ds)  SLES 10 64-bit, 2 CPU, 2GB RAM, 10GB disk        MySQL                16

Table 3.3 : VMmark workload summary.
I/O requests. Three models are applied to control the I/O rate by varying the interval be-
tween two I/O requests. The static model generates I/O with a constant interval. Two
dynamic models generate I/O following an exponential distribution (λ = 10, 50 or 90) or
Pareto distribution (PAR(α, k), where α = 2 and k = 10, 50, or 90). Each experiment is
run three times with different random number seeds. The results show very little variance
(< 0.1%). We believe this is because the VMmark workload is quite stable from run to run,
as our previous research on the VMmark workload [ZNS11] shows.
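The three interval models above can be sketched as follows. This is an illustrative reconstruction, not the actual benchmark code; the function name and parameter names are our own.

```python
import random

def next_interval_ms(model, k=50, alpha=2.0):
    """Sleep interval (ms) before the next competing I/O request.

    'static'      : constant interval k
    'exponential' : exponentially distributed with mean k
    'pareto'      : PAR(alpha, k), i.e. k * Pareto(alpha); minimum value is k
    """
    if model == "static":
        return k
    if model == "exponential":
        return random.expovariate(1.0 / k)
    if model == "pareto":
        # random.paretovariate(alpha) returns values >= 1; k is the scale.
        return k * random.paretovariate(alpha)
    raise ValueError("unknown model: %s" % model)
```

A driver would sleep `next_interval_ms(...)` milliseconds between successive read/write requests to produce the desired interference intensity.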
The performance of prediction is evaluated by the prediction error. The predictor com-
putes and reports its prediction t_pi every N seconds from the beginning of migration until
the migration finishes. After the migration, we evaluate the accuracy of the prediction by
computing the absolute difference between the actual migration time t and the reported
prediction time, and then reporting the average of those absolute differences:
Σ_i |t_pi − t| / (t/N), where t/N is the number of reported predictions. We
optimize Pacer to avoid prediction spikes due to sudden temporary workload shifts
by generating a cumulative average predicted time over all past individual predicted times
and using it as the reported prediction time.
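The error metric and the cumulative-average smoothing described above can be sketched as follows (function names are ours; Pacer's actual implementation lives inside QEMU):

```python
def average_prediction_error(predictions, actual_time):
    """Average prediction error: sum_i |t_pi - t| divided by the number of
    reports (t/N, since one prediction is reported every N seconds)."""
    return sum(abs(p - actual_time) for p in predictions) / len(predictions)

def smoothed_predictions(raw_predictions):
    """Report the cumulative average of all past raw predictions, which damps
    spikes caused by sudden temporary workload shifts."""
    reported, total = [], 0.0
    for i, p in enumerate(raw_predictions, start=1):
        total += p
        reported.append(total / i)
    return reported
```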
3.4.3 Prediction of migration time
3.4.3.1 VM-size based predictor and progress meter do not work
In the following experiment, we show that the VM-size based prediction method and a
more dynamic method, the progress meter, fail to give an accurate prediction of the migration
time.
The VM-size based predictor uses the formula (storage size + memory size) / (configured
migration speed). This approach is commonly used when users want to predict the migration time.
Another dynamic predictor is also implemented for comparison. The predictor is called
progress meter, which is based on the migration progress reported by QEMU. Whenever
the migration progress increases by 1%, the predictor records the current migration time t
and the progress x%, computes the progress rate x%/t, and uses that rate to predict the finish
time as (100% × t)/x% dynamically.
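The two baseline predictors amount to the following (function names are ours; units are arbitrary but must be consistent):

```python
def vm_size_predictor(storage_bytes, memory_bytes, configured_speed_bps):
    """Static estimate: total image size over the configured migration speed.
    Ignores dirty iterations and memory re-transfers entirely."""
    return (storage_bytes + memory_bytes) / configured_speed_bps

def progress_meter_predictor(elapsed_s, progress_pct):
    """Extrapolation: if x% of the migration took t seconds, predict that
    100% takes 100 * t / x seconds."""
    return 100.0 * elapsed_s / progress_pct
```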
The experiment runs on two types of VM image sizes to represent the typical image
sizes in industrial environments. 160GB is the size of an Amazon EC2 small instance
and 8GB is the size of the VMmark file server image. We use a micro benchmark that
repeatedly writes to a data region of the VM’s virtual disk at a specified write rate. The size
of the written region and the write rate vary to create different dirty set sizes and dirty rates
during the migration.
Table 3.4 shows the results. The prediction errors for the VM-size based predictor
and the progress meter are several orders of magnitude larger than those of Pacer, mainly
because those two methods do not predict the time of the dirty iteration and memory mi-
gration. The prediction errors of those two methods scale up with higher write rates and
larger written region sizes, while Pacer always achieves small prediction errors in all cases.
(a) VM-160GB

Predictor        Vary Write Rate                  Vary Written Region Size
                 (Written Region Size 10GB)       (Write Rate 20MBps)
                 5MBps    15MBps   25MBps         5GB      15GB     25GB
VM size-based    326s     395s     519s           185s     698s     1157s
Progress meter   316s     382s     510s           169s     687s     1149s
Pacer            6s       5s       8s             8s       10s      9s

(b) VM-8GB

Predictor        Vary Write Rate                  Vary Written Region Size
                 (Written Region Size 1GB)        (Write Rate 20MBps)
                 5MBps    15MBps   25MBps         512MB    1GB      2GB
VM size-based    43s      74s      99s            46s      60s      122s
Progress meter   41s      70s      94s            45s      51s      114s
Pacer            4s       6s       5s             5s       6s       4s

Table 3.4 : Prediction errors for the VM size-based predictor and the progress meter are
several orders of magnitude higher than those of Pacer.
[Figure: Predicted Migration Time and Real Migration Time (s) vs. elapsed time (s)]
Figure 3.7 : The prediction of a VM (file server-30 clients) migration. Pacer achieves
accurate prediction from the very beginning of the migration.
3.4.3.2 Pacer in the face of uncertain dynamics
We vary multiple dimensions in the migration environment to demonstrate that Pacer per-
forms well under different scenarios. We use the file server VM with 8GB storage as the
representative workload in many experiments, because it is the most I/O intensive workload
in VMmark and it challenges Pacer the most. Pacer computes and reports a predicted time
every five seconds.
Figure 3.7 shows an example of the prediction process during migration. The experi-
ment is based on the migration of a file server with 30 clients, with additional competing
traffic on the same hypervisor that follows an exponential distribution with an average
10ms sleeping time. The actual migration time is 596s. In the first 20
seconds, Pacer predicts the migration time as 400 seconds because it does not have enough
data for an accurate prediction. From 20 seconds onwards, its prediction time is very close
to the actual migration time. The prediction error is [0s, 26s] excluding the first 20 seconds.
The average prediction error is 14s over the entire migration period and 7s for the period
excluding the first 20 seconds.
Table 3.5 shows more scenarios for evaluating Pacer under different dynamic changes.
The first three experiments have no additional competing traffic.
Vary configured speed: This experiment is based on the file server with the workload
of 15 clients. We vary the configured migration speed from 30MBps to 50MBps. As
Table 3.5 shows, the average prediction error varies from 2s to 7s.
Vary the number of clients: This experiment is based on the file server with the default
configured speed of 32MBps. We vary the number of clients from 0 to 30 to represent light
workload, medium workload, and heavy workload. The average prediction error ranges
from 2s to 6s. The results show that Pacer achieves good prediction even with heavy work-
load.
Vary workload type: We vary the workload types with the default configured speed of
32MBps. The average prediction error varies from 1s to 8s across four types of workload.
Vary additional competing traffic: This experiment is based on the file server with 15
clients. We vary the intensity of additional competing traffic based on the Pareto model of
average 50ms and 90ms sleeping time. The average prediction errors are 4s and 6s.
According to the results and observations, an advantage of Pacer is that it achieves
accurate prediction from the very beginning of the migration. We take the prediction values
in the first minute and compute the average prediction error for each experiment above. The
resulting errors are within the range of [2s, 12s], which is slightly larger than the average
prediction error of the entire migration. Pacer achieves accurate predic-
tion from the very beginning because of its effective dirty set and dirty rate prediction
algorithms. We will quantify the benefits of these algorithms in Section 3.4.4.3.
In summary, Pacer provides accurate average prediction in various scenarios. The pre-
Scenario                                     Actual      Average
                                             Migration   Prediction
                                             Time        Error
Vary configured speed (fs-15 clients)
  30 MBps                                    309s        5s
  40 MBps                                    234s        2s
  50 MBps                                    201s        7s
Vary the number of clients (configured speed 32MBps)
  0 clients                                  263s        2s
  15 clients                                 288s        2s
  30 clients                                 331s        6s
Vary workload types
  ms-200 clients                             794s        8s
  js-16 clients                              264s        1s
  ws-100 clients                             269s        2s
  ds-16 clients                              402s        8s
Vary additional competing traffic (fs-15 clients)
  Pareto 50ms                                319s        4s
  Pareto 90ms                                299s        6s

Table 3.5 : Prediction with Pacer.
diction error ranges from 1s to 8s across all the above scenarios.
3.4.4 Best-effort migration time control
3.4.4.1 Dirty block prediction is critical for effective time control
We implement an adaptive time controller without dirty block prediction. The migration
speed is computed by the formula (remaining pre-copy data + existing dirty blocks) /
(remaining time). Similar to the setup in Section 3.4.3.1, the experiment uses two types of
image size, 160GB and 8GB. The micro benchmark is leveraged to generate a dynamic write
workload on the VM. The desired migration
time is 6500s for the migration of VM (160GB) and is 400s for the migration of VM (8GB).
Table 3.6 shows the migration time deviation. The actual migration time of Pacer is
very close to the desired time, with maximal deviation of [-1s,+6s]. The migration time of
the controller without dirty block prediction exceeds the desired time by up to 1528s, and the
deviation grows larger as the workload becomes more write intensive, because the controller
lacks the capability to predict the number of remaining blocks for migration and thus selects a
wrong speed. We will show how the key components in Pacer help to reduce the deviation
later in Section 3.4.4.3.
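The contrast between the two controllers can be sketched as follows. This is an illustrative simplification with function names of our own choosing; Pacer's real controller additionally measures and tunes the achievable speed at run time.

```python
def speed_without_prediction(remaining_precopy, existing_dirty, remaining_time):
    """Naive controller: pace to move only what is currently known to remain.
    It underestimates, because blocks dirtied later must be re-sent."""
    return (remaining_precopy + existing_dirty) / remaining_time

def speed_with_prediction(remaining_precopy, existing_dirty,
                          predicted_future_dirty, remaining_time):
    """Pacer-style controller (simplified): also budget for the blocks that
    the dirty set/rate predictors expect to be rewritten before the deadline."""
    return (remaining_precopy + existing_dirty
            + predicted_future_dirty) / remaining_time
```

With a write-intensive workload, `predicted_future_dirty` grows, and the naive controller's chosen speed falls short of what is needed to meet the deadline.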
3.4.4.2 Pacer in the face of uncertain dynamics
Similar to the experiments for prediction, we vary multiple dimensions in the migration
environment to show that Pacer can perform adaptive pacing to realize the desired migration
time.
Vary desired migration time: This experiment is based on the file server with the
workload of 30 clients. We vary the desired migration time from 150s to 400s. Figure 3.8
shows that when the desired time is within the range of [200s, 400s], the migration
time in the three runs is very close to the desired time, with maximal deviation of [−2s, 2s].
When we decrease the desired migration time below what is feasible, the I/O be-
comes the bottleneck, and consequently Pacer hits its minimal migration time of 176s,
while the default QEMU with the configured speed of 32MBps can finish the migration in
362s.
(a) VM-160GB

Migration Time           Vary Write Rate                Vary Written Region Size
Controller               (Written Region Size 10GB)     (Write Rate 20MBps)
                         5MBps    15MBps   25MBps       5GB      15GB     25GB
Controller w/o dirty     282s     309s     327s         264s     1004s    1528s
block prediction
Pacer                    2s       4s       4s           5s       6s       4s

(b) VM-8GB

Migration Time           Vary Write Rate                Vary Written Region Size
Controller               (Written Region Size 1GB)      (Write Rate 20MBps)
                         5MBps    15MBps   25MBps       1GB      2GB      3GB
Controller w/o dirty     31s      47s      59s          54s      88s      110s
block prediction
Pacer                    1s       2s       -1s          1s       1s       2s

Table 3.6 : Migration time deviation for Pacer is much smaller than for the controller
without dirty block prediction.
[Figure: migration time (s) vs. desired migration time (s) for Pacer, default QEMU, and the ideal case]
Figure 3.8 : Migration with different desired finish times. Pacer almost matches the ideal
case when the desired time is larger than 176s. The deviation is very small, in [-2s, 2s].
Vary the number of clients: We vary the number of clients from 0 to 60 on the file
server. As Figure 3.9 shows, there exists a lower bound for migration time (minimal migra-
tion time) because of the I/O bottleneck. Pacer can adaptively pace the migration to achieve
any target migration time in the feasible region above the smallest possible time for migra-
tion to complete, while default QEMU can only achieve one migration time for a specific
number of clients. Moreover, when the number of clients increases above 35, QEMU can-
not converge and the migration time becomes infinite. The reason is that QEMU uses a
configured constant speed that will not increase when the I/O bandwidth becomes higher.
We choose six different desired migration times from 144s to 400s in the feasible re-
gion, and migrate the VM with different numbers of clients under those desired migra-
tion times. The results in Table 3.7 show that Pacer achieves the desired time in all cases
Figure 3.9 : Migration with different degrees of workload intensity. Any point in the
feasible region can be realized by Pacer. The lower bound for migration time is set by the
I/O bottleneck. Default QEMU can only follow a narrow curve in the region.
with maximal deviation of [−2s, 1s].
Vary workload type: We perform live migration with Pacer for five types of VMmark
workloads. In order to guarantee that the default QEMU can converge in the migration, we
decrease the number of clients. We run default QEMU first and get the migration time, and
then we set this time as Pacer's desired migration time. Table 3.8 shows that Pacer can
achieve the desired migration time with a small deviation in [−2s, +2s].
Vary additional competing traffic: To test whether Pacer can achieve desired migra-
tion time when different levels of I/O interference exist, we run the following experiment
with the program in Section 3.4.2 to generate additional competing I/O traffic. The mi-
Desired   10        20        30        40        50        60
Time      Clients   Clients   Clients   Clients   Clients   Clients
144s      [-1, 0]   [0, 0]    -         -         -         -
176s      [0, 0]    [-1, 1]   [0, 1]    -         -         -
203s      [-1, 1]   [-2, 1]   [0, 0]    [0, 1]    -         -
222s      [0, 0]    [0, 1]    [-1, 0]   [-1, 0]   [0, 1]    -
305s      [0, 0]    [-2, 1]   [-1, 0]   [-2, 0]   [0, 0]    [0, 0]
400s      [0, 0]    [-1, 0]   [-2, 0]   [-2, 0]   [-1, 1]   [-2, 0]

Table 3.7 : Deviation of migration time with Pacer under different workload intensities.
The numbers in brackets give the worst early and late deviations; for example, [-1, 1]
means at most 1s early and 1s late. "-" means the time is beyond the feasible region.
Workload         Desired Migr   Pacer Migr
                 Time (s)       Time (s)
fs-30 clients    362            360
ms-200 clients   897            899
js-16 clients    274            275
ws-100 clients   287            287
ds-16 clients    471            473

Table 3.8 : Migration time for different types of workload. Pacer achieves the desired
migration time.
Sleeping            Run 1       Run 2       Run 3
Time                MigrTime    MigrTime    MigrTime
                    Dev (s)     Dev (s)     Dev (s)
No Add. Traffic     -1          0           0
Static 50ms         0           -5          1
Expo (ave 50ms)     -5          0           -4
Pareto (ave 50ms)   0           -2          3
Static 90ms         -3          0           -5
Expo (ave 90ms)     -5          -2          1
Pareto (ave 90ms)   0           2           1

Table 3.9 : Migration time for Pacer when the additional competing traffic varies. Pacer
achieves the desired migration time with a small finish time deviation.
grated VM runs the file server with 30 clients. The desired migration time is 264s. Table 3.9
shows the results for three runs. Pacer achieves the desired time as the I/O inter-
ference varies. The deviation is [−5s, 3s], which is small compared to the desired time of
264s.
3.4.4.3 Benefits of key components in Pacer
Dirty set and dirty rate prediction: To understand the benefit of the key components
in Pacer, we design an experiment comparing the system with and without dynamic dirty
set and dirty rate prediction to evaluate the effectiveness of those algorithms. The workload
is the file server. As Table 3.10 shows, the actual migration time exceeds the desired mi-
gration time significantly when no prediction algorithm is used. When only the
dynamic dirty set prediction algorithm is added to the system, the accuracy of migration
time improves but still exceeds the desired time. When both the dirty set and dirty rate
prediction algorithms are used in Pacer, Pacer can perform adaptive pacing with very little
deviation [−2s,−1s].
Workload     Desired    Pacer without     Pacer with only   Pacer
             Time (s)   dirty set/rate    dirty set         (s)
                        prediction (s)    prediction (s)
30 clients   200        216               206               198
60 clients   400        454               431               399

Table 3.10 : Importance of dynamic dirty set and dirty rate prediction. Without these
algorithms, it is hard to achieve the desired migration time.
Desired   With speed   Without speed
Time      tuning       tuning
200s      198s         284s
300s      300s         380s
400s      399s         553s

Table 3.11 : Importance of the speed scale-up algorithm.
Speed measurement and tuning: We design an experiment to run Pacer with and
without maximal speed prediction. The VM runs the file server with 30 clients. Additional
competing traffic is generated with a constant 10ms interval. Without maximal speed predic-
tion, migration takes 697s when the desired time is 600s. With prediction, migration can
finish in time. Moreover, we design another experiment to run migration with and with-
out the speed scale-up algorithm on the file server with 30 clients, but without additional
competing traffic on the disk. We set the desired migration time to be 200s, 300s and 400s.
The results are shown in Table 3.11. Without the speed scale-up algorithm, migration will
considerably exceed the desired time in all three experiments.
3.4.5 Overhead of Pacer
In this experiment, we measure the overhead introduced by Pacer in terms of time and
space. For example, for best-effort time control, we run migration with Pacer for the file
server workload with 60 clients and a desired migration time of 400s. We measure the
computation time of Pacer in each round. We observe that the computation time is 28.24ms
at the beginning of migration. As the migration progresses and more blocks in the dirty set
are determined, the computation time drops to below 1ms in the final stage of migration.
Overall, Pacer on average only incurs 2.4ms of computation time for each 5 second interval.
The overhead is 0.05%, which is negligible. The space overhead, in terms of additional
memory required to run Pacer compared with default QEMU, is less than 1MB. Prediction
consumes fewer computation resources than best-effort time control.
We also evaluate the overhead introduced by Pacer for each disk I/O write operation
during migration. The default QEMU already has a dirty block tracking function to track
each disk write operation during migration. Pacer simply leverages the existing tracking sys-
tem and performs a simple update of the average write interval. We ran experiments to mea-
sure the disk I/O write latency with and without Pacer. The average disk I/O latency at
millisecond accuracy and throughput at MB/s accuracy are the same with and without Pacer.
We also measure the application throughput and response time on the file server during
migration with and without Pacer. The results show no side effect on application perfor-
mance with Pacer. In summary, the overhead of Pacer is small and has no impact on the
performance of the application.
3.4.6 Potential robustness improvements
Pacer could be improved further by including mechanisms to mitigate the negative im-
pact of rare cases in which the migration environment is not steady. Firstly, Pacer is an
adaptive system with a fixed adaptation interval (5s) in the current design. Instead, a flex-
ible interval can be applied when Pacer detects that the workload intensity or the network
available bandwidth varies significantly. Reducing the adaptation interval will improve the
adaptivity but it also incurs more overhead. By adjusting the adaptation interval, we can
make a trade-off between the speed of adaptation and overhead. Secondly, we can test
the migration environment, e.g. network bandwidth, against expected patterns to find out
whether any increasing or decreasing trend exists. These mechanisms will be considered
in our future work.
3.5 EC2 Demonstration
Figure 3.10 : VM migration from Rice campus to Amazon EC2.
To demonstrate the functions of Pacer in a commercial hybrid cloud environment, we
conduct a set of experiments using the Amazon EC2 cloud. In these experiments we mi-
grate VMs from the Rice campus network to EC2. On EC2, we use High-CPU Medium
instances running Ubuntu 12.04. EC2 instances do not support KVM, so we use the
“no-kvm” mode of QEMU on EC2. The downside is that without KVM’s hardware virtual-
ization support, a QEMU VM’s performance is reduced.
Workload intensity        None    Low     Medium   Heavy
Actual Migration Time     227s    240s    255s     250s
Average Prediction        6.35s   5.39s   4.71s    7.76s
Error

Table 3.12 : Prediction accuracy with Pacer.
3.5.1 Network and Disk Speed Measurements
We characterize the network and disk speed that can be achieved between Rice and EC2
and make several interesting observations. First, we use iperf to measure TCP network
throughput for 200s. We find that when transmitting data from Rice to EC2, the through-
put increases gradually and linearly for a surprisingly long 30s before it maximizes at
roughly 60MBps. More specifically, 50% of the speed samples fall between 58MBps and
62MBps. After the initial 30s, 5% of the speed samples are below 40MBps and 3.5% are
below 30MBps. Based on these findings, we cap the migration speed in the experiments to
50MBps. Second, we use scp to copy an 8GB file from Rice to EC2 to measure achievable
disk speed. We sample the scp-reported speed every 0.5s. The average speed achieved is
30.9MBps. Thus, disk speed is the most likely bottleneck for migration in these EC2 exper-
iments. To compute the degree of disk speed variation, we compute the absolute difference
between each speed sample and the average speed. We find the amount of variation to be
significant, with the average absolute difference being 5MBps.
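The variation statistic above is simply the mean absolute deviation of the speed samples; a minimal sketch:

```python
def mean_absolute_deviation(samples):
    """Average absolute difference between each sample and the sample mean."""
    mean = sum(samples) / len(samples)
    return sum(abs(s - mean) for s in samples) / len(samples)
```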
Desired time   500s        600s        700s        800s
Deviation      [-2, +2]    [-2, +2]    [-1, +2]    [-3, 0]

Table 3.13 : Migration time control in EC2.
3.5.2 Use Case 1: Prediction of Migration Time
To measure the accuracy of Pacer’s prediction, we migrate one VM that runs the file server
from Rice to EC2. We vary the number of clients to emulate different workload intensities
of the VM server. The CPU utilization rate is 30-45% for the low workload, 45-55% for
the medium workload, and 55-70 for the high workload.
For each intensity of the workload we run three sets of experiments and report the
average prediction error in Table 3.12. The first observation is that the accuracy of the
prediction does not decrease as the workload increases. Secondly, given the fact that the
network and disk speeds are quite unstable, Pacer still can predict with an average absolute
error of about 5s. We find that if disk writes at the destination are disabled to eliminate
the impact of disk speed variation, the average prediction error is reduced to 2s. Given the
disk speed typically fluctuates 16% from the average speed, the obtained average prediction
error ranging from 2% to 3% of the actual migration time is quite desirable.
3.5.3 Use Case 2: Best-effort Migration Time Control
In this experiment we migrate the 8GB file server with medium workload and vary the
desired migration time from 500s to 800s. For each desired time we run three experiments
and report the range of the deviations in Table 3.13. Although we have reported that the
network and disk speeds between Rice and EC2 are not very stable, Pacer still works very
well in controlling the migration time to within 3s of the desired time.
3.6 Related Work
3.6.1 Live Migration
While to our knowledge no previous work is directly comparable to Pacer, there exists
some related work on setting the speed or estimating the time of CPU/memory-only VM
migration. Breitgand et al. [BKR11] propose a cost function for computing the network
bandwidth allocated to CPU/memory-only migration in order to minimize the theoretical
number of application delay bound violations as given by a queuing theory model.
Akoush et al. [ASR+10] simulate the execution of the iterative data copy algorithm of
CPU/memory-only migration so as to estimate the required migration time. The simula-
tion makes certain simplifying assumptions such as fixed network bandwidth and fixed or
historically known memory page dirty rate.
Relative to these previous works, not only does Pacer address a different set of prob-
lems in migration progress management for full VM migration, but it also takes a system-
building approach based on real measurements and run-time adaptation, which we find
crucial for coping with workload and performance interference dynamics in a
complete system.
3.6.2 I/O Interference in Virtualized Environment
In a virtualized environment, multiple VMs may coexist in the same system. They share
the same underlying I/O resources, which gives rise to I/O interference: the I/O from
one VM may affect the I/O performance of other VMs. Two types of solutions
are proposed to mitigate the I/O interference: performance isolation and resource adap-
tation. I/O scheduling algorithms have been proposed for fair sharing and performance
isolation among multiple VMs [GMV10]. On the other hand, resource adaptation algo-
rithms adjust the allocation of resources among VMs when performance degradation is
detected [NKG10, PHS+09]. These performance isolation or resource adaptation solutions
could potentially assist Pacer in maintaining a desired migration speed with accuracy. With-
out such underlying support, Pacer uses simple run-time migration speed measurements
as inputs to dynamically adjust the aggressiveness of migration, attain the desired migra-
tion speed, and apply the maximal possible migration speed if it cannot reach the desired
migration time due to the I/O interference.
3.6.3 Data Migration Technologies
A related area is data migration quality of service. In [LAW02], Lu et al. present Aque-
duct, a data migration system that minimizes the impact on the application performance.
However, Aqueduct simply treats the migration as a low-priority task and does not provide
a predictable migration time.
Dasgupta et al. [DGJ+05] and Zhang et al. [ZSS06] present different rate controlling
schemes that attempt to meet a data migration time goal and evaluate them through simu-
lations. However, neither of these proposals considers the dirty data generated by write
operations during the migration, nor the need for iterative dirty block migration. The
possible reason for the above imperfections is that data migration solutions including those
commercial ones (e.g. IBM DB2 UDB [PRH+03]) are designed to migrate data from one
local storage device to another. These solutions are based on the assumption that upon the
migration of a data block, all accesses to that block are redirected to the destination device
without incurring much penalty, and thus the notion of dirty data does not exist. However,
the same idea of redirection will lead to terrible performance degradation in live VM mi-
gration if the migration is carried out over a long distance. Moreover, it is possible for
a network outage to interrupt the migration process, and thus the redirection technique also
runs the risk of losing the latest copy of the migrated data.
3.6.4 Performance Modeling and Measurement
As live migration becomes common in cloud management, its performance is im-
portant to users. Therefore, performance modeling and measurement for VM live migration
has been proposed [WZ11, BKR11, ASR+10, VBVB09, ZF07, CCS10].
Wu et al. [WZ11] conducted a series of experiments on Xen to profile the time for
migrating a DomU VM running different resource-intensive applications while Dom0 is
allocated different CPU shares for processing the migration. Regression methods are then
used to create the performance model based on the profiling data.
Breitgand et al. [BKR11] introduce a new model to quantify the trade-off between
minimizing the copy phase duration and maintaining an acceptable quality of service during
the pre-copy phase for CPU/memory-only migration.
Akoush et al. [ASR+10] characterize the parameters affecting live CPU/memory-only
migration with particular emphasis on the Xen virtualization platform. They provide two
simulation models to predict memory migration time.
Voorsluys et al. [VBVB09] present a performance evaluation of the effects of live mi-
gration of virtual machines on the performance of applications running inside Xen VMs.
They show that in most cases, migration overhead is acceptable but cannot be disregarded,
especially in systems where service availability and responsiveness are governed by strict
Service Level Agreements.
Zhao et al. [ZF07] seek to provide a model that can characterize the VM migration
process and predict its performance, based on a comprehensive experimental analysis.
Checconi et al. [CCS10] address the issue of how to meet the strict timing constraints
of (soft) real-time virtualized applications while the virtual machine hosting them is un-
dergoing a live migration. They introduce a stochastic model for the migration process and
reserve resource shares for individual VMs.
Chapter 4
Coordinated Migration of Multi-tier Applications
Although existing live migration techniques [KVM, CFH+05, NLH05] are able to migrate
a single VM efficiently, those techniques are not optimized for migrating related VMs in a
multi-tier application. When the entire virtual disk is being migrated, the amount of data
to move and the time it takes to perform such a move become non-trivial. Given the fact
that the VMs running a multi-tier application are highly interactive, a serious issue is that,
during the migration, the performance of the application can degrade significantly if the
dependent components of an application are split between the source and the destination
sites by a high latency and/or congested network path.
Figure 4.1 shows an example of migrating a 3-tier e-commerce application from one
cloud to another. Note that this inter-cloud migration example is not the only scenario that
can suffer from performance degradation. The limited bandwidth scenario can arise within
a campus or even within a machine room.
In this example, the application has 4 VMs (shown as ovals) implementing a web server,
two application servers, and a database server. An edge between two components in the
figure indicates that those two components communicate with one another. We define a
performance metric called the performance degradation time, which is the time period dur-
ing which any communicating components are split over the source and destination sites.
When such a split happens, certain inter-component communications must be conducted
over the bandwidth limited and/or high latency network, leading to degraded application
performance. Specifically, this example shows that two existing migration strategies, se-
quential and parallel migration, may result in poor performance. Sequential migration,
which migrates each VM one by one, results in a long performance degradation time from
when the first VM finishes migration until the last VM finishes migration. Parallel mi-
gration, which starts migration of multiple VMs at the same time, is not able to avoid the
degradation either. This is because the amount of data to migrate for each VM is different
and therefore the VMs in general will not finish migration simultaneously. The application
will experience performance degradation until all VMs have completed migration. Fur-
thermore, if the bandwidth required for migrating all VMs in parallel exceeds the actual
available bandwidth, additional performance problems will result (see Challenge 2 in
Section 4.3).
(a) Before migration
(b) With two migration strategies
Figure 4.1 : Sequential and parallel migration of a 3-tier web application across clouds.
In this chapter, we formulate the problem of the live migration of multi-tier applications.
At the same time, we show the quantitative impact of uncoordinated multi-tier application
migration. We present a new communication-cost-driven coordinated approach, as well as
a system called COMMA (Coordinated migration of multi-tier applications) that realizes
this approach. Experimental results show the capability and benefits of the coordi-
nation system. We also demonstrate the functions of COMMA on EC2, a popular
commercial cloud environment.
4.1 Problem Formulation
Let n be the number of VMs in the multi-tier application and the set of VMs be
{vm1, vm2, ..., vmn}. The goal of the multi-tier application migration problem is to min-
imize the performance degradation caused by splitting the communicating components
between source and destination sites during the migration. Specifically, we propose a
communication-cost driven approach. To quantify the performance degradation, we de-
fine the unit of cost as the volume of traffic between VMs that need to crisscross between
the source and destination sites during migration. More concretely, by using the traffic
volume to measure cost, components that communicate more heavily are treated as more
important. While many other metrics could be selected to evaluate the cost, e.g. the end-to-
end latency of requests, the number of affected requests, or the performance degradation time,
we do not adopt them, for different reasons. We do not adopt the end-to-end latency of requests
or the number of affected requests because they are application dependent and require extra
measurement support at the application level. We do not adopt the performance degradation
time because it ignores the communication rate between components. We define the cost
as the volume of traffic which does not require any extra support from application and is
application independent.
Let traffic matrix TM represent the communication traffic rate between any two VMs
prior to the start of migration. Figure 4.2 shows an example of how to compute the cost.
There are 3 VMs, and every pair of VMs communicates with each other.
Figure 4.2 : An example of cost computation for 3 VMs
Our cost model is based on the traffic prior to migration rather than the traffic during migration. During migration,
the traffic rate of the application may be distorted by a variety of factors such as network
congestion between the source and destination sites and I/O congestion caused by the data
copying activities. Therefore, we cannot optimize against the traffic rate during migration
because the actual importance of the interaction between components could be lost through
such distortions. Let migration finish time for vmi be ti. Our goal is to minimize the total
cost of migration, where:
cost = ∑_{i=1}^{n} ∑_{j>i} |t_i − t_j| · TM[i, j]    (4.1)
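As a sketch of this cost model (not the thesis code; the traffic matrix values below are invented for illustration), the cost of Equation 4.1 can be computed directly from the per-VM finish times:

```python
# Illustrative sketch of the Equation 4.1 cost model.

def migration_cost(finish_times, traffic_matrix):
    """cost = sum over pairs i < j of |t_i - t_j| * TM[i][j]."""
    n = len(finish_times)
    cost = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            cost += abs(finish_times[i] - finish_times[j]) * traffic_matrix[i][j]
    return cost

# 3 VMs finishing at t = 10s, 12s and 20s; rates in MBps (made-up values).
tm = [[0, 5, 10],
      [5, 0, 20],
      [10, 20, 0]]
print(migration_cost([10, 12, 20], tm))  # 2*5 + 10*10 + 8*20 = 270.0
```

If all VMs finish at the same time, every |t_i − t_j| term vanishes and the cost is zero, which is exactly the objective COMMA pursues.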
4.2 Quantitative Impact of Uncoordinated Multi-tier Application Mi-
gration
Multi-tier applications can involve many components. Amazon web service architecture
center [Amab] provides users with the necessary guidance and best practices to build appli-
cations in the cloud. It also provides architectural guidance for design and implementation
of systems that run on the Amazon web service infrastructure. There are common config-
urations from Amazon for different types of applications. Figure 4.3 shows the reference
three-tier architecture for highly-scalable and reliable web or mobile-web applications.
The three tiers are web server tier, application server tier and database server tier.
There is inter-tier communication between the web server/application server tiers and the
application server/database server tiers. HTTP requests are handled by load balancing
servers, which automatically distribute incoming application traffic across multiple web
servers. If a request requires further processing by application servers, the web servers
send requests to the load balancers between the web servers and application servers. Once
an application server gets a request, it may need to query the database servers for additional
data. Besides talking to the application servers, the database servers have intra-tier communication: database servers talk to each other in master/slave mode for greater fault tolerance.
The number of servers deployed in the application is adjusted up or down according to
user-defined conditions. For a popular application with large traffic demand, more VMs are
deployed to maintain performance. Otherwise, for a less popular application, fewer VMs
are deployed to minimize costs.
Figure 4.3 shows examples of multi-tier e-commerce applications in Amazon
AWS [Amab] with different numbers of VMs. In order to gain quantitative insights, we
perform numerical analysis to illustrate the potential performance degradation experienced
when such applications are migrated from one cloud to another.
Assume that the VMs have the characteristics in Table 4.1, and that the available migration bandwidth is 256Mbps, shared by all VMs' migrations. The parameters that we
select are image size, memory size, dirty set size and maximal dirty rate. These are four
key parameters for determining the migration time as we discussed in the migration time
model in Chapter 3. The four parameters depend on different types of configuration and
workload. We select a set of common configurations to demonstrate the problem. The
image size and memory size follow the recommendation from VMware benchmark con-
figuration [VMW10]. Dirty rate is the rate of generated dirty blocks during dirty iteration.
Dirty set is the size of dirty blocks at the end of the pre-copy stage. Different workloads
have different dirty rates and dirty sets. We use higher values for the database server to
mimic the intensive disk write operations in database servers. Degradation time is defined as the total
time when two interacting components are split over two clouds.
Two migration approaches based on existing techniques are considered in the analysis.
The first approach is sequential migration in which VMs are migrated one by one. The
migration speed is assumed to reach the available migration bandwidth. The total migration
time is the sum of each VM's migration time. Each VM's migration time is computed as
(Image Size + Mem Size) / Bandwidth + Dirty Set / (Bandwidth − Max Dirty Rate / 2)
The second approach is parallel migration, which starts concurrent migration of all VMs
at the same time. We allocate the migration bandwidth to each VM according to its maximal
dirty rate for convergence. If the available migration bandwidth is larger than the sum of
VMs’ maximal dirty rates, the remaining bandwidth is evenly distributed to each VM. The
migration time for each VM is computed by applying the VM's specific information and
available migration bandwidth to the above equation. The total migration time for all VMs is
decided by the longest migration time for a single VM. The performance degradation time
is obtained by computing the difference between the migration finish times for VMs.
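The analysis above can be sketched as follows (my reconstruction, not the thesis code): sizes in MB, rates and bandwidth in MBps, with the per-VM time formula from the sequential case.

```python
# Sketch of the sequential/parallel migration time analysis. Sizes in MB,
# rates and bandwidth in MBps; effective dirty-copy rate during the dirty
# iteration is approximated as (bandwidth - max dirty rate / 2).

def single_migration_time(image, mem, dirty_set, max_dirty_rate, bw):
    return (image + mem) / bw + dirty_set / (bw - max_dirty_rate / 2.0)

def sequential_time(vms, bw):
    # VMs are migrated one by one at the full available bandwidth.
    return sum(single_migration_time(*vm, bw) for vm in vms)

def parallel_times(vms, bw):
    # Each VM is guaranteed its maximal dirty rate; the remaining bandwidth
    # is split evenly. Returns None when convergence is impossible.
    need = sum(vm[3] for vm in vms)
    if need > bw:
        return None
    share = (bw - need) / len(vms)
    return [single_migration_time(*vm, vm[3] + share) for vm in vms]

# Parameters from Table 4.1; 256 Mbps = 32 MBps shared bandwidth.
web = (8192, 1024, 100, 2)    # web/app server
db  = (8192, 1024, 1024, 15)  # database server
print(round(sequential_time([web, db], 32)))  # ~621 s (Table 4.2 lists 620 s)
```

With six database-like VMs, the sum of maximal dirty rates (90 MBps) exceeds the 32 MBps bandwidth, and `parallel_times` returns None, matching the INF entries in Table 4.2.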
The results in Table 4.2 show that existing solutions lead to large performance degradation times. Sequential migration incurs long degradation times, which increase with the
size of the application topology. Although parallel migration has a shorter degradation
time, it is still very significant (tens of seconds). Furthermore, a serious problem is that
migration cannot converge when the number of parallel migrated VMs exceeds 5, because
the sum of the VMs’ dirty rates is greater than the available bandwidth and thus dirty data
copying cannot finish unless I/O throttling is employed to reduce the dirty rates.
In summary, quantitative analysis of the migration impact on multi-tier applications shows
that multi-tier applications can be very complex and that today's migration methods lead to
significant performance degradation.
4.3 System Design
To address the problem introduced above, we design a system called COMMA to coordi-
nate the migration of multi-tier applications. COMMA is the first migration coordination
system for multiple VMs using a series of scheduling algorithms.
Component Type                  Image Size   Mem Size   Dirty Set   Max Dirty Rate
Web/App Server, Load Balancer   8GB          1GB        100MB       2MBps
Database                        8GB          1GB        1GB         15MBps
Table 4.1 : Example VM and workload parameters. Dirty set is defined as the data bytes written on the VM's virtual disk at the end of the disk image copy. Dirty rate is defined as the speed at which the VM's virtual disk and memory are written.
       Sequential Migration         Parallel Migration
       Migration    Degradation     Migration    Degradation
       Time(s)      Time(s)         Time(s)      Time(s)
2VM    620          328             620          32
3VM    912          1238            912          58
4VM    1205         1820            1205         50
5VM    1498         2111            1498         44
6VM    1826         4042            INF          INF
7VM    2118         4624            INF          INF
8VM    2410         4915            INF          INF
9VM    2739         8158            INF          INF
Table 4.2 : Degradation time with sequential and parallel migration. INF means the migration could not converge and thus the migration time is infinite.
Figure 4.3 : Examples of multi-tier web services.
Figure 4.4 : An example of coordinating the migration with COMMA
The system consists of two parts: 1) a centralized controller running on a hypervisor
on the migration source network, and 2) a local process per VM running on each VM’s
hypervisor to govern each VM migration. The local process provides two functions: 1) it
periodically reports to the controller about the migration status to let the controller make
the progress management decision, and 2) it exposes a control interface, which receives
messages from the controller and adjusts the migration progress accordingly. The reported
migration status includes actual migration speed, actual dirty blocks, actual dirty rate, pre-
dicted dirty set at the end of pre-copy and predicted maximal dirty rate. Based on the
migration status, the controller periodically executes the scheduling algorithm to compute
the proper settings for each VM migration process in order to achieve COMMA’s perfor-
mance objective. The controller sends control messages to the local processes. The control
messages include the migration speed and when the migration speed should be set. Then
each local process would implement the controller’s decisions to achieve the overall ob-
jective of finishing the migration with minimal degradation cost. More details about the
control interface are given in Pacer's implementation (Section 3.4.1).
The full migration of multiple VMs is scheduled into small intervals. Before migra-
tion, the user provides the list of VMs to be migrated as well as the source hypervisors
and destination hypervisors to COMMA. COMMA queries the source hypervisors for each
VM’s image size and memory size. At the same time, COMMA uses iperf [ipe] to measure
the available network bandwidth between the source and destination sites and uses iptraf
[ipt05] to measure the traffic matrix for the communication rates among VMs. At the begin-
ning, the measured network bandwidth is considered as the available migration bandwidth.
However, we do not only rely on this measurement. We break the migration time into short
intervals where we update and recompute the available migration bandwidth in each inter-
val. In each interval, we assume the bandwidth is fixed, and then the system computes the
new estimated available migration bandwidth for the scheduling of next interval.
4.3.1 Subsystem: Pacer
COMMA focuses on the coordination of multiple VMs’ migration where each VM’s mi-
gration progress is handled by Pacer. Pacer provides two types of interfaces to COMMA:
query and control. Pacer is able to respond to queries for its actual migration speed,
migration progress, predicted dirty set and predicted dirty rate in the pre-copy phase, and
actual dirty set and dirty rate in the dirty iteration phase. Pacer also provides control interfaces for COMMA to start migration, stop migration and set the desired migration speed.
4.3.2 Challenges and Solutions
In Chapter 1, we discussed the challenges of multi-tier application migration. We now
discuss how we tackle these challenges with our solutions.
• Higher order control. Fundamentally, each individual VM migration process can
only be predicted and controlled to a certain extent (as shown by Pacer). It is nec-
essary to design a new architecture where a higher order control mechanism governs
all VM migration activities. COMMA designs a centralized controller to coordinate
the migration of VMs in the multi-tier application.
• Inter-VM-migration resource contention and allocation. For multiple VM migrations, the convergence issue is more complicated but also more interesting. We need a
mechanism to check whether it is possible to migrate multiple VMs at the same time,
to decide how to combine VMs into groups for convergence, and to schedule the
migration start and finish times of each group so as to minimize the communication
cost. COMMA introduces the concept of a “valid group” to decide how to combine
VMs into groups with convergence in mind. It then performs inter-group scheduling
across valid VM groups and intra-group scheduling for the VMs within each group.
Inter-group scheduling ensures feasibility given the available bandwidth and guarantees convergence. Intra-group scheduling maximizes bandwidth utilization.
• Inter-VM-migration dynamicity and interference. Interference among multiple
VM migrations exists: when multiple VM migrations occur in the same period,
they share the available resources. COMMA collects the actual migration speed
and progress from each VM and makes adjustments based on this feedback.
• System design and efficiency. The computational complexity of an optimal solution
for coordinating a multi-tier application could be very high. It is important that the
coordination system is efficient and has low overhead. COMMA, with a heuristic
scheduling algorithm, reduces the computation overhead by 99% while achieving
96% of the optimal performance in our experiments (Section 4.4).
4.3.3 Scheduling Algorithm
The algorithm works in two stages. In the first stage, it coordinates the migration speed of
the static data of VMs (Phase 1) so that all VMs complete the precopy phase at nearly the
same time. In the second stage, it coordinates the migration of dynamically generated data
(Phase 2, 3, 4) by inter-group and intra-group scheduling. The definitions of the four
migration phases are given in Chapter 3.
Phase 1 migrates static content. Thus there is no inherent minimum speed requirement.
Phase 2 and 3 migrate dynamically generated content. The content generation rate implies
a minimum migration speed which must be achieved or else throttling might become nec-
essary (which causes application performance degradation). Therefore, we should dedicate
as much of the available bandwidth to phase 2 and 3 in order to prevent application per-
formance degradation. This clearly implies that the phase 1 migration activities should not
overlap with phase 2 and 3. More discussion about adapting to changing dirty rate and
bandwidth is in Section 4.3.6.
4.3.3.1 First stage
The goal of the first stage is to migrate VMs in parallel and finish phase 1 of all VMs at the
same time. Assuming the data copying for each VM is performed over a TCP connection,
it is desirable to migrate VMs in parallel because the aggregate transmission throughput
achieved by parallel TCP connections tends to be higher than that of a single TCP connection.
In this stage, the amount of migrated data is fixed. The controller adjusts each VM's
migration speed according to its virtual disk size (see Equation 4.2).
During the migration, the controller periodically gathers and analyzes the actual available network bandwidth, the migration speeds and the progress of the VMs. Then it leverages
the maximal speed prediction and tuning algorithms of our migration progress management system Pacer to pace the migration of the whole set of VMs.
Figure 4.5 : An example of a valid group.
speed_{vm_i} = (DISK_SIZE_i × BANDWIDTH) / TOTAL_DISK_SIZE    (4.2)
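A minimal sketch of this proportional assignment (illustrative, not the actual controller code):

```python
# Sketch of the Equation 4.2 first-stage speed assignment: the available
# bandwidth is split in proportion to each VM's virtual disk size so that
# all pre-copies are expected to finish at the same time.

def first_stage_speeds(disk_sizes, bandwidth):
    total = sum(disk_sizes)
    return [bandwidth * size / total for size in disk_sizes]

# Disks of 8GB, 16GB and 8GB sharing 32 MBps:
print(first_stage_speeds([8, 16, 8], 32))  # [8.0, 16.0, 8.0] MBps
```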
Figure 4.4 shows an example of migrating 4 VMs with COMMA. The user submits a
migration request to the controller with the logical topology of the application, VM config-
uration, traffic matrix, possible network bandwidth and destination hypervisors’ addresses.
The controller coordinates the migration of 4 VMs such that their precopy phases complete
at the same time. At the end of the first stage, each VM has recorded a set of dirty blocks
which require retransmission in the next stage.
4.3.3.2 Second stage
In the second stage, we introduce the concept of “valid group” to overcome the second chal-
lenge above. COMMA performs inter-group scheduling to minimize the communication
cost and intra-group scheduling to efficiently use network bandwidth.
To satisfy the convergence constraint, the VMs in the multi-tier application are divided
into valid groups according to the following rule: the sum of the VMs' maximal dirty rates in a
group is no larger than the available network bandwidth (see Equation 4.3). The maximal
dirty rate is usually reached at the end of the dirty iteration, since at this time most blocks
are clean and have a high probability of becoming dirty again. The maximal dirty rate is
needed before the second stage but is unknown until the migration finishes; we therefore
leverage a dirty rate estimation algorithm, shown to work well in Chapter 3,
to estimate the maximal dirty rate before the second stage starts.
∑_{vm_i ∈ group} Max_dirty_rate_i ≤ BANDWIDTH    (4.3)
Figure 4.5 shows an example of how to compute valid groups for the 3 VMs of Figure 4.2.
The maximal dirty rates of VM1, VM2 and VM3 are 20MBps, 5MBps and 10MBps,
respectively. There are six valid groups. {VM1, VM2, VM3} is not a valid group because
the sum of its maximal dirty rates is larger than the available network bandwidth of
30MBps. The six valid groups yield four possible combinations of migration sequences.
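The valid-group rule can be sketched as a simple subset enumeration (illustrative code, using the Figure 4.5 numbers):

```python
# Sketch: enumerate the valid groups of Equation 4.3 for the Figure 4.5
# example (max dirty rates 20, 5 and 10 MBps; available bandwidth 30 MBps).
from itertools import combinations

def is_valid_group(group, dirty_rates, bandwidth):
    return sum(dirty_rates[vm] for vm in group) <= bandwidth

def valid_groups(dirty_rates, bandwidth):
    vms = range(len(dirty_rates))
    return [set(g)
            for k in range(1, len(dirty_rates) + 1)
            for g in combinations(vms, k)
            if is_valid_group(g, dirty_rates, bandwidth)]

groups = valid_groups([20, 5, 10], 30)
print(len(groups))  # 6 -- only {VM1, VM2, VM3} (35 MBps > 30 MBps) is invalid
```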
4.3.4 Inter-group Scheduling
In order to minimize the performance degradation cost, COMMA needs to compute the
optimal group combination and migration sequence. We propose two algorithms: a brute-
force algorithm and a heuristic algorithm. The brute-force algorithm can find the optimal
solution but its computation complexity is high. The heuristic algorithm reduces the computation overhead by 99% while achieving 96% of the optimal performance in our experiments (Section 4.4).
4.3.4.1 Brute-force algorithm
The brute-force algorithm lists all possible combinations of valid groups, permutes them
into different migration sequences, and computes the performance degradation cost of each.
It records the group combination and migration sequence that generate the minimal
cost.
Given a set of VMs, the algorithm generates all subsets first, and each subset will be
considered as a group. The algorithm eliminates the invalid groups that do not meet the
rule above. It then computes all combinations of valid groups that exactly add up to a
complete set of all VMs. Figure 4.4 shows one such combination of two valid groups that
add up to a complete set: {vm1, vm2} and {vm3, vm4}. Next the algorithm permutes each
such combination to obtain sequences of groups, and those sequences stand for different
migration orders. The algorithm then computes the communication cost of each sequence
based on the traffic matrix and the migration time reported from the intra-group scheduling
algorithm. Finally the algorithm will select the group combination and the sequence with
the minimal communication cost.
Let n be the number of VMs in the application. The time complexity of the brute-force
algorithm is O(2^n · n!), because it takes O(2^n) to compute all the subsets and O(n!)
to permute each combination.
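A compact sketch of this search (illustrative; `group_time` stands in for the intra-group scheduler's migration-time estimate, and `is_valid` encodes Equation 4.3):

```python
# Sketch of the brute-force inter-group search: enumerate every partition of
# the VM set into groups, drop partitions containing invalid groups, permute
# the migration order, and keep the minimal-cost schedule.
from itertools import combinations, permutations

def partitions(vms):
    # Yield all ways to split the VM list into disjoint groups.
    if not vms:
        yield []
        return
    first, rest = vms[0], vms[1:]
    for k in range(len(rest) + 1):
        for subset in combinations(rest, k):
            remaining = [v for v in rest if v not in subset]
            for p in partitions(remaining):
                yield [(first,) + subset] + p

def brute_force(vms, is_valid, group_time, tm):
    best = (float("inf"), None)
    for part in partitions(list(vms)):
        if not all(is_valid(g) for g in part):
            continue
        for order in permutations(part):
            finish, t = {}, 0.0
            for g in order:               # groups migrate back to back
                t += group_time(g)
                for vm in g:
                    finish[vm] = t
            cost = sum(abs(finish[i] - finish[j]) * tm[i][j]
                       for i in vms for j in vms if i < j)
            if cost < best[0]:
                best = (cost, order)
    return best
```

Enumerating all partitions and orders is what produces the O(2^n · n!) complexity; the heuristic algorithm avoids this blow-up.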
4.3.4.2 Heuristic algorithm
Our heuristic algorithm tries to estimate the minimal cost by prioritizing VMs that need
to communicate with each other the most. Given the traffic matrix, we can get a list
L of the communication rates between any two VMs. Each element in L includes
(rate, V Mi, V Mj). It represents the communication between node V Mi and node V Mj
with rate. The heuristic algorithm takes the traffic matrix as input and generates the VM
group set S as follows. Figure 4.6 shows an example of migrating 4 VMs based on heuristic
84
Figure 4.6 : An example for heuristic algorithm
algorithm.
• Step 1: Sort the communication rates in L in descending order. S is empty
at the beginning. In the example, the list L of communication rates is
{(80, VM3, VM4), (50, VM3, VM1), (20, VM1, VM2), (10, VM2, VM4)}.
• Step 2: Repeatedly take the largest-rate element (rate, VMi, VMj) from L and check
whether VMi and VMj are already in S.
– Case 1: Neither VMi nor VMj is in S. If the two VMs can be combined into
a valid group, insert a new group {VMi, VMj} into S. Otherwise, insert two
groups {VMi} and {VMj} into S.
– Case 2: Only one VM is in S. For example, VMi is in S and VMj is not.
Find the group that includes VMi and check whether VMj can be merged
into it under the convergence constraint in Equation 4.3. If the group is still
a valid group after merging, VMj is merged into it. Otherwise, a
new group {VMj} is inserted into S. The case where VMj is in S and VMi
is not is symmetric.
– Case 3: Both VMi and VMj are in S. If the two groups can be merged into one
group under the convergence constraint, merge the two groups.
In the example, we take the maximal rate (80, VM3, VM4) first. Neither VM3
nor VM4 is in S, so it matches Case 1. VM3 and VM4 can be combined into a valid
group, so we insert a new group {VM3, VM4} into S. Then we take the second rate
(50, VM1, VM3). It matches Case 2 because only VM3 is in S. VM1 cannot be
merged into the valid group {VM3, VM4}, so a new group {VM1} is inserted into S.
Next, we take the rate (20, VM1, VM2). It matches Case 2 again because
only VM1 is in S, and VM2 can be merged into the group {VM1}. For the last rate
(10, VM2, VM4), both VMs are in S, matching Case 3. The two groups cannot
be merged into one group under the convergence constraint, so they are left
unmerged.
• Step 3: At the end of Step 2, S contains the valid groups of VMs. The
algorithm then compares permutations of the groups to find the one with minimal
cost.
The time complexity of the heuristic algorithm is O(n² log n + n² + n!). In the worst case
there are O(n²) elements in the list L, i.e., every VM communicates with every other VM,
so the sorting in Step 1 takes O(n² log n). The permutation in Step 3 takes O(n!) in the
worst case, when each VM forms its own group.
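Steps 1–3 can be sketched as follows (illustrative code; the dirty rates of 10, 10, 15 and 14 MBps and the 30 MBps bandwidth are assumed for the Figure 4.6 topology, which only specifies the communication rates):

```python
# Sketch of the heuristic grouping (Steps 1-2): greedily merge the most
# heavily communicating VM pairs into valid groups under Equation 4.3.

def heuristic_groups(tm, dirty_rates, bandwidth):
    n = len(tm)
    edges = sorted(((tm[i][j], i, j)
                    for i in range(n) for j in range(i + 1, n)
                    if tm[i][j] > 0), reverse=True)
    groups = []  # the set S in the text

    def find(vm):
        return next((g for g in groups if vm in g), None)

    def fits(group):
        return sum(dirty_rates[v] for v in group) <= bandwidth

    for _, i, j in edges:
        gi, gj = find(i), find(j)
        if gi is None and gj is None:          # Case 1
            groups.extend([{i, j}] if fits({i, j}) else [{i}, {j}])
        elif gj is None:                       # Case 2
            if fits(gi | {j}):
                gi.add(j)
            else:
                groups.append({j})
        elif gi is None:                       # Case 2 (mirrored)
            if fits(gj | {i}):
                gj.add(i)
            else:
                groups.append({i})
        elif gi is not gj and fits(gi | gj):   # Case 3
            gi |= gj
            groups.remove(gj)
    for vm in range(n):                        # isolated VMs get singletons
        if find(vm) is None:
            groups.append({vm})
    return groups

# Figure 4.6 rates, 0-indexed (VM1..VM4 -> 0..3): 80, 50, 20, 10.
tm_ex = [[0, 20, 50, 0],
         [20, 0, 0, 10],
         [50, 0, 0, 80],
         [0, 10, 80, 0]]
# Expected grouping: {VM3, VM4} and {VM1, VM2} (0-indexed: {2,3} and {0,1}).
print(heuristic_groups(tm_ex, [10, 10, 15, 14], 30))
```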
4.3.5 Intra-group Scheduling
To migrate the VMs in a valid group, one possible solution is to allocate to each VM
bandwidth equal to its maximal dirty rate, and then start the migration of all VMs in
the group at the same time. The definition of a valid group guarantees that we have
enough bandwidth to support all VMs in the group migrating concurrently.
Figure 4.7 : Intra-group scheduling. (a) VM migrations start at the same time but finish at different times, resulting in long performance degradation time. (b) VM migrations start at the same time and finish at the same time, resulting in long migration time due to inefficient use of migration bandwidth. (c) VM migrations start at different times and finish at the same time: no performance degradation and short migration time due to efficient use of migration bandwidth.
However, starting the VMs’ migration at the same time is not an efficient use of avail-
able migration bandwidth. Figure 4.7 shows the migration of three VMs in the dirty it-
eration with different mechanisms to illustrate this inefficiency. Figure 4.7(a) shows that
3 VMs start the dirty iteration of the migration at the same time. Different VMs have different
migration speeds and dirty rates; therefore, without coordination they finish migration at
different times. For example, VM1 takes 5 minutes to migrate most of the dirty blocks/pages,
after which it can enter phase 4 to pause the VM and switch over to run at the destination. VM3
may take 10 minutes to finish. That results in 5 minutes of performance degradation. Recall
that the goal of COMMA is to reduce the performance degradation cost during
migration; the ideal case is therefore that the VMs in the group finish migration at the
same time. To make them finish at the same time, we could force VM1 and VM2
to hold in the dirty iteration, continuing to migrate newly generated dirty blocks until VM3
is done, as Figure 4.7(b) shows. This mechanism is not efficient because it wastes a lot of
migration bandwidth holding VM1 and VM2 in the dirty iteration.
To efficiently use the migration bandwidth, the algorithm schedules the migration of
VMs inside a group to finish at the same time, but allows them to start the dirty iteration
at different times, as Figure 4.7(c) shows.
The design is based on the following observations in practice. (1) Delaying the start
time of VMs with light workload can allow for more bandwidth to be allocated to VMs
with heavy workload. (2) At the end of the first stage, most of the VM’s frequently written
blocks are already marked as dirty blocks, and the dirty rate is low at this time. Therefore,
delaying the start time of dirty iteration will not significantly increase the number of dirty
blocks. (3) Once the dirty iteration starts, it is better to finish migration as soon as possible
to save the bandwidth.
We run migrations of a file server with 30 clients to demonstrate our observations.
Figure 4.8 shows the dirty rate for the two experiments. Figure 4.8(a) shows the migration
without any delay before the dirty iteration. From 0 to 280s, the migration is in the pre-copy
phase and its dirty rate is very stable, around 32KBps. The dirty iteration lasts from 280s to
350s; the dirty rate is very low at the beginning and increases as the dirty iteration proceeds.
Figure 4.8(b) shows the migration with a 35s delay before the start of the dirty iteration. During this
period, the dirty rate is almost zero, meaning almost no clean blocks become dirty.
Initially we assume that the minimal required speed for each VM is set equal to the
VM's maximal dirty rate; the migration time for each VM is then estimated based
on the time model of Chapter 3. The algorithm schedules different starting times for the
VMs according to their estimated migration times so that every VM is expected to finish the
migration at the same time.
Available network bandwidth may be larger than the sum of the VMs' minimal required
migration speeds. Any extra available bandwidth is further allocated to the VMs to
minimize the total migration time of the group. This allocation is done iteratively. Suppose
the group has N VMs; the extra available bandwidth is first allocated to vmN, where the
subscript indicates the VM's start-time order in the schedule. That is,
vmN is the VM that starts the latest in the schedule. The allocation of this extra bandwidth
reduces vmN's migration time, and thus its start time can be moved closer to the finish
time target in the schedule. Next, the extra available bandwidth prior to the start of vmN
is given to vmN−1, whose migration time is thus reduced as well. Then the extra available
bandwidth prior to the start of vmN−1 is given to vmN−2, and so on, until the migration
time of the first VM to start is also minimized.
Figure 4.8 : An example of delaying the start of the dirty iteration for the migration. (a) No delay. (b) Delay 35s. [Plots of dirty rate (B/s) versus migration time (s).]
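Setting aside the iterative extra-bandwidth step, the basic stagger can be sketched as follows (illustrative; `est_times` would come from the Chapter 3 time model at each VM's minimal speed):

```python
# Sketch of the intra-group stagger: each VM's dirty iteration is delayed so
# that all VMs in the group finish at the same time as the slowest one.

def staggered_starts(est_times):
    finish = max(est_times)           # align all finishes with the slowest VM
    return [finish - t for t in est_times]

# Estimated dirty-iteration times of 300s, 180s and 120s:
print(staggered_starts([300, 180, 120]))  # start offsets [0, 120, 180]
```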
4.3.6 Adapting to changing dirty rate and bandwidth
The maximal dirty rate and the migration bottleneck are the key input parameters of the
scheduling algorithm. When those parameters fluctuate, COMMA handles it through
adaptation: it periodically re-estimates the maximal dirty rate, measures the available
bandwidth and recomputes the schedule for not-yet-migrated groups. When COMMA
detects that the available bandwidth is smaller than the sum of any two VMs' maximal dirty
rates, the migration is degraded to sequential migration to ensure convergence. In the
extremely rare case that the available bandwidth is smaller than a single VM's maximal dirty
rate, throttling is applied to that VM so that its dirty rate is reduced and the migration
can converge.
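The fallback rule can be sketched as a small decision function (illustrative, not COMMA's code):

```python
# Sketch of the adaptation rule: pick a migration mode from the currently
# measured bandwidth and the estimated maximal dirty rates (MBps).

def choose_mode(bandwidth, dirty_rates):
    pair_sums = [dirty_rates[i] + dirty_rates[j]
                 for i in range(len(dirty_rates))
                 for j in range(i + 1, len(dirty_rates))]
    if pair_sums and bandwidth >= min(pair_sums):
        return "grouped"      # at least one pair can still migrate together
    if bandwidth >= max(dirty_rates):
        return "sequential"   # degrade to one VM at a time for convergence
    return "throttle"         # even a single VM needs I/O throttling
```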
4.3.7 Putting it all together
To put all the algorithms together, we show an example (see Figure 4.9) of applying
the scheduling to the migration of an application with 4 virtual machines. In the first stage, we
migrate static content; the goal is to migrate the VMs in parallel and finish phase 1 of all VMs at
the same time. Then the second stage starts. In the second stage, we introduce the concept
of the valid group to address the second challenge, concerning convergence. COMMA performs
inter-group scheduling to minimize the communication cost and intra-group scheduling
to efficiently use network bandwidth. In the example, we compute all the possible valid
groups and then select the optimal group combination {VM3, VM4} and {VM1, VM2}.
With inter-group scheduling, the valid group {VM3, VM4} is migrated before the
valid group {VM1, VM2}. To efficiently use the migration bandwidth, COMMA schedules
the migration of VMs inside a group to finish at the same time while allowing them to start the
dirty iteration at different times. The overall purpose is still to minimize the performance
degradation cost. We demonstrate a possible migration schedule for the two valid groups
in the example.
Figure 4.9 : An example of the scheduling algorithm (putting it all together)
4.4 Evaluation
4.4.1 Implementation
The implementation is based on the kernel-based virtual machine (KVM) platform. KVM
consists of a loadable kernel module, a processor specific module, and a user-space program
– a modified QEMU emulator. QEMU performs management tasks for the VM. COMMA
is implemented on QEMU version 0.12.50. A centralized controller is implemented in
C++.
4.4.2 Experiment Setup
The experiments are set up on six physical machines. Each machine has a 3GHz quad-core
AMD Phenom II X4 945 processor, 8GB RAM, a 640GB WD Caviar Black SATA
hard drive, and Ubuntu 9.10 with Linux kernel version 2.6.31 (with the KVM module).
4.4.3 Migration of a 3-tier Application
In this experiment, we migrate RUBiS [RUB], a well-known auction website benchmark.
RUBiS includes one web server, two application servers and one database server, with
the topology shown in Figure 4.3. The 4 VMs are deployed on at most 3 physical machines
with different placement setups to mimic the randomness of VM placement policies on public
clouds. The 4 VMs have the same image size of 8GB; the memory size is 2GB for the web
server and application servers, and 512MB for the database server. The workload is 300 clients.
The migration bottleneck is the I/O write speed on the destination disk, which is at most
15MBps. Table 4.3 shows the results.
Sequential migration has the longest migration time and the highest cost in all cases;
more than 2GB of data are affected by sequential migration. Parallel migration reduces the
cost to less than 1GB, but it is still much higher than the cost with COMMA. COMMA
reduces the number of data bytes affected by migration by up to a factor of 475.
COMMA has a slightly longer migration time than parallel migration. This is because
COMMA tries to make the VMs' migrations finish at the same time while parallel migration
does not: when some VMs finish earlier, the remaining VMs that share the same
resources can take advantage of the released resources and finish their migrations earlier.
4.4.4 Manual Adjustment does not Work
While the above experiment runs sequential and parallel migration, one could try to ad-
just sequential and parallel migration to better support multi-tier application migration by
reordering the migration sequence in sequential migration and manually configuring the
migration speed in parallel migration based on static migration info, e.g. VM disk size.
VM Placement             Sequential Migration    Parallel Migration     COMMA Migration
                         Time(s)   Cost(MB)      Time(s)   Cost(MB)     Time(s)   Cost(MB)
{web,app1,app2,db}       2289      2267          2155      13           2188      7
{web,db},{app1,app2}     2479      2620          918       72           1043      2
{web,app1},{db,app2}     2425      2617          1131      304          1336      2
{web}{app1,app2}{db}     2330      2273          914       950          926       2
{web,app1}{app2}{db}     2213      1920          797       717          988       4
{web}{app1}{app2,db}     2310      2151          1012      259          1244      5
Table 4.3 : Comparison of three approaches on 3-tier applications. {...} represents the VM set on one physical machine.
However, in this experiment, we show that such mechanisms cannot achieve the goal of
minimizing the degradation cost.
The experiment is based on SPECweb2005, which is another popular web service
benchmark [SPE]. It contains a frontend Apache server and a backend server that works
as a database. The image sizes for the two VMs are 8GB and 16GB respectively. The
workload has 50 clients. Table 4.4 shows the results of six migration methods. The first
two methods are sequential migrations with different VM orders. Sequential migration
causes large degradation costs of 265MB and 139MB, respectively, for the two different
migration orders.
The next three methods are based on parallel migration. In the first experiment, both
VMs are configured with the same migration speed of 32MBps; they do not finish at the
same time, and the cost is 116MB. In the second experiment, the speed for the frontend
VM (8GB) is set to 16MBps while the speed for the backend VM (16GB) is kept at
32MBps. With the configured migration speeds proportional to the image sizes, the user
might expect the two migrations to finish at the same time. However, the result is not as
expected, because the migrations cannot achieve the configured speeds most of the time
(the I/O bottleneck is 15MBps). To further reduce the time gap of the previous parallel
migration method, a conservative solution is to reduce the configured speeds for both
VMs. In the third parallel migration experiment, the configured speeds are set to 5MBps
and 10MBps. The degradation time reduces to 36s and the cost reduces to 9MB, but the
low speeds bring the side effect of a longer migration time. The three experiments show
that it is impractical for users to statically set migration speeds that achieve both low
performance degradation cost and timely migration. In a real cloud environment, with
additional competing traffic or a more intensive workload, guessing the proper speed
configurations would be even harder. With COMMA, the controller coordinates the
migration progress of the two VMs automatically. The two VMs finish their migrations
as fast as possible and incur only 0.2MB of cost.

                  Sequential Migr.       Parallel Migr.            COMMA
                  frontend   backend     32/32    16/32    5/10
                  first      first       MBps     MBps     MBps
Cost(MB)          265        139         116      122      9        0.2
Migr. Time(s)     1584       1583        1045     1043     1697     1043

Table 4.4 : Manual adjustment of the configured speed can hardly achieve both low cost and small migration time.
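The failure mode above can be illustrated with a toy simulation (a sketch, not COMMA's actual controller; the sizes, the 15MBps cap, and the pacing rule are illustrative assumptions): under an I/O bottleneck, statically configured speeds are silently capped and the finish times drift apart, while periodically re-pacing each VM against the one with the most remaining work keeps the finish times aligned.

```python
def simulate(remaining, speeds_fn, cap, step=1.0):
    """Migrate VMs until done; per-VM throughput is capped at `cap` MBps.
    `remaining` is in MB; returns the finish time (s) of each VM."""
    remaining = list(remaining)
    finish = [None] * len(remaining)
    t = 0.0
    while any(f is None for f in finish):
        speeds = speeds_fn(remaining)      # re-evaluated every step
        for i, r in enumerate(remaining):
            if finish[i] is None:
                remaining[i] = r - min(speeds[i], cap) * step
                if remaining[i] <= 0:
                    finish[i] = t + step
        t += step
    return finish

sizes = [8 * 1024, 16 * 1024]  # MB: the 8GB and 16GB SPECweb2005 images

# Static speeds proportional to image size (16/32 MBps): both are
# silently capped at the 15MBps I/O bottleneck, so the smaller VM
# finishes far earlier and the application stays split across sites.
static = simulate(sizes, lambda rem: [16, 32], cap=15)

# Coordinated pacing: every step, pace each VM relative to the VM with
# the most remaining work, so all migrations aim at one finish time.
coord = simulate(sizes, lambda rem: [15 * r / max(rem) for r in rem], cap=15)
```

In the static run the finish times differ by hundreds of seconds; in the coordinated run they coincide, without sacrificing total migration time.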
4.4.5 Algorithms in Inter-group Scheduling
In this experiment, we evaluate the brute-force algorithm and the heuristic algorithm in
terms of performance degradation cost and computation time. We run simulations to
evaluate the different migration approaches on the multi-tier web services shown in
Figure 4.3. We generate a random number between 0 and 100KBps as the communication
rate whenever there is a link between two VMs. Each experiment is run 3 times with
different random number seeds.

       Sequential   Parallel    COMMA-Bruteforce   COMMA-Heuristic
       Migration    Migration   Migration          Migration
2VM    28           3           0                  0
3VM    84           3           0                  0
4VM    114          3           0                  0
5VM    109          3           0                  0
6VM    222          INF         1                  2
7VM    287          INF         2                  2
8VM    288          INF         1                  2
9VM    424          INF         9                  13

Table 4.5 : Performance degradation cost (MB) with different migration approaches

Table 4.5 shows the average results. In the first four cases (number of VMs ≤ 5), all VMs
can be coordinated to finish at the same time and the cost is 0. At larger scales (number of
VMs ≥ 6), the coordination algorithm tries its best to schedule the VMs' migrations and
achieve the minimal cost. Coordination with the brute-force algorithm achieves a slightly
lower cost than coordination with the heuristic algorithm. Compared to sequential
migration, COMMA with the brute-force algorithm reduces the cost by 97.9% and
COMMA with the heuristic algorithm reduces the cost by 96.9%. COMMA using the
heuristic algorithm in scheduling achieves 96% of the optimal performance in our
experiments.
Figure 4.10 shows the computation time for the brute-force algorithm and the heuristic
algorithm. When the number of VMs increases to 9, the computation time for the brute-
force algorithm increases sharply to 32 seconds, while the computation time for the
heuristic algorithm remains very low at 274 microseconds. COMMA with the heuristic
algorithm in scheduling reduces the computation overhead by 99%.
[Figure: computation time in microseconds (log scale, 10 to 1e+08) versus the number of VMs (2 to 9), comparing the brute-force and heuristic algorithms]
Figure 4.10 : Computation time for brute-force algorithm and heuristic algorithm
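The cost/complexity trade-off between the two algorithms can be sketched with a toy scheduling model (the group sizes, rate matrix, cost function, and greedy rule below are illustrative assumptions, not COMMA's actual scheduler): the brute-force variant searches all n! migration orders, while a greedy heuristic builds an order in polynomial time by keeping heavily communicating groups close together in time.

```python
from itertools import permutations

def cost(order, size, rate, bw=15.0):
    """Degradation cost of migrating groups one after another in `order`:
    traffic between two groups crosses the WAN while their finish times
    differ, so cost = rate * |finish_i - finish_j| summed over pairs."""
    finish, t = {}, 0.0
    for g in order:
        t += size[g] / bw
        finish[g] = t
    n = len(size)
    return sum(rate[i][j] * abs(finish[i] - finish[j])
               for i in range(n) for j in range(i + 1, n))

def brute_force(size, rate):
    # O(n!): exhaustively try every migration order.
    return min(permutations(range(len(size))),
               key=lambda o: cost(o, size, rate))

def heuristic(size, rate):
    # Greedy sketch: start from the group with the most external traffic,
    # then repeatedly append the unscheduled group that communicates the
    # most with the groups scheduled so far.
    n = len(size)
    order = [max(range(n), key=lambda i: sum(rate[i]))]
    while len(order) < n:
        rest = [i for i in range(n) if i not in order]
        order.append(max(rest, key=lambda i: sum(rate[i][j] for j in order)))
    return tuple(order)

# A 4-group example with a fixed symmetric communication-rate matrix.
size = [8, 16, 4, 12]                 # GB per group
rate = [[0, 5, 0, 1],
        [5, 0, 2, 0],
        [0, 2, 0, 3],
        [1, 0, 3, 0]]                 # MBps between groups
best = brute_force(size, rate)        # optimal, but factorial search
fast = heuristic(size, rate)          # near-optimal, polynomial time
```

By construction the brute-force cost lower-bounds the heuristic cost, mirroring the small cost gap and large computation-time gap reported above.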
4.5 EC2 demonstration
To demonstrate COMMA in a real commercial hybrid cloud environment, we conduct an
experiment using the Amazon EC2 public cloud. We migrate two VMs from the Rice
University campus network to EC2 instances with the same settings as the experiment in
the last section, except that, in the SPECweb2005 setting, a client number of 10 is used as
the default workload. The SPECweb2005 webserver cluster consists of an 8GB frontend
webserver and a 16GB database server. For the EC2 setting, we use High-CPU Medium
instances running Ubuntu 12.04. EC2 instances do not support KVM, and thus QEMU
runs in "no-kvm" mode. The reason we decrease the application workload in this EC2
experiment is that the performance of a VM running on an EC2 instance is very slow
without KVM kernel support, and the decreased workload ensures that the dirty iteration
migration stage converges.
The results are shown in Table 4.6. There are three different migration approaches other
than COMMA. In the sequential approach, the performance degradation time equals the
time to migrate the last VM, because the two VMs stay in two different data centers from
the time the first VM finishes its migration until the second VM finishes its migration.
For the parallel approach with the same configured migration speed for both VMs, the
degradation cost is still 19MB, which is not that different from the 28MB and 17MB of
the sequential approach. The reason is that the parallel approach still has no control over
the migration progress: multiple VMs may migrate at different speeds depending on
network dynamics and thus finish their migrations at different times. The second and
third parallel approaches set the migration speed proportional to the size of the VM
image, with the expectation of finishing the migrations of all VMs at the same time. We
set the migration speeds to 32MBps/16MBps and 10MBps/5MBps in the two
experiments. The degradation cost decreases to 6MB, which is much smaller than in the
previous two approaches. However, these approaches do not fully utilize the available
bandwidth, and thus the migration time can increase, especially in the case with
migration speeds of 5/10 MBps, which can be less than the available bandwidth. For
COMMA, the degradation cost is only 0.1MB, and the migration time is the shortest
because it utilizes bandwidth more efficiently. The above results show that COMMA
successfully coordinates the migration of a multi-tier application across two data centers
with extremely low degradation cost. COMMA reduces the degradation cost by a factor
of 190 compared to parallel migration.
                  Sequential Migr.       Parallel Migr.            COMMA
                  frontend   backend     32/32    16/32    5/10
                  first      first       MBps     MBps     MBps
Cost(MB)          28         17          19       6        6        0.1
Migr. Time(s)     871        919         821      885      1924     741

Table 4.6 : Migration methods in the EC2 experiment.

4.6 Related work
To the best of our knowledge, we are the first group to address the problem of live
migration of multi-tier applications; no previous work is directly comparable. There is
related work on applying prefetching strategies [NC12] and deduplication techniques
[AKSSR11] to multiple simultaneous migrations.
Nicolae et al. [NC12] propose a hypervisor-transparent approach for efficient live
migration of I/O-intensive workloads. The focus of their work is on optimizing single-VM
migrations. It relies on a hybrid active-push, prioritized-prefetch strategy to speed up
migration and reduce migration time, which makes it highly resilient to the rapid changes
of disk state exhibited by I/O-intensive workloads.
Al-Kiswany et al. [AKSSR11] employ data deduplication in live migration to reduce the
migration traffic. They present VMFlockMS, a migration service optimized for cross-
datacenter transfer and instantiation of groups of virtual machine images. VMFlockMS is
designed to deploy a set of virtual appliances by making efficient use of the available
cloud resources to locally access and deduplicate the images and data in a distributed
fashion, with minimal requirements imposed on the cloud API to access the VM image
repository. Deduplication is orthogonal to COMMA in that it aims to reduce migration
traffic rather than to minimize performance degradation cost.
Chapter 5
Conclusion and Future Work
Cloud computing is a rapidly expanding, multi-billion-dollar business. Amazon's ability
to challenge IBM for a $600-million federal cloud project signals this new era [clob].
Smaller and more nimble cloud-based service providers will spur competition and enable
agency transformation [clob]. More than 700 IT professionals in six countries across the
globe agreed that virtualization technology contributes significantly to data center
management challenges, indicating that its impact is undeniable and vast [vira]. Live
migration of virtual machines serves as a powerful management tool for planned
maintenance, load balancing, avoiding single-provider lock-in, and enterprise IT
consolidation. Surprisingly, two problems exist in today's live migration systems.
First, as far as we know, none of the existing live migration systems can accurately
predict the migration time or control the migration time, making it hard for system
administrators to schedule migrations with reasonable time reservations. This fact led us
to the adaptive pacing approach, which makes the migration finish time predictable and
controllable. Our first contribution is Pacer – the first system capable of accurately
predicting the migration time, coordinating the migrations of multiple application
components to minimize the performance degradation, and managing the progress so that
the actual migration finishing time is as close to a desired finish time as possible.
Through extensive experiments, including a real-world commercial cloud scenario with
Amazon EC2, we show that Pacer is highly effective. The approach is crucial in many
VM migration use cases. Just as importantly, as Pacer demonstrates, the adaptive pacing
approach can be realized successfully in practice. The details of modeling the migration,
estimating the remaining work, controlling the speed, and adapting to dynamics in Pacer
are intricate, but the resulting system is highly effective and has negligible overhead.
Recently we have extended Pacer with a new function for predicting the migration time
before migration begins. The main addition is to monitor the disk I/O workload and to
measure the available network bandwidth for a short period of time, e.g. 3 minutes, and to
use these observations for migration time prediction. We have found that the prediction
accuracy is as good as that of the prediction performed during migration. The new
function helps operators plan and schedule cloud management tasks.
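As a back-of-the-envelope sketch (not Pacer's actual model from Equation (3.3); the downtime threshold and sample rates are illustrative assumptions), the observed average dirty rate and available bandwidth already bound the migration time, since each pre-copy pass shrinks geometrically whenever the dirty rate is below the bandwidth:

```python
def predict_migration_time(image_mb, dirty_rate, bandwidth, downtime_mb=50.0):
    """All rates in MBps. Returns predicted seconds, or None if the
    dirty iterations cannot converge (dirty_rate >= bandwidth)."""
    if dirty_rate >= bandwidth:
        return None  # migration can never catch up at this bandwidth
    total, to_send = 0.0, image_mb
    while to_send > downtime_mb:     # stop once the residue fits in downtime
        t = to_send / bandwidth      # time to transfer this pass
        total += t
        to_send = dirty_rate * t     # data dirtied while transferring
    return total

# e.g. a 16GB image, 3 MBps average dirty rate, 30 MBps available bandwidth
t = predict_migration_time(16 * 1024, 3.0, 30.0)
```

Each pass is a factor dirty_rate/bandwidth smaller than the last, so the total converges to roughly image_size / (bandwidth - dirty_rate); a short pre-migration measurement window supplies the two rates.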
Second, multi-tier application architectures are widely employed in today's virtualized
cloud computing environments. Although existing solutions are able to migrate a
single VM efficiently, little attention has been devoted to migrating related VMs in multi-
tier applications. Ignoring the relatedness of VMs during migration can lead to serious
application performance degradation if the dependent components of an application are
split between the source and destination sites by a high latency and/or congested network
path. Simply migrating all related VMs in parallel is not enough to avoid such degradation.
To tackle the above problem, we propose an original migration coordination system for
multi-tier applications. The system is based on a scheduling algorithm for coordinating
the migration of VMs that aims to minimize migration's impact on inter-component
communications. We formulate the problem of live migration of multi-tier applications
and introduce COMMA, a migration coordination system that minimizes the performance
degradation of the application during migration. Using a fully implemented system on
KVM, we show that the system is highly effective in decreasing the performance
degradation cost and minimizing migration's impact on inter-component communications.
We intend to make Pacer and COMMA freely available to the community. There are
many possibilities for extending Pacer and COMMA in the future.
5.1 Migration Progress Management with Optimization Techniques
In Chapter 2, we discuss many optimization techniques for live migration involving
scheduling, compression, and deduplication. The current designs of Pacer and COMMA
are based on the migration time model without optimization. It would be interesting to
extend the designs of Pacer and COMMA to support live migration with these
optimization techniques. We take migration with a reordered block sequence as an
example.
Our previous research on a workload-aware storage migration scheduling algorithm
[ZNS11] aims at improving storage I/O performance during wide-area migration by
reordering the block migration sequence. Rather than copying the storage from beginning
to end, the algorithm deliberately computes a schedule to transfer storage at an
appropriate granularity, which we call a chunk, and in an appropriate order that
minimizes performance degradation. To integrate Pacer and COMMA into a migration
system with a reordered block sequence, we only need to adjust how DIRTY_SET_SIZE
and AVE_DIRTY_RATE are computed in Equation (3.3) of the migration time model in
Section 3. The dirty set estimation and dirty rate estimation algorithms need to be
extended to take the new sequence into account. For example, suppose block1 is the first
block to be migrated without reordering; it then has a high probability of getting dirty
during the pre-copy stage. With reordering, it may instead be the last block migrated in
the pre-copy stage and may not be written during the pre-copy stage at all, in which case
it will not be included in the dirty set. Therefore, Pacer only needs to take the reordered
block migration sequence as input and run the dirty set and dirty rate estimation
algorithms to update the amount of dirty blocks and the duration of the dirty iteration.
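A hypothetical sketch of the needed extension (the Poisson write model, per-block rates, and block times below are illustrative assumptions, not the thesis's estimator): a block can only become dirty in the window between the completion of its copy and the end of the pre-copy pass, so feeding the reordered sequence into the estimator naturally shrinks the predicted dirty set when hot blocks are migrated last.

```python
import math

def estimate_dirty_set(order, write_rate, block_time):
    """order: block ids in migration order; write_rate: expected
    writes/sec per block; block_time: seconds to copy one block.
    Returns the expected number of dirty blocks after the pre-copy pass."""
    total_time = len(order) * block_time
    expected = 0.0
    for pos, blk in enumerate(order):
        copied_at = (pos + 1) * block_time   # when this block's copy is done
        window = total_time - copied_at      # remaining time to get dirtied
        # Poisson model: P(at least one write lands in the window)
        expected += 1.0 - math.exp(-write_rate[blk] * window)
    return expected

# Hot blocks first (plain head-to-tail copy) vs. hot blocks last.
rates = [0.01, 0.01, 0.0001, 0.0001]   # blocks 0 and 1 are write-hot
d_default   = estimate_dirty_set([0, 1, 2, 3], rates, block_time=10.0)
d_reordered = estimate_dirty_set([2, 3, 0, 1], rates, block_time=10.0)
```

Migrating the hot blocks last leaves them almost no window in which to be re-dirtied, so the expected dirty set, and hence DIRTY_SET_SIZE in the time model, drops.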
5.2 Migration Progress Management with Migration Planning
In this thesis, we focus on the migration process itself. However, live migration is a cloud
management operation, and before a live migration operation is issued, the cloud
administrator needs to plan for it. For example, when the administrator detects that some
machines have hardware issues, he may want to shut down those machines for
maintenance. He needs to prepare a set of new machines as the migration destination.
The placement of VMs on the destination is an interesting question: ideally it should
preserve the same performance metrics as on the source hypervisor. Moreover, migration
requires many resources, e.g. network bandwidth, disk bandwidth, and the CPU and
memory of the hypervisors at both the source and the destination. The migration will
affect other applications running at the source or destination. In order to minimize the
impact on application performance, the migration should be conducted at a proper time
when the application workload is low. In our system, the time model does not include the
time for planning destination machines. It would be helpful to consider planning,
scheduling, and migration all together to control the expected finish time.
5.3 Migration Progress Management with Task Prioritization
Cloud administrators leverage live migration in many management tasks to improve or
maintain the performance of applications running on the source site. However, migration
competes for network resources, and the prioritization of migration relative to other
management tasks should therefore be explicitly considered. Assume that there is a key
management task, provisioning, in the cloud. Provisioning requires quickly preparing the
VM image, reserving resources, and installing the required software on the guest OS for
users. If the provisioning task is more urgent than the live migration task, the cloud
administrator could assign a higher priority to the provisioning process. Live migration
could then use the remaining bandwidth for the pre-copy and dirty iteration stages. At the
end of the dirty iteration stage, if the system observes that the migration dirty rate is
higher than the available migration bandwidth, it should automatically increase the
migration task's priority. A management system with task prioritization is very helpful
for favoring urgent tasks while also guaranteeing migration's convergence.
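The priority policy described above can be sketched as a small allocation rule (hypothetical; the 20% headroom factor and the promotion trigger are illustrative assumptions, not an implemented system): migration takes whatever the high-priority provisioning task leaves over, and is promoted only when that leftover can no longer outrun the dirty rate.

```python
def allocate(link_mbps, provisioning_demand, dirty_rate,
             migration_promoted=False):
    """Split link bandwidth (MBps) between provisioning and migration.
    Returns (provisioning_share, migration_share, promoted)."""
    if not migration_promoted:
        leftover = max(link_mbps - provisioning_demand, 0.0)
        if leftover > dirty_rate:
            # Leftover bandwidth outruns the dirty rate: migration can
            # converge at low priority, so provisioning keeps its share.
            return provisioning_demand, leftover, False
        # Leftover cannot keep up with page dirtying: promote the
        # migration so the dirty iteration stage can converge.
        migration_promoted = True
    migration_share = min(link_mbps, dirty_rate * 1.2)  # 20% headroom
    return link_mbps - migration_share, migration_share, True

# Plenty of leftover: migration stays low priority.
p1, m1, promoted1 = allocate(100, 80, 10)
# Provisioning squeezes migration below the dirty rate: promote it.
p2, m2, promoted2 = allocate(100, 95, 10)
```

The rule favors the urgent task by default but guarantees convergence by ensuring the migration's bandwidth always ends up above the dirty rate.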
Bibliography
[AFGea09] Michael Armbrust, Armando Fox, Rean Griffith, and et. al. Above the clouds:
A berkeley view of cloud computing. Technical Report UCB/EECS-2009-28,
EECS Department, University of California, Berkeley, Feb 2009.
[AKSSR11] Samer Al-Kiswany, Dinesh Subhraveti, Prasenjit Sarkar, and Matei Ripeanu.
VMFlock: Virtual machine co-migration for the cloud. In HPDC, 2011.
[Amaa] Amazon Web Service. http://aws.amazon.com.
[Amab] Amazon. Aws reference architecture. http://aws.amazon.com/
architecture/.
[Ash12] Warwick Ashford. Hybrid clouds most popular with UK business, survey re-
veals. http://tinyurl.com/868pxzd, February 2012.
[ASR+10] Sherif Akoush, Ripduman Sohan, Andrew Rice, Andrew W. Moore, and Andy
Hopper. Predicting the performance of virtual machine migration. In IEEE
18th annual international symposium on modeling, analysis and simulation of
computer and telecommunication systems. IEEE, 2010.
[BKFS07] Robert Bradford, Evangelos Kotsovinos, Anja Feldmann, and Harald
Schioberg. Live wide-area migration of virtual machines including local per-
sistent state. In ACM VEE, June 2007.
[BKR11] David Breitgand, Gilad Kutiel, and Danny Raz. Cost-aware live migration of
services in the cloud. In USENIX Workshop on Hot Topics in Management of
Internet, Cloud, and Enterprise Networks and Services. USENIX, 2011.
[BL98] Amnon Barak and Oren La’adan. The mosix multicomputer operating sys-
tem for high performance cluster computing. Future Generation Computer
Systems, 13(4):361–372, 1998.
[Blo08] ”Amazon Web Services Blog”. Animoto - Scaling Through Viral Growth.
http://aws.typepad.com/aws/2008/04/animoto—scali.html, April 2008.
[Bri11] North Bridge. Future of Cloud Computing Survey. http://tinyurl.com/7f4s3c9,
June 2011.
[CCS10] Fabio Checconi, Tommaso Cucinotta, and Manuel Stein. Real-time issues
in live migration of virtual machines. In Euro-Par 2009–Parallel Processing
Workshops, pages 454–466. Springer, 2010.
[CFH+05] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul,
Christian Limpach, Ian Pratt, and Andrew Warfield. Live migration of virtual
machines. In NSDI’05: Proceedings of the 2nd conference on Symposium on
Networked Systems Design & Implementation, pages 273–286, Berkeley, CA,
USA, 2005. USENIX Association.
[Cloa] Reuven Cohen. The cloud hits the mainstream: More than half of U.S. businesses now use cloud computing. http://www.forbes.com/sites/reuvencohen/2013/04/16/the-cloud-hits-the-mainstream-more-than-half-of-u-s-businesses-now-use-cloud-computing/.
[clob] Mid-year review: 10 predictions for cloud computing.
http://gcn.com/Articles/2013/08/21/cloud-predictions.aspx?Page=3.
[Def] Cloud computing. http://www.ibm.com/cloud-computing/us/en/what-is-
cloud-computing.html.
[DGJ+05] Koustuv Dasgupta, Sugata Ghosal, Rohit Jain, Upendra Sharma, and Akshat
Verma. Qosmig: Adaptive rate-controlled migration of bulk data in storage
systems. In Proc. ICDE, 2005.
[DO91] Fred Douglis and John Ousterhout. Transparent process migration: Design al-
ternatives and the sprite implementation. Software: Practice and Experience,
21(8):757–785, 1991.
[Gar12] Gartner. Gartner says worldwide cloud services market. http://www.
gartner.com/newsroom/id/2163616/, 2012.
[GMV10] Ajay Gulati, Arif Merchant, and Peter Varman. mClock: Handling throughput
variability for hypervisor IO scheduling. In OSDI, October 2010.
[Goo] Googleappengine. https://developers.google.com/appengine/.
[Got08] Derek Gottfrid. The New York Times Archives + Amazon Web Services
= TimesMachine. http://open.blogs.nytimes.com/ 2008/05/21/the-new-york-
times-archives-amazon-web- services-timesmachine/, May 2008.
[HFW+13] Keqiang He, Alexis Fisher, Liang Wang, Aaron Gember, Aditya Akella, and
Thomas Ristenpart. Next stop, the cloud: Understanding modern web service
deployment in ec2 and azure. In IMC, 2013.
[HG09] Michael R. Hines and Kartik Gopalan. Post-copy based live virtual machine
migration using adaptive pre-paging and dynamic self-ballooning. In VEE ’09:
Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on
Virtual execution environments, 2009.
[HH09] Stuart Hacking and Benoit Hudzia. Improving the live migration process
of large enterprise applications. In VTDC’09: Proceedings of the 3rd Inter-
national Workshop on Virtualization Technologies in Distributed Computing,
2009.
[HNO+09] Takahiro Hirofuchi, Hidemoto Nakada, Hirotaka Ogawa, Satoshi Itoh, and
Satoshi Sekiguchi. A live storage migration mechanism over wan and its per-
formance evaluation. In VTDC'09: Proceedings of the 3rd International Workshop
on Virtualization Technologies in Distributed Computing, Barcelona,
Spain, 2009. ACM.
[HON+09] Takahiro Hirofuchi, Hirotaka Ogawa, Hidemoto Nakada, Satoshi Itoh, and
Satoshi Sekiguchi. A live storage migration mechanism over wan for relo-
catable virtual machine services on clouds. In CCGRID’09: Proceedings of
the 2009 9th IEEE/ACM International Symposium on Cluster Computing and
the Grid, Shanghai, China, 2009. IEEE Computer Society.
[HSS+10] M. Hajjat, X. Sun, Y.W.E. Sung, D. Maltz, S. Rao, K. Sripanidkulchai, and
M. Tawarmalani. Cloudward bound: planning for beneficial migration of en-
terprise applications to the cloud. In ACM SIGCOMM Computer Communica-
tion Review, 2010.
[IBM] IBM.
[ipe] iperf. http://sourceforge.net/projects/iperf/.
[ipt05] iptraf. http://iptraf.seul.org/, 2005.
[JDW+09] Hai Jin, Li Deng, Song Wu, Xuanhua Shi, and Xiaodong Pan. Live virtual
machine migration with adaptive memory compression. In IEEE International
Conference on Cluster Computing, 2009.
[JDWS09] Hai Jin, Li Deng, Song Wu, and Xuanhua Shi. Live virtual machine migration
integrating memory compression with precopy. In IEEE International Confer-
ence on Cluster Computing, 2009.
[JLHB88] Eric Jul, Henry Levy, Norman Hutchinson, and Andrew Black. Fine-grained
mobility in the emerald system. ACM Transactions on Computer Systems
(TOCS), 6(1):109–133, 1988.
[KVM] KVM. Kernel based virtual machine. http://www.linux-kvm.org/
page/Main_Page.
[KVM10] KVM. QEMU-KVM code. http://sourceforge.net/projects/kvm/files, January
2010.
[LAW02] Chenyang Lu, Cuillermo A. Alvarez, and John Wilkes. Aqueduct: Online data
migration with performance guarantees. In Proc. of the USENIX Conference
on File and Storage Technologies (FAST), 2002.
[LZW+08] Yingwei Luo, Binbin Zhang, Xiaolin Wang, Zhenlin Wang, Yifeng Sun, and
Haogang Chen. Live and Incremental Whole-System Migration of Virtual
Machines Using Block-Bitmap. In IEEE International Conference on Cluster
Computing, 2008.
[MCGC11] Ali Mashtizadeh, Emre Celebi, Tal Garfinkel, and Min Cai. The design and
evolution of live storage migration in vmware esx. In Proceedings of the an-
nual conference on USENIX Annual Technical Conference. USENIX Associa-
tion, 2011.
[MDP+00] Dejan S Milojicic, Fred Douglis, Yves Paindaveine, Richard Wheeler, and
Songnian Zhou. Process migration. ACM Computing Surveys (CSUR),
32(3):241–299, 2000.
[Mic12] Microsoft. Hyper-V live migration FAQ. http://technet.microsoft.com/en-
us/library/ff715313(v=ws.10).aspx, January 2012.
[Mur11] Alan Murphy. Enabling Long Distance Live Migration with F5 and VMware
vMotion. http://tinyurl.com/7pyntvq, 2011.
[NC12] Bogdan Nicolae and Franck Cappello. Towards efficient live migration of I/O
intensive workloads: A transparent storage transfer proposal. In ACM HPDC,
2012.
[NKG10] Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. Q-clouds: Managing
performance interference effects for QoS-aware clouds. In Proceedings of
EuroSys, Paris, France, 2010.
[NLH05] Michael Nelson, Beng-Hong Lim, and Greg Hutchins. Fast transparent mi-
gration for virtual machines. In USENIX’05: Proceedings of the 2005 Usenix
Annual Technical Conference, Berkeley, CA, USA, 2005. USENIX Associa-
tion.
[Pad10] Pradeep Padala. Understanding Live Migration of Virtual Machines.
http://tinyurl.com/24bdaza, June 2010.
[PCP+02] Constantine P. Sapuntzakis, Ramesh Chandra, Ben Pfaff, Jim Chow, Monica
S. Lam, and Mendel Rosenblum. Optimizing the migration of virtual computers.
In OSDI'02: Proceedings of the 5th Symposium on Operating Systems
Design and Implementation, 2002.
[PHS+09] Pradeep Padala, Kai-Yuan Hou, Kang G. Shin, Xiaoyun Zhu, Mustafa Uysal,
Zhikui Wang, Sharad Singhal, and Arif Merchant. Automated control of mul-
tiple virtualized resources. In Proceedings of EuroSys, 2009.
[PM83] Michael L Powell and Barton P Miller. Process migration in DEMOS/MP,
volume 17. ACM, 1983.
[Poe09] Chris Poelker. Why virtualization is the foundation of cloud computing.
http://tinyurl.com/cdtcyqz, 2009.
[PRH+03] S. Parekh, K. Rose, J. L. Hellerstein, S. Lightstone, M. Huras, and V. Chang.
Managing the performance impact of administrative utilities. In 14th
IFIP/IEEE International Workshop on Distributed Systems: Oper ations and
Management, 2003.
[Red09] IBM Redbooks. IBM Powervm Live Partition Mobility IBM International
Technical Support Organization. Vervante, 2009.
[RUB] RUBiS. Rice university bidding system. http://rubis.ow2.org.
[SGM90] Carl Staelin and Hector Garcia-Molina. Clustering active disk data to improve
disk performance. Technical Report CS-TR-283-90, Department of Computer
Science, Princeton University, Sep 1990.
[SPE] SPEC. Specweb2005. http://www.spec.org/web2005/.
[Ste10] Colin Steele. Virtual machine migration FAQ: Live migration, P2V and more.
http://tinyurl.com/cxavodk, August 2010.
[svm] VMWare Storage vMotion. http://www.vmware.com/products/storage-
vmotion/overview.html.
[Tec11] Dell TechCenter. Hyper-V R2 Live Migration FAQ.
http://tinyurl.com/c8rayf5, November 2011.
[TLC85] Marvin M Theimer, Keith A Lantz, and David R Cheriton. Preemptable re-
mote execution facilities for the V-system, volume 19. ACM, 1985.
[Tof11] Kevin C. Tofel. Forget public; private clouds: The future is hybrids!
http://tinyurl.com/bsmsj9p, June 2011.
[VBVB09] William Voorsluys, James Broberg, Srikumar Venugopal, and Rajkumar
Buyya. Cost of virtual machine live migration in clouds: A performance eval-
uation. In Cloud Computing, pages 254–265. Springer, 2009.
[vira] Five New Virtualization Challenges Impacting IT Pros and Data Cen-
ter Management. http://www.marketwatch.com/story/five-new-virtualization-
challenges-impacting-it-pros-and-data-center-management-2013-08-22.
[Virb] Virtustream. http://www.virtustream.com/.
[VKKS11] Akshat Verma, Gautam Kumar, Ricardo Koller, and Aritra Sen. Cosmig: mod-
eling the impact of reconfiguration in a cloud. In IEEE 19th annual inter-
national symposium on modeling, analysis and simulation of computer and
telecommunication systems. IEEE, 2011.
[Vmwa] Virtualize Your IT Infrastructure. http://www.vmware.com/virtualization/.
[VMwb] VMware ESX. http://www.vmware.com/products/vsphere/esxi-and-
esx/index.html.
[VMw09] VMware Forum. http://tinyurl.com/7gttah2, 2009.
[VMW10] VMWare. VMmark Virtualization Benchmarks.
http://www.vmware.com/products/vmmark/, January 2010.
[VMw11a] VMware Forum. http://tinyurl.com/ccwd6jg, 2011.
[VMw11b] VMware Forum. http://tinyurl.com/cr6tqnj, 2011.
[VMw11c] VMware Forum. http://tinyurl.com/bmlnjqk, 2011.
[VMw11d] VMware Forum. http://tinyurl.com/d4qr2br, 2011.
[VMw12] VMware Forum. http://tinyurl.com/7azb3xt, 2012.
[Wika] Cloud Computing. http://en.wikipedia.org/wiki/Cloud_computing.
[Wikb] Data Center. https://en.wikipedia.org/wiki/Data_center.
[Wikc] Stateless protocol. http://en.wikipedia.org/wiki/Stateless_protocol.
[wikd] wiki. Multi-tier architecture. http://en.wikipedia.org/wiki/
Multitier_architecture.
[Woo11] Timothy Wood. Improving data center resource Management, deployment,
and availability with virtualization. PhD thesis, University of Massachusetts
Amherst, 2011.
[WSG+09] Timothy Wood, Prashant Shenoy, Alexandre Gerber, K.K. Ramakrishnan,
and Jacobus Van der Merwe. The Case for Enterprise-Ready Virtual Private
Clouds. In Proc. of HotCloud Workshop, 2009.
[WSKdM11] Timothy Wood, Prashant Shenoy, K.K. Ramakrishnan, and Jacobus Van der
Merwe. Cloudnet: Dynamic pooling of cloud resources by live wan migration
of virtual machines. In ACM VEE, 2011.
[WSVY07] Timothy Wood, Prashant Shenoy, Arun Venkataramani, and Mazin Yousif.
Black-box and gray-box strategies for virtual machine migration. In NSDI,
2007.
[WZ11] Yangyang Wu and Ming Zhao. Performance modeling of virtual machine live
migration. In Proceedings of the 2011 IEEE 4th International Conference on
Cloud Computing. IEEE, 2011.
[Xen08a] Xen Forum. http://tinyurl.com/d5v8j9p, 2008.
[Xen08b] Xen Forum. http://tinyurl.com/c37he9g, 2008.
[XEN09] XEN. XEN Project. http://www.xen.org, January 2009.
[Xen11a] Xen Forum. http://tinyurl.com/d477jza, 2011.
[Xen11b] Xen Forum. http://tinyurl.com/c7tyg94, 2011.
[ZF07] Ming Zhao and Renato J Figueiredo. Experimental study of virtual machine
migration in support of reservation of cluster resources. In Proceedings of the
2nd international workshop on Virtualization technology in distributed com-
puting, page 5. ACM, 2007.
[ZHMM10] Xiang Zhang, Zhigang Huo, Jie Ma, and Dan Meng. Exploiting data dedu-
plication to accelerate live virtual machine migration. In IEEE International
Conference on Cluster Computing, 2010.
[ZNS11] Jie Zheng, T.S.Eugene Ng, and Kunwadee Sripanidkulchai. Workload-aware
live storage migration for clouds. In ACM VEE, April 2011.
[ZSS06] Jianyong Zhang, Prasenjit Sarkar, and Anand Sivasubramaniam. Achieving
completion time guarantees in an opportunistic data migration scheme. In
ACM SIGMETRICS Performance Evaluation Review, 2006.