Hypervisor-Assisted Application Checkpointing for High Availability
Min Lee
Joint work with A. S. Krishnakumar, P. Krishnan, Navjot Singh, Shalini Yajnik
© 2009 Avaya Inc. All rights reserved. 2
Introduction
Virtualization technology
– Is widely adopted
– Has proven its usefulness
– Most applications run well
  • They run natively
Some important applications don’t run well
– Certain operations cannot run natively
– Instead they use hypercalls
– Our target: application checkpointing
Xen Virtual Machine Monitor
[Figure: Xen architecture. Applications run on modified guest OSes inside virtual machines. The Xen hypervisor exposes virtual hardware (vCPU, vDisk, vNIC, vMemory, etc.) on top of the physical hardware (CPU, disk, NIC, memory, etc.). Taken/adapted from the ‘Xen and co.’ slides.]
High Availability Approaches
Categories
– Application-transparent
  • No changes to application or guest
  • Xen-specific: Remus, Kemari
– Application-assisted
  • Application implements the checkpointing logic
  • Flexible and lightweight
We are targeting
– Application-assisted checkpointing under virtualization
  • Xen-specific
  • Applicable to general hypervisors
Hypervisor-Assisted Application Checkpointing
Application checkpointing
– Provides transactional properties to the traditional heap
  • Makes the heap highly available
– Processes survive failures
– Has performance issues in Xen
Our technique improves application-checkpointing performance in Xen
High Availability
[Figure: a primary process applies List_add() and List_del(); its changes flow to a "magical mirror". On a crash, the mirror takes over and continues with List_add().]
Transaction APIs
List of dirty pages
– Written pages
mprotect() system call
– Write-protect
– SIGSEGV signal
APIs:
    int declare(addr, size);
    void undeclare(Tid);
    void Tstart(Tid);
    void Tend(Tid, dirty_pages);
Examples:
    Without transactions:  List_add();
    With transactions:     Tstart(); List_add(); Tend();

    Without transactions:  List_add(); List_del(); List_add(); List_del();
    With transactions:     Tstart(); List_add(); List_del(); List_add(); List_del(); Tend();
PT – Existing Approach
Process’ view (virtual pages): pages 1 to 12
Declare() {}
Tstart();
List_add();   writes page 5; handler() runs and page 5 is recorded as dirty
List_add();   writes page 7; handler() runs and page 7 is recorded as dirty
Tend();       get dirty pages: 5, 7
…
Undeclare() {}

handler() {
    mprotect(unprotect);
    add_to_dirty_pages();
}
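The user-level PT flow above can be sketched with real POSIX calls. This is a minimal sketch, not the authors' implementation: the names `pt_declare`, `pt_tstart`, `pt_tend`, and `MAX_PAGES` are hypothetical, the whole region is protected at once, and it assumes a Linux-like system where calling `mprotect()` from a SIGSEGV handler works in practice (POSIX does not list it as async-signal-safe).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 64                      /* hypothetical limit for this sketch */

static char  *heap_base;                  /* start of the declared region */
static size_t heap_pages;                 /* number of pages in the region */
static size_t page_size;
static volatile size_t dirty[MAX_PAGES];  /* pages written in this transaction */
static volatile size_t n_dirty;

/* SIGSEGV handler: unprotect the faulting page and record it as dirty. */
static void pt_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)heap_base;
    if (addr < base || addr >= base + heap_pages * page_size)
        _exit(1);                         /* a genuine crash, not a tracked write */
    size_t page = (addr - base) / page_size;
    mprotect(heap_base + page * page_size, page_size, PROT_READ | PROT_WRITE);
    if (n_dirty < MAX_PAGES)
        dirty[n_dirty++] = page;
}

/* Allocate a page-aligned transactional heap and install the handler. */
static char *pt_declare(size_t pages) {
    assert(pages <= MAX_PAGES);
    page_size  = (size_t)sysconf(_SC_PAGESIZE);
    heap_pages = pages;
    heap_base  = mmap(NULL, pages * page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (heap_base == MAP_FAILED)
        return NULL;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = pt_handler;
    sa.sa_flags     = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    return heap_base;
}

/* Tstart: write-protect the region so the first write to each page faults. */
static void pt_tstart(void) {
    n_dirty = 0;
    mprotect(heap_base, heap_pages * page_size, PROT_READ);
}

/* Tend: number of pages dirtied; their indices are in dirty[]. */
static size_t pt_tend(void) {
    return n_dirty;
}
```

Each page costs one fault, one signal delivery, and one `mprotect()` call the first time it is written in a transaction; later writes to the same page run at full speed. That is why the cost of PT depends on the number of unique pages written, not on the number of writes.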
PT Call-Flow
[Figure: call flow for pure user-level PT across the User, OS, and Hypervisor layers. The process issues mprotect() calls; for every dirty page there is a page fault and a signal delivered back to the process, with TLB flushes along the way.]
Approaches
                      PT-based        Emulation-based   Scan-based
Pure user space       PT (existing)   Emulation
Hypervisor-assisted   PTxen           Emulxen           Scanxen
Our approaches
Our Approaches
Emulation
Under the condition
– Most transactions are small

Process’ view (virtual pages): pages 1 to 12
Declare();
Tstart() {}
List_add();   the write is emulated by handler() and logged as (Addr1, 100)
List_add();   the write is emulated by handler() and logged as (Addr2, 200)
Tend();
…
Undeclare();

handler() {
    emulate();
    log_to_write_buffer();
}
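The write buffer of (address, value) pairs can be illustrated with a small sketch. It does not decode or emulate faulting instructions, which is what the real handler has to do. Instead, a hypothetical `logged_store()` stands in for `emulate()` plus `log_to_write_buffer()`, and `replay()` shows how a backup could apply the log. All names, the fixed capacity, and the 32-bit word granularity are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 128          /* hypothetical buffer size */

/* One logged write: where it went (as a word index) and the value stored. */
struct write_rec {
    size_t   offset;
    uint32_t value;
};

struct write_buffer {
    struct write_rec rec[LOG_CAPACITY];
    size_t           count;
};

/* Stand-in for emulate() + log_to_write_buffer(): perform the store on the
 * primary heap and append an (offset, value) record.
 * Returns 0 on success, -1 if the buffer is full. */
static int logged_store(struct write_buffer *wb, uint32_t *heap,
                        size_t offset, uint32_t value) {
    assert(wb != NULL && heap != NULL);
    if (wb->count == LOG_CAPACITY)
        return -1;
    heap[offset] = value;
    wb->rec[wb->count].offset = offset;
    wb->rec[wb->count].value  = value;
    wb->count++;
    return 0;
}

/* At the end of a transaction, the backup applies the records in order. */
static void replay(const struct write_buffer *wb, uint32_t *backup_heap) {
    for (size_t i = 0; i < wb->count; i++)
        backup_heap[wb->rec[i].offset] = wb->rec[i].value;
}
```

The log grows with the number of writes, not with the number of pages touched, which matches the slide's condition that emulation pays off when transactions are small.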
Hypervisor-Assisted: User-to-Hypervisor Call
Going through the OS adds unnecessary overhead
– Talk to Xen directly
Move checkpointing to the Xen level
– Add a new interrupt vector
  • 0x80: system call
  • 0x82: hypercall from guest OS
  • 0x84: hypercall from user (newly added)
The Xen-based approaches need no changes to the guest OS.
PTxen
Implement PT in Xen

Process’ view (virtual pages): pages 1 to 12
Declare();
Tstart() {}
List_add();   writes page 5; Xen's page_fault() runs and page 5 is recorded as dirty
List_add();   writes page 7; Xen's page_fault() runs and page 7 is recorded as dirty
Tend();
…
Undeclare() {}

----- Xen -----
Process1, (1-12)
page_fault() {
    mprotect(unprotect);
    add_to_dirty_pages();
}
Emulxen
Emulation in Xen

Process’ view (virtual pages): pages 1 to 12
Declare();
Tstart() {}
List_add();   the write traps to Xen's page_fault(), is emulated, and is logged as (Addr1, 100)
List_add();   likewise logged as (Addr2, 200)
Tend();
…
Undeclare();

----- Xen -----
Process1, (1-12)
page_fault() {
    emulate();
    log_to_write_buffer();
}
Write buffer: (Addr1,100) (Addr2,200)
Scanxen
Idea
– Scan the page table rather than trapping writes
– Hardware marks the dirty bit

Process’ view (virtual pages): pages 1 to 12
Declare();
Tstart() {}
List_add();   writes page 5; hardware sets the dirty bit for page 5 in the page table
List_add();   writes page 7; hardware sets the dirty bit for page 7 in the page table
Tend();
…
Undeclare();

----- Xen -----
Process1, (1-12)
scan_page_table() {
    collect_dirty_bit();
    add_to_dirty_pages();
}
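The scan step can be sketched over a simulated page table. This is an illustration only: the array of 64-bit entries stands in for the real page table that Xen would walk, and `scan_page_table()` here is a hypothetical user-space function that only borrows its name from the slide. The dirty bit position (bit 6) is the one used in x86 page-table entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_DIRTY (1ULL << 6)   /* dirty bit in an x86 page-table entry */

/* Scan n page-table entries, record the index of every dirty page in
 * dirty_out (capacity n), and clear the dirty bit for the next transaction.
 * Returns the number of dirty pages found. */
static size_t scan_page_table(uint64_t *pte, size_t n, size_t *dirty_out) {
    assert(pte != NULL && dirty_out != NULL);
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (pte[i] & PTE_DIRTY) {
            dirty_out[found++] = i;
            pte[i] &= ~PTE_DIRTY;
        }
    }
    return found;
}
```

The loop visits every entry no matter how many pages were written, so the cost tracks the size of the declared heap. On real hardware, clearing dirty bits also requires flushing stale TLB entries, which this simulation does not model.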
Microbenchmark
10000 transactions, 10MB heap size
Microbenchmark
Transactional heap size
– For simplicity, the whole heap is protected
Transaction
– Writes per page (wpp)
  • # of writes per page
– Pages per transaction (ppt)
  • # of unique pages written
– # of writes = wpp * ppt
Scanxen
– Impacted only by heap size
– Not by wpp, ppt, or transaction size
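A synthetic transaction with these two parameters might look like the sketch below. The function name `run_transaction` and the byte-sized writes are assumptions; the slides do not show the actual microbenchmark code.

```c
#include <assert.h>
#include <stddef.h>

/* Run one synthetic transaction: touch `ppt` distinct pages and issue
 * `wpp` one-byte writes to each. Returns the total number of writes. */
static size_t run_transaction(unsigned char *heap, size_t page_size,
                              size_t ppt, size_t wpp) {
    assert(heap != NULL && wpp <= page_size);
    size_t writes = 0;
    for (size_t p = 0; p < ppt; p++) {
        for (size_t w = 0; w < wpp; w++) {
            heap[p * page_size + w] = (unsigned char)(w + 1);
            writes++;
        }
    }
    return writes;
}
```

With ppt = 8 and wpp = 16 the transaction issues 8 * 16 = 128 writes, which corresponds to the largest transaction size on the plots that follow.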
PT vs PTxen
PTxen shows a 10x speedup
PT and PTxen are both impacted by ppt
[Figure: time in seconds (0 to 14) versus ppt (1 to 8) for PT and PTxen. For each of them, the curves for wpp = 4, 8, and 16 overlap.]
Emulation vs emulxen
Emul-based approaches are impacted by transaction size
Emulxen shows a 4x speedup
[Figure: time in seconds (0 to 45) versus transaction size (16 to 128) for emul and emulxen at wpp = 4, 8, and 16.]

Transaction size   16   32   48   64   80   96   112   128
ppt (wpp=16)        1    2    3    4    5    6     7     8
ppt (wpp=8)         2    4    6    8   10   12    14    16
ppt (wpp=4)         4    8   12   16   20   24    28    32
PT Call-Flow
[Figure: two call flows across the User, OS, and Hypervisor layers. Pure user-level: mprotect() calls, then a page fault and a signal for every dirty page, with TLB flushes. Xen-assisted: declare(), then a page fault handled in the hypervisor for every dirty page, with a TLB flush.]
Evaluation
Source code from the book “Data Structures and Algorithm Analysis in C (Second Edition)” by Mark Allen Weiss
Data structures (OPS_PER_T=1)

                                                 writes                      pages
Data structure                         Op        avg        min   max        avg      min   max
aa (AA-trees)                          insert    21.9836    5     63         4.9481   1     7
                                       delete    20.4053    2     63         6.0642   1     9
avl (AVL trees)                        insert    30.5609    6     39         5.1021   1     9
bin (Binomial queues)                  insert    27.9985    25    64         2.0735   1     10
dsl (Deterministic skip list)          insert    10.4176    7     23         3.1421   1     5
hashquad (Quadratic probing hash)      insert    11.3983    2     47023      1.0146   1     68
hashsepchain (Separate chaining hash)  insert    4          4     4          1.9696   1     3
leftheap (Leftist heap)                insert    23.5673    5     31         3.0665   1     6
                                       delete    34.0132    0     59         9.2518   0     15
heap (Binary heaps)                    insert    2.8693     2     14         2.4009   1     5
                                       delete    12.5523    2     15         2.7349   1     5
list (Linked list)                     insert    4          4     4          1.0029   1     2
                                       delete    1          1     1          1        1     1
queue (Queues)                         insert    3          3     3          1.8984   1     2
                                       delete    2          2     2          1        1     1
rb (Red black tree)                    insert    13.7011    10    28         4.6102   1     9
splay (Splay trees)                    insert    20.0851    4     5262       4.7745   1     34
                                       delete    7.7604     3     15001      3.0258   1     40
tree (Binary search tree)              insert    720.7852   4     1436       5.4576   1     10
                                       delete    1.7139     0     3          1.7139   0     3
Evaluation Results 1
[Figure: four bar charts of time in seconds for mprotect (PT), emul, emulxen, and mprotxen (PTxen). Panels: queue and list insert/delete; hashquad and hashsepchain insert; dsl insert; bin insert.]
Evaluation Results 2
[Figure: four bar charts of time in seconds for mprotect (PT), emul, emulxen, and mprotxen (PTxen). Panels: splay insert/delete; aa insert/delete; tree insert/delete; leftheap insert/delete.]
Evaluation Results 3
[Figure: two bar charts of time in seconds for mprotect (PT), emul, emulxen, and mprotxen (PTxen). Panels: heap insert/delete; rb insert and avl insert.]
Scanxen shows an almost constant 2.5 sec across all benchmarks
Evaluation Summary
Emulxen has up to 4x speedup compared to emulation
PTxen has up to 13x speedup compared to PT
[Figure: speedup (1 = 100%, axis 0 to 16) of emulxen and mprotxen (PTxen) for each benchmark: queue-insert, queue-delete, list-insert, list-delete, hashquad-insert, hashsepchain-delete, dsl-insert, bin-insert, splay-insert, splay-delete, aa-insert, aa-delete, tree-insert, tree-delete, leftheap-insert, leftheap-delete, heap-insert, heap-delete, rb-insert, avl-insert.]
Transaction Aggregation
OPT=1
– A single operation (e.g. an insert or a delete)
OPT=5
– Multiple operations merged into one transaction
– # of writes increases linearly
– # of unique pages touched remains the same in most cases
It should benefit PT-based approaches
– Because of their heavy dependence on ppt
– Details in the paper
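The effect of merging can be illustrated by counting unique pages over a list of written addresses. The helper `unique_pages` and the example addresses are hypothetical and not taken from the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Count distinct page numbers among n written addresses.
 * Quadratic on purpose: the transactions considered here are small. */
static size_t unique_pages(const size_t *addr, size_t n, size_t page_size) {
    assert(page_size > 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (addr[j] / page_size == addr[i] / page_size) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}
```

If one operation writes 4 addresses on 2 pages, five merged operations that stay on the same 2 pages issue 20 writes but still cost a PT-based tracker only 2 page faults per transaction instead of 10 across five separate transactions.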
Conclusion
A family of application checkpointing techniques was introduced
Emulation-based techniques
– Useful for small transactions (fewer writes)
Hypervisor-assisted application checkpointing
– 4x to 13x faster than the user-space implementations
Thank you!
Extra Slides
Emulation vs PT
Emul-based is good for small transactions
– Roughly wpp=5 and wpp=1.3 are the break-even points
[Figure: two plots of time in seconds versus writes per page. One shows emul and mprotect at ppt 4 and ppt 8 (wpp 1 to 12, 0 to 35 sec). The other shows emulxen and mprotxen at ppt 4 and ppt 8 (wpp 1 to 5, 0 to 3.5 sec). Note the scale difference.]
Scanxen vs PT
For a small buffer and large ppt, scanxen might be better
– Not the case in our experiments
[Figure: two plots of time in seconds versus pages per transaction (1 to 8), one with a 0 to 14 sec axis and one with a 0 to 1.4 sec axis. They compare PT and PTxen with scanxen at several heap sizes (1MB to 5MB, and 40KB to 120KB). Note the scale difference.]
Scanxen vs emulation
Scanxen might be better than emulation
– For big transactions
[Figure: time in seconds (0 to 45) versus transaction size (16 to 128) for scanxen, emul, and emulxen at wpp 4.]
[Figure: time in seconds (0 to 0.25) for No-HA versus mprotxen on queue-insert, list-insert, hashquad-insert, dsl-insert, splay-insert, aa-insert, tree-insert, leftheap-insert, heap-insert, and rb-insert.]
[Figure: two bar charts, one for PTxen (axis 0 to 0.06) and one for scanxen (axis 0 to 3), showing queue and list insert/delete under three settings: NoTLBFlush, TLBFlush, and AreaFlush.]
Operations per transaction
– OPT=5, merging transactions
  • No impact on emulation-based ones
  • Some slowdown for scanxen
– Merging transactions
  • Total # of pages written effectively goes down
  • PT and PTxen become much better than emul/emulxen
  • Still a 13x improvement between PT and PTxen
Evaluation
[Figure: two bar charts of time in seconds (0 to 1.2) for mprotect, emul, emulxen, and mprotxen on rb insert and avl insert. One panel: OPT=5, 2000 transactions. The other: OPT=1, 10000 transactions.]
Bandwidth : Amount
[Figure: amount sent in KB for queue-insert, list-insert, hashquad-insert, dsl-insert, splay-insert, aa-insert, tree-insert, leftheap-insert, heap-insert, and rb-insert. mprotect-based: axis 0 to 400000 KB. emul-based: axis 0 to 6000 KB.]
Note that tree-insert for emul-based is 56311.34375 KB, which is out of scale.
Emul-based is mostly less than 2MB
– No ‘diff’ process for emul-based
Bandwidth : Time
[Figure: time in seconds for the same insert benchmarks. One plot shows mprotxen, mprotect, and scanxen (axis 0 to 0.09 sec). The other shows emulxen and emul (axis -0.005 to 0.02 sec).]
Emul-based is mostly less than 5ms
Bandwidth : Percentage
[Figure: percentage for the same insert benchmarks. One plot shows mprotxen (axis 0 to 70). The other shows emulxen, emul, mprotect, and scanxen (axis -2 to 12).]
Relatively small fraction
– Except for PTxen, due to its minimal runtime
Microbenchmark
[Figure: microbenchmark plot comparing scanxen, PT, and PTxen.]
[Figure: microbenchmark plot comparing emulxen, emul, and scanxen.]
[Figure: time in seconds (0 to 60) versus transaction size (Tsize, 16 to 128) at wpp 4 for mprotect (PT), mprotxen (PTxen), scanxen, emul, and emulxen.]
Microbenchmark
[Figure: writes between Tstart() and Tend() leave dirty pages scattered across the transactional heap.]
Tend() of PT: three separate mprotect() calls
Tend() of PTxen: a single PTxen() call
[Figure: the main process hands each dirty page to a diff process, which computes the diff and sends it over the network to the backup process.]
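The slides do not describe the diff format, so the following is only one plausible sketch: compare a dirty page with its copy from before the transaction and emit the changed byte ranges. The names `diff_run` and `diff_page` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One changed byte range within a page. */
struct diff_run {
    size_t offset;
    size_t length;
};

/* Compare a dirty page with its pre-transaction copy and emit the changed
 * byte ranges. Writes at most max_runs entries and returns the number
 * written; a caller that fills all max_runs entries cannot tell whether
 * more changes follow and may prefer to send the whole page instead. */
static size_t diff_page(const unsigned char *old_page,
                        const unsigned char *new_page, size_t page_size,
                        struct diff_run *runs, size_t max_runs) {
    assert(old_page != NULL && new_page != NULL && runs != NULL);
    size_t n = 0;
    size_t i = 0;
    while (i < page_size && n < max_runs) {
        if (old_page[i] == new_page[i]) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < page_size && old_page[i] != new_page[i])
            i++;
        runs[n].offset = start;
        runs[n].length = i - start;
        n++;
    }
    return n;
}
```

Sending only the runs instead of whole pages reduces the amount transferred for the PT-based approaches. The emulation-based approaches already log individual writes, which is consistent with the earlier note that they need no diff process.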