Virtualized Databases?
VIRTUALIZED DATABASES?
Approach: mechanics of virtualization
"Certain big players" will not be mentioned
Talk is general, mostly about hardware issues, which are the same for any platform
ME
• Liz van Dijk (@lizztheblizz)
• Working at Sizing Servers Research Lab
• First-timer at FOSDEM!
• Not really a developer, not really a sysadmin, not really a DBA
• I just like knowing how stuff works.
SO... VIRTUALIZATION, HUH.
• It’s far too broad a term
• It’s a pretty old concept. (about half a century, actually)
• Its main purposes are abstraction and security
• Making use of the correct CPU execution mode
• Managing Virtual Memory
History!
Broad term, 100 different meanings
Full-system virtualization on the mainframes in the 60's
IBM M44/44X, trap and emulate
Recently:
* x86 did not support full virtualization, trap and emulate did not work
* Multicore hardware, single-threaded software. Inefficient datacenters.
Full virtualization is not the only virtualization
Combination of different methods
Who uses RAID?
Who uses virtual memory?
2 big issues that all solutions try to work around
Focus on these, the next steps should be more or less logical
Problem 1: a matter of privileges
Kernels assume full control over hardware
How does the hardware deal with this?
Layer-based security system (onion)
2-bit privilege code (on x86, carried in the code segment selector); the CPU verifies the code and does or doesn't do the instruction
x86: 4 layers
Code 00: supervisor mode
Code 11: user mode
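The privilege check described in the notes can be sketched as a toy model. This is only an illustration, not how real hardware is built: the `ToyCPU` class, the `PrivilegeFault` exception and the instruction names are all hypothetical, and only the two ring codes (00 and 11) come from the slide.

```python
from dataclasses import dataclass

SUPERVISOR = 0b00  # code 00: supervisor mode (ring 0)
USER = 0b11        # code 11: user mode (ring 3)

# Hypothetical instruction names, chosen only for illustration.
PRIVILEGED = {"load_page_table", "disable_interrupts", "halt"}


class PrivilegeFault(Exception):
    """Raised when privileged work is attempted outside ring 0."""


@dataclass
class ToyCPU:
    ring: int = USER

    def execute(self, instruction: str) -> str:
        # The CPU verifies the 2-bit code before doing the instruction.
        if instruction in PRIVILEGED and self.ring != SUPERVISOR:
            raise PrivilegeFault(f"{instruction} refused in ring {self.ring}")
        return f"{instruction} executed in ring {self.ring}"


kernel = ToyCPU(ring=SUPERVISOR)
app = ToyCPU(ring=USER)
print(kernel.execute("halt"))   # allowed: ring 0
print(app.execute("add"))       # allowed: not a privileged instruction
```

A guest kernel that has been pushed out of ring 0 ends up in the position of `app` here: its privileged instructions are refused, and something else has to step in on its behalf.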
X86 VIRTUALIZATION
• Binary Translation, aka “faking it”
• Applies ring deprivileging, and translates “bad calls” on the fly
• “Full” Hardware Virtualization
• Introduced Ring -1: Hypervisor mode
• Only intervenes when absolutely necessary
BT: old, awesome, employed by QEMU and Wine.
Less relevant now for full virtualization
Ring deprivileging, look it up!
Intel/AMD caught up, implemented VT-x and AMD-V
Ring -1: hypervisor
Let OSes do whatever they want, but use trap and emulate
Extra roundtrip, extra overhead
The CPU has more tasks to perform, and they also take longer
Newer CPU is better
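To make the "extra roundtrip" concrete, here is a minimal trap-and-emulate sketch. It assumes a made-up set of privileged instruction names and a hypothetical `ToyHypervisor` class; it only counts how often the hypervisor has to intervene and says nothing about real VT-x or AMD-V exit costs.

```python
from dataclasses import dataclass, field

# Illustrative subset; these are not real mnemonics.
PRIVILEGED = {"write_cr3", "out_port", "hlt"}


@dataclass
class ToyHypervisor:
    exits: int = 0
    log: list = field(default_factory=list)

    def emulate(self, vm: str, instruction: str) -> None:
        # Every trap is a round trip: guest -> hypervisor -> guest.
        self.exits += 1
        self.log.append((vm, instruction))


def run_guest(vm: str, instructions: list, hypervisor: ToyHypervisor) -> int:
    """Run a guest instruction stream; return how many ran without a trap."""
    direct = 0
    for instruction in instructions:
        if instruction in PRIVILEGED:
            hypervisor.emulate(vm, instruction)  # trap, then emulate
        else:
            direct += 1  # the hypervisor does not intervene
    return direct


hv = ToyHypervisor()
direct = run_guest("A", ["add", "write_cr3", "mov", "out_port"], hv)
print(direct, hv.exits)
```

The more of a workload's instructions land in the privileged set, the more round trips it pays for, which is why kernel-heavy activity is hit hardest.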
VIRTUAL MEMORY
[Diagram: the CPU and its TLB sit next to physical memory frames 0xA to 0xH ("Actual Hardware"). The OS presents virtual memory pages 1 to 12 and keeps the page table ("Managed by software").]
Page Table
1 | 0xD
2 | 0xC
3 | 0xF
4 | 0xA
5 | 0xH
6 | 0xG
7 | 0xB
8 | 0xE
etc.
TLB
1 | 0xD
5 | 0xH
2 | 0xC
etc.
Problem 2: Virtual memory
4 KB physical segments with physical addresses
Software: pages
Very easy to manage in the OS, all software gets a contiguous block
Page table keeps track of the virtual-to-physical mapping
TLB cache keeps track of these mappings, very fast
Needs to flush every context switch.
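The page table and TLB interplay can be sketched in a few lines. This is a toy model under simple assumptions: the `ToyMMU` class is hypothetical, the TLB is an unbounded dictionary, and frames are kept as the labels used on the slide rather than real addresses.

```python
# Mapping taken from the slide's page table (virtual page -> physical frame).
PAGE_TABLE = {1: "0xD", 2: "0xC", 3: "0xF", 4: "0xA",
              5: "0xH", 6: "0xG", 7: "0xB", 8: "0xE"}


class ToyMMU:
    def __init__(self, page_table: dict):
        self.page_table = dict(page_table)
        self.tlb = {}      # cached translations
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_page: int) -> str:
        if virtual_page in self.tlb:
            self.hits += 1
        else:
            # TLB miss: fall back to the (slower) page table lookup.
            self.misses += 1
            self.tlb[virtual_page] = self.page_table[virtual_page]
        return self.tlb[virtual_page]

    def context_switch(self, new_page_table: dict) -> None:
        # Another process has other mappings, so cached entries are stale.
        self.page_table = dict(new_page_table)
        self.tlb.clear()


mmu = ToyMMU(PAGE_TABLE)
mmu.translate(1)            # miss, fills the TLB
mmu.translate(1)            # hit
mmu.context_switch({1: "0xG"})
mmu.translate(1)            # miss again after the flush
print(mmu.hits, mmu.misses)
```

The flush in `context_switch` is the expensive part: every switch throws away translations that then have to be looked up again.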
SPT VS HAP
[Diagram: physical memory frames 0xA to 0xH and the CPU are the "Actual Hardware". VM A and VM B each see their own virtual pages and keep their own page table ("Managed by VM OS"); what sits between the VMs and the hardware is "Managed by hypervisor".]
“Read-only” Page Table (VM A)
1 | 0xD
5 | 0xH
2 | 0xC
etc.
“Read-only” Page Table (VM B)
12 | 0xB
10 | 0xE
9 | 0xA
etc.
“Shadow” Page Table (SPT)
A
1 | 0xG
5 | 0xD
2 | 0xF
B
12 | 0xE
10 | 0xB
9 | 0xC
TLB (HAP)
A1 | 0xD
A5 | 0xH
A2 | 0xC
B12 | 0xB
B10 | 0xE
B9 | 0xA
etc.
2 methods
SPT: locked page table, access generates trap, VMM handles memory access
Much slower memory access
EPT/RVI/HAP
Make TLB much bigger, make it smarter, VM-aware
Much more complex to fill up, though. Slow initial memory access
Filled TLB is very fast, though.
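Why filling the TLB is "much more complex" with hardware-assisted paging can be shown with a small counting model. It assumes a simplified two-dimensional page walk with no page-walk caches, and the function names are hypothetical; with 4-level guest and host tables it reproduces the commonly quoted worst case of 24 memory references.

```python
def native_walk_refs(levels: int = 4) -> int:
    """Memory references for one TLB miss on bare metal."""
    return levels


def nested_walk_refs(guest_levels: int = 4, host_levels: int = 4) -> int:
    """Worst-case memory references for one TLB miss with nested paging.

    Each guest page table entry sits at a guest-physical address, so
    reading it costs a full host walk plus the read itself. The final
    guest-physical address then needs one more host walk.
    """
    per_guest_level = host_levels + 1
    return guest_levels * per_guest_level + host_levels


print(native_walk_refs())   # 4
print(nested_walk_refs())   # 24
```

Once the translation is cached, neither walk happens at all, which matches the note that a filled TLB is very fast. Large pages help on both sides by removing a level from the walk and by letting each TLB entry cover more memory.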
WHAT DOES THIS TEACH US?
• All “kernel” activity is a lot more costly:
• Interrupts
• System Calls (I/O)
• Memory page management
So, 3 actions are slower in virtualization
Interrupts: hardware asking for attention
System calls: software asking for kernel attention
Page management: memory access
IN THE WILD...
• From best to worst case scenario...
• Bare-metal (Xen, KVM, ESX, Hyper-V)
• Host-based (VirtualBox, VMware Workstation, etc.)
• Cloud-based (Amazon, Terremark, etc.)
BARE-METAL OPTIONS
• Know your my.cnf inside out
• Use hardware-assisted paging + Large Pages! (InnoDB: large-pages)
• Make use of paravirtualized HW options
• Take care of all your caching levels
• Use DirectIO (innodb_flush_method=O_DIRECT)
Small mistakes in a native environment get bigger in a virtual one
Memory allocations are expensive
Optimize your my.cnf!!!
tools.percona.com is a good starting point
Connection-specific buffers (join-buffer, sort-buffer, etc.)
Sweet spot = test!!
SWAPPING = EVIL
swappiness
Large Pages
DirectIO
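A rough way to sanity-check a my.cnf against the VM's memory is to add up the global buffers and the connection-specific ones. The sketch below is a back-of-the-envelope estimate, not a MySQL formula: the function names and all sizes are hypothetical, connection buffers are not always allocated (and can sometimes be allocated more than once per query), and the 2 MiB large page size is an assumption that depends on the platform.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def worst_case_memory(global_bytes: int, per_connection_bytes: int,
                      max_connections: int) -> int:
    """Upper-bound style estimate: global buffers plus every connection
    using its join/sort/read buffers at the same time."""
    return global_bytes + per_connection_bytes * max_connections


def large_pages_needed(pool_bytes: int, large_page_bytes: int = 2 * MIB) -> int:
    """Number of large pages needed to back a buffer pool (rounded up)."""
    return -(-pool_bytes // large_page_bytes)


# Hypothetical example values, not recommendations.
buffer_pool = 4 * GIB
per_connection = 3 * MIB        # e.g. join + sort + read buffers
total = worst_case_memory(buffer_pool, per_connection, max_connections=200)

print(total // MIB)                     # estimate in MiB
print(large_pages_needed(buffer_pool))  # large pages to reserve
```

If the estimate comes out above what the VM has been given, the guest will start swapping, which is exactly the situation the notes warn about. Treat the numbers as a starting point and test to find the sweet spot.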
HARDWARE CHOICES
• Choosing the right CPUs
• Intel Xeon 5500/7500 and later types (Nehalem) / All AMD quad-core Opterons (HW-assisted/MMU virtualization)
• Choosing the right NICs (VMDq)
• Choosing the right storage system (iSCSI vs FC SAN)
The CPUs listed here support both HW-assist and HAP
VMDq: Virtual Machine Device Queues
HOST-BASED
• All of the above, if possible :)
• IO becomes the bigger issue on standard client hardware
• Focus on moving database IO away from the same disk you run the host- and guest-OS on.
• Consider installing an SSD :)
Keep in mind all of the previous things
IO is a bigger issue
2 OSes + DB running on the same disk is always a problem
Separate disk, maybe an iSCSI LUN? Buy an SSD!
CLOUD-BASED
• No control whatsoever over host-system :(
• Sometimes unreliable IO
• Change strategy! Design for easy sharding and replication!
• Caching caching caching!
• Consider RDS to reduce operational overhead?
Can't escape the hurt
Unreliable disk IO
CACHING
Sharding/replication to spread write/read load
Very write-heavy may be more trouble than it's worth
Asynchronous writes? Not very durable
Use RDS to cut back operational cost
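"Design for easy sharding and replication" can be sketched as a small routing function. This is one simple approach among many, assuming hash-based sharding with writes going to a primary and reads rotated over replicas; the function names, the dictionary layout and the host names are all hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a stable hash."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def pick_server(key: str, is_write: bool, shards: list,
                read_counter: int = 0) -> str:
    """Writes go to the shard's primary, reads rotate over its replicas."""
    shard = shards[shard_for(key, len(shards))]
    if is_write or not shard["replicas"]:
        return shard["primary"]
    return shard["replicas"][read_counter % len(shard["replicas"])]


# Hypothetical topology: two shards, each with one primary and two replicas.
SHARDS = [
    {"primary": "db0-primary", "replicas": ["db0-r1", "db0-r2"]},
    {"primary": "db1-primary", "replicas": ["db1-r1", "db1-r2"]},
]

print(pick_server("user:42", is_write=True, shards=SHARDS))
print(pick_server("user:42", is_write=False, shards=SHARDS, read_counter=1))
```

Plain modulo hashing moves most keys when the number of shards changes, so a real design would likely want a lookup table or consistent hashing on top of this. Reads served from replicas can also be slightly stale when replication is asynchronous.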