MySQL Acceleration Through New Flash Storage API Primitives

36
MYSQL ACCELERATION THROUGH NEW FLASH STORAGE API PRIMITIVES TORBEN MATHIASEN PERCONA LIVE 2012 – NEW YORK Running Native on Flash

description

Torben Mathiasen presented at Percona Live 2012 in New York in October 2012 on non-volative memory software interfaces and MySQL atomics support. His presentation includes benchmarks run by Percona with Atomic I/O and directFS pre-release.

Transcript of MySQL Acceleration Through New Flash Storage API Primitives

Page 1: MySQL Acceleration Through New Flash Storage API Primitives

MYSQL ACCELERATION THROUGH NEW FLASH STORAGE API PRIMITIVES

TORBEN MATHIASEN PERCONA LIVE 2012 – NEW YORK

Running Native on Flash

Page 2: MySQL Acceleration Through New Flash Storage API Primitives

TOPICS – NVM SOFTWARE INTERFACES

1.  What are we building?

2.  Why are we building it?

3.  Examples

4.  MySQL atomics support

5.  Where are we headed?

October 29, 2012 2

Page 3: MySQL Acceleration Through New Flash Storage API Primitives

CONVENTIONAL I/O ACCESS

October 29, 2012 3

Traditional Storage

Proprietary Storage OS

Storage Media

Native Flash Translation Layer

Storage Media

Software Defined Storage

Conventional I/O access

Page 4: MySQL Acceleration Through New Flash Storage API Primitives

DIRECT-ACCESS I/O THROUGH NATIVE INTERFACES

October 29, 2012 4

Traditional Storage

Proprietary Storage OS

Storage Media

Native Flash Translation Layer

Storage Media

Software Defined Storage

Conventional I/O access

Direct access I/O

Page 5: MySQL Acceleration Through New Flash Storage API Primitives

MEMORY-ACCESS THROUGH NATIVE INTERFACES

October 29, 2012 5

Traditional Storage

Proprietary Storage OS

Storage Media

Native Flash Translation Layer

Storage Media

Software Defined Storage

Conventional I/O access

Memory access

Direct access I/O

Page 6: MySQL Acceleration Through New Flash Storage API Primitives

COMPARING I/O AND MEMORY ACCESS SEMANTICS

October 29, 2012 6

I/O I/O semantics examples:

•  Open file descriptor – open(), read(), write(), seek(), close() •  (New) Write multiple data blocks atomically, nvm_vectored_write() •  (New) Open key-value store – nvm_kv_open(), kv_put(), kv_get(), kv_batch_*()

Volatile memory semantics example: •  Allocate virtual memory, e.g. malloc() •  memcpy/pointer dereference writes (or reads) to memory address •  (Improved) Page-faulting transparently loads data from NVM into memory

Non-volatile memory semantics example: •  (New) Allocate and map Auto-Commit Memory™ (ACM) virtual memory pages •  memcpy/pointer dereference writes (or reads) to memory address •  (New) Call checkpoint() to create application-consistent ACM page snapshots •  (New) After system failure, remap ACM snapshot pages to recover memory state •  (New) De-stage completed ACM pages to NVM namespace •  (New) Remap and access ACM pages from NVM namespace at any time

Page 7: MySQL Acceleration Through New Flash Storage API Primitives

OPEN NVM DEVELOPMENT INTERFACES

October 29, 2012 7

I/O Semantics (explicit persistence) Memory Semantics (implicit persistence)

API Libraries: Open Source

Open Interface

Key-Value Store

Key-Value Cache

User-Defined Libraries

High-speed Logging

Transactional mmap

Native Filesystem: Open Source

Open Interface

directFS (POSIX filesystem namespace implemented with native Primitives)

Primitives: Open Interface

Traditional Block I/O Primitives

Atomic I/O Transaction Primitives

Sparse-address

Primitives

Clone/ Merge

Primitives

Cache Eviction

Primitives

Memory Transaction Primitives

Native Flash Translation Layer

Page 8: MySQL Acceleration Through New Flash Storage API Primitives

DIRECTFS – NATIVE FILESYSTEM NAMESPACE

October 29, 2012 8

Application

Linux VFS (virtual file system) abstraction layer

Native Flash Translation Layer

ext3 btrfs xfs directFS

Kernel block layer

kernel-space

user-space

Page 9: MySQL Acceleration Through New Flash Storage API Primitives

DIRECTFS – ELIMINATING DUPLICATE LOGIC

October 29, 2012 9

Native Flash Translation Layer block allocation, mapping, recycling

ACID updates, logging/journaling, crash-recovery

directFS file metadata mgmt

Kernel block layer

kernel-space

user-space

Ext3 file metadata mgmt,

block allocation, mapping, recycling, ACID updates, logging/journaling, crash-recovery

Primitive Interfaces

Application

Linux VFS (virtual file system) abstraction layer

Page 10: MySQL Acceleration Through New Flash Storage API Primitives

TOPICS – NVM SOFTWARE INTERFACES

1.  What are we building?

2.  Why are we building it?

3.  Examples

4.  MySQL atomics support

5.  Where are we headed?

October 29, 2012 10

Page 11: MySQL Acceleration Through New Flash Storage API Primitives

NVM SOFTWARE INTERFACES

Industry-first, direct API access to non-volatile memory’s unique characteristics.

The ioMemory SDK was introduced to help developers:

▸  Write less code to create high-performing apps

▸  Tap into performance not available with conventional I/O access to SSDs

▸  Reduce operating costs by decreasing RAM while increasing NVM

October 29, 2012 11

Page 12: MySQL Acceleration Through New Flash Storage API Primitives

DIRECTFS – BENEFITS IN ELIMINATING DUPLICATE LOGIC

October 29, 2012 12

File System Lines of Code

directFS 6879

ReiserFS 19996

ext4 25837

btrfs 51925

XFS 63230

Page 13: MySQL Acceleration Through New Flash Storage API Primitives

ATOMIC I/O PRIMITIVES: SAMPLE USES AND BENEFITS

October 29, 2012 13

Databases Transactional Atomicity: Replace various workarounds implemented in database code to provide write atomicity (double-buffered writes, etc.)

Filesystems File Update Atomicity: Replace various workarounds implemented in filesystem code to provide file/directory update atomicity (journaling, etc.)

▸  99% performance of raw writes Smarter media now natively understands atomic updates, with no additional metadata overhead.

▸  2x longer flash media life Atomic Writes increase the life of flash media up to 2x due to reduction in write-ahead-logging and double-write buffering.

▸  50% less code in key modules Atomic operations dramatically reduce application logic, such as journaling, built as work-arounds.

Page 14: MySQL Acceleration Through New Flash Storage API Primitives

DIRECTFS WITH ATOMIC WRITES - ACHIEVING RAW DEVICE PERFORMANCE

October 29, 2012 14

Ban

dwid

th (M

iB/s

)

I/O Size

Ban

dwid

th (M

iB/s

)

I/O Size Block directFS

1 Thread 8 Threads

Filesystem convenience AND atomic writes with the performance of simple writes to raw device

Page 15: MySQL Acceleration Through New Flash Storage API Primitives

KEY-VALUE STORE API LIBRARY: SAMPLE USES AND BENEFITS

October 29, 2012 15

NoSQL Applications Reduce overprovisioning due to lack of coordination between two-layers of garbage collection (application-layer and flash-layer). Some top NoSQL applications recommend over-provisioning by 3x due to this.

Reduce application I/Os through batched put and get operations.

Increase performance by eliminating packing and unpacking blocks, defragmentation, and duplicate metadata at app layer.

▸  95% performance of raw device Smarter media now natively understands a key-value I/O interface with lock-free updates, crash recovery, and no additional metadata overhead.

▸  Up to 3x capacity increase Dramatically reduces over-provisioning with coordinated garbage collection and automated key expiry.

▸  3x throughput on same SSD Early benchmarks comparing against memcached with BerkeleyDB persistence show up to 3x improvement.

Page 16: MySQL Acceleration Through New Flash Storage API Primitives

KEY-VALUE STORE API LIBRARY BENCHMARKS: (VS MEMCACHEDB)

October 29, 2012 16

Page 17: MySQL Acceleration Through New Flash Storage API Primitives

TOPICS – NVM SOFTWARE INTERFACES

1.  What are we building?

2.  Why are we building it?

3.  Examples

4.  MySQL atomics support

5.  Where are we headed?

October 29, 2012 17

Page 18: MySQL Acceleration Through New Flash Storage API Primitives

EXAMPLE: ATOMIC I/O PRIMITIVES

October 29, 2012 18

iov[0] iov[1] iov[2] iov[3] iov[4]

LBA 7 + range

LBA 24 + range

LBA 42 + range

LBA 68 + range

LBA 24 + range

LBA 7 + range

iov[0] iov[1]

TRANSACTION ENVELOPES

Page 19: MySQL Acceleration Through New Flash Storage API Primitives

ATOMIC I/O PRIMITIVES BENCHMARKS (ATOMIC I/O VS NON-ATOMIC I/O)

1U HP blade server with 16 GB RAM, 8 CPU cores - Intel(R) Xeon(R) CPU X5472 @ 3.00GHz with single 1.2 TB ioDrive2 mono

Significantly more functionality with negligible performance cost

October 29, 2012 19

Page 20: MySQL Acceleration Through New Flash Storage API Primitives

EXAMPLE: KEY-VALUE STORE API LIBRARY

October 29, 2012 20

key value 1-128B 64b-1MB

kv_put() kv_get() or kv_batch_get()

key expiration timer marks KV pair for VSL garbage collection

key hashed into sparse address

space to simplify collision

management

value returned through single I/O operation, regardless of

value size

key value key value key value

pool A pool B

pool C

kv_get_current(), kv_next()

Iterate through each KV pair in

a pool of related keys

Atomic transaction envelope

key

value

Page 21: MySQL Acceleration Through New Flash Storage API Primitives

KEY-VALUE STORE API LIBRARY BENCHMARKS (NATIVE KV GET/PUT VS. RAW READS/WRITES)

!"

#!!!!"

$!!!!"

%!!!!"

&!!!!"

'!!!!!"

'#!!!!"

'$!!!!"

!" #!" $!" %!" &!" '!!" '#!" '$!"

!"#$%$&

#'()*+$&

,*-./)&0)(12(-*34)&5&!"#&

('#)"

$*)"

'%*)"

%$*)"

!"

#!!!!"

$!!!!"

%!!!!"

&!!!!"

'!!!!!"

'#!!!!"

!" #!" $!" %!" &!" '!!" '#!" '$!"

!"#$%

$&

#'()*+$&

,*-./)&!)(01(-*23)&4&!"#&

('#)"

$*)"

'%*)"

%$*)"

!"

#!!!!"

$!!!!"

%!!!!"

&!!!!"

'!!!!!"

'#!!!!"

'$!!!!"

!" #!" $!" %!" &!"

!"#$%

&

'()*+,%&

"*)-.)/+0*&)*1+23*&4.&5.6)53*&

('#)"*+,"-./"

'*)"012"3.45"

!"

#!!!!"

$!!!!"

%!!!!"

&!!!!"

'!!!!!"

'#!!!!"

!" '!" #!" (!" $!" )!" %!" *!"

!"#$%

&

'()*+,%&

"*)-.)/+01*&)*2+34*&5.&6.7)64*&

)'#+",-."/01"

',2345"67418"

Sample Performance - GET

Performance relative to ioDrive

Sample Performance - PUT

Performance relative to ioDrive

1U HP blade server with 16 GB RAM, 8 CPU cores - Intel(R) Xeon(R) CPU X5472 @ 3.00GHz with single 1.2 TB ioDrive2 mono

Significantly more functionality with negligible performance cost

October 29, 2012 21

Page 22: MySQL Acceleration Through New Flash Storage API Primitives

RANGE OF MEMORY-ACCESS SEMANTICS

October 29, 2012 22

Extended Memory Volatile

Transparently extends DRAM onto flash, extending application virtual memory

Checkpointed Memory

Volatile with non-volatile checkpoints

Region of application virtual memory with ability to preserve snapshots to flash namespace

Auto-Commit Memory™ Non-volatile

Region of application memory automatically persisted to non-volatile memory and recoverable post-system failure

Page 23: MySQL Acceleration Through New Flash Storage API Primitives

OS SWAP VS. EXTENDED MEMORY

▸  Originally designed as a last resort to prevent OOM (out-of-memory) failures ▸  Never tuned for high-performance demand-paging ▸  Never tuned for multi-threaded apps ▸  Poor performance, ex. < 30 MB/sec throughput

▸  No application code changes required ▸  Designed to migrate hot pages to DRAM and cold pages to ioMemory ▸  Tuned to run natively on flash (leverages native characteristics) ▸  Tuned for multi-threaded apps ▸  10-15x throughput improvement over standard OS Swap

October 29, 2012 23

Non-Volatile Storage (Disks, SSDs, etc.) System Memory

OS SWAP Mechanism

ioMemory System Memory Extended Memory

Mechanism

Page 24: MySQL Acceleration Through New Flash Storage API Primitives

CHECKPOINTED MEMORY PERSISTENCE PATH

October 29, 2012 24

1.  Application designates virtual address space range to be checkpointed a.  Causes creation of independently-addressable linked clone of the

checkpointed address range (no data moves or copies) b.  Checkpoint appears as addressable file in the directFS

native filesystem namespace.

2.  Application can continue manipulating contents of designated virtual address range without affecting contents of persisted checkpoint file.

3.  Application can load or manipulate persisted checkpoint file at a later time.

System Memory ioMemory Checkpointed

Memory

Page 25: MySQL Acceleration Through New Flash Storage API Primitives

d d d d

EXAMPLE: MEMORY TRANSACTION PRIMITIVES

October 29, 2012 25

1. Allocate pages app virtual address space

CPU loads/stores data to ACMTM

pages

Process allocate ACMTM pages and maps to virtual address space

2. memcpy() Process writes and reads to pages through standard memory-access semantics

3. Checkpoint pages

d d d d

d d d d

System

failure event

5. Restore state app virtual address space

Re-started process maps restored snapshot pages into virtual address space

Application-consistent snapshot pages are created or updated

working pages

snapshot pages

d d d d

d d d d

d d d d

System failure automatically preserves snapshot pages

d d d d 4. Destage pages

d d d d Completed

pages

d d d d Pages persisted

to flash namespace

Page 26: MySQL Acceleration Through New Flash Storage API Primitives

TOPICS – NVM SOFTWARE INTERFACES

1.  What are we building?

2.  Why are we building it?

3.  Examples

4.  MySQL atomics support

5.  Where are we headed?

October 29, 2012 26

Page 27: MySQL Acceleration Through New Flash Storage API Primitives

PERCONA SERVER DEFAULT DOUBLE-WRITE

27

•  Writes data twice •  Extra data flush •  Required for ACID

compliance Pages copied into memory

OS Page

A Page

B Page

C

Page A

Page B

Page C

Two Writes

Page C

Page B

Page A

Double-write buffer

Actual tablespace

DB Server Page

A Page

B Page

C

October 29, 2012

1

2

3

ioMemory

Page 28: MySQL Acceleration Through New Flash Storage API Primitives

PERCONA SERVER WITH ATOMIC-WRITE

October 29, 2012 28

Pages copied into memory

ioMemory

DB Server

OS Page

A Page

B Page

C

Page C

Page B

Page A

Atomic-Write

Page A

Page B

Page C

•  Less Writes •  Better Performance •  ACID Compliance

1

2

Page 29: MySQL Acceleration Through New Flash Storage API Primitives

CASE STUDY: PERCONA SERVER (MYSQL)

Percona has added atomics support to Percona Server 5.5

▸ Removes the need of the MySQL double write buffer

▸ Ensures data integrity in case of system crashes

▸ Writes 50% less, great for flash

▸ Removes complexity from the software stack

▸ Improves both transaction bandwidth and latency

▸ Works though the DirectFS filesystem or on RAW devices

October 29, 2012 29

Page 30: MySQL Acceleration Through New Flash Storage API Primitives

PERCONA SERVER 5.5 TPC-C BENCHMARKS

Benchmarks run by Percona with Atomic I/O and directFS pre-release

October 29, 2012 30

Percona Server on DirectFS with Atomic I/O

Non-ACID (Double-write

disabled)

ACID (Double-write)

50% more transactions with same atomic durability

ACID (Atomic writes)

Percona Server on ext4

Page 31: MySQL Acceleration Through New Flash Storage API Primitives

PERCONA SERVER ATOMICS IMPLEMENTATION

Initialize Fusion-io atomics!os_aio_atomic_writes_init(void)!

!if (srv_atomic_writes_device != NULL &&!

! (vsl_iom_create_by_name(srv_atomic_writes_device,!! ! ! ! &os_aio_atomic_iom_handle,!! ! ! ! &error) != FIO_SUCCESS ||!

! vsl_open(os_aio_atomic_iom_handle, &error) != FIO_SUCCESS ||!! vsl_iom_get_raw_handle(os_aio_atomic_iom_handle,!

! ! ! ! &os_aio_atomic_writes_fd,!! ! ! ! &error) != FIO_SUCCESS)) {!! !ut_print_timestamp(stderr);!

! !fprintf(stderr, " Unable to open atomic writes device %s: %s\n",!! ! !srv_atomic_writes_device, error.message);!

! !return(FALSE);!!}!

!if (srv_atomic_writes_device != NULL) {!

! !fprintf(stderr, " Atomic I/O has been initialized on %s\n",!! ! !srv_atomic_writes_device);!!} !!

!return(TRUE);!}!

October 29, 2012 31

Page 32: MySQL Acceleration Through New Flash Storage API Primitives

PERCONA SERVER ATOMICS IMPLEMENTATION CONT.

Issue atomics write using vector os_aio_do_atomic_write(!

!memset(iov, 0, IO_VECTOR_LIMIT * sizeof(struct vsl_iovec));!

!do {!! !iovcnt = 0;!! !iov_total_len = 0;!

! !do {!! ! !iov_len = ut_min(n - written,!

! ! ! ! ! FTL_ATOMIC_MAX_VECTOR_SIZE);!! ! !iov[iovcnt].iov_base = (uint64_t) ((byte *) buf +!! ! ! ! ! ! ! written);!

! ! !iov[iovcnt].iov_len = iov_len;!! ! !iov[iovcnt].sector = sector_start + !

! ! ! !os_aio_atomic_sector(0, written);!! ! !iov[iovcnt].iov_flag = VSL_IOV_WRITE;!

! ! !iovcnt++;!! ! !written += iov_len;!

! ! !iov_total_len += iov_len;!

! !} while (iovcnt < IO_VECTOR_LIMIT && written < n);!

! !rc = vsl_vectored_write(file, iov, iovcnt, O_ATOMIC);!

October 29, 2012 32

Page 33: MySQL Acceleration Through New Flash Storage API Primitives

TOPICS – NVM SOFTWARE INTERFACES

1.  What are we building?

2.  Why are we building it?

3.  Examples

4.  MySQL atomics support

5.  Where are we headed?

October 29, 2012 33

Page 34: MySQL Acceleration Through New Flash Storage API Primitives

OPEN INTERFACES AND OPEN SOURCE

▸  Primitives: Open Interface

▸  directFS: Open Source

▸  API Libraries: Open Source, Open Interface

▸  INCITS SCSI (T10) active standards proposals: •  SBC-4 SPC-5 Atomic-Write

http://www.t10.org/cgi-bin/ac.pl?t=d&f=11-229r6.pdf

•  SBC-4 SPC-5 Scattered writes, optionally atomic http://www.t10.org/cgi-bin/ac.pl?t=d&f=12-086r3.pdf

•  SBC-4 SPC-5 Gathered reads, optionally atomic http://www.t10.org/cgi-bin/ac.pl?t=d&f=12-087r3.pdf

▸  SNIA NVM-Programming TWG active member

34 October 29, 2012

Page 35: MySQL Acceleration Through New Flash Storage API Primitives

Tradi&onal  SSDs  

ioMemory™  with  Conven&onal  I/O  

ioMemory™  as  Transparent  Cache  

ioMemory™  with    direct  access  I/O  

ioMemory™  with    memory  seman&cs    

App

lica&

on  

Applica&on  

App

lica&

on   Applica&on   Applica&on   Applica&on   Applica&on  

User-­‐defined  I/O  API  Libraries  

User-­‐defined  Memory  API  Libraries  

OS  Block  I/O   OS  Block  I/O   OS  Block  I/O   Direct-­‐access  I/O  API  Libraries  

Memory  Seman&cs    API  Libraries  

Host  

Host  

File  System   File  System   File  System   directFS  –    NVM  filesystem  

directFS  –  NVM  filesystem  

     Block  Layer   Block  Layer   Block  Layer   I/O  Primi&ves   Memory  Primi&ves  

SAS/SATA  VSL™    

expanded  flash  transla&on  layer  

directCache™  VSL™   VSL™  

Network  VSL™  

Remote  

RAID  Controller  

Read/Write   Read/Write   Read/Write   Read/Write   CPU  Load/Store  Flash  Transla&on  

Layer  

Read/Write  

Na&ve  Access  |  

35 October 29, 2012

FLASH MEMORY EVOLUTION

Page 36: MySQL Acceleration Through New Flash Storage API Primitives

T H A N K Y O U