NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory

29
NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory Dhananjoy Das, Sr. Systems Architect SanDisk Corp. Flash Memory Summit 2015 Santa Clara, CA 1

Transcript of NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory

NVMFS: A New File System Designed Specifically to Take Advantage of

Nonvolatile Memory

Dhananjoy  Das,  Sr.  Systems  Architect  SanDisk  Corp.  

Flash Memory Summit 2015 Santa Clara, CA

1

Forward-Looking Statements During our meeting today we will make forward-looking statements. Any statement that refers to expectations, projections or other characterizations of future events or circumstances is a forward-looking statement, including those relating to market growth, products and their capabilities, performance and compatibility, cost savings and other benefits to customers. Information in this presentation may also include or be based upon information from third parties, which reflects their expectations and projections as of the date of issuance. We undertake no obligation to update these forward-looking statements, which speak only as of the date hereof.

Agenda: Applications are KING!

§  Storage landscape (Flash / NVM) §  Non Volatile Memory File System

§  [Use Case ] MySQL Atomics §  [Use Case ] MySQL NVM-Compressed §  [Use Case ] Extended memory, DB Acceleration

Flash Memory Summit 2015 Santa Clara, CA

123

Non-Volatile Memory (NVM) Current: NAND Flash

§ Capacity: 100s of GB to 100s of TB per device § Trends: Higher capacity, lower cost/GB,

lower write cycles, SLC->MLC->3BPC § IOPS: 100K to millions, GB/s of bandwidth

Non-Volatile Memory § DDR/PCI-e attached NVDIMM / Capacitance backed power-

safe buffers + FLASH § orders of magnitude performance improvement

Future: Non-Volatile Memory technologies (Phase Change Memory, MRAM, STT-RAM, etc.)

PCIe attached

SAS and SATA attach SSDs

Fabric attached

NVDIMMs

Flash Memory Summit 2015 Santa Clara, CA

Why Do Applications Need Optimization for Flash?

Flash Memory Summit 2015 Santa Clara, CA

Read/Write Performance

Largely symmetrical Heavily asymmetrical

Wear out/ Background ops

Largely unlimited / Rare

Limited write cycles / Regularly

IOPS/

Latency

100 to 1,000 / 10’s milli sec

100Ks to Millions / 10’s-100’s micro sec

Addressing

Sequence, Sector

Direct, addressable

HDD Flash

Managed writes = greater device lifetime (wear leveling, endurance) Improved system efficiency (TCO and TCA)

Becoming “Flash Aware”: SanDisk NVMFS

Non-Volatile Memory File System – Optimized for Flash and Persistent Memory

Native Flash Translation Layer block allocation, mapping, recycling

ACID updates, logging/journaling, crash-recovery

NVMFS file metadata mgmt

Kernel block layer

kernel-space

user-space

Standard Linux File Systems file metadata mgmt,

block allocation, mapping, recycling, ACID updates, logging/journaling,

crash-recovery

NVM Primitives

MySQL™ Database

Linux VFS (virtual file system) abstraction layer

New File system

Eliminating Duplicate Logic and Leveraging New Primitives for Optimal Flash Performance and Efficiency Flash Memory Summit 2015 Santa Clara, CA

§  NVMFS is POSIX compliant

§  Leverages the functionality of underlying Flash Translation Layer (FTL)

§  Namespace Management

§  Enables Direct flash/memory access and crash recovery

§  NVM primitives are exposed through standard system file interface

Flash Beyond Disk: Adapting the Software Stack

§  Flash-Aware Acceleration §  Changes to MySQL are “aware”

of flash and automatically leverage optimized API

§  NVMFS 1.1 New! §  NVM optimized filesystem §  Standard file namespace, all

existing customer management tools work

§  Raw performance of NVM §  Flash-aware interfaces direct to

applications Flash-Aw

are Acceleration

NVM Primitives Standard Block I/O

Flash-Aware Applications Traditional Databases

Enhanced APIs Double-Buffer Writes

(writes) (writes)

NVM Optimized Filesystem NVMFS 1.1 Linux File Systems

Traditional FTL

SanDisk ioMemory Other SSDs

Flash Memory Summit 2015 Santa Clara, CA

Legacy MySQL Challenges Double-Write and Compression Penalties

Every MySQL write translates to 2 writes to storage device

80% performance penalty with legacy MySQL

compression enabled

Page C Page

B

Page A

Buffer

DRAM Buffer

SSD (or HDD) Database

Database Server

Page C

Page B

Page A

Page C

Page B

Page A

Page C

Page B

Page A

Application initiates updates to pages A, B, and C.

1

MySQL copies updated pages to memory buffer.

2

MySQL writes to double-write buffer on the media.

3

Once step 3 is acknowledged, MySQL writes the updates to the actual tablespace.

4

2 Writes

80%

redu

ctio

n in

TP

S

21

Tran

sact

ion

Rat

e

Results and performance may vary according to configurations and systems, including drive capacity, system architecture and applications. Flash Memory Summit 2015

Santa Clara, CA

Solving the Double-Write Problem SanDisk NVMFS with Atomic Write

▸ MySQL with Atomic Write

Fusion ioMemory Database

Page C

Page B

Page A

DRAM Buffer

Page C

Page B

Page A

Application initiates updates to pages A, B, and C.

1

MySQL copies updated pages to memory buffer.

2

MySQL writes to actual tablespace, bypassing the double-write buffer step due to inherent atomicity guaranteed by the intelligent Fusion ioMemory device.

3

Database Server

Page C Page

B

Page A

§  Enhanced Life Expectancy of Fusion ioMemory Devices:

–  Reduce Writes to flash by half at similar throughput

§  Improved performance consistency

§  Reduced latency, increased transaction/sec

§  Higher performance –  Especially workloads with datasets that are bigger than

DRAM

Perfect Fit for ACID-compliant MySQL

1

One, Single Atomic Write!

The performance results discussed herein are based on internal testing and use of Fusion ioMemory products. Results and performance may vary according to configurations and systems, including drive capacity, system architecture and applications.

Flash Memory Summit 2015 Santa Clara, CA

SanDisk NVMFS Improves Latency Consistency Lower Latency with Greater Stability

0  20  40  60  80  100  120  140  160  180  200  

1   78  

155  

232  

309  

386  

463  

540  

617  

694  

771  

848  

925  

1002  

1079  

1156  

1233  

1310  

1387  

1464  

1541  

1618  

1695  

1772  

1849  

1926  

2003  

2080  

2157  

2234  

2311  

2388  

2465  

2542  

2619  

2696  

2773  

2850  

2927  

3004  

3081  

3158  

3235  

3312  

3389  

3466  

3543  

Latency  (m

sec)  

Time  (Seconds)  

NVMFS  atomics   XFS  double-­‐write  

NVMFS Atomic Write Significantly Reduces Latency while Increasing Performance Consistency

Sysbench  -­‐  MariaDB  10.0.15,  4000  OLTP  TXN  injecLon/second,  99%  latency,  220GB  data  -­‐  10GB  buffer  pool  

NVMFS latency range

XFS latency range

Flash Memory Summit 2015 Santa Clara, CA

Improving MySQL Compression SanDisk Contribution to MySQL Community

§  Benefits of compression without severe performance penalty •  Within 10% of uncompressed

§  Up to 50% improvement in capacity utilization1

§  Enhanced life expectancy of flash devices2

•  Up to 4x fewer writes to storage with Compression and Atomic Write

Compression with almost no performance penalty 1For workloads that compress well. Improvement will vary 2At Similar Throughput (assuming same load)

Transaction Rate

2

The performance results discussed herein are based on internal testing and use of Fusion ioMemory products. Results and performance may vary according to configurations and systems, including drive capacity, system architecture and applications.

Flash Memory Summit 2015 Santa Clara, CA

Flash Beyond Disk: NVM Compression

High level approach

§  Application operates on sparse address space which is always the size of uncompressed.

§  Compressed data block is written in place at same virtual address as the un-compressed. Leaving a hole, empty space in the remainder of space allocated.

§  FTL garbage collection naturally coalesces the addresses in physical space, allowing for re-provisioning of physical space.

Database

File System File 1 File 2 File 3

NVM Compression

write fallocate(PUNCH_HOLE)

FTL (Mapping and Garbage Collection

Sparse Address Space Addr (0) -> (248 – 1)

Physical Address Space (0 – Device capacity)

Read/Write/PTrim

FTL indirect map

Flash Memory Summit 2015 Santa Clara, CA

2

Compression without Performance Penalty Improved Flash Utilization

0

5000

10000

15000

20000

25000

30000

Tim

e 80

160

240

320

400

480

560

640

720

800

880

960

1040

11

20

1200

12

80

1360

14

40

1520

16

00

1680

17

60

1840

19

20

2000

20

80

2160

22

40

2320

24

00

2480

25

60

2640

27

20

2800

28

80

2960

30

40

3120

32

00

3280

33

60

3440

35

20

# of

Tra

nsac

tions

Time (Seconds)

TPC-C-Like benchmark, 1000 warehouses - 75GB Buffer pool, MariaDB 10.0.15

5x improvement

Legacy MySQL with Compression ON

MySQL /NVMFS with Compression ON

Legacy MySQL with Compression OFF

Flash Memory Summit 2015 Santa Clara, CA

2

NVM Compression Eliminates Legacy MySQL’s Compression Penalty

Power/Cooling

Flash Memory Summit 2015 Santa Clara, CA

2

Database Acceleration using Flash Extended Memory

NVMFS (POSIX compliant file system) PMM (Persistent Memory Manager)

NVMFS Help to Increase Performance of Databases

Flash Devices Memory Persistent Memory

Memory Device Technologies:

Memory Interfaces

Oracle MySQL™

(No application changes) (Application optimized)

“Flash -Aware” “Transparent”

Flash Extended Memory

Flash Memory Summit 2015 Santa Clara, CA

3

DDR / PCI-e

MySQL Configuration

16

Fusion ioMemory™ PCIe-based flash

Standard MySQL

database

“Baseline”

Flash Extended Memory Enabled

Fusion ioMemory™ and persistent memory with

NVMFS

Standard MySQL

database

“Transparent”

Optimized MySQL

database

“Flash Aware”

Fusion ioMemory™ and persistent memory with

NVMFS

Flash Memory Summit 2015 Santa Clara, CA

3NVM attach points: DDR / PCI-e

Performance Results

Latency (lower is better)

Throughput (higher is better)

Source: Based on internal testing by SanDisk Flash Memory Summit 2015 Santa Clara, CA

3PCI-e attached NVM

MySQL :: Transparent Acceleration (latency) Performance Overview:: Comparison Between Baseline and NVDIMM

Flash Memory Summit 2015 Santa Clara, CA

3 Using DDR NVDIMM

5614%

22857% 19512% 17204% 14035%11189%

48485%39024% 34783%

28571%

(25%)Insert/Delete/

(50%)Insert/Lookup5

70%Insert530%Lookup5

80%Insert520%Lookup5

100%Insert5

MySQL55.6.245AcceleraDon5Mixed%Workload;%Threads;16,%dataset(thread):%100K%%

(Higher(is(Be+er)(

Baseline% NVDIMM;enabled%

Flash Memory Summit 2015 Santa Clara, CA

3

Up to 2x higher transactions per second using DDR attached NVDIMM

MySQL :: Transparent Acceleration Performance Overview:: Comparison Between Baseline and NVDIMM

Atomic Writes Summary §  Transac?on  throughput  improvements  of  roughly  2.4x  over  conven?onal  flash  SSDs    

§  Half  as  many  writes  per  transac?on  means  poten?al  for  as  

much  as  2x    flash  endurance  for  write  intensive  workloads:  more  cost  effecLve  flash  storage  

§  Standardized:  Approved  SNIA  standard,SBC-­‐4  SPC-­‐5  Atomic-­‐Write  hTp://www.t10.org/cgi-­‐bin/ac.pl?t=d&f=11-­‐229r6.pdf    

§  Uses  unique  flash-­‐aware  op?miza?ons  from  SanDisk  

Flash Memory Summit 2015 Santa Clara, CA

§  Available  commercially  and  fully  supported  from  SanDisk  (NVMFS  1.1)  and  Oracle  MySQL  5.7.4,  Percona  Server  5.5,  5.6  and  MariaDB  10  

1

The performance results discussed herein are based on internal testing and use of Fusion ioMemory products. Results and performance may vary according to configurations and systems, including drive capacity, system architecture and applications.

NVM Compression Summary §  Performance within 10% of uncompressed (and

sometimes greater) for Linkbench and TPC-C like workloads. 5x acceleration of TPC-C like as compared to Row Compression

§  Storage savings of ~2x (data dependent) with as much or more compressibility as MySQL row compression

§  Upto 4x better flash endurance when combined with Atomic Writes

§  Addition power/cooling benefits and scalability benefits §  Available commercially and fully supported from SanDisk (NVMFS 1.0)

and MariaDB 10, coming soon from Oracle and Percona distributions Flash Memory Summit 2015 Santa Clara, CA

2

The performance results discussed herein are based on internal testing and use of Fusion ioMemory products. Results and performance may vary according to configurations and systems, including drive capacity, system architecture and applications.

Extended Memory: Advantages & Benefits

§  Uses “Flash-as-Memory” byte-addressable architecture and interface §  Seamless deployment – add ioMemory and NVMFS/PMM software to Linux

§  Increase performance and capacity in flexible configurations

Flash Memory Summit 2015 Santa Clara, CA

3 Workload: Insert Heavy

Transparent (no software mods)

Flash-Aware (modified software)

Throughput 1.8x - 2x 4x Latency 2x 4x

The performance results discussed herein are based on internal testing and use of Fusion ioMemory products. Results and performance may vary according to configurations and systems, including drive capacity, system architecture and applications.

Who is NVMFS for? §  NVMFS will optimize customer database flash storage by improving

§  Transactional performance such as, latency and throughput §  Enhanced lifespan of flash devices §  Practical capacity

§  Enterprise environment §  OLTP databases running in a Linux OS environment §  Insert heavy workloads needing to persist large amounts of data §  Latency sensitive OLTP workloads §  Concerned about flash endurance

§  Hyperscale environment §  To improve CPU utilization per node §  Clusters of MySQL nodes by being able to store more data

Flash Memory Summit 2015 Santa Clara, CA

Questions?

Flash Memory Summit 2015 Santa Clara, CA

24

Thank  You!  

Booth#207  @BigDataFlash                                            #bigdataflash  

ITblog.sandisk.com  hTp://bigdataflash.sandisk.com  

 

Paper: https://www.usenix.org/system/files/conference/inflow14/inflow14-das.pdf http://www.sandisk.com/assets/white-papers/MySQL_with_directFS_and_Atomic_Writes.pdf Blog: http://itblog.sandisk.com/in-a-battle-of-hardware-software-innovation-comes-out-on-top/

Backup

Flash Memory Summit 2015 Santa Clara, CA

25

Performance  -­‐  Datapath

Micro-­‐benchmark  §  Parallel,  direct  I/O  on  a  single  

file  on  a  very  fast  device    ApplicaLons  §  Databases  §  Virtual  Machines  

0

0.2

0.4

0.6

0.8

1

1.2

0 1 2 3 4 5 6 7 8 9 Pe

rfor

man

ce

(nor

mal

ized

) Threads

block nvmfs xfs

0  

1  

2  

3  

4  

5  

6  

7  

8  

nvmfs   xfs  Elap

sed  Time  

(normalize

d)  

Performance  -­‐  TRIM  Handling

Micro-­‐benchmark  §  Trim  aher  write  §  16  KiB  Direct  Write  +  4  KiB  TRIM    

ApplicaLons  §  MySQL  Page-­‐compression  

Performance  -­‐  File  Logging

block device nvmfs xfs IOPS 1 0.97 0.36 Physical Bytes Written 1 1 1.38

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

Perf

orm

ance

(n

orm

aliz

ed)

Micro-­‐benchmark  §  Append  data  at  end  of  file  §  4  KiB  write(2)  +  fdatasync(2)    ApplicaLons  §  Databases  §  HFT  §  Log  Structured  Systems  

§  Up to 3.2x reduction in Writes to flash resulting in a longer device lifetime §  Utilize flash hardware longer 80.77%

43.03%

24.37%

42.4%

18.75%7.7%

100%%Update% 50%%Update,%50%%Lookup%

30%%Update,%70%%Lookup%

Flash%Writes%(GB)%(Lower'is'Be+er)'

Baseline%Writes% OpDmized%Writes%

Endurance benefits for Cassandra Performance Overview:: Comparison Between Baseline and NVDIMM

Flash Memory Summit 2015 Santa Clara, CA