Running NoSQL Natively on Flash Fusion-io...

Copyright © 2013 Fusion-io, Inc. All rights reserved. Running NoSQL Natively on Flash Fusion-io SDK Torben Mathiasen & Salvatore Buccoliero

Transcript of Running NoSQL Natively on Flash Fusion-io...

Page 1: Running NoSQL Natively on Flash Fusion-io SDKnosqlroadshow.com/dl/NoSQL-Cph-2013/Presentations/new/Running… · ioMemory vs Disk June 18, 2013 Fusion-io Confidential 4 = 4,000 x

Copyright © 2013 Fusion-io, Inc. All rights reserved.

Running NoSQL Natively on Flash Fusion-io SDK

Torben Mathiasen & Salvatore Buccoliero

Page 2:

The Future of High Performance Storage?

June 18, 2013

Everyone seems to agree.

Page 3:

ioMemory

▸ A new memory tier called ioMemory

• Combines the main advantages of DRAM and rotating drives

▸ High speed, like DRAM

▸ Persistence and large capacity, like spinning hard drives

▸ PCIe-based NAND flash storage

▸ Microsecond-level access latency: 15 µs

▸ Very high data throughput: 1.5 GB/s

▸ Very high IOPS: 400,000 random writes/s

▸ Scalable: stay ahead of data and performance demands

▸ Advanced wear-leveling algorithm

▸ N+1 chip-level redundancy (think RAID protection on the card)

▸ 100% data integrity protection in case of power loss

▸ Endurance is rated in petabytes written (PBW): terabytes written daily for more than 8 years

Manufactured by Fusion-io - OEM’ed by

Page 4:

ioMemory vs Disk

ioMemory: 800,000 IOPS

Disk: 150-200 IOPS

Ratio: roughly 4,000x
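The 4,000x figure can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two IOPS figures quoted on the slide; the function and constant names are illustrative.

```python
# IOPS figures quoted on the slide.
IOMEMORY_IOPS = 800_000
DISK_IOPS_RANGE = (150, 200)


def speedup(fast_iops: int, slow_iops: int) -> float:
    """Return how many times more I/O operations per second the fast device serves."""
    return fast_iops / slow_iops


if __name__ == "__main__":
    for disk_iops in DISK_IOPS_RANGE:
        ratio = speedup(IOMEMORY_IOPS, disk_iops)
        print(f"disk at {disk_iops} IOPS: ioMemory is about {ratio:,.0f}x faster")
```

The slide's 4,000x corresponds to the upper end of the disk range (200 IOPS); at 150 IOPS the ratio is higher still.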

Page 5:

Where to use ioMemory

[Diagram: workload categories with example product logos]

Categories: Analytics, Search, Messaging, Workstation, Databases, Virtualization, HPC, Big Data, Security/Logging, Backup, Development, Web, Caching

Product names visible in the figure: Oracle Text, MQ, Informix, KVM, GPFS, HDInsight, LAMP

Page 6:

The compute performance problem

▸ “Compute power continues to outpace performance delivered by storage.”

▸ “The problem is not getting better, it's getting worse.”

[Chart: performance over time for processor, memory, and storage]

Processor: multi-core, higher bandwidth

Memory: larger footprint, higher bandwidth

Storage: minor throughput improvements; currently solved with more disks

Legacy solutions to the data supply problem:

• Add more disks

• Add more memory

• Add more servers

• Optimize the application

Each option requires a significant increase in CAPEX and OPEX, and does not fully address the problem.

Page 7:

Networked storage data supply chain from application to flash

▸ 9 intermediary components required

▸ All adding access delay, cost, and complexity, and lowering reliability (especially the super capacitors)

▸ Requests must do a round trip, touching everything TWICE…

[Diagram: Application, Server Processor, Network Adapter, Network Switch, Network Adapter, Storage Appliance Processor, Disk RAID Controller, SAS/SATA Bus and Protocol, SSD Embedded CPU, SSD RAM, Battery/Super Capacitors, NAND Flash]

Page 8:

SSD data supply chain from application to flash

▸ 5 intermediary components required

▸ All adding access delay, cost, and complexity, and lowering reliability (especially the super capacitors)

[Diagram: Application, Server Processor, Disk RAID Controller, SAS/SATA Bus and Protocol, SSD Embedded CPU, SSD RAM, Battery/Super Capacitors, NAND Flash]

Page 9:

A horse in front of a Ferrari?

Page 10:

Fusion-io approach: from application to flash

▸ 0 intermediary components required

▸ No need for super capacitors because data is not “buffered” in DRAM

[Diagram: Application, Server Processor, NAND Flash]
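The component counts on the three data supply chain slides (9, 5, and 0) can be reproduced by listing what sits between the server processor and the NAND flash. This is a minimal sketch: the component names come from the diagrams, but the ordering of the networked path is inferred, and the dictionary and function names are illustrative.

```python
# Intermediary components between the server processor and NAND flash,
# taken from the three diagrams. The order of the networked path is inferred.
PATHS = {
    "networked storage": [
        "network adapter",
        "network switch",
        "network adapter",
        "storage appliance processor",
        "disk RAID controller",
        "SAS/SATA bus and protocol",
        "SSD embedded CPU",
        "SSD RAM",
        "battery/super capacitors",
    ],
    "local SSD": [
        "disk RAID controller",
        "SAS/SATA bus and protocol",
        "SSD embedded CPU",
        "SSD RAM",
        "battery/super capacitors",
    ],
    "Fusion-io": [],
}


def round_trip_touches(components: list) -> int:
    """A request and its response each pass every intermediary once."""
    return 2 * len(components)


if __name__ == "__main__":
    for name, components in PATHS.items():
        print(f"{name}: {len(components)} intermediaries, "
              f"{round_trip_touches(components)} touches per round trip")
```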

Page 11:

The landscape of sub-second timings

How fast do you get data to the factory? Each latency is shown next to a human-scale equivalent; the multiplier is 10 million (10,000,000).

L1-L3 cache: 10 ns (blink of an eye, 1/10 second)

DRAM: 100 ns (heartbeat, 1 second)

Fusion-io: 15 µs (get a cup of coffee, 2.5 minutes)

SAN: 4 ms (fly to Thailand, 11.1 hours)

Figure labels: “Very small (KB), very expensive”; “Volatile” vs. “Non-volatile”
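The human-scale equivalents on this slide follow from multiplying each latency by 10,000,000. A minimal sketch of that arithmetic, using only the latencies and the multiplier from the slide (the names are illustrative):

```python
MULTIPLIER = 10_000_000  # the slide's "10m" scale factor

# Latencies from the slide, in nanoseconds.
LATENCIES_NS = {
    "L1-L3 cache": 10,
    "DRAM": 100,
    "Fusion-io": 15_000,   # 15 µs
    "SAN": 4_000_000,      # 4 ms
}


def scaled_seconds(latency_ns: int, multiplier: int = MULTIPLIER) -> float:
    """Scale a latency in nanoseconds up to human time, in seconds."""
    return latency_ns * multiplier / 1_000_000_000


if __name__ == "__main__":
    for tier, latency_ns in LATENCIES_NS.items():
        seconds = scaled_seconds(latency_ns)
        print(f"{tier}: {seconds:g} s = {seconds / 60:g} min = {seconds / 3600:.1f} h")
```

Scaled this way, the cache takes 0.1 s, DRAM 1 s, Fusion-io 150 s (2.5 minutes), and the SAN 40,000 s (about 11.1 hours), matching the analogies on the slide.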

Page 12:

Direct Cut Through Architecture

The goal of every I/O operation is to move data to and from DRAM and the device.

[Diagram: legacy approach vs. Fusion direct approach. Labels in the figure: App, OS, Host CPU, DRAM, PCIe, RAID Controller, SAS, Super Capacitors (SC), Data path Controller, NAND]

Page 13:

Fusion-io is not an SSD device

Page 14:

Usage Models – Baby Steps

• Moving specific components of the database to the ioDrives:

▸ Tempdb database

▸ Indexes

▸ Frequently accessed tables

▸ Transaction logs

▸ Partition tables

Page 15:

All In

• If database size permits, placing the entire database system on Fusion-io ioDrives provides the maximum performance benefit

Page 16:

Clustering / HA with No Shared Storage!

▸ Perfect NoSQL model

▸ Microsoft SQL Server AlwaysOn

▸ Oracle Data Guard

▸ SIOS DataKeeper

▸ Advantages

• Fast replication

• Just another block storage device

[Diagram: n-node cluster with application mirroring; each node has individual storage]

Page 17:

Getting Performance To ESXi

▸ External storage for virtual machines is too costly

▸ Fusion-io delivers IOPS to hosts and virtual machines

▸ ioTurbine software caches frequently used data on ioDrives

▸ Reduces SAN/NFS costs

[Diagram: IOPS delivered from SAN/NFS storage, without and with ioTurbine caching]

Page 18:

Fusion-io as a storage appliance server

▸ Standard HP, IBM, Dell servers

▸ Rack or blades…

▸ Shared storage

▸ FC

▸ iSCSI

▸ HA mirror

Page 19:

ioTurbine with ION cache store (distributed cache)

▸ Best for

• Enterprise-class shared caching

• Large-scale server farms

• Ideal where servers cannot accommodate local ioMemory

• Each ESX(i) host will have a unique ION LUN presented

[Diagram: cache management; cache store LUNs delivered from ION; primary storage]

Page 20:

OS Support

Linux:

• RHEL 5.6, 5.7, 5.8, 6.0, 6.1, 6.2

• SLES 10.4, 11, 11.1

• OEL 5.7, 6.0, 6.1, 6.2

• CentOS 5.6, 5.7, 6.0, 6.1, 6.2

• Debian Squeeze

• Fedora 15, 16

• openSUSE 12.1

• Ubuntu 10.04, 11.10

Unix:

• Solaris 10 x64 U8, U9, U10

• OpenSolaris 2009.06 x64

• OSX 10.6 and later

• FreeBSD 8, 9

Windows:

• Windows Server 2003 SP2

• Windows 7 64 bit

• Windows 8 64 bit (in Oct)

• Windows Server 2008 R1 SP2

• Windows Server 2008 R2

• Windows Server 2012 (in Oct)

Hypervisors:

• VMware ESX 4.0, 4.1

• VMware ESXi 4.0, 4.1

• VMware ESXi 5.0, 5.1

• Windows 2008 R2 with Hyper-V

support.fusionio.com

Page 21:

Flash Offers A New Architectural Choice

[Diagram: latency scale]

Nanoseconds (10^-9 s): CPU cache, DRAM

Microseconds (10^-6 s): server-based flash

Milliseconds (10^-3 s): disk drives

Page 22:

Evolution of Flash Performance

[Diagram labels: “Flash as disk”, “Flash as memory”]

Page 23:

Let's look at some charts

Page 24:

Adding 3x the DRAM does not really improve things

Page 25:

HBase Server

▸ A typical server…

CPU Cores: 32 with HT

Memory: 128 GB

Is your working set larger than 128GB?

Page 26:

HBase Cluster

▸ With NoSQL databases, we tend to scale out for DRAM

Combined resources

CPU Cores: 96

Memory: 384 GB

More cores than needed to serve reads and writes.
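The scale-out-for-DRAM effect can be made concrete with the per-server figures from the HBase Server slide (32 cores, 128 GB). This is a minimal sketch that assumes the 96-core, 384 GB cluster is three such servers (inferred from the arithmetic, not stated on the slide) and ignores the fact that not all DRAM is usable as cache.

```python
import math

# Per-server figures from the "HBase Server" slide.
NODE_CORES = 32
NODE_DRAM_GB = 128


def nodes_for_working_set(working_set_gb: float,
                          dram_per_node_gb: int = NODE_DRAM_GB) -> int:
    """Number of servers needed to hold the whole working set in DRAM."""
    return math.ceil(working_set_gb / dram_per_node_gb)


def cluster_resources(nodes: int) -> dict:
    """Combined cores and DRAM that come along with that many servers."""
    return {"cores": nodes * NODE_CORES, "dram_gb": nodes * NODE_DRAM_GB}


if __name__ == "__main__":
    nodes = nodes_for_working_set(384)
    print(nodes, cluster_resources(nodes))
```

Sizing the cluster by memory alone brings 96 cores with it, whether or not the read and write load needs them.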

Page 27:

The HBase BucketCache (HBASE-7404)

Committed to HBase trunk. Will be in the 0.96 release; a backport patch for 0.94 is available.

Victim cache for LRUBlockCache: moves fast ioMemory close to the DRAM cache

https://issues.apache.org/jira/browse/HBASE-7404
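The victim-cache idea can be sketched as two LRU tiers: blocks evicted from the small DRAM tier land in a larger flash tier instead of being dropped. This is an illustration of the general technique only, not the HBase BucketCache implementation; the class names, capacities, and promote-on-hit policy are assumptions.

```python
from collections import OrderedDict


class LruCache:
    """A tiny LRU cache; put() returns the evicted (key, value) pair, if any."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            return self._items.popitem(last=False)  # least recently used
        return None

    def pop(self, key):
        return self._items.pop(key, None)


class TwoTierCache:
    """DRAM tier backed by a larger victim tier (standing in for flash)."""

    def __init__(self, dram_blocks: int, flash_blocks: int):
        self.dram = LruCache(dram_blocks)
        self.flash = LruCache(flash_blocks)

    def put(self, key, block):
        evicted = self.dram.put(key, block)
        if evicted is not None:
            self.flash.put(*evicted)  # the victim moves to flash, not to disk

    def get(self, key):
        block = self.dram.get(key)
        if block is not None:
            return block, "dram"
        block = self.flash.pop(key)
        if block is not None:
            self.put(key, block)  # assumed policy: promote back to DRAM on a hit
            return block, "flash"
        return None, "miss"


if __name__ == "__main__":
    cache = TwoTierCache(dram_blocks=2, flash_blocks=4)
    for name in ("a", "b", "c"):
        cache.put(name, name.upper())
    print(cache.get("a"))  # served from the victim tier
    print(cache.get("a"))  # now served from DRAM
```

Only a miss in both tiers would go to primary storage, which is the case the slide is trying to make rare.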

Page 28:

BucketCache Configuration

▸ In hbase-site.xml

<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>file:/path/to/bucketcache.dat</value>
</property>

<property>
  <name>hbase.bucketcache.size</name>
  <!-- 2TB: unit is MB -->
  <value>2097152</value>
</property>
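The size value is easy to get wrong because the unit is megabytes. The sketch below converts a size in terabytes and renders the two properties shown above with Python's standard library; the helper names are illustrative, and the cache file path is the placeholder from the slide.

```python
import xml.etree.ElementTree as ET


def tb_to_mb(terabytes: int) -> int:
    """Convert terabytes to megabytes using binary units (1 TB = 1024 * 1024 MB)."""
    return terabytes * 1024 * 1024


def bucketcache_properties(cache_file: str, size_tb: int) -> str:
    """Render the two BucketCache <property> elements as XML text."""
    settings = {
        "hbase.bucketcache.ioengine": f"file:{cache_file}",
        "hbase.bucketcache.size": str(tb_to_mb(size_tb)),
    }
    rendered = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        rendered.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(rendered)


if __name__ == "__main__":
    print(bucketcache_properties("/path/to/bucketcache.dat", size_tb=2))
```

For 2 TB this yields 2097152, the value used on the slide.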

Page 29:

BucketCache Warm-up

[Chart: read ops during cache warm-up. Y-axis: read ops/sec, 0 to 50,000. X-axis tick labels were not recoverable from the extraction.]

Page 30:

Fusion-io Software Development Kit

Traditional storage: Applications, Block I/O, Proprietary Storage OS, Storage Media

Software defined storage: Applications, Native Flash Translation Layer, Storage Media, with these interfaces:

• Block I/O

• Enhanced I/O: Atomic Writes / directFS

• Key-Value Store API

• Memory Access: Extended Memory

• Auto Commit Memory

Page 31:

directFS Linux file system

[Diagram: two storage stacks beneath the application (user space) and the Linux VFS (virtual file system) abstraction layer (kernel space)]

Ext3 on the kernel block layer: Ext3 handles file metadata management; block allocation, mapping, recycling; ACID updates, logging/journaling, crash recovery.

directFS on the Native Flash Translation Layer, through primitive interfaces: directFS handles file metadata management only; the Native Flash Translation Layer handles block allocation, mapping, recycling; ACID updates, logging/journaling, crash recovery.

Page 32:

directFS: Speed Through Simplicity

[Bar chart: lines of code (axis 0 to 70,000) for directFS, ReiserFS, Ext4, Btrfs, and XFS. Bar values were not recoverable from the extraction.]

Page 33:

Atomic writes – Transactional I/O

▸ A system call tells directFS that all I/O to this file should be treated as atomic

▸ Avoids the partial page write problem

▸ Accepted by the T10 technical committee for the SCSI standard

▸ Minimal application changes required

Page 34:

Percona Server, MariaDB, MySQL 5.6

▸ Efficient XtraDB/InnoDB storage engine

▸ Well optimized for seek-less storage like flash

▸ Many config parameters to fine-tune performance

▸ What else can be done?

• Lock contention can still be improved, as seen by using multiple instances with the same storage device

• Tapping into the native performance of flash by exposing key FTL features to the application

Page 35:

MySQL Writes Comparison

Traditional MySQL writes (database server with DRAM buffer; database on SSD or HDD):

1. Application initiates updates to pages A, B, and C.

2. MySQL copies updated pages to memory buffer.

3. MySQL writes to double-write buffer on the media.

4. Once step 3 is acknowledged, MySQL writes the updates to the actual tablespace.

MySQL with atomic writes (database server with DRAM buffer; database on ioMemory):

1. Application initiates updates to pages A, B, and C.

2. MySQL copies updated pages to memory buffer.

3. MySQL writes to actual tablespace, bypassing the double-write buffer step due to inherent atomicity guaranteed by the intelligent device.
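The two write paths compared on this slide differ only in whether each page is written to the media once or twice. A minimal sketch that models the media writes for pages A, B, and C; the 16 KiB page size is an assumption (InnoDB's default), and the function names are illustrative rather than MySQL internals.

```python
PAGE_SIZE = 16 * 1024  # assumption: 16 KiB pages, InnoDB's default


def traditional_flush(pages: list) -> list:
    """Media writes with the double-write buffer: every page is written twice."""
    writes = [("double-write buffer", page) for page in pages]
    # The tablespace writes start only after the double-write is acknowledged.
    writes += [("tablespace", page) for page in pages]
    return writes


def atomic_flush(pages: list) -> list:
    """Media writes when the device guarantees atomicity: one write per page."""
    return [("tablespace", page) for page in pages]


def bytes_written(writes: list, page_size: int = PAGE_SIZE) -> int:
    return len(writes) * page_size


if __name__ == "__main__":
    pages = ["A", "B", "C"]
    before = bytes_written(traditional_flush(pages))
    after = bytes_written(atomic_flush(pages))
    print(f"traditional: {before} bytes, atomic: {after} bytes, "
          f"ratio: {after / before:.0%}")
```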

Page 36:

Atomic benchmarks

First, let's sum up the MySQL benefits here:

• Writing only 50% of the data otherwise required for ACID compliance

That’s pretty much it… but it gives us

▸ Twice the flash endurance

▸ Much better latency because of fewer syscalls

▸ Much better application throughput due to less I/O

▸ Better concurrency due to fewer locks
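The endurance claim follows directly from the write volume: if the media absorbs half as many bytes for the same application workload, a fixed petabytes-written budget lasts twice as long. A minimal sketch with hypothetical numbers (a 10 PBW rating and 1 TB of application writes per day; neither figure is from the slides):

```python
def lifetime_years(rated_pbw: float, app_tb_per_day: float,
                   media_writes_per_app_write: float) -> float:
    """Years until the rated petabytes-written budget is used up."""
    media_tb_per_day = app_tb_per_day * media_writes_per_app_write
    return rated_pbw * 1000 / media_tb_per_day / 365


if __name__ == "__main__":
    # Hypothetical card and workload, for illustration only.
    with_doublewrite = lifetime_years(10, 1.0, media_writes_per_app_write=2)
    with_atomic = lifetime_years(10, 1.0, media_writes_per_app_write=1)
    print(f"double-write: {with_doublewrite:.1f} years, "
          f"atomic writes: {with_atomic:.1f} years")
```

The model counts only the page writes discussed on the slide; redo logging and flash-internal write amplification are left out.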

Page 37:

Atomic writes: 50% more TPC-C throughput

Page 38:

Fusion-io advanced development: Storage Class Memory

Memory-speed persistence

Byte-addressable vs. block-addressable

[Diagram: server virtual memory spanning small-capacity DRAM (volatile, $$/GB), small-capacity SCM (persistent), and big-capacity flash ($/GB)]

Page 39:

SCM research

▸ Let's look at keeping a database log using memory semantics

▸ The goal is to reduce latency and the cost of flushing data to a persistent state, and to further minimize writes

▸ SCM testing using a modified Innosim tool

Page 40:

SCM logger interface

▸ logger_open()

Open and initialize logging infrastructure within the FTL

▸ logger_close()

Clean-up

▸ logger_append()

Append to head of log at memory speeds. This basically translates to a memcpy()

▸ logger_sync()

Serialize data using the assembler ‘mfence’ instruction
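The shape of this four-call interface can be sketched with an ordinary memory-mapped file. This is not the Fusion-io API: the class below is a hypothetical stand-in in which a slice assignment into the mapping plays the role of the memcpy() in logger_append(), and mmap's flush() stands in for the mfence-based logger_sync(), since Python cannot issue that instruction directly. The record format (a 4-byte length prefix) is also an assumption.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<I")  # assumed record format: 4-byte length prefix


class MappedLog:
    """Append-only log kept in a memory-mapped file (illustrative stand-in)."""

    def __init__(self, path: str, capacity: int):
        # Counterpart of logger_open(): create the backing file and map it.
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT)
        os.ftruncate(self._fd, capacity)
        self._map = mmap.mmap(self._fd, capacity)
        self._head = 0

    def append(self, payload: bytes) -> int:
        # Counterpart of logger_append(): a plain copy into mapped memory.
        record = HEADER.pack(len(payload)) + payload
        end = self._head + len(record)
        if end > len(self._map):
            raise ValueError("log is full")
        self._map[self._head:end] = record
        offset, self._head = self._head, end
        return offset

    def sync(self) -> None:
        # Counterpart of logger_sync(): here a flush of the mapping to the file.
        self._map.flush()

    def close(self) -> None:
        # Counterpart of logger_close(): clean-up.
        self.sync()
        self._map.close()
        os.close(self._fd)


def read_log(path: str) -> list:
    """Read records back from the file; a zero length marks the end of the log."""
    with open(path, "rb") as handle:
        data = handle.read()
    records, pos = [], 0
    while pos + HEADER.size <= len(data):
        (length,) = HEADER.unpack_from(data, pos)
        if length == 0:
            break
        pos += HEADER.size
        records.append(data[pos:pos + length])
        pos += length
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        log_path = os.path.join(directory, "txn.log")
        log = MappedLog(log_path, capacity=4096)
        log.append(b"begin")
        log.append(b"commit")
        log.close()
        print(read_log(log_path))
```

The point of the interface on the slide is that an append costs a memory copy rather than a block I/O system call; the file-backed mapping here only imitates that shape.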

Page 41:

Practical Database Use Case: MySQL

[Chart: Innosim ops/sec, y-axis 0 to 18,000]

Baseline (log transactions through block I/O): 8,000

Scenario 1 (no logging): 16,000

Scenario 2 (log to Fusion-io ACM): 15,750

Nearly as fast as disabling the transaction log completely.
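The "nearly as fast" claim can be quantified from the three chart values. A minimal sketch of the arithmetic, assuming the values pair with the scenario labels in the order they appear on the slide:

```python
# Innosim ops/sec read from the chart, paired with the labels in slide order.
RESULTS_OPS_PER_SEC = {
    "baseline (log through block I/O)": 8_000,
    "scenario 1 (no logging)": 16_000,
    "scenario 2 (log to Fusion-io ACM)": 15_750,
}


def relative(result: int, reference: int) -> float:
    """Throughput of one run expressed as a multiple of another."""
    return result / reference


if __name__ == "__main__":
    acm = RESULTS_OPS_PER_SEC["scenario 2 (log to Fusion-io ACM)"]
    baseline = RESULTS_OPS_PER_SEC["baseline (log through block I/O)"]
    no_log = RESULTS_OPS_PER_SEC["scenario 1 (no logging)"]
    print(f"ACM vs block I/O logging: {relative(acm, baseline):.2f}x")
    print(f"ACM vs no logging: {relative(acm, no_log):.1%}")
```

Logging to ACM reaches about 98% of the no-logging throughput and roughly doubles the block I/O baseline.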

Page 42:

The coming shift in software development

▸ As an SSD, flash accelerates applications.

At full maturity, Non-Volatile Memory will transform software development.

Page 44:

fusionio.com | REDEFINE WHAT’S POSSIBLE

THANK YOU