Optimizing Flash-based Key-value Cache Systems


  • Optimizing Flash-based Key-value Cache Systems

    The 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage’16)

    †Department of Computing, Hong Kong Polytechnic University

    ‡Computer Science & Engineering, Louisiana State University

    Zhaoyan Shen†, Feng Chen‡, Yichen Jia‡, Zili Shao†

  • Key-value Information

    • Key-value access is dominant in web services

    – Many apps simply store and retrieve key-value pairs

    – Key-value cache is the first line of defense

    • Benefits: improve throughput, reduce latency, reduce server load

    – In-memory KV cache is popular (e.g., Memcached)

    • High speed, but suffers from cost, power, and capacity problems

  • Flash-based Key-value Cache

    [Figure: the key-value cache on a flash SSD. SHA-1(Key_1) is looked up in the in-memory hash index, which maps the key to a (slab, slot) location in the log-structured logical flash space]

    • In-memory hash map to track key-to-value mapping

    • Slabs are used in a log-structured way

    • Updated value items are written to new locations; old values are recycled later
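
    As a concrete illustration (a minimal sketch, not the authors' code), an entry of such an in-memory hash index could look like the following in C, assuming SHA-1 digests as keys and (slab, slot) locations as values:

    #include <stdint.h>

    /* Illustrative hash index entry: the SHA-1 digest of the key points to
     * the slab and slot that currently hold the value on flash. */
    struct hash_entry {
        uint8_t            md[20];    /* SHA-1 digest of the key */
        uint32_t           slab_id;   /* slab in the log-structured flash space */
        uint32_t           slot;      /* slot (offset) inside that slab */
        struct hash_entry *next;      /* chaining for hash collisions */
    };

    /* On an update, the new value is appended to the currently open slab and the
     * entry above is redirected to it; the old slot becomes garbage to recycle. */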

  • Critical Issues

    • Redundant mapping

    • Double garbage collection

    • Over-over-provisioning

  • Critical Issue 1: Redundant Mapping

    • Redundant mapping at the application and FTL levels

    – KVC: an in-memory hash table (Key → (Slab, Offset))

    – FTL: an on-device page mapping table (LBA → PBA)

    • Problems

    – Two mapping structures (unnecessarily) co-exist at two levels (sketched below)

    – A significant waste of on-device DRAM space (e.g., 1GB for a 1TB SSD)

    • The on-device DRAM is costly and unreliable, and could instead be used for data buffering

    [Figure: the KVC's software mapping (hash table: Key → (Slab, Offset) over the slab space) and the FTL's hardware mapping table (LBA → PBA) co-exist, each indexing the same data on flash]
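
    To make the redundant translation concrete, here is a rough, purely illustrative sketch of the two lookups; the constant and the FTL-side structure are assumptions, not taken from the talk:

    #include <stdint.h>

    #define SLAB_SECTORS 16384   /* assumed sectors per slab, for illustration only */

    /* Application level (KVC): key -> (slab, offset), then a logical block address. */
    uint64_t kvc_lba(uint32_t slab_id, uint32_t offset_sectors)
    {
        return (uint64_t)slab_id * SLAB_SECTORS + offset_sectors;
    }

    /* Device level (FTL), conceptually: the same address is translated once more,
     * pba = ftl_page_map[lba], consuming on-device DRAM for the mapping table. */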

  • Critical Issue 2: Double Garbage Collection

    • Garbage collection (GC) at both the app and FTL levels

    – KVC: recycles deleted or changed key-value items

    – FTL: recycles trimmed or changed pages

    • Problems

    – The semantic validity of a key-value entry is not visible to the FTL

    – Redundant data copy operations

    [Figure: double garbage collection. KVC-level GC copies valid items (e.g., CCC, DDD) between slabs in the slab space, while FTL-level GC independently copies valid pages between physical flash blocks]

  • Critical Issue 3: Over-over-provisioning

    • Over-provisioning at the FTL level

    – The FTL reserves a portion (20-30%) of the flash as Over-Provisioning Space (OPS)

    – OPS is invisible and unusable to the host applications

    • Problems

    – OPS is reserved to handle the worst-case write situation of general-purpose storage

    – Over-over-provisioning for key-value caches

    • Key-value caches are dominated by read (GET) traffic, not writes

    – The key-value cache hit ratio is highly sensitive to the usable cache size

    • If this 20-30% of space can be released, the cache hit ratio can be greatly improved

    [Figure: a large part of the flash is reserved as OPS by the FTL and remains unusable to the key-value cache, which can only address the visible slab space]

  • Semantic Gap Problem

    Semantic gap between the two layers:

    • Key-value cache: fine-grained GC, key-to-value mapping, validity of slab entries

    • Flash SSD: physical data layout on flash, direct flash memory control, proper mapping granularity

    In the current SW/HW architecture, we can do little to address these three issues.

  • Optimization Goals

    • Redundant mapping → Single mapping

    • Double garbage collection → App-driven GC

    • Over-over-provisioning → Dynamic OPS

  • Design Overview

    • An enhanced flash-aware key-value cache

    • A thin intermediate library layer (libssd)

    • A specialized flash memory PCI-E SSD hardware

    [Figure: architecture comparison. Conventional stack: key-value cache (K/V mapping, slab mgmt., cache mgmt.) on top of the OS (page cache, I/O scheduling, device driver) and a flash SSD whose FTL performs page mapping, garbage collection, wear leveling, OPS management, and bad block management. Proposed stack: key-value cache (key/slab mapping, KVC-driven GC, OPS management, slab manager, cache manager) on top of the libssd library (operation conversion, slab/flash translation, bad block management) and an open-channel SSD exposing raw flash operations]
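
    As an illustration only, the slab-level calls that such a thin library might expose could look like the prototypes below; the names and signatures are hypothetical, not the real libssd API:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical slab-level interface: libssd converts these calls into raw
     * flash read/program/erase commands on the open-channel SSD and hides bad
     * blocks from the key-value cache. */
    int slab_read (int dev_fd, uint32_t slab_id, uint32_t offset, void *buf, size_t len);
    int slab_write(int dev_fd, uint32_t slab_id, const void *buf, size_t len);
    int slab_erase(int dev_fd, uint32_t slab_id);   /* one flash block erase */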

  • An Enhanced Flash-aware Key-value Cache

    • Slab management

    • Unified direct mapping

    • Garbage collection

    • OPS management


  • Slab Management

    • Our choice: directly use a flash block as a slab (8MB)

    • Slab buffer: An in-memory slab maintained for each class

    – Parallelize the slab write I/Os to flash memory and issue them asynchronously

    • Round-robin allocation of in-flash slabs for load balancing across channels (see the sketch below)

    [Figure: key/value items fill per-class in-memory slab buffers (class i ... class n) in the key-value cache; full slabs are flushed to in-flash slabs spread across channels #0-#4 of the flash SSD]
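
    A minimal sketch of the round-robin channel choice for newly allocated in-flash slabs; the channel count matches the hardware described later, but the code itself is illustrative, not taken from the prototype:

    #include <stdint.h>

    #define NUM_CHANNELS 12   /* the open-channel SSD used in the experiments has 12 channels */

    static uint32_t next_channel;

    /* Cycle through the channels so that consecutive slab flushes land on
     * different channels and their write I/Os can proceed in parallel. */
    uint32_t alloc_slab_channel(void)
    {
        uint32_t ch = next_channel;
        next_channel = (next_channel + 1) % NUM_CHANNELS;
        return ch;
    }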

  • Unified Direct Mapping

    [Figure: prior design. The hash table maps SHA-1(Key) to (slab, offset) in the slab space, and the FTL mapping table then maps LBAs to PBAs in flash memory]

    • Collapse multiple levels of indirect mapping into only one (see the lookup sketch below)

    – Prior mapping: Key → Slab → LBA → PBA

    – Current mapping: Key → Slab (i.e., flash block)

    • Benefits

    – Removes the time overhead of looking up intermediate layers (Speed+)

    – Only the single, must-have in-memory hash table is needed (Cost-)

    – On-device RAM space can be completely removed (or repurposed for other uses)

    [Figure: new design. The hash table maps SHA-1(Key) directly to (Slab, Offset), where each slab (Slab #1, Slab #2, ...) is a flash block]
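
    A minimal sketch of the collapsed lookup path, reusing the hash_entry sketch from earlier; hash_lookup is a hypothetical helper and the whole fragment is illustrative:

    #include <stdint.h>

    struct location { uint32_t slab; uint32_t offset; };

    /* Hypothetical helper: probe the in-memory hash table by SHA-1 digest. */
    extern struct hash_entry *hash_lookup(const uint8_t md[20]);

    /* One lookup resolves the key to its physical location: the slab id names a
     * flash block directly, so no FTL page-mapping step is needed. */
    int kv_locate(const uint8_t md[20], struct location *loc)
    {
        struct hash_entry *e = hash_lookup(md);
        if (e == NULL)
            return -1;              /* cache miss */
        loc->slab   = e->slab_id;   /* slab == flash block */
        loc->offset = e->slot;
        return 0;
    }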

  • App-driven Garbage Collection

    • One single GC, driven directly by the key-value cache system

    – All slab writes are in units of flash blocks (no need for device-level GC)

    – GC is directly triggered and controlled by application-level KVC

    • Two GC policies (see the sketch below)

    – Greedy GC: the least-occupied slab is evicted, so the fewest valid slots have to be moved

    – Quick clean: the LRU slab is immediately dropped, reclaiming it without copying its valid slots

    – The two policies are used adaptively in different circumstances (busy or idle)

    [Figure: greedy GC copies the valid slots (CCC, DDD) from the victim slab to a target slab before recycling it; quick clean immediately drops the LRU slab]
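
    A minimal sketch of how the two policies could be dispatched; every helper here (lru_slab, least_occupied_slab, and so on) is hypothetical and only names the step it stands for:

    struct slab;                                  /* opaque slab descriptor */

    extern struct slab *lru_slab(void);
    extern struct slab *least_occupied_slab(void);
    extern struct slab *current_target_slab(void);
    extern void drop_slab(struct slab *s);
    extern void copy_valid_slots(struct slab *from, struct slab *to);
    extern void erase_slab(struct slab *s);

    /* Quick clean when the system is busy (writes are piling up); greedy GC
     * when it is idle and copying valid slots is affordable. */
    void run_gc(int busy)
    {
        if (busy) {
            /* Quick clean: drop the LRU slab at once, without copying; even its
             * valid items are discarded and can be re-fetched later, since this
             * is a cache rather than a store. */
            drop_slab(lru_slab());
        } else {
            /* Greedy GC: pick the least-occupied slab so that the fewest valid
             * slots have to be copied into the target slab. */
            struct slab *victim = least_occupied_slab();
            copy_valid_slots(victim, current_target_slab());
            erase_slab(victim);                   /* one flash block erase */
        }
    }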

  • Over-Provisioning Space Management

    • Dynamically tune the OPS space online

    – Rationale: KVC workloads are typically read-intensive, so OPS can be small

    – Goal: keep just enough OPS space (adaptive to the intensity of writes)

    • OPS management policies

    – Heuristic method: an OPS window (low watermark WL, high watermark WH) estimates the needed size (see the sketch below)

    • Low watermark hit: trigger quick clean and raise the OPS window

    • High watermark hit: OPS is over-allocated, so lower the OPS window

    [Figure: OPS size over time, kept between the watermarks WL and WH and bounded by the maximum OPS; quick clean is initiated whenever the low watermark WL is hit]
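
    A minimal sketch of the watermark logic; the step size and the trigger_quick_clean hook are assumptions for illustration, not values from the talk:

    #include <stddef.h>

    struct ops_window { size_t wl, wh; };   /* low (WL) and high (WH) watermarks, in blocks */

    extern void trigger_quick_clean(void);  /* hypothetical hook into the GC module */

    /* Called whenever the amount of free OPS changes. */
    void ops_adjust(size_t free_ops, struct ops_window *win)
    {
        const size_t step = 8;              /* assumed adjustment step, in blocks */

        if (free_ops <= win->wl) {
            /* Low watermark hit: writes are intense, reclaim space quickly and
             * raise the window so that more OPS is kept in reserve. */
            trigger_quick_clean();
            win->wl += step;
            win->wh += step;
        } else if (free_ops >= win->wh) {
            /* High watermark hit: OPS is over-allocated, lower the window and
             * let the released blocks serve as usable cache space instead. */
            if (win->wl > step) win->wl -= step;
            if (win->wh > step) win->wh -= step;
        }
    }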

  • Preliminary Experiments

    • Implementation

    – Key-value cache: Twitter's Fatcache, modified to fit the hardware

    – libssd library (621 lines of C code)

    • Experimental setup

    – Intel Xeon E-1225, 32GB memory, 1TB disk, Open-Channel SSD

    – Ubuntu 14.04 LTS, Linux 3.17.8, Ext4 filesystem

    • Hardware: Open-Channel SSD

    – A PCI-E based SSD with 12 channels and 192 LUNs

    – Direct control of the device (via the ioctl interface)

  • SET – Throughput and Latency

    • SET workloads: 40 million requests of 1KB key/value pairs

    • Both the SET throughput and the SET latency of our scheme are the best

    [Figure: SET throughput and latency, our scheme vs. stock Fatcache]

  • Conclusion

    • KV stores have become critical: they are one of the most important building blocks in modern web infrastructures and high-performance data-intensive applications.

    • We build a highly efficient flash-based key-value cache system that provides three benefits: a unified single-level direct mapping, cache-driven fine-grained garbage collection, and an adaptive over-provisioning scheme.

    • We are implementing a prototype on the Open-Channel SSD hardware, and our preliminary results show that the approach is highly promising.

  • Thank You !

  • Backup Slides

  • GET – Throughput and Latency

    • GET performance is largely determined by the raw speed

    • GET latencies are among the lowest across the tested SSDs

    [Figure: GET throughput and latency, our scheme vs. stock Fatcache]

  • SET Latencies – A Zoom-in View

    • The direct control successfully removes high-cost GC effects

    • Quick clean removes long I/Os under high write pressure

    [Figure: a zoomed-in view of SET latencies, our scheme vs. stock Fatcache + KingSpec SSD]

  • Block Erase Count

    • Trace collected while running Fatcache on a Samsung SSD

    • The block trace is replayed on MSR's SSD simulator to obtain erase counts

    • Our solution reduces the erase count by 30%

    [Figure: erase count, SSDSim vs. Open-channel SSD; the annotation marks where quick clean kicks in]

  • Effect of the In-memory Slab Buffer

    • A 10x difference in buffer size does not affect latency significantly

    • A small (56MB) in-memory slab buffer is sufficient