Optimizing Flash-based Key-value Cache Systems


  • Optimizing Flash-based Key-value Cache Systems

    The 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage’16)

    †Department of Computing, Hong Kong Polytechnic University

    ‡Computer Science & Engineering, Louisiana State University

    Zhaoyan Shen†, Feng Chen‡, Yichen Jia‡, Zili Shao†

  • Key-value Information

    • Key-value access is dominant in web services

    – Many apps simply store and retrieve key-value pairs

    – Key-value cache is the first line of defense

    • Benefits: improve throughput, reduce latency, reduce server load

    – In-memory KV cache is popular (e.g., Memcached)

    • High speed, but suffers from cost, power, and capacity problems

  • Flash-based Key-value Cache

    [Figure: the key-value cache on a flash SSD. SHA-1(Key_1) is looked up in the in-memory hash index, which maps the key to a (slab, slot) location in the log-structured logical flash space]

    • In-memory hash map to track key-to-value mapping

    • Slabs are used in a log-structured way

    • Updated value items are written to new locations; old values are recycled later
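
    As a concrete illustration (a minimal sketch, not the authors' code), an entry of such an in-memory hash index could look like the following in C, assuming SHA-1 digests as keys and (slab, slot) locations as values:

    #include <stdint.h>

    /* Illustrative hash index entry: the SHA-1 digest of the key points to
     * the slab and slot that currently hold the value on flash. */
    struct hash_entry {
        uint8_t            md[20];    /* SHA-1 digest of the key */
        uint32_t           slab_id;   /* slab in the log-structured flash space */
        uint32_t           slot;      /* slot (offset) inside that slab */
        struct hash_entry *next;      /* chaining for hash collisions */
    };

    /* On an update, the new value is appended to the currently open slab and the
     * entry above is redirected to it; the old slot becomes garbage to recycle. */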

  • Critical Issues

    • Redundant mapping

    • Double garbage collection

    • Over-over-provisioning

  • Critical Issue 1: Redundant Mapping

    • Redundant mapping at the application and FTL levels

    – KVC: an in-memory hash table (Key → (Slab, Offset))

    – FTL: an on-device page mapping table (LBA → PBA)

    • Problems

    – Two mapping structures (unnecessarily) co-exist at two levels (sketched below)

    – A significant waste of on-device DRAM space (e.g., 1GB for a 1TB SSD)

    • The on-device DRAM is costly and unreliable, and could instead be used for data buffering

    [Figure: the KVC's software mapping (hash table: Key → (Slab, Offset) over the slab space) and the FTL's hardware mapping table (LBA → PBA) co-exist, each indexing the same data on flash]
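
    To make the redundant translation concrete, here is a rough, purely illustrative sketch of the two lookups; the constant and the FTL-side structure are assumptions, not taken from the talk:

    #include <stdint.h>

    #define SLAB_SECTORS 16384   /* assumed sectors per slab, for illustration only */

    /* Application level (KVC): key -> (slab, offset), then a logical block address. */
    uint64_t kvc_lba(uint32_t slab_id, uint32_t offset_sectors)
    {
        return (uint64_t)slab_id * SLAB_SECTORS + offset_sectors;
    }

    /* Device level (FTL), conceptually: the same address is translated once more,
     * pba = ftl_page_map[lba], consuming on-device DRAM for the mapping table. */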

  • Critical Issue 2: Double Garbage Collection

    • Garbage collection (GC) at both the app and FTL levels

    – KVC: recycles deleted or changed key-value items

    – FTL: recycles trimmed or changed pages

    • Problems

    – The semantic validity of a key-value entry is not visible to the FTL

    – Redundant data copy operations

    [Figure: double garbage collection. KVC-level GC copies valid items (e.g., CCC, DDD) between slabs in the slab space, while FTL-level GC independently copies valid pages between physical flash blocks]

  • Critical Issue 3: Over-over-provisioning

    • Over-provisioning at the FTL level

    – The FTL reserves a portion (20-30%) of the flash as Over-Provisioning Space (OPS)

    – OPS is invisible and unusable to the host applications

    • Problems

    – OPS is reserved to handle the worst-case write situation of general-purpose storage

    – Over-over-provisioning for key-value caches

    • Key-value caches are dominated by read (GET) traffic, not writes

    – The key-value cache hit ratio is highly sensitive to the usable cache size

    • If this 20-30% of space can be released, the cache hit ratio can be greatly improved

    [Figure: a large part of the flash is reserved as OPS by the FTL and remains unusable to the key-value cache, which can only address the visible slab space]

  • Semantic Gap Problem

    Semantic gap between the two layers:

    • Key-value cache: fine-grained GC, key-to-value mapping, validity of slab entries

    • Flash SSD: physical data layout on flash, direct flash memory control, proper mapping granularity

    In the current SW/HW architecture, we can do little to address these three issues.

  • Optimization Goals

    • Redundant mapping → Single mapping

    • Double garbage collection → App-driven GC

    • Over-over-provisioning → Dynamic OPS

  • Design Overview

    • An enhanced flash-aware key-value cache

    • A thin intermediate library layer (libssd)

    • A specialized flash memory PCI-E SSD hardware

    [Figure: architecture comparison. Conventional stack: key-value cache (K/V mapping, slab mgmt., cache mgmt.) on top of the OS (page cache, I/O scheduling, device driver) and a flash SSD whose FTL performs page mapping, garbage collection, wear leveling, OPS management, and bad block management. Proposed stack: key-value cache (key/slab mapping, KVC-driven GC, OPS management, slab manager, cache manager) on top of the libssd library (operation conversion, slab/flash translation, bad block management) and an open-channel SSD exposing raw flash operations]
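
    As an illustration only, the slab-level calls that such a thin library might expose could look like the prototypes below; the names and signatures are hypothetical, not the real libssd API:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical slab-level interface: libssd converts these calls into raw
     * flash read/program/erase commands on the open-channel SSD and hides bad
     * blocks from the key-value cache. */
    int slab_read (int dev_fd, uint32_t slab_id, uint32_t offset, void *buf, size_t len);
    int slab_write(int dev_fd, uint32_t slab_id, const void *buf, size_t len);
    int slab_erase(int dev_fd, uint32_t slab_id);   /* one flash block erase */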

  • An Enhanced Flash-aware Key-value Cache

    • Slab management

    • Unified direct mapping

    • Garbage collection

    • OPS management


  • Slab Management

    • Our choice: directly use a flash block as a slab (8MB)

    • Slab buffer: An in-memory slab maintained for each class

    – Parallelize the slab write I/Os to flash memory and issue them asynchronously

    • Round-robin allocation of in-flash slabs for load balancing across channels (see the sketch below)

    [Figure: key/value items fill per-class in-memory slab buffers (class i ... class n) in the key-value cache; full slabs are flushed to in-flash slabs spread across channels #0-#4 of the flash SSD]
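
    A minimal sketch of the round-robin channel choice for newly allocated in-flash slabs; the channel count matches the hardware described later, but the code itself is illustrative, not taken from the prototype:

    #include <stdint.h>

    #define NUM_CHANNELS 12   /* the open-channel SSD used in the experiments has 12 channels */

    static uint32_t next_channel;

    /* Cycle through the channels so that consecutive slab flushes land on
     * different channels and their write I/Os can proceed in parallel. */
    uint32_t alloc_slab_channel(void)
    {
        uint32_t ch = next_channel;
        next_channel = (next_channel + 1) % NUM_CHANNELS;
        return ch;
    }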

  • Unified Direct Mapping

    [Figure: prior design. The hash table maps SHA-1(Key) to (slab, offset) in the slab space, and the FTL mapping table then maps LBAs to PBAs in flash memory]

    • Collapse multiple levels of indirect mapping into only one (see the lookup sketch below)

    – Prior mapping: Key → Slab → LBA → PBA

    – Current mapping: Key → Slab (i.e., flash block)

    • Benefits

    – Removes the time overhead of looking up intermediate layers (Speed+)

    – Only the single, must-have in-memory hash table is needed (Cost-)

    – On-device RAM space can be completely removed (or repurposed for other uses)

    [Figure: new design. The hash table maps SHA-1(Key) directly to (Slab, Offset), where each slab (Slab #1, Slab #2, ...) is a flash block]
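
    A minimal sketch of the collapsed lookup path, reusing the hash_entry sketch from earlier; hash_lookup is a hypothetical helper and the whole fragment is illustrative:

    #include <stdint.h>

    struct location { uint32_t slab; uint32_t offset; };

    /* Hypothetical helper: probe the in-memory hash table by SHA-1 digest. */
    extern struct hash_entry *hash_lookup(const uint8_t md[20]);

    /* One lookup resolves the key to its physical location: the slab id names a
     * flash block directly, so no FTL page-mapping step is needed. */
    int kv_locate(const uint8_t md[20], struct location *loc)
    {
        struct hash_entry *e = hash_lookup(md);
        if (e == NULL)
            return -1;              /* cache miss */
        loc->slab   = e->slab_id;   /* slab == flash block */
        loc->offset = e->slot;
        return 0;
    }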

  • App-driven Garbage Collection

    • One single GC, driven directly by the key-value cache system

    – All slab writes are in units of flash blocks (no need for device-level GC)

    – GC is directly triggered and controlled by application-level KVC

    • Two GC policies (see the sketch below)

    – Greedy GC: the least-occupied slab is evicted, so the fewest valid slots have to be moved

    – Quick clean: the LRU slab is immediately dropped, reclaiming it without copying its valid slots

    – The two policies are used adaptively in different circumstances (busy or idle)

    [Figure: greedy GC copies the valid slots (CCC, DDD) from the victim slab to a target slab before recycling it; quick clean immediately drops the LRU slab]
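
    A minimal sketch of how the two policies could be dispatched; every helper here (lru_slab, least_occupied_slab, and so on) is hypothetical and only names the step it stands for:

    struct slab;                                  /* opaque slab descriptor */

    extern struct slab *lru_slab(void);
    extern struct slab *least_occupied_slab(void);
    extern struct slab *current_target_slab(void);
    extern void drop_slab(struct slab *s);
    extern void copy_valid_slots(struct slab *from, struct slab *to);
    extern void erase_slab(struct slab *s);

    /* Quick clean when the system is busy (writes are piling up); greedy GC
     * when it is idle and copying valid slots is affordable. */
    void run_gc(int busy)
    {
        if (busy) {
            /* Quick clean: drop the LRU slab at once, without copying; even its
             * valid items are discarded and can be re-fetched later, since this
             * is a cache rather than a store. */
            drop_slab(lru_slab());
        } else {
            /* Greedy GC: pick the least-occupied slab so that the fewest valid
             * slots have to be copied into the target slab. */
            struct slab *victim = least_occupied_slab();
            copy_valid_slots(victim, current_target_slab());
            erase_slab(victim);                   /* one flash block erase */
        }
    }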

  • Over-Provisioning Space Management

    • Dynamically tune the OPS space online

    – Rationale: KVC workloads are typically read-intensive, so OPS can be small

    – Goal: keep just enough OPS space (adaptive to the intensity of writes)

    • OPS management policies

    – Heuristic method: an OPS window (low watermark WL, high watermark WH) estimates the needed size (see the sketch below)

    • Low watermark hit: trigger quick clean and raise the OPS window

    • High watermark hit: OPS is over-allocated, so lower the OPS window

    [Figure: OPS size over time, kept between the watermarks WL and WH and bounded by the maximum OPS; quick clean is initiated whenever the low watermark WL is hit]
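
    A minimal sketch of the watermark logic; the step size and the trigger_quick_clean hook are assumptions for illustration, not values from the talk:

    #include <stddef.h>

    struct ops_window { size_t wl, wh; };   /* low (WL) and high (WH) watermarks, in blocks */

    extern void trigger_quick_clean(void);  /* hypothetical hook into the GC module */

    /* Called whenever the amount of free OPS changes. */
    void ops_adjust(size_t free_ops, struct ops_window *win)
    {
        const size_t step = 8;              /* assumed adjustment step, in blocks */

        if (free_ops <= win->wl) {
            /* Low watermark hit: writes are intense, reclaim space quickly and
             * raise the window so that more OPS is kept in reserve. */
            trigger_quick_clean();
            win->wl += step;
            win->wh += step;
        } else if (free_ops >= win->wh) {
            /* High watermark hit: OPS is over-allocated, lower the window and
             * let the released blocks serve as usable cache space instead. */
            if (win->wl > step) win->wl -= step;
            if (win->wh > step) win->wh -= step;
        }
    }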

  • Preliminary Experiments

    • Implementation

    – Key-value cache: Twitter's Fatcache, modified to fit the hardware

    – libssd library (621 lines of C code)

    • Experimental setup

    – Intel Xeon E-1225, 32GB memory, 1TB disk, Open-Channel SSD

    – Ubuntu 14.04 LTS, Linux 3.17.8, Ext4 filesystem

    • Hardware: Open-Channel SSD

    – A PCI-E based SSD with 12 channels and 192 LUNs

    – Direct control of the device (via the ioctl interface)

  • SET – Throughput and Latency

    • SET workloads: 40 million requests of 1KB key/value pairs

    • Both the SET throughput and the SET latency of our scheme are the best

    [Figure: SET throughput and latency, our scheme vs. stock Fatcache]

  • Conclusion

    • KV stores have become critical: they are one of the most important building blocks in modern web infrastructures and high-performance data-intensive applications.

    • We build a highly efficient flash-based key-value cache system that provides three benefits: a unified single-level direct mapping, cache-driven fine-grained garbage collection, and an adaptive over-provisioning scheme.

    • We are implementing a prototype on the Open-Channel SSD hardware, and our preliminary results show that the approach is highly promising.

  • Thank You !

  • Backup Slides

  • GET – Throughput and Latency

    • GET performance is largely determined by the raw speed

    • GET latencies are among the lowest across the tested SSDs

    [Figure: GET throughput and latency, our scheme vs. stock Fatcache]

  • SET Latencies – A Zoom-in View

    • The direct control successfully removes high-cost GC effects

    • Quick clean removes long I/Os under high write pressure

    [Figure: a zoomed-in view of SET latencies, our scheme vs. stock Fatcache + KingSpec SSD]

  • Block Erase Count

    • Trace collected while running Fatcache on a Samsung SSD

    • The block trace is replayed on MSR's SSD simulator to obtain erase counts

    • Our solution reduces the erase count by 30%

    [Figure: erase count, SSDSim vs. Open-channel SSD; the annotation marks where quick clean kicks in]

  • Effect of the In-memory Slab Buffer

    • A 10x difference in buffer size does not affect latency significantly

    • A small (56MB) in-memory slab buffer is sufficient