Some key-value stores using log-structure

Zhichao Liang, frankey0207@gmail.com

LevelDB, Riak

Outline
• Why log structure?
• Riak: log-structured hash table
• RethinkDB: log-structured B-tree
• LevelDB: log-structured merge tree
• Conclusion


Log Structure
• A log-structured file system is a file system design first proposed in 1988 by John K. Ousterhout and Fred Douglis.
• Designed for high write throughput: all updates to data and metadata are written sequentially to a continuous stream, called a log.
• Conventional file systems, by contrast, tend to lay out files with great care for spatial locality and make in-place changes to their data structures.

Log Structure for SSD
• Random writes degrade system performance and shorten the lifetime of an SSD.
• Log structure is natively SSD-friendly!

[Figure: updating data on a magnetic disk versus an SSD. Labels: free, block, RAM, erased, data 1 to data 4, new data 1, new data 3.]


Riak?
• Riak is an open source, highly scalable, fault-tolerant distributed database.
• Supported core features:
- operates in highly distributed environments
- no single point of failure
- highly fault-tolerant
- scales simply and intelligently
- high data availability
- low cost of operations

Bitcask
• A Bitcask instance is a directory, and only one operating system process will open that Bitcask for writing at a given time.
• The active file is written only by appending, so sequential writes do not require disk seeking.

Hash Index: keydir
• A keydir is simply a hash table that maps every key in a Bitcask to a fixed-size structure giving the file, offset and size of the most recently written entry for that key.
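The append-only file and the keydir can be sketched together in a few lines of Python. This is only a toy illustration, not Riak's code: the class name `TinyBitcask`, the single data file, and the record layout (raw value bytes only, with no CRC, timestamp, or key stored on disk) are all assumptions made for brevity.

```python
import os


class TinyBitcask:
    """Toy sketch: one append-only data file plus an in-memory keydir."""

    def __init__(self, path):
        self.path = path
        # keydir: key -> (file, offset, size) of the most recently written value
        self.keydir = {}
        # "ab+" opens for appending and reading; writes always land at the end
        self.active = open(path, "ab+")

    def put(self, key, value):
        self.active.seek(0, os.SEEK_END)
        offset = self.active.tell()      # where this value will start
        self.active.write(value)         # append only, never overwrite
        self.active.flush()
        self.keydir[key] = (self.path, offset, len(value))

    def get(self, key):
        _file, offset, size = self.keydir[key]
        self.active.seek(offset)         # one seek, one read
        return self.active.read(size)

    def close(self):
        self.active.close()
```

Writing the same key twice leaves both values in the file; only the keydir entry moves, so a read always returns the latest value.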

Merge
• The merge process iterates over all non-active files and produces as output a set of data files containing only the “live” or latest versions of each present key.
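A minimal sketch of the merge step, under simplifying assumptions: data files are modeled as in-memory lists of `(key, value)` records, the keydir stores `(file_id, index)` instead of a byte offset, and the output is a single merged file rather than a set of files. The function name `merge` and its parameters are hypothetical.

```python
def merge(files, keydir, active_id, merged_id):
    """Copy the live records of every non-active file into one new file.

    files:  dict file_id -> list of (key, value) records in write order
    keydir: dict key -> (file_id, index) of the latest record for that key
    merged_id must be a file id that is not in use yet (assumption).
    """
    merged = []
    old_ids = [f for f in sorted(files) if f != active_id]
    for file_id in old_ids:
        for index, (key, value) in enumerate(files[file_id]):
            # A record is live only if the keydir still points at it.
            if keydir.get(key) == (file_id, index):
                keydir[key] = (merged_id, len(merged))
                merged.append((key, value))
    for file_id in old_ids:
        del files[file_id]               # stale versions disappear here
    files[merged_id] = merged
    return merged
```

The active file is skipped because it is still being appended to; a key whose latest version lives there is simply not copied from the older files.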


RethinkDB?
• RethinkDB is a persistent, industrial-strength key-value store with full support for the Memcached protocol.
• Powerful technology:
- Linear scaling across cores
- Fine-grained durability control
- Instantaneous recovery on power failure
• Supported core features:
- Atomic increment/decrement
- Values up to 10MB in size
- Multi-GET support
- Up to one million transactions per second on commodity hardware

Installation & usage
• RethinkDB works on modern 64-bit distributions of Linux:
- Ubuntu 10.04.1 x86_64
- Ubuntu 10.10 x86_64
- Red Hat Enterprise Linux 5 x86_64
- CentOS 5 x86_64
- SUSE Linux 10
• Running the rethinkdb server (default installation path: /usr/bin/rethinkdb-1.0):
    ./rethinkdb-1.0 -f /u01/rethinkdb_data
    ./rethinkdb-1.0 -f /u01/rethinkdb_data -c 4 -p 11500
    ./rethinkdb-1.0 -f /u01/rethinkdb_data -f /u03/rethinkdb_data -c 4 -p 11500

The methodology
• First, the lack of mechanical parts makes random reads on SSDs significantly more efficient!
• Second, random writes trigger more erases, making these operations expensive and decreasing the drive lifetime!
• RethinkDB takes an append-only approach to storing data, pioneered by log-structured file systems!

What are the consequences of append-only?

Append-only consequences
• Data Consistency
• Hot Backups
• Instantaneous Recovery
• Easy Replication
• Lock-Free Concurrency
• Live Schema Changes
• Database Snapshots

1) Eliminating data locality requires a larger number of disk accesses.
2) A large amount of data quickly becomes obsolete in an environment with a heavy insert or update workload.

Append-only B-tree

[Figure: B-tree pages Page 1 (key 15), Page 2 (keys 5, 9) and Page 3 (keys 15, 19) and their layout in the data file, with new copies of Page 3 and Page 1 appended.]
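The idea of appending new copies of pages instead of overwriting them can be sketched as follows. This is not RethinkDB's implementation: the class `AppendOnlyTree`, the fixed two-level shape (one root, leaves that never split), and the use of a Python list as the "data file" are assumptions chosen to keep the example short.

```python
import bisect


class AppendOnlyTree:
    """Toy two-level tree whose pages are only ever appended to a log."""

    def __init__(self, separators):
        self.log = []  # stands in for the data file; index = page offset
        leaves = [self._append({"items": {}}) for _ in range(len(separators) + 1)]
        self.root = self._append({"keys": list(separators), "children": leaves})

    def _append(self, page):
        self.log.append(page)
        return len(self.log) - 1

    def put(self, key, value):
        root = self.log[self.root]
        slot = bisect.bisect_right(root["keys"], key)
        old_leaf = self.log[root["children"][slot]]
        # Append a modified copy of the leaf; the old copy stays untouched.
        new_leaf = self._append({"items": {**old_leaf["items"], key: value}})
        # Append a new root that points at the new leaf.
        children = list(root["children"])
        children[slot] = new_leaf
        self.root = self._append({"keys": root["keys"], "children": children})

    def get(self, key, root=None):
        page = self.log[self.root if root is None else root]
        slot = bisect.bisect_right(page["keys"], key)
        return self.log[page["children"][slot]]["items"].get(key)
```

With a separator of 15, keys 5 and 9 land in the left leaf and keys 15 and 19 in the right one, as in the figure. Each `put` appends exactly two pages (leaf and root), and any older root offset still describes a consistent earlier state, which is one way to read the snapshot and recovery items among the append-only consequences.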


LevelDB?
• LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
• Supported core features:
- Data is stored sorted by key
- Multiple changes can be made in one atomic batch
- Users can create a transient snapshot to get a consistent view of data
- Data is automatically compressed using the Snappy compression library

Installation & usage
• LevelDB works with snappy, which is a compression/decompression library.
    download snappy from http://code.google.com/p/snappy/
    cd snappy-1.0.4
    ./configure && make && make install
• It is a library, not a database server!
    svn checkout http://leveldb.googlecode.com/svn/trunk/ leveldb-read-only
    cd leveldb-read-only
    make && cp libleveldb.a /usr/local/lib && cp -r include/leveldb /usr/local/include
The build produces libleveldb.a.

Log-structure merge tree
• LevelDB
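As a rough illustration of the log-structured merge idea, the sketch below buffers writes in memory, flushes them as immutable sorted runs, and merges runs during compaction. It is not LevelDB's design or API: the class `TinyLSM`, the `memtable_limit` parameter, the single flat list of runs (no levels), and the absence of a write-ahead log and of deletes are all simplifying assumptions.

```python
import bisect


class TinyLSM:
    """Toy log-structured merge store: a memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}                  # recent writes, held in memory
        self.memtable_limit = memtable_limit
        self.runs = []                      # sorted (key, value) lists, newest last

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as one sorted, never-modified run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):     # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs, oldest first, so newer values overwrite older ones.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

Every flush and compaction writes a whole sorted run sequentially, so updates never modify data in place; reads pay for that by possibly checking several runs until compaction merges them.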


Conclusion
• Log-structure