RocksDB and MongoRocks - Percona

Post on 06-Feb-2018

RocksDB and MongoRocks

Islam AbdelRahman Software Engineer

What is MongoRocks

• MongoDB using RocksDB storage engine

What is RocksDB

• Embedded persistent key-value store
• Optimized for server workloads
• Open source
• Used by Facebook, LinkedIn, Yahoo, Microsoft, Netflix, Airbnb, Pinterest …

RocksDB Architecture

Log Structured Merge Trees

Memtable (64 MB) (newer)
Level 0 (256 MB)
Level 1 (512 MB)
Level 2 (5 GB)
Level 3 (50 GB)
Level 4 (500 GB) (older)

Writes

[Diagram: a (Key, Value) pair is written to the WAL and to the Memtable (64 MB), above Level 0 (256 MB).]

Flush

[Diagram: the Memtable (64 MB) is flushed to a new file in Level 0 (256 MB).]

Compaction

[Diagram, two slides: Memtable (64 MB), Level 0 (256 MB), Level 1 (512 MB), Level 2 (5 GB), Level 3 (50 GB), Level 4 (500 GB). The second slide marks three files as new after compaction.]

Writes

• Foreground
  • Write to memtable + Write Ahead Log
• Background
  • Flush
  • Compaction
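The foreground and background steps of the write path can be sketched in a few lines of Python. This is a toy model, not RocksDB's implementation: the class name `TinyLSM`, the entry-count threshold `memtable_limit`, and the JSON file layout are all assumptions made for illustration, and the flush runs synchronously rather than in a background thread.

```python
import json
import os
import tempfile


class TinyLSM:
    """Toy write path: append to a WAL, update a memtable, flush to Level 0."""

    def __init__(self, directory, memtable_limit=4):
        self.directory = directory
        self.memtable_limit = memtable_limit  # hypothetical: counts entries, not bytes
        self.memtable = {}
        self.level0 = []  # paths of flushed files, newest last
        self.wal_path = os.path.join(directory, "wal.log")
        self.wal = open(self.wal_path, "a", encoding="utf-8")

    def put(self, key, value):
        # Foreground work: log the write first, then apply it in memory.
        self.wal.write(json.dumps([key, value]) + "\n")
        self.wal.flush()
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Background work in a real engine; done inline here for simplicity.
        if not self.memtable:
            return
        path = os.path.join(self.directory, f"l0_{len(self.level0):06d}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(sorted(self.memtable.items()), f)  # keys stored in sorted order
        self.level0.append(path)
        self.memtable = {}
        # The flushed entries are now durable in Level 0, so the WAL can restart.
        self.wal.close()
        self.wal = open(self.wal_path, "w", encoding="utf-8")

    def close(self):
        self.wal.close()
```

Compaction is left out of the sketch; it would merge the sorted Level 0 files into the next level.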

File format

Data Block | Data Block | Data Block | Data Block | Data Block | Index Block | Bloom Filter Block | Statistics Block

File format (Data Block)

Raw entries:
AAAAAAA : VAL
AAAAAAB : VAL
AAAAAAC : VAL
AABAAAA : VAL
AABAAAX : VAL

Prefix-encoded entries (shared-prefix length in brackets):
AAAAAAA : VAL
[6]B : VAL
[6]C : VAL
[2]BAAAA : VAL
[6]X : VAL

Compressed block (Snappy / Zlib / etc.)
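The bracketed numbers in the data block slide are the count of leading characters each key shares with the key before it. A minimal Python sketch of that encoding follows, under the assumption that every key is encoded against its immediate predecessor; the function names and the text serialization passed to `zlib` are hypothetical, and the real block format has more to it (for example, periodic restart points that store a full key).

```python
import zlib


def shared_prefix_len(a, b):
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(keys):
    """Turn sorted keys into (shared_length, suffix) pairs."""
    encoded = []
    previous = ""
    for key in keys:
        shared = shared_prefix_len(previous, key)
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the full keys from (shared_length, suffix) pairs."""
    keys = []
    previous = ""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


def compress_block(encoded):
    """Serialize the pairs as text (illustrative layout) and compress with zlib."""
    payload = "\n".join(f"{shared}:{suffix}" for shared, suffix in encoded)
    return zlib.compress(payload.encode("utf-8"))


keys = ["AAAAAAA", "AAAAAAB", "AAAAAAC", "AABAAAA", "AABAAAX"]
encoded = prefix_encode(keys)
# encoded == [(0, "AAAAAAA"), (6, "B"), (6, "C"), (2, "BAAAA"), (6, "X")]
```

Because the keys in a block are sorted, neighbouring keys tend to share long prefixes, which is what makes this encoding worthwhile before general-purpose compression is applied.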

Other files

• Manifest: LSM state
• WAL: recovery
• LOG: debugging

Level 1+ files

1 -> 10 | 11 -> 50 | 60 -> 70 | 75 -> 80 | 90 -> 100

Non-overlapping key ranges

Level 0 files

20 -> 80 | 1 -> 100 | 11 -> 99 | 30 -> 40

Overlapping key ranges
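The difference between the two layouts matters for lookups: with overlapping ranges every Level 0 file whose range covers the key is a candidate, while in a sorted, non-overlapping level at most one file can hold it. The Python sketch below illustrates this with the key ranges from the slides. The function names are hypothetical, and using binary search over the file boundaries is an assumption of this sketch, not something the slides state.

```python
from bisect import bisect_left


def candidates_level0(files, key):
    """Level 0: ranges may overlap, so every covering file must be checked."""
    return [(lo, hi) for lo, hi in files if lo <= key <= hi]


def candidate_sorted_level(files, key):
    """Level 1+: files are sorted and disjoint, so at most one file matches."""
    upper_bounds = [hi for _, hi in files]
    i = bisect_left(upper_bounds, key)  # first file whose upper bound is >= key
    if i < len(files) and files[i][0] <= key:
        return files[i]
    return None  # key falls in a gap between files, or past the last file


level0 = [(20, 80), (1, 100), (11, 99), (30, 40)]
level1 = [(1, 10), (11, 50), (60, 70), (75, 80), (90, 100)]

l0_hits = candidates_level0(level0, 35)        # all four files cover key 35
l1_hit = candidate_sorted_level(level1, 65)    # (60, 70)
l1_miss = candidate_sorted_level(level1, 55)   # None: 55 lies between two files
```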

Reads (point lookup)

[Diagram: a point lookup over Memtable (64 MB), Level 0 (256 MB), Level 1 (512 MB), Level 2 (5 GB), Level 3 (50 GB), Level 4 (500 GB).]

Reads (Iterators)

Memtable (64 MB): 1 iterator
Level 0 (256 MB): 4 iterators
Level 1 (512 MB): 1 iterator
Level 2 (5 GB): 1 iterator
Level 3 (50 GB): 1 iterator
Level 4 (500 GB): 1 iterator

RocksDB Iterator
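One way to picture how several per-level iterators can be presented as a single iterator is a k-way merge over sorted streams. The Python sketch below assumes each source yields `(key, value)` pairs in key order and that sources are listed newest first, so the newest value wins when a key appears more than once. The function name is hypothetical, and deletions (tombstones) are not modeled.

```python
import heapq


def merged_iterator(sources):
    """Merge sorted (key, value) sources, newest source first, one value per key."""

    def tagged(index, source):
        # Tag each entry with its source index so ties on key favor newer sources.
        for key, value in source:
            yield key, index, value

    streams = [tagged(i, s) for i, s in enumerate(sources)]
    last_key = object()  # sentinel that equals no real key
    for key, _index, value in heapq.merge(*streams):
        if key != last_key:
            yield key, value  # first occurrence comes from the newest source
            last_key = key


memtable = [("b", "from-memtable")]
level0_file = [("a", "from-L0"), ("b", "old-b")]
level1 = [("a", "old-a"), ("c", "from-L1")]

result = list(merged_iterator([memtable, level0_file, level1]))
# result == [("a", "from-L0"), ("b", "from-memtable"), ("c", "from-L1")]
```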

MongoRocks

• MongoDB 3.0 introduced pluggable storage engine API
• MongoDB using RocksDB storage engine
• Running in production since March 2015

Parse

• Mobile backend as a service
• One of the biggest MongoDB deployments
• Millions of collections, millions of indexes

MongoRocks

• Huge storage savings (compressed 5 TB to 285 GB)
• Document-level locking
• Better backups

MongoRocks Backups

• RocksDB files are immutable
• Backups are fast
• Incremental backup using rocks-strata
• Queryable backups

MongoRocks Backup

[Diagram, slide 1: the database (Memtable, Level 0, Level 1, Level 2) holds files 1 to 6; the Backup Directory holds files 1 to 6.]

[Diagram, slide 2: files 7, 8 and 9 are added to the database; files 7, 8 and 9 are added to the Backup Directory.]
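Because the files are immutable, a file that is already in the backup directory never needs to be copied again, which is what makes an incremental backup cheap. The Python sketch below shows only that idea; it is not how rocks-strata works. The function name and the `.sst` file names are assumptions, and it ignores files removed by compaction, the manifest, and any consistency guarantees.

```python
import os
import shutil
import tempfile


def incremental_backup(db_dir, backup_dir):
    """Copy only the files that are not yet present in backup_dir."""
    os.makedirs(backup_dir, exist_ok=True)
    already_backed_up = set(os.listdir(backup_dir))
    copied = []
    for name in sorted(os.listdir(db_dir)):
        source = os.path.join(db_dir, name)
        if name not in already_backed_up and os.path.isfile(source):
            shutil.copy2(source, os.path.join(backup_dir, name))
            copied.append(name)
    return copied


def demo():
    """Back up files 1-6, add files 7-9, then back up again."""
    with tempfile.TemporaryDirectory() as root:
        db_dir = os.path.join(root, "db")
        backup_dir = os.path.join(root, "backup")
        os.makedirs(db_dir)
        for n in range(1, 7):
            with open(os.path.join(db_dir, f"{n}.sst"), "w") as f:
                f.write("data")
        first = incremental_backup(db_dir, backup_dir)
        for n in range(7, 10):
            with open(os.path.join(db_dir, f"{n}.sst"), "w") as f:
                f.write("data")
        second = incremental_backup(db_dir, backup_dir)
        return first, second
```

In `demo()`, the second call copies only the three new files, mirroring the two backup slides.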

• RocksDB: https://github.com/facebook/rocksdb/
• MongoRocks: https://github.com/mongodb-partners/mongo-rocks
• Rocks-Strata: https://github.com/facebookgo/rocks-strata

Thanks!