RocksDB and MongoRocks - Percona


RocksDB and MongoRocks

Islam AbdelRahman, Software Engineer


What is MongoRocks

MongoDB using RocksDB storage engine


What is RocksDB

• Embedded persistent key-value store
• Optimized for server workloads
• Open source
• Used by Facebook, LinkedIn, Yahoo, Microsoft, Netflix, Airbnb, Pinterest …


RocksDB Architecture


Log Structured Merge Trees

(newer data at the top, older data at the bottom)

• Memtable (64 MB)
• Level 0 (256 MB)
• Level 1 (512 MB)
• Level 2 (5 GB)
• Level 3 (50 GB)
• Level 4 (500 GB)


Writes

[Diagram: a (Key, Value) write goes to the WAL and to the Memtable (64 MB), which sits above Level 0 (256 MB).]


Flush

[Diagram: the Memtable (64 MB) is flushed into a new file in Level 0 (256 MB).]


Compaction

[Diagram: the LSM tree before compaction: Memtable (64 MB), Level 0 (256 MB), Level 1 (512 MB), Level 2 (5 GB), Level 3 (50 GB), Level 4 (500 GB).]


Compaction

[Diagram: the LSM tree (Memtable through Level 4) after compaction, with three files marked "new".]


Writes

• Foreground
  • Write to memtable + Write Ahead Log
• Background
  • Flush
  • Compaction
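
The foreground/background split above can be sketched with a toy in-memory model. This is only an illustration: the class name `ToyLSM`, the entry-count limit standing in for the 64 MB memtable budget, and the inline (rather than background) flush are all assumptions, and compaction is left out.

```python
import bisect


class ToyLSM:
    """Toy write path: WAL + memtable in the foreground, flush to Level 0."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # entries; stands in for a size budget
        self.wal = []       # append-only log of (key, value) records
        self.memtable = {}  # in-memory buffer holding the newest writes
        self.level0 = []    # immutable sorted runs, newest last

    def put(self, key, value):
        self.wal.append((key, value))  # log first, so the write can be recovered
        self.memtable[key] = value     # then apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()               # a real engine does this in the background

    def flush(self):
        if not self.memtable:
            return
        run = tuple(sorted(self.memtable.items()))  # one immutable sorted file
        self.level0.append(run)
        self.memtable = {}
        self.wal = []  # the logged records now live in the flushed run

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.level0):  # newest run first
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

For example, with `memtable_limit=2`, two `put` calls fill and flush the memtable into one Level 0 run, and a third `put` lands in a fresh memtable and WAL.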


File format

• Data Block
• Data Block
• Data Block
• Data Block
• Data Block
• Index Block
• Bloom Filter Block
• Statistics Block


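File format (Data Block)

Keys as written:

AAAAAAA : VAL
AAAAAAB : VAL
AAAAAAC : VAL
AABAAAA : VAL
AABAAAX : VAL

Keys with the prefix shared with the previous key replaced by its length in brackets:

AAAAAAA : VAL
[6]B : VAL
[6]C : VAL
[2]BAAAA : VAL
[6]X : VAL

Compressed Block (Snappy / Zlib / etc.)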
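
The bracket notation on this slide, where `[6]B` means "reuse the first 6 characters of the previous key, then append B", can be reproduced with a short sketch. The function names are hypothetical, and the real on-disk encoding has more detail than this; the code only mirrors the slide's notation.

```python
def delta_encode(sorted_keys):
    """Encode each key as (shared_prefix_len, suffix) relative to the previous key."""
    encoded, prev = [], ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def delta_decode(encoded):
    """Rebuild the full keys from (shared_prefix_len, suffix) pairs."""
    keys, prev = [], ""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = ["AAAAAAA", "AAAAAAB", "AAAAAAC", "AABAAAA", "AABAAAX"]
encoded = delta_encode(keys)
# encoded == [(0, "AAAAAAA"), (6, "B"), (6, "C"), (2, "BAAAA"), (6, "X")]
```

Because keys inside a block are sorted, neighbours tend to share long prefixes, which is what makes this encoding pay off before the general-purpose compressor runs.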




Other files

• Manifest: LSM state
• WAL: recovery
• LOG: debugging


Level 1+ files

1 -> 10 11 -> 50 60 -> 70 75 -> 80 90 -> 100

Non-overlapping key ranges


Level 0 files

20 -> 80 1 -> 100 11 -> 99 30 -> 40

Overlapping key ranges
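
The difference between the two kinds of level matters for lookups, and a small sketch using the key ranges from these two slides shows why. The function names are hypothetical and the files are reduced to (smallest key, largest key) pairs; this is an illustration of the idea, not RocksDB's code.

```python
import bisect

# (smallest key, largest key) per file, copied from the slides.
LEVEL0 = [(20, 80), (1, 100), (11, 99), (30, 40)]
LEVEL1 = [(1, 10), (11, 50), (60, 70), (75, 80), (90, 100)]


def level0_candidates(files, key):
    """Ranges overlap, so every file whose range covers the key must be checked."""
    return [f for f in files if f[0] <= key <= f[1]]


def level1_candidate(files, key):
    """Ranges are sorted and disjoint, so binary search finds at most one file."""
    starts = [f[0] for f in files]
    i = bisect.bisect_right(starts, key) - 1
    if i >= 0 and files[i][0] <= key <= files[i][1]:
        return files[i]
    return None
```

For key 35, `level0_candidates(LEVEL0, 35)` returns all four Level 0 files, while `level1_candidate(LEVEL1, 35)` returns only `(11, 50)`.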


Reads (point lookup)

[Diagram: a point lookup against the LSM tree: Memtable (64 MB), Level 0 (256 MB), Level 1 (512 MB), Level 2 (5 GB), Level 3 (50 GB), Level 4 (500 GB).]


Reads (Iterators)

• Memtable (64 MB): 1 Iterator
• Level 0 (256 MB): 4 Iterators
• Level 1 (512 MB): 1 Iterator
• Level 2 (5 GB): 1 Iterator
• Level 3 (50 GB): 1 Iterator
• Level 4 (500 GB): 1 Iterator

These feed a single RocksDB Iterator.
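
One way to picture how several per-source iterators become a single iterator is a k-way merge in which the newest source wins for duplicate keys. The sketch below assumes each source is a sorted list of (key, value) pairs with unique keys and that sources are passed newest first; `merged_iterator` is a hypothetical name, and deletes are ignored.

```python
import heapq


def merged_iterator(sources):
    """Merge sorted (key, value) sources, newest first, yielding each key once."""

    def tagged(age, source):
        # Tag entries with the source's age so ties on key sort newest first.
        for key, value in source:
            yield key, age, value

    streams = [tagged(age, src) for age, src in enumerate(sources)]
    last_key = object()  # sentinel that never equals a real key
    for key, age, value in heapq.merge(*streams):
        if key != last_key:
            yield key, value  # first hit for a key comes from the newest source
            last_key = key


memtable = [("b", "b-new")]
level0_file = [("a", "a0"), ("b", "b-old")]
level1 = [("c", "c1")]
result = list(merged_iterator([memtable, level0_file, level1]))
# result == [("a", "a0"), ("b", "b-new"), ("c", "c1")]
```

Level 0 needs one child iterator per file because its files overlap, whereas a level with disjoint files can be walked as one sorted sequence.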


MongoRocks

• MongoDB 3.0 introduced a pluggable storage engine API
• MongoDB using RocksDB storage engine
• Running in production since March 2015


Parse

• Mobile backend as a service
• One of the biggest MongoDB deployments
• Millions of collections, millions of indexes


MongoRocks

• Huge storage savings (compressed 5 TB to 285 GB)
• Document-level locking
• Better backups


MongoRocks Backups

• RocksDB files are immutable
• Backups are fast
• Incremental backup using rocks-strata
• Queryable backups


MongoRocks Backup

[Diagram: the database (Memtable, Level 0, Level 1, Level 2) holds files 1 to 6; the Backup Directory holds files 1 to 6 as well.]


MongoRocks Backup

[Diagram: the database (Memtable, Level 0, Level 1, Level 2) now also holds files 7, 8 and 9; files 7, 8 and 9 appear in the Backup Directory alongside files 1 to 6.]
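
Because data files never change once written, an incremental backup only has to copy files the backup directory does not already contain. The sketch below illustrates that idea with plain directories; `incremental_backup` is a hypothetical helper, not rocks-strata's actual mechanism, and it ignores the manifest and files that compaction has since deleted.

```python
import os
import shutil


def incremental_backup(db_dir, backup_dir):
    """Copy files missing from backup_dir and return the names that were copied."""
    os.makedirs(backup_dir, exist_ok=True)
    already_backed_up = set(os.listdir(backup_dir))
    copied = []
    for name in sorted(os.listdir(db_dir)):
        if name not in already_backed_up:
            # Files are immutable, so a name match means the content is unchanged.
            shutil.copy2(os.path.join(db_dir, name), os.path.join(backup_dir, name))
            copied.append(name)
    return copied
```

Run once against files 1 to 6 and again after files 7, 8 and 9 appear, the second call copies only the three new files.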


Thanks!

• RocksDB: https://github.com/facebook/rocksdb/
• MongoRocks: https://github.com/mongodb-partners/mongo-rocks
• Rocks-Strata: https://github.com/facebookgo/rocks-strata
