Replicated RocksDB at Pinterest @scale 2016 San Jose
![Page 1: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/1.jpg)
![Page 2: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/2.jpg)
August 31, 2016
Pinterest Engineering
![Page 3: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/3.jpg)
Bo Liu, Software Engineer, Serving Systems
Replicated RocksDB at Pinterest
![Page 4: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/4.jpg)
![Page 5: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/5.jpg)
![Page 6: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/6.jpg)
Example 1: Online event tracking system
Kafka
Writes:
• John saw Pin 1, Pin 2, … Pin K at Time T
Reads:
• Fetch the last 1,000 Pins seen by John
• Fetch the number of Pins seen by John between Time T1 and T2
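A minimal sketch of how the write and the two reads in Example 1 could map onto an ordered key-value store. The talk does not describe the actual schema: the (user, timestamp, pin) key layout, the `EventStore` class, and its method names are all assumptions for illustration, and a sorted Python list stands in for the ordered store. Timestamps are assumed to be integers.

```python
import bisect


class EventStore:
    """Toy ordered store (hypothetical): sorted (user, timestamp, pin) keys."""

    def __init__(self):
        self._keys = []  # sorted list standing in for an ordered KV store

    def record_seen(self, user, timestamp, pins):
        # Write path: "John saw Pin 1, Pin 2, ... Pin K at Time T".
        for pin in pins:
            bisect.insort(self._keys, (user, timestamp, pin))

    def _user_range(self, user):
        # All keys of one user are contiguous because keys sort by user first.
        lo = bisect.bisect_left(self._keys, (user,))
        hi = bisect.bisect_left(self._keys, (user, float("inf")))
        return lo, hi

    def last_seen(self, user, limit=1000):
        # Read 1: scan the user's range backwards, newest timestamp first.
        lo, hi = self._user_range(user)
        newest_first = self._keys[lo:hi][::-1]
        return [pin for _, _, pin in newest_first[:limit]]

    def count_between(self, user, t1, t2):
        # Read 2: two seeks bound the keys with t1 <= timestamp <= t2.
        lo = bisect.bisect_left(self._keys, (user, t1))
        hi = bisect.bisect_left(self._keys, (user, t2 + 1))
        return hi - lo


store = EventStore()
store.record_seen("john", 10, ["p1", "p2"])
store.record_seen("john", 20, ["p3"])
store.record_seen("kate", 15, ["p9"])
print(store.last_seen("john", limit=2))
print(store.count_between("john", 10, 15))
```

Keeping the user first in the key makes both reads a contiguous range scan, which is the property an ordered store such as RocksDB would provide through its iterator.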
![Page 7: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/7.jpg)
![Page 8: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/8.jpg)
![Page 9: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/9.jpg)
Example 2: Board-based Pin retrieval and ranking system
Kafka
Writes:
• John just followed Board 1
• Pin 1 was just saved to Board 1
Reads:
• Fetch the most relevant Pins followed by John
![Page 10: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/10.jpg)
![Page 11: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/11.jpg)
![Page 12: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/12.jpg)
![Page 13: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/13.jpg)
Example 3: Distributed storage system with data structure support
Kafka
Writes:
• Add u to HyperLogLog A
• Add e to List B
Reads:
• Fetch List B
• Fetch the number of unique members of HyperLogLog A
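For readers unfamiliar with the HyperLogLog operations in Example 3, here is a minimal textbook sketch of "add" and "count unique members". It is not the implementation from the talk, which does not say how the structure is built or stored; the `HyperLogLog` class, the SHA-1 based hash, and the register count are assumptions chosen for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Take a 64-bit hash: the top p bits pick a register, the remaining
        # bits determine the rank (position of the leftmost 1-bit).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        index = h >> width
        rest = h & ((1 << width) - 1)
        rank = width - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # standard constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


hll = HyperLogLog()
for i in range(10000):
    hll.add(f"user-{i}")
estimate = hll.count()
print(round(estimate))
```

Because each register only ever keeps a maximum, adding the same member again never changes the estimate, which is what makes the structure suitable for counting unique members from a write stream.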
![Page 14: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/14.jpg)
![Page 15: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/15.jpg)
![Page 16: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/16.jpg)
![Page 17: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/17.jpg)
![Page 18: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/18.jpg)
![Page 19: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/19.jpg)
![Page 20: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/20.jpg)
![Page 21: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/21.jpg)
![Page 22: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/22.jpg)
![Page 23: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/23.jpg)
![Page 24: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/24.jpg)
![Page 25: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/25.jpg)
Common system architecture
Components: Application API, Admin API, Application Logic, Admin Logic, RocksDB Replicator, ZooKeeper, Admin tool, four RocksDB instances
Labeled interactions:
• Cluster management
• Generate cluster config
• Load config at startup
• Create/Open DB
• Add/Remove DB for replication
• GetDB()
• Read/Write
• Data replication: local updates, remote updates
![Page 26: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/26.jpg)
RocksDB replicator design
• Support async Master-Slave replication only
• Replicate multiple RocksDBs in one process
• Replication role at RocksDB instance level
• Work reactively (AddDB(), RemoveDB())
• Low replication latency
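The design points above (many DBs in one process, a replication role per DB instance, and reactive AddDB()/RemoveDB() calls) can be sketched roughly as follows. Only the names AddDB() and RemoveDB() come from the slide; their signatures are not given, so the `Replicator` class, the `Role` enum, the `upstream` argument, and `role_of` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Role(Enum):
    MASTER = "master"
    SLAVE = "slave"


@dataclass
class ReplicatedDB:
    name: str
    role: Role
    upstream: Optional[str] = None  # "ip:port" of the master; slaves only


class Replicator:
    """Hypothetical registry: one process, many DBs, one role per DB."""

    def __init__(self):
        self._dbs: Dict[str, ReplicatedDB] = {}

    def add_db(self, name, role, upstream=None):
        # Reactive: replication for a DB starts only when the caller adds it.
        if name in self._dbs:
            raise ValueError(f"{name} is already replicated")
        if role is Role.SLAVE and upstream is None:
            raise ValueError("a slave needs an upstream address")
        self._dbs[name] = ReplicatedDB(name, role, upstream)

    def remove_db(self, name):
        # Stop replicating this DB; other DBs in the process are unaffected.
        self._dbs.pop(name)

    def role_of(self, name):
        return self._dbs[name].role


replicator = Replicator()
replicator.add_db("db1", Role.MASTER)
replicator.add_db("db2", Role.SLAVE, upstream="10.0.0.1:9090")
print(replicator.role_of("db1"), replicator.role_of("db2"))
```

Keeping the role on each DB rather than on the process is what lets one host be master for some DBs and slave for others.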
![Page 27: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/27.jpg)
RocksDB replicator implementation
• RocksDB WAL sequence # as global replication sequence #
• fbthrift for RPC
• Pull & Push
![Page 28: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/28.jpg)
![Page 29: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/29.jpg)
![Page 30: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/30.jpg)
![Page 31: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/31.jpg)
RocksDB replicator workflow
Components: Thrift Server, Worker threads, DB1 (Master), DB2 (Slave, Upstream: ip_Port)
Steps:
• Latest SEQ #
• Get updates since SEQ# for DB2
• Updates since SEQ# for DB2
• Apply updates
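The slave-side steps on this slide (read the latest sequence number, ask the upstream for updates since that number, apply them) can be modeled in a few lines. This is an in-memory sketch under stated assumptions, not the replicator's code: `MasterDB`, `SlaveDB`, and `pull_once` are hypothetical names, a Python list stands in for the WAL, and a direct method call stands in for the Thrift RPC.

```python
class MasterDB:
    """Hypothetical master: every write gets the next sequence number."""

    def __init__(self):
        self.latest_seq = 0
        self.log = []   # (seq, key, value) entries in sequence order
        self.data = {}

    def put(self, key, value):
        self.latest_seq += 1
        self.log.append((self.latest_seq, key, value))
        self.data[key] = value

    def get_updates_since(self, seq):
        # Sequence numbers are dense and start at 1, so log[i] has seq i + 1
        # and everything after position `seq` is newer than `seq`.
        return self.log[seq:]


class SlaveDB:
    """Hypothetical slave that pulls from one upstream master."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.latest_seq = 0
        self.data = {}

    def pull_once(self):
        # 1. Latest SEQ #  2. Get updates since SEQ#  3. Apply updates
        updates = self.upstream.get_updates_since(self.latest_seq)
        for seq, key, value in updates:
            self.data[key] = value
            self.latest_seq = seq
        return len(updates)


master = MasterDB()
master.put("a", 1)
master.put("b", 2)
slave = SlaveDB(master)
print(slave.pull_once(), slave.data, slave.latest_seq)
```

Because the slave only remembers one number, a restarted or lagging slave can resume by repeating the same request with whatever sequence number it last applied.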
![Page 32: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/32.jpg)
![Page 33: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/33.jpg)
![Page 34: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/34.jpg)
![Page 35: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/35.jpg)
![Page 36: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/36.jpg)
![Page 37: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/37.jpg)
RocksDB replicator workflow
Components: Thrift Server, Worker threads, DB1 (Master), DB2 (Slave, Upstream: ip_Port)
Steps:
• Get updates since SEQ# for DB1
• Send request
• Has updates since SEQ#?
• Yes, this is the data
• Response (labeled twice on the slide)
![Page 38: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/38.jpg)
![Page 39: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/39.jpg)
![Page 40: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/40.jpg)
![Page 41: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/41.jpg)
![Page 42: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/42.jpg)
![Page 43: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/43.jpg)
![Page 44: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/44.jpg)
![Page 45: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/45.jpg)
RocksDB replicator workflow
Components: Thrift Server, Worker threads, DB1 (Master), DB2 (Slave, Upstream: ip_Port)
Steps:
• Get updates since SEQ# for DB1
• Send request
• Has updates since SEQ#?
• No, wait for my notification
• Writes
• These are the new updates
• Response (labeled twice on the slide)
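The "No, wait for my notification" path reads like a long poll: the master holds the request until a write arrives, then answers. A rough sketch of that idea with a condition variable follows. It is an assumption about the mechanism, not the talk's implementation: `LongPollMaster`, its methods, and the `timeout` parameter are hypothetical, and plain threads stand in for the Thrift server and worker threads.

```python
import threading


class LongPollMaster:
    """Hypothetical master that parks requests until new writes arrive."""

    def __init__(self):
        self._cond = threading.Condition()
        self._log = []  # update at position i carries sequence number i + 1

    def write(self, update):
        with self._cond:
            self._log.append(update)
            self._cond.notify_all()  # wake every parked "get updates" request

    def get_updates_since(self, seq, timeout=1.0):
        with self._cond:
            # "Has updates since SEQ#?" If not, wait for the notification
            # (or give up after `timeout` seconds and return nothing).
            self._cond.wait_for(lambda: len(self._log) > seq, timeout=timeout)
            return self._log[seq:]


master = LongPollMaster()
received = []
puller = threading.Thread(
    target=lambda: received.append(master.get_updates_since(0, timeout=5.0))
)
puller.start()          # the slave's request arrives before any write
master.write("u1")      # the write releases the parked request
puller.join()
print(received)
```

Holding the request open combines the pull and push styles: the slave still initiates every exchange, but new writes reach it without waiting for the next polling interval.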
![Page 46: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/46.jpg)
Performance
• Production load: 1 MB/s, P99 12 ms, Max 60 ms
• Synthetic load: 76 MB/s, P99 106 ms, Max 224 ms
• Developer velocity: Build a production quality real-time counter service in one week
![Page 47: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/47.jpg)
Open source - coming soon
(Architecture diagram repeated: Application API, Admin API, Application Logic, Admin Logic, RocksDB Replicator, ZooKeeper, Admin tool, four RocksDB instances)
![Page 48: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/48.jpg)
Thank you
Serving Systems Team @Pinterest: Bo Liu, Shu Zhang, Jian Fang, Jinru He, Linda Lo, Yongsheng Wu
Data Analytics Team @Pinterest: Bryant Xiao, Justin Mejorada Pier, Shuo Xiang, Qingxian Lai, Tien Nguyen, Chunyan Wang
![Page 49: Replicated RocksDB at Pinterest @scale 2016 San Jose](https://reader036.fdocuments.net/reader036/viewer/2022062401/586f91551a28ab54768b7be3/html5/thumbnails/49.jpg)
Q&A