MongoDB at Baidu
![Page 1: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/1.jpg)
MongoDB@Baidu
Xiao Beibei, Project Owner & Senior Developer
![Page 2: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/2.jpg)
Baidu
![Page 3: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/3.jpg)
Who are we?
✓ Largest internet search service in China
✓ Various products, solutions & services
✓ NASDAQ: BIDU | Market cap: 64B | Revenue: 10B | Quarterly growth: 33.10%
![Page 4: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/4.jpg)
Story between 2 “Giants”
Who am I?
✓ Senior NoSQL developer
✓ Owner of various MongoDB projects
✓ In charge of the LARGEST MongoDB cluster in CHINA
![Page 5: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/5.jpg)
Where does MongoDB fit?
![Page 6: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/6.jpg)
Small Step → Big Surprise
• Started with Baidu Address Book
✓ Small project
✓ Various sources
✓ Flexible schema
• More than 300 million users
![Page 7: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/7.jpg)
Success + Confidence = More Projects
• Message & Multimedia Message projects
• Netdisk picture metadata
• Facial Recognition System
• User Operation Log System
• Baidu Cloud
• Baidu Post Bar
• …
✓ Over 100 businesses
✓ Drive metadata > 200B
✓ PB level
![Page 8: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/8.jpg)
Big MongoDB Cluster
• Consolidate the entrance
• All use SSD + RAID 0
• Most: 1 master, 2 secondaries, 2 arbiters
• Some: 1 master, 2 secondaries, 1 arbiter
[Diagram: a REST MongoDB service API in front of several standard MongoDB clusters; each cluster shows mongos, config, and shards made of a primary (P), secondaries (S), and arbiters (A).]
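The two replica set layouts on this slide can be written down as replica set configuration documents. The following is a minimal sketch, not Baidu's actual configuration: the set name and host names are hypothetical, and only the `_id`, `members`, `host`, and `arbiterOnly` fields of the config document are used.

```python
def replica_set_config(name, data_hosts, arbiter_hosts):
    """Build a replica set config document (the shape passed to replSetInitiate).

    Host names are placeholders. Which data-bearing member becomes the
    primary is decided by election, not by this document.
    """
    members = []
    for host in data_hosts:
        members.append({"_id": len(members), "host": host})
    for host in arbiter_hosts:
        # Arbiters vote in elections but hold no data.
        members.append({"_id": len(members), "host": host, "arbiterOnly": True})
    return {"_id": name, "members": members}


data_hosts = ["db1.example:27017", "db2.example:27017", "db3.example:27017"]

# "Most" layout: 1 master + 2 secondaries + 2 arbiters.
most = replica_set_config(
    "shard1", data_hosts, ["arb1.example:27017", "arb2.example:27017"]
)

# "Some" layout: 1 master + 2 secondaries + 1 arbiter.
some = replica_set_config("shard2", data_hosts, ["arb1.example:27017"])
```

With five voting members a majority is three votes; with four voting members a majority is also three.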
![Page 9: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/9.jpg)
How do we use MongoDB?
![Page 10: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/10.jpg)
Throughput!!!
Problem
• Everything runs well, BUT when writes exceed 10,000 QPS:
• Mongod memory usage increases
• Writes time out
• Reads are impacted, queries slow down
![Page 11: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/11.jpg)
The simple way is the BEST!
Root cause: cache replacement
• In 3.0, cache replacement does not work very efficiently
Solution
• Pilot an upgrade to 3.2
![Page 12: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/12.jpg)
Replication makes this possible
Problem
Online index creation issue
• Time-consuming
• Direct or background
• Write timeouts during creation
Solution
• Create the index on each member in turn
• Secondaries first and primary last
• Oplog time
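The "secondaries first, primary last" ordering can be sketched as a small planning helper. This is an illustration only, not the tooling used at Baidu: the member list, host names, and state strings are hypothetical inputs. The `fits_oplog_window` check is one possible reading of the "Oplog time" bullet, namely that a member taken out for the build must be able to catch up from the oplog afterwards.

```python
def rolling_index_order(members):
    """Return hosts in index-build order: every secondary, then the primary.

    Arbiters hold no data, so they are skipped.
    """
    secondaries = [m["host"] for m in members if m["state"] == "SECONDARY"]
    primaries = [m["host"] for m in members if m["state"] == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    return secondaries + primaries


def fits_oplog_window(build_seconds, oplog_window_seconds):
    """Assumed reading of 'Oplog time': the build must finish before the
    oplog rolls over, otherwise the member cannot catch up by replaying it."""
    return build_seconds < oplog_window_seconds


# Hypothetical replica set state.
members = [
    {"host": "db1.example:27017", "state": "PRIMARY"},
    {"host": "db2.example:27017", "state": "SECONDARY"},
    {"host": "db3.example:27017", "state": "SECONDARY"},
    {"host": "arb1.example:27017", "state": "ARBITER"},
]
order = rolling_index_order(members)
```

Building on the primary last means writes keep being served by a member that is not busy with the build until the final step.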
![Page 13: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/13.jpg)
Big Issue
Problem
• Data increases rapidly → clusters grow accordingly
• Largest cluster = 160 shards, 2 TB each
• Queries slow!!!
Why?
• The MongoDB balancer uses a single thread to move data
• Pros & cons
![Page 14: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/14.jpg)
Mitigation
• Reduced the balancer window from 24 to 6 hours, so that it ran in off-peak hours
• Worked well for a period of time, BUT when more …
Solution
• Shard key: uid or hash?
• Pre-allocating chunks
• Balancer or oplog?
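The balancer window is stored as an `activeWindow` setting on the `balancer` document in the `config.settings` collection. The sketch below only builds the filter and update documents for a 6-hour window; the 01:00 start time is a hypothetical choice, since the slides do not say which off-peak hours were used.

```python
from datetime import datetime, timedelta


def balancer_window_update(start, hours):
    """Build (filter, update) documents that restrict the balancer to a
    daily window beginning at `start` ("HH:MM") and lasting `hours` hours."""
    if not 0 < hours < 24:
        raise ValueError("window length must be between 0 and 24 hours")
    begin = datetime.strptime(start, "%H:%M")
    end = begin + timedelta(hours=hours)  # may wrap past midnight
    window = {"start": begin.strftime("%H:%M"), "stop": end.strftime("%H:%M")}
    return {"_id": "balancer"}, {"$set": {"activeWindow": window}}


# Hypothetical off-peak window: 6 hours starting at 01:00.
flt, upd = balancer_window_update("01:00", 6)
```

Against a real cluster the pair would be applied through a mongos, for instance with PyMongo's `client["config"]["settings"].update_one(flt, upd, upsert=True)`.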
![Page 15: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/15.jpg)
Native Auto Balance
Solution
[Sequence diagram between Config Server, Mongos, shard1, and shard2. Labels: Move certain chunk to shard2; Please receive data; Data transferring …; Incremental data sync; Update chunk information; Update chunk manager (×2); Update chunk cache; Delete or not delete.]
![Page 16: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/16.jpg)
Modified Balancer
Solution
[Sequence diagram between Config Server, Mongos, shard1, and shard2. Labels: Data transferring …; Update chunk information; Update chunk manager (×2); Update when WriteBack.]
![Page 17: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/17.jpg)
Iteration in Detail
• Identify: identify a range to be migrated
• Record: take note of the current oplog time
• Query: send a query to the source shard, and iterate over the returned cursor to write matching documents to the destination shard
• Scan & Match: scan the oplog from the source shard for events recorded since the timestamp noted at the start of this pass; matching events are then written to the destination shard
• Apply: when the last oplog event has been applied, the pass has completed and the worker process can be stopped
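The pass described on this slide can be illustrated with a small in-memory simulation. This is a sketch of the idea only, not Baidu's migration worker: shards are plain dicts keyed by shard-key value, and the oplog is a hypothetical list of `(ts, op, key, doc)` tuples rather than MongoDB's real oplog format.

```python
def migrate_range(source_docs, oplog, dest_docs, lo, hi, now):
    """Run one migration pass over the half-open key range [lo, hi).

    source_docs / dest_docs: dict mapping shard-key value -> document
    oplog: (ts, op, key, doc) tuples in timestamp order, where op is
           "i" (insert), "u" (update) or "d" (delete)
    now:   oplog timestamp at the moment the pass starts
    Returns the number of oplog events replayed on the destination.
    """
    def in_range(key):
        return lo <= key < hi

    # Record: note the oplog time before copying anything.
    start_ts = now

    # Query: copy every matching document to the destination shard.
    for key, doc in source_docs.items():
        if in_range(key):
            dest_docs[key] = dict(doc)

    # Scan & Match: replay events from start_ts onward that touch the range.
    applied = 0
    for ts, op, key, doc in oplog:
        if ts < start_ts or not in_range(key):
            continue
        if op == "d":
            dest_docs.pop(key, None)
        else:
            dest_docs[key] = dict(doc)
        applied += 1

    # Apply: the last event has been replayed, so the pass is complete.
    return applied


# Hypothetical data: migrate uids in [0, 10) while writes keep arriving.
source = {1: {"uid": 1, "n": "a"}, 5: {"uid": 5, "n": "b"}, 12: {"uid": 12, "n": "c"}}
oplog = [
    (90, "i", 5, {"uid": 5, "n": "b"}),    # before the pass: skipped
    (101, "u", 1, {"uid": 1, "n": "a2"}),  # update inside the range
    (102, "d", 5, None),                   # delete inside the range
    (103, "i", 7, {"uid": 7, "n": "d"}),   # insert inside the range
    (104, "i", 20, {"uid": 20, "n": "e"}), # outside the range: skipped
]
dest = {}
applied = migrate_range(source, oplog, dest, 0, 10, now=100)
```

Because each replayed event overwrites or removes a whole document, replaying an event that the bulk copy already reflected leaves the destination unchanged, which is what makes the copy-then-replay ordering safe in this simplified model.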
![Page 18: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/18.jpg)
Summary
![Page 19: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/19.jpg)
Quick Summary
• Early adoption makes us
• 100+ diverse apps & more are coming
• $$$ Cost saving with awesome scalability
• Continuous improvements = Confidence
• Add LSM to WT to get better insert performance
• Multimaster as an option
![Page 20: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/20.jpg)
Key Takeaways
• Baidu = Big system + Big data + Big challenge
– We need a strong & scalable DB architecture; MongoDB is fantastic!
• Upgrading to 3.x is a MUST
– WT engine, document validation, …
• Innovation & automation via customized scripts
MongoDB CAN manage our “BIG DATA”
600 nodes, 160 shards, 200B documents
![Page 21: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/21.jpg)
Next Steps
MongoDB is enhancing balancer performance
• Enabling parallel chunk migration
• Removing throttling by default (for WiredTiger)
Working with MongoDB as a beta tester for the new feature
![Page 22: MongoDB at Baidu](https://reader030.fdocuments.net/reader030/viewer/2022021420/58e73be31a28ab49038b528b/html5/thumbnails/22.jpg)
Questions?