Schema Design - Real world use case
![Page 1: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/1.jpg)
Consulting Engineer, MongoDB
Matias Cascallares
#MongoDBDays
Schema Design - Real World Use Case
![Page 2: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/2.jpg)
Agenda
• Why is schema design important
• A real world use case
  – Social Inbox
  – History
• Conclusions
![Page 3: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/3.jpg)
Why is Schema Design important?
• Largest factor for a performant system
• Schema design with MongoDB is different
  • RDBMS – "What answers do I have?"
  • MongoDB – "What question will I have?"
![Page 4: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/4.jpg)
#1 – Message Inbox
![Page 5: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/5.jpg)
• Let's get Social
![Page 6: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/6.jpg)
Sending Messages
?
![Page 7: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/7.jpg)
Reading my Inbox
?
![Page 8: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/8.jpg)
Design Goals
• Efficiently send new messages to recipients
• Efficiently read inbox
![Page 9: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/9.jpg)
3 Approaches (there are more)
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
![Page 10: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/10.jpg)
// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )

// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )

msg = {
    from: "Matias",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}

// Send a message
db.inbox.save( msg )

// Read my inbox
db.inbox.find( { to: "Matias" } ).sort( { sent: -1 } )
Fan out on read
Schema Design, Matias Cascallares
![Page 11: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/11.jpg)
Fan out on read – IO
Shard 1 Shard 2 Shard 3
Send Message
![Page 12: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/12.jpg)
Fan out on read – IO
Shard 1 Shard 2 Shard 3
Read Inbox
![Page 13: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/13.jpg)
Considerations
• Write: one document per message sent
• Reading my inbox means finding all messages with my own name in the recipient field
• Read: requires scatter-gather on sharded cluster
• Then a lot of random IO on a shard to find everything
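The scatter-gather cost can be illustrated without a database. The sketch below is an in-memory simulation only: three plain arrays stand in for shards, and `shardFor`, `sendOnRead` and `readInboxOnRead` are hypothetical helpers, not MongoDB APIs. The toy hash is also an assumption; real routing is chunk-based.

```javascript
// In-memory sketch only: three plain arrays stand in for three shards.
// shardFor() is a hypothetical toy router, not MongoDB's chunk-based routing.
const SHARD_COUNT = 3;
const readShards = [[], [], []];

function shardFor(key) {
  // Toy hash: sum of character codes modulo the number of shards.
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % SHARD_COUNT;
}

// Fan out on read: one document per message, placed by its sender ("from").
function sendOnRead(msg) {
  readShards[shardFor(msg.from)].push(msg);
}

// Reading an inbox filters on "to", which is not the shard key,
// so every shard has to be queried (scatter-gather).
function readInboxOnRead(user) {
  let shardsQueried = 0;
  const inbox = [];
  for (const shard of readShards) {
    shardsQueried += 1;
    for (const m of shard) {
      if (m.to.includes(user)) inbox.push(m);
    }
  }
  inbox.sort((a, b) => b.sent - a.sent); // newest first
  return { inbox, shardsQueried };
}

sendOnRead({ from: "Matias", to: ["Bob", "Jane"], sent: 1, message: "Hi!" });
sendOnRead({ from: "Jane", to: ["Bob"], sent: 2, message: "Lunch?" });

const bobRead = readInboxOnRead("Bob");
console.log(bobRead.inbox.length, bobRead.shardsQueried); // 2 3
```

Each send is a single write, but the read touches all three simulated shards regardless of where Bob's messages actually live.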
![Page 14: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/14.jpg)
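// Shard on "recipient" and "sent"
sh.shardCollection( "mongodbdays.inbox", { "recipient": 1, "sent": 1 } )

msg = {
    from: "Matias",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}

// Send a message: one copy per recipient
// (for...in over an array yields indexes, so iterate the values with forEach)
msg.to.forEach( function( recipient ) {
    // Build a fresh document each time so every copy gets its own _id
    db.inbox.save( { from: msg.from, to: msg.to, sent: msg.sent, message: msg.message, recipient: recipient } );
} );

// Read my inbox
db.inbox.find( { recipient: "Matias" } ).sort( { sent: -1 } )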
Fan out on write
![Page 15: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/15.jpg)
Fan out on write – IO
Shard 1 Shard 2 Shard 3
Send Message
![Page 16: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/16.jpg)
Fan out on write – IO
Shard 1 Shard 2 Shard 3
Read Inbox
![Page 17: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/17.jpg)
Considerations
• Write: one document per recipient
• Reading my inbox is just finding all of the messages with me as the recipient
• Can shard on recipient, so inbox reads hit one shard
• But still lots of random IO on the shard
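The trade-off in these considerations can be sketched in memory: more writes per send, but a read that is routed to one shard. This is a simulation under stated assumptions, not MongoDB code; `routeByRecipient`, `sendOnWrite` and `readInboxOnWrite` are hypothetical helpers, and the toy hash only stands in for real shard routing.

```javascript
// In-memory sketch only: three plain arrays stand in for three shards.
const WRITE_SHARD_COUNT = 3;
const writeShards = [[], [], []];

// Hypothetical toy router keyed on the recipient name.
function routeByRecipient(name) {
  let sum = 0;
  for (const ch of name) sum += ch.charCodeAt(0);
  return sum % WRITE_SHARD_COUNT;
}

// Fan out on write: store one copy of the message per recipient,
// each copy placed on the shard that owns that recipient.
function sendOnWrite(msg) {
  let writes = 0;
  for (const recipient of msg.to) {
    const copy = { ...msg, recipient }; // fresh object per recipient
    writeShards[routeByRecipient(recipient)].push(copy);
    writes += 1;
  }
  return writes;
}

// Reading an inbox filters on the shard key, so only one shard is consulted.
function readInboxOnWrite(user) {
  const shard = writeShards[routeByRecipient(user)];
  return shard
    .filter((m) => m.recipient === user)
    .sort((a, b) => b.sent - a.sent); // newest first
}

const writeCount = sendOnWrite({ from: "Matias", to: ["Bob", "Jane"], sent: 1, message: "Hi!" });
console.log(writeCount, readInboxOnWrite("Bob").length); // 2 1
```

The messages for one recipient still sit in separate documents on that shard, which is where the remaining random IO comes from.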
![Page 18: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/18.jpg)
// Shard on "owner / sequence"
sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )

msg = {
    from: "Matias",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}
Fan out on write with buckets
![Page 19: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/19.jpg)
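// Send a message
// (for...in over an array yields indexes, so iterate the values with forEach)
msg.to.forEach( function( recipient ) {
    // Atomically increment the recipient's message counter and read the new value
    var count = db.users.findAndModify({
        query: { user_name: recipient },
        update: { "$inc": { "msg_count": 1 } },
        upsert: true,
        new: true
    }).msg_count;

    // Roughly 50 messages per bucket document
    var sequence = Math.floor( count / 50 );

    // Append to the recipient's current bucket, creating it if needed
    db.inbox.update(
        { owner: recipient, sequence: sequence },
        { $push: { "messages": msg } },
        { upsert: true }
    );
} );

// Read my inbox
db.inbox.find( { owner: "Matias" } ).sort( { sequence: -1 } ).limit( 2 )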
Fan out on write with buckets
![Page 20: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/20.jpg)
Fan out on write with buckets
• Each “inbox” document is an array of messages
• Append a message onto “inbox” of recipient
• Bucket inboxes so there aren't too many messages per document
• Can shard on recipient, so inbox reads hit one shard
• 1 or 2 documents to read the whole inbox
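The bucket arithmetic from the send loop, `Math.floor(count / 50)`, can be checked with a small in-memory sketch. The `Map` objects below are stand-ins for the users and inbox collections, and `bucketFor` and `deliver` are hypothetical helper names used only for illustration.

```javascript
// In-memory sketch of the bucketing arithmetic; no database involved.
const BUCKET_SIZE = 50;

// Same formula as the slide: the bucket number is derived from the
// recipient's running message count.
function bucketFor(msgCount) {
  return Math.floor(msgCount / BUCKET_SIZE);
}

const counters = new Map(); // owner -> message count (stands in for "users")
const buckets = new Map();  // "owner/sequence" -> messages (stands in for "inbox")

function deliver(owner, msg) {
  // Increment first, then read the new value (like findAndModify with new: true).
  const count = (counters.get(owner) || 0) + 1;
  counters.set(owner, count);

  const sequence = bucketFor(count);
  const key = owner + "/" + sequence;
  if (!buckets.has(key)) buckets.set(key, []); // upsert a new bucket
  buckets.get(key).push(msg);
  return sequence;
}

for (let i = 1; i <= 120; i++) {
  deliver("Bob", { message: "msg " + i });
}

console.log(
  buckets.get("Bob/0").length,
  buckets.get("Bob/1").length,
  buckets.get("Bob/2").length
); // 49 50 21
```

Because the counter is incremented before the division, counts 1 to 49 land in bucket 0, so the first bucket holds 49 messages and later full buckets hold 50. Dividing `count - 1` instead would fill every bucket to exactly 50; whether that matters is a design choice, not something the slides specify.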
![Page 21: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/21.jpg)
Fan out on write with buckets - IO
Shard 1 Shard 2 Shard 3
Send Message
![Page 22: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/22.jpg)
Fan out on write with buckets - IO
Shard 1 Shard 2 Shard 3
Read Inbox
![Page 23: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/23.jpg)
#2 – History
![Page 24: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/24.jpg)
![Page 25: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/25.jpg)
Design Goals
Need to retain a limited amount of history, e.g.
– Number of items
– Hours, Days, Weeks
– May be a legislative requirement (e.g. HIPAA, SOX, DPA)
Need to query efficiently by
– match
– ranges
![Page 26: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/26.jpg)
3 Approaches (there are more)
• Bucket by number of messages
• Fixed size array
• Bucket by date + TTL Collections
![Page 27: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/27.jpg)
db.inbox.find()
{
    owner: "Matias",
    sequence: 25,
    messages: [
        { from: "Matias", to: [ "Bob", "Jane" ], sent: ISODate("2013-03-01T09:59:42.689Z"), message: "Hi!" },
        …
    ]
}

// Query with a date range
db.inbox.find( { owner: "Matias", messages: { $elemMatch: { sent: { $gt: ISODate("…") } } } } )

// Remove elements based on a date
// (multi: true so every bucket for this owner is trimmed, not only the first match)
db.inbox.update(
    { owner: "Matias" },
    { $pull: { messages: { sent: { $lt: ISODate("…") } } } },
    { multi: true }
)
Bucket by number of messages
![Page 28: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/28.jpg)
Considerations
• Documents shrink as elements are removed; the space can be reclaimed with
  – db.runCommand( { compact: '<collection>' } )
• Remove the document after the last element in the array has been removed
  – { "_id" : …, "messages" : [ ], "owner" : "Bob", "sequence" : 0 }
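Both steps, pulling old elements and then dropping buckets left empty, can be sketched in memory. This is a plain JavaScript simulation of the `$pull` behaviour described on the previous slide, not MongoDB code; `pullOlderThan` is a hypothetical helper and the sample dates are made up.

```javascript
// In-memory sketch: trim messages older than a cutoff from every bucket,
// then drop buckets whose messages array ended up empty.
function pullOlderThan(bucketDocs, cutoff) {
  for (const bucket of bucketDocs) {
    // Mirrors $pull with { sent: { $lt: cutoff } }: keep only sent >= cutoff.
    bucket.messages = bucket.messages.filter((m) => m.sent >= cutoff);
  }
  // Remove documents left with an empty array.
  return bucketDocs.filter((bucket) => bucket.messages.length > 0);
}

// Hypothetical sample data for illustration.
const bobBuckets = [
  {
    owner: "Bob",
    sequence: 0,
    messages: [
      { sent: new Date("2013-01-01T00:00:00Z"), message: "old 1" },
      { sent: new Date("2013-01-15T00:00:00Z"), message: "old 2" },
    ],
  },
  {
    owner: "Bob",
    sequence: 1,
    messages: [
      { sent: new Date("2013-02-10T00:00:00Z"), message: "recent 1" },
      { sent: new Date("2013-03-01T00:00:00Z"), message: "recent 2" },
    ],
  },
];

const remaining = pullOlderThan(bobBuckets, new Date("2013-02-01T00:00:00Z"));
console.log(remaining.length, remaining[0].sequence, remaining[0].messages.length); // 1 1 2
```

In the database the cleanup would be a separate remove of documents whose array is empty, which is why the slide lists it as an extra consideration.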
![Page 29: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/29.jpg)
msg = {
    from: "Your Boss",
    to: [ "Bob" ],
    sent: new Date(),
    message: "CALL ME NOW!"
}

// 2.4 introduces $each, $sort and $slice modifiers for $push
db.messages.update(
    { _id: 1 },
    { $push: { messages: { $each: [ msg ],
                           $sort: { sent: 1 },
                           $slice: -50 } } }
)
Maintain the latest – Fixed size array
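The effect of combining `$each`, `$sort` and `$slice: -50` can be mimicked on a plain array. The function below is an illustrative in-memory equivalent, not the server's implementation; `pushSortSlice` is a hypothetical name and the numeric `sent` values stand in for dates.

```javascript
// In-memory sketch of $push with $each, $sort: { sent: 1 } and a negative $slice.
function pushSortSlice(messages, newItems, keepLast) {
  if (keepLast <= 0) return [];              // guard: slice(-0) would keep everything
  const combined = messages.concat(newItems); // $each: append all new items
  combined.sort((a, b) => a.sent - b.sent);   // $sort: { sent: 1 } (oldest first)
  return combined.slice(-keepLast);           // $slice: -N keeps the last N elements
}

let latest = [];
for (let i = 1; i <= 60; i++) {
  latest = pushSortSlice(latest, [{ sent: i, message: "msg " + i }], 50);
}

console.log(latest.length, latest[0].sent, latest[49].sent); // 50 11 60
```

After 60 pushes the array holds only the 50 most recent messages, so the document stays a fixed size without a separate cleanup job.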
![Page 30: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/30.jpg)
Considerations
• Need to compute the size of the array based on retention period
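One way to do that computation, assuming a roughly steady arrival rate, is messages per day multiplied by the retention period in days. The helper name and the figures below are hypothetical and only illustrate the arithmetic.

```javascript
// Sketch: estimate the array size needed to cover a retention period,
// assuming messages arrive at a roughly constant rate.
function arraySizeFor(messagesPerDay, retentionDays) {
  // Round up so a fractional estimate never under-sizes the array.
  return Math.ceil(messagesPerDay * retentionDays);
}

const thirtyDaySize = arraySizeFor(20, 30); // hypothetical: 20 messages/day, 30 days
console.log(thirtyDaySize); // 600
```

If traffic is bursty, a size chosen this way may drop messages that are still inside the retention window, so the estimate is best based on peak rather than average volume.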
![Page 31: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/31.jpg)
// messages: one doc per user per day
db.messages.findOne()
{
    _id: 1,
    to: "Joe",
    sequence: ISODate("2013-02-04T00:00:00.392Z"),
    messages: [ ]
}

// Auto expires data after 31536000 seconds = 1 year
db.messages.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
TTL Collections
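The numbers behind this slide can be checked with a short sketch: the one-year constant, a per-day bucket key, and the expiry condition. `dayBucket` and `isExpired` are hypothetical helpers that only mirror the rule that a document becomes eligible for deletion once the indexed date plus `expireAfterSeconds` has passed; actual removal is done by a background task on the server, so it is not instantaneous.

```javascript
// 365 days expressed in seconds, matching the expireAfterSeconds value.
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31536000

// Truncate a timestamp to midnight UTC to get a per-day bucket key.
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// A document is eligible for expiry once (indexed date + TTL) is in the past.
function isExpired(sequence, now, ttlSeconds) {
  return now.getTime() - sequence.getTime() >= ttlSeconds * 1000;
}

const bucketKey = dayBucket(new Date("2013-02-04T09:59:42.689Z"));
console.log(SECONDS_PER_YEAR);                                                          // 31536000
console.log(bucketKey.toISOString());                                                   // 2013-02-04T00:00:00.000Z
console.log(isExpired(bucketKey, new Date("2014-02-03T00:00:00Z"), SECONDS_PER_YEAR)); // false
console.log(isExpired(bucketKey, new Date("2014-02-04T00:00:00Z"), SECONDS_PER_YEAR)); // true
```

Since the TTL applies to the whole day document, all messages from a given day expire together rather than one by one.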
![Page 32: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/32.jpg)
Conclusion
![Page 33: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/33.jpg)
Summary
• Multiple ways to model a domain problem
• Understand the key use cases of your app
• Balance between ease of query vs. ease of write
• Random IO should be avoided
• Scatter/gather should be avoided
![Page 34: Schema Design - Real world use case](https://reader035.fdocuments.net/reader035/viewer/2022062312/555e2418d8b42a6a4c8b4dbd/html5/thumbnails/34.jpg)
Questions?