MongoDB Schema Design: Four Real-World Examples
-
Upload
mike-friedman -
Category
Technology
-
view
17.417 -
download
2
description
Transcript of MongoDB Schema Design: Four Real-World Examples
Perl Engineer & Evangelist, 10gen
Mike Friedman
#MongoDBdays
Schema DesignFour Real-World Use Cases
Single Table En
Agenda
• Why is schema design important
• 4 Real World Schemas– Inbox– History– Indexed Attributes– Multiple Identities
• Conclusions
Why is Schema Design important?
• Largest factor for a performant system
• Schema design with MongoDB is different
• RDBMS – "What answers do I have?"• MongoDB – "What question will I have?"
#1 - Message Inbox
Let’s getSocial
Sending Messages
?
Design Goals
• Efficiently send new messages to recipients
• Efficiently read inbox
Reading my Inbox
?
3 Approaches (there are more)• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
// Shard on "from"db.shardCollection( "mongodbdays.inbox", { from: 1 } )
// Make sure we have an index to handle inbox readsdb.inbox.ensureIndex( { to: 1, sent: 1 } )
msg = {from: "Joe",to: [ "Bob", "Jane" ],
sent: new Date(), message: "Hi!",
}
// Send a messagedb.inbox.save( msg )
// Read my inboxdb.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )
Fan out on read
Fan out on read – Send Message
Shard 1 Shard 2 Shard 3
Send Message
Fan out on read – Inbox Read
Shard 1 Shard 2 Shard 3
Read Inbox
Considerations
• One document per message sent
• Reading an inbox means finding all messages with my own name in the recipient field
• Requires scatter-gather on sharded cluster
• Then a lot of random IO on a shard to find everything
// Shard on “recipient” and “sent” db.shardCollection( "mongodbdays.inbox", { ”recipient”: 1, ”sent”: 1 } )
msg = {from: "Joe",to: [ "Bob", "Jane" ],
sent: new Date(), message: "Hi!",
}
// Send a messagefor ( recipient in msg.to ) {
msg.recipient = msg.to[recipient]db.inbox.save( msg );
}
// Read my inboxdb.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )
Fan out on write
Fan out on write – Send Message
Shard 1 Shard 2 Shard 3
Send Message
Fan out on write– Read Inbox
Shard 1 Shard 2 Shard 3
Read Inbox
Considerations
• One document per recipient
• Reading my inbox is just finding all of the messages with me as the recipient
• Can shard on recipient, so inbox reads hit one shard
• But still lots of random IO on the shard
// Shard on “owner / sequence”db.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )db.shardCollection( "mongodbdays.users", { user_name: 1 } )
msg = {from: "Joe",to: [ "Bob", "Jane" ],
sent: new Date(), message: "Hi!",
}
Fan out on write with buckets
// Send a messagefor( recipient in msg.to) { count = db.users.findAndModify({
query: { user_name: msg.to[recipient] }, update: { "$inc": { "msg_count": 1 } }, upsert: true, new: true }).msg_count;
sequence = Math.floor(count / 50);
db.inbox.update({ owner: msg.to[recipient], sequence: sequence }, { $push: { "messages": msg } },{ upsert: true } );
}
// Read my inboxdb.inbox.find( { owner: "Joe" } ).sort ( { sequence: -1 } ).limit( 2 )
Fan out on write with buckets
Fan out on write with buckets• Each “inbox” document is an array of
messages
• Append a message onto “inbox” of recipient
• Bucket inboxes so there’s not too many messages per document
• Can shard on recipient, so inbox reads hit one shard
• 1 or 2 documents to read the whole inbox
Fan out on write with buckets - Send
Shard 1 Shard 2 Shard 3
Send Message
Fan out on write with buckets - Read
Shard 1 Shard 2 Shard 3
Read Inbox
#2 – History
Design Goals
• Need to retain a limited amount of history e.g.– Hours, Days, Weeks– May be legislative requirement (e.g. HIPPA, SOX,
DPA)
• Need to query efficiently by – match– ranges
3 Approaches (there are more)• Bucket by Number of messages
• Fixed size Array
• Bucket by Date + TTL Collections
db.inbox.find() { owner: "Joe", sequence: 25, messages: [ { from: "Joe", to: [ "Bob", "Jane" ], sent: ISODate("2013-03-01T09:59:42.689Z"), message: "Hi!" }, …] }
// Query with a date rangedb.inbox.find ({owner: "friend1", messages: { $elemMatch: {sent:{$gte: ISODate("…") }}}})
// Remove elements based on a datedb.inbox.update({owner: "friend1" }, { $pull: { messages: { sent: { $gte: ISODate("…") } } } } )
Inbox – Bucket by # messages
Considerations
• Shrinking documents, space can be reclaimed with– db.runCommand ( { compact: '<collection>' } )
• Removing the document after the last element in the array as been removed– { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
msg = { from: "Your Boss", to: [ "Bob" ],
sent: new Date(), message: "CALL ME NOW!"
}
// 2.4 Introduces $each, $sort and $slice for $pushdb.messages.update(
{ _id: 1 }, { $push: { messages: { $each: [ msg ],
$sort: { sent: 1 },
$slice: -50 }
} })
Maintain the latest – Fixed Size Array
Considerations
• Need to compute the size of the array based on retention period
// messages: one doc per user per day
db.inbox.findOne(){
_id: 1, to: "Joe", sequence: ISODate("2013-02-04T00:00:00.392Z"), messages: [ ] }
// Auto expires data after 31536000 seconds = 1 yeardb.messages.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
TTL Collections
#3 – Indexed Attributes
Design Goal
• Application needs to stored a variable number of attributes e.g.– User defined Form– Meta Data tags
• Queries needed– Equality– Range based
• Need to be efficient, regardless of the number of attributes
2 Approaches (there are more)• Attributes as Embedded Document
• Attributes as Objects in an Array
db.files.insert( { _id: "local.0", attr: { type: "text", size: 64, created: ISODate("..." } } )
db.files.insert( { _id: "local.1", attr: { type: "text", size: 128} } )
db.files.insert( { _id: "mongod", attr: { type: "binary", size: 256, created: ISODate("...") } } )
// Need to create an index for each item in the sub-documentdb.files.ensureIndex( { "attr.type": 1 } )db.files.find( { "attr.type": "text"} )
// Can perform range queriesdb.files.ensureIndex( { "attr.size": 1 } )db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
Attributes as a Sub-Document
Considerations
• Each attribute needs an Index
• Each time you extend, you add an index
• Lots and lots of indexes
db.files.insert( {_id: "local.0", attr: [ { type: "text" },
{ size: 64 },
{ created: ISODate("...") } ] } )
db.files.insert( { _id: "local.1", attr: [ { type: "text" },
{ size: 128 } ] } )
db.files.insert( { _id: "mongod", attr: [ { type: "binary" },
{ size: 256 }, { created: ISODate("...") } ] } )
db.files.ensureIndex( { attr: 1 } )
Attributes as Objects in Array
Considerations
• Only one index needed on attr
• Can support range queries, etc.
• Index can be used only once per query
#4 – Multiple Identities
Design Goal
• Ability to look up by a number of different identities e.g.
• Username• Email address• FB Handle• LinkedIn URL
2 Approaches (there are more)• Identifiers in a single document
• Separate Identifiers from Content
db.users.findOne(){ _id: "joe", email: "[email protected], fb: "joe.smith", // facebook li: "joe.e.smith", // linkedin other: {…}}
// Shard collection by _iddb.shardCollection("mongodbdays.users", { _id: 1 } )
// Create indexes on each keydb.users.ensureIndex( { email: 1} )db.users.ensureIndex( { fb: 1 } )db.users.ensureIndex( { li: 1 } )
Single Document by User
Read by _id (shard key)
Shard 1 Shard 2 Shard 3
find( { _id: "joe"} )
Considerations
• Lookup by shard key is routed to 1 shard
• Lookup by other identifier is scatter gathered across all shards
• Secondary keys cannot have a unique index
// Create unique indexdb.identities.ensureIndex( { identifier : 1} , { unique: true} )
// Create a document for each users documentdb.identities.save( { identifier : { hndl: "joe" }, user: "1200-42" } )db.identities.save( { identifier : { email: "[email protected]" }, user: "1200-42" } )db.identities.save( { identifier : { li: "joe.e.smith" }, user: "1200-42" } )
// Shard collection by _iddb.shardCollection( "mydb.identities", { identifier : 1 } )
// Create unique indexdb.users.ensureIndex( { _id: 1} , { unique: true} )
// Shard collection by _iddb.shardCollection( "mydb.users", { _id: 1 } )
Document per Identity
Read requires 2 reads
Shard 1 Shard 2 Shard 3
db.identities.find({"identifier" : { "hndl" : "joe" }})
db.users.find( { _id: "1200-42"} )
Considerations
• Lookup to Identities is a routed query
• Lookup to Users is a routed query
• Unique indexes available
Conclusion
Summary
• Multiple ways to model a domain problem
• Understand the key uses cases of your app
• Balance between ease of query vs. ease of write
• Random IO should be avoided
Perl Engineer & Evangelist, 10gen
Mike Friedman
#MongoDBdays
Thank You