Page 1: MongoSV Schema Workshop

Schema Design Workshop

Sridhar Nanjundeswaran

Software Engineer, 10Gen

[email protected]

@snanjund

Wednesday, December 5, 12

Page 2: MongoSV Schema Workshop

Agenda

• Part One - Basic Schema & Patterns
• Part Two - Schema Design
• Part Three - Sharding
• Part Four - Replication

Page 3: MongoSV Schema Workshop

Why is schema design different?

• With an RDBMS, you ask "what answers do I have?"
• With MongoDB, you ask "what questions will I have?"

Page 4: MongoSV Schema Workshop

Goals

• Learn Data Modeling with MongoDB
• Labs to try to solve problems
• Understand implications of
  • Replication
  • Sharding

Please, ask many, many questions!

Page 5: MongoSV Schema Workshop

Part One - Basic Schema & Patterns

Page 6: MongoSV Schema Workshop

So why model data?

http://bit.ly/SSs7QB


Page 7: MongoSV Schema Workshop

Normalization

• 1970 E.F. Codd introduces 1st Normal Form (1NF)
• 1971 E.F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)

Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query

* source: Wikipedia

Page 8: MongoSV Schema Workshop

So today’s example will use...

http://bit.ly/RyIOvO


Page 9: MongoSV Schema Workshop

Terminology

RDBMS           MongoDB
Table           Collection
Row(s)          JSON Document
Index           Index
Join            Embedding & Linking
Partition       Shard
Partition Key   Shard Key

Page 10: MongoSV Schema Workshop

Schema Design - Relational Database

Page 11: MongoSV Schema Workshop

Schema Design - MongoDB

• embedding
• linking

Page 14: MongoSV Schema Workshop

Basic schema

Design documents that simply map to your application

> post = { author: "Hergé",
           date: ISODate("2011-09-18T09:56:06.298Z"),
           text: "Destination Moon",
           tags: ["comic", "movie"] }

> db.blogs.save(post)

Page 15: MongoSV Schema Workshop

Find the document

> db.blogs.find()

{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Hergé",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  text: "Destination Moon",
  tags: [ "comic", "movie" ] }

Notes:
• The _id must be unique, but can be anything you'd like
• MongoDB will generate a default _id if one is not supplied

Page 16: MongoSV Schema Workshop

Add an index, find via index

// Secondary index for "author"
// 1 means ascending, -1 means descending
> db.blogs.ensureIndex( { author: 1 } )

> db.blogs.find( { author: 'Hergé' } )
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  date: ISODate("2011-09-18T09:56:06.298Z"),
  author: "Hergé",
  ... }

Page 17: MongoSV Schema Workshop

Examine the query plan

> db.blogs.find( { author: "Hergé" } ).explain()
{
    "cursor" : "BtreeCursor author_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,            // Number of objects returned
    "millis" : 5,       // How long it took
    "indexBounds" : {
        "author" : [
            [
                "Hergé",
                "Hergé"
            ]
        ]
    }
}

Page 20: MongoSV Schema Workshop

Query operators

Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
$lt, $lte, $gt, $gte, ...

// find posts with any tags
> db.blogs.find( { tags: { $exists: true } } )

Regular expressions:
// posts where author starts with h
> db.blogs.find( { author: /^h/i } )

Counting:
// number of posts written by Hergé
> db.blogs.find( { author: "Hergé" } ).count()

Page 21: MongoSV Schema Workshop

Extending the Schema

http://bit.ly/PpjT1l


Page 22: MongoSV Schema Workshop

Extending the Schema

> new_comment = { author: "Kyle",
                  date: new Date(),
                  text: "great book" }

> db.blogs.update(
      { text: "Destination Moon" },
      { "$push": { comments: new_comment },    // Add element to array
        "$inc": { comments_count: 1 } } )      // Increment counter

Page 24: MongoSV Schema Workshop

Extending the Schema

> db.blogs.find( { author: "Hergé" } )

{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Hergé",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  text: "Destination Moon",
  tags: [ "comic", "movie" ],
  comments: [
    { author: "Kyle",
      date: ISODate("2011-09-19T09:56:06.298Z"),
      text: "great book" }
  ],
  comments_count: 1 }

Page 25: MongoSV Schema Workshop

Extending the Schema

// create index on nested documents:
> db.blogs.ensureIndex( { "comments.author": 1 } )

> db.blogs.find( { "comments.author": "Kyle" } )

// find last 5 posts:
> db.blogs.find().sort( { date: -1 } ).limit(5)

// most commented post:
> db.blogs.find().sort( { comments_count: -1 } ).limit(1)

When sorting, check if you need an index

Page 26: MongoSV Schema Workshop

Common Patterns

http://bit.ly/SNnt4z


Page 27: MongoSV Schema Workshop

Inheritance

http://bit.ly/T7MqUz


Page 28: MongoSV Schema Workshop

Inheritance


Page 29: MongoSV Schema Workshop

Single Table Inheritance - RDBMS

select * from shapes;

id  type    area  radius  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10            5       2

Page 30: MongoSV Schema Workshop

Single Table Inheritance - MongoDB

> db.shapes.find()
{ _id: "1", type: "c", area: 3.14, radius: 1 }
{ _id: "2", type: "s", area: 4, length: 2 }
{ _id: "3", type: "r", area: 10, length: 5, width: 2 }

Missing values are not stored!

// find shapes where radius > 0
> db.shapes.find( { radius: { $gt: 0 } } )

// create a sparse index: only documents where the value is present are indexed
> db.shapes.ensureIndex( { radius: 1 }, { sparse: true } )
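The idea behind the sparse index on the shapes example can be sketched in plain JavaScript. This is only an in-memory illustration, not how MongoDB builds its B-tree; the helper name buildSparseIndex is hypothetical.

```javascript
// In-memory stand-in for the shapes collection from the slide.
const shapes = [
  { _id: "1", type: "c", area: 3.14, radius: 1 },
  { _id: "2", type: "s", area: 4, length: 2 },
  { _id: "3", type: "r", area: 10, length: 5, width: 2 },
];

// Hypothetical helper: build sorted [value, _id] entries for one field,
// skipping every document that does not have the field at all.
function buildSparseIndex(docs, field) {
  return docs
    .filter((doc) => field in doc)
    .map((doc) => [doc[field], doc._id])
    .sort((a, b) => a[0] - b[0]);
}

const radiusIndex = buildSparseIndex(shapes, "radius");
console.log(radiusIndex); // only the circle contributes an entry
```

A non-sparse index would also carry an entry for the square and the rectangle, even though neither has a radius.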

Page 33: MongoSV Schema Workshop

One to Many

http://bit.ly/Oqbt8z


Page 34: MongoSV Schema Workshop

One to Many

One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle

Page 35: MongoSV Schema Workshop

One to Many - Embedded Array

• $slice operator to return subset of comments
• some queries harder
  • e.g. find latest comments across all blogs

blogs: {
  author: "Hergé",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  comments: [
    { author: "Kyle",
      date: ISODate("2011-09-19T09:56:06.298Z"),
      text: "great book" }
  ] }

> db.blogs.find( { author: "Hergé" }, { comments: { $slice: 10 } } )

Page 36: MongoSV Schema Workshop

One to Many - Normalized (2 collections)

• most flexible
• more queries

blogs: {
  _id: 1000,
  author: "Hergé",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  comments: [ { comment: 1 } ] }

comments: {
  _id: 1,
  blog: 1000,
  author: "Kyle",
  date: ISODate("2011-09-19T09:56:06.298Z") }

// findOne returns the document itself, so blog._id is available
> blog = db.blogs.findOne( { text: "Destination Moon" } );
> db.comments.find( { blog: blog._id } ).limit(5);

Page 37: MongoSV Schema Workshop

Many to Many

http://bit.ly/QTzhBF


Page 38: MongoSV Schema Workshop

Many - Many

Example:
• Blog can have many Tags
• Tag can be used by many Blogs

Page 39: MongoSV Schema Workshop

Many - Many

// Each Tag lists the "_id" of the Blog
tags:
  { _id: 20,
    name: "comic",       // Unique
    blog_ids: [ 10, 11, 12 ] }

  { _id: 30,
    name: "movie",       // Unique
    blog_ids: [ 10 ] }

// Each Blog lists the "tag" of the Tags
// links via unique key, in this case "tags", could be "_id"
blogs:
  { _id: 10,
    name: "Destination Moon",
    tags: [ "comic", "movie" ] }

// All Tags for a given Blog
> db.tags.find( { blog_ids: 10 } )

Page 43: MongoSV Schema Workshop

Use _id or not?

Link by tag name:

blogs: { _id: 10, name: "...", tags: [ "comic", "movie" ] }

Pros:
• Single query

Cons:
• Cascade any changes

Link by tag _id:

blogs: { _id: 10, name: "...", tags: [ 20, 30 ] }

Pros:
• Single update

Cons:
• Second query required

Page 44: MongoSV Schema Workshop

Alternative

// Each Blog lists the _id of the Tag
blogs:
  { _id: 10,
    name: "Destination Moon",
    tag_ids: [ 20, 30 ] }

// Association not stored on the Tag
tags:
  { _id: 20,
    name: "comic" }

// All Blogs for a given Tag
> db.blogs.find( { tag_ids: 20 } )

// All Tags for a given Blog
> blog = db.blogs.findOne( { _id: 10 } )
> db.tags.find( { _id: { $in: blog.tag_ids } } )

Page 47: MongoSV Schema Workshop

Many - Many Intersection Attributes

Example:
• Blog can have many Tags
• Tag can be used by many Blogs
• When a Tag is used, record the usage date

Page 48: MongoSV Schema Workshop

Many - Many Normalized

// Each Blog lists the _id of the Tag
blogs: { _id: 10, name: "...", tag_ids: [ 20, 30 ] }

// Association not stored on the Tag
tags: { _id: 20, name: "comic" }

// Store the interaction and usage date
usages: { blog_id: 10,       // Blog _id
          tag_id: 20,        // Tag _id
          usage: ISODate("2012-10-12...") }

// Find the Tags for a Blog
for ( var c = db.usages.find( { blog_id: 10 } ); c.hasNext(); ) {
    var u = c.next();
    var t = db.tags.findOne( { _id: u.tag_id } );
    printjson( { tag: t.name, usage: u.usage } );
}

Page 49: MongoSV Schema Workshop

Many - Many Intersection Attributes

// Each Blog lists the Tag usage objects
blogs:
  { _id: 10,
    name: "Destination Moon",
    tags: [ { tag: "comic", usage: ISODate("2012-10-12...") },
            { tag: "movie", usage: ISODate("2012-09-11...") } ] }

// Find the Tags for a Blog
> db.blogs.find( { _id: 10 }, { tags: 1 } )

Pros:
• Usage object encapsulated where used

Cons:
• If updates allowed, changes will have to be cascaded

Page 50: MongoSV Schema Workshop

Summary

• Schema design is the single biggest performance factor

• More choices than in an RDBMS

• Embedding, index design, shard keys

Page 51: MongoSV Schema Workshop

Part Two - Schema Design

Page 52: MongoSV Schema Workshop

Lab #1 - Design Schema for Twitter

• Model each user's activity stream
• Users
  • Name, email address, display name
• Tweets
  • Text
  • Who
  • Timestamp

Page 53: MongoSV Schema Workshop

Lab #1 - Solution A: Two Collections

// users - one doc per user
{ _id: "alvin",
  email: "[email protected]",
  display: "jonnyeight" }

// tweets - one doc per user per tweet
{ user: "bob",
  for: "alvin",
  tweet: "20111209-1231",
  text: "Best Tweet Ever!",
  ts: ISODate("2011-09-18T09:56:06.298Z") }

Page 54: MongoSV Schema Workshop

Lab #1 - Solution B: Embedded Tweets

// users - one doc per user with all tweets
{ _id: "alvin",
  email: "[email protected]",
  display: "jonnyeight",
  tweets: [
    { user: "bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!",
      ts: ISODate("2011-09-18T09:56:06.298Z") }
  ] }

Page 55: MongoSV Schema Workshop

Embedding

• Great for read performance

• One seek to load entire object

• One roundtrip to database

• Writes can be slow if adding to objects all the time


Page 56: MongoSV Schema Workshop

Linking or Embedding?

Linking can make some queries easy

// Find latest 50 tweets for "alvin"
// (in the two-collection schema the recipient is stored in the "for" field)
> db.tweets.find( { "for": "alvin" } ).sort( { ts: -1 } ).limit(50)

But what effect does this have on the systems?

Page 57: MongoSV Schema Workshop

[Diagram, built up over five slides: Collection 1 and Index 1 are mapped into Virtual Address Space 1, part of which is held in Physical RAM, with the full data set on Disk.]

• Virtual Address Space 1: this is your virtual memory size (mapped)
• Physical RAM: this is your resident memory size
• Physical RAM access = 100 ns
• Disk access = 10,000,000 ns

Page 62: MongoSV Schema Workshop

Linking = Many seeks + random reads

> db.tweets.find( { "for": "alvin" } ).sort( { ts: -1 } ).limit(10)

[Diagram: Virtual Address Space 1, Physical RAM and Disk holding Collection 1 and Index 1; the query touches three separate locations, numbered 1, 2, 3.]

Page 63: MongoSV Schema Workshop

Embedding = Large Sequential Read

> db.tweets.find( { _id: "alvin" } )

[Diagram: Virtual Address Space 1, Physical RAM and Disk holding Collection 1 and Index 1; the query is served by a single read, numbered 1.]

Page 64: MongoSV Schema Workshop

Lab #2 - Alternative Schema

• Display last 10 tweets from today
• Efficiently use memory and disk seeks / IOPS

Page 65: MongoSV Schema Workshop

Lab #2 - Solution: Buckets

// tweets : one doc per user per day
> db.tweets.findOne()

{ _id: "alvin-2011/12/09",
  email: "[email protected]",
  tweets: [
    { user: "Bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!" },
    { author: "Joe",
      tweet: "20111210-9025",
      date: "May 27 2011",
      text: "Stuck in traffic (again)" } ] }

Page 66: MongoSV Schema Workshop

Lab #2 - Solution: Last 10 Tweets

// a negative $slice returns the last N elements of the array
> db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice: -10 } } ).sort( { _id: -1 } ).limit(1)
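The bucket scheme depends on the application building the same "user-YYYY/MM/DD" _id for every read and write. A minimal sketch in plain JavaScript follows; the helper names bucketId and lastN are hypothetical, and using the UTC calendar day is an assumption (the slides do not say which time zone defines "today").

```javascript
// Hypothetical helper: build the per-user, per-day bucket _id,
// e.g. "alvin-2011/12/09". Uses the UTC date (an assumption).
function bucketId(user, date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  const day = String(date.getUTCDate()).padStart(2, "0");
  return `${user}-${year}/${month}/${day}`;
}

// Client-side equivalent of projecting the last n array elements
// from a bucket document that has already been loaded.
function lastN(bucket, n) {
  if (n <= 0) return [];
  return bucket.tweets.slice(-n);
}

const id = bucketId("alvin", new Date(Date.UTC(2011, 11, 9, 12, 31)));
console.log(id);
```

Because the day is part of the _id, fetching today's tweets is a single lookup on the primary key rather than a range scan over individual tweet documents.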

Page 67: MongoSV Schema Workshop

Lab #2 - Solution: Adding a Tweet

> tweet = { user: "Bob",
            tweet: "20111209-1231",
            text: "Best Tweet Ever!" }

> db.tweets.update( { _id: "alvin-2011/12/09" },
                    { $push: { tweets: tweet } } );

Page 68: MongoSV Schema Workshop

Lab #2 - Solution: Getting All Tweets

// anchor on "alvin-" so that other users whose names start with "alvin" do not match
> cursor = db.tweets.find( { _id: /^alvin-/ } ).sort( { _id: -1 } )

> while ( cursor.hasNext() ) {
      doc = cursor.next();
      for ( var i = 0; i < doc.tweets.length; i++ )
          printjson( doc.tweets[i] );
  }

Page 69: MongoSV Schema Workshop

Lab #2 - Solution: Deleting a Tweet

> db.tweets.update( { _id: "alvin-2011/12/09" },
                    { $pull: { tweets: { tweet: "20111209-1231" } } } )

Page 70: MongoSV Schema Workshop

Bucket = 1 seek + 1 sequential read

> db.tweets.find( { _id: "alvin-2011/12/09" }, { tweets: { $slice: -10 } } ).sort( { _id: -1 } ).limit(1)

[Diagram: Virtual Address Space 1, Physical RAM and Disk holding Collection 1 and Index 1; the query is served by a single read, numbered 1.]

Page 71: MongoSV Schema Workshop

Trees

http://bit.ly/Oqc8Xs


Page 72: MongoSV Schema Workshop

Trees

Hierarchical information

Page 73: MongoSV Schema Workshop

Trees

Full Tree in Document

{ retweet: [
    { who: "Kyle", text: "...",
      retweet: [
        { who: "James", text: "...", retweet: [] }
      ] }
  ] }

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 16MB limit

Page 74: MongoSV Schema Workshop

Array of Ancestors

// Store all Ancestors of a node
{ _id: "a" }
{ _id: "b", tree: [ "a" ], retweet: "a" }
{ _id: "c", tree: [ "a", "b" ], retweet: "b" }
{ _id: "d", tree: [ "a", "b" ], retweet: "b" }
{ _id: "e", tree: [ "a" ], retweet: "a" }
{ _id: "f", tree: [ "a", "e" ], retweet: "e" }

// find all direct retweets of "b"
> db.tweets.find( { retweet: "b" } )

// find all retweets of "e" anywhere in tree
> db.tweets.find( { tree: "e" } )

// find tweet history of f:
> tweets = db.tweets.findOne( { _id: "f" } ).tree
> db.tweets.find( { _id: { $in: tweets } } )

[Tree diagram: A is the root; B and E are children of A; C and D are children of B; F is a child of E.]
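The three ancestor-array queries can be checked without a database. The sketch below replays them over an in-memory copy of the six documents; the function names are hypothetical, and Array.prototype.filter stands in for find().

```javascript
// The six tweet documents from the slide.
const tweets = [
  { _id: "a" },
  { _id: "b", tree: ["a"], retweet: "a" },
  { _id: "c", tree: ["a", "b"], retweet: "b" },
  { _id: "d", tree: ["a", "b"], retweet: "b" },
  { _id: "e", tree: ["a"], retweet: "a" },
  { _id: "f", tree: ["a", "e"], retweet: "e" },
];

// Direct retweets: documents whose parent pointer is the given id.
function directRetweets(id) {
  return tweets.filter((t) => t.retweet === id).map((t) => t._id);
}

// Retweets anywhere below the given id: it appears in their ancestor array.
function allRetweets(id) {
  return tweets.filter((t) => (t.tree || []).includes(id)).map((t) => t._id);
}

// History of a tweet: load its ancestor array, then fetch those documents.
function history(id) {
  const node = tweets.find((t) => t._id === id);
  const ancestors = (node && node.tree) || [];
  return tweets.filter((t) => ancestors.includes(t._id)).map((t) => t._id);
}

console.log(directRetweets("b"), allRetweets("e"), history("f"));
```

The trade-off is that every insert must copy the parent's ancestor array and append the parent's _id, so moving a subtree means rewriting every descendant.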

Page 76: MongoSV Schema Workshop

Trees as Paths

Store hierarchy as a path expression
• Separate each node by a delimiter, e.g. "/"
• Use text search to find parts of a tree

{ retweets: [
    { _id: "a", text: "initial tweet", path: "a" },
    { _id: "b", text: "retweet with comment", path: "a/b" },
    { _id: "c", text: "reply to retweet", path: "a/b/c" } ] }

// Find the conversations "a" started
> db.tweets.find( { path: /^a/i } )

[Tree diagram: nodes A, B, C, D, E, F]
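A prefix regex such as /^a/ is fine for one-character ids, but with longer ids it would also match a sibling like "ab". One way to guard against that, sketched here under the assumption that ids never contain the "/" delimiter, is to escape the id and require either the end of the string or a delimiter right after it. The helper names are hypothetical.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: pattern matching the node itself and everything
// below it, but not siblings that merely share the same prefix.
function subtreePattern(rootPath) {
  return new RegExp("^" + escapeRegExp(rootPath) + "(/|$)");
}

const paths = ["a", "a/b", "a/b/c", "ab", "ab/x"];
const underA = paths.filter((p) => subtreePattern("a").test(p));
console.log(underA);
```

Keeping the pattern anchored at the start of the string is what lets a database use an ordinary index on the path field for this kind of query.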

Page 77: MongoSV Schema Workshop

http://bit.ly/QeNsPX

Queues & Workflows


Page 78: MongoSV Schema Workshop

Lab #3 - Following Requests

• Users are allowed to "follow" another user
  • User sends a "follow" request
  • Follower approves or not
  • Requests are timed out after 7 days
• The approval is an async process

Page 79: MongoSV Schema Workshop

Lab #3 - Solution: Queues & Workflows

• Need to maintain order and state
• Ensure that updates are atomic

> db.approvals.insert( { inprogress: false,
                         approved: false,
                         priority: 1,
                         text: "Hey Jim, want to follow you!" } );

// find highest priority approval and mark as in-progress
> job = db.approvals.findAndModify( {
      query: { inprogress: false },
      sort: { priority: -1 },
      update: { $set: { inprogress: true, started: new Date() } },
      new: true } )

Page 81: MongoSV Schema Workshop

Lab #3 - Solution: Queues & Workflows

{ inprogress: true,                                  // updated
  priority: 1,
  approved: false,
  started: ISODate("2011-09-18T09:56:06.298Z"),      // added
  ... }

Page 82: MongoSV Schema Workshop

Lab #3 - Solution: Queues & Workflows

• Follower approves request

// update approval after receiving approval
> job = db.approvals.update( { _id: "1234" }, { $set: { approved: true } } )

• System times out request after 7 days

> var limit = new Date();
> limit.setDate( limit.getDate() - 7 );

// a request has timed out when it was started before the limit, hence $lt
> job = db.approvals.update( { inprogress: true, started: { $lt: limit } },
                             { $set: { approved: false } } )
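The direction of the timeout comparison is easy to get backwards, so here is a small in-memory sketch of it. The names cutoff and timedOut are hypothetical; the cutoff is computed with millisecond arithmetic, which is a deliberate simplification compared with adjusting the calendar date.

```javascript
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// The point in time before which an in-progress request counts as expired.
function cutoff(now) {
  return new Date(now.getTime() - SEVEN_DAYS_MS);
}

// In-memory equivalent of { inprogress: true, started: { $lt: limit } }.
function timedOut(requests, now) {
  const limit = cutoff(now);
  return requests.filter((r) => r.inprogress && r.started < limit);
}

// Hypothetical sample data.
const now = new Date(Date.UTC(2012, 11, 5));
const requests = [
  { _id: "old", inprogress: true, started: new Date(Date.UTC(2012, 10, 20)) },
  { _id: "recent", inprogress: true, started: new Date(Date.UTC(2012, 11, 1)) },
  { _id: "idle", inprogress: false, started: new Date(Date.UTC(2012, 10, 1)) },
];
console.log(timedOut(requests, now).map((r) => r._id));
```

Only the request started more than seven days before "now" and still in progress is selected; the recent one and the one that is not in progress are left alone.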

Page 83: MongoSV Schema Workshop

Lab #4 - Voting

Twitter meets Stack Overflow

• Users can "vote" for a tweet
• A user can "vote" once and only once
• Need to display current votes

Page 84: MongoSV Schema Workshop

Lab #4 - Solution: Votes

// One document per voter per tweet
> db.votes.insert( { tweet: "20111209-1231", voter: "alvin" } );

// Unique index guarantees the user can't vote twice
> db.votes.ensureIndex( { tweet: 1, voter: 1 }, { unique: true } );

// Count will return the number of votes cast
> db.votes.find( { tweet: "20111209-1231" } ).count()

Page 85: MongoSV Schema Workshop

Count or Not?

• Indexes in MongoDB do not maintain counts
• The count has to be computed via an index scan

// One summary document per tweet, no "voter" key
> db.votes.update( { tweet: "20111209-1231", voter: { $exists: false } },
                   { "$inc": { count: 1 } },
                   true, false );      // upsert: true, multi: false

// Return the count from the summary document (the one with no "voter" key)
> db.votes.find( { tweet: "20111209-1231", voter: { $exists: false } },
                 { count: 1, _id: 0 } )
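The two ideas in this lab, a unique (tweet, voter) pair and a pre-computed counter, can be modelled together in a few lines. This is an in-memory sketch only; the class name VoteStore is hypothetical and a Set stands in for the unique index.

```javascript
class VoteStore {
  constructor() {
    this.seen = new Set();   // plays the role of the unique { tweet, voter } index
    this.counts = new Map(); // plays the role of the per-tweet summary document
  }

  // Returns true if the vote was recorded, false if it was a duplicate.
  vote(tweet, voter) {
    const key = JSON.stringify([tweet, voter]);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.counts.set(tweet, (this.counts.get(tweet) || 0) + 1);
    return true;
  }

  // Reads the maintained counter instead of scanning all votes.
  count(tweet) {
    return this.counts.get(tweet) || 0;
  }
}

const store = new VoteStore();
store.vote("20111209-1231", "alvin");
store.vote("20111209-1231", "alvin"); // rejected: same voter, same tweet
store.vote("20111209-1231", "bob");
console.log(store.count("20111209-1231"));
```

In the database version the insert and the counter increment are two separate writes, so the counter should only be incremented after the insert has succeeded against the unique index.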

Page 86: MongoSV Schema Workshop

Lab #5 - Time Series

• Record votes by
  • Day, Hour, Minute
• Show time series of votes cast

Page 87: MongoSV Schema Workshop

Lab #5 - Solution A: Time Series

// Time series buckets, hour and minute sub-docs
{ _id: "20111209-1231",
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  daily: 67,
  hourly: { 0: 23, 1: 14, 2: 19 ... 23: 72 },
  minute: { 0: 0, 1: 4, 2: 6 ... 1439: 0 } }

Page 88: MongoSV Schema Workshop

Lab #5 - Solution A: Time Series

// Add one to the last minute before midnight
// (a single $inc with three fields; repeating the $inc key would keep only the last one)
> db.votes.update(
      { _id: "20111209-1231",
        ts: ISODate("2011-12-09T00:00:00.000Z") },
      { $inc: { daily: 1, "hourly.23": 1, "minute.1439": 1 } } )

What is the cost of updating the minute before midnight?

Page 89: MongoSV Schema Workshop

BSON Storage

• Sequence of key/value pairs
• NOT a hash map
• Optimized to scan quickly
• 1439 skips

[Diagram: keys 0, 1, 2, 3, ... 1439 stored one after another]

Page 90: MongoSV Schema Workshop

BSON Storage

• Can skip sub-documents
• 23 skips (hours) + 59 skips (minutes) = 82 skips

[Diagram: hour keys 0, 1, ... 23, each holding a sub-document of 60 minute values (0 ... 59, 60 ... 119, ..., 1380 ... 1439)]
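The skip counts on these two slides follow from simple arithmetic, sketched below. The model assumes that reaching the key at position k in a sequentially stored document means skipping the k keys before it; the function names are hypothetical.

```javascript
// Flat layout: one sub-document with keys 0..1439.
function flatSkips(minuteOfDay) {
  return minuteOfDay;
}

// Nested layout: 24 hour keys, each holding 60 minute keys.
// Skip the earlier hours, then the earlier minutes within the hour.
function nestedSkips(minuteOfDay) {
  const hour = Math.floor(minuteOfDay / 60);
  const minute = minuteOfDay % 60;
  return hour + minute;
}

// Worst case over the whole day for each layout.
let worstFlat = 0;
let worstNested = 0;
for (let m = 0; m < 24 * 60; m++) {
  worstFlat = Math.max(worstFlat, flatSkips(m));
  worstNested = Math.max(worstNested, nestedSkips(m));
}
console.log(worstFlat, worstNested); // 1439 82
```

The last minute before midnight is the worst case in both layouts, which is why the slides use it as the example.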

Page 91: MongoSV Schema Workshop

Lab #5 - Solution B: Time Series

// Time series buckets, each hour a sub-document
{ _id: "20111209-1231",
  ts: ISODate("2011-12-09T00:00:00.000Z"),
  daily: 67,
  minute: { 0: { 0: 0, 1: 7, ... 59: 2 },
            ...
            23: { 0: 15, ... 59: 6 } } }

// Add one to the last minute before midnight
> db.votes.update(
      { _id: "20111209-1231",
        ts: ISODate("2011-12-09T00:00:00.000Z") },
      { $inc: { daily: 1, "minute.23.59": 1 } } )

Page 92: MongoSV Schema Workshop

Lab #6 - Inventory

• User has a number of "votes" they can use

Page 93: MongoSV Schema Workshop

Lab #6 - Solution: Inventory

// Number of votes and who voted for
{ _id: "alvin",
  votes: 42,
  voted_for: [] }

// Subtract a vote and add the voted for tweet
// "20111209-1231"
> db.user.update(
      { _id: "alvin",
        votes: { $gt: 0 },
        voted_for: { $ne: "20111209-1231" } },
      { "$push": { voted_for: "20111209-1231" },
        "$inc": { votes: -1 } } )

Page 94: MongoSV Schema Workshop

Lab #6 - Solution: Inventory

// After vote
> db.user.findOne()
{ _id: "alvin",
  votes: 41,                           // decremented
  voted_for: ["20111209-1231"] }       // added

Page 95: MongoSV Schema Workshop

Lab #7 - Statistic Buckets

• Record referring web sites on customer sign up
• Independent counter for each web site

Page 101: MongoSV Schema Workshop

Lab #7 - Solution A: Statistic Buckets

> db.referers.update( { "referrers.domain": "www.bing.com" },
                      { $inc: { "referrers.$.count": 1 } },
                      false, true )

What happens if a new referring site is used?

Page 102: MongoSV Schema Workshop

Lab #7 - Solution B: Statistic Buckets

// Need to replace dots with underscores
{ _id: "alvin",
  referrers: { "www_google_co_uk": 4,
               "www_yahoo_com": 1 } }

// simple $inc will add www_bing_com if not present
> db.referers.update( { _id: "alvin" },
                      { $inc: { "referrers.www_bing_com": 1 } },
                      true, false );
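The dot replacement matters because a dot inside a field path is read as a step into a sub-document. A minimal sketch of building the update document from a raw domain name follows; referrerKey and incUpdateFor are hypothetical helper names, not part of any driver.

```javascript
// Replace every dot so the domain can be used as a single field name.
function referrerKey(domain) {
  return domain.replace(/\./g, "_");
}

// Build the $inc update document for one sign-up from the given domain.
function incUpdateFor(domain) {
  return { $inc: { ["referrers." + referrerKey(domain)]: 1 } };
}

console.log(JSON.stringify(incUpdateFor("www.bing.com")));
```

The mapping is lossy: two names that differ only in a dot versus an underscore would share a counter, so a different escape sequence may be preferable if such names can occur.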

Page 103: MongoSV Schema Workshop

Part Three - Sharding

Page 104: MongoSV Schema Workshop

What is Sharding

• Ad-hoc partitioning

• Consistent hashing
  • Amazon Dynamo

• Range based partitioning
  • Google BigTable
  • Yahoo! PNUTS
  • MongoDB

Page 105: MongoSV Schema Workshop

MongoDB Sharding

• Automatic partitioning and management

• Range based

• Convert to sharded system with no downtime

• Fully consistent

• No code changes required


Page 106: MongoSV Schema Workshop

Sharding - Range distribution

sh.shardCollection("mydb.tweets", { _id: 1 }, false)

[Diagram: shard01, shard02, shard03]

Page 107: MongoSV Schema Workshop

Sharding - Range distribution

shard01   shard02   shard03
a-i       j-r       s-z

Page 108: MongoSV Schema Workshop

Sharding - Splits

shard01   shard02   shard03
a-i       ja-jz     s-z
          k-r

Page 109: MongoSV Schema Workshop

Sharding - Splits

shard01   shard02   shard03
a-i       ja-ji     s-z
          ji-js
          js-jw
          jz-r
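Range-based routing amounts to finding the chunk whose key range contains the shard key value. The sketch below uses the initial three-way split from the slides, modelled as half-open [min, max) string ranges. It is a simplified illustration: the chunk table and the shardFor name are assumptions, and the real system tracks chunk ranges in its own metadata with special minimum and maximum bounds.

```javascript
// Chunk table for the initial distribution: a-i, j-r, s-z.
// Each chunk covers keys with min <= key < max.
const chunks = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "s", shard: "shard02" },
  { min: "s", max: "{", shard: "shard03" }, // "{" is the character after "z"
];

// Return the shard owning the key, or null if no chunk covers it.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

console.log(shardFor("alvin"), shardFor("jim"), shardFor("zed"));
```

A split only replaces one entry of the table with two narrower ones on the same shard, and balancing only changes the shard field of an entry; in both cases the lookup itself stays the same.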

Page 110: MongoSV Schema Workshop

Sharding - Auto Balancing

[Diagram, two slides: shard02 holds four chunks (ja-ji, ji-js, js-jw, jz-r) while shard01 (a-i) and shard03 (s-z) hold one each. Chunks js-jw and jz-r are migrated off shard02 to even out the distribution.]

Page 112: MongoSV Schema Workshop

Sharding for caching

[Diagram: a single server, shard01, holds all chunks (a-i, j-r, s-z)]

• 300 GB Data
• 96 GB Mem
• 3:1 Data/Mem

Page 114: MongoSV Schema Workshop

Aggregate Horizontal Resources

shard01     shard02     shard03
a-i         j-r         s-z
100 GB      100 GB      100 GB
96 GB Mem   96 GB Mem   96 GB Mem
1:1         1:1         1:1         (Data/Mem)

300 GB Data in total

Page 115: MongoSV Schema Workshop

Sharding Features

• Shard data with no downtime
• Automatic balancing as data is written
• Commands routed (switched) to correct node
  • Inserts - must have the Shard Key
  • Updates - can have the Shard Key
  • Queries
    • With Shard Key - routed to nodes
    • Without Shard Key - scatter gather
  • Indexed / Sorted Queries
    • With Shard Key - routed in order
    • Without Shard Key - distributed sort merge

Page 116: MongoSV Schema Workshop

Lab #8 - Sharding Twitter Pictures

User can upload pictures to Twitter feed

{ photo_id : ???? , data : <binary> }

What should photo_id be?
How will photo_id be sharded?

Page 117: MongoSV Schema Workshop

Lab #8 - Sharding Key

{ photo_id : ???? , data : <binary> }

What's the right key?
• auto increment
• MD5( data )
• month() + MD5( data )

Page 118: MongoSV Schema Workshop

Right balanced access

• Only have to keep small portion in RAM
• Right shard "hot"
• Time Based
• ObjectId
• Auto Increment

Page 119: MongoSV Schema Workshop

Random access

• Have to keep entire index in RAM
• All shards "warm"
• Hash

Page 120: MongoSV Schema Workshop

Segmented access

• Have to keep some index in RAM
• Some shards "warm"
• Month + Hash
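The three candidate photo_id schemes can be generated side by side to see why their access patterns differ. This is a sketch using Node's built-in crypto module; the helper names and the "YYYYMM-" prefix format for the month component are assumptions, since the slides only say month() + MD5( data ).

```javascript
const crypto = require("crypto");

// MD5 of the photo bytes as a hex string.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Candidate 1: auto increment. New keys always land at the right-hand
// end of the key range (right balanced access).
let counter = 0;
function autoIncrementKey() {
  counter += 1;
  return counter;
}

// Candidate 2: pure hash. Keys are spread over the whole range (random access).
function hashKey(data) {
  return md5Hex(data);
}

// Candidate 3: month prefix plus hash. Keys are random within the current
// month's segment only (segmented access). Prefix format is an assumption.
function monthHashKey(data, date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${year}${month}-${md5Hex(data)}`;
}

const photo = Buffer.from("hello");
console.log(autoIncrementKey(), hashKey(photo), monthHashKey(photo, new Date(Date.UTC(2012, 11, 5))));
```

With the month prefix, writes within a month still spread across that month's key range, while older months go cold and can drop out of memory.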

Page 121: MongoSV Schema Workshop

Lab #9 - Single Identities

// Shard by _id
ids:
  { _id: "alvin",
    email: "[email protected]",
    addresses: [ { state: "CA", country: "USA" },
                 { country: "UK" } ] }

How would the following queries be executed?

> db.ids.find( { _id: "alvin" } )
> db.ids.find( { email: "[email protected]" } )

Page 122: MongoSV Schema Workshop

Sharding - Routed Query

shard01   shard02   shard03
a-i       ja-ji     s-z
          ji-js
          js-jw
          jz-r

find( { _id: "alvin" } )

[Diagram, two slides: the query contains the shard key, so it is routed only to the shard whose chunk range contains "alvin" (a-i on shard01).]

Page 124: MongoSV Schema Workshop

Sharding - Scatter Gather

shard01   shard02   shard03
a-i       ja-ji     s-z
          ji-js
          js-jw
          jz-r

find( { email: "[email protected]" } )

[Diagram, two slides: the query does not contain the shard key, so it is sent to all three shards and the results are gathered.]
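The difference between a routed query and a scatter-gather query comes down to whether the query document contains the shard key. A minimal sketch of that decision follows, using the a-i / j-r / s-z split by first letter; targetShards and shardForKey are hypothetical names, and the email address in the usage line is a made-up placeholder.

```javascript
const SHARD_KEY = "_id";
const ALL_SHARDS = ["shard01", "shard02", "shard03"];

// Simplified range lookup by first letter: a-i, j-r, s-z.
function shardForKey(value) {
  const first = value[0].toLowerCase();
  if (first <= "i") return "shard01";
  if (first <= "r") return "shard02";
  return "shard03";
}

// Return the list of shards that must receive the query.
function targetShards(query) {
  const value = query[SHARD_KEY];
  if (typeof value === "string" && value.length > 0) {
    return [shardForKey(value)]; // routed: exactly one shard
  }
  return ALL_SHARDS.slice(); // scatter gather: every shard
}

console.log(targetShards({ _id: "alvin" }));
console.log(targetShards({ email: "alvin@example.com" })); // hypothetical address
```

This sketch only handles exact-match queries on the shard key; a range condition on the key could be routed to the subset of shards whose chunks overlap the range.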

Page 126: MongoSV Schema Workshop

Lab #9 - Multiple Identities

User can have multiple identities
• twitter name
• email address
• facebook name
• etc.

What is the best sharding key & schema design?

Page 127: MongoSV Schema Workshop

Lab #9 - Solution A: Multiple Identities

// Shard by _id
{ _id: "alvin",
  email: "[email protected]",
  fb: "alvin.richards",        // facebook
  li: "alvin.j.richards",      // linkedin
  tweets: [ ... ] }

• Lookup by _id hits 1 node
• Lookup by email, li or fb is scatter gather
• Cannot create a unique index on email, li or fb

Page 128: MongoSV Schema Workshop

Lab #9 - Solution B: Multiple Identities

identities
{ _id: { _id: "alvin" }, info: "1200-42" }
{ _id: { em: "[email protected]" }, info: "1200-42" }
{ _id: { li: "alvin.j.richards" }, info: "1200-42" }

tweets
{ _id: "1200-42", tweets: [ ... ] }

• Shard identities on { _id: 1 }
• Can create unique index on _id
• Shard info on { _id: 1 }

Page 129: MongoSV Schema Workshop

Sharding - Multiple Identities

shard01 shard02 shard03

idscollection

tweetscollection

em: a-q em: r-z _id: a-z

li: s-z

li: a-c

li: d-r_id: "Min"-"1100"

_id: "1100"-"1200"

_id: "1200"-"Max"

Wednesday, December 5, 12

Page 130: MongoSV Schema Workshop

Sharding - Multiple Identities

shard01 shard02 shard03em: a-q em: r-z _id: a-z

li: s-z

li: a-c

li: d-r_id: "Min"-"1100"

_id: "1100"-"1200"

_id: "1200"-"Max"

ids.find({  _id:                      {"em","[email protected]  })

idscollection

tweetscollection

Wednesday, December 5, 12

Page 131: MongoSV Schema Workshop

Sharding - Multiple Identities

[Diagram: same chunk layout as the previous slide, with one query against each collection]

ids.find({ _id: { em: "[email protected]" } })
tweets.find({ _id: "1200-42" })

Page 132: MongoSV Schema Workshop

Part Four
Replication

Page 133: MongoSV Schema Workshop

Types of outage

• Planned
  • Hardware upgrade
  • O/S or file-system tuning
  • Relocation of data to new file-system / storage
  • Software upgrade

• Unplanned
  • Hardware failure
  • Data center failure
  • Region outage
  • Human error
  • Application corruption

Page 134: MongoSV Schema Workshop

Replica Sets

• Data Protection
  • Multiple copies of the data
  • Spread across Data Centers, AZs

• High Availability
  • Automated Failover
  • Automated Recovery

Page 135: MongoSV Schema Workshop

Replica Sets

[Diagram: replica set with one Primary and two Secondaries; App has one Write path to the Primary and Read paths to all three members]

Asynchronous Replication

Page 136: MongoSV Schema Workshop

Replica Sets

[Diagram: replica set with one Primary and two Secondaries; App with one Write path and three Read paths]

Page 137: MongoSV Schema Workshop

Replica Sets

[Diagram: members labeled Primary, Primary, Secondary; App with Read, Write, Read paths]

Automatic Election of new Primary

Page 138: MongoSV Schema Workshop

Replica Sets

[Diagram: members labeled Recovering, Primary, Secondary; App with Read, Write, Read paths]

New primary serves data

Page 139: MongoSV Schema Workshop

Replica Sets

[Diagram: members labeled Secondary, Primary, Secondary; App with one Write path and three Read paths]

Page 140: MongoSV Schema Workshop

Elections

During an election
• Most up to date
• Highest priority
• Less than 10s behind failed Primary
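The three criteria above can be combined in a small selection function. This is only one possible reading of the slide, not MongoDB's real election protocol (which also involves votes and a majority): the `pickNewPrimary` helper, the member fields and the order in which the criteria are applied are all assumptions made for illustration.

```javascript
// Pick a new primary from the surviving members.
// optime values are in seconds; a higher optime means more up to date.
function pickNewPrimary(candidates, failedPrimaryOptime) {
  // Criterion: less than 10s behind the failed primary.
  const eligible = candidates.filter(
    (m) => failedPrimaryOptime - m.optime < 10
  );
  if (eligible.length === 0) return null;
  // Assumed ordering: highest priority first, then most up to date.
  eligible.sort((a, b) => b.priority - a.priority || b.optime - a.optime);
  return eligible[0];
}

const members = [
  { name: "A", optime: 100, priority: 1 },
  { name: "B", optime: 95, priority: 2 },
  { name: "C", optime: 80, priority: 5 }, // too far behind to be eligible
];

console.log(pickNewPrimary(members, 101).name);
```

Member C has the highest priority but is excluded for being more than 10 seconds behind, so the choice falls to B.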

Page 141: MongoSV Schema Workshop

Types of Durability with MongoDB
• Fire and forget
• Wait for error
• Wait for fsync
• Wait for journal sync
• Wait for replication
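Each durability level above corresponds to a set of options on the getLastError command that follows a write. The sketch below builds that command document; the `getLastErrorFor` helper and its level names are hypothetical, while `j: true` and `w: 2` come from the slides and `fsync: true` is the matching getLastError option.

```javascript
// Map a durability level to the getLastError command document.
// Returns null for fire and forget, where no getLastError is sent at all.
function getLastErrorFor(level) {
  switch (level) {
    case "fireAndForget":
      return null;
    case "waitForError":
      return { getLastError: 1 };
    case "waitForFsync":
      return { getLastError: 1, fsync: true };
    case "waitForJournal":
      return { getLastError: 1, j: true };
    case "waitForReplication":
      return { getLastError: 1, w: 2 }; // primary plus one secondary
    default:
      throw new Error("unknown durability level: " + level);
  }
}

console.log(getLastErrorFor("waitForJournal"));
console.log(getLastErrorFor("waitForReplication"));
```

In the mongo shell of the MongoDB versions these slides cover, a document like this could be passed to `db.runCommand()` right after a write; drivers typically expose the same options as a write concern setting.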

Page 142: MongoSV Schema Workshop

Network Ack - Old Default

[Diagram: Driver sends write to Primary; Primary applies it in memory]

Page 143: MongoSV Schema Workshop

Get last error - New default

[Diagram: Driver sends write, then getLastError, to Primary; Primary applies the write in memory]

Page 144: MongoSV Schema Workshop

Wait for Journal Sync

[Diagram: Driver sends write, then getLastError with j:true, to Primary; Primary applies the write in memory and writes it to the journal]

Page 145: MongoSV Schema Workshop

Wait for replication

[Diagram: Driver sends write, then getLastError with w:2, to Primary; Primary applies the write in memory and replicates it to a Secondary]

Page 146: MongoSV Schema Workshop

Tunable Data Durability

[Diagram: durability scale from Less to More across Memory, Journal, Secondary and Other Data Center; options plotted: RDBMS, network ACK, w=1, w=1 j=true, w="majority", w=n, w="myTag"; legend: async, sync]

Page 147: MongoSV Schema Workshop

Eventual Consistency
Using Replicas for Reads

Read preference
• primary (only)
• primaryPreferred
• secondary (only)
• secondaryPreferred
• nearest
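How the five read preference modes differ can be shown with a simplified member selector. This is a sketch, not driver code: `selectMember`, the member fields and the ping times are hypothetical, "closest" here means the single lowest ping time (real drivers are less strict), and `null` stands in for the error a driver would raise when no suitable member is available.

```javascript
// Choose which replica set member serves a read for a given mode.
function selectMember(members, mode) {
  const up = members.filter((m) => m.up);
  const primary = up.find((m) => m.role === "primary") || null;
  const secondaries = up.filter((m) => m.role === "secondary");
  // Lowest ping time wins; null when the list is empty.
  const closest = (list) =>
    list.length === 0
      ? null
      : list.reduce((a, b) => (b.pingMs < a.pingMs ? b : a));

  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary || closest(secondaries);
    case "secondary":
      return closest(secondaries);
    case "secondaryPreferred":
      return closest(secondaries) || primary;
    case "nearest":
      return closest(up);
    default:
      throw new Error("unknown read preference: " + mode);
  }
}

const replicaSet = [
  { name: "A", role: "primary", pingMs: 20, up: true },
  { name: "B", role: "secondary", pingMs: 5, up: true },
  { name: "C", role: "secondary", pingMs: 50, up: true },
];

console.log(selectMember(replicaSet, "primary").name);
console.log(selectMember(replicaSet, "nearest").name);
```

Any mode that can return a secondary may return data that lags behind the primary, which is why the slide files these options under eventual consistency.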

Page 148: MongoSV Schema Workshop

Immediate Consistency

[Diagram: Thread #1 performs Insert (v1), Read, Update (v2), Read against the Primary; the reads return v1 and v2]

Page 149: MongoSV Schema Workshop

Eventual Consistency

[Diagram: Thread #1 inserts v1 and updates it to v2 on the Primary, which replicates to the Secondary; Thread #2 reads from the Secondary. Outcomes shown for Thread #2: ✖ v1 does not exist, ✔ reads v1, ✔ reads v2]
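The read outcomes in this diagram can be reproduced with a toy model of asynchronous replication. The `ReplicaPair` class is a hypothetical in-memory simulation written for this illustration; it has nothing to do with MongoDB's actual oplog implementation beyond the idea of applying writes on the secondary later than on the primary.

```javascript
// One primary and one secondary with an explicit replication step.
class ReplicaPair {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = []; // writes not yet applied on the secondary
  }

  // Insert or update on the primary; replication happens later.
  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }

  // The secondary applies the oldest pending write, if any.
  replicateOne() {
    const entry = this.pending.shift();
    if (entry) this.secondary.set(entry[0], entry[1]);
  }

  // Read from "primary" or "secondary"; null means the document is missing.
  read(key, from) {
    const store = from === "secondary" ? this.secondary : this.primary;
    return store.has(key) ? store.get(key) : null;
  }
}

const rs = new ReplicaPair();
rs.write("doc", "v1");                     // Thread #1: insert
console.log(rs.read("doc", "primary"));    // v1
console.log(rs.read("doc", "secondary"));  // null: v1 does not exist yet
rs.replicateOne();
rs.write("doc", "v2");                     // Thread #1: update
console.log(rs.read("doc", "secondary"));  // v1: stale read
rs.replicateOne();
console.log(rs.read("doc", "secondary"));  // v2
```

A thread that reads only from the primary never sees the `null` or stale `v1` results, which is the difference between the immediate and eventual consistency slides.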

Page 150: MongoSV Schema Workshop

Lab #10
Replication

Primary, Secondary or both?

• Show the latest "votes" for a tweet and/or user
• Changing your profile picture
• Showing your thumbnail with a tweet

Page 151: MongoSV Schema Workshop

Summary

• Schema design is different in MongoDB

• Basic data design principles stay the same

• Focus on how the application manipulates data

• Rapidly evolve schema to meet your requirements

• Consider sharding early

• Understand the impact of eventual consistency

Page 152: MongoSV Schema Workshop

@mongodb

conferences, appearances, and meetups
http://www.10gen.com/events

http://bit.ly/mongo>
Facebook | Twitter | LinkedIn

http://linkd.in/joinmongo

download at mongodb.org
