Big Fast Data with MongoDB
Clint Combs
Thursday, April 12, 12
Polyglot Persistence
• Same idea as Polyglot Programming
• DB Variety ~= Data Structure Variety
• Don’t use a list when we need a map.
• The same goes for data persistence.
Many Options
• Oracle, MongoDB, Redis, BigTable, CouchDB, Neo4j, Datomic, DB2, Files, CSV, Excel, Logs, MySQL
MongoDB
A Scalable High-Performance Document-Oriented Database
When is MongoDB a Good Fit?
• when you have a large data set
• when you have a dynamic schema
• when you need speed
• when you can tolerate eventual consistency
Wordnik
• Migrated from MySQL
• Primary reason: performance
• 500k requests/hour - 4x that during peak
• 12 billion documents in MongoDB
• 3 TB per node
Wordnik (cont.)
• insert 8k documents per second with bursts to 50k per second
• each Java client can sustain 10 MB/second reading from a mongod
Wordnik (cont.)
• Every type of retrieval is faster
• sample fetch: 400 ms to 60 ms
• dictionary entries: 20 ms to 1 ms
• document metadata: 30 ms to 0.1 ms
• spelling suggestions: 10 ms to 1.2 ms
Wordnik (cont.)
• benefits from MongoDB's built-in caching
• removed memcached, freed GBs of RAM
• average response time improved by 1-2 ms per request under load
• not all data fits in RAM, so the 60 ms average sample fetch includes disk access
Wordnik (cont.)
• full story: http://blog.wordnik.com/12-months-with-mongodb
• others: http://www.mongodb.org/display/DOCS/Production+Deployments
MongoDB Applications
• archiving
• event logging
• document and CMS systems
• geospatial indexing
• CQRS (Command Query Responsibility Segregation)
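CQRS
[Diagram: the Client sends commands through the Command Model to the System of Record (Oracle); Change Events update the Query Data (MongoDB), which the Client reads through the Query Model.]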
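A minimal in-memory sketch of the flow in the CQRS diagram, assuming plain `Map`s as stand-ins for the System of Record (Oracle) and the Query Data (MongoDB). The class names, the `createUser` command and the `UserCreated` event are hypothetical illustrations, not part of any library or of the talk's code.

```javascript
// CQRS sketch: commands change the system of record, each change emits an
// event, and a handler applies the event to a denormalized read model that
// queries hit directly. All names here are hypothetical.
class CommandModel {
  constructor(onEvent) {
    this.systemOfRecord = new Map(); // stand-in for the relational store
    this.onEvent = onEvent;
  }

  createUser(id, loginName) {
    this.systemOfRecord.set(id, { id, loginName });
    this.onEvent({ type: "UserCreated", id, loginName }); // publish the change event
  }
}

class QueryModel {
  constructor() {
    this.queryData = new Map(); // stand-in for the MongoDB read store
  }

  apply(event) {
    if (event.type === "UserCreated") {
      // Shape the read-side document for the queries the client will run.
      this.queryData.set(event.id, { _id: event.id, displayName: event.loginName });
    }
  }

  findUser(id) {
    return this.queryData.get(id);
  }
}

const queries = new QueryModel();
const commands = new CommandModel((event) => queries.apply(event));
commands.createUser(42, "mary2012");
console.log(queries.findUser(42));
```

Here the event is applied synchronously for simplicity. In a real deployment the change events would typically be delivered asynchronously, so the query side lags the system of record slightly, which is the eventual consistency trade-off listed under "When is MongoDB a Good Fit?".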
What does it store?
• JSON-style documents up to 16 MB in size
• stored in BSON (binary JSON) format
• files larger than the document limit can be stored with GridFS, which splits them into chunks
• all documents are stored in collections
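Because a single document tops out at 16 MB, GridFS stores a large file as one metadata document plus a series of fixed-size chunk documents. The arithmetic is a ceiling division. This is a minimal sketch assuming 256 KB chunks, the default in 2012-era drivers (later releases default to 255 KB); `chunkCount` is a hypothetical helper, not a driver API.

```javascript
// Sketch: how many GridFS chunk documents a file needs.
// ASSUMPTION: 256 KB chunks (2012-era default; later versions use 255 KB).
// chunkCount is a hypothetical helper, not part of any driver.
const DEFAULT_CHUNK_BYTES = 256 * 1024;

function chunkCount(fileBytes, chunkBytes = DEFAULT_CHUNK_BYTES) {
  if (!Number.isInteger(fileBytes) || fileBytes < 0) {
    throw new RangeError("fileBytes must be a non-negative integer");
  }
  return Math.ceil(fileBytes / chunkBytes); // the last chunk may be partial
}

const hundredMB = 100 * 1024 * 1024;
console.log(chunkCount(hundredMB)); // 400 chunks of 256 KB each
```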
Data Types
• JSON types: string, integer, boolean, double, null, array, and object
• other: date, object id, binary data, regular expressions, code
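• the BSON timestamp type is intended for internal use (e.g. replication); use dates for application timestamps
• JavaScript functions can be stored on the server in the system.js collection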
Java Type Mappings (MongoDB Type → Java Type)
• string, int, boolean, double → String, int, boolean, double
• Object ID → com.mongodb.ObjectId
• Regular Expression → java.util.regex.Pattern
• Dates/Times → java.util.Date
• Database References → com.mongodb.DBRef
• Binary Data → byte[]
• Timestamp Data → BSONTimestamp
• Code Data → Code, CodeWScope
• Embedded Documents → BasicDBObject
• Arrays → anything that implements java.util.List
mongo shell
• ‘mongo’ starts the command-line interface
• use <database-name> - switches to that database (it is created on the first write)
• show collections - displays a list of collections in the current database
Collections
• collections of BSON documents
• schema-free
• can be organized into dotted namespaces (e.g. blog.posts) for convenience; the database itself treats collection names as a flat list
Object IDs
• every document in a collection has a unique _id field
• _id can be any BSON type except an array
• if _id is not supplied, a BSON ObjectId is generated for it (by the driver, or by the database)
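An ObjectId is 12 bytes, and its first 4 bytes are a big-endian count of seconds since the Unix epoch, so every generated `_id` carries its creation time. A minimal sketch that decodes it from the hex string; `objectIdTimestamp` is a hypothetical helper (in the mongo shell itself, `ObjectId(...).getTimestamp()` gives the same information).

```javascript
// Sketch: decode the creation time embedded in an ObjectId hex string.
// Bytes 0-3 (the first 8 hex characters) are seconds since the epoch;
// the remaining 8 bytes exist to keep ids unique.
// objectIdTimestamp is a hypothetical helper, not a driver API.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("an ObjectId is 12 bytes = 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The _id of the newhart example document shown later in this talk:
console.log(objectIdTimestamp("4f84b0473004b2a7bf9dcdad").toISOString());
// prints 2012-04-10T22:12:23.000Z
```

The decoded second matches that document's createTS (22:12:23.424Z), which is why an ObjectId is also a rough, second-granularity insertion order.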
Collections
• create: db.createCollection("users")
• rename: db.users.renameCollection("allUsers")
• drop: db.users.drop()
• count: db.users.count()
newhart
• example documents in this talk are from newhart
• newhart is an open source project that provides audit tracking with storage in MongoDB
• uses capped collections for audit trails
• will be available at github.org/ClintCombs/newhart
newhart document
{
  "_id" : ObjectId("4f84b0473004b2a7bf9dcdad"),
  "origin" : "org.newhart.example.AuditUsers",
  "originKey" : "mary2012-create-1334095943422",
  "originKeyType" : "User",
  "criticality" : "major",
  "auditTS" : ISODate("2012-04-10T22:12:23.426Z"),
  "createTS" : ISODate("2012-04-10T22:12:23.424Z"),
  "updateTS" : ISODate("2012-04-10T22:12:23.424Z"),
  "msg" : "new user created: mary2012",
  "data" : { "loginName" : { "text" : "mary2012" } },
  "errors" : [ ],
  "warnings" : [ ],
  "labels" : [ "security", "create" ]
}
Insert
> use newhart
> show collections
> db.users.insert({origin:"shell"})
Queries
> db.users.find();
> db.users.findOne();
Queries (cont.)
> db.users.find({origin:"shell"})
Field Selection (return only _id and labels):
> db.users.find({origin:"org.newhart.example.AuditUsers"},{labels:1})
Arrays (match documents whose labels array contains "authenticate"):
> db.users.find({labels:"authenticate"});
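To make the two query behaviours above concrete, here is a minimal sketch that applies them to plain JavaScript objects instead of a server: an equality match where an array field matches if any element equals the value, and an inclusion-only field selection that keeps `_id` unless it is explicitly excluded. `getPath`, `matches` and `project` are hypothetical helpers covering only these simple cases, not the full query language.

```javascript
// Sketch of equality matching and field selection on in-memory documents.
// Hypothetical helpers; scalar equality and inclusion projections only.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

function matches(doc, query) {
  return Object.entries(query).every(([path, expected]) => {
    const actual = getPath(doc, path);
    // An array field matches if any of its elements equals the value.
    return Array.isArray(actual) ? actual.includes(expected) : actual === expected;
  });
}

function project(doc, fields) {
  const out = fields._id === 0 ? {} : { _id: doc._id }; // _id kept unless excluded
  for (const [path, include] of Object.entries(fields)) {
    if (path === "_id" || !include) continue;
    const value = getPath(doc, path);
    if (value === undefined) continue;
    // Rebuild nested objects for dotted paths such as "data.loginName.text".
    const keys = path.split(".");
    let target = out;
    for (const key of keys.slice(0, -1)) target = target[key] = target[key] || {};
    target[keys[keys.length - 1]] = value;
  }
  return out;
}

const docs = [
  { _id: 1, origin: "shell" },
  { _id: 2, origin: "org.newhart.example.AuditUsers",
    labels: ["security", "authenticate"], data: { loginName: { text: "mary2012" } } },
];
const hits = docs.filter((d) => matches(d, { labels: "authenticate" }));
console.log(hits.map((d) => project(d, { labels: 1 })));
```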
Skip and Limit
> db.users.find({labels:"failed"},{"data.loginName.text":1}).sort({"data.loginName.text":1}).skip(5).limit(3)
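The server applies a cursor's sort first, then skip, then limit, whatever order the methods are chained in. A minimal in-memory sketch of that pipeline; `pageOf` is a hypothetical helper that mimics `find().sort({field: 1}).skip(n).limit(m)` with an ascending sort only.

```javascript
// Sketch of cursor paging on an in-memory array: sort, then skip, then limit.
// pageOf is hypothetical and supports a single ascending sort field.
function pageOf(docs, field, skip, limit) {
  return [...docs] // copy so the input order is left untouched
    .sort((a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0))
    .slice(skip, skip + limit);
}

const names = ["mary", "bob", "zoe", "al", "kim", "dan", "eve", "lee", "ned", "ora"]
  .map((name) => ({ name }));
console.log(pageOf(names, "name", 5, 3).map((d) => d.name)); // 6th to 8th names in order
```

Like this sketch, skip(n) still has to walk past the first n matching documents, so very deep pages get slower; paging on a range of the sort key avoids that cost.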
Updates
• db.users.save({"_id" : ObjectId("4f84fecae2f8776e5cdc3ba7"), "origin" : "Clint"})
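• replaces the whole document if one with the same _id exists, otherwise inserts it (use update() with operators such as $set to change individual fields)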
• http://www.mongodb.org/display/DOCS/Updating
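A minimal sketch of save() semantics on an in-memory collection keyed by `_id`: a document whose `_id` already exists is replaced wholesale, and a document without `_id` is inserted under a new id. `InMemoryCollection` is hypothetical, and its counter-based ids are a placeholder for the ObjectIds a real driver would generate.

```javascript
// Sketch of save(): replace by _id if present, otherwise insert.
// InMemoryCollection is hypothetical; real drivers generate ObjectIds,
// not the integer counter used here.
class InMemoryCollection {
  constructor() {
    this.docs = new Map();
    this.nextId = 1;
  }

  save(doc) {
    const saved = { ...doc };
    if (saved._id === undefined) saved._id = this.nextId++; // insert path
    this.docs.set(saved._id, saved); // Map.set replaces any existing entry
    return saved._id;
  }

  findOne(id) {
    return this.docs.get(id);
  }
}

const users = new InMemoryCollection();
const id = users.save({ origin: "shell", labels: ["create"] });
users.save({ _id: id, origin: "Clint" }); // whole-document replace: labels is gone
console.log(users.findOne(id));
```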
Delete
remove all items:
> db.users.remove()
same as above:
> db.users.remove({})
remove all items with minor criticality:
> db.users.remove({criticality:"minor"})
Capped Collections
• Fixed-size, FIFO collections: once full, new inserts overwrite the oldest documents
• create: db.createCollection("users", {capped:true, size:1000000});
• size is a number of bytes, including database headers
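A minimal sketch of capped-collection behaviour: a fixed byte budget where each insert evicts the oldest documents until the new one fits. Document size is approximated by the length of its JSON string, whereas real MongoDB counts BSON bytes plus record headers; `CappedCollection` is a hypothetical class.

```javascript
// Sketch of a capped collection as a FIFO with a byte budget.
// ASSUMPTION: size = JSON string length, a stand-in for BSON size + headers.
class CappedCollection {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.docs = []; // insertion ("natural") order, oldest first
    this.usedBytes = 0;
  }

  static sizeOf(doc) {
    return JSON.stringify(doc).length;
  }

  insert(doc) {
    const size = CappedCollection.sizeOf(doc);
    if (size > this.maxBytes) throw new Error("document larger than the collection");
    while (this.usedBytes + size > this.maxBytes) {
      this.usedBytes -= CappedCollection.sizeOf(this.docs.shift()); // evict the oldest
    }
    this.docs.push(doc);
    this.usedBytes += size;
  }

  // Like find().sort({$natural: -1}).limit(1): the most recent insert.
  newest() {
    return this.docs[this.docs.length - 1];
  }
}

const audit = new CappedCollection(30);
for (let n = 1; n <= 5; n++) audit.insert({ n }); // each {"n":k} is 7 "bytes"
console.log(audit.docs.map((d) => d.n), audit.newest()); // n: 1 was evicted
```

Keeping documents in insertion order is what makes the reverse-natural-order query on the Sorting slide return the latest audit entry cheaply.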
Sorting
> db.users.find({labels:"failed"},{"data.loginName.text":1}).sort({"data.loginName.text":1})
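> db.users.find().sort({$natural:-1}).limit(1)
($natural is on-disk order, which is insertion order in a capped collection, so this returns the most recent entry)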
Indexes
> db.users.ensureIndex({origin:1})
Unique Index
> db.contacts.ensureIndex({email:1},{unique:true})
Indexing Embedded Fields
> db.users.ensureIndex({"data.loginName.text":1})
Compound Keys
> db.contacts.ensureIndex({last:1, first:1})
Optimization with Explain
> db.users.find({"data.loginName.text":"mary2012"}).explain()
{
  "cursor" : "BtreeCursor data.loginName.text_1",
  "nscanned" : 6,
  "nscannedObjects" : 6,
  "n" : 6,
  "millis" : 0,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : {
    "data.loginName.text" : [ [ "mary2012", "mary2012" ] ]
  }
}
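In the explain() output above, the BtreeCursor on data.loginName.text_1 examined only the 6 matching entries (nscanned equals n). A minimal sketch of why: an ascending single-field index is modelled as a sorted array of [key, position] pairs, so an equality lookup binary-searches to the first match and walks only the matching range, while a collection scan examines every document. The helper names are hypothetical, and a sorted array stands in for MongoDB's B-tree, which has the same logarithmic search behaviour.

```javascript
// Sketch: index lookup vs. collection scan on in-memory documents.
// buildIndex / indexLookup / collectionScan are hypothetical helpers.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => [doc[field], pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0)); // stable sort keeps ties in order
}

function indexLookup(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) { // binary search for the first key >= value
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  const positions = [];
  for (let i = lo; i < index.length && index[i][0] === value; i++) positions.push(index[i][1]);
  // Entries walked after the search; the search itself adds about log2(N) probes.
  return { positions, examined: positions.length };
}

function collectionScan(docs, field, value) {
  const positions = [];
  docs.forEach((doc, pos) => { if (doc[field] === value) positions.push(pos); });
  return { positions, examined: docs.length }; // every document is examined
}

const logins = ["ann", "mary2012", "bob", "mary2012", "zed"].map((login) => ({ login }));
const byLogin = buildIndex(logins, "login");
console.log(indexLookup(byLogin, "mary2012"), collectionScan(logins, "login", "mary2012"));
```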
Fault Tolerance & Scaling
Replica Sets and Sharding
Replica Sets
[Diagram sequence: a replica set of four members (R1, R2, R3, R4) plus an Arbiter, serving a Client. One frame expands R2 into a mongod process with its logs, data files, journal and file lock; later frames show R2 dropping out of the set and rejoining, with R2 and R3 highlighted in turn.]
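Failover in a replica set rests on a majority rule: a primary can be elected only while a strict majority of the voting members can reach each other. An arbiter votes but holds no data, which is how it turns four data-bearing members into an odd-sized, five-vote set. A minimal sketch, assuming every member in the diagram has one vote; `canElectPrimary` is a hypothetical helper.

```javascript
// Sketch of the replica-set majority rule. Member names follow the diagram;
// one vote per member is an assumption. canElectPrimary is hypothetical.
function canElectPrimary(votingMembers, reachable) {
  const votes = votingMembers.filter((m) => reachable.has(m)).length;
  return votes > votingMembers.length / 2; // strict majority required
}

const members = ["R1", "R2", "R3", "R4", "Arbiter"];
// R2 goes down: 4 of 5 votes remain, so the survivors can elect a primary.
console.log(canElectPrimary(members, new Set(["R1", "R3", "R4", "Arbiter"])));
// A partition leaving only R1 and R2 together: 2 of 5 votes, no primary there.
console.log(canElectPrimary(members, new Set(["R1", "R2"])));
```

Without the arbiter, a two-and-two network split of R1-R4 would leave neither side with a majority, so the set would have no primary at all.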
slaveOk
• used when querying a replica set
• drivers route all requests to the primary by default
• set slaveOk to allow queries against a secondary member of the replica set (results may be slightly stale)
• in the mongo shell: rs.slaveOk()
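A minimal sketch of the routing decision slaveOk controls: with it off, every read goes to the primary; with it on, reads may go to a secondary. Choosing R1 as the primary and rotating through secondaries round-robin are assumptions for illustration only; real drivers apply their own member-selection rules. `ReadRouter` is hypothetical.

```javascript
// Sketch of slaveOk-style read routing. ASSUMPTIONS: R1 is the primary and
// secondaries are picked round-robin. ReadRouter is a hypothetical class.
class ReadRouter {
  constructor(primary, secondaries) {
    this.primary = primary;
    this.secondaries = secondaries;
    this.next = 0;
  }

  route(slaveOk) {
    if (!slaveOk || this.secondaries.length === 0) return this.primary;
    const member = this.secondaries[this.next % this.secondaries.length];
    this.next++;
    return member;
  }
}

const router = new ReadRouter("R1", ["R2", "R3", "R4"]);
console.log(router.route(false)); // R1: default reads go to the primary
console.log(router.route(true), router.route(true)); // R2 then R3
```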
Read from Secondary
[Diagram sequence: the Client reading from a secondary member of the replica set (R1-R4 plus an Arbiter).]
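Sharding
[Diagram: the Client connects through mongos routers; config servers (mongod c1, c2) hold the cluster metadata; data is split by shard-key range across four shards, each a three-member replica set (R1, R2, R3): Shard 1 A-F, Shard 2 G-L, Shard 3 M-R, Shard 4 T-Z.]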
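Each shard owns a contiguous range of shard-key values, so a mongos can route a query on the shard key by finding the last range whose lower bound is at or below the key. A minimal sketch using the diagram's lower bounds (A, G, M, T), assuming each range runs up to the next lower bound; since the diagram's labels skip S, keys starting with S fall into Shard 3 here. Case-folding the key is a simplification for the sketch (MongoDB compares strings case-sensitively), and `shardFor` is a hypothetical helper.

```javascript
// Sketch of range-based shard routing. Lower bounds come from the diagram's
// labels; contiguous ranges and upper-casing keys are assumptions.
const ranges = [
  { lower: "A", shard: "Shard 1" },
  { lower: "G", shard: "Shard 2" },
  { lower: "M", shard: "Shard 3" },
  { lower: "T", shard: "Shard 4" },
]; // sorted by lower bound

function shardFor(key) {
  const k = key.toUpperCase();
  let owner = ranges[0].shard; // keys sorting before "A" go to the first range
  for (const r of ranges) {
    if (k >= r.lower) owner = r.shard;
  }
  return owner;
}

for (const login of ["clint", "mary2012", "susan", "zed"]) {
  console.log(login, "->", shardFor(login));
}
```

A query that does not include the shard key cannot be narrowed this way, so mongos has to send it to every shard and merge the results.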
UI Clients
• MongoVUE
• JMongoBrowser
• others...
Drivers
• Standard Java Driver: http://github.com/mongodb/mongo-java-driver
• Hammersmith: high-performance async Java driver (Akka 2.0 durable mailboxes): https://github.com/bwmcadams/hammersmith
• Casbah Scala Driver: http://github.com/mongodb/casbah
mongodb.org Drivers
• C
• C++
• Erlang
• Haskell
• JavaScript
• .NET
• Perl
• PHP
• Python
• Ruby
Community Drivers
• ActionScript
• Clojure
• ColdFusion
• D
• Dart
• Fantom
• F#
• Go
• Groovy
• Lisp
• Smalltalk
• and more...
Quick Reference Cards and Cookbook
• http://www.10gen.com/reference
• Commands
• Queries
• Indexing
• Replica Sets
• http://cookbook.mongodb.org/
Summary
• MongoDB is one of many persistence alternatives
• Document-Oriented
• Big and Fast
• Flexible - no schema
Contact Me
• Twitter: @ClintCombs
• Presentation posted at: http://ccombs.net/presentations