Post on 22-Apr-2018

Big Fast Data with MongoDB

Clint Combs

Thursday, April 12, 12

Polyglot Persistence

• Same idea as Polyglot Programming

• DB Variety ~= Data Structure Variety

• Don’t use a list when we need a map.

• The same goes for data persistence.
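
To make the list-versus-map point concrete, here is a minimal JavaScript sketch. The sample data and function names are made up for illustration. Looking a record up by key in a list means scanning it, while a map goes straight to the entry.

```javascript
// Hypothetical sample data, not from the talk.
const userList = [
  { login: "mary2012", role: "admin" },
  { login: "bob", role: "reader" },
];

// List: a lookup by login has to scan elements until one matches.
function findInList(list, login) {
  return list.find((u) => u.login === login);
}

// Map: the same data keyed by login supports direct lookup.
const userMap = new Map(userList.map((u) => [u.login, u]));

function findInMap(map, login) {
  return map.get(login);
}
```

Choosing a data store follows the same reasoning: pick the structure that matches the access pattern.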

Many Options

• Oracle

• MongoDB

• Redis

• BigTable

• CouchDB

• Neo4j

• Datomic

• DB2

• Files

• CSV

• Excel

• Logs

• MySQL

MongoDB

A Scalable, High-Performance, Document-Oriented Database

When is MongoDB a Good Fit?

• when you have a large data set

• when you have a dynamic schema

• when you need speed

• when you can tolerate eventual consistency

Wordnik

• Migrated from MySQL

• Primary reason: performance

• 500k requests/hour - 4x that during peak

• 12 billion documents in MongoDB

• 3 TB per node
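
For a sense of scale, the hourly request figure converts to a per-second rate as follows. This is only arithmetic on the numbers quoted above; the variable names are illustrative.

```javascript
// Figures quoted on the slide.
const requestsPerHour = 500000;
const peakMultiplier = 4;

// Convert an hourly rate to a per-second rate.
function perSecond(perHour) {
  return perHour / 3600;
}

const averagePerSecond = perSecond(requestsPerHour);     // about 139
const peakPerSecond = averagePerSecond * peakMultiplier; // about 556
```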

Wordnik (cont.)

• insert 8k documents per second with bursts to 50k per second

• each Java client can sustain 10 MB/second reading from a mongod

Wordnik (cont.)

• Every type of retrieval is faster

• sample fetch: 400 ms to 60 ms

• dictionary entries: 20 ms to 1 ms

• document metadata: 30 ms to 0.1 ms

• spelling suggestions: 10 ms to 1.2 ms
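
Expressed as speedup factors (old latency divided by new latency), the figures above come to roughly 6.7x, 20x, 300x, and 8.3x. A small sketch of that calculation, with illustrative names:

```javascript
// Before/after latencies in milliseconds, as quoted on the slide.
const timings = [
  { name: "sample fetch", beforeMs: 400, afterMs: 60 },
  { name: "dictionary entries", beforeMs: 20, afterMs: 1 },
  { name: "document metadata", beforeMs: 30, afterMs: 0.1 },
  { name: "spelling suggestions", beforeMs: 10, afterMs: 1.2 },
];

// Speedup factor = old latency / new latency.
function speedup(t) {
  return t.beforeMs / t.afterMs;
}

const speedups = timings.map((t) => ({ name: t.name, factor: speedup(t) }));
```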

Wordnik (cont.)

• MongoDB built-in caching benefits

• removed memcached, freed GBs of RAM

• average latency improvement of 1-2 ms per request under load

• not all data is in RAM, so the 60 ms average includes disk access

MongoDB Applications

• archiving

• event logging

• document and CMS systems

• geospatial indexing

• CQRS (Command Query Responsibility Segregation)

CQRS

[Diagram labels: Client; Command Model; Query Model; System of Record (Oracle); Query Data (MongoDB); Change Events]
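
The diagram's arrows did not survive in this transcript, so the flow sketched below follows the usual CQRS pattern and should be read as an assumption: commands update the system of record, each change is published as an event, and the query store is kept up to date from those events. Plain in-memory Maps stand in for Oracle and MongoDB, and every name is hypothetical.

```javascript
// In-memory stand-ins (assumptions): one Map for the system of record,
// one Map for the query-side store.
const systemOfRecord = new Map();
const queryData = new Map();
const subscribers = [];

// Publish a change event to every subscriber.
function publish(event) {
  subscribers.forEach((handler) => handler(event));
}

// Command model: applies a write to the system of record, then emits a change event.
function createUser(loginName) {
  systemOfRecord.set(loginName, { loginName });
  publish({ type: "userCreated", loginName });
}

// Query side: maintains a read-optimized view from the change events.
subscribers.push((event) => {
  if (event.type === "userCreated") {
    queryData.set(event.loginName, {
      loginName: event.loginName,
      msg: "new user created: " + event.loginName,
    });
  }
});

// Query model: reads only from the query store.
function findUser(loginName) {
  return queryData.get(loginName);
}
```

Reads never touch the system of record here, which is the property that lets the query side be scaled and shaped independently.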

What does it store?

• JSON documents up to 16MB in size

• stored in BSON format

• larger documents can be stored with GridFS

• all documents stored in collections

Data Types

• JSON types: string, integer, boolean, double, null, array, and object

• other: date, object id, binary data, regular expressions, code

• timestamps are for internal use only

• JavaScript code can be stored in the system.js collection

Java Type Mappings

MongoDB Type → Java Type

• string, int, boolean, double → String, int, boolean, double

• Object ID → com.mongodb.ObjectId

• Regular Expression → java.util.regex.Pattern

• Dates/Times → java.util.Date

• Database References → com.mongodb.DBRef

• Binary Data → byte[]

• Timestamp Data → BSONTimestamp

• Code Data → Code, CodeWScope

• Embedded Documents → BasicDBObject

• Arrays → anything that extends List

mongo shell

• ‘mongo’ starts the command-line interface

• use <database-name> - switches to the named database

• show collections - displays a list of collections in the current database

Collections

• collections of BSON documents

• schema-free

• can be organized in namespaces for user convenience (really a flat model in DB)

Object IDs

• every document in a collection has a unique _id field

• _id can be any type

• _id created as a BSON ObjectID by the database if not supplied
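
A BSON ObjectId is 12 bytes, and its first 4 bytes hold the creation time as seconds since the Unix epoch. As a small sketch (the helper name is made up), the timestamp can be read back from the 24-character hex form. The id used here is the one from the newhart example document in this talk.

```javascript
// Read the creation time from an ObjectId's hex string.
// The first 8 hex characters are a big-endian count of seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("4f84b0473004b2a7bf9dcdad");
// created.toISOString() is "2012-04-10T22:12:23.000Z"
```

That value agrees, to the second, with the createTS field of the same document.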

Collections

• create: db.createCollection("users")

• rename: db.users.renameCollection("allUsers")

• drop: db.users.drop()

• count: db.users.count()

newhart

• example docs in this talk are from newhart

• newhart is an open source project that provides audit tracking with storage in MongoDB

• uses capped collections for audit trails

• will be available at github.org/ClintCombs/newhart

newhart document

{
  "_id" : ObjectId("4f84b0473004b2a7bf9dcdad"),
  "origin" : "org.newhart.example.AuditUsers",
  "originKey" : "mary2012-create-1334095943422",
  "originKeyType" : "User",
  "criticality" : "major",
  "auditTS" : ISODate("2012-04-10T22:12:23.426Z"),
  "createTS" : ISODate("2012-04-10T22:12:23.424Z"),
  "updateTS" : ISODate("2012-04-10T22:12:23.424Z"),
  "msg" : "new user created: mary2012",
  "data" : { "loginName" : { "text" : "mary2012" } },
  "errors" : [ ],
  "warnings" : [ ],
  "labels" : [ "security", "create" ]
}

Insert

> use newhart
> show collections
> db.users.insert({origin:"shell"})

Queries

> db.users.find();
> db.users.findOne();

Queries (cont.)

> db.users.find({origin:"shell"})

Field Selection:
> db.users.find({origin:"org.newhart.example.AuditUsers"},{labels:1})

Arrays:
> db.users.find({labels:"authenticate"});
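
The matching rules these queries rely on can be modeled in a few lines of plain JavaScript. This is a simplified sketch, not MongoDB's implementation: it handles only top-level equality, treats an array field as matching when it contains the queried value, and supports inclusion-only field selection. The function names and sample documents are hypothetical.

```javascript
// True when every field in the query equals the document's field,
// or, for array fields, when the array contains the queried value.
function matches(doc, query) {
  return Object.keys(query).every((field) => {
    const value = doc[field];
    return Array.isArray(value) ? value.includes(query[field]) : value === query[field];
  });
}

// Keep only the selected fields; _id is included by default, as in the shell.
function project(doc, fields) {
  const out = { _id: doc._id };
  Object.keys(fields).forEach((f) => {
    if (fields[f] && f in doc) out[f] = doc[f];
  });
  return out;
}

function find(docs, query = {}, fields = null) {
  const hits = docs.filter((d) => matches(d, query));
  return fields ? hits.map((d) => project(d, fields)) : hits;
}

// Hypothetical sample documents.
const docs = [
  { _id: 1, origin: "shell", labels: ["create"] },
  { _id: 2, origin: "org.newhart.example.AuditUsers", labels: ["security", "authenticate"] },
];
```

With this model, find(docs, {labels: "authenticate"}) returns the second document because its labels array contains that value.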

Skip and Limit

> db.users.find({labels:"failed"},{"data.loginName.text":1}).sort({"data.loginName.text":1}).skip(5).limit(3)
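
On an in-memory array, the same sort, skip, and limit pipeline looks roughly like this. It is an analogy for the cursor methods above, not how the server executes them, and the helper and data are hypothetical.

```javascript
// Sort by a key function, then drop `skip` results and keep at most `limit`.
function page(items, keyOf, skip, limit) {
  return [...items]
    .sort((a, b) => (keyOf(a) < keyOf(b) ? -1 : keyOf(a) > keyOf(b) ? 1 : 0))
    .slice(skip, skip + limit);
}

// Hypothetical login names from failed-login audit records.
const logins = ["hal", "bob", "ann", "gus", "eve", "dan", "cat", "fay", "ian"];

// Equivalent of .sort(...).skip(5).limit(3): the 6th through 8th names in order.
const result = page(logins, (name) => name, 5, 3);
```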

Updates

• db.users.save({"_id" : ObjectId("4f84fecae2f8776e5cdc3ba7"), "origin" : "Clint"})

• update if exists, otherwise insert

• http://www.mongodb.org/display/DOCS/Updating
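
The "update if exists, otherwise insert" behavior of save can be sketched with a Map keyed by _id. This is an illustrative model only: the Map stands in for a collection, and a simple counter stands in for the ObjectId the database would generate.

```javascript
// A Map keyed by _id stands in for a collection (an assumption for illustration).
const collection = new Map();
let nextId = 1;

// save(): replace the document with the same _id if one exists, otherwise insert.
// Documents without an _id get one generated (a counter here,
// where the database would generate an ObjectId).
function save(doc) {
  const _id = doc._id !== undefined ? doc._id : "generated-" + nextId++;
  const stored = { ...doc, _id };
  const existed = collection.has(_id);
  collection.set(_id, stored);
  return existed ? "updated" : "inserted";
}
```

Note that the stored document is replaced as a whole, so fields missing from the new version are dropped.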

Delete

remove all items:
> db.users.remove()

same as above:
> db.users.remove({})

remove all items with minor criticality:
> db.users.remove({criticality:"minor"})

Capped Collections

• Fixed-size, FIFO collections

• create: db.createCollection("users", {capped:true, size:1000000});

• size is number of bytes, including database headers
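
The fixed-size, first-in-first-out behavior can be illustrated with a small byte-bounded queue. This is a sketch of the idea, not the database's storage layout: document size is approximated by the length of its JSON text, and the class name is made up.

```javascript
// A byte-bounded FIFO. Document size is approximated by the length of its
// JSON text, which is an assumption; the database measures BSON bytes plus headers.
class CappedList {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.docs = [];
  }

  static sizeOf(doc) {
    return Buffer.byteLength(JSON.stringify(doc), "utf8");
  }

  insert(doc) {
    const size = CappedList.sizeOf(doc);
    if (size > this.maxBytes) throw new Error("document larger than the cap");
    // Evict the oldest documents until the new one fits.
    while (this.usedBytes + size > this.maxBytes) {
      this.usedBytes -= CappedList.sizeOf(this.docs.shift());
    }
    this.docs.push(doc);
    this.usedBytes += size;
  }
}
```

Once the cap is reached, each insert pushes out the oldest entries, which is what makes this shape a natural fit for audit trails and logs.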

Sorting

> db.users.find({labels:"failed"},{"data.loginName.text":1}).sort({"data.loginName.text":1})

> db.users.find().sort({$natural:-1}).limit(1)

Indexes

> db.users.ensureIndex({origin:1})

Unique Index
> db.contacts.ensureIndex({email:1},{unique:true})

Indexing Embedded Fields
> db.users.ensureIndex({"data.loginName.text":1})

Compound Keys
> db.contacts.ensureIndex({last:1, first:1})
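
A compound key such as {last:1, first:1} orders entries by last and breaks ties with first. The comparator below sketches that ordering on an in-memory array; it illustrates the sort order only, not the index structure, and the helper name and contacts are hypothetical.

```javascript
// Build a comparator from a key spec like {last: 1, first: 1}
// (1 = ascending, -1 = descending), comparing fields in the order given.
function compoundComparator(spec) {
  const fields = Object.entries(spec);
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

// Hypothetical contacts.
const contacts = [
  { last: "Smith", first: "Zoe" },
  { last: "Jones", first: "Bob" },
  { last: "Smith", first: "Al" },
];

const ordered = [...contacts].sort(compoundComparator({ last: 1, first: 1 }));
```

Because entries are grouped by last first, an index in this order can serve queries on last alone or on last plus first, but it is of little help for queries on first alone.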

Optimization with Explain

> db.users.find({"data.loginName.text":"mary2012"}).explain()
{
  "cursor" : "BtreeCursor data.loginName.text_1",
  "nscanned" : 6,
  "nscannedObjects" : 6,
  "n" : 6,
  "millis" : 0,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : {
    "data.loginName.text" : [ [ "mary2012", "mary2012" ] ]
  }
}
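
Two things are worth reading off this output: whether an index was used (a cursor name starting with BtreeCursor, as opposed to BasicCursor for a full collection scan) and how many entries were scanned per document returned. A small helper, with a made-up name, that summarizes a result of this shape:

```javascript
// Summarize an explain() result of the shape shown above.
function summarizeExplain(plan) {
  const usesIndex = plan.cursor.startsWith("BtreeCursor");
  // Entries scanned per document returned; 1 means no wasted scanning.
  const scanRatio = plan.n === 0 ? plan.nscanned : plan.nscanned / plan.n;
  return { usesIndex, scanRatio };
}

const summary = summarizeExplain({
  cursor: "BtreeCursor data.loginName.text_1",
  nscanned: 6,
  n: 6,
});
```

Here the query used the index on data.loginName.text and scanned exactly as many entries as it returned.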

Fault Tolerance & Scaling

Replica Sets and Sharding

Replica Sets

[Diagram sequence: a replica set with members R1, R2, R3, and R4, an Arbiter, and a Client. Some frames show the set without R2, and one frame labels R2's mongod with its logs, data, journal, and file lock. Later frames show R2 back in the set alongside additional R2 and R3 labels.]
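
A replica set elects a primary only when a majority of its voting members can reach each other, which is why an arbiter (a member that votes but holds no data) is useful for breaking ties. The arithmetic, as a sketch with hypothetical helper names:

```javascript
// Votes needed for a majority among `voters` voting members.
function majority(voters) {
  return Math.floor(voters / 2) + 1;
}

// Can the members that are still reachable elect a primary?
function canElectPrimary(totalVoters, reachableVoters) {
  return reachableVoters >= majority(totalVoters);
}

// The diagrams show R1 through R4 plus an arbiter: 5 voters, so 3 votes are needed.
const votesNeeded = majority(5);
```

With five voters the set can lose two members and still elect a primary. With only the four data-bearing members, a two-two split would leave neither side with a majority.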

slaveOk

• used when querying a replica set

• drivers route requests to master by default

• set slaveOk to query a secondary member of the replica set

• rs.slaveOk()
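
The routing rule can be pictured as follows. This is a deliberately simplified model with hypothetical names: without slaveOk only the primary is eligible, and with it the sketch prefers a secondary. Real drivers apply their own selection rules.

```javascript
// Pick a member for a read. Without slaveOk, only the primary is eligible.
// With slaveOk, this sketch prefers a secondary and falls back to the primary;
// real drivers have their own selection rules.
function routeRead(members, slaveOk) {
  const primary = members.find((m) => m.state === "PRIMARY");
  if (!slaveOk) return primary;
  const secondary = members.find((m) => m.state === "SECONDARY");
  return secondary || primary;
}

// Hypothetical replica set state.
const members = [
  { name: "R1", state: "PRIMARY" },
  { name: "R2", state: "SECONDARY" },
  { name: "Arbiter", state: "ARBITER" },
];
```

Reads served by a secondary may lag behind the primary, which ties back to tolerating eventual consistency.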

Read from Secondary

[Diagram sequence: a replica set with members R1, R2, R3, and R4, an Arbiter, and a Client]

Sharding

[Diagram: a Client, three mongos processes, two mongod processes labeled c1 and c2, and four shards, each a Replica Set of R1, R2, and R3]

• Shard 1: A-F

• Shard 2: G-L

• Shard 3: M-R

• Shard 4: T-Z
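
A mongos routes each operation to the shard that owns the relevant key range. The lookup can be sketched as below, using the first letter of a key as the shard key. The names are hypothetical, and the last range is widened to S-Z so that every letter has a home; that adjustment is an assumption, since the slide labels it T-Z.

```javascript
// Key ranges by first letter. The last range starts at S here (an assumption);
// the slide labels it T-Z.
const shards = [
  { name: "Shard 1", from: "A", to: "F" },
  { name: "Shard 2", from: "G", to: "L" },
  { name: "Shard 3", from: "M", to: "R" },
  { name: "Shard 4", from: "S", to: "Z" },
];

// Return the shard whose range contains the key's first letter, or null.
function routeToShard(key) {
  const letter = key.charAt(0).toUpperCase();
  const shard = shards.find((s) => letter >= s.from && letter <= s.to);
  return shard ? shard.name : null;
}
```

For example, routeToShard("mary2012") selects Shard 3, and the client never needs to know which shard holds the data.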

UI Clients

• MongoVUE

• JMongoBrowser

• others...

Drivers

• Standard Java Driver: http://github.com/mongodb/mongo-java-driver

• Hammersmith: high-performance async Java driver (Akka 2.0 durable mailboxes) https://github.com/bwmcadams/hammersmith

• Casbah Scala Driver: http://github.com/mongodb/casbah

mongodb.org Drivers

• C

• C++

• Erlang

• Haskell

• JavaScript

• .NET

• Perl

• PHP

• Python

• Ruby

Community Drivers

• ActionScript

• Clojure

• ColdFusion

• D

• Dart

• Fantom

• F#

• Go

• Groovy

• Lisp

• Smalltalk

• and more...

Quick Reference Cards and Cookbook

• http://www.10gen.com/reference

• Commands

• Queries

• Indexing

• Replica Sets

• http://cookbook.mongodb.org/

Summary

• MongoDB is one of many persistence alternatives

• Document-Oriented

• Big and Fast

• Flexible - no schema

Contact Me

• Twitter: @ClintCombs

• Presentation posted at: http://ccombs.net/presentations
