A Morning with MongoDB Barcelona: MongoDB Basic Concepts

http://www.10gen.com/events/MongoDB-Morning-Barcelona

MongoDB Basic Concepts

Norberto Leite

Senior Solutions Architect, EMEA

norberto@10gen.com

@nleite

Sunday, 21 October 12

Agenda

• Overview
• Replication
• Scalability
• Consistency & Durability
• Flexibility, Developer Experience

http://bit.ly/OT71M4

Your data needs started here...

http://bit.ly/Oxcsis

...but soon you had to be here

Basic Concepts

Horizontally Scalable

{ author : "steve", date : new Date(), text : "About MongoDB...", tags : ["tech", "database"] }

Document Oriented

Application

Fully Consistent

High Performance

Tradeoff: Scale vs Functionality

[Chart: scalability & performance plotted against depth of functionality, with points for memcached, key/value, and RDBMS]

Replication

Why do we need replication?

• Failover
• Backups
• Secondary batch jobs
• High availability

Replica Sets

Data Availability across nodes

• Data Protection
  • Multiple copies of the data
  • Spread across Data Centers, AZs
• High Availability
  • Automated Failover
  • Automated Recovery

Replica Sets

[Diagram sequence: App, one Primary, two Secondaries]

• Normal operation: App writes to the Primary and reads from the Primary and both Secondaries. Asynchronous Replication.
• Failover: Automatic Election of new Primary among the remaining nodes.
• New primary serves data while the former Primary is Recovering.
• The recovered node rejoins as a Secondary. App reads from all three nodes again.
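
The failover sequence can be mimicked with a small toy model. This is only a sketch and not MongoDB's actual election protocol: it assumes that a primary can be elected only while a majority of members is healthy, and that the healthy member with the most recent data wins (priorities and other details are ignored). The names `electPrimary`, `healthy`, and `optime` are hypothetical.

```javascript
// Toy model of replica set failover (illustration only, not MongoDB's real election protocol).
// Assumptions: a primary can only be elected while a majority of members is healthy,
// and the healthy member with the highest optime (most recent write applied) wins.
function electPrimary(members) {
  const healthy = members.filter((m) => m.healthy);
  const majority = Math.floor(members.length / 2) + 1;
  if (healthy.length < majority) {
    return null; // no majority reachable: the set has no primary
  }
  // Pick the healthy member that has replicated the most recent write.
  return healthy.reduce((best, m) => (m.optime > best.optime ? m : best)).name;
}

const members = [
  { name: "node1", healthy: true, optime: 42 },
  { name: "node2", healthy: true, optime: 42 },
  { name: "node3", healthy: true, optime: 40 }, // lagging: replication is asynchronous
];

const before = electPrimary(members); // all three nodes are up
members[0].healthy = false;           // the current primary fails
const after = electPrimary(members);  // the two remaining nodes still form a majority
members[1].healthy = false;           // a second node fails
const noMajority = electPrimary(members);

console.log(before, after, noMajority);
```

With three members the set survives the loss of one node; once only one node is left there is no majority, so the sketch returns `null` instead of a primary.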

Scalability

Horizontal Scalability

Sharding

Data Distribution across nodes

• Data location transparent to your code
• Data distribution is automatic
• Data re-distribution is automatic
• Aggregate system resources horizontally
• No code changes

Sharding - Range distribution

sh.shardCollection("test.tweets", {_id: 1} , false)

shard01: a-i
shard02: j-r
shard03: s-z

Sharding - Splits

[Diagram sequence: the range on shard02 is split into smaller chunks]

• First split: shard01 holds a-i; shard02 holds ja-jz and k-r; shard03 holds s-z
• Further splits: shard01 holds a-i; shard02 holds ja-ji, ji-js, js-jw, and jz-r; shard03 holds s-z

Sharding - Auto Balancing

[Diagram sequence: shard02 holds chunks ja-ji, ji-js, js-jw, and jz-r; the chunks js-jw and jz-r are shown a second time as they are moved off shard02 to the other shards]
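
The split and balancing steps can be sketched as a toy model. It is not MongoDB's balancer: it assumes a chunk is split once it holds more than a fixed number of documents, and that chunks are moved one at a time from the fullest shard to the emptiest until the counts differ by less than two. `MAX_DOCS`, `splitChunk`, and `balance` are hypothetical names.

```javascript
// Toy model of chunk splitting and balancing (illustration only).
const MAX_DOCS = 4; // hypothetical threshold; real chunks are sized in megabytes

// Split a chunk [min, max) at its median key once it holds too many documents.
function splitChunk(chunk) {
  const keys = [...chunk.keys].sort();
  if (keys.length <= MAX_DOCS) return [chunk];
  const mid = Math.floor(keys.length / 2);
  return [
    { min: chunk.min, max: keys[mid], keys: keys.slice(0, mid) },
    { min: keys[mid], max: chunk.max, keys: keys.slice(mid) },
  ];
}

// Move chunks from the fullest shard to the emptiest until counts are even.
function balance(shards) {
  const names = Object.keys(shards);
  for (;;) {
    names.sort((a, b) => shards[b].length - shards[a].length);
    const fullest = names[0];
    const emptiest = names[names.length - 1];
    if (shards[fullest].length - shards[emptiest].length < 2) return shards;
    shards[emptiest].push(shards[fullest].pop());
  }
}

// A hot chunk on the j-r range gets split in two.
const pieces = splitChunk({
  min: "j",
  max: "s",
  keys: ["jack", "jane", "jim", "joe", "john", "julia"],
});

// Chunk labels taken from the slides; only the counts matter to balance().
const shards = balance({
  shard01: ["a-i"],
  shard02: ["ja-ji", "ji-js", "js-jw", "jz-r"],
  shard03: ["s-z"],
});

console.log(pieces.map((p) => p.min + " to " + p.max), shards);
```

After balancing, each shard holds two chunks and shard02 keeps only ja-ji and ji-js, which matches the chunks the slides show being moved.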

Sharding - Routed Query

find({_id: "norberto"})

[Diagram sequence: the query contains the shard key (_id), so it is routed only to the shard that holds the matching chunk]

Sharding - Scatter Gather

find({email: "norberto@10gen.com"})

[Diagram sequence: the query does not contain the shard key, so it is sent to shard01, shard02, and shard03 and the results are gathered]
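
The difference between a routed query and a scatter gather query can be sketched with a toy router. In a real cluster this work is done by the mongos process; the code below is only an illustration. It assumes the initial chunk layout from the slides (a-i, j-r, s-z) written as half-open ranges, and `targetShards` is a hypothetical helper.

```javascript
// Toy query router (illustration only; in MongoDB this job is done by mongos).
// Chunk ranges follow the initial layout on the slides: a-i, j-r, s-z,
// written here as half-open ranges [min, max) on the shard key _id.
const chunks = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "s", shard: "shard02" },
  { min: "s", max: "{", shard: "shard03" }, // "{" is the character after "z"
];

function targetShards(query) {
  if (typeof query._id === "string") {
    // Routed query: the shard key is present, so at most one chunk can match.
    const chunk = chunks.find((c) => query._id >= c.min && query._id < c.max);
    return chunk ? [chunk.shard] : [];
  }
  // Scatter gather: without the shard key every shard has to be asked.
  return [...new Set(chunks.map((c) => c.shard))];
}

const routed = targetShards({ _id: "norberto" });
const scattered = targetShards({ email: "norberto@10gen.com" });
console.log(routed, scattered);
```

With this layout the `_id` lookup touches only shard02, while the `email` lookup has to visit all three shards.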

Sharding - Caching

[Diagram: a single server, shard01, holds all ranges (a-i, j-r, n-z)]

• 300 GB Data
• 96 GB Mem
• 3:1 Data/Mem

Aggregate Horizontal Resources

[Diagram: the 300 GB Data is spread over shard01 (a-i), shard02 (j-r), and shard03 (n-z)]

• 100 GB of data per shard
• 96 GB Mem per shard
• 1:1 Data/Mem per shard
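
The ratios on these two slides follow from simple arithmetic, worked through below with the figures from the slides. The sketch assumes the data is spread evenly across shards; `dataToMemRatio` is a hypothetical helper.

```javascript
// Data-to-memory ratio before and after sharding, using the figures on the slides.
const dataGB = 300;
const memPerServerGB = 96;

function dataToMemRatio(totalDataGB, shardCount) {
  const dataPerShardGB = totalDataGB / shardCount; // assumes an even distribution
  return dataPerShardGB / memPerServerGB;
}

const single = dataToMemRatio(dataGB, 1);  // 300 / 96, roughly 3:1
const sharded = dataToMemRatio(dataGB, 3); // 100 / 96, roughly 1:1
console.log(single.toFixed(2), sharded.toFixed(2));
```

Adding shards adds memory as well as disk, so the share of the data set that fits in memory grows with the number of shards.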

Consistency & Durability

Two choices for consistency

• Eventual consistency
  • Allow updates when a system has been partitioned
  • Resolve conflicts later
  • Example: CouchDB, Cassandra
• Immediate consistency
  • Limit the application of updates to a single master node for a given slice of data
  • Another node can take over after a failure is detected
  • Avoids the possibility of conflicts
  • Example: MongoDB

Durability

• For how long is my data available?
• When do I know that my data is safe?
• Where?

• MongoDB style
  • Fire and Forget
  • Get Last Error
  • Journal Sync
  • Replicas Safe

Fire and Forget

[Diagram sequence: Driver and Primary]

• Driver sends the write to the Primary
• Primary applies the write in memory

Get Last Error

[Diagram sequence: Driver and Primary]

• Driver sends the write, followed by getLastError
• Primary applies the write in memory
• Primary answers getLastError

Journal Sync

[Diagram sequence: Driver and Primary]

• Driver sends the write, followed by getLastError with j:true
• Primary applies the write in memory
• Primary writes to the journal
• Primary answers getLastError

Replicas Safe

[Diagram sequence: Driver, Primary, and Secondary]

• Driver sends the write, followed by getLastError with w:2
• Primary applies the write in memory
• The write is replicated to the Secondary
• Primary answers getLastError
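
The four durability levels differ only in what the driver asks for after the write. A minimal sketch of that mapping is below: it builds the getLastError command document for each level, using the `j` and `w` options shown on the slides. The level names and the `lastErrorCommand` helper are hypothetical labels for this example.

```javascript
// Sketch: the four durability levels expressed as getLastError command documents.
// The level names are hypothetical labels derived from the slide titles.
function lastErrorCommand(level) {
  switch (level) {
    case "fireAndForget":
      return null; // no getLastError call at all: the driver does not wait
    case "getLastError":
      return { getLastError: 1 }; // wait until the primary has applied the write in memory
    case "journalSync":
      return { getLastError: 1, j: true }; // also wait for the write to reach the journal
    case "replicasSafe":
      return { getLastError: 1, w: 2 }; // also wait until 2 members (primary + 1 secondary) have the write
    default:
      throw new Error("unknown durability level: " + level);
  }
}

const levels = ["fireAndForget", "getLastError", "journalSync", "replicasSafe"];
for (const level of levels) {
  console.log(level, JSON.stringify(lastErrorCommand(level)));
}
```

In a mongo shell of that era, a non-null result would be passed to `db.runCommand(...)` on the same connection right after the write. Later MongoDB versions express the same options as a write concern attached to the write itself.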

Flexibility

What MongoDB solves

• Agility
  • Applications store complex data that is easier to model as documents
  • Schemaless DB enables faster development cycles
• Flexibility
  • Relaxed transactional semantics enable easy scale out
  • Auto Sharding for scale down and scale up
• Cost
  • Cost-effectively operationalize abundant data (clickstreams, logs, tweets, ...)

Challenges for Databases

✓ Build a database for scaleout
  • Run on clusters of 100s of commodity machines
• … that enables agile development
• … and is usable for a broad variety of applications

Data Model

• Why JSON?
  • Provides a simple, well understood encapsulation of data
  • Maps simply to the object in your OO language
  • Linking & Embedding to describe relationships

JSON

place1 = {
  name : "10gen HQ",
  address : "578 Broadway 7th Floor",
  city : "New York",
  zip : "10011",
  tags : [ "business", "tech" ]
}
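
The claim that JSON maps simply to the objects of your language can be illustrated in plain JavaScript. The sketch below takes the `place1` document from the slide, serializes it to JSON text, and parses it back; no database is involved, and the variable names `text` and `copy` are only for illustration.

```javascript
// The place1 document from the slide is already a plain JavaScript object.
const place1 = {
  name: "10gen HQ",
  address: "578 Broadway 7th Floor",
  city: "New York",
  zip: "10011",
  tags: ["business", "tech"],
};

// Serialize to JSON text and parse it back: the structure survives unchanged.
const text = JSON.stringify(place1);
const copy = JSON.parse(text);

// Fields and nested arrays are reachable with ordinary property access.
console.log(copy.city, copy.tags[1]);
```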

Schema Design

Relational Database

Schema Design

MongoDB

• embedding
• linking

Schemas in MongoDB

Design documents that simply map to your application

post = {author: "Hergé", date: new Date(), text: "Destination Moon", tags: ["comic", "adventure"]}

> db.posts.save(post)

Embedding

> db.posts.find( { author: "Hergé"} )

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Hergé",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  text : "Destination Moon",
  tags : [ "comic", "adventure" ],
  comments : [
    {
      author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book"
    }
  ] }
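
Embedding and linking can be compared side by side in plain JavaScript. The sketch below uses in-memory arrays in place of collections, so it only illustrates the shape of the data and the extra lookup that linking needs. The `post_id` field and the `loadPostWithComments` helper are hypothetical.

```javascript
// Embedding: comments live inside the post document, so one lookup returns everything.
const embeddedPost = {
  _id: 1,
  author: "Hergé",
  text: "Destination Moon",
  comments: [{ author: "Kyle", text: "great book" }],
};

// Linking: comments are separate documents that point back at the post.
// The post_id field name is a hypothetical choice for this sketch.
const posts = [{ _id: 1, author: "Hergé", text: "Destination Moon" }];
const comments = [{ _id: 10, post_id: 1, author: "Kyle", text: "great book" }];

// With linking, the application does a second lookup and stitches the result together.
function loadPostWithComments(postId) {
  const post = posts.find((p) => p._id === postId);
  return { ...post, comments: comments.filter((c) => c.post_id === postId) };
}

const viaEmbedding = embeddedPost.comments.map((c) => c.text);
const viaLinking = loadPostWithComments(1).comments.map((c) => c.text);
console.log(viaEmbedding, viaLinking);
```

Both approaches return the same comments; the embedded form gets them in a single read, while the linked form needs a second lookup done by the application.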

JSON & Scaleout

• Embedding removes need for
  • Distributed Joins
  • Two Phase commit
• Enables data to be distributed across many nodes without penalty