Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with...

56
Scaling for Hu mongo us amounts of data with MongoDB Alvin Richards Technical Director [email protected] @ jonnyeight alvinonmongodb.com

Transcript of Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with...

Page 1: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

Scaling for Humongous amounts of data with

MongoDB

Alvin Richards Technical Director [email protected]

@jonnyeight alvinonmongodb.com

Page 2: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

2 http://bit.ly/OT71M4

Getting from here to there...

Page 3: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

3

...probably using one of these

http://bit.ly/QDUIUF

Page 4: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

4

Why NoSQL at all?

Page 5: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

5

0 5,000

10,000 15,000 20,000 25,000 30,000

1998

2000

2008

2012

Indexed Pages

Pages Index (Million)

http://bit.ly/VDkDN2 http://bit.ly/108jTHN http://bit.ly/Wt3fl7 http://bit.ly/Qmg8YD

Growth? Scaling? Cost? Flexibility?

Page 6: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

6

•  Build a database for scaleout –  Run on clusters of 100s of commodity machines

•  … that enables agile development

•  … and is usable for a broad variety of applications

Need a Database that...

Page 7: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

7

•  Partitioning of Data –  Hashes (Dynamo) vs Ranges (Big Table) –  Physical vs Logical segments

•  Consistency –  Eventually

•  Multi Master updates, resolve conflicts later –  Immediately

•  Single Master updates, always consistent

Is Scaleout Mission Impossible?

Page 8: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

8

NoSQL and MongoDB

Page 9: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

9

depth of functionality scal

abili

ty &

per

form

ance • memcached

•  key/value

• RDBMS

Tradeoff: Scale vs Functionality

Page 10: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

10

•  Cost effective operationalize abundant data (clickstreams, logs, tweets, ...)

•  Relaxed transactional semantics enable easy scale out •  Auto Sharding for scale down and scale up

•  Applications store complex data that is easier to model as documents •  Schemaless DB enables faster development cycles

What MongoDB solves

Agility

Flexibility

Cost

Page 11: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

11

•  Build a database for scaleout –  Run on clusters of 100s of commodity machines

•  … that enables agile development

•  … and is usable for a broad variety of applications

How does MongoDB shape up?

Page 12: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

12

Purpose: •  Aggregate system resources horizontally •  Scaling writes •  Scaling consistent reads

Goals: •  Data location transparent to your code

•  Data distribution is automatic •  No code changes required

Data Distribution across nodes - Sharding

Page 13: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

13

Sharding - Range distribution

shard01 shard02 shard03

sh.shardCollection("test.tweets",3{_id:31}3,3false)3

Page 14: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

14

Sharding - Range distribution

shard01 shard02 shard03

a-i j-r s-z

Page 15: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

15

Sharding - Splits

shard01 shard02 shard03

a-i ja-jz s-z k-r

Page 16: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

16

Sharding - Splits

shard01 shard02 shard03

a-i ja-ji s-z ji-js js-jw jz-r

Page 17: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

17

Sharding - Auto Balancing

shard01 shard02 shard03

a-i ja-ji s-z ji-js js-jw jz-r

js-jw jz-r

Page 18: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

18

Sharding - Goal Equilibrium

shard01 shard02 shard03

a-i ja-ji s-z ji-js

js-jw jz-r

Page 19: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

19

Sharding - Find by Key

shard01 shard02 shard03

a-i ja-ji s-z ji-js

js-jw jz-r

find({_id:3"alvin"})3

Page 20: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

20

Sharding - Find by Key

shard01 shard02 shard03

a-i ja-ji s-z ji-js

js-jw jz-r

find({_id:3"alvin"})3

Page 21: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

21

Sharding - Find by Attribute

shard01 shard02 shard03

a-i ja-ji s-z ji-js

js-jw jz-r

find({email:3"[email protected]"})3

Page 22: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

22

Sharding - Find by Attribute

shard01 shard02 shard03

a-i ja-ji s-z ji-js

js-jw jz-r

find({email:3"[email protected]"})3

Page 23: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

23

Sharding - Caching

shard01

a-i j-r s-z

300

GB

Dat

a

300 GB

96 GB Mem 3:1 Data/Mem

Page 24: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

24

Aggregate Horizontal Resources

shard01 shard02 shard03

a-i j-r s-z

96 GB Mem 1:1 Data/Mem

100 GB 100 GB 100 GB

300

GB

Dat

a

96 GB Mem 1:1 Data/Mem

96 GB Mem 1:1 Data/Mem

j-r s-z

Page 25: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

25

•  Partitions data across many nodes –  Scales Read & Writes

•  What happens if a node fails? –  Data in that partition is lost

•  Must have copies of partition across –  Nodes –  Data Centers –  Geographic regions

Sharding

Page 26: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

26

Replica Sets

Primary

Secondary

Secondary

Read

Write App Asynchronous Replication

Page 27: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

27

Replica Sets

Primary

Secondary

Secondary

App

Read

Write

Page 28: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

28

Replica Sets

Primary

Primary

Secondary

Automatic Election of new Primary

App

Read

Write

Page 29: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

29

Replica Sets

Recovering

Primary

Secondary

Read

Write New primary serves data

App

Page 30: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

30

Replica Sets

Secondary

Primary

Secondary

Read

Write

App

Page 31: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

31

Scale Eventually Consistent Reads

Secondary

Primary

Secondary

Read

Write

Read

Read

App

Page 32: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

32

•  Read Preferences –  PRIMARY, PRIMARY PREFERRED !–  SECONDARY, SECONDARY PREFERRED !–  NEAREST

Java example ReadPreference pref = ReadPreference.primaryPreferred();! DBCursor cur = new DBCursor(collection, query, ! null, pref); !

Eventual Consistency Using Replicas for Read Scaling

Page 33: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

33

Immediate Consistency

Primary Thread #1

Insert

Update

Read

Read

v1

✔"

✔"

v2

Page 34: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

34

Eventual Consistency

Primary Secondary Thread #1

Insert

Read

v1

Thread #2

✔"v1 ✖"

v1 does not exist

✔"reads v1

Page 35: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

35

Eventual Consistency

Primary Secondary Thread #1

Insert

Update

Read

Read

v1

Thread #2

✔"

✔"

v1 ✖"

✖"v2

v2

reads v1

v1 does not exist

✔" reads v2

✔"reads v1

Page 36: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

36

Tunable Data Durability Memory Journal Secondary Other Data Center

RDBMS

fire & forget

w=1

w=1 j=true w="majority" w=n

w="myTag"

Less More

async

sync

Page 37: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

37

•  Capped Collections –  Limit data by size, acts as a circular buffer / FIFO –  Use cases: Audit, history, logs

•  Time To Live (TTL) collections –  Expire data based on timestamp –  Use cases: Archiving, purging, sessions

•  Text Search –  Search by word, phrase, stemming, stop words –  Use cases: Consistent text search

Other MongoDB features

Page 38: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

38

✓ Build a database for scaleout –  Run on clusters of 100s of commodity machines

•  … that enables agile development

•  … and is usable for a broad variety of applications

How does MongoDB shape up?

Page 39: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

39

•  Why JSON? –  Simple, well understood encapsulation of data –  Maps simply to objects in your OO language –  Linking & Embedding to describe relationships

Data Model

Page 40: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

40

Why Mess with the Data Model?

Page 41: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

41

posts

authors comments

Mapping Objects to RDBMS select * !from posts p, ! authors a, ! comments c !where p.author_id = a.id! and p.id = c.post_id! and p.id = 123 !!

start transaction !!insert into comments (...) !!update posts ! set comment_count = comment_count + 1 ! where post_id = 123 !!commit !!

Page 42: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

42

posts

authors comments

server a server b server c

Mapping Objects to Distributed RDBMS select * !from posts p, ! authors a, ! comments c !where p.author_id = a.id! and p.id = c.post_id! and p.id = 123 !!

start transaction !!insert into comments (...) !!update posts ! set comment_count = comment_count + 1 ! where post_id = 123 !!commit !!

Page 43: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

43

Same Schema in MongoDB

embedding linking

Page 44: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

44

posts

db.posts.find({_id:3123})3!

db.posts.update(333{_id:3123},333{3"$push":3{comments:3new_comment},33333"$inc":33{comments_count:31}3}3)3author

comments comments comments

server a server b server c

Mapping Object with MongoDB

Page 45: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

45

•  Design documents that simply map to your application

post3=3{author:3"Hergé", 33333333date:3new3Date(), 33333333text:3"Destination3Moon", 33333333tags:3["comic",3"adventure"]} 3>3db.posts.save(post)

Schemas in MongoDB

Page 46: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

46

// Find the object !> db.blogs.find( { text: "Destination Moon" } )!!// Find posts with tags !> db.blogs.find( { tags: { $exists: true } } )!!// Regular expressions: posts where author starts with h!> db.blogs.find( { author: /^h/i } ) !!// Counting: number of posts written by Hergé!> db.blogs.find( { author: "Hergé" } ).count() !

Examples

Page 47: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

47

•  Conditional Query Operators –  Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte,

$ne –  Vector: $in, $nin, $all, $size

•  Atomic Update Operators –  Scalar: $inc, $set, $unset –  Vector: $push, $pop, $pull, $pushAll, $pullAll,

$addToSet

Data Manipulation

Page 48: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

48

>3db.blogs.update(333333333333{3text:3"Destination3Moon"3},333333333333{3"$push":3{3comments:3new_comment3},33333333333333"$inc":33{3comments_count:313}3}3)3333{3_id:3ObjectId("4c4ba5c0672c685e5e8aabf3"),333333text:3"Destination3Moon",33333comments:3[3

333{33 3author:3"Kyle",33 3date:3ISODate("2011Z09Z19T09:56:06.298Z"),33 3text:3"great3book"3333}3

3333],33333comment_count:31333}3

Extending the schema

Page 49: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

49

✓ Build a database for scaleout –  Run on clusters of 100s of commodity machines

✓ … that enables agile development

•  … and is usable for a broad variety of applications

How does MongoDB shape up?

Page 50: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

50

Big Data = MongoDB = Solved

Data Hub User Data Management

Big Data Content Mgmt & Delivery Mobile & Social

Page 51: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

51

✓ Build a database for scaleout –  Run on clusters of 100s of commodity machines

✓ … that enables agile development

✓ … and is usable for a broad variety of applications

How does MongoDB shape up?

Page 52: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

52

10gen is the organization behind MongoDB

190+ employees 500+ customers

Over $81 million in funding Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney

Page 53: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

53

10gen Products and Services

Consulting Expert Resources for All Phases of MongoDB Implementations

Training Online and In-Person for Developers and Administrators

MongoDB Monitoring Service (MMS) Free, Cloud-Based Service for Monitoring and Alerts

Subscriptions Professional Support, Subscriber Edition and Commercial License

Page 54: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

54

Indeed.com Trends

Top Job Trends

1.  HTML 5 2.  MongoDB 3.  iOS 4.  Android 5.  Mobile Apps 6.  Puppet 7.  Hadoop 8.  jQuery 9.  PaaS 10.  Social Media

MongoDB is the Leading NoSQL Database

LinkedIn Job Skills

MongoDB

Competitor 1

Competitor 2

Competitor 3

Competitor 4

Competitor 5

All Others

Google Search

MongoDB

Competitor 1

Competitor 2

Competitor 3

Competitor 4

Jaspersoft Big Data Index

Direct Real-Time Downloads

MongoDB

Competitor 1

Competitor 2

Competitor 3

Page 55: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

55

The Evolution of MongoDB

2.2 Aug �12

2.4 March�13

2.0 Sept �11

1.8 March �11

Journaling Sharding and Replica set enhancements Spherical geo search

Index enhancements to improve size and performance Authentication with sharded clusters Replica Set Enhancements Concurrency improvements

Aggregation Framework Multi-Data Center Deployments Improved Performance and Concurrency

Kerberos/SASL Hash Shard Key V8 Intersecting polygons Aggregation enhancements Text Search

Page 56: Scaling for Humongous amounts of data with MongoDB...Scaling for Humongous amounts of data with MongoDB Alvin Richards Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

@mongodb(

Drop(by(on(the(5th(floor(and(meet(an(Engineer(and(

Get(a(discount(code(for(MongoDB(London(April(9th(

http://bit.ly/mongoC((Facebook((((((((((|(((((((((Twitter(((((((((|(((((((((LinkedIn(

http://linkd.in/joinmongo(

download at mongodb.org