Scaling for Humongous amounts of data with MongoDB

56
Scaling for Hu mongo us amounts of data with MongoDB Alvin Richards Technical Director [email protected] @ jonnyeight alvinonmongodb.com

description

Its always exciting when your first customer signs up. Before long, you have your millionth signup and your data needs to span multiple machines across several geographies. The business wants more information, better accuracy and faster. What do you do?In this talk we will cover:• Basic Schema Design• Schema design for scaling: bucketing, time series, linking versus embedding• Implications of sharding keys with alternatives• Read scaling through replication• Challenges of eventual consistency

Transcript of Scaling for Humongous amounts of data with MongoDB

Page 1: Scaling for Humongous amounts of data with MongoDB

Scaling for Humongous amounts of data with

MongoDB

Alvin Richards Technical Director [email protected]

@jonnyeight alvinonmongodb.com

Page 2: Scaling for Humongous amounts of data with MongoDB

2 http://bit.ly/OT71M4

Getting from here to there...

Page 3: Scaling for Humongous amounts of data with MongoDB

3

...probably using one of these

http://bit.ly/QDUIUF

Page 4: Scaling for Humongous amounts of data with MongoDB

4

Why NoSQL at all?

Page 5: Scaling for Humongous amounts of data with MongoDB

5

0 5,000

10,000 15,000 20,000 25,000 30,000

1998

2000

2008

2012

Indexed Pages

Pages Index (Million)

http://bit.ly/VDkDN2 http://bit.ly/108jTHN http://bit.ly/Wt3fl7 http://bit.ly/Qmg8YD

Growth? Scaling? Cost? Flexibility?

Page 6: Scaling for Humongous amounts of data with MongoDB

6

•  Build a database for scaleout –  Run on clusters of 100s of commodity machines

•  … that enables agile development

•  … and is usable for a broad variety of applications

Need a Database that...

Page 7: Scaling for Humongous amounts of data with MongoDB

7

•  Partitioning of Data –  Hashes (Dynamo) vs Ranges (Big Table) –  Physical vs Logical segments

•  Consistency –  Eventually

•  Multi Master updates, resolve conflicts later –  Immediately

•  Single Master updates, always consistent

Is Scaleout Mission Impossible?

Page 8: Scaling for Humongous amounts of data with MongoDB

8

NoSQL and MongoDB

Page 9: Scaling for Humongous amounts of data with MongoDB

9

depth of functionality scal

abili

ty &

per

form

ance • memcached

•  key/value

• RDBMS

Tradeoff: Scale vs Functionality

Page 10: Scaling for Humongous amounts of data with MongoDB

10

•  Cost effective operationalize abundant data (clickstreams, logs, tweets, ...)

•  Relaxed transactional semantics enable easy scale out •  Auto Sharding for scale down and scale up

•  Applications store complex data that is easier to model as documents •  Schemaless DB enables faster development cycles

What MongoDB solves

Agility

Flexibility

Cost

Page 11: Scaling for Humongous amounts of data with MongoDB

11

•  Build a database for scaleout –  Run on clusters of 100s of commodity machines

•  … that enables agile development

•  … and is usable for a broad variety of applications

How does MongoDB shape up?

Page 12: Scaling for Humongous amounts of data with MongoDB

12

Purpose: •  Aggregate system resources horizontally •  Scaling writes •  Scaling consistent reads

Goals: •  Data location transparent to your code

•  Data distribution is automatic •  No code changes required

Data Distribution across nodes - Sharding

Page 13: Scaling for Humongous amounts of data with MongoDB

13

Sharding - Range distribution

shard01 shard02 shard03

sh.shardCollection("test.tweets",3{_id:31}3,3false)3

Page 14: Scaling for Humongous amounts of data with MongoDB

14

Sharding - Range distribution

shard01 shard02 shard03

a-i j-r s-z

Page 15: Scaling for Humongous amounts of data with MongoDB

15

Sharding - Splits

shard01 shard02 shard03

a-i ja-jz s-z k-r

Page 16: Scaling for Humongous amounts of data with MongoDB

16

Sharding - Splits

shard01 shard02 shard03

a-i ja-ji s-z ji-js js-jw jz-r

Page 17: Scaling for Humongous amounts of data with MongoDB

17

Sharding - Auto Balancing

shard01 shard02 shard03

a-i ja-ji s-z ji-js js-jw jz-r

js-jw jz-r

Page 18: Scaling for Humongous amounts of data with MongoDB

18

Sharding - Goal Equilibrium

shard01 shard02 shard03

a-i ja-ji s-z ji-js

js-jw jz-r

Page 19: Scaling for Humongous amounts of data with MongoDB

19

Sharding - Find by Key

shard01 shard02 shard03

a-i ja-ji s-z ji-js

js-jw jz-r

find({_id:3"alvin"})3

Page 20: Scaling for Humongous amounts of data with MongoDB

20

Sharding - Find by Key

shard01 shard02 shard03

a-i ja-ji s-z ji-js

js-jw jz-r

find({_id:3"alvin"})3

Page 21: Scaling for Humongous amounts of data with MongoDB

21

Sharding - Find by Attribute

shard01 shard02 shard03

a-i ja-ji s-z ji-js

js-jw jz-r

find({email:3"[email protected]"})3

Page 22: Scaling for Humongous amounts of data with MongoDB

22

Sharding - Find by Attribute

shard01 shard02 shard03

a-i ja-ji s-z ji-js

js-jw jz-r

find({email:3"[email protected]"})3

Page 23: Scaling for Humongous amounts of data with MongoDB

23

Sharding - Caching

shard01

a-i j-r s-z

300

GB

Dat

a

300 GB

96 GB Mem 3:1 Data/Mem

Page 24: Scaling for Humongous amounts of data with MongoDB

24

Aggregate Horizontal Resources

shard01 shard02 shard03

a-i j-r s-z

96 GB Mem 1:1 Data/Mem

100 GB 100 GB 100 GB

300

GB

Dat

a

96 GB Mem 1:1 Data/Mem

96 GB Mem 1:1 Data/Mem

j-r s-z

Page 25: Scaling for Humongous amounts of data with MongoDB

25

•  Partitions data across many nodes –  Scales Read & Writes

•  What happens if a node fails? –  Data in that partition is lost

•  Must have copies of partition across –  Nodes –  Data Centers –  Geographic regions

Sharding

Page 26: Scaling for Humongous amounts of data with MongoDB

26

Replica Sets

Primary

Secondary

Secondary

Read

Write App Asynchronous Replication

Page 27: Scaling for Humongous amounts of data with MongoDB

27

Replica Sets

Primary

Secondary

Secondary

App

Read

Write

Page 28: Scaling for Humongous amounts of data with MongoDB

28

Replica Sets

Primary

Primary

Secondary

Automatic Election of new Primary

App

Read

Write

Page 29: Scaling for Humongous amounts of data with MongoDB

29

Replica Sets

Recovering

Primary

Secondary

Read

Write New primary serves data

App

Page 30: Scaling for Humongous amounts of data with MongoDB

30

Replica Sets

Secondary

Primary

Secondary

Read

Write

App

Page 31: Scaling for Humongous amounts of data with MongoDB

31

Scale Eventually Consistent Reads

Secondary

Primary

Secondary

Read

Write

Read

Read

App

Page 32: Scaling for Humongous amounts of data with MongoDB

32

•  Read Preferences –  PRIMARY, PRIMARY PREFERRED !–  SECONDARY, SECONDARY PREFERRED !–  NEAREST

Java example ReadPreference pref = ReadPreference.primaryPreferred();! DBCursor cur = new DBCursor(collection, query, ! null, pref); !

Eventual Consistency Using Replicas for Read Scaling

Page 33: Scaling for Humongous amounts of data with MongoDB

33

Immediate Consistency

Primary Thread #1

Insert

Update

Read

Read

v1

✔"

✔"

v2

Page 34: Scaling for Humongous amounts of data with MongoDB

34

Eventual Consistency

Primary Secondary Thread #1

Insert

Read

v1

Thread #2

✔"v1 ✖"

v1 does not exist

✔"reads v1

Page 35: Scaling for Humongous amounts of data with MongoDB

35

Eventual Consistency

Primary Secondary Thread #1

Insert

Update

Read

Read

v1

Thread #2

✔"

✔"

v1 ✖"

✖"v2

v2

reads v1

v1 does not exist

✔" reads v2

✔"reads v1

Page 36: Scaling for Humongous amounts of data with MongoDB

36

Tunable Data Durability Memory Journal Secondary Other Data Center

RDBMS

fire & forget

w=1

w=1 j=true w="majority" w=n

w="myTag"

Less More

async

sync

Page 37: Scaling for Humongous amounts of data with MongoDB

37

•  Capped Collections –  Limit data by size, acts as a circular buffer / FIFO –  Use cases: Audit, history, logs

•  Time To Live (TTL) collections –  Expire data based on timestamp –  Use cases: Archiving, purging, sessions

•  Text Search –  Search by word, phrase, stemming, stop words –  Use cases: Consistent text search

Other MongoDB features

Page 38: Scaling for Humongous amounts of data with MongoDB

38

✓ Build a database for scaleout –  Run on clusters of 100s of commodity machines

•  … that enables agile development

•  … and is usable for a broad variety of applications

How does MongoDB shape up?

Page 39: Scaling for Humongous amounts of data with MongoDB

39

•  Why JSON? –  Simple, well understood encapsulation of data –  Maps simply to objects in your OO language –  Linking & Embedding to describe relationships

Data Model

Page 40: Scaling for Humongous amounts of data with MongoDB

40

Why Mess with the Data Model?

Page 41: Scaling for Humongous amounts of data with MongoDB

41

posts

authors comments

Mapping Objects to RDBMS select * !from posts p, ! authors a, ! comments c !where p.author_id = a.id! and p.id = c.post_id! and p.id = 123 !!

start transaction !!insert into comments (...) !!update posts ! set comment_count = comment_count + 1 ! where post_id = 123 !!commit !!

Page 42: Scaling for Humongous amounts of data with MongoDB

42

posts

authors comments

server a server b server c

Mapping Objects to Distributed RDBMS select * !from posts p, ! authors a, ! comments c !where p.author_id = a.id! and p.id = c.post_id! and p.id = 123 !!

start transaction !!insert into comments (...) !!update posts ! set comment_count = comment_count + 1 ! where post_id = 123 !!commit !!

Page 43: Scaling for Humongous amounts of data with MongoDB

43

Same Schema in MongoDB

embedding linking

Page 44: Scaling for Humongous amounts of data with MongoDB

44

posts

db.posts.find({_id:3123})3!

db.posts.update(333{_id:3123},333{3"$push":3{comments:3new_comment},33333"$inc":33{comments_count:31}3}3)3author

comments comments comments

server a server b server c

Mapping Object with MongoDB

Page 45: Scaling for Humongous amounts of data with MongoDB

45

•  Design documents that simply map to your application

post3=3{author:3"Hergé", 33333333date:3new3Date(), 33333333text:3"Destination3Moon", 33333333tags:3["comic",3"adventure"]} 3>3db.posts.save(post)

Schemas in MongoDB

Page 46: Scaling for Humongous amounts of data with MongoDB

46

// Find the object !> db.blogs.find( { text: "Destination Moon" } )!!// Find posts with tags !> db.blogs.find( { tags: { $exists: true } } )!!// Regular expressions: posts where author starts with h!> db.blogs.find( { author: /^h/i } ) !!// Counting: number of posts written by Hergé!> db.blogs.find( { author: "Hergé" } ).count() !

Examples

Page 47: Scaling for Humongous amounts of data with MongoDB

47

•  Conditional Query Operators –  Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte,

$ne –  Vector: $in, $nin, $all, $size

•  Atomic Update Operators –  Scalar: $inc, $set, $unset –  Vector: $push, $pop, $pull, $pushAll, $pullAll,

$addToSet

Data Manipulation

Page 48: Scaling for Humongous amounts of data with MongoDB

48

>3db.blogs.update(333333333333{3text:3"Destination3Moon"3},333333333333{3"$push":3{3comments:3new_comment3},33333333333333"$inc":33{3comments_count:313}3}3)3333{3_id:3ObjectId("4c4ba5c0672c685e5e8aabf3"),333333text:3"Destination3Moon",33333comments:3[3

333{33 3author:3"Kyle",33 3date:3ISODate("2011Z09Z19T09:56:06.298Z"),33 3text:3"great3book"3333}3

3333],33333comment_count:31333}3

Extending the schema

Page 49: Scaling for Humongous amounts of data with MongoDB

49

✓ Build a database for scaleout –  Run on clusters of 100s of commodity machines

✓ … that enables agile development

•  … and is usable for a broad variety of applications

How does MongoDB shape up?

Page 50: Scaling for Humongous amounts of data with MongoDB

50

Big Data = MongoDB = Solved

Data Hub User Data Management

Big Data Content Mgmt & Delivery Mobile & Social

Page 51: Scaling for Humongous amounts of data with MongoDB

51

✓ Build a database for scaleout –  Run on clusters of 100s of commodity machines

✓ … that enables agile development

✓ … and is usable for a broad variety of applications

How does MongoDB shape up?

Page 52: Scaling for Humongous amounts of data with MongoDB

52

10gen is the organization behind MongoDB

190+ employees 500+ customers

Over $81 million in funding Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney

Page 53: Scaling for Humongous amounts of data with MongoDB

53

10gen Products and Services

Consulting Expert Resources for All Phases of MongoDB Implementations

Training Online and In-Person for Developers and Administrators

MongoDB Monitoring Service (MMS) Free, Cloud-Based Service for Monitoring and Alerts

Subscriptions Professional Support, Subscriber Edition and Commercial License

Page 54: Scaling for Humongous amounts of data with MongoDB

54

Indeed.com Trends

Top Job Trends

1.  HTML 5 2.  MongoDB 3.  iOS 4.  Android 5.  Mobile Apps 6.  Puppet 7.  Hadoop 8.  jQuery 9.  PaaS 10.  Social Media

MongoDB is the Leading NoSQL Database

LinkedIn Job Skills

MongoDB

Competitor 1

Competitor 2

Competitor 3

Competitor 4

Competitor 5

All Others

Google Search

MongoDB

Competitor 1

Competitor 2

Competitor 3

Competitor 4

Jaspersoft Big Data Index

Direct Real-Time Downloads

MongoDB

Competitor 1

Competitor 2

Competitor 3

Page 55: Scaling for Humongous amounts of data with MongoDB

55

The Evolution of MongoDB

2.2 Aug �12

2.4 March�13

2.0 Sept �11

1.8 March �11

Journaling Sharding and Replica set enhancements Spherical geo search

Index enhancements to improve size and performance Authentication with sharded clusters Replica Set Enhancements Concurrency improvements

Aggregation Framework Multi-Data Center Deployments Improved Performance and Concurrency

Kerberos/SASL Hash Shard Key V8 Intersecting polygons Aggregation enhancements Text Search

Page 56: Scaling for Humongous amounts of data with MongoDB

@mongodb(

Drop(by(on(the(5th(floor(and(meet(an(Engineer(and(

Get(a(discount(code(for(MongoDB(London(April(9th(

http://bit.ly/mongoC((Facebook((((((((((|(((((((((Twitter(((((((((|(((((((((LinkedIn(

http://linkd.in/joinmongo(

download at mongodb.org