Scaling MongoDB for real time analytics

SHORTCUTS AROUND THE MISTAKES WE’VE MADE SCALING MONGODB David Tollmyr, Platform lead @effata slideshare.net/tollmyr


Transcript of Scaling MongoDB for real time analytics

Page 1: Scaling MongoDB for real time analytics

SHORTCUTS AROUND THE MISTAKES WE’VE MADE SCALING MONGODB

David Tollmyr, Platform lead

@effata
slideshare.net/tollmyr

Page 2: Scaling MongoDB for real time analytics

What we do

We want to make digital advertising an amazing user experience.
There is more to metrics than clicks.

Page 3: Scaling MongoDB for real time analytics

Ads

Page 4: Scaling MongoDB for real time analytics

Data

Page 5: Scaling MongoDB for real time analytics

Assembling sessions

[Diagram: an exposure, pings and events ➔ session]

Page 6: Scaling MongoDB for real time analytics

Information

Page 7: Scaling MongoDB for real time analytics

Crunching

[Diagram: many sessions ➔ 42]

Page 8: Scaling MongoDB for real time analytics
Page 9: Scaling MongoDB for real time analytics

Metrics

Page 10: Scaling MongoDB for real time analytics

Reports

Page 11: Scaling MongoDB for real time analytics

What we do

Track ads, make pretty reports.

Page 12: Scaling MongoDB for real time analytics

That doesn’t sound so hard

We don’t know when sessions end
There’s a lot of data
It’s all done in (close to) real time

Page 13: Scaling MongoDB for real time analytics

Numbers

200 Gb logs
100 million data points per day
~300 metrics per data point
= 6000 updates / s at peak

Page 14: Scaling MongoDB for real time analytics

How we use(d) MongoDB

“Virtual memory” to offload data while we wait for sessions to finish
Short-term storage (<48 hours) for batch jobs, replays and manual analysis
Metrics storage

Page 15: Scaling MongoDB for real time analytics

Why we use MongoDB

Schemalessness makes things so much easier: the data we collect changes as we come up with new ideas
Sharding makes it possible to scale writes
Secondary indexes and a rich query language are great features (for the metrics store)
It’s just… nice

Page 16: Scaling MongoDB for real time analytics

Btw.

We use JRuby, it’s awesome

Page 17: Scaling MongoDB for real time analytics

STANDING ON THE SHOULDERS OF GIANTS WITH JRUBY

slideshare.net/iconara

Page 18: Scaling MongoDB for real time analytics

A story in 9 iterations

Page 19: Scaling MongoDB for real time analytics

1st iteration: secondary indexes and updates

One document per session, update as new data comes along.

Outcome: 1000% write lock

Page 20: Scaling MongoDB for real time analytics

#1 Everything is about working around the GLOBAL WRITE LOCK

Page 21: Scaling MongoDB for real time analytics

MongoDB 1.8.1

// The third argument (true) makes each update an upsert.
db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)

db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)

Page 22: Scaling MongoDB for real time analytics

MongoDB 2.0.0

db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)

db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)

Page 23: Scaling MongoDB for real time analytics

2nd iteration: using scans for two-step assembling

Instead of updating, save each fragment, then scan over _id to assemble sessions.

Page 24: Scaling MongoDB for real time analytics

2nd iteration: using scans for two-step assembling

Outcome: not as much lock, but still not great performance. We also realised we couldn’t remove data fast enough.
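
A minimal sketch of the two-step idea, run against an in-memory array rather than MongoDB. The `_id` layout (`<sessionId>:<sequence>`), the function names and the fragment fields are assumptions made for illustration; the talk does not show the actual key design.

```javascript
// Hypothetical fragment _id layout: "<sessionId>:<sequence>", so that all
// fragments of one session sit next to each other in _id order.
// Assumes session ids never contain ":".
function fragmentId(sessionId, seq) {
  return sessionId + ":" + String(seq).padStart(6, "0");
}

// Step 2: walk the fragments in _id order and fold adjacent fragments
// with the same session id into one session.
// `fragments` stands in for the result of an _id-ordered scan.
function assembleSessions(fragments) {
  const sorted = [...fragments].sort((a, b) =>
    a._id < b._id ? -1 : a._id > b._id ? 1 : 0
  );
  const sessions = [];
  let current = null;
  for (const fragment of sorted) {
    const sessionId = fragment._id.split(":")[0];
    if (current === null || current.sessionId !== sessionId) {
      current = { sessionId: sessionId, fragments: [] };
      sessions.push(current);
    }
    current.fragments.push(fragment.type);
  }
  return sessions;
}
```

Step 1 is then a plain insert per fragment with no update of an existing document, which is the point of the iteration.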

Page 25: Scaling MongoDB for real time analytics

#2 Everything is about working around the GLOBAL WRITE LOCK

Page 26: Scaling MongoDB for real time analytics

#3 Give a lot of thought to your PRIMARY KEY

Page 27: Scaling MongoDB for real time analytics

3rd iteration: partitioning

Partitioning the data by writing to a new collection every hour.

Outcome: complicated, fragmented database
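
A rough sketch of what hourly partitioning can look like. The naming scheme, the `fragments` prefix and both function names are hypothetical; the 48-hour window in the usage note is borrowed from the "<48 hours" storage figure earlier in the talk.

```javascript
// Hypothetical naming scheme: one collection per UTC hour,
// e.g. "fragments_2011120113" for 2011-12-01 13:xx UTC.
function hourlyCollectionName(prefix, date) {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    prefix + "_" +
    date.getUTCFullYear() +
    pad(date.getUTCMonth() + 1) +
    pad(date.getUTCDate()) +
    pad(date.getUTCHours())
  );
}

// Names are fixed width, so plain string comparison orders them by time.
// Everything older than the retention window can be dropped as a whole.
function expiredCollections(names, prefix, now, retentionHours) {
  const cutoffDate = new Date(now.getTime() - retentionHours * 3600 * 1000);
  const cutoff = hourlyCollectionName(prefix, cutoffDate);
  return names.filter(
    (name) => name.startsWith(prefix + "_") && name < cutoff
  );
}
```

In the mongo shell, each name returned by `expiredCollections(names, "fragments", new Date(), 48)` could be passed to `db.getCollection(name).drop()`, which removes old data without deleting documents one by one.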

Page 28: Scaling MongoDB for real time analytics

#4 Make sure you can REMOVE OLD DATA

Page 29: Scaling MongoDB for real time analytics

4th iteration: sharding

To get around the global write lock and get higher write performance we moved to a sharded cluster.

Outcome: higher write performance, lots of problems, lots of ops time spent debugging

Page 30: Scaling MongoDB for real time analytics

#5 Everything is about working around the GLOBAL WRITE LOCK

Page 31: Scaling MongoDB for real time analytics

#6 SHARDING IS NOT A SILVER BULLET

and it’s complex; if you can, avoid it

Page 32: Scaling MongoDB for real time analytics
Page 33: Scaling MongoDB for real time analytics

#7 IT WILL FAIL

design for it

Page 34: Scaling MongoDB for real time analytics
Page 35: Scaling MongoDB for real time analytics
Page 36: Scaling MongoDB for real time analytics

5th iteration: moving things to separate clusters

We saw very different loads on the shards and realised we had databases with very different usage patterns, some of which made autosharding not work. We moved these off the cluster.

Outcome: a more balanced and stable cluster

Page 37: Scaling MongoDB for real time analytics

#8 Everything is about working around the GLOBAL WRITE LOCK

Page 38: Scaling MongoDB for real time analytics

#9 ONE DATABASE with one usage pattern PER CLUSTER

Page 39: Scaling MongoDB for real time analytics

#10 MONITOR EVERYTHING

look at your health graphs daily

Page 40: Scaling MongoDB for real time analytics

6th iteration: monster machines

We got new problems removing data and needed some room to breathe and think.

Solution: upgraded the servers to High-Memory Quadruple Extra Large (with cheese).


Page 41: Scaling MongoDB for real time analytics

#11 Don’t try to scale up, SCALE OUT

Page 42: Scaling MongoDB for real time analytics

#12 When you’re out of ideas, CALL THE EXPERTS

Page 43: Scaling MongoDB for real time analytics

7th iteration: partitioning (again) and pre-chunking

We rewrote the database layer to write to a new database each day, and we created all chunks in advance. We also decreased the size of our documents by a lot.

Outcome: no more problems removing data.
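
A hedged sketch of the two ideas in this iteration: a database name per day and a list of chunk boundaries computed up front. The talk does not state the shard key, so this assumes keys that start with two uniformly distributed hex digits; the `sessions` prefix and both function names are made up for illustration.

```javascript
// Hypothetical naming scheme: one database per UTC day, e.g. "sessions_20111201".
function dailyDatabaseName(prefix, date) {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    prefix + "_" +
    date.getUTCFullYear() +
    pad(date.getUTCMonth() + 1) +
    pad(date.getUTCDate())
  );
}

// Evenly spaced boundaries over a two-hex-digit key prefix ("00" to "ff").
// Assumes 1 <= chunkCount <= 256; returns chunkCount - 1 split points.
function splitPoints(chunkCount) {
  const points = [];
  for (let i = 1; i < chunkCount; i++) {
    const boundary = Math.floor((i * 256) / chunkCount);
    points.push(boundary.toString(16).padStart(2, "0"));
  }
  return points;
}
```

Each boundary would then be handed to MongoDB's `split` admin command (using its `middle` option) before any data for that day arrives, so the balancer does not have to move chunks under write load. Dropping a whole day is a single database drop.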

Page 44: Scaling MongoDB for real time analytics

#13 Smaller objects mean a smaller database, and a smaller database means LESS RAM NEEDED

Page 45: Scaling MongoDB for real time analytics

#14 Give a lot of thought to your PRIMARY KEY

Page 46: Scaling MongoDB for real time analytics

#15 Everything is about working around the GLOBAL WRITE LOCK

Page 47: Scaling MongoDB for real time analytics

8th iteration: realize when you have the wrong tool

Transient data might not need all the bells and whistles.

Outcome: Redis gave us 100x performance in the assembling step

Page 48: Scaling MongoDB for real time analytics

#16 When all you have is a HAMMER, everything looks like a NAIL

Page 49: Scaling MongoDB for real time analytics

9th iteration: rinse and repeat

We now have the same scaling issues later in the chain.

Outcome: upcoming rewrite to make writes/updates more effective

Redis was actually slower

Page 50: Scaling MongoDB for real time analytics

#17 Everything is about working around the GLOBAL WRITE LOCK

Page 51: Scaling MongoDB for real time analytics

Thank you

@effata
slideshare.net/tollmyr

engineering.burtcorp.com
burtcorp.com
richmetrics.com

Page 52: Scaling MongoDB for real time analytics

Since we got time…

Page 53: Scaling MongoDB for real time analytics

Tips: EC2

You have three copies of your data, do you really need EBS?
Instance store disks are included in the price and they have predictable performance.
m1.xlarge comes with 1.7 TB of storage.

Page 54: Scaling MongoDB for real time analytics

Tips: Avoid bulk inserts

Very dangerous if there’s a possibility of duplicate key errors.

It’s not fixed in 2.0 even though the driver has a flag for it.
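
To illustrate the risk, here is a toy in-memory model, not MongoDB's implementation. It assumes a batch is abandoned at the first duplicate key, so every document after the failing one is lost; `makeStore`, `bulkInsert` and `insertEach` are hypothetical names.

```javascript
// Toy stand-in for a collection with a unique _id index.
function makeStore() {
  const ids = new Set();
  return {
    insert(doc) {
      if (ids.has(doc._id)) throw new Error("duplicate key: " + doc._id);
      ids.add(doc._id);
    },
    count() {
      return ids.size;
    },
  };
}

// Models a batch that stops at the first failing document:
// the error propagates and the remaining documents are never inserted.
function bulkInsert(store, docs) {
  for (const doc of docs) {
    store.insert(doc);
  }
}

// Inserts one document at a time and records duplicates instead of aborting.
function insertEach(store, docs) {
  const duplicates = [];
  for (const doc of docs) {
    try {
      store.insert(doc);
    } catch (e) {
      duplicates.push(doc._id);
    }
  }
  return duplicates;
}
```

With the documents `[{_id: 1}, {_id: 1}, {_id: 2}]`, the batch version stores only the first document, while the one-at-a-time version stores two and reports the duplicate.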

Page 55: Scaling MongoDB for real time analytics

Tips: Safe mode

Run every Nth insert in safe mode.
This will give you warnings when bad things happen, like failovers.
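
One way to pick every Nth insert is a simple counter. This is only a sketch: `everyNth` is a hypothetical helper, and how the flag is passed to the driver depends on the driver in use.

```javascript
// Returns a function that answers, for each insert in turn,
// whether that insert should be run in safe mode.
// Assumes n is a positive integer.
function everyNth(n) {
  let count = 0;
  return function shouldUseSafeMode() {
    count += 1;
    return count % n === 0;
  };
}
```

Calling `const safe = everyNth(100)` once and then `safe()` before each insert marks one insert in a hundred as acknowledged, keeping most writes fire-and-forget while still surfacing errors soon after they start.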