Mythbusting: Understanding How We Measure Performance at MongoDB

Post on 20-Jun-2015

description

Benchmarking, benchmarking, benchmarking. We all do it. Mostly it tells us what we want to hear, but it often hides a mountain of misinformation. In this talk we will walk through the pitfalls you might fall into, looking at some examples where things go wrong. We will then walk through how MongoDB performance is measured: the processes, the methodology, and ways to present and examine the results.

Transcript of Mythbusting: Understanding How We Measure Performance at MongoDB

VP, Engineering, MongoDB

Andrew Erlichson

#MongoDBDays

Mythbusting: Understanding How We Measure the Performance of MongoDB

Before we start…

• We are going to look a lot at
  – C++ kernel code
  – Java benchmarks
  – JavaScript tests

• And lots of charts

• And it's going to be awesome!

Goals of Benchmarking

Academia

– You have an idea
– It must be tested
– Others must be able to reproduce your work
– Explanations that contribute to knowledge are key
– Should have practical applications

Industry Benchmarketing

• Prove you are faster than the competition

• Emphasize your best attributes

• Repeatable*

• Explanations*

*OPTIONAL

Goals of Internal Benchmarking

• Always be improving

• Understand our bottlenecks

• Main customer is engineering

• Explanations are somewhat important.

Overview – Internal Benchmarking

• Some common traps

• Performance measurement & diagnosis

• What's next

Part One: Some Common Traps

long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
    for (int i = 0; i < documentsPerInsert; i++) {
        id++;
        BasicDBObject doc = new BasicDBObject();
        doc.put("_id", id);
        doc.put("k", rand.nextInt(numMaxInserts) + 1);
        String cVal = "…";
        doc.put("c", cVal);
        String padVal = "…";
        doc.put("pad", padVal);
        aDocs[i] = doc;
    }
    coll.insert(aDocs);
    numInserts += documentsPerInsert;
    globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();

#1 Time taken to Insert x Documents

So that looks ok, right?

What else are you measuring?

Object creation and GC management?

Clock resolution?

Thread contention on nextInt()?

Time to synthesize data?

Thread contention on addAndGet()?

// Pre-create the objects outside the timing loop
BasicDBObject[] aDocs = new BasicDBObject[documentsPerInsert];
for (int i = 0; i < documentsPerInsert; i++) {
    BasicDBObject doc = new BasicDBObject();
    String cVal = "…";
    doc.put("c", cVal);
    String padVal = "…";
    doc.put("pad", padVal);
    aDocs[i] = doc;
}

Solution: Pre-Create the objects

Pre-create non varying data outside the timing loop

Alternative

• Pre-create the data in a file; load from file

// Use ThreadLocalRandom, or one java.util.Random instance per thread
java.util.concurrent.ThreadLocalRandom rand = java.util.concurrent.ThreadLocalRandom.current();
for (long roundNum = 0; roundNum < numRounds; roundNum++) {
    for (int i = 0; i < documentsPerInsert; i++) {
        id++;
        BasicDBObject doc = aDocs[i];
        doc.put("_id", id);
        doc.put("k", rand.nextInt(numMaxInserts) + 1);
    }
    coll.insert(aDocs);
    numInserts += documentsPerInsert;
}
// Maintain the shared count outside the loop
// (roundNum is scoped to the loop, so use numRounds here)
globalInserts.addAndGet(documentsPerInsert * numRounds);

Solution: Remove contention

Remove contention on nextInt() by making the generator thread-local

Remove contention on addAndGet() by updating the shared counter once, outside the loop
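The two fixes above can be tried without a database. The following is a minimal stand-alone sketch, not code from the talk: the class name LowContentionWorkers and its parameters are hypothetical, and the insert is replaced by a local sum. Each thread uses its own ThreadLocalRandom and a local counter, and touches the shared AtomicLong exactly once.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not from the talk): no database involved, the "work"
// is a local sum so the example runs on its own.
final class LowContentionWorkers {

    // Starts 'threads' workers, each doing 'opsPerThread' operations,
    // and returns the total number of operations counted.
    static long run(int threads, int opsPerThread) {
        AtomicLong globalOps = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // Per-thread generator: no shared seed to contend on.
                ThreadLocalRandom rand = ThreadLocalRandom.current();
                long localOps = 0;
                long sink = 0;
                for (int i = 0; i < opsPerThread; i++) {
                    sink += rand.nextInt(1000) + 1; // stand-in for building a document
                    localOps++;
                }
                // One update of the shared counter per thread, after the loop.
                globalOps.addAndGet(localOps);
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return globalOps.get();
    }

    public static void main(String[] args) {
        System.out.println("total ops: " + run(4, 100_000));
    }
}
```

The shared counter is still correct at the end, but the hot loop no longer performs an atomic update per batch.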

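// Before: millisecond wall-clock timer
long startTime = System.currentTimeMillis();
long endTime = System.currentTimeMillis();

// After: System.nanoTime(), keeping the elapsed time rather than an end timestamp
long startTime = System.nanoTime();
long elapsedNanos = System.nanoTime() - startTime;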

Solution: Timer resolution

"resolution is at least as good as that of currentTimeMillis()"

"granularity of the value depends on the underlying operating system and may be larger"

Source

• http://docs.oracle.com/javase/7/docs/api/java/lang/System.html
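Since the documentation only bounds the resolution, it can help to observe it on the machine running the benchmark. A minimal sketch, not from the talk (the class ClockGranularity and its method are hypothetical names): spin until a clock's value changes and keep the smallest step seen.

```java
import java.util.function.LongSupplier;

// Hypothetical helper (not from the talk): estimates the smallest observable
// step of a clock by spinning until its value changes.
final class ClockGranularity {

    // Returns the smallest positive difference seen between two consecutive
    // distinct readings of 'clock', over 'samples' attempts.
    static long smallestStep(LongSupplier clock, int samples) {
        long smallest = Long.MAX_VALUE;
        for (int s = 0; s < samples; s++) {
            long first = clock.getAsLong();
            long next = clock.getAsLong();
            while (next == first) {       // spin until the clock ticks
                next = clock.getAsLong();
            }
            long step = next - first;
            if (step > 0) {               // ignore backwards wall-clock jumps
                smallest = Math.min(smallest, step);
            }
        }
        return smallest;
    }

    public static void main(String[] args) {
        long millisStep = smallestStep(System::currentTimeMillis, 20);
        long nanosStep = smallestStep(System::nanoTime, 20);
        System.out.println("currentTimeMillis step: " + millisStep + " ms");
        System.out.println("nanoTime step:          " + nanosStep + " ns");
    }
}
```

The numbers vary by operating system and hardware, which is the point of the quoted caveats: an operation shorter than the observed step cannot be timed meaningfully with that clock.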

General Principle #1: Know what you are measuring
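One way to apply this principle is to time the data synthesis and the operation under test separately, so each number answers one question. The sketch below is illustrative only and assumes nothing about the driver: SplitTiming and elapsedNanos are hypothetical names, and the "operation" is a stand-in loop rather than an insert.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not from the talk): separate timers for setup and for
// the operation being measured.
final class SplitTiming {

    // Runs 'task' once and returns the elapsed time in nanoseconds.
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();

        // Phase 1: synthesize the data. This is not what we want to measure.
        long setupNanos = elapsedNanos(() -> {
            for (int i = 0; i < 100_000; i++) {
                docs.add("doc-" + i);
            }
        });

        // Phase 2: the operation under test (a stand-in for the insert call).
        long[] totalLength = {0};
        long measuredNanos = elapsedNanos(() -> {
            for (String d : docs) {
                totalLength[0] += d.length();
            }
        });

        System.out.println("setup ns:    " + setupNanos);
        System.out.println("measured ns: " + measuredNanos);
    }
}
```

If the setup figure is a large share of the combined time, a single timer around both phases would mostly be reporting the cost of generating test data.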

BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i = 0; i < 1000; i++) {
    doc.put("_id", i);
    coll.insert(doc);
}

BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
    foundObj = cur.next();
}
long endTime = System.currentTimeMillis();

#2 Response time to return all results

So that looks ok, right?

What else are you measuring?

Unrestricted predicate?

Each doc is 4080 bytes on disk with powerOf2Sizes

Measuring
• Time to parse & execute query
• Time to retrieve all documents

But also
• Cost of shipping ~4MB data through network stack
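The ~4MB figure can be checked from the numbers on the slide: 1000 documents at 4080 bytes each. The sketch below takes the slide's per-document size as the payload size, which is an assumption (padding added on disk would not necessarily travel over the wire); TransferEstimate is a hypothetical name.

```java
// Hypothetical helper (not from the talk): back-of-the-envelope payload size.
final class TransferEstimate {

    // Total bytes for 'documents' results of 'bytesPerDocument' each.
    static long totalBytes(int documents, int bytesPerDocument) {
        return (long) documents * bytesPerDocument;
    }

    public static void main(String[] args) {
        long fullScan = totalBytes(1000, 4080); // figures from the slide
        System.out.println("unrestricted find: " + fullScan + " bytes");
    }
}
```

That is 4,080,000 bytes moved through the network stack inside the timed region, none of which says anything about query parsing or execution.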

BasicDBObject predicate = new BasicDBObject();
predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20));
BasicDBObject projection = new BasicDBObject();
projection.put("_id", 1);

long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate, projection);
DBObject foundObj;
while (cur.hasNext()) {
    foundObj = cur.next();
}
long endTime = System.currentTimeMillis();

Solution: Limit the projection

Only project _id

Only 46k transferred through network stack

Return fixed range

General Principle #2: Measure only what you need to measure

Part Two: Performance measurement & diagnosis

Broad categories

• Micro Benchmarks

• Workloads

Micro benchmarks: mongo-perf

mongo-perf: goals

• Measure
  – commands

• Configure
  – Single mongod, ReplSet size (1 -> n), Sharding
  – Single vs. Multiple DB
  – O/S

• Characterize
  – Throughput (ops/second) by thread count

• Compare
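To make "throughput by thread count" concrete, here is a minimal sketch of that kind of measurement. It is not mongo-perf and does not use its API: ThroughputByThreads and opsPerSecond are hypothetical names, and the operation is an in-memory counter increment standing in for a database command. Thread start-up time is included in the timed region, which a real harness would want to exclude.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not mongo-perf): ops/second for a given thread count.
final class ThroughputByThreads {

    // Runs 'op' opsPerThread times on each of 'threads' threads and
    // returns the overall rate in operations per second.
    static double opsPerSecond(int threads, int opsPerThread, Runnable op) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    for (int i = 0; i < opsPerThread; i++) {
                        op.run();
                    }
                    return null;
                });
            }
            long start = System.nanoTime();
            pool.invokeAll(tasks); // blocks until every task has finished
            long elapsed = Math.max(1L, System.nanoTime() - start);
            return (double) threads * opsPerThread / (elapsed / 1e9);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(); // stand-in for a database command
        for (int threads : new int[] {1, 2, 4, 8}) {
            double rate = opsPerSecond(threads, 200_000, counter::incrementAndGet);
            System.out.printf("%d threads: %.0f ops/sec%n", threads, rate);
        }
    }
}
```

Plotting the rate against the thread count for two builds gives the kind of comparison chart the slides describe.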

What do you get?

[Chart: throughput in ops/sec by thread count, annotated "Better"; measured improvement between rc0 and rc2]

tests.push( { name: "Commands.CountsIntIDRange",
              pre: function( collection ) {
                  collection.drop();
                  for ( var i = 0; i < 1000; i++ ) {
                      collection.insert( { _id : i } );
                  }
                  collection.getDB().getLastError();
              },
              ops: [
                  { op: "command",
                    ns : "testdb",
                    command : { count : "mycollection",
                                query : { _id : { "$gt" : 10, "$lt" : 100 } } } }
              ] } );

Benchmark source code

Code Change

Workloads

• "public" workloads
  – YCSB
  – Sysbench

• "real world" simulations
  – Inbox fan in/out
  – Message Stores
  – Content Management

Example: Bulk Load Performance, 16m Documents

[Chart, annotated "Better": 55% degradation, 2.6.0-rc1 vs 2.4.10]

Ouch… where's the tree in the woods?

• 2.4.10 -> 2.6.0
  – 4495 git commits

git-bisect

• Bisect between good/bad hashes

• git-bisect nominates a new githash
  – Build against githash
  – Re-run test
  – Confirm if this githash is good/bad

• Rinse and repeat
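The procedure above is a binary search over the commit history. The sketch below illustrates the idea only and is not how git-bisect is implemented: commits are modelled as indices from oldest to newest, the class Bisect and the predicate are hypothetical, and it assumes a single transition from good to bad. For 4495 commits that means on the order of log2(4495), roughly 12 to 13, build-and-test cycles rather than thousands.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch (not git's implementation): binary search for the first
// bad commit, with commits numbered from oldest (0) to newest.
final class Bisect {

    // 'good' is a commit known to be good, 'bad' one known to be bad (good < bad).
    // 'isBad' builds and tests a commit. Assumes one good-to-bad transition.
    static int firstBad(int good, int bad, IntPredicate isBad) {
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2; // the commit nominated for testing
            if (isBad.test(mid)) {
                bad = mid;
            } else {
                good = mid;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        int[] builds = {0};
        int culprit = firstBad(0, 4495, commit -> {
            builds[0]++;             // each test stands for one build + benchmark run
            return commit >= 3000;   // made-up regression point for the demo
        });
        System.out.println("first bad commit index: " + culprit
                + ", builds needed: " + builds[0]);
    }
}
```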

Code Change - Bad Githash

Code Change - Fix

Bulk Load Performance - Fix

[Chart, annotated "Better": 11% improvement, 2.6.1 vs 2.4.10]

The problem with measurement

• Observability
  – What can you observe on the system?

• Effect
  – What effects does the observation cause?

mtools

• MongoDB log file analysis
  – Filter logs for operations, events
  – Response time, lock durations
  – Plot

• https://github.com/rueckstiess/mtools

Code Change – Yielding Policy

Code Change

Response Times: Bulk Insert 2.6.0 vs 2.6.1

Ceiling similar, lower floor resulting in 40% improvement in throughput

Secondary effects of Yield policy change: Write lock time reduced

Order of magnitude reduction of write lock duration

> db.serverStatus()

Yes – will cause a read lock to be acquired

> db.serverStatus({recordStats:0})

No – lock is not acquired

> mongostat

Yes - until SERVER-14008 resolved, uses db.serverStatus()

Unexpected side effects of measurement?

CPU sampling

• Get an impression of
  – Call Graphs
  – CPU time spent on node and called nodes

> sudo apt-get install google-perftools

> sudo apt-get install libunwind7-dev

> scons --use-cpu-profiler mongod

Setup & building with google-profiler

> mongod --dbpath <…>

Note: Do not use --fork

> mongo

> use admin

> db.runCommand({_cpuProfilerStart: {profileFilename: 'foo.prof'}})

Execute some commands that you want to profile

> db.runCommand({_cpuProfilerStop: 1})

Start the profiling

Sample start vs. end of workload

Code change

Public Benchmarks – Not all forks are the same…

• YCSB
  – https://github.com/achille/YCSB

• sysbench-mongodb
  – https://github.com/mdcallag/sysbench-mongodb

andrew@mongodb.com

VP, Engineering

Andrew Erlichson

#MongoDBDays

Thank You