Mythbusting: Understanding How We Measure Performance at MongoDB

VP, Engineering, MongoDB

Andrew Erlichson

#MongoDBDays

Before we start…

• We are going to look at a lot of

– C++ kernel code

– Java benchmarks

– JavaScript tests

• And lots of charts

• And it's going to be awesome!

Goals of Benchmarking: Academia

– You have an idea

– It must be tested

– Others must be able to reproduce your work.

– Explanations that contribute to knowledge are key.

– Should have practical applications

Industry Benchmarketing

• Prove you are faster than the competition

• Emphasize your best attributes

• Repeatable*

• Explanations*

*OPTIONAL

Goals of Internal Benchmarking

• Always be improving

• Understand our bottlenecks

• Main customer is engineering

• Explanations are somewhat important.

Overview – Internal Benchmarking

• Some common traps

• Performance measurement & diagnosis

• What's next

Part One: Some Common Traps

#1 Time taken to Insert x Documents

long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
    // Build one batch of documents
    for (int i = 0; i < documentsPerInsert; i++) {
        id++;
        BasicDBObject doc = new BasicDBObject();
        doc.put("_id", id);
        doc.put("k", rand.nextInt(numMaxInserts) + 1);
        String cVal = "…";
        doc.put("c", cVal);
        String padVal = "…";
        doc.put("pad", padVal);
        aDocs[i] = doc;
    }
    // One batched insert per round
    coll.insert(aDocs);
    numInserts += documentsPerInsert;
    globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();




So that looks ok, right?

What else are you measuring?

• Object creation and GC management?

• Thread contention on nextInt()?

• Time to synthesize data?

• Thread contention on addAndGet()?

• Clock resolution?

// Pre-create the objects outside the timing loop
BasicDBObject[] aDocs = new BasicDBObject[documentsPerInsert];
for (int i = 0; i < documentsPerInsert; i++) {
    BasicDBObject doc = new BasicDBObject();
    String cVal = "…";
    doc.put("c", cVal);
    String padVal = "…";
    doc.put("pad", padVal);
    aDocs[i] = doc;
}

Solution: Pre-Create the objects

Pre-create non-varying data outside the timing loop

Alternative

• Pre-create the data in a file; load from file
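The file-based alternative could look roughly like the sketch below. It is only an illustration: the class name `PreparedData`, the payload format, and the temporary file are assumptions, not part of the original benchmark. The point is that both generating and loading the data happen before the clock starts.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: synthesize payloads once, store them, reload them before timing.
class PreparedData {
    // Write `count` synthetic payload strings to `file`, one per line.
    static void generate(Path file, int count, int payloadLength) {
        List<String> lines = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            StringBuilder sb = new StringBuilder(payloadLength);
            for (int j = 0; j < payloadLength; j++) {
                sb.append((char) ('a' + (i + j) % 26));
            }
            lines.add(sb.toString());
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read every payload into memory so no file I/O happens inside the timed section.
    static List<String> load(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience for this sketch: generate into a temp file, load it back, clean up.
    static List<String> generateAndLoad(int count, int payloadLength) {
        try {
            Path file = Files.createTempFile("bench-data", ".txt");
            try {
                generate(file, count, payloadLength);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> payloads = generateAndLoad(1000, 64);
        long start = System.nanoTime();
        // ... timed section: build documents from `payloads` and insert them ...
        long elapsedNanos = System.nanoTime() - start;
        System.out.println(payloads.size() + " payloads ready before timing; timed section took "
                + elapsedNanos + " ns");
    }
}
```

In a real setup the generation step would probably be a separate run, so that every benchmark execution loads the same file.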

// Use ThreadLocalRandom, or one java.util.Random instance per thread
java.util.concurrent.ThreadLocalRandom rand = java.util.concurrent.ThreadLocalRandom.current();
for (long roundNum = 0; roundNum < numRounds; roundNum++) {
    for (int i = 0; i < documentsPerInsert; i++) {
        id++;
        BasicDBObject doc = aDocs[i];
        doc.put("_id", id);
        doc.put("k", rand.nextInt(numMaxInserts) + 1);
    }
    coll.insert(aDocs);
}
// Maintain the count outside the loop: one update to the shared counter
globalInserts.addAndGet(documentsPerInsert * numRounds);

Solution: Remove contention

Remove contention on nextInt() by making the generator thread-local


Remove contention on addAndGet()
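Both contention fixes can be tried without a database. The sketch below is a stand-in, not the original benchmark: the class `LowContentionWorkers`, its method names, and the checksum that replaces `doc.put("k", …)` are assumptions made for illustration. Each worker draws random keys from its own `ThreadLocalRandom` and touches the shared `AtomicLong` once, after its loop.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the insert workers, with the MongoDB calls left out.
class LowContentionWorkers {
    // One worker's loop. Returns a checksum so the random draws are not dead code.
    static long runWorker(AtomicLong globalInserts, int numRounds,
                          int documentsPerInsert, int numMaxInserts) {
        ThreadLocalRandom rand = ThreadLocalRandom.current(); // generator owned by this thread
        long localInserts = 0;
        long checksum = 0;
        for (int round = 0; round < numRounds; round++) {
            for (int i = 0; i < documentsPerInsert; i++) {
                int k = rand.nextInt(numMaxInserts) + 1; // value in [1, numMaxInserts]
                checksum += k;                           // stands in for doc.put("k", k)
            }
            // coll.insert(aDocs) would go here in the real benchmark
            localInserts += documentsPerInsert;          // plain local variable, no contention
        }
        globalInserts.addAndGet(localInserts);           // single shared update per thread
        return checksum;
    }

    // Start `threads` workers, wait for them, and return the shared counter's final value.
    static long runThreads(int threads, int numRounds, int documentsPerInsert) {
        AtomicLong globalInserts = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(
                    () -> runWorker(globalInserts, numRounds, documentsPerInsert, 1000));
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return globalInserts.get();
    }

    public static void main(String[] args) {
        System.out.println("inserts counted: " + runThreads(4, 100, 1000));
    }
}
```

The shared counter still ends up with the correct total; it is simply updated once per thread instead of once per batch.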



…


long startTime = System.nanoTime();
…
// nanoTime() values are only meaningful as differences
long elapsedNanos = System.nanoTime() - startTime;

Solution: Timer resolution

System.nanoTime(): "resolution is at least as good as that of currentTimeMillis()"

System.currentTimeMillis(): "granularity of the value depends on the underlying operating system and may be larger"

Source

• http://docs.oracle.com/javase/7/docs/api/java/lang/System.html
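One way to see what these caveats mean on a given machine is to spin until each clock changes and record the smallest step observed. This is a rough probe rather than a rigorous measurement, and the class and method names are made up for this sketch.

```java
// Hypothetical probe: smallest observable tick of each clock on this machine.
class ClockGranularity {
    // Smallest positive step between successive distinct currentTimeMillis() values (ms).
    static long smallestMillisStep(int samples) {
        long smallest = Long.MAX_VALUE;
        long last = System.currentTimeMillis();
        for (int i = 0; i < samples; i++) {
            long now = System.currentTimeMillis();
            while (now == last) {
                now = System.currentTimeMillis(); // spin until the clock ticks
            }
            if (now > last) { // wall-clock time may be adjusted backwards; skip those steps
                smallest = Math.min(smallest, now - last);
            }
            last = now;
        }
        return smallest;
    }

    // Smallest positive step between successive distinct nanoTime() values (ns).
    static long smallestNanoStep(int samples) {
        long smallest = Long.MAX_VALUE;
        long last = System.nanoTime();
        for (int i = 0; i < samples; i++) {
            long now = System.nanoTime();
            while (now == last) {
                now = System.nanoTime();
            }
            long step = now - last; // compare nanoTime() values by difference only
            if (step > 0) {
                smallest = Math.min(smallest, step);
            }
            last = now;
        }
        return smallest;
    }

    public static void main(String[] args) {
        System.out.println("currentTimeMillis step: " + smallestMillisStep(20) + " ms");
        System.out.println("nanoTime step:          " + smallestNanoStep(100_000) + " ns");
    }
}
```

The reported nanoTime step also includes the cost of the call itself, so it is an upper bound on the granularity, not an exact figure.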

General Principle #1: Know what you are measuring


#2 Response time to return all results

BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i = 0; i < 1000; i++) {
    doc.put("_id", i);
    coll.insert(doc);
}

// An empty predicate matches every document in the collection
BasicDBObject predicate = new BasicDBObject();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
    foundObj = cur.next();
}



So that looks ok, right?

What else are you measuring?

• Each doc is 4080 bytes on disk with powerOf2Sizes

• Unrestricted predicate?

Measuring

• Time to parse & execute query

• Time to retrieve all documents

But also

• Cost of shipping ~4MB data through network stack


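BasicDBObject predicate = new BasicDBObject();
// Restrict the query to a fixed _id range
predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20));
// Return only the _id field
BasicDBObject projection = new BasicDBObject();
projection.put("_id", 1);
DBCursor cur = coll.find(predicate, projection);
DBObject foundObj;
while (cur.hasNext()) {
    foundObj = cur.next();
}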


Solution: Limit the projection

• Return fixed range

• Only project _id

• Only 46k transferred through network stack

General Principle #2: Measure only what you need to measure

Part Two: Performance measurement & diagnosis

Broad categories

• Micro Benchmarks

• Workloads

Micro benchmarks: mongo-perf

mongo-perf: goals

• Measure

– commands

• Configure

– Single mongod, ReplSet size (1 -> n), Sharding

– Single vs. Multiple DB

– O/S

• Characterize

– Throughput (ops/second) by thread count

• Compare
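To make "throughput by thread count" concrete, here is a minimal harness sketch. It is not how mongo-perf is implemented; the class `ThroughputByThreads`, the `measure` method, and the placeholder operation are assumptions for illustration. Every thread runs the same operation until a deadline, and the total is divided by the elapsed wall-clock time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical harness: ops/second for a given operation at a given thread count.
class ThroughputByThreads {
    static double measure(int threads, long durationMillis, Runnable op) {
        CountDownLatch startGate = new CountDownLatch(1);
        long durationNanos = durationMillis * 1_000_000L;
        long[] counts = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            workers[t] = new Thread(() -> {
                try {
                    startGate.await(); // all workers start together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long deadline = System.nanoTime() + durationNanos;
                long done = 0;
                while (System.nanoTime() - deadline < 0) {
                    op.run();
                    done++;
                }
                counts[slot] = done; // each thread writes only its own slot
            });
            workers[t].start();
        }
        long start = System.nanoTime();
        startGate.countDown();
        for (Thread w : workers) {
            try {
                w.join(); // join() makes each worker's count visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        long elapsedNanos = System.nanoTime() - start;
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return total / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        // Placeholder operation; a real test would issue a database command here.
        Runnable op = () -> ThreadLocalRandom.current().nextInt();
        for (int threads : new int[] {1, 2, 4, 8}) {
            System.out.printf("%d thread(s): %.0f ops/sec%n", threads, measure(threads, 200, op));
        }
    }
}
```

Note that the harness calls `System.nanoTime()` once per operation, so for very cheap operations the clock itself becomes part of what is measured, which is exactly the trap from Part One.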

What do you get?

[Chart: Ops/Sec by Thread Count. Measured improvement between rc0 and rc2.]

Benchmark source code

tests.push( {
    name: "Commands.CountsIntIDRange",
    pre: function( collection ) {
        collection.drop();
        for ( var i = 0; i < 1000; i++ ) {
            collection.insert( { _id : i } );
        }
        collection.getDB().getLastError();
    },
    ops: [
        { op: "command",
          ns : "testdb",
          command : { count : "mycollection",
                      query : { _id : { "$gt" : 10, "$lt" : 100 } } } }
    ]
} );

Code Change

Workloads

• "public" workloads

– YCSB

– Sysbench

• "real world" simulations

– Inbox fan in/out

– Message Stores

– Content Management

Example: Bulk Load Performance, 16m Documents

[Chart: 55% degradation, 2.6.0-rc1 vs 2.4.10.]

Ouch… where's the tree in the woods?

• 2.4.10 -> 2.6.0

– 4495 git commits

git-bisect

• Bisect between good/bad hashes

• git-bisect nominates a new githash

– Build against githash

– Re-run test

– Confirm if this githash is good/bad

• Rinse and repeat
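The reason bisecting is practical across 4495 commits is that each build-and-test round halves the remaining range. The sketch below shows that idea as a plain binary search over a linear history. It is a simplification (real git histories branch and merge), and the class name, the `isBad` predicate, and the planted regression at index 3000 are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

// Simplified model of bisecting: commits are numbered 0..n in history order.
class BisectSketch {
    // `good` is a known-good index and `bad` a known-bad index, with good < bad.
    // Assumes every commit from the first bad one onwards is also bad.
    // Returns the index of the first bad commit.
    static int firstBad(int good, int bad, IntPredicate isBad) {
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2; // the commit "nominated" for testing
            if (isBad.test(mid)) {             // build it, re-run the benchmark, judge
                bad = mid;
            } else {
                good = mid;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        AtomicInteger testRuns = new AtomicInteger();
        int plantedRegression = 3000; // hypothetical position of the offending commit
        int found = firstBad(0, 4495, commit -> {
            testRuns.incrementAndGet();
            return commit >= plantedRegression;
        });
        System.out.println("first bad commit index: " + found
                + ", benchmark runs needed: " + testRuns.get());
    }
}
```

Since 2^13 = 8192 is the first power of two above 4495, a range of that size needs at most 13 build-and-test rounds.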

Code Change - Bad Githash

Code Change - Fix

Bulk Load Performance - Fix

[Chart: 11% improvement, 2.6.1 vs 2.4.10.]

The problem with measurement

• Observability

– What can you observe on the system?

• Effect

– What effects does the observation cause?

mtools

• MongoDB log file analysis

– Filter logs for operations, events

– Response time, lock durations

– Plot

• https://github.com/rueckstiess/mtools

Code Change – Yielding Policy

Code Change

Response Times: Bulk Insert 2.6.0 vs 2.6.1

Ceiling similar, lower floor resulting in 40% improvement in throughput

Secondary effects of Yield policy change: Write lock time reduced

Order of magnitude reduction of write lock duration

Unexpected side effects of measurement?

> db.serverStatus()

Yes – will cause a read lock to be acquired

> db.serverStatus({recordStats:0})

No – lock is not acquired

> mongostat

Yes – until SERVER-14008 is resolved, uses db.serverStatus()

CPU sampling

• Get an impression of

– Call Graphs

– CPU time spent on node and called nodes

Setup & building with google-profiler

> sudo apt-get install google-perftools

> sudo apt-get install libunwind7-dev

> scons --use-cpu-profiler mongod

Start the profiling

> mongod --dbpath <…>

Note: Do not use --fork

> mongo

> use admin

> db.runCommand({_cpuProfilerStart: {profileFilename: 'foo.prof'}})

Execute some commands that you want to profile

> db.runCommand({_cpuProfilerStop: 1})

Sample start vs. end of workload

Code change

Public Benchmarks – Not all forks are the same…

• YCSB

– https://github.com/achille/YCSB

• sysbench-mongodb

– https://github.com/mdcallag/sysbench-mongodb

Thank You
