Mythbusting: Understanding How We Measure Performance at MongoDB
VP, Engineering, MongoDB
Andrew Erlichson
#MongoDBDays
Mythbusting: Understanding How We Measure the Performance of MongoDB
Before we start…
• We are going to look a lot at
– C++ kernel code
– Java benchmarks
– JavaScript tests
• And lots of charts
• And it's going to be awesome!
Goals of Benchmarking
Academia
– You have an idea
– It must be tested
– Others must be able to reproduce your work
– Explanations that contribute to knowledge are key
– Should have practical applications
Industry Benchmarketing
• Prove you are faster than the competition
• Emphasize your best attributes
• Repeatable*
• Explanations*
*OPTIONAL
Goals of Internal Benchmarking
• Always be improving
• Understand our bottlenecks
• Main customer is engineering
• Explanations are somewhat important.
Overview – Internal Benchmarking
• Some common traps
• Performance measurement & diagnosis
• What's next
Part One: Some Common Traps
#1 Time taken to Insert x Documents
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
    // Build one batch of documents
    for (int i = 0; i < documentsPerInsert; i++) {
        id++;
        BasicDBObject doc = new BasicDBObject();
        doc.put("_id", id);
        doc.put("k", rand.nextInt(numMaxInserts) + 1);
        String cVal = "…";
        doc.put("c", cVal);
        String padVal = "…";
        doc.put("pad", padVal);
        aDocs[i] = doc;
    }
    // Insert the batch and update the counters
    coll.insert(aDocs);
    numInserts += documentsPerInsert;
    globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
So that looks ok, right?
What else are you measuring?
• Object creation and GC management?
• Thread contention on nextInt()?
• Time to synthesize data?
• Thread contention on addAndGet()?
• Clock resolution?
Solution: Pre-create the objects
Pre-create non-varying data outside the timing loop
// Pre-create the objects outside the timing loop
BasicDBObject[] aDocs = new BasicDBObject[documentsPerInsert];
for (int i = 0; i < documentsPerInsert; i++) {
    BasicDBObject doc = new BasicDBObject();
    String cVal = "…";
    doc.put("c", cVal);
    String padVal = "…";
    doc.put("pad", padVal);
    aDocs[i] = doc;
}
Alternative
• Pre-create the data in a file; load from file
Solution: Remove contention
• Remove contention on nextInt() by making the generator thread-local
• Remove contention on addAndGet()
// Use ThreadLocalRandom, or one instance of java.util.Random per thread
// (requires: import java.util.concurrent.ThreadLocalRandom;)
ThreadLocalRandom rand = ThreadLocalRandom.current();
for (long roundNum = 0; roundNum < numRounds; roundNum++) {
    for (int i = 0; i < documentsPerInsert; i++) {
        id++;
        BasicDBObject doc = aDocs[i];   // reuse the pre-created document
        doc.put("_id", id);
        doc.put("k", rand.nextInt(numMaxInserts) + 1);
    }
    coll.insert(aDocs);
    numInserts += documentsPerInsert;
}
// Maintain the shared count outside the loop: one update instead of one per round
globalInserts.addAndGet(documentsPerInsert * numRounds);
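The same idea can be tried without a database. The following is a minimal sketch, not the benchmark from the talk: the class name ContentionSketch and its run method are hypothetical, and the insert itself is left out so that only the two contention fixes remain. Each worker uses its own ThreadLocalRandom and a local counter, then publishes to the shared AtomicLong once.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: per-thread random generator and one shared-counter
// update per thread, instead of one contended call per document or per round.
final class ContentionSketch {

    /** Starts `threads` workers and returns the total count they published. */
    static long run(int threads, int numRounds, int documentsPerInsert, int numMaxInserts) {
        AtomicLong globalInserts = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // current() returns the generator owned by the calling thread
                ThreadLocalRandom rand = ThreadLocalRandom.current();
                long numInserts = 0; // thread-local counter, no sharing
                for (int round = 0; round < numRounds; round++) {
                    for (int i = 0; i < documentsPerInsert; i++) {
                        int k = rand.nextInt(numMaxInserts) + 1; // would become field "k"
                    }
                    // A real benchmark would send the batch to the server here.
                    numInserts += documentsPerInsert;
                }
                globalInserts.addAndGet(numInserts); // single shared update per thread
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return globalInserts.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100, 10, 1000)); // prints 4000
    }
}
```

With 4 threads, 100 rounds and 10 documents per round, the shared counter is touched 4 times rather than 400.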
Solution: Timer resolution
Before:
long startTime = System.currentTimeMillis();
…
long endTime = System.currentTimeMillis();
After:
long startTime = System.nanoTime();
…
long elapsedNanos = System.nanoTime() - startTime;
• System.nanoTime(): "resolution is at least as good as that of currentTimeMillis()"
• System.currentTimeMillis(): "granularity of the value depends on the underlying operating system and may be larger"
Source
• http://docs.oracle.com/javase/7/docs/api/java/lang/System.html
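To see what the quoted documentation means on a particular machine, the observable step of each clock can be measured directly. This is a rough sketch under stated assumptions: the class and method names are hypothetical, the results depend on the OS and JVM, and the wall clock is assumed not to be adjusted while the probe runs.

```java
// Hypothetical probe: how far does each clock jump between two distinct readings?
final class ClockGranularitySketch {

    /** One observed step of System.currentTimeMillis(), expressed in nanoseconds. */
    static long millisStepNanos() {
        long previous = System.currentTimeMillis();
        long first;
        do {
            first = System.currentTimeMillis();   // spin until the clock ticks once
        } while (first == previous);
        long second;
        do {
            second = System.currentTimeMillis();  // spin until the next tick
        } while (second == first);
        return (second - first) * 1_000_000L;
    }

    /** Smallest step seen between two distinct System.nanoTime() readings. */
    static long nanoStepNanos() {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 1000; i++) {
            long a = System.nanoTime();
            long b;
            do {
                b = System.nanoTime();
            } while (b == a);
            best = Math.min(best, b - a);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("currentTimeMillis step (ns): " + millisStepNanos());
        System.out.println("nanoTime step (ns):          " + nanoStepNanos());
    }
}
```

If the operation being timed takes less than a few multiples of the first number, currentTimeMillis() cannot resolve it.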
General Principle #1: Know what you are measuring
#2 Response time to return all results
// Setup: insert 1000 documents
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i = 0; i < 1000; i++) {
    doc.put("_id", i);
    coll.insert(doc);
}
// Timed region: run the query and iterate the full result set
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
    foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
So that looks ok, right?
What else are you measuring?
• Each doc is 4080 bytes on disk with powerOf2Sizes
• Unrestricted predicate?
Measuring
• Time to parse & execute query
• Time to retrieve all documents
But also
• Cost of shipping ~4MB data through network stack
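The ~4MB figure follows from the numbers already on the slide. As a back-of-envelope sketch, assuming every one of the 1000 documents is counted at its 4080-byte record size (the class and method names are hypothetical, and the actual wire size may differ):

```java
// Hypothetical helper: estimate of data moved by an unrestricted find().
final class TransferSizeSketch {

    /** Documents returned multiplied by the assumed size of each document. */
    static long bytesShipped(long documents, long bytesPerDocument) {
        return documents * bytesPerDocument;
    }

    public static void main(String[] args) {
        // 1000 documents at 4080 bytes each
        System.out.println(bytesShipped(1000, 4080)); // prints 4080000
    }
}
```

So the timed region pays for roughly 4 MB of transfer in addition to the query itself.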
Solution: Limit the projection
• Return fixed range
• Only project _id
• Only 46k transferred through network stack
// Restrict the query to a fixed _id range
BasicDBObject predicate = new BasicDBObject();
predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20));
// Return only the _id field
BasicDBObject projection = new BasicDBObject();
projection.put("_id", 1);
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate, projection);
DBObject foundObj;
while (cur.hasNext()) {
    foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
General Principle #2: Measure only what you need to measure
Part Two: Performance measurement & diagnosis
Broad categories
• Micro Benchmarks
• Workloads
Micro benchmarks: mongo-perf
mongo-perf: goals
• Measure
– commands
• Configure
– Single mongod, ReplSet size (1 -> n), Sharding
– Single vs. Multiple DB
– O/S
• Characterize
– Throughput (ops/second) by thread count
• Compare
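"Throughput by thread count" can be illustrated with a small harness. This is only a sketch of the measurement idea, not how mongo-perf is implemented: the class ThroughputSketch, its opsPerSecond method and the placeholder operation are all hypothetical, and a real run would replace the Runnable with a database command.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness: run one operation on N threads for a fixed time
// and report completed operations per second.
final class ThroughputSketch {

    static double opsPerSecond(int threads, long durationMillis, Runnable op) {
        AtomicLong total = new AtomicLong();
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        long deadline = start + durationMillis * 1_000_000L;
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long local = 0;              // counted locally to avoid contention
                do {
                    op.run();
                    local++;
                } while (System.nanoTime() - deadline < 0);
                total.addAndGet(local);      // published once per thread
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        return total.get() / elapsedSeconds;
    }

    public static void main(String[] args) {
        Runnable op = () -> Math.sqrt(System.nanoTime()); // placeholder operation
        for (int threads : new int[] {1, 2, 4, 8}) {
            System.out.printf("%d threads: %.0f ops/sec%n",
                    threads, opsPerSecond(threads, 200, op));
        }
    }
}
```

Note that the harness reads the clock after every operation, so for very cheap operations it partly measures System.nanoTime() itself, which is the trap described in Part One.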
What do you get?
[Chart: Ops/Sec by Thread Count; measured improvement between rc0 and rc2]
Benchmark source code
tests.push( {
    name: "Commands.CountsIntIDRange",
    pre: function( collection ) {
        collection.drop();
        for ( var i = 0; i < 1000; i++ ) {
            collection.insert( { _id : i } );
        }
        collection.getDB().getLastError();
    },
    ops: [
        { op: "command",
          ns : "testdb",
          command : { count : "mycollection",
                      query : { _id : { "$gt" : 10, "$lt" : 100 } } } }
    ]
} );
Code Change
Workloads
• "public" workloads
– YCSB
– Sysbench
• "real world" simulations
– Inbox fan in/out
– Message Stores
– Content Management
Example: Bulk Load Performance, 16m Documents
[Chart: 55% degradation, 2.6.0-rc1 vs 2.4.10]
Ouch… where's the tree in the woods?
• 2.4.10 -> 2.6.0
– 4495 git commits
git-bisect
• Bisect between good/bad hashes
• git-bisect nominates a new githash
– Build against githash
– Re-run test
– Confirm if this githash is good/bad
• Rinse and repeat
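The loop above is a binary search over the commit range. The sketch below shows why 4495 commits are tractable, under a simplifying assumption that the history is linear and numbered 0 (known good) to 4495 (known bad); real git history is a graph and git-bisect handles that itself. The class and method names are hypothetical, and the predicate stands in for "build against githash, re-run test".

```java
import java.util.function.IntPredicate;

// Hypothetical model of bisecting a linear history of numbered commits.
final class BisectSketch {

    /**
     * Returns the first bad commit, given that `good` is known good,
     * `bad` is known bad, and every commit after the first bad one is also bad.
     */
    static int firstBad(int good, int bad, IntPredicate isBad) {
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;   // the commit nominated for testing
            if (isBad.test(mid)) {
                bad = mid;
            } else {
                good = mid;
            }
        }
        return bad;
    }

    /** Worst-case number of build-and-test rounds for a range of n commits (n >= 2). */
    static int worstCaseSteps(int n) {
        return 32 - Integer.numberOfLeadingZeros(n - 1); // ceil(log2(n))
    }

    public static void main(String[] args) {
        // Pretend the regression was introduced at commit 3000.
        System.out.println(firstBad(0, 4495, commit -> commit >= 3000)); // prints 3000
        System.out.println(worstCaseSteps(4495));                         // prints 13
    }
}
```

Under this model, at most 13 build-and-test rounds are needed instead of 4495.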
Code Change - Bad Githash
Code Change - Fix
Bulk Load Performance - Fix
[Chart: 11% improvement, 2.6.1 vs 2.4.10]
The problem with measurement
• Observability
– What can you observe on the system?
• Effect
– What effects does the observation cause?
mtools
• MongoDB log file analysis
– Filter logs for operations, events
– Response time, lock durations
– Plot
• https://github.com/rueckstiess/mtools
Code Change – Yielding Policy
Code Change
Response Times: Bulk Insert 2.6.0 vs 2.6.1
Ceiling similar, lower floor, resulting in a 40% improvement in throughput
Secondary effects of yield policy change: write lock time reduced
Order-of-magnitude reduction in write lock duration
Unexpected side effects of measurement?
> db.serverStatus()
Yes – will cause a read lock to be acquired
> db.serverStatus({recordStats:0})
No – lock is not acquired
> mongostat
Yes – until SERVER-14008 is resolved, uses db.serverStatus()
CPU sampling
• Get an impression of
– Call Graphs
– CPU time spent on node and called nodes
Setup & building with google-profiler
> sudo apt-get install google-perftools
> sudo apt-get install libunwind7-dev
> scons --use-cpu-profiler mongod
Start the profiling
> mongod --dbpath <…>
Note: Do not use --fork
> mongo
> use admin
> db.runCommand({_cpuProfilerStart: {profileFilename: 'foo.prof'}})
Execute some commands that you want to profile
> db.runCommand({_cpuProfilerStop: 1})
Sample start vs. end of workload
Code change
Public Benchmarks – Not all forks are the same…
• YCSB
– https://github.com/achille/YCSB
• sysbench-mongodb
– https://github.com/mdcallag/sysbench-mongodb