Debugging Ruby (Aman Gupta)
Uploaded by mongosf
Debugging Ruby with MongoDB
Aman Gupta @tmm1
debugging ruby?
• i use ruby
• my ruby processes use a lot of ram
• i want to fix this
let’s build a debugger
• step 1: collect data
• list of all ruby objects in memory
• step 2: analyze data
• group by type
• group by file/line
version 1: collect data
• simple patch to ruby VM (300 lines of C)
• http://gist.github.com/73674
• simple text-based output format
0x154750 @ -e:1 is OBJECT of type: T
0x15476c @ -e:1 is HASH which has data
0x154788 @ -e:1 is ARRAY of len: 0
0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
0x1547dc @ -e:1 is STRING len: 1 and val: T
0x154814 @ -e:1 is CLASS named: T inherits from Object
0x154a98 @ -e:1 is STRING len: 2 and val: hi
0x154b40 @ -e:1 is OBJECT of type: Range
version 1: analyze data
$ wc -l /tmp/ruby.heap
1571529 /tmp/ruby.heap
$ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1
236840 memcached/memcached.rb:316
$ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5
10948 ARRAY
20355 OBJECT
30744 DATA
64952 HASH
123290 STRING
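The two questions the shell pipeline answers (which file:line allocated the most objects, and what types were created there) can be sketched in plain Ruby. This is a minimal illustration assuming the one-line-per-object text format above, where the third whitespace-separated field is `file:line` and the fifth is the type; `top_counts` is a hypothetical helper, not part of the VM patch.

```ruby
# Count how often each value of a whitespace-separated field occurs,
# mirroring `awk '{ print $N }' | sort | uniq -c | sort -g | tail -K`.
# Field numbers are 1-based, as in awk. `top_counts` is hypothetical.
def top_counts(lines, field, k)
  counts = Hash.new(0)
  lines.each do |line|
    value = line.split[field - 1]
    counts[value] += 1 if value
  end
  # Highest counts first; ties come out in no particular order.
  counts.sort_by { |_value, count| -count }.first(k)
end

heap = [
  "0x154750 @ -e:1 is OBJECT of type: T",
  "0x15476c @ -e:1 is HASH which has data",
  "0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi",
  "0x154a98 @ -e:1 is STRING len: 2 and val: hi",
  "0x154b40 @ -e:1 is OBJECT of type: Range"
]

# Most common allocation site (awk's $3), then the types created there ($5).
# Unlike grep, this matches the file:line field exactly, not as a substring.
site = top_counts(heap, 3, 1).first.first
types_at_site = top_counts(heap.select { |l| l.split[2] == site }, 5, 5)
```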
version 1
• it works!
• but...
• must patch and rebuild ruby binary
• no information about references between objects
• limited analysis via shell scripting
version 2 goals
• better data format
• simple: one line of text per object
• expressive: include all details about object contents and references
• easy to use: easy to generate from C code & easy to consume from various scripting languages
version 2 goals
JSON!
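The "one line of text per object" goal is easy to illustrate with Ruby's standard `json` library. A minimal sketch that writes one JSON document per line and streams it back; the records are illustrative, and `dump_lines`/`load_lines` are hypothetical helpers (memprof itself is a C extension that uses libyajl).

```ruby
require "json"
require "stringio"

# Write one JSON document per line: trivial to generate, and any
# language with a JSON parser can stream it back line by line.
# `dump_lines` and `load_lines` are hypothetical helpers.
def dump_lines(objects, io)
  objects.each { |obj| io.puts(JSON.generate(obj)) }
end

def load_lines(io)
  io.each_line.map { |line| JSON.parse(line) }
end

objects = [
  { "_id" => "0x19c610", "type" => "string", "length" => 10, "data" => "helloworld" },
  { "_id" => "0x19c5c0", "class_name" => "Array", "length" => 4, "data" => [1, ":b", "0x19c750", "0x19c598"] }
]

buffer = StringIO.new
dump_lines(objects, buffer)
buffer.rewind
loaded = load_lines(buffer)
```

Because each record is self-contained on its own line, a dump can be processed incrementally or split across workers without parsing one giant document.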
version 2 is memprof
• no patches to ruby necessary
• gem install memprof
• require 'memprof'
• Memprof.dump_all("/tmp/app.json")
• C extension for MRI ruby VM
• http://github.com/ice799/memprof
• uses libyajl to dump out all ruby objects as json
strings
Memprof.dump{ "hello" + "world" }
{ "_id": "0x19c610",
  "file": "-e", "line": 1,
  "type": "string", "class": "0x1ba7f0", "class_name": "String",
  "length": 10, "data": "helloworld" }
• _id: memory address of object
• file, line: file and line where string was created
• class: address of the class object "String"
• length, data: length and contents of this string instance
arrays
Memprof.dump{ [ 1, :b, 2.2, "d" ] }
{ "_id": "0x19c5c0",
  "class": "0x1b0d18", "class_name": "Array",
  "length": 4, "data": [ 1, ":b",
  "0x19c750", "0x19c598" ] }
• integers and symbols are stored in the array itself
• floats and strings are separate ruby objects
hashes
Memprof.dump{ { :a => 1, "b" => 2.2 } }
{ "_id": "0x19c598",
  "type": "hash", "class": "0x1af170", "class_name": "Hash",
  "default": null,
  "length": 2, "data": [ [ ":a", 1 ], [ "0xc728", "0xc750" ] ] }
• hash entries as key/value pairs
• no default proc
classes
Memprof.dump{
  class Hello
    @@var=1
    Const=2
    def world() end
  end
}
{ "_id": "0x19c408",
  "type": "class", "name": "Hello", "super": "0x1bfa48", "super_name": "Object",
  "ivars": { "@@var": 1, "Const": 2 }, "methods": { "world": "0x19c318" } }
• superclass object reference
• class variables and constants are stored in the instance variable table
• references to method objects
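Because every reference in these dumps is written as a hex address string, an object's outgoing references can be collected mechanically. The sketch below assumes that any string matching `0x` plus hex digits in the `class`, `super`, `ivars`, `methods` or `data` fields is an object address, and skips `data` for strings since there it holds the string's contents; `outgoing_refs` is a hypothetical helper, not part of memprof.

```ruby
require "json"

ADDRESS = /\A0x\h+\z/

# Recursively collect address strings from arrays, hash values and scalars.
def addresses_in(value)
  case value
  when String then value.match?(ADDRESS) ? [value] : []
  when Array  then value.flat_map { |v| addresses_in(v) }
  when Hash   then value.values.flat_map { |v| addresses_in(v) }
  else []
  end
end

# Outgoing references of one dumped object (hypothetical helper).
# Assumption: hex strings in these fields are object addresses.
def outgoing_refs(obj)
  fields = %w[class super ivars methods]
  fields << "data" unless obj["type"] == "string" # string data is contents
  fields.flat_map { |f| addresses_in(obj[f]) }.uniq
end

hash_dump = JSON.parse('{"_id":"0x19c598","type":"hash","class":"0x1af170",' \
                       '"length":2,"data":[[":a",1],["0xc728","0xc750"]]}')
class_dump = { "_id" => "0x19c408", "type" => "class", "name" => "Hello",
               "super" => "0x1bfa48", "ivars" => { "@@var" => 1, "Const" => 2 },
               "methods" => { "world" => "0x19c318" } }

refs = outgoing_refs(hash_dump) # class address plus the key/value addresses
```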
version 2: analyze data
version 2: memprof.com
a web-based heap visualizer and leak analyzer
built on...
$ mongoimport -d memprof -c rails --file /tmp/app.json
$ mongo memprof
let’s run some queries.
how many objects?
> db.rails.count()
809816
• ruby scripts create a lot of objects
• usually not a problem, but...
• MRI has a naïve stop-the-world mark/sweep GC
• fewer objects = faster GC = better performance
what types of objects?
> db.rails.distinct('type')
['array', 'bignum', 'class', 'float', 'hash', 'module', 'node', 'object', 'regexp', 'string', ...]
mongodb: distinct
• distinct('type'): list of types of objects
• distinct('file'): list of source files
• distinct('class_name'): list of instance class names
• optionally filter first
• distinct('name', {type:"class"}): names of all defined classes
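As a rough mental model of what `distinct` returns, here is an in-memory Ruby equivalent over parsed dump objects. It only supports simple equality filters like `{type:"class"}` (MongoDB supports the full query language), and `distinct_values` is a hypothetical name.

```ruby
# In-memory analogue of db.coll.distinct(key, query) for equality-only
# queries. Documents missing the key are skipped, as in MongoDB.
# `distinct_values` is a hypothetical helper for illustration.
def distinct_values(docs, key, query = {})
  docs
    .select { |doc| query.all? { |field, value| doc[field] == value } }
    .map { |doc| doc[key] }
    .compact
    .uniq
end

docs = [
  { "type" => "class",  "name" => "Hello" },
  { "type" => "string", "class_name" => "String", "file" => "app.rb" },
  { "type" => "class",  "name" => "World" },
  { "type" => "string", "class_name" => "String", "file" => "lib.rb" }
]

distinct_values(docs, "type")                        # ["class", "string"]
distinct_values(docs, "name", { "type" => "class" }) # ["Hello", "World"]
```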
improve performance with indexes
> db.rails.ensureIndex({'type':1})
> db.rails.ensureIndex({'file':1}, {background:true})
mongodb: ensureIndex
• add an index on a field (if it doesn't exist yet)
• improve performance of queries against common fields: type, class_name, super, file
• can index embedded field names
• ensureIndex({'methods.add':1})
• find({'methods.add':{$exists:true}}): find classes that define the method add
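The `'methods.add'` queries work because MongoDB resolves dotted field names into embedded documents. A simplified Ruby sketch of that lookup and of `$exists`, assuming only nested hashes (MongoDB also traverses arrays, which this ignores); `dig_path` and `defines_method?` are hypothetical names.

```ruby
# Resolve a dotted path like "methods.add" through nested hashes.
# Returns [found, value] so that a present-but-nil value still counts
# as existing, matching the meaning of {$exists: true}.
def dig_path(doc, path)
  path.split(".").reduce([true, doc]) do |(found, current), key|
    if found && current.is_a?(Hash) && current.key?(key)
      [true, current[key]]
    else
      [false, nil]
    end
  end
end

# Analogue of find({'methods.add': {$exists: true}}) on one class document.
def defines_method?(klass_doc, name)
  dig_path(klass_doc, "methods.#{name}").first
end

hello = { "type" => "class", "name" => "Hello", "methods" => { "world" => "0x19c318" } }
defines_method?(hello, "world") # => true
defines_method?(hello, "add")   # => false
```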
how many objs per type?
> db.rails.group({
    initial: {count:0},
    key: {type:true},
    cond: {},
    reduce: function(obj, out) { out.count++ }
  }).sort(function(a,b){ return a.count - b.count })
• key: group on type
• reduce: increment count for each obj
• sort: sort results
[ ..., {type: 'array', count: 7621}, {type: 'string', count: 69139}, {type: 'node', count: 365285} ]
lots of nodes
• nodes represent ruby code
• stored like any other ruby object
• makes ruby completely dynamic
mongodb: group
• cond: query to filter objects before grouping
• key: field(s) to group on
• initial: initial values for each group's results
• reduce: aggregation function
mongodb: group
• by type or class
  key: {type:1}
  key: {class_name:1}
• by file & line
  key: {file:1, line:1}
• by type in a specific file
  cond: {file:"app.rb"}, key: {type:1}
• by length of strings in a specific file
  cond: {file:"app.rb", type:"string"}, key: {length:1}
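The four `group` parameters amount to a filter, a grouping key, a per-group accumulator and a fold. An in-memory Ruby sketch of those semantics, assuming equality-only `cond` filters; `group_docs` is a hypothetical helper, whereas MongoDB's `group` command (deprecated in later MongoDB releases in favor of the aggregation framework) runs the reduce function server-side in JavaScript.

```ruby
# In-memory analogue of db.coll.group({cond, key, initial, reduce}).
# cond:    equality filter applied before grouping
# key:     list of fields whose values identify a group
# initial: starting values merged (shallow copy) into each new group
# reduce:  block called as reduce(doc, out) to fold each doc into its group
def group_docs(docs, key:, cond: {}, initial: {}, &reduce)
  groups = {}
  docs.each do |doc|
    next unless cond.all? { |field, value| doc[field] == value }
    group_key = key.map { |field| doc[field] }
    out = groups[group_key] ||= key.zip(group_key).to_h.merge(initial)
    reduce.call(doc, out)
  end
  groups.values
end

docs = [
  { "type" => "string", "file" => "app.rb", "line" => 1, "length" => 5 },
  { "type" => "string", "file" => "app.rb", "line" => 2, "length" => 5 },
  { "type" => "array",  "file" => "app.rb", "line" => 2, "length" => 3 },
  { "type" => "node",   "file" => "lib.rb", "line" => 9 }
]

# Objects per type, smallest group first (like the sort above).
per_type = group_docs(docs, key: ["type"], initial: { "count" => 0 }) { |_doc, out| out["count"] += 1 }
           .sort_by { |g| g["count"] }

# String lengths in a specific file.
lengths = group_docs(docs, key: ["length"], cond: { "file" => "app.rb", "type" => "string" },
                           initial: { "count" => 0 }) { |_doc, out| out["count"] += 1 }
```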
what subclasses String?
> db.rails.find({super_name:"String"}, {name:1})
{name: "ActiveSupport::SafeBuffer"}
{name: "ActiveSupport::StringInquirer"}
{name: "SQLite3::Blob"}
{name: "ActiveModel::Name"}
{name: "Arel::Attribute::Expressions"}
{name: "ActiveSupport::JSON::Variable"}
• {name:1}: select only name field
mongodb: find
• find({type:'string'}): all strings
• find({type:{$ne:'string'}}): everything except strings
• find({type:'string'}, {data:1}): only select string's data field
the largest objects?
> db.rails.find({type: {$in:['string','array','hash']}}, {type:1,length:1}).sort({length:-1}).limit(3)
{type: "string", length: 2308}
{type: "string", length: 1454}
{type: "string", length: 1238}
mongodb: sort, limit/skip
• sort({length:-1,file:1}): sort by length desc, file asc
• limit(10): first 10 results
• skip(10).limit(10): second 10 results
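A compound sort spec such as `{length:-1, file:1}` compares on the first field and falls back to the next only on ties. The Ruby sketch below mimics `sort(...).skip(n).limit(m)` over in-memory documents, assuming every document has the sorted fields and their values are mutually comparable (MongoDB also orders mixed types and missing fields, which this ignores); `sort_skip_limit` is a hypothetical name.

```ruby
# Compare two docs field by field; spec is an ordered Hash of
# field => 1 (ascending) or -1 (descending), like {length: -1, file: 1}.
def compare_by(a, b, spec)
  spec.each do |field, dir|
    cmp = (a[field] <=> b[field]) * dir
    return cmp unless cmp.zero?
  end
  0
end

# Analogue of find().sort(spec).skip(skip).limit(limit).
def sort_skip_limit(docs, spec, skip: 0, limit: nil)
  sorted = docs.sort { |a, b| compare_by(a, b, spec) }
  page = sorted.drop(skip)
  limit ? page.first(limit) : page
end

docs = [
  { "file" => "b.rb", "length" => 10 },
  { "file" => "a.rb", "length" => 10 },
  { "file" => "c.rb", "length" => 2308 },
  { "file" => "a.rb", "length" => 1 }
]
spec = { "length" => -1, "file" => 1 }

first_two = sort_skip_limit(docs, spec, limit: 2)         # c.rb/2308, a.rb/10
next_two  = sort_skip_limit(docs, spec, skip: 2, limit: 2) # b.rb/10, a.rb/1
```

The tie on length 10 is broken by file ascending, which is why `a.rb` lands on the first page and `b.rb` on the second.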
when were objs created?
• useful to look at objects over time
• each obj has a timestamp of when it was created
• find minimum time, call it start_time
• create buckets for every minute of execution since start
• place objects into buckets
when were objs created?
> db.rails.mapReduce(
    function(){
      var secs = this.time - start_time;
      var mins_since_start = Math.floor(secs / 60); // whole minutes since start
      emit(mins_since_start, 1);
    },
    function(key, vals){
      for(var i=0,sum=0; i<vals.length; sum += vals[i++]);
      return sum;
    },
    { scope: { start_time: db.rails.find().sort({time:1}).limit(1)[0].time } }
  )
{result:"tmp.mr_1272615772_3"}
• scope: start_time = min(time)
mongodb: mapReduce
• arguments
  • map: function that emits one or more key/value pairs for each object (available as this)
  • reduce: function that returns the aggregate result, given a key and its list of values
  • scope: global variables to set for the map and reduce functions
• results
  • stored in a temporary collection (tmp.mr_1272615772_3)
when were objs created?
> db.tmp.mr_1272615772_3.count()
12
script was running for 12 minutes
> db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1)
{_id: 8, value: 41231}
41k objects created 8 minutes after start
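The map/reduce pair amounts to a histogram keyed by whole minutes since the first allocation. An in-memory Ruby sketch of the same bucketing, assuming `time` is a number of seconds (the deck does not state the unit); `objects_per_minute` is a hypothetical helper.

```ruby
# Bucket objects by whole minutes elapsed since the earliest timestamp,
# like emit(minute, 1) in map followed by summing in reduce.
# Assumption: each doc's "time" is in seconds.
def objects_per_minute(docs)
  start_time = docs.map { |doc| doc["time"] }.min
  docs.each_with_object(Hash.new(0)) do |doc, buckets|
    minute = ((doc["time"] - start_time) / 60).floor
    buckets[minute] += 1
  end
end

docs = [0, 5, 59, 60, 61, 125].map { |t| { "time" => 1_272_615_000 + t } }
buckets = objects_per_minute(docs)
busiest_minute, count = buckets.max_by { |_minute, n| n }
```

Here `buckets.size` plays the role of `count()` on the result collection (minutes of runtime that allocated anything), and `max_by` corresponds to `sort({value:-1}).limit(1)`.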
references to this object?
ary = ["a","b","c"]
ary references "a"
"b" referenced by ary
• ruby makes it easy to “leak” references
• an object will stay around until all references to it are gone
• more objects = longer GC = bad performance
• must find references to fix leaks
references to this object?
• db.rails_refs.insert({_id:"0xary", refs:["0xa","0xb","0xc"]}): create references lookup table
• db.rails_refs.ensureIndex({refs:1}): add 'multikey' index to refs array
• db.rails_refs.find({refs:"0xa"}): efficiently look up all objs holding a ref to 0xa
mongodb: multikeys
• indexes on array values create a ‘multikey’ index
• classic example: nested array of tags
• find({tags:"ruby"}): find objs where obj.tags includes "ruby"
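The `rails_refs` collection plus a multikey index is, in effect, an inverted index from a referenced address to the objects that hold it. The Ruby sketch below builds that inverted index in memory from `{_id, refs}` records shaped like the `insert` above; `build_referrers` is a hypothetical helper.

```ruby
# Invert {_id, refs} records into {referenced address => [holders]},
# the question a multikey index on refs answers via find({refs: addr}).
def build_referrers(ref_docs)
  ref_docs.each_with_object(Hash.new { |h, k| h[k] = [] }) do |doc, index|
    doc["refs"].uniq.each { |addr| index[addr] << doc["_id"] }
  end
end

ref_docs = [
  { "_id" => "0xary",   "refs" => ["0xa", "0xb", "0xc"] },
  { "_id" => "0xhash",  "refs" => ["0xa"] },
  { "_id" => "0xother", "refs" => [] }
]

referrers = build_referrers(ref_docs)
referrers["0xa"]           # => ["0xary", "0xhash"]
referrers.fetch("0xz", []) # => [] (fetch avoids creating an empty entry)
```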
version 2: memprof.com
a web-based heap visualizer and leak analyzer
plugging a leak in rails3
• in dev mode, rails3 is leaking 10mb per request
let’s use memprof to find it!
# in environment.rb
require `gem which memprof/signal`.strip
plugging a leak in rails3
send the app some requests so it leaks
$ ab -c 1 -n 30 http://localhost:3000/
tell memprof to dump out the entire heap to json
$ memprof --pid <pid> --name <dump name> --key <api key>
2519 classes
30 copies of TestController
mongo query for all TestController classes
details for one copy of TestController
find references to object
holding references to all controllers
“leak” is on line 178
• In development mode, Rails reloads all your application code on every request
• ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization
• But... it ends up holding a reference to every single reloaded version of those controllers
Questions?
Aman Gupta @tmm1