Debugging Ruby (Aman Gupta)

Debugging Ruby with MongoDB

Aman Gupta (@tmm1)

debugging ruby?

• i use ruby

• my ruby processes use a lot of ram

• i want to fix this

let’s build a debugger

• step 1: collect data

• list of all ruby objects in memory

• step 2: analyze data

• group by type

• group by file/line

version 1: collect data

• simple patch to ruby VM (300 lines of C)

• http://gist.github.com/73674

• simple text based output format

0x154750 @ -e:1 is OBJECT of type: T
0x15476c @ -e:1 is HASH which has data
0x154788 @ -e:1 is ARRAY of len: 0
0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
0x1547dc @ -e:1 is STRING len: 1 and val: T
0x154814 @ -e:1 is CLASS named: T inherits from Object
0x154a98 @ -e:1 is STRING len: 2 and val: hi
0x154b40 @ -e:1 is OBJECT of type: Range

version 1: analyze data

$ wc -l /tmp/ruby.heap
1571529 /tmp/ruby.heap

$ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1
236840 memcached/memcached.rb:316

$ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5
10948 ARRAY
20355 OBJECT
30744 DATA
64952 HASH
123290 STRING
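The same analysis can be sketched in ruby instead of shell. This is only an illustration, assuming the version 1 text format shown earlier (address, "@", file:line, "is", TYPE, ...). The sample lines and the count_by helper are made up for the example.

```ruby
# Sketch: count heap-dump lines by location and by type, like the awk pipeline.
# SAMPLE is made-up data in the version 1 text format, not real output.
SAMPLE = <<~HEAP
  0x154750 @ app.rb:10 is OBJECT of type: T
  0x15476c @ app.rb:10 is HASH which has data
  0x154788 @ app.rb:10 is STRING len: 2 and val: hi
  0x1547c0 @ lib.rb:3 is STRING len: 1 and val: T
  0x1547dc @ app.rb:10 is STRING len: 2 and val: hi
HEAP

# count_by is a hypothetical helper: it tallies one whitespace-separated
# column (0-based), the way `awk '{ print $N }' | sort | uniq -c` does,
# and returns [value, count] pairs sorted by ascending count.
def count_by(lines, column)
  counts = Hash.new(0)
  lines.each { |line| counts[line.split[column]] += 1 }
  counts.sort_by { |_, n| n }
end

lines = SAMPLE.lines.map(&:chomp)

# awk's $3 is column index 2: the file:line where the object was created.
by_location = count_by(lines, 2)
top_location, top_count = by_location.last

# awk's $5 is column index 4: the object type, restricted to the top location.
by_type = count_by(lines.select { |l| l.include?(top_location) }, 4)

puts "#{top_count} #{top_location}"
by_type.each { |type, n| puts "#{n} #{type}" }
```

Loading 1.5 million lines into an array is fine for a sketch; for a real dump, streaming with File.foreach would avoid holding the whole file in memory.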

version 1

• it works!

• but...

• must patch and rebuild ruby binary

• no information about references between objects

• limited analysis via shell scripting

version 2 goals

• better data format

• simple: one line of text per object

• expressive: include all details about object contents and references

• easy to use: easy to generate from C code & easy to consume from various scripting languages

version 2 is memprof

• no patches to ruby necessary

• gem install memprof

• require 'memprof'

• Memprof.dump_all("/tmp/app.json")

• C extension for MRI ruby VM

• uses libyajl to dump out all ruby objects as json

http://github.com/ice799/memprof

strings

Memprof.dump{ "hello" + "world" }

{ "_id": "0x19c610",
  "file": "-e", "line": 1,
  "type": "string", "class": "0x1ba7f0", "class_name": "String",
  "length": 10, "data": "helloworld" }

• _id: memory address of object

• file, line: file and line where string was created

• class: address of the class object "String"

• length, data: length and contents of this string instance


arrays

Memprof.dump{ [ 1, :b, 2.2, "d" ] }

{ "_id": "0x19c5c0",
  "class": "0x1b0d18", "class_name": "Array",
  "length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] }

• integers and symbols are stored in the array itself

• floats and strings are separate ruby objects
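A script reading the dump can tell the two cases apart: separate ruby objects appear as address strings, everything else is stored inline. A minimal sketch, assuming (from the examples in these slides, not from memprof documentation) that references always look like "0x..." hex strings. references_in is a hypothetical helper.

```ruby
require 'json'

# One dumped object, copied from the array example.
line = '{ "_id": "0x19c5c0", "class": "0x1b0d18", "class_name": "Array", ' \
       '"length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] }'

# Assumption: a reference is any string that looks like a hex address.
ADDRESS = /\A0x\h+\z/

# Hypothetical helper: split an array's data into references and inline
# values. Returns [references, inline_values].
def references_in(obj)
  obj.fetch("data", []).partition { |v| v.is_a?(String) && v.match?(ADDRESS) }
end

obj = JSON.parse(line)
refs, inline = references_in(obj)

puts "inline values: #{inline.inspect}"   # the integer and the symbol
puts "references:    #{refs.inspect}"     # the float and the string
```

A string whose contents happen to look like "0x1f" would not be confused with a reference here, because strings are themselves dumped as references to separate objects.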


hashes

Memprof.dump{ { :a => 1, "b" => 2.2 } }

{ "_id": "0x19c598",
  "type": "hash", "class": "0x1af170", "class_name": "Hash",
  "default": null,
  "length": 2, "data": [ [ ":a", 1 ], [ "0xc728", "0xc750" ] ] }

• data: hash entries as key/value pairs

• default: no default proc

classes

Memprof.dump{
  class Hello
    @@var=1
    Const=2
    def world() end
  end
}

{ "_id": "0x19c408",
  "type": "class", "name": "Hello", "super": "0x1bfa48", "super_name": "Object",
  "ivars": { "@@var": 1, "Const": 2 }, "methods": { "world": "0x19c318" } }

• super: superclass object reference

• ivars: class variables and constants are stored in the instance variable table

• methods: references to method objects

version 2: memprof.com

a web-based heap visualizer and leak analyzer

built on...

$ mongoimport -d memprof -c rails --file /tmp/app.json
$ mongo memprof

let’s run some queries.

how many objects?

> db.rails.count()
809816

• ruby scripts create a lot of objects

• usually not a problem, but...

• MRI has a naïve stop-the-world mark/sweep GC

• fewer objects = faster GC = better performance

what types of objects?

> db.rails.distinct('type')
['array', 'bignum', 'class', 'float', 'hash', 'module', 'node', 'object', 'regexp', 'string', ...]

mongodb: distinct

• distinct('type')
  list of types of objects

• distinct('file')
  list of source files

• distinct('class_name')
  list of instance class names

• optionally filter first

• distinct('name', {type:"class"})
  names of all defined classes

improve performance with indexes

> db.rails.ensureIndex({'type':1})

> db.rails.ensureIndex({'file':1}, {background:true})

mongodb: ensureIndex

• add an index on a field (if it doesn't exist yet)

• improve performance of queries against common fields: type, class_name, super, file

• can index embedded field names

• ensureIndex('methods.add')

• find({'methods.add':{$exists:true}})
  find classes that define the method add

how many objs per type?

> db.rails.group({
    initial: {count:0},
    key: {type:true},
    cond: {},
    reduce: function(obj, out) { out.count++ }
  }).sort(function(a,b){ return a.count - b.count })

• key: group on type

• reduce: increment count for each obj

• sort: sort results

[ ..., {type: 'array', count: 7621}, {type: 'string', count: 69139}, {type: 'node', count: 365285} ]



lots of nodes

• nodes represent ruby code

• stored like any other ruby object

• makes ruby completely dynamic


mongodb: group

• cond: query to filter objects before grouping

• key: field(s) to group on

• initial: initial values for each group's results

• reduce: aggregation function

mongodb: group

• by type or class
  key: {type:1}
  key: {class_name:1}

• by file & line
  key: {file:1, line:1}

• by type in a specific file
  cond: {file: "app.rb"},
  key: {type:1}

• by length of strings in a specific file
  cond: {file: "app.rb", type: 'string'},
  key: {length:1}
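For readers who think in ruby rather than javascript, the four group arguments map onto a short loop. This is only an in-memory sketch of the semantics described on these slides, not how mongodb implements group. The group helper and the sample objects are hypothetical.

```ruby
# Made-up sample of dumped objects (only the fields used here).
OBJECTS = [
  { "type" => "string", "file" => "app.rb", "line" => 3, "length" => 5 },
  { "type" => "string", "file" => "app.rb", "line" => 3, "length" => 2 },
  { "type" => "array",  "file" => "app.rb", "line" => 7, "length" => 0 },
  { "type" => "string", "file" => "lib.rb", "line" => 1, "length" => 5 },
]

# Hypothetical in-memory stand-in for group:
#   cond    - hash of field => value that an object must match
#   key     - list of fields to group on
#   initial - starting values for each group's result
#   reduce  - block called with (obj, out) for every matching object
def group(objects, key:, initial:, cond: {}, &reduce)
  groups = {}
  objects.each do |obj|
    next unless cond.all? { |field, value| obj[field] == value }
    group_key = key.map { |field| [field, obj[field]] }.to_h
    # merge returns a fresh hash, so `initial` itself is never mutated
    out = (groups[group_key] ||= group_key.merge(initial))
    reduce.call(obj, out)
  end
  groups.values
end

# by type in a specific file
per_type = group(OBJECTS, cond: { "file" => "app.rb" }, key: ["type"],
                 initial: { "count" => 0 }) { |obj, out| out["count"] += 1 }

per_type.sort_by { |g| g["count"] }.each { |g| puts g.inspect }
```

Changing key to ["file", "line"] or ["length"] gives the other groupings from the list without touching the helper.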

what subclasses String?

> db.rails.find({super_name:"String"}, {name:1})

{name: "ActiveSupport::SafeBuffer"}
{name: "ActiveSupport::StringInquirer"}
{name: "SQLite3::Blob"}
{name: "ActiveModel::Name"}
{name: "Arel::Attribute::Expressions"}
{name: "ActiveSupport::JSON::Variable"}

• {name:1}: select only name field

mongodb: find

• find({type:'string'})
  all strings

• find({type:{$ne:'string'}})
  everything except strings

• find({type:'string'}, {data:1})
  only select string's data field

the largest objects?

> db.rails.find(
    {type: {$in:['string','array','hash']} },
    {type:1,length:1}
  ).sort({length:-1}).limit(3)

{type: "string", length: 2308}
{type: "string", length: 1454}
{type: "string", length: 1238}

mongodb: sort, limit/skip

• sort({length:-1,file:1})
  sort by length desc, file asc

• limit(10)
  first 10 results

• skip(10).limit(10)
  second 10 results
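The find / sort / limit chain reads much like a ruby Enumerable pipeline. A rough in-memory analogy with made-up sample objects; it illustrates what the query means, not how mongodb executes it.

```ruby
# Made-up sample of dumped objects (only the fields used here).
objects = [
  { "type" => "string", "length" => 2308 },
  { "type" => "hash",   "length" => 12 },
  { "type" => "node" },
  { "type" => "array",  "length" => 40 },
  { "type" => "string", "length" => 1454 },
]

largest = objects
  .select { |o| %w[string array hash].include?(o["type"]) }  # find with $in
  .sort_by { |o| -o["length"] }                               # sort({length:-1})
  .first(3)                                                   # limit(3)
  .map { |o| o.slice("type", "length") }                      # {type:1,length:1}

second_page = objects.drop(2).first(2)                        # skip(2).limit(2)

largest.each { |o| puts o.inspect }
```

The difference that matters in practice: with an index on the sorted field, the database can return the top results without scanning and sorting every object, which this sketch always does.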

when were objs created?

• useful to look at objects over time

• each obj has a timestamp of when it was created

• find minimum time, call it start_time

• create buckets for every minute of execution since start

• place objects into buckets

when were objs created?

> db.rails.mapReduce(function(){
    var secs = this.time - start_time;
    var mins_since_start = Math.floor(secs / 60);
    emit(mins_since_start, 1);
  }, function(key, vals){
    for(var i=0,sum=0; i<vals.length; sum += vals[i++]);
    return sum;
  }, {
    scope: { start_time: db.rails.find().sort({time:1}).limit(1)[0].time }
  })
{result:"tmp.mr_1272615772_3"}

• scope: start_time = min(time)

mongodb: mapReduce

• arguments

  • map: function that emits one or more key/value pairs for each object (available as this)

  • reduce: function to return aggregate result, given key and list of values

  • scope: global variables to set for funcs

• results

  • stored in a temporary collection (tmp.mr_1272615772_3)

when were objs created?

> db.tmp.mr_1272615772_3.count()
12

script was running for 12 minutes

> db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1)
{_id: 8, value: 41231}

41k objects created 8 minutes after start
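The bucketing step is easy to get subtly wrong, so here it is as a plain ruby sketch. The timestamps are made up and assumed to be in seconds; the point is that a minute bucket comes from dividing by 60, while a remainder (secs % 60) would give the position within the current minute.

```ruby
# Made-up creation timestamps, assumed to be in seconds.
times = [1000, 1010, 1059, 1060, 1125, 1480, 1481]

# start_time = min(time)
start_time = times.min

# Map step: seconds since start -> minute bucket.
# Reduce step: count the objects that land in each bucket.
buckets = Hash.new(0)
times.each do |time|
  secs = time - start_time
  buckets[(secs / 60).floor] += 1   # .floor also covers float timestamps
end

busiest_minute, count = buckets.max_by { |_, n| n }
puts "#{buckets.size} minutes had allocations"
puts "#{count} objects created #{busiest_minute} minutes after start"
```

Note that counting buckets only equals the running time in minutes if every minute saw at least one allocation; a quiet minute leaves no bucket behind.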

references to this object?

ary = ["a","b","c"]

• ary references "a"

• "b" referenced by ary

• ruby makes it easy to “leak” references

• an object will stay around until all references to it are gone

• more objects = longer GC = bad performance

• must find references to fix leaks

references to this object?

• db.rails_refs.insert({ _id:"0xary", refs:["0xa","0xb","0xc"] })
  create references lookup table

• db.rails_refs.ensureIndex({refs:1})
  add 'multikey' index to refs array

• db.rails_refs.find({refs:"0xa"})
  efficiently lookup all objs holding a ref to 0xa

mongodb: multikeys

• indexes on array values create a ‘multikey’ index

• classic example: nested array of tags

• find({tags: "ruby"})
  find objs where obj.tags includes "ruby"
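The idea behind the references lookup can be sketched in ruby as an inverted table: for each address, the list of objects that hold a reference to it. This is a conceptual analogy for what a multikey index on refs makes cheap to query, not a description of mongodb internals. The addresses are made up.

```ruby
# Made-up dump entries: each object with the addresses it references.
REFS = {
  "0xary"  => ["0xa", "0xb", "0xc"],
  "0xhash" => ["0xa"],
  "0xa"    => [],
}

# Invert the table once: address => list of objects holding a reference to it.
referenced_by = Hash.new { |hash, addr| hash[addr] = [] }
REFS.each do |holder, refs|
  refs.each { |addr| referenced_by[addr] << holder }
end

puts referenced_by["0xa"].inspect   # holders of a reference to 0xa
puts referenced_by["0xb"].inspect   # holders of a reference to 0xb
```

Without the inverted table, answering "who references 0xa?" means scanning every object's refs array, which is the full-collection scan the index avoids.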

memprof.com

a web-based heap visualizer and leak analyzer

plugging a leak in rails3

• in dev mode, rails3 is leaking 10mb per request

let's use memprof to find it!

# in environment.rb
require `gem which memprof/signal`.strip

send the app some requests so it leaks

$ ab -c 1 -n 30 http://localhost:3000/

tell memprof to dump out the entire heap to json

$ memprof --pid <pid> --name <dump name> --key <api key>

2519 classes

30 copies of TestController

mongo query for all TestController classes

details for one copy of TestController

find references to object

holding references to all controllers

"leak" is on line 178

• In development mode, Rails reloads all your application code on every request

• ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization

• But... it ends up holding a reference to every single reloaded version of those controllers

Questions?

Aman Gupta (@tmm1)
