Debugging Ruby (Aman Gupta)

Post on 31-Aug-2014

Debugging Ruby with MongoDB

Aman Gupta (@tmm1)

debugging ruby?

• i use ruby

• my ruby processes use a lot of ram

• i want to fix this

let’s build a debugger

• step 1: collect data

• list of all ruby objects in memory

• step 2: analyze data

• group by type

• group by file/line

version 1: collect data

• simple patch to ruby VM (300 lines of C)

• http://gist.github.com/73674

• simple text based output format

0x154750 @ -e:1 is OBJECT of type: T
0x15476c @ -e:1 is HASH which has data
0x154788 @ -e:1 is ARRAY of len: 0
0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
0x1547dc @ -e:1 is STRING len: 1 and val: T
0x154814 @ -e:1 is CLASS named: T inherits from Object
0x154a98 @ -e:1 is STRING len: 2 and val: hi
0x154b40 @ -e:1 is OBJECT of type: Range

version 1: analyze data

$ wc -l /tmp/ruby.heap
 1571529 /tmp/ruby.heap

$ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1
 236840 memcached/memcached.rb:316

$ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5
  10948 ARRAY
  20355 OBJECT
  30744 DATA
  64952 HASH
 123290 STRING
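
The same grouping could also be done in a few lines of Ruby instead of a shell pipeline. This is a minimal sketch, assuming the version 1 text format shown earlier (address, "@", file:line, "is", type). The helper names parse_heap_line and count_by are hypothetical, and the sample lines are copied from the example output rather than from a real heap dump.

```ruby
# Sketch: group version 1 heap-dump lines by file:line or by type.
# Assumes each line looks like: "0x154750 @ -e:1 is OBJECT of type: T".
# parse_heap_line and count_by are hypothetical helper names.

def parse_heap_line(line)
  fields = line.split
  return nil if fields.length < 5
  # Same columns the awk commands pick out: $3 is file:line, $5 is the type.
  { address: fields[0], location: fields[2], type: fields[4] }
end

def count_by(lines, key)
  counts = Hash.new(0)
  lines.each do |line|
    record = parse_heap_line(line)
    counts[record[key]] += 1 if record
  end
  # Ascending by count, similar to `sort | uniq -c | sort -g`.
  counts.sort_by { |value, count| [count, value] }
end

SAMPLE_HEAP = <<~HEAP.lines
  0x154750 @ -e:1 is OBJECT of type: T
  0x15476c @ -e:1 is HASH which has data
  0x154788 @ -e:1 is ARRAY of len: 0
  0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
  0x1547dc @ -e:1 is STRING len: 1 and val: T
HEAP

count_by(SAMPLE_HEAP, :type).each { |type, count| puts "#{count} #{type}" }
```

Passing :location instead of :type gives the per-file/line counts that the first awk command produces.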

version 1

• it works!

• but...

• must patch and rebuild ruby binary

• no information about references between objects

• limited analysis via shell scripting

version 2 goals

• better data format

• simple: one line of text per object

• expressive: include all details about object contents and references

• easy to use: easy to generate from C code & easy to consume from various scripting languages

version 2 goals

JSON!

version 2 is memprof

• no patches to ruby necessary

• gem install memprof

• require 'memprof'

• Memprof.dump_all("/tmp/app.json")

• C extension for MRI ruby VM: http://github.com/ice799/memprof

• uses libyajl to dump out all ruby objects as json

strings

Memprof.dump{
  "hello" + "world"
}

{ "_id": "0x19c610",
  "file": "-e", "line": 1,
  "type": "string", "class": "0x1ba7f0", "class_name": "String",
  "length": 10, "data": "helloworld" }

• memory address of object (_id)

• file and line where string was created (file, line)

• address of the class object "String" (class)

• length and contents of this string instance (length, data)

arrays

Memprof.dump{
  [ 1, :b, 2.2, "d" ]
}

{ "_id": "0x19c5c0",
  "class": "0x1b0d18", "class_name": "Array",
  "length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] }

• integers and symbols are stored in the array itself

• floats and strings are separate ruby objects
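
Since some values are stored inline and others appear as addresses, anything consuming the dump has to tell the two apart. Here is a minimal sketch, assuming references always appear as hex strings such as "0x19c750", as in the array record above. The names reference? and split_array_data are hypothetical helpers, not part of memprof.

```ruby
require 'json'

# Sketch only: assumes references are dumped as "0x..." strings, as in the
# array record above. reference? and split_array_data are hypothetical helpers.
ADDRESS = /\A0x[0-9a-f]+\z/

def reference?(value)
  value.is_a?(String) && value.match?(ADDRESS)
end

# Separates inline values (integers, symbols) from references to other objects.
def split_array_data(record)
  refs, inline = record.fetch("data", []).partition { |v| reference?(v) }
  { inline: inline, refs: refs }
end

record = JSON.parse(<<~JSON)
  { "_id": "0x19c5c0",
    "class": "0x1b0d18", "class_name": "Array",
    "length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] }
JSON

p split_array_data(record)
```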

hashes

Memprof.dump{
  { :a => 1, "b" => 2.2 }
}

{ "_id": "0x19c598",
  "type": "hash", "class": "0x1af170", "class_name": "Hash",
  "default": null,
  "length": 2, "data": [ [ ":a", 1 ], [ "0xc728", "0xc750" ] ] }

• no default proc (default)

• hash entries as key/value pairs (data)

classes

Memprof.dump{
  class Hello
    @@var=1
    Const=2
    def world() end
  end
}

{ "_id": "0x19c408",
  "type": "class", "name": "Hello", "super": "0x1bfa48", "super_name": "Object",
  "ivars": { "@@var": 1, "Const": 2 }, "methods": { "world": "0x19c318" } }

• superclass object reference (super)

• class variables and constants are stored in the instance variable table (ivars)

• references to method objects (methods)

version 2: analyze data

version 2: memprof.com

a web-based heap visualizer and leak analyzer

built on...

$ mongoimport -d memprof -c rails --file /tmp/app.json
$ mongo memprof

let’s run some queries.

how many objects?

> db.rails.count()
809816

• ruby scripts create a lot of objects

• usually not a problem, but...

• MRI has a naïve stop-the-world mark/sweep GC

• fewer objects = faster GC = better performance

what types of objects?

> db.rails.distinct('type')
['array', 'bignum', 'class', 'float', 'hash', 'module', 'node', 'object', 'regexp', 'string', ...]

mongodb: distinct

• distinct('type'): list of types of objects

• distinct('file'): list of source files

• distinct('class_name'): list of instance class names

• optionally filter first

  • distinct('name', {type:"class"}): names of all defined classes

improve performance with indexes

> db.rails.ensureIndex({'type':1})

> db.rails.ensureIndex({'file':1}, {background:true})

mongodb: ensureIndex

• add an index on a field (if it doesn’t exist yet)

• improve performance of queries against common fields: type, class_name, super, file

• can index embedded field names

  • ensureIndex('methods.add')

  • find({'methods.add':{$exists:true}}): find classes that define the method add

how many objs per type?

> db.rails.group({
    initial: {count:0},
    key: {type:true},
    cond: {},
    reduce: function(obj, out) { out.count++ }
  }).sort(function(a,b){ return a.count - b.count })

• group on type

• increment count for each obj

• sort results

how many objs per type?

[ ...,
  {type: 'array', count: 7621},
  {type: 'string', count: 69139},
  {type: 'node', count: 365285} ]

lots of nodes

• nodes represent ruby code

• stored like any other ruby object

• makes ruby completely dynamic
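
For a small dump, the same per-type count could be computed directly in Ruby without a database. This is a minimal sketch, assuming the dump holds one JSON object per line (the version 2 format goal). The helper count_by_type is hypothetical, and the sample records use made-up addresses for illustration.

```ruby
require 'json'
require 'stringio'

# Sketch: count objects per "type" from a one-JSON-object-per-line dump,
# mirroring the group/reduce/sort query above. count_by_type is a
# hypothetical helper; the sample records are illustrative only.
def count_by_type(io)
  counts = Hash.new(0)
  io.each_line do |line|
    next if line.strip.empty?
    counts[JSON.parse(line)["type"]] += 1
  end
  # Ascending by count, like the comparator passed to sort() above.
  counts.sort_by { |_type, count| count }
        .map { |type, count| { type: type, count: count } }
end

SAMPLE_DUMP = <<~DUMP
  {"_id": "0x1001", "type": "string", "length": 10, "data": "helloworld"}
  {"_id": "0x1002", "type": "hash", "length": 2}
  {"_id": "0x1003", "type": "class", "name": "Hello"}
  {"_id": "0x1004", "type": "string", "length": 1, "data": "d"}
DUMP

puts count_by_type(StringIO.new(SAMPLE_DUMP))
```

A File opened on the dump could be passed in place of the StringIO, since both respond to each_line.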

mongodb: group

• cond: query to filter objects before grouping

• key: field(s) to group on

• initial: initial values for each group’s results

• reduce: aggregation function

mongodb: group

• by type or class

  • key: {type:1}

  • key: {class_name:1}

• by file & line

  • key: {file:1, line:1}

• by type in a specific file

  • cond: {file: "app.rb"}, key: {file:1, line:1}

• by length of strings in a specific file

  • cond: {file: "app.rb", type: 'string'}, key: {length:1}

what subclasses String?

> db.rails.find({super_name:"String"}, {name:1})
{name: "ActiveSupport::SafeBuffer"}
{name: "ActiveSupport::StringInquirer"}
{name: "SQLite3::Blob"}
{name: "ActiveModel::Name"}
{name: "Arel::Attribute::Expressions"}
{name: "ActiveSupport::JSON::Variable"}

• {name:1}: select only name field

mongodb: find

• find({type:'string'}): all strings

• find({type:{$ne:'string'}}): everything except strings

• find({type:'string'}, {data:1}): only select string’s data field

the largest objects?

> db.rails.find(
    {type: {$in:['string','array','hash']}},
    {type:1,length:1}
  ).sort({length:-1}).limit(3)
{type: "string", length: 2308}
{type: "string", length: 1454}
{type: "string", length: 1238}

mongodb: sort, limit/skip

• sort({length:-1,file:1}): sort by length desc, file asc

• limit(10): first 10 results

• skip(10).limit(10): second 10 results
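
To make the find/sort/skip/limit chain concrete, here is a rough in-memory equivalent in Ruby. It is a sketch over already-parsed records, not a replacement for the indexed query. The helper largest_objects is hypothetical, and the sample records are illustrative.

```ruby
# Sketch: the find/sort/limit query above expressed over in-memory records.
# largest_objects is a hypothetical helper; records are hashes as they would
# look after parsing the JSON dump.
SIZED_TYPES = %w[string array hash].freeze

def largest_objects(records, limit: 3, skip: 0)
  records
    .select { |r| SIZED_TYPES.include?(r["type"]) && r["length"] } # find({type: {$in: [...]}})
    .sort_by { |r| -r["length"] }                                  # sort({length:-1})
    .drop(skip)                                                    # skip(n)
    .first(limit)                                                  # limit(n)
    .map { |r| { "type" => r["type"], "length" => r["length"] } }  # select type and length only
end

records = [
  { "type" => "string", "length" => 2308 },
  { "type" => "array",  "length" => 4 },
  { "type" => "class",  "name" => "Hello" },
  { "type" => "string", "length" => 1454 },
  { "type" => "hash",   "length" => 2 }
]

p largest_objects(records, limit: 2)
p largest_objects(records, limit: 2, skip: 2) # the "second page" of results
```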

when were objs created?

• useful to look at objects over time

• each obj has a timestamp of when it was created

• find minimum time, call it start_time

• create buckets for every minute of execution since start

• place objects into buckets

when were objs created?

> db.rails.mapReduce(function(){
    var secs = this.time - start_time;
    var mins_since_start = Math.floor(secs / 60);
    emit(mins_since_start, 1);
  }, function(key, vals){
    for(var i=0,sum=0; i<vals.length; sum += vals[i++]);
    return sum;
  }, {
    scope: { start_time: db.rails.find().sort({time:1}).limit(1)[0].time }
  })
{result:"tmp.mr_1272615772_3"}

start_time = min(time)

mongodb: mapReduce

• arguments

  • map: function that emits one or more key/value pairs given each object this

  • reduce: function to return aggregate result, given key and list of values

  • scope: global variables to set for funcs

• results

  • stored in a temporary collection (tmp.mr_1272615772_3)

when were objs created?

> db.tmp.mr_1272615772_3.count()
12

script was running for 12 minutes

> db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1)
{_id: 8, value: 41231}

41k objects created 8 minutes after start
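
The bucketing steps (find the minimum time, compute minutes since start, count per bucket) can be sketched in plain Ruby as well. This assumes each record carries a numeric "time" field measured in seconds. The unit is an assumption, and objects_per_minute is a hypothetical helper with made-up sample timestamps.

```ruby
# Sketch: bucket objects by whole minutes since the earliest timestamp.
# Assumes each record has a numeric "time" field in seconds (an assumption);
# objects_per_minute is a hypothetical helper.
def objects_per_minute(records)
  times = records.map { |r| r["time"] }.compact
  return {} if times.empty?

  start_time = times.min                     # start_time = min(time)
  buckets = Hash.new(0)
  times.each do |t|
    minute = ((t - start_time) / 60).floor   # whole minutes since start
    buckets[minute] += 1                     # the "reduce" step: sum of 1s
  end
  buckets
end

records = [
  { "time" => 1000 }, { "time" => 1030 },                     # minute 0
  { "time" => 1065 },                                         # minute 1
  { "time" => 1490 }, { "time" => 1500 }, { "time" => 1510 }  # minute 8
]

buckets = objects_per_minute(records)
busiest_minute, count = buckets.max_by { |_minute, n| n }
puts "#{count} objects created #{busiest_minute} minutes after start"
```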

references to this object?

ary = ["a","b","c"]

• ary references "a"

• "b" referenced by ary

• ruby makes it easy to “leak” references

• an object will stay around until all references to it are gone

• more objects = longer GC = bad performance

• must find references to fix leaks

references to this object?

• db.rails_refs.insert({_id:"0xary", refs:["0xa","0xb","0xc"]}): create references lookup table

• db.rails_refs.ensureIndex({refs:1}): add 'multikey' index to refs array

• db.rails_refs.find({refs:"0xa"}): efficiently look up all objs holding a ref to 0xa
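
What the multikey index provides is effectively a reverse lookup from an address to the objects that reference it. A minimal in-memory sketch of that idea follows. It assumes documents shaped like the rails_refs entries above. The helper build_referrers and the extra "0xhash" document are hypothetical.

```ruby
# Sketch: build a reverse lookup table, address -> ids of objects that hold
# a reference to it. Input mirrors the rails_refs documents above
# ({_id, refs}); build_referrers is a hypothetical helper.
def build_referrers(ref_docs)
  referrers = Hash.new { |hash, address| hash[address] = [] }
  ref_docs.each do |doc|
    # uniq so an object that references the same target twice is listed once
    doc["refs"].uniq.each { |target| referrers[target] << doc["_id"] }
  end
  referrers
end

ref_docs = [
  { "_id" => "0xary",  "refs" => ["0xa", "0xb", "0xc"] },
  { "_id" => "0xhash", "refs" => ["0xa"] } # made-up second referrer
]

referrers = build_referrers(ref_docs)
p referrers["0xa"] # every object holding a reference to 0xa
```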

mongodb: multikeys

• indexes on array values create a ‘multikey’ index

• classic example: nested array of tags

• find({tags: "ruby"}): find objs where obj.tags includes "ruby"

version 2: memprof.com

a web-based heap visualizer and leak analyzer

plugging a leak in rails3

• in dev mode, rails3 is leaking 10mb per request

let’s use memprof to find it!

# in environment.rb
require `gem which memprof/signal`.strip

send the app some requests so it leaks

$ ab -c 1 -n 30 http://localhost:3000/

tell memprof to dump out the entire heap to json

$ memprof --pid <pid> --name <dump name> --key <api key>

2519 classes

30 copies of TestController

mongo query for all TestController classes

details for one copy of TestController

find references to object

holding references to all controllers

“leak” is on line 178

• In development mode, Rails reloads all your application code on every request

• ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization

• But... it ends up holding a reference to every single reloaded version of those controllers
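
The pattern behind this kind of leak can be reproduced in a few lines. This is only an illustrative sketch, not the actual Rails code. PARTIAL_CACHE, reload_controller and render_partial are hypothetical stand-ins for a cache keyed by controller class and for development-mode reloading.

```ruby
# Sketch of the leak pattern described above, not the actual Rails code.
# PARTIAL_CACHE, reload_controller and render_partial are hypothetical
# stand-ins for a cache keyed by controller class.
PARTIAL_CACHE = {}

# Simulates development-mode reloading: every call builds a brand-new class
# object, even though the reported name stays the same.
def reload_controller
  Class.new do
    def self.name
      "TestController"
    end
  end
end

# Caches a value per controller class. Because the class object itself is the
# hash key, the cache keeps every reloaded copy reachable, so the GC cannot
# free any of them.
def render_partial(controller_class)
  PARTIAL_CACHE[controller_class] ||= "partial for #{controller_class.name}"
end

30.times { render_partial(reload_controller) }
GC.start

puts PARTIAL_CACHE.size                          # one entry per reload
puts PARTIAL_CACHE.keys.map(&:name).uniq.inspect # every key reports the same name
```

Clearing such a cache on reload, or keying it by something other than the class object, would let the old copies be collected.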

Questions?

Aman Gupta (@tmm1)