Python GC

Python GC Dmitry Alimov Software Developer Zodiac Interactive 2014

description

Slides for a presentation about the Python GC (garbage collector) and memory management in Python (CPython 2.7).

Transcript of Python GC

Page 1: Python GC

Python GC

Dmitry Alimov, Software Developer

Zodiac Interactive

2014

Page 2: Python GC

Garbage collection

The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program.

Garbage collection was invented by John McCarthy around 1959 to simplify memory management in Lisp.

It is used in Lisp, Smalltalk, Python, Java, Ruby, Perl, C#, D, Haskell, Scheme, Objective-C, etc.

Basic algorithms:
- Reference counting
- Mark-and-sweep
- Mark-and-compact
- Copying collector
- Generational collector
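The following toy sketch makes the mark-and-sweep entry concrete. The heap is a hypothetical dict that maps object names to the names they reference. It illustrates only the textbook algorithm, not CPython's implementation, which combines reference counting with a generational cycle detector.

```python
def mark(heap, roots):
    """Mark phase: return the set of objects reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in marked:
            continue
        marked.add(name)
        stack.extend(heap.get(name, []))  # follow outgoing references
    return marked


def sweep(heap, marked):
    """Sweep phase: delete every unmarked object and return what was freed."""
    garbage = [name for name in heap if name not in marked]
    for name in garbage:
        del heap[name]
    return garbage


def mark_and_sweep(heap, roots):
    return sweep(heap, mark(heap, roots))


# Hypothetical heap: "a" is a root; "c" and "d" reference each other
# but are unreachable from the roots.
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
freed = mark_and_sweep(heap, roots=["a"])
print(sorted(freed))  # ['c', 'd']
print(sorted(heap))   # ['a', 'b']
```

Unlike plain reference counting, tracing from the roots also reclaims the unreachable c/d cycle.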

Page 3: Python GC

Memory in Python

Page 4: Python GC

PyMem_Malloc(), PyMem_Realloc(), PyMem_Free()
PyMem_New(), PyMem_Resize(), PyMem_Del()

Memory Management

Other languages have "variables"; Python has "names" (or "identifiers").

Everything is an object

>>> a = 1
>>> a = 2
>>> b = a
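The assignments above can be replayed in Python 3 to show that assignment binds a name to an existing object and does not copy a value into a storage box. This is a minimal sketch; x and y are illustrative names.

```python
a = 1
a = 2          # the name "a" is rebound to a different int object
b = a          # "b" now refers to the same object as "a"
assert b is a

a = 3          # rebinding "a" does not affect "b"
assert b == 2 and b is not a

x = [1, 2]
y = x          # two names, one list object
y.append(3)    # a mutation through one name is visible through the other
assert x == [1, 2, 3]
```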

Memory management in Python involves a private heap containing all Python objects and data structures.

Page 5: Python GC

sys.getsizeof(object[, default])

>>> import sys
>>> a = 123
>>> sys.getsizeof(a)
24  # 64-bit version

Return the size of an object in bytes (without GC overhead).

__sizeof__()

>>> a.__sizeof__()
24  # 64-bit version

sys.getsizeof and __sizeof__

Return the size of an object in bytes. The object can be any type of object. getsizeof() calls the object's __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

>>> sys.getsizeof(tuple((1, 2, 3)))
72
>>> tuple((1, 2, 3)).__sizeof__()
48
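The following sketch isolates the GC overhead that getsizeof() adds. It assumes a default (GIL) CPython 3 build. The absolute sizes differ from the 2.7 numbers above, so it compares only the difference. gc_overhead is a hypothetical helper.

```python
import gc
import sys


def gc_overhead(obj):
    """Bytes sys.getsizeof() reports on top of obj.__sizeof__()."""
    return sys.getsizeof(obj) - obj.__sizeof__()


n = 123
t = (1, 2, 3)

# int is not a GC-managed type: getsizeof() equals __sizeof__().
print(gc.is_tracked(n), gc_overhead(n))  # False 0

# tuple is a GC-managed type: getsizeof() also counts the GC header.
print(gc_overhead(t) > 0)                # True
```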

Page 6: Python GC

id(object)

>>> a = 123
>>> id(a)
30522672L

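This function returns the string starting at the memory address given by address. If size is specified, exactly that many bytes are read; otherwise the string is assumed to be zero-terminated.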

ctypes.string_at(address[, size])

>>> import ctypes, struct
>>> ctypes.string_at(id(a), 24)
'\x06\x00\x00\x00\x00\x00\x00\x00\xc0G)\x1e\x00\x00\x00\x00{\x00\x00\x00\x00\x00\x00\x00'
>>> struct.unpack('QQQ', ctypes.string_at(id(a), 24))
(6, 506021824, 123)

id and ctypes.string_at

Return the "identity" of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. CPython implementation detail: this is the address of the object in memory.
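The following Python 3 sketch combines id() and ctypes.string_at(). It assumes a default CPython release build (not PyPy, not free-threaded, no Py_TRACE_REFS). In that build every object starts with ob_refcnt followed by ob_type, so the second word of the header should equal id(type(obj)).

```python
import ctypes
import struct

a = [1, 2, 3]
b = a                            # a second name for the same object
print(id(a) == id(b))            # True: one object, one identity
print(a is b)                    # True: "is" compares identities

# "n" = Py_ssize_t (ob_refcnt), "P" = pointer (ob_type), native layout.
header = ctypes.string_at(id(a), struct.calcsize("nP"))
refcnt, type_addr = struct.unpack("nP", header)  # refcnt: raw ob_refcnt field
print(type_addr == id(type(a)))  # True
```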

Page 7: Python GC

>>> sys.getrefcount(a)
8
>>> struct.unpack('QQQ', ctypes.string_at(id(a), 24))
(6, 506021824, 123)
>>> type(a)
<type 'int'>
>>> id(type(a))
506021824L
>>> a
123
>>> ctypes.c_long.from_address(id(a))
c_long(6)

Return the reference count of the object. The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().

sys.getrefcount(object)

Unpack the string (presumably packed by pack(fmt, ...)) according to the given format.

struct.unpack(fmt, string)

Format | C type | Python type | Standard size
Q | unsigned long long | integer | 8 bytes
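The following sketch packs the three words from the slide (refcount 6, type address 506021824, value 123) and reproduces the same 24 bytes that ctypes.string_at returned. It uses an explicit little-endian format ("<QQQ") so the result does not depend on the machine. On a little-endian 64-bit machine this matches the slide's native 'QQQ'.

```python
import struct

fmt = "<QQQ"  # three unsigned long longs, little-endian, 8 bytes each
print(struct.calcsize(fmt))        # 24

packed = struct.pack(fmt, 6, 506021824, 123)
# Same bytes as the ctypes.string_at() dump on the slide.
assert packed == (b"\x06" + b"\x00" * 7
                  + b"\xc0G)\x1e" + b"\x00" * 4
                  + b"{" + b"\x00" * 7)
print(struct.unpack(fmt, packed))  # (6, 506021824, 123)
```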

Page 8: Python GC

>>> struct.unpack('QQQ', ctypes.string_at(id(a), 24))
(6, 506021824, 123)

C code:

typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;

#define PyObject_HEAD \
    _PyObject_HEAD_EXTRA \
    Py_ssize_t ob_refcnt; \
    struct _typeobject *ob_type;

#ifdef Py_TRACE_REFS
#define _PyObject_HEAD_EXTRA \
    struct _object *_ob_next; \
    struct _object *_ob_prev;
#else
#define _PyObject_HEAD_EXTRA  /* empty in normal (release) builds */
#endif
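The following sketch uses ctypes.Structure to mirror the release-build PyObject_HEAD (ob_refcnt, then ob_type) and shows the refcount changing as references are added. It assumes a default CPython 3 build (no Py_TRACE_REFS, not free-threaded). PyIntObject no longer exists in Python 3, so the sketch uses a list. PyObjectHead and head() are hypothetical names.

```python
import ctypes


class PyObjectHead(ctypes.Structure):
    # Release-build PyObject_HEAD: _PyObject_HEAD_EXTRA expands to nothing.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]


def head(obj):
    """Return a ctypes view of obj's header, located at id(obj)."""
    return PyObjectHead.from_address(id(obj))


obj = ["some", "list"]
before = head(obj).ob_refcnt
alias = obj                            # one more strong reference to the list
after = head(obj).ob_refcnt
print(after - before)                  # 1
print(head(obj).ob_type == id(list))   # True
```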

Page 9: Python GC

Garbage Collector in Python

Page 10: Python GC

The first garbage collection algorithm in Python is reference counting, which was invented by George Collins in 1960.

Reference Counting

Py_INCREF/Py_DECREF

When an object's reference count is decremented to 0, the object is deallocated immediately.
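The following Python 3 sketch shows immediate deallocation. A __del__ hook records when the object is freed. The cycle detector is disabled, so only reference counting is at work. Tracked and freed are hypothetical names.

```python
import gc

freed = []  # names of objects whose __del__ has run


class Tracked:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Called by CPython when the object is deallocated.
        freed.append(self.name)


was_enabled = gc.isenabled()
gc.disable()  # rule out the cycle detector: only reference counting is active
try:
    x = Tracked("x")
    y = x          # a second reference to the same object
    del x          # count drops, but y still refers to the object
    print(freed)   # []
    del y          # count reaches 0: the object is freed right here
    print(freed)   # ['x']
finally:
    if was_enabled:
        gc.enable()
```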

Page 11: Python GC
Page 12: Python GC

GC methods

gc.get_referrers(*objs)

Return the list of objects that directly refer to any of objs.

gc.get_referents(*objs)

Return a list of objects directly referred to by any of the arguments.
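The following Python 3 sketch shows both calls. Note that get_referrers() only finds containers tracked by the collector, and it scans all of them, so it is a debugging aid rather than something to call in hot code.

```python
import gc

inner = {"key": "value"}
outer = [inner, 42]

# outer refers directly to inner (and to the int 42).
print(any(r is inner for r in gc.get_referents(outer)))  # True

# outer is a GC-tracked container that refers to inner.
print(any(r is outer for r in gc.get_referrers(inner)))  # True
```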

Page 13: Python GC

Cyclic references
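The following Python 3 sketch builds a two-object cycle and shows why reference counting alone is not enough. A weak reference (probe) observes the object without keeping it alive. The cycle survives until gc.collect() runs the cycle detector. Node is a hypothetical class.

```python
import gc
import weakref


class Node:
    pass


was_enabled = gc.isenabled()
gc.disable()                # automatic cycle detection off; refcounting still on
try:
    a, b = Node(), Node()
    a.other = b
    b.other = a             # a <-> b: a reference cycle
    probe = weakref.ref(a)  # watch a without keeping it alive
    del a, b                # no names left, but each node still refers to the other
    print(probe() is None)  # False: reference counting alone cannot free the cycle
    found = gc.collect()    # run the cycle detector explicitly
    print(probe() is None)  # True: the cycle has been reclaimed
finally:
    if was_enabled:
        gc.enable()
```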

Page 14: Python GC

Generational algorithm of GC

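3 generations, each with a threshold:
- generation 0 (youngest): 700 (allocations minus deallocations since the last collection)
- generation 1 (middle): 10 (collections of generation 0 since generation 1 was last examined)
- generation 2 (oldest): 10 (collections of generation 1 since generation 2 was last examined)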

>>> import gc
>>> gc.get_threshold()
(700, 10, 10)

To limit the cost of garbage collection, there are two strategies:
- make each collection faster, e.g. by scanning fewer objects
- do fewer collections

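Exception: unreachable cycles containing objects with a __del__ method are not freed; they are moved to gc.garbage instead (CPython 2.7; since Python 3.4, PEP 442 allows the collector to free them).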

In addition to the generation 2 threshold, a full collection requires the ratio long_lived_pending / long_lived_total > 25% (Python 2.7+).
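The rules above can be modelled in a few lines. The function below is a hypothetical, simplified Python rendering of the generation choice made by collect_generations() in CPython 2.7's Modules/gcmodule.c. It is not a real API. It scans from the oldest generation down and skips a full collection while long_lived_pending is below a quarter of long_lived_total.

```python
def generation_to_collect(counts, thresholds, long_lived_pending, long_lived_total):
    """Return the oldest generation whose count exceeds its threshold, or None.

    Collecting generation i also collects every younger generation.
    """
    oldest = len(counts) - 1
    for gen in range(oldest, -1, -1):
        if counts[gen] > thresholds[gen]:
            # Skip full collections until enough long-lived objects are pending,
            # to avoid quadratic behaviour with many tracked objects.
            if gen == oldest and long_lived_pending < long_lived_total // 4:
                continue
            return gen
    return None


thresholds = (700, 10, 10)
print(generation_to_collect((701, 3, 2), thresholds, 0, 0))        # 0
print(generation_to_collect((701, 11, 11), thresholds, 100, 1000))  # 1 (full skipped)
print(generation_to_collect((701, 11, 11), thresholds, 300, 1000))  # 2
```

On a live interpreter, the current counters and thresholds are available from gc.get_count() and gc.get_threshold().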

Page 15: Python GC

Py_TPFLAGS_HAVE_GC flag

>>> Py_TPFLAGS_HAVE_GC = 1 << 14
>>> bool(type(1).__flags__ & Py_TPFLAGS_HAVE_GC)
False
>>> bool(type([]).__flags__ & Py_TPFLAGS_HAVE_GC)
True
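The same distinction can be observed in Python 3. The type-level flag check is paired with gc.is_tracked(), which reports whether a particular instance is currently tracked by the collector. The flag value 1 << 14 is taken from the slide. Plain and type_has_gc are hypothetical names.

```python
import gc

Py_TPFLAGS_HAVE_GC = 1 << 14


def type_has_gc(obj):
    return bool(type(obj).__flags__ & Py_TPFLAGS_HAVE_GC)


class Plain:
    pass


for obj in (1, "text", [], Plain()):
    print(type(obj).__name__, type_has_gc(obj), gc.is_tracked(obj))
# int False False
# str False False
# list True True
# Plain True True
```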

TYPE* PyObject_GC_New(TYPE, PyTypeObject *type)
TYPE* PyObject_GC_NewVar(TYPE, PyTypeObject *type, Py_ssize_t size)
(for types with the Py_TPFLAGS_HAVE_GC flag set)

Such types also need to provide an implementation of the tp_traverse handler.

/* Adds op to the set of container objects tracked by GC */
void PyObject_GC_Track(PyObject *op)

Object types which are “containers” for other objects

C API:

Page 16: Python GC

Generation 0

[Diagram: generation 0 objects kept in a linked list]

Page 17: Python GC

[Diagram: generation 0 and generation 1]

Page 18: Python GC

Weak References

>>> import weakref
>>> class A(object): pass
...
>>> a = A()
>>> b = weakref.ref(a)
>>> weakref.getweakrefcount(a)
1
>>> p = weakref.proxy(a)
>>> b()
<__main__.A object at 0x0000000001EE64A8>
>>> del a
>>> print(b())
None
>>> b
<weakref at 0000000001E8C408; dead>
>>> p
<weakproxy at 0000000001EAC458 to NoneType at 00000001E297348>

A weak reference is a reference that does not protect the referenced object from collection by a garbage collector, unlike a strong reference.
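A common use of weak references is a cache that does not keep its entries alive. The following Python 3 sketch uses weakref.WeakValueDictionary. In CPython the entry disappears as soon as the last strong reference is dropped. Other implementations may delay this. Image is a hypothetical stand-in class.

```python
import weakref


class Image:
    """Stand-in for an expensive object (hypothetical)."""

    def __init__(self, name):
        self.name = name


cache = weakref.WeakValueDictionary()  # values are held weakly

img = Image("logo.png")
cache["logo"] = img
print("logo" in cache)  # True: a strong reference (img) still exists

del img                 # drop the only strong reference
print("logo" in cache)  # False: CPython freed the object, entry removed
```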

Page 19: Python GC

Debug

gc.DEBUG_*

gc.set_debug(gc.DEBUG_LEAK)
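gc.DEBUG_LEAK includes gc.DEBUG_SAVEALL, along with flags that print diagnostics to stderr. The following Python 3 sketch sets only DEBUG_SAVEALL so the effect is quiet: unreachable objects are appended to gc.garbage instead of being freed, where they can be inspected. Leaky is a hypothetical class. The sketch restores the previous flags and clears gc.garbage afterwards.

```python
import gc


class Leaky:
    def __init__(self, tag):
        self.tag = tag


old_flags = gc.get_debug()
gc.set_debug(gc.DEBUG_SAVEALL)   # unreachable objects go to gc.garbage
try:
    a, b = Leaky("demo"), Leaky("demo")
    a.peer, b.peer = b, a        # a reference cycle
    del a, b
    gc.collect()
    saved = [o for o in gc.garbage
             if isinstance(o, Leaky) and o.tag == "demo"]
    print(len(saved))            # 2
finally:
    gc.set_debug(old_flags)      # restore the previous debug flags
    gc.garbage.clear()           # drop everything saved (demo only)
```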

Heapy (http://guppy-pe.sourceforge.net/)

Memory profiler (https://pypi.python.org/pypi/memory_profiler)

Python Object Graphs (http://mg.pov.lt/objgraph/)

gdb-heap (https://fedorahosted.org/gdb-heap/)

Page 20: Python GC

Thank you

Page 21: Python GC

References

- http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
- http://docs.python.org/2/library/gc.html
- http://svn.python.org/view/python/trunk/Modules/gcmodule.c?revision=81029
- http://patshaughnessy.net/2013/10/30/generational-gc-in-python-and-ruby
- http://asvetlov.blogspot.ru/2008/11/blog-post.html
- http://habrahabr.ru/post/193890/
- http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
- http://foobarnbaz.com/2012/07/08/understanding-python-variables/
- http://habrahabr.ru/company/wargaming/blog/198140/
- http://en.wikipedia.org/wiki/Weak_reference

Page 22: Python GC

Q & A

@delimitry