13.4 Garbage Collection

Python's garbage collection normally proceeds transparently and automatically, but you can choose to exert some direct control. The general principle is that Python collects each object x at some time after x becomes unreachable, that is, when no chain of references can reach x by starting from a local variable of a function that is executing, nor from a global variable of a loaded module. Normally, an object x becomes unreachable when there are no references at all to x. However, a group of objects can also be unreachable even though its members still reference each other.

Classic Python (CPython) keeps in each object x a count, known as a reference count, of how many references to x are outstanding. When x's reference count drops to 0, CPython immediately collects x. Function getrefcount of module sys accepts any object and returns its reference count (at least 1, since getrefcount itself holds a reference to the object it's examining). Other versions of Python, such as Jython, rely on different garbage collection mechanisms, supplied by the platform they run on (e.g., the JVM). Modules gc and weakref therefore apply only to CPython.
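As a rough illustration of reference counting, the following sketch compares the count that sys.getrefcount reports before and after a second reference is bound. It assumes a current CPython interpreter, and the names data and alias are made up for the example. Absolute counts differ between CPython versions, so only the difference between two readings is meaningful.

```python
import sys

data = []                        # a fresh object, referenced by the name 'data'
before = sys.getrefcount(data)   # baseline reading

alias = data                     # bind a second name to the same object
after = sys.getrefcount(data)    # one more reference than the baseline

del alias                        # drop the extra reference again
restored = sys.getrefcount(data)

print("extra references while alias existed:", after - before)
print("back to baseline:", restored == before)
```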

When Python garbage-collects x and there are no references at all to x, Python then finalizes x (i.e., calls x.__del__()) and makes the memory that x occupied available for other uses. If x held any references to other objects, Python removes the references, which in turn may make other objects collectable by leaving them unreachable.

13.4.1 The gc Module

The gc module exposes the functionality of Python's garbage collector. gc deals only with objects that are unreachable in a subtle way, being part of mutual reference loops. In such a loop, each object in the loop refers to others, keeping the reference counts of all objects positive. However, an outside reference no longer exists to the whole set of mutually referencing objects. Therefore, the whole group, also known as cyclic garbage, is unreachable, and therefore garbage collectable. Looking for such cyclic garbage loops takes time, which is why module gc exists.
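The situation described above can be sketched as follows. This is only an illustration, assuming a current CPython interpreter: Node is a hypothetical class, and the weak reference probe (module weakref is covered later in this section) is used solely to observe whether the object still exists without keeping it alive.

```python
import gc
import weakref

class Node:
    """Hypothetical class, used only for this illustration."""
    def __init__(self):
        self.partner = None

was_enabled = gc.isenabled()
gc.disable()                      # keep automatic runs from interfering with the demo
try:
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # mutual references form a loop
    probe = weakref.ref(a)        # observes the object without keeping it alive
    del a, b                      # no outside reference to the loop remains

    # Reference counts alone cannot free the loop: each count is still positive.
    alive_before = probe() is not None

    gc.collect()                  # force a full cyclic collection run
    alive_after = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_before, alive_after)
```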

gc exposes functions you can use to help you keep garbage collection times under control. These functions can sometimes help you track down a memory leak (objects that are not getting collected even though there should be no more references to them) by letting you discover what other objects are in fact holding on to references to them.
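For instance, function get_referrers (described below) can reveal which container is holding on to an object. The following is a minimal sketch under assumed names: Session and registry are hypothetical stand-ins for an object that appears to leak and a forgotten cache that refers to it.

```python
import gc

class Session:
    """Hypothetical class standing in for an object that seems to leak."""

registry = {}                     # hypothetical cache that was forgotten about
session = Session()
registry["current"] = session

# Ask the collector which tracked containers refer to the object.
referrers = gc.get_referrers(session)
held_by_registry = any(r is registry for r in referrers)

print("registry still refers to session:", held_by_registry)
```

The list normally also includes the namespace in which the name session is bound, so expect more than one referrer in practice.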

collect

collect()

Forces a full cyclic collection run to happen immediately.

disable

disable()

Suspends automatic garbage collection.

enable

enable()

Re-enables automatic garbage collection previously suspended with disable.

garbage

A read-only attribute that lists the objects that are unreachable but uncollectable. Objects end up in this list if any object in a cyclic garbage loop has a __del__ special method, as there may be no safe order in which Python can finalize such objects.

get_debug

get_debug()

Returns an integer, a bit string corresponding to the garbage collection debug flags set with set_debug.

get_objects (new as of Python 2.2)

get_objects()

Returns a list whose items are all the objects currently tracked by the cyclic garbage collector.

get_referrers

get_referrers(*objs)

Returns a list whose items are all the container objects, currently tracked by the cyclic garbage collector, that refer to any one or more of the arguments.

get_threshold

get_threshold()

Returns a three-item tuple (thresh0, thresh1, thresh2) corresponding to the garbage collection thresholds set with set_threshold.

isenabled

isenabled()

Returns True if cyclic garbage collection is currently enabled. When collection is currently disabled, isenabled returns False.

set_debug

set_debug(flags)

Sets the debugging flags for garbage collection. flags is an integer, a bit string composed by ORing (with Python's normal bitwise-OR operator |) zero or more of the following constants exposed by module gc:

DEBUG_COLLECTABLE

Prints information on collectable objects found during collection

DEBUG_INSTANCES

Meaningful only if DEBUG_COLLECTABLE and/or DEBUG_UNCOLLECTABLE are also set: prints information on objects found during collection that are instances of classic Python classes

DEBUG_LEAK

The set of debugging flags that make the garbage collector print all information that can help you diagnose memory leaks, equivalent to the inclusive-OR of all other constants (except DEBUG_STATS, which serves a different purpose)

DEBUG_OBJECTS

Meaningful only if DEBUG_COLLECTABLE and/or DEBUG_UNCOLLECTABLE are also set: prints information on objects found during collection that are not instances of classic Python classes

DEBUG_SAVEALL

Saves all collectable objects to list garbage (uncollectable ones are always saved there) to help diagnose leaks

DEBUG_STATS

Prints statistics during collection to help tune the thresholds

DEBUG_UNCOLLECTABLE

Prints information on uncollectable objects found during collection

set_threshold

set_threshold(thresh0[,thresh1[,thresh2]])

Sets the thresholds that control how frequently cyclic garbage collection cycles run. If you set thresh0 to 0, garbage collection is disabled. Garbage collection is an advanced topic, and the details of the generational garbage collection approach used in Python and its thresholds are beyond the scope of this book.
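Without going into how the thresholds are tuned, the sketch below shows only the mechanics mentioned above: reading the current thresholds, disabling automatic collection by setting thresh0 to 0, and restoring the previous values. It assumes a current CPython interpreter; the variable names are made up for the example.

```python
import gc

saved = gc.get_threshold()        # three-item tuple (thresh0, thresh1, thresh2)

gc.set_threshold(0)               # thresh0 == 0 turns automatic cyclic collection off
try:
    thresholds_while_off = gc.get_threshold()
    # ... allocation-heavy work would go here ...
finally:
    gc.set_threshold(*saved)      # restore whatever was configured before

restored = gc.get_threshold()
print(saved, thresholds_while_off, restored)
```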

When you know you have no cyclic garbage loops in your program, or when you can't afford the delay of a cyclic garbage collection run at some crucial time, you can suspend automatic garbage collection by calling gc.disable(). You can enable collection again later by calling gc.enable(). You can test whether automatic collection is currently enabled by calling gc.isenabled(), which returns True or False. To control when the time needed for collection is spent, you can call gc.collect() to force a full cyclic collection run to happen immediately. An idiom for wrapping some time-critical code is therefore:

import gc
gc_was_enabled = gc.isenabled()
if gc_was_enabled:
    gc.collect()
    gc.disable()
# insert some time-critical code here
if gc_was_enabled:
    gc.enable()
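On Python versions that provide the with statement and module contextlib (both newer than the Python release this section describes), the same idiom can be packaged so that collection is re-enabled even if the time-critical code raises an exception. This is one possible sketch; gc_paused is a hypothetical helper name, not part of module gc.

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Hypothetical helper: suspend automatic cyclic collection for a block."""
    was_enabled = gc.isenabled()
    if was_enabled:
        gc.collect()      # pay the collection cost up front
        gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()   # runs even if the block raises

with gc_paused():
    state_inside = gc.isenabled()   # False while the block runs
    # insert some time-critical code here

print("enabled inside the block:", state_inside)
```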

The other functionality in module gc is more advanced and rarely used, and can be grouped into two areas. Functions get_threshold and set_threshold and the debug flag DEBUG_STATS can help you fine-tune garbage collection to optimize your program's performance. The rest of gc's functionality is there to help you diagnose memory leaks in your program. While gc itself can automatically fix many such leaks, your program will be faster if it can avoid creating them in the first place.
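As one example of the leak-diagnosis side, the flag DEBUG_SAVEALL makes the collector store the objects it finds in gc.garbage instead of freeing them, so that you can inspect them. The sketch below assumes a current CPython interpreter; Loop is a hypothetical class whose instances refer to themselves.

```python
import gc

class Loop:
    """Hypothetical class whose instances refer to themselves."""
    def __init__(self):
        self.me = self

old_flags = gc.get_debug()
gc.set_debug(gc.DEBUG_SAVEALL)
try:
    Loop()                # unreachable right away, but kept alive by its own loop
    gc.collect()          # the collector finds it and saves it in gc.garbage
    saved = [obj for obj in gc.garbage if isinstance(obj, Loop)]
finally:
    gc.set_debug(old_flags)
    del gc.garbage[:]     # empty the list so saved objects do not pile up

print("Loop instances found by the collector:", len(saved))
```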

13.4.2 The weakref Module

Careful design can often avoid reference loops. However, at times you need certain objects to know about each other, and avoiding mutual references would distort and complicate design. For example, a container has references to its items, yet it can often be useful for an object to know about some main container that holds it. The result is a reference loop: due to the mutual references, the container and items keep each other alive, even when all other objects forget about them. Weak references solve this problem by letting you have objects that mutually reference each other as long as both are alive, but do not keep each other alive.

A weak reference is a special object w that refers to some other object x without incrementing x's reference count. When x's reference count goes down to 0, Python finalizes and collects x, then informs w of x's demise. The weak reference w can now either disappear or become invalid in a controlled way. At any time, a given weak reference w refers to either the same target object x as when w was created, or to nothing at all: a weak reference is never re-targeted. Not all types of objects support being the target x of a weak reference w, but class instances and functions do.
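The behavior just described can be sketched with functions ref and proxy of module weakref, both covered below. The example assumes CPython, where the target is collected as soon as its last normal reference goes away; Resource and on_finalize are hypothetical names introduced only for illustration.

```python
import weakref

class Resource:
    """Hypothetical target class; class instances support weak references."""
    def name(self):
        return "resource"

events = []

def on_finalize(ref):
    # Called with the weak reference itself; its target is no longer reachable.
    events.append(ref() is None)

x = Resource()
w = weakref.ref(x, on_finalize)
p = weakref.proxy(x)

same_target = w() is x        # while x is alive, calling w returns x
via_proxy = p.name()          # the proxy forwards attribute access to x

del x                         # the last normal reference disappears

target_gone = w() is None     # the weak reference now refers to nothing
try:
    p.name()                  # using the proxy after its target is gone
except ReferenceError:
    proxy_expired = True
else:
    proxy_expired = False

print(same_target, via_proxy, target_gone, proxy_expired, events)
```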

Module weakref exposes functions and types to let you create and manage weak references.

getweakrefcount

getweakrefcount(x)

Returns len(getweakrefs(x)).

getweakrefs

getweakrefs(x)

Returns a list of all weak references and proxies whose target is x.

proxy

proxy(x[,f])

Returns a weak proxy p of type ProxyType (CallableProxyType, if x is callable), with object x as the target. In most contexts, using p is just like using x, except that if you use p after x has been deleted, Python raises ReferenceError. p is never hashable (therefore you cannot use p as a dictionary key), even when x is. If f is present, it must be callable with one argument, and is the finalization callback for p (i.e., right before finalizing x, Python calls f(p)). Note that when f is called, x is no longer reachable from p.

ref

ref(x[,f])

Returns a weak reference w of type ReferenceType, with object x as the target. w is callable: calling w() returns x if x is still alive; otherwise, w() returns None. w is hashable if x is hashable. You can compare weak references for equality (==, !=), but not for order (<, >, <=, >=). Two weak references w1 and w2 are equal if their targets are alive and equal, or if w1 is w2. If f is present, it must be callable with one argument, and is the finalization callback for w (i.e., right before finalizing x, Python calls f(w)). Note that when f is called, x is no longer reachable from w.

WeakKeyDictionary

class WeakKeyDictionary(adict={})

A WeakKeyDictionary d is a mapping that references its keys weakly. When the reference count of a key k in d goes to 0, item d[k] disappears. adict is used to initialize the mapping.

WeakValueDictionary

class WeakValueDictionary(adict={})

A WeakValueDictionary d is a mapping that references its values weakly. When the reference count of a value v in d goes to 0, all items of d such that d[k] is v disappear. adict is used to initialize the mapping.

WeakKeyDictionary and WeakValueDictionary are useful when you need to non-invasively associate additional data with objects without changing the objects. Weak mappings are also useful to non-invasively record transient associations between objects and to build caches. In each case, the specific consideration that can make a weak mapping preferable to a normal dictionary is that an object that is otherwise garbage-collectable is not kept alive just by being used in a weak mapping.
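A minimal sketch of the first use, associating extra data with objects without changing them, might look like this. It assumes CPython, where an entry disappears as soon as its key object is collected; Widget and notes are hypothetical names.

```python
import weakref

class Widget:
    """Hypothetical class whose instances we annotate without modifying them."""

notes = weakref.WeakKeyDictionary()

w1 = Widget()
w2 = Widget()
notes[w1] = "needs repaint"   # side data stored outside the objects themselves
notes[w2] = "hidden"

count_before = len(notes)
del w1                        # the entry for w1 vanishes along with the object
count_after = len(notes)
remaining = list(notes.values())

print(count_before, count_after, remaining)
```

With a normal dictionary, the entry for w1 would have kept that object alive for as long as the dictionary itself existed.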

A typical use could be a class that keeps track of its instances, but does not keep them alive just in order to keep track of them:

import weakref
class Tracking:
    _instances_dict = weakref.WeakValueDictionary()
    _num_generated = 0
    def __init__(self):
        Tracking._num_generated += 1
        Tracking._instances_dict[Tracking._num_generated] = self
    def instances():
        # class attributes are not in scope by bare name inside a method,
        # so the dictionary must be reached through the class
        return Tracking._instances_dict.values()
    instances = staticmethod(instances)

