8.5 The copy Module

As discussed in Chapter 4, assignment in Python does not copy the right-hand side object being assigned. Rather, assignment adds a reference to the right-hand side object. When you want a copy of object x, you can ask x for a copy of itself. If x is a list, x[:] is a copy of x. If x is a dictionary, x.copy( ) returns a copy of x.

The copy module supplies a copy function that creates and returns a copy of most types of objects. Normal copies, such as x[:] for a list x and copy.copy(x), are also known as shallow copies. When x has references to other objects (e.g., items or attributes), a normal copy of x has distinct references to the same objects. Sometimes, however, you need a deep copy, where referenced objects are copied recursively. Module copy supplies a deepcopy(x) function that performs a deep copy and returns it as the function's result.



Creates and returns a copy of x for x of most types (copies of modules, classes, frames, arrays, and internal types are not supported). If x is immutable, copy.copy(x) may return x itself as an optimization. A class can customize the way copy.copy copies its instances by having a special method _ _copy_ _(self) that returns a new object, a copy of self.



Makes a deep copy of x and returns it. Deep copying implies a recursive walk over a directed graph of references. A precaution is needed to preserve the graph's shape: when references to the same object are met more than once during the walk, distinct copies must not be made. Rather, references to the same copied object must be used. Consider the following simple example:

sublist = [1,2]
original = [sublist, sublist]
thecopy = copy.deepcopy(original)

original[0] is original[1] is True (i.e., the two items of list original refer to the same object). This is an important property of original and therefore must be preserved in anything that claims to be a copy of it. The semantics of copy.deepcopy are defined to ensure that thecopy[0] is thecopy[1] is also True in this case. In other words, the shapes of the graphs of references of original and thecopy are the same. Avoiding repeated copying has an important beneficial side effect: preventing infinite loops that would otherwise occur if the graph has cycles.

copy.deepcopy accepts a second, optional argument memo, which is a dictionary that maps the id( ) of objects already copied to the new objects that are their copies. memo is passed by recursive calls of deepcopy to itself, but you may also explicitly pass it (normally as an originally empty dictionary) if you need to keep such a correspondence map between the identities of originals and copies of objects.

A class can customize the way copy.deepcopy copies its instances by having a special method _ _deepcopy_ _(self,memo) that returns a new object, a deep copy of self. When _ _deepcopy_ _ needs to deep copy some referenced object subobject, it must do so by calling copy.deepcopy(subobject,memo). When a class has no special method _ _deepcopy_ _, copy.deepcopy on an instance of that class tries to call special methods _ _getinitargs_ _, _ _getstate_ _, and _ _setstate_ _, which are covered in Section

    Part III: Python Library and Extension Modules