7.2 Module Loading

Module-loading operations rely on attributes of the built-in sys module (covered in Chapter 8). The module-loading process described here is carried out by built-in function __import__. Your code can call __import__ directly, with the module name string as an argument. __import__ returns the module object or raises ImportError if the import fails.
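As a minimal sketch of calling __import__ directly, the following imports a standard library module by its name string and shows the exception raised for a missing module. The name no_such_module_for_demo is hypothetical and is assumed not to exist on your system.

```python
# Import a module by its name string; the module object is returned.
string_module = __import__('string')

# A failed import raises ImportError.
# 'no_such_module_for_demo' is a hypothetical name assumed not to exist.
try:
    __import__('no_such_module_for_demo')
except ImportError:
    import_failed = True
else:
    import_failed = False
```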

To import a module named M, __import__ first checks dictionary sys.modules, using string M as the key. When key M is in the dictionary, __import__ returns the corresponding value as the requested module object. Otherwise, __import__ binds sys.modules[M] to a new empty module object with a __name__ of M, then looks for the right way to initialize (load) the module, as covered in Section 7.2.2 later in this section.

Thanks to this mechanism, the loading operation takes place only the first time a module is imported in a given run of the program. When a module is imported again, the module is not reloaded, since __import__ finds and returns the module's entry in sys.modules. Thus, all imports of a module after the first one are extremely fast because they're just dictionary lookups.
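The caching behavior can be checked with a short sketch. It assumes only the standard library module string; the variable names are chosen for illustration.

```python
import sys

first = __import__('string')
second = __import__('string')

# Both calls return the very same object: the entry in sys.modules.
same_object = first is second
cached = sys.modules['string'] is first
```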

7.2.1 Built-in Modules

When a module is loaded, __import__ first checks whether the module is built-in. Built-in modules are listed in tuple sys.builtin_module_names, but rebinding that tuple does not affect module loading. A built-in module, like any other Python extension, is initialized by calling the module's initialization function. The search for built-in modules also finds frozen modules and modules in platform-specific locations (e.g., resources on the Mac, the Registry in Windows).
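A quick way to inspect the tuple is sketched below. Which names it lists, apart from a few such as sys, depends on how your interpreter was built, so the count shown here is not meaningful beyond illustration.

```python
import sys

# sys is itself a built-in module, so its name is listed in the tuple.
sys_is_builtin = 'sys' in sys.builtin_module_names

# Which other names appear depends on how the interpreter was built.
builtin_count = len(sys.builtin_module_names)
```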

7.2.2 Searching the Filesystem for a Module

If module M is not built-in or frozen, __import__ looks for M's code as a file on the filesystem. __import__ looks in the directories whose names are the items of list sys.path, in order. sys.path is initialized at program startup, using environment variable PYTHONPATH (covered in Chapter 3) if present. The first item in sys.path is always the directory from which the main program (script) is loaded. An empty string in sys.path indicates the current directory.

Your code can mutate or rebind sys.path, and such changes affect what directories __import__ searches to load modules. Changing sys.path does not affect modules that are already loaded (and thus already listed in sys.modules) when sys.path is changed.
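The following sketch illustrates both points under stated assumptions: it creates a temporary directory containing a tiny module (the name path_demo_mod is hypothetical), appends that directory to sys.path, imports the module, and then removes the directory again.

```python
import os
import sys
import tempfile

# Create a throwaway directory holding one tiny module.
# The name path_demo_mod is hypothetical, chosen only for this sketch.
demo_dir = tempfile.mkdtemp()
module_file = open(os.path.join(demo_dir, 'path_demo_mod.py'), 'w')
module_file.write('GREETING = "hello"\n')
module_file.close()

# Mutating sys.path makes the new directory searchable from now on.
sys.path.append(demo_dir)
import path_demo_mod

greeting = path_demo_mod.GREETING

# Removing the directory again does not affect the already loaded module.
sys.path.remove(demo_dir)
still_loaded = 'path_demo_mod' in sys.modules
```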

If a text file with extension .pth is found in the PYTHONHOME directory at startup, its contents are added to sys.path, one item per line. .pth files can also contain blank lines and comment lines starting with the character #, as Python ignores any such lines. .pth files can also contain import statements, which Python executes, but no other kinds of statements.
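To experiment with .pth processing without touching your installation, one option is function addsitedir of standard library module site, which applies .pth handling to a directory you pass in. This is a sketch under assumptions: the file name demo.pth and the subdirectory extra are hypothetical, and the sketch relies on site resolving a relative entry against the directory that holds the .pth file and adding only entries that exist.

```python
import os
import site
import sys
import tempfile

# Hypothetical layout: a directory holding demo.pth and a subdirectory 'extra'.
site_dir = tempfile.mkdtemp()
extra_dir = os.path.abspath(os.path.join(site_dir, 'extra'))
os.mkdir(extra_dir)

pth_file = open(os.path.join(site_dir, 'demo.pth'), 'w')
pth_file.write('# comment lines are ignored\n')
pth_file.write('extra\n')   # one sys.path item per line
pth_file.close()

# site.addsitedir processes the .pth files found in the given directory.
site.addsitedir(site_dir)
extra_added = extra_dir in sys.path
```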

When looking for the file for module M in each directory along sys.path, Python considers the following extensions in the order listed:

  1. .pyd and .dll (Windows) or .so (most Unix-like platforms), which indicate Python extension modules. (Some Unix dialects use different extensions; e.g., .sl is the extension used on HP-UX.)

  2. .py, which indicates pure Python source modules.

  3. .pyc (or .pyo, if Python is run with option -O), which indicates bytecode-compiled Python modules.

Upon finding source file M.py, Python compiles it to M.pyc (or M.pyo) unless the bytecode file is already present, is newer than M.py, and was compiled by the same version of Python. Python saves the bytecode file to the filesystem in the same directory as M.py (if permissions on the directory allow writing) so that future runs will not needlessly recompile.

Once Python has the bytecode, either from having constructed it by compilation or by reading the bytecode file from the filesystem, Python executes the module body to initialize the module object. If the module is an extension, Python calls the module's initialization function.
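Python performs the compilation step automatically on import, but standard library module py_compile also exposes it for explicit use. The sketch below assumes a hypothetical source file compile_demo.py in a temporary directory. Since the location of automatically generated bytecode files can differ between Python versions, the target path is passed explicitly.

```python
import os
import py_compile
import tempfile

# Write a tiny source module; the name compile_demo.py is hypothetical.
work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, 'compile_demo.py')
source_file = open(source_path, 'w')
source_file.write('VALUE = 6 * 7\n')
source_file.close()

# Ask explicitly for a bytecode file next to the source file.
bytecode_path = source_path + 'c'   # compile_demo.pyc
py_compile.compile(source_path, cfile=bytecode_path, doraise=True)

bytecode_saved = os.path.exists(bytecode_path)
```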

7.2.3 The Main Program

Execution of a Python application normally starts with a top-level script (also known as the main program), as explained in Chapter 3. The main program executes like any other module being loaded except that Python keeps the bytecode in memory without saving it to disk. The module name for the main program is always __main__, both as the __name__ global variable (module attribute) and as the key in sys.modules. You should not normally import the same .py file that is in use as the main program. If you do, the module is loaded again, and the module body is executed once more from the top in a separate module object with a different __name__.

Code in a Python module can test whether the module is being used as the main program by checking if global variable __name__ equals '__main__'. The idiom:

if __name__ == '__main__':

is often used to guard some code so that it executes only when the module is run as the main program. If a module is designed only to be imported, it should normally execute unit tests when it is run as the main program, as covered in Chapter 17.

7.2.4 The reload Function

As I explained earlier, Python loads a module only the first time you import the module during a program run. When you develop interactively, you need to make sure that your modules are reloaded each time you edit them (some development environments provide automatic reloading).

To reload a module, pass the module object (not the module name) as the only argument to built-in function reload. reload(M) ensures the reloaded version of M is used by client code that relies on import M and accesses attributes with the syntax M.A. However, reload(M) has no effect on other references bound to previous values of M's attributes (e.g., with the from statement). In other words, already-bound variables remain bound as they were, unaffected by reload. reload's inability to rebind such variables is a further incentive to avoid from.
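The difference between the two styles of access can be seen in the following sketch. It makes several assumptions: the module name reload_demo_mod is hypothetical, the module is rewritten on disk to simulate an edit, bytecode writing is switched off to keep the demonstration free of stale bytecode files, and the fallback import covers later Python versions in which reload is found in module importlib rather than among the built-ins.

```python
import os
import sys
import tempfile

try:
    reload                          # built-in, as described in this section
except NameError:
    from importlib import reload    # assumption: location in later versions

sys.dont_write_bytecode = True      # keep the sketch free of stale bytecode

def write_module(path, value):
    module_file = open(path, 'w')
    module_file.write('A = %d\n' % value)
    module_file.close()

demo_dir = tempfile.mkdtemp()
module_path = os.path.join(demo_dir, 'reload_demo_mod.py')
write_module(module_path, 1)
sys.path.append(demo_dir)

import reload_demo_mod
from reload_demo_mod import A       # binds A to the current value, 1

write_module(module_path, 22)       # simulate editing the module
reload(reload_demo_mod)

via_module = reload_demo_mod.A      # sees the reloaded value
via_from = A                        # still bound to the old value
```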

7.2.5 Circular Imports

Python lets you specify circular imports. For example, you can write a module a.py that contains import b, while module b.py contains import a. In practice, you are typically better off avoiding circular imports, since circular dependencies are fragile and hard to manage. If you decide to use a circular import for some reason, you need to understand how circular imports work in order to avoid errors in your code.

Say that the main script executes import a. As discussed earlier, this import statement creates a new empty module object as sys.modules['a'], and then the body of module a starts executing. When a executes import b, this creates a new empty module object as sys.modules['b'], and then the body of module b starts executing. The execution of a's module body is now suspended until b's module body finishes.

Now, when b executes import a, the import statement finds sys.modules['a'] already defined and therefore binds global variable a in module b to the module object for module a. Since the execution of a's module body is currently suspended, module a may be only partly populated at this time. If the code in b's module body tries to access some attribute of module a that is not yet bound, an error results.
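The partly populated state can be observed with a sketch like the following. The module names circ_a and circ_b are hypothetical stand-ins for a and b, written to a temporary directory. To avoid raising an error, circ_b records with hasattr whether the attribute is bound yet instead of accessing it.

```python
import os
import sys
import tempfile

def write_file(path, text):
    out = open(path, 'w')
    out.write(text)
    out.close()

demo_dir = tempfile.mkdtemp()
# circ_a imports circ_b before binding X.
write_file(os.path.join(demo_dir, 'circ_a.py'),
           'import circ_b\nX = 1\n')
# circ_b imports circ_a back and records whether X is bound yet.
write_file(os.path.join(demo_dir, 'circ_b.py'),
           'import circ_a\nSAW_X = hasattr(circ_a, "X")\n')

sys.path.append(demo_dir)
import circ_a
import circ_b

saw_x_during_import = circ_b.SAW_X   # circ_a was only partly populated
x_after_import = circ_a.X            # bound once circ_a's body resumed
```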

If you do insist on keeping a circular import in some case, you must carefully manage the order in which each module defines its own globals, imports the other module, and accesses the globals of the other module. Generally, you can have greater control on the sequence in which things happen by grouping your statements into functions and calling those functions in a controlled order, rather than just relying on sequential execution of top-level statements in module bodies. However, removing circular dependencies is almost always easier than ensuring bomb-proof ordering while keeping such circular dependencies.
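One way to apply this advice is sketched below, with hypothetical module names defer_a and defer_b. Module defer_b does not touch defer_a's globals at import time. It only defines a function that reads them when called, and the call happens after both module bodies have finished.

```python
import os
import sys
import tempfile

def write_file(path, text):
    out = open(path, 'w')
    out.write(text)
    out.close()

demo_dir = tempfile.mkdtemp()
write_file(os.path.join(demo_dir, 'defer_a.py'),
           'import defer_b\nX = 1\n')
# defer_b accesses defer_a.X only inside a function, not at import time.
write_file(os.path.join(demo_dir, 'defer_b.py'),
           'import defer_a\n\ndef get_x():\n    return defer_a.X\n')

sys.path.append(demo_dir)
import defer_a
import defer_b

# Both module bodies have completed, so the attribute is now bound.
deferred_value = defer_b.get_x()
```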

7.2.6 sys.modules Entries

The built-in __import__ function never binds anything other than a module object as a value in sys.modules. However, if __import__ finds an entry that is already in sys.modules, it will try to use that value, whatever type of object it may be. The import and from statements rely on the __import__ function, so they too can end up using objects that are not modules. This lets you set class instances as entries in sys.modules, and take advantage of features such as their __getattr__ and __setattr__ special methods, covered in Chapter 5. This advanced technique lets you import module-like objects whose attributes can in fact be computed on the fly. Here's a trivial toy-like example:

class TT:
    def __getattr__(self, name): return 23
import sys
sys.modules[__name__] = TT()

When you import this code as a module, you get a module-like object that appears to have any attribute name you try to get from it, and all attribute names correspond to the integer value 23.
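To try this out, the toy code can be saved as a file and imported, as in the following sketch. The file name tt_demo.py and the attribute names spam and eggs are hypothetical.

```python
import os
import sys
import tempfile

# Save the toy example as a module; the name tt_demo is hypothetical.
demo_dir = tempfile.mkdtemp()
module_file = open(os.path.join(demo_dir, 'tt_demo.py'), 'w')
module_file.write(
    'class TT:\n'
    '    def __getattr__(self, name): return 23\n'
    'import sys\n'
    'sys.modules[__name__] = TT()\n'
)
module_file.close()

sys.path.append(demo_dir)
import tt_demo

# Any attribute name at all yields the same value.
values = (tt_demo.spam, tt_demo.eggs)
```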

7.2.7 Custom Importers

You can rebind the __import__ attribute of module __builtin__ to your own custom importer function by wrapping the __import__ function using the technique shown earlier in this chapter. Such rebinding influences all import and from statements that execute after the rebinding. A custom importer must implement the same interface as the built-in __import__, and is often implemented with some help from the functions exposed by built-in module imp. Custom importer functions are an advanced and rarely used technique.
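As a rough sketch of such wrapping, the following custom importer records every requested module name and then delegates to the original function. The names logging_import and seen are hypothetical. The fallback import is an assumption covering later Python versions, in which module __builtin__ is named builtins. The original function is restored afterward so that the rebinding does not leak into other code.

```python
try:
    import __builtin__ as builtins_module   # name used in this chapter
except ImportError:
    import builtins as builtins_module      # assumption: later versions

seen = []
original_import = builtins_module.__import__

def logging_import(name, *args, **kwargs):
    # Record the requested name, then delegate to the original importer.
    seen.append(name)
    return original_import(name, *args, **kwargs)

builtins_module.__import__ = logging_import
try:
    import string    # this statement now goes through logging_import
finally:
    # Always restore the original importer.
    builtins_module.__import__ = original_import
```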


