The distutils are a rich and flexible set of tools to package Python programs and extensions for distribution to third parties. I cover typical, simple use of the distutils for the most common packaging needs. For in-depth, highly detailed discussion of distutils, I recommend two manuals that are part of Python's online documentation: Distributing Python Modules (available at http://www.python.org/doc/current/dist/), and Installing Python Modules (available at http://www.python.org/doc/current/inst/), both by Greg Ward, the principal author of the distutils.
A distribution is the set of files to package into a single file for distribution purposes. A distribution may include zero, one, or more Python packages and other Python modules (as covered in Chapter 7), as well as, optionally, Python scripts, C-coded (and other) extensions, supporting data files, and auxiliary files containing metadata about the distribution itself. A distribution is said to be pure if all code it includes is Python, and non-pure if it also includes non-Python code (most often, C-coded extensions).
You should normally place all the files of a distribution in a directory, known as the distribution root directory, and in subdirectories of the distribution root. Mostly, you can arrange the subtree of files and directories rooted at the distribution root to suit your own organizational needs. However, remember from Chapter 7 that a Python package must reside in its own directory, and a package's directory must contain a file named __init__.py (or subdirectories with __init__.py files, for subpackages) as well as other modules belonging to that package.
The distribution root directory must contain a Python script that by convention is named setup.py. The setup.py script can, in theory, contain arbitrary Python code. However, in practice, setup.py always boils down to some variation of:
from distutils.core import setup, Extension
setup( many keyword arguments go here )
All the action is in the parameters you supply in the call to setup. You should not import Extension if your setup.py deals with a pure distribution. Extension is needed only for non-pure distributions, and you should import it only when you need it. It is fine to have a few statements before the call to setup, in order to arrange setup's arguments in clearer and more readable ways than could be managed by having everything inline as part of the setup call.
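As a minimal sketch of this pattern, the following setup.py for a pure distribution arranges its arguments in a few statements before the call to setup, and does not import Extension. The names mydist, mypackage, and mymodule, and all the values, are hypothetical placeholders, and wrapping the call in a function named main is just one possible arrangement.

```python
# Sketch of a setup.py for a pure distribution: Extension is not imported.
# All names and values below are hypothetical placeholders.

# Arrange setup's arguments first, grouped for readability.
metadata = dict(
    name="mydist",
    version="0.1",
    description="A short, one-line summary of mydist",
)
contents = dict(
    packages=["mypackage"],    # expects mypackage/__init__.py
    py_modules=["mymodule"],   # expects mymodule.py
)


def build_setup_arguments():
    """Merge the groups of keyword arguments into one dictionary."""
    arguments = {}
    arguments.update(metadata)
    arguments.update(contents)
    return arguments


def main():
    # Imported here so the arguments can be inspected without
    # running any distutils command.
    from distutils.core import setup
    setup(**build_setup_arguments())
```

An actual setup.py would end with a call to main(), so that a command such as python setup.py install reaches setup.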
The distutils.core.setup function accepts only keyword arguments, and there are a large number of such arguments that you could potentially supply. A few deal with the internal operations of the distutils themselves, and you never supply such arguments unless you are extending or debugging the distutils, an advanced subject that I do not cover in this book. Other keyword arguments to setup fall into two groups: metadata about the distribution, and information about what files are in the distribution.
You should provide metadata about the distribution by supplying some of the following keyword arguments when you call the distutils.core.setup function. The value you associate with each argument name you supply is a string that is intended mostly to be human-readable; therefore, any specifications about the string's format are just advisory. The explanations and recommendations about the metadata fields in the following are also non-normative, and correspond only to common, not universal, conventions. Whenever the following explanations refer to "this distribution," it can be taken to refer to the material included in the distribution, rather than to the packaging of the distribution.
author
The name(s) of the author(s) of material included in the distribution. You should always provide this information, as the authors deserve credit for their work.
author_email
Email address(es) of the author(s) named in argument author. You should provide this information only if the author is willing to receive email about this work.
contact
The name of the principal contact person or mailing list for this distribution. You should provide this information if there is somebody who should be contacted in preference to people named in arguments author and maintainer.
contact_email
Email address of the contact named in argument contact. You should provide this information if and only if you supply the contact argument.
description
A concise description of this distribution, preferably fitting within one line of 80 characters or fewer. You should always provide this information.
fullname
The full name of this distribution. You should provide this information if the name supplied as argument name is in abbreviated or incomplete form (e.g., an acronym).
keywords
A list of keywords that would likely be searched for by somebody looking for the functionality provided by this distribution. You should provide this information if it might be useful to index this distribution in some kind of search engine.
license
The licensing terms of this distribution, in a concise form that may refer for details to a file in the distribution or to a URL. You should always provide this information.
maintainer
The name(s) of the current maintainer(s) of this distribution. You should normally provide this information if the maintainer is different from the author.
maintainer_email
Email address(es) of the maintainer(s) named in argument maintainer. You should provide this information only if you supply the maintainer argument and if the maintainer is willing to receive email about this work.
name
The name of this distribution as a valid Python identifier (this often requires abbreviations, e.g., by an acronym). You should always provide this information.
platforms
A list of platforms on which this distribution is known to work. You should provide this information if you have reasons to believe this distribution may not work everywhere. This information should be reasonably concise, so this field may refer for details to a file in the distribution or to a URL.
url
A URL at which more information can be found about this distribution. You should always provide this information if any such URL exists.
version
The version of this distribution and/or its contents, normally structured as major.minor or even more finely. You should always provide this information.
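The recommendations above can be checked mechanically before calling setup. The following sketch assumes the metadata is kept in a dictionary keyed by the setup keyword names author, description, license, name, and version. The helper missing_metadata and all the values shown are hypothetical, not part of the distutils.

```python
# Hypothetical metadata for illustration; every value is a placeholder.
metadata = {
    "name": "mydist",
    "version": "0.1",
    "description": "Tools for frobnicating widgets",
    "author": "A. N. Author",
    "author_email": "author@example.com",
    "license": "See LICENSE.txt in the distribution",
    "url": "http://www.example.com/mydist/",
}

# The fields that the text recommends you always provide.
ALWAYS_PROVIDE = ("author", "description", "license", "name", "version")


def missing_metadata(metadata):
    """Return the always-recommended fields that are absent or empty."""
    return [field for field in ALWAYS_PROVIDE if not metadata.get(field)]


def description_fits(metadata, width=80):
    """True if the description is a single line of at most width characters."""
    description = metadata.get("description", "")
    return "\n" not in description and len(description) <= width
```

Since the metadata conventions are advisory, such a check is only a convenience for the packager: setup itself does not require it.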
A distribution can contain a mix of Python source files, C-coded extensions, and other files. setup accepts optional keyword arguments detailing files to put in the distribution. Whenever you specify file paths, the paths must be relative to the distribution root directory and use / as the path separator. distutils adapts location and separator appropriately when it installs the distribution. Note, however, that the keyword arguments packages and py_modules do not list file paths, but rather Python packages and modules respectively. Therefore, in the values of these keyword arguments, use no path separators or file extensions. When you list subpackage names in argument packages, use Python syntax (e.g., top_package.sub_package).
By default, setup looks for Python modules (which you list in the value of the keyword argument py_modules) in the distribution root directory, and for Python packages (which you list in the value of the keyword argument packages) as sub-directories of the distribution root directory. You may specify keyword argument package_dir to change these defaults. However, things are simpler when you locate files according to setup's defaults, so I do not cover package_dir further in this book.
The setup keyword arguments you will most frequently use to detail what Python source files to put in the distribution are the following.
packages
packages=[ list of package name strings ]
For each package name string p in the list, setup expects to find a subdirectory p in the distribution root directory, and includes in the distribution the file p/__init__.py, which must be present, as well as any other file p/*.py (i.e., all the modules of package p). setup does not search for subpackages of p: you must explicitly list all subpackages, as well as top-level packages, in the value of keyword argument packages.
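Because setup does not search for subpackages, a setup.py with many subpackages may compute the list rather than spell it out. The following is a minimal sketch of one way to do so. The helper list_packages is hypothetical, not part of the distutils, and it treats a directory as a package only if it contains a file named __init__.py.

```python
import os


def list_packages(root, top_package):
    """Return dotted names for top_package and all of its subpackages.

    root is the distribution root directory. A directory counts as a
    package only if it contains a file named __init__.py.
    """
    names = []
    top_dir = os.path.join(root, *top_package.split("."))
    for dirpath, dirnames, filenames in os.walk(top_dir):
        if "__init__.py" not in filenames:
            dirnames[:] = []   # not a package: do not descend further
            continue
        dirnames.sort()        # visit subdirectories in a stable order
        relative = os.path.relpath(dirpath, root)
        # Package names use Python syntax, not path separators.
        names.append(relative.replace(os.sep, "."))
    return names
```

The result can be passed directly as the value of keyword argument packages, for example packages=list_packages('.', 'top_package').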
py_modules
py_modules=[ list of module name strings ]
For each module name string m in the list, setup expects to find a file m.py in the distribution root directory, and includes m.py in the distribution.
scripts
scripts=[ list of script file path strings ]
Scripts are Python source files meant to be run as main programs (generally from the command line). The value of the scripts keyword lists the path strings of these files, complete with .py extension, relative to the distribution root directory.
Each script file should have as its first line a shebang line, that is, a line starting with #! and containing the substring python. When the distutils install the scripts included in the distribution, they adjust each script's first line to point to the Python interpreter. This is quite useful on many platforms, since the shebang line is used by the platform's shells or by other programs that may run your scripts, such as web servers.
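The shebang convention just described is easy to verify before packaging. The following sketch checks only the two conditions stated above (the first line starts with #! and contains the substring python). Both helper functions are hypothetical, not part of the distutils.

```python
def has_python_shebang(first_line):
    """True if the line starts with #! and mentions python."""
    return first_line.startswith("#!") and "python" in first_line


def scripts_missing_shebang(script_paths):
    """Return those script paths whose first line is not a Python shebang."""
    missing = []
    for path in script_paths:
        with open(path) as script:
            if not has_python_shebang(script.readline()):
                missing.append(path)
    return missing
```

You might call scripts_missing_shebang on the same list you pass as the value of keyword argument scripts, and warn about any paths it returns.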
To put data files of any kind in the distribution, supply the following keyword argument.
data_files
data_files=[ list of pairs (target_directory,[list of files]) ]
The value of keyword argument data_files is a list of pairs. Each pair's first item is a string and names a target directory (i.e., a directory where distutils places data files when installing the distribution); the second item is the list of file path strings for files to put in the target directory. At installation time, distutils places each target directory as a subdirectory of Python's sys.prefix for a pure distribution, or of Python's sys.exec_prefix for a non-pure distribution. distutils places the given files directly in the respective target directory, never in subdirectories of the target. For example, given the following data_files usage:
data_files = [ ('miscdata', ['conf/config.txt', 'misc/sample.txt']) ]
distutils includes in the distribution the file config.txt from subdirectory conf of the distribution root, and the file sample.txt from subdirectory misc of the distribution root. At installation time, distutils creates a subdirectory named miscdata in Python's sys.prefix directory (or in the sys.exec_prefix directory, if the distribution is non-pure), and copies the two files into miscdata/config.txt and miscdata/sample.txt.
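The mapping from source paths to installed paths can be sketched in a few lines. The function installed_data_paths below is a hypothetical illustration of the rule just described, not the code the distutils use, and the prefix /usr/local merely stands in for whatever sys.prefix (or sys.exec_prefix) is on the installing machine.

```python
import posixpath

# The data_files value from the example above.
data_files = [('miscdata', ['conf/config.txt', 'misc/sample.txt'])]


def installed_data_paths(data_files, prefix):
    """Map each source file path to the path where it would be installed."""
    destinations = {}
    for target_directory, files in data_files:
        for source in files:
            # Files land directly in the target directory, so only the
            # base name of the source path is kept.
            destinations[source] = posixpath.join(
                prefix, target_directory, posixpath.basename(source))
    return destinations
```

With prefix '/usr/local', the two files map to /usr/local/miscdata/config.txt and /usr/local/miscdata/sample.txt: the source subdirectories conf and misc do not survive installation.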
To put C-coded extensions in the distribution, supply the following keyword argument.
ext_modules
ext_modules=[ list of instances of class Extension ]
All the details about each extension are supplied as arguments when instantiating the distutils.core.Extension class.
Extension's constructor accepts two mandatory arguments and many optional keyword arguments, as follows.
Extension
class Extension(name, sources, **kwds)
name is the module name string for the C-coded extension. name may include dots to indicate that the extension module resides within a package. sources is the list of source files that the distutils must compile and link in order to build the extension. Each item of sources is a string giving a source file's path relative to the distribution root directory, complete with file extension .c. kwds lets you pass other, optional arguments to Extension, as covered later in this section.
The Extension class also supports other file extensions besides .c, indicating other languages you may use to code Python extensions. On platforms having a C++ compiler, file extension .cpp indicates C++ source files. Other file extensions that may be supported, depending on the platform and on add-ons to the distutils that are still in experimental stages at the time of this writing, include .f for Fortran, .i for SWIG, and .pyx for Pyrex files. See Chapter 24 for information about using different languages to extend Python.
In some cases, your extension needs no further information besides mandatory arguments name and sources. The distutils implicitly perform all that is necessary to make the Python headers directory and the Python library available for your extension's compilation and linking, and also provide whatever compiler or linker flags or options are needed to build extensions on a given platform.
When it takes additional information to compile and link your extension correctly, you can supply such information via the keyword arguments of class Extension. Such arguments may potentially interfere with the cross-platform portability of your distribution. In particular, whenever you specify file or directory paths as the values of such arguments, the paths should be relative to the distribution root directory; using absolute paths seriously impairs your distribution's cross-platform portability.
Portability is not a problem when you just use the distutils as a handy way to build your extension, as suggested in Chapter 24. However, when you plan to distribute your extensions to other platforms, you should examine whether you really need to provide build information via keyword arguments to Extension. It is sometimes possible to bypass such needs by careful coding at the C level, and the already mentioned Distributing Python Modules manual provides important examples.
The keyword arguments that you may pass when calling Extension are the following:
Each of the items macro_name and macro_value, in the pairs listed as the value of define_macros, is a string, respectively the name and value for a C preprocessor macro definition, equivalent in effect to the C preprocessor directive:
#define macro_name macro_value
macro_value can also be None, to get the same effect as the C preprocessor directive:
#define macro_name
Each of the strings compile_arg listed as the value of extra_compile_args is placed among the command-line arguments for each invocation of the C compiler.
Each of the strings link_arg listed as the value of extra_link_args is placed among the command-line arguments for the invocation of the linker.
Each of the strings object_name listed as the value of extra_objects names an object file to add to the invocation of the linker. Do not specify the file extension as part of the object name: distutils adds the platform-appropriate file extension (such as .o on Unix-like platforms and .obj on Windows) to help you keep cross-platform portability.
Each of the strings directory_path listed as the value of include_dirs identifies a directory to supply to the compiler as one where header files are found.
Each of the strings library_name listed as the value of libraries names a library to add to the invocation of the linker. Do not specify the file extension or any prefix as part of the library name: distutils, in cooperation with the linker, adds the platform-appropriate prefix and file extension (such as the prefix lib and the extension .a on Unix-like platforms, and the extension .lib on Windows) to help you keep cross-platform portability.
Each of the strings directory_path listed as the value of library_dirs identifies a directory to supply to the linker as one where library files are found.
Each of the strings directory_path listed as the value of runtime_library_dirs identifies a directory where dynamically loaded libraries are found at runtime.
Each of the strings macro_name listed as the value of undef_macros is the name for a C preprocessor macro definition, equivalent in effect to the C preprocessor directive:
#undef macro_name
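To make these keyword arguments concrete, here is a minimal sketch of the arguments for one extension, together with a helper that spells out the preprocessor directives to which define_macros and undef_macros are equivalent. The module, file, macro, and library names are all hypothetical placeholders, and preprocessor_directives is an illustration only: the distutils pass macros to the compiler as command-line options rather than generating such lines.

```python
# Hypothetical extension: every name below is a placeholder.
extension_arguments = dict(
    name="mypackage.fastops",          # dots place the module in a package
    sources=["src/fastops.c"],         # relative to the distribution root
    define_macros=[("USE_FAST_PATH", "1"), ("NDEBUG", None)],
    undef_macros=["DEBUG_TRACE"],
    include_dirs=["include"],          # relative path, for portability
    libraries=["m"],                   # no prefix, no file extension
)


def preprocessor_directives(define_macros=(), undef_macros=()):
    """Return the C directives equivalent to the given macro arguments."""
    lines = []
    for macro_name, macro_value in define_macros:
        if macro_value is None:
            lines.append("#define %s" % macro_name)
        else:
            lines.append("#define %s %s" % (macro_name, macro_value))
    for macro_name in undef_macros:
        lines.append("#undef %s" % macro_name)
    return lines


def make_extension():
    # Imported here so the arguments can be inspected on their own.
    from distutils.core import Extension
    return Extension(**extension_arguments)
```

In a setup.py, the instance returned by make_extension would be an item of the list passed as the value of keyword argument ext_modules.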
The distutils let the user who is installing your distribution specify many options at installation time. Most often the user will simply enter the following command at a command line:
C:\> python setup.py install
but the already mentioned manual Installing Python Modules explains many alternatives in detail. If you wish to provide suggested values for some installation options, you can put a setup.cfg file in your distribution root directory. setup.cfg can also provide appropriate defaults for options you can supply to build-time commands. For copious details on the format and contents of file setup.cfg, see the already mentioned manual Distributing Python Modules.
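A setup.cfg file is organized in sections, each named after a distutils command and holding option settings for that command. As a small sketch, the following code holds the text of a possible setup.cfg and reads it back with the standard library module configparser. The particular defaults chosen here (building extensions in place, and optimizing bytecode at installation) are only examples, and read_command_defaults is a hypothetical helper.

```python
import configparser

# Example contents for a setup.cfg file in the distribution root.
SETUP_CFG = """\
[build_ext]
inplace = 1

[install]
optimize = 1
"""


def read_command_defaults(text):
    """Return a dictionary mapping each command name to its option defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser.items(section))
            for section in parser.sections()}
```

Values given in setup.cfg remain suggestions: options that the user supplies on the command line take precedence.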
When you run:
python setup.py sdist
to produce a packaged-up source distribution (typically a .zip file on Windows, or a .tgz file, also known as a tarball, on Unix), the distutils by default insert the following in the distribution:
All Python and C source files, as well as data files, explicitly mentioned or directly implied by your setup.py file's options, as covered earlier in this chapter
Test files, located at test/test*.py under the distribution root directory
Files README.txt (if any), setup.cfg (if any), and setup.py
You can add yet more files in the source distribution .zip file or tarball by placing in the distribution root directory a manifest template file named MANIFEST.in, whose lines are rules, applied sequentially, about files to add (include) or subtract (prune) from the overall list of files to place in the distribution. The sdist command of the distutils also produces an exact list of the files placed in the source distribution as a text file named MANIFEST in the distribution root directory.
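As an illustration of such rules, the following sketch holds the text of a small manifest template and splits it into (command, arguments) pairs. The file patterns and directory names are hypothetical, and parse_manifest_template is a simplified stand-in for the processing that the sdist command performs, useful only for checking a template for misspelled commands.

```python
# Example contents for MANIFEST.in; patterns and directories are placeholders.
MANIFEST_TEMPLATE = """\
include *.txt
recursive-include docs *.txt *.html
prune docs/build
"""

# The commands that a manifest template line may start with.
KNOWN_COMMANDS = frozenset([
    "include", "exclude",
    "recursive-include", "recursive-exclude",
    "global-include", "global-exclude",
    "graft", "prune",
])


def parse_manifest_template(text):
    """Return the rules of a manifest template, in order of application."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue               # skip blank lines and comments
        words = line.split()
        command, arguments = words[0], words[1:]
        if command not in KNOWN_COMMANDS:
            raise ValueError("unknown manifest command: %r" % command)
        rules.append((command, arguments))
    return rules
```

Since rules are applied sequentially, order matters: here the final prune removes docs/build even though the preceding recursive-include may have added files from it.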
The packaged source distributions you create with python setup.py sdist are the most widely useful files you can produce with distutils. However, you can make life even easier for users with specific platforms by also creating prebuilt forms of your distribution with the command python setup.py bdist.
For a pure distribution, supplying prebuilt forms is merely a matter of convenience for the users. You can create prebuilt pure distributions for any platform, including ones different from those on which you work, as long as you have available on your path the needed commands (such as zip, gzip, bzip2, and tar). Such commands are freely available on the Net for all sorts of platforms, so you can easily stock up on them in order to provide maximum convenience to users who want to install your distribution.
For a non-pure distribution, making prebuilt forms available may be more than just an issue of convenience. A non-pure distribution, by definition, includes code that is not pure Python, generally C code. Unless you supply a prebuilt form, users need to have the appropriate C compiler installed in order to build and install your distribution. This is not a terrible problem on platforms where the appropriate C compiler is the free and ubiquitous gcc. However, on other platforms, the C compiler needed for normal building of Python extensions is commercial and costly. For example, on Windows, the normal C compiler used by Python and its C-coded extensions is Microsoft Visual C++ (Release 6, at the time of this writing). It is possible to substitute other compilers, including free ones such as the mingw32 and cygwin versions of gcc, and Borland C++ 5.5, whose command-line version you can download from the Net at no cost. However, the process of using such alternative compilers, as documented in the Python online manuals, is rather complex and intricate, particularly for end users who may not be experienced programmers.
Therefore, if you want your non-pure distribution to be widely adopted on such platforms as Windows, it's highly advisable to make your distribution also available in prebuilt form. However, unless you have developed or purchased advanced cross-compilation environments, building a non-pure distribution and packaging it up in prebuilt form is only feasible on the target platform. You also need to have the necessary C compiler installed. When those conditions are satisfied, however, the distutils make the procedure quite simple. In particular, the command:
python setup.py bdist_wininst
creates an .exe file that is a Windows installer for your distribution. If your distribution is non-pure, the prebuilt distribution is dependent on the specific Python version. The distutils reflect this fact in the name of the .exe installer they create for you. Say, for example, that your distribution's name metadata is mydist, your distribution's version metadata is 0.1, and the Python version you use is 2.2. In this case, the distutils build a Windows installer named mydist-0.1.win32-py2.2.exe.
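The naming scheme in this example can be expressed as a one-line function. The helper wininst_name below is hypothetical, not part of the distutils, and covers only the non-pure case described above.

```python
def wininst_name(name, version, python_version, platform="win32"):
    """Build the installer file name for a non-pure distribution."""
    return "%s-%s.%s-py%s.exe" % (name, version, platform, python_version)


installer = wininst_name("mydist", "0.1", "2.2")
```

Here installer is the string 'mydist-0.1.win32-py2.2.exe', matching the name given in the text.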