Using the os module, you can manipulate the filesystem in a variety of ways: creating, copying, and deleting files and directories, comparing files, and examining filesystem information about files and directories. This section documents the attributes and methods of the os module that you use for these purposes, and also covers some related modules that operate on the filesystem.
A file or directory is identified by a string, known as its path, whose syntax depends on the platform. On both Unix-like and Windows platforms, Python accepts Unix syntax for paths, with slash (/) as the directory separator. On non-Unix-like platforms, Python also accepts platform-specific path syntax. On Windows, for example, you can use backslash (\) as the separator. However, you do need to double up each backslash to \\ in normal string literals or use raw-string syntax as covered in Chapter 4. In the rest of this chapter, for brevity, Unix syntax is assumed in both explanations and examples.
Module os supplies attributes that provide details about path strings on the current platform. You should typically use the higher-level path manipulation operations covered in Section 10.2.4 later in this chapter, rather than lower-level string operations based on these attributes. However, the attributes may still be useful at times:
curdir: The string that denotes the current directory ('.' on Unix and Windows)
defpath: The default search path used if the environment lacks a PATH environment variable
linesep: The string that terminates text lines ('\n' on Unix, '\r\n' on Windows)
extsep: The string that separates the extension part of a file's name from the rest of the name ('.' on Unix and Windows)
pardir: The string that denotes the parent directory ('..' on Unix and Windows)
pathsep: The separator between paths in lists of paths, such as those used for the environment variable PATH (':' on Unix, ';' on Windows)
sep: The separator of path components ('/' on Unix, '\\' on Windows)
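As a quick illustration, the following sketch collects these attributes for whatever platform runs it. The helper name path_attributes is hypothetical (it is not part of os), and the code is written for current Python versions rather than the Python 2.2 syntax used elsewhere in this chapter.

```python
import os

def path_attributes():
    """Collect the path-syntax attributes of os for the current platform."""
    # path_attributes is an illustrative helper, not part of the os module
    names = ['curdir', 'defpath', 'extsep', 'linesep', 'pardir', 'pathsep', 'sep']
    return dict((name, getattr(os, name)) for name in names)

for name, value in sorted(path_attributes().items()):
    # repr makes characters such as '\n' and '\\' visible
    print('os.%s = %r' % (name, value))
```

The values differ between platforms, which is why the higher-level functions of os.path are usually a better choice than string operations built on these attributes.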
Unix-like platforms associate nine bits with each file or directory, three each for the file's owner (user), its group, and anybody else, indicating whether the file or directory can be read, written, and executed by the specified subject. These nine bits are known as the file's permission bits, part of the file's mode (a bit string that also includes other bits describing the file). These bits are often displayed in octal notation, which groups three bits in each digit. For example, a mode of 0664 indicates a file that can be read and written by its owner and group, but only read, not written, by anybody else. When any process on a Unix-like system creates a file or directory, the operating system applies to the specified mode a bit mask known as the process's umask, which can remove some of the permission bits.
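The arithmetic behind permission bits and the umask can be sketched as follows. Both function names are hypothetical helpers introduced only for illustration, and the octal literals use the 0o prefix of current Python versions (this chapter writes the same values as 0664 and so on).

```python
def permission_string(mode):
    """Render the nine permission bits of mode in the style of ls -l."""
    letters = 'rwxrwxrwx'
    result = []
    for i, letter in enumerate(letters):
        bit = 1 << (8 - i)          # most significant permission bit first
        result.append(letter if mode & bit else '-')
    return ''.join(result)

def apply_umask(requested_mode, umask):
    """Return the mode that remains after the umask removes its bits."""
    return requested_mode & ~umask

print(permission_string(0o664))                       # rw-rw-r--
print(permission_string(apply_umask(0o666, 0o022)))   # rw-r--r--
```

With a umask of 022, a request for mode 666 therefore yields a file that its owner can read and write, while group and others can only read it.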
Non-Unix-like platforms handle file and directory permissions in very different ways. However, the functions in Python's standard library that deal with permissions accept a mode argument according to the Unix-like approach described in the previous paragraph. The implementation on each platform maps the nine permission bits in a way appropriate for the given platform. For example, on versions of Windows that distinguish only between read-only and read-write files and do not distinguish file ownership, a file's permission bits show up as either 0666 (read-write) or 0444 (read-only). On such a platform, when a file is created, the implementation looks only at bit 0200 (the owner's write permission), making the file read-write if that bit is 1 or read-only if that bit is 0.
The os module supplies several functions to query and set file and directory status.
access |
access(path,mode) |
Returns True if file path has all of the permissions encoded in integer mode, otherwise False. mode can be os.F_OK to test for file existence, or one or more of os.R_OK, os.W_OK, and os.X_OK (with the bitwise-OR operator | joining them if more than one) to test permissions to read, write, and execute the file.
access does not use the standard interpretation for its mode argument, covered in Section 10.2.2 earlier in this chapter. access tests only if this specific process's real user and group identifiers have the requested permissions on the file. If you need to study a file's permission bits in more detail, see function stat in this section.
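A minimal sketch of how os.access might be used follows. The helper describe_access and the scratch file are assumptions made for the example, and the code targets current Python versions.

```python
import os
import tempfile

def describe_access(path):
    """Return the names of the os.access tests that path passes."""
    # describe_access is a hypothetical helper, not part of os
    checks = [('exists', os.F_OK), ('readable', os.R_OK),
              ('writable', os.W_OK), ('executable', os.X_OK)]
    return [name for name, flag in checks if os.access(path, flag)]

# Demonstrate on a scratch file created just for this example
fd, scratch = tempfile.mkstemp()
os.close(fd)
try:
    print(describe_access(scratch))
    # Several permissions can be tested at once with bitwise OR
    print(os.access(scratch, os.R_OK | os.W_OK))
finally:
    os.remove(scratch)
```

Because the answer reflects only the moment of the call, a later attempt to open the file can still fail and should still be guarded.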
chdir |
chdir(path) |
Sets the current working directory to path.
chmod |
chmod(path,mode) |
Changes the permissions of file path, as encoded in integer mode. mode can be zero or more of os.R_OK, os.W_OK, and os.X_OK (with the bitwise-OR operator | joining them if more than one) to set permission to read, write, and execute. On Unix-like platforms, mode can also be a richer bit pattern, as covered in Section 10.2.2 earlier in this chapter.
getcwd |
getcwd( ) |
Returns the path of the current working directory.
listdir |
listdir(path) |
Returns a list whose items are the names of all files and subdirectories found in directory path. The returned list is in arbitrary order, and does not include the special directory names '.' and '..'.
The dircache module also supplies a function named listdir, which works like os.listdir, with two enhancements. First, dircache.listdir returns a sorted list. Further, dircache caches the list it returns, so repeated requests for lists of the same directory are faster if the directory's contents have not changed in the meantime. dircache automatically detects changes, so the list that dircache.listdir returns is always up to date.
makedirs, mkdir |
makedirs(path,mode=0777) mkdir(path,mode=0777) |
makedirs creates all directories that are part of path and do not yet exist. mkdir creates only the rightmost directory of path. Both functions use mode as permission bits of directories they create. Both functions raise OSError if creation fails or if a file or directory named path already exists.
remove, unlink |
remove(path) unlink(path) |
Removes the file named path (see rmdir later in this section to remove a directory). unlink is a synonym of remove.
removedirs |
removedirs(path) |
Loops from right to left over the directories that are part of path, removing each one. The loop ends when a removal attempt raises an exception, generally because a directory is not empty. removedirs does not propagate the exception as long as it has removed at least one directory.
rename |
rename(source,dest) |
Renames the file or directory named source to dest.
renames |
renames(source,dest) |
Like rename, except that renames attempts to create all intermediate directories needed for dest. After the renaming, renames tries to remove empty directories from path source using removedirs. It does not propagate any resulting exception, since it's not an error if the starting directory of source does not become empty after the renaming.
rmdir |
rmdir(path) |
Removes the directory named path (raises OSError if it is not empty).
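The directory functions described so far can be exercised together in a scratch area. This is only a sketch for current Python versions: the directory names and the marker file keep.txt are arbitrary choices for the example.

```python
import os
import tempfile

base = tempfile.mkdtemp()
marker = os.path.join(base, 'keep.txt')
open(marker, 'w').close()            # keeps base non-empty, so removedirs stops there

nested = os.path.join(base, 'a', 'b', 'c')
os.makedirs(nested)                  # creates a, a/b, and a/b/c

try:
    os.mkdir(nested)                 # fails: the directory already exists
except OSError:
    already_existed = True
else:
    already_existed = False

renamed = os.path.join(base, 'a', 'b', 'd')
os.rename(nested, renamed)           # a/b/c becomes a/b/d
os.rmdir(renamed)                    # removes only the rightmost directory, d
os.removedirs(os.path.join(base, 'a', 'b'))   # removes b, then a, then stops at base

stopped_at_base = os.path.isdir(base) and not os.path.exists(os.path.join(base, 'a'))

# Clean up the scratch area
os.remove(marker)
os.rmdir(base)
```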
stat |
stat(path) |
Returns a value x that is a tuple of 10 integers that provide information about a file or subdirectory path. See Section 10.2.5 later in this chapter for details about using the returned tuple. In Python 2.2 and later, x is of type stat_result. You can still use x as a tuple, but you can also access x's items as read-only attributes x.st_mode, x.st_ino, and so on, using as attribute names the lowercase versions of the names of constants listed later in Table 10-1.
A module named statcache also supplies a function named stat, like os.stat but with an enhancement: the returned tuple (or stat_result instance) is cached, so repeated requests about the same file run faster. statcache cannot detect changes automatically, so you should use it only for stable files that do not change in the time between stat requests.
tempnam, tmpnam |
tempnam(dir=None,prefix=None) tmpnam( ) |
Returns an absolute path usable as the name of a new temporary file. If dir is None, the path uses the directory normally used for temporary files on the current platform; otherwise the path uses dir. If prefix is not None, it should be a short string to be prefixed to the temporary file's name. tempnam never returns the name of any already existing file. Your program must create the temporary file, use the file, and remove the file when done, as in the following snippet:
import os
def work_on_temporary_file(workfun):
    nam = os.tempnam()
    fil = open(nam, 'w+')    # 'w+' creates the file and opens it for reading and writing
    try:
        workfun(fil)
    finally:
        fil.close()
        os.remove(nam)
tmpnam is a synonym for tempnam. However, tmpnam does not accept arguments, and always behaves like tempnam(None,None). tempnam and tmpnam are potential weaknesses in your program's security, and recent versions of Python emit a warning the first time your program calls these functions to alert you to this fact. See Chapter 17 for information about ways in which your program can interact with warnings.
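Since tempnam and tmpnam are flagged as security risks (and are not available in current Python versions), the same pattern can be sketched with a different technique: function mkstemp of the standard tempfile module, which creates the file itself instead of only proposing a name. This swaps in tempfile for os.tempnam and is not the approach shown above; the function demo is a hypothetical stand-in for real work.

```python
import os
import tempfile

def work_on_temporary_file(workfun):
    """Call workfun on a temporary file, then close and remove the file."""
    fd, nam = tempfile.mkstemp()     # creates the file; returns (descriptor, path)
    fil = os.fdopen(fd, 'w+')
    try:
        return workfun(fil)
    finally:
        fil.close()
        os.remove(nam)

def demo(fil):
    # Hypothetical work function: write, rewind, and read back
    fil.write('hello')
    fil.seek(0)
    return fil.read()

print(work_on_temporary_file(demo))   # hello
```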
utime |
utime(path,times=None) |
Sets the accessed and modified times of file or directory path. If times is None, utime uses the current time. Otherwise, times must be a pair of numbers (in seconds since the epoch, as covered in Chapter 12) in the order (accessed, modified).
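For instance, the following sketch sets both times on a scratch file and reads them back through os.stat. The timestamps are arbitrary values chosen for the example, and the code is written for current Python versions.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # times is a pair in the order (accessed, modified), in seconds since the epoch
    os.utime(path, (1000000000, 1000000500))
    info = os.stat(path)
    accessed = int(info.st_atime)
    modified = int(info.st_mtime)
    # os.path.getmtime reports the same st_mtime value
    same_as_getmtime = (os.path.getmtime(path) == info.st_mtime)
finally:
    os.remove(path)

print(accessed, modified)
```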
The os.path module supplies functions to analyze and transform path strings.
abspath |
abspath(path) |
Returns a normalized absolute path equivalent to path, just like:
os.path.normpath(os.path.join(os.getcwd( ),path))
For example, os.path.abspath(os.curdir) always returns the same string as os.getcwd( ).
basename |
basename(path) |
Returns the base name part of path, just like os.path.split(path)[1]. For example, os.path.basename('b/c/d.e') returns 'd.e'.
commonprefix |
commonprefix(list) |
Accepts a list of strings and returns the longest string that is a prefix of all items in the list. Unlike other functions in os.path, commonprefix works on arbitrary strings, not just on paths.
dirname |
dirname(path) |
Returns the directory part of path, just like os.path.split(path)[0]. For example, os.path.dirname('b/c/d.e') returns 'b/c'.
exists |
exists(path) |
Returns True when path names an existing file or directory, otherwise False. In other words, os.path.exists(x) always returns the same result as os.access(x,os.F_OK).
expandvars |
expandvars(path) |
Returns a copy of string path, replacing each substring of the form "$name" or "${name}" with the value of environment variable name. The replacement is an empty string if name does not exist in the environment.
getatime, getmtime, getsize |
getatime(path) getmtime(path) getsize(path) |
Each of these functions returns an attribute from the result of os.stat(path), respectively the attributes st_atime, st_mtime, and st_size. See Section 10.2.5 later in this chapter for more information about these attributes.
isabs |
isabs(path) |
Returns True when path is absolute. A path is absolute when it starts with a slash /, or, on some non-Unix-like platforms, with a drive designator followed by os.sep. When path is not absolute, isabs returns False.
isfile |
isfile(path) |
Returns True when path names an existing regular file (in Unix, however, isfile also follows symbolic links), otherwise False.
isdir |
isdir(path) |
Returns True when path names an existing directory (in Unix, however, isdir also follows symbolic links), otherwise False.
islink |
islink(path) |
Returns True when path names a symbolic link. Otherwise (always, on platforms that don't support symbolic links) islink returns False.
ismount |
ismount(path) |
Returns True when path names a mount point. Otherwise (always, on platforms that don't support mount points) ismount returns False.
join |
join(path,*paths) |
Returns a string that joins the argument strings with the appropriate path separator for the current platform. For example, on Unix, exactly one slash character / separates adjacent path components. If any argument is an absolute path, join ignores all previous components. For example:
print os.path.join('a/b', 'c/d', 'e/f')    # on Unix prints: a/b/c/d/e/f
print os.path.join('a/b', '/c/d', 'e/f')   # on Unix prints: /c/d/e/f
The second call to os.path.join ignores its first argument 'a/b', since its second argument '/c/d' is an absolute path.
normcase |
normcase(path) |
Returns a copy of path with case normalized for the current platform. On case-sensitive filesystems (as typical in Unix), path is returned unchanged. On case-insensitive filesystems, all letters in the returned string are lowercase. On Windows, normcase also converts each / to a \.
normpath |
normpath(path) |
Returns a normalized pathname equivalent to path, removing redundant separators and path-navigation aspects. For example, on Unix, normpath returns 'a/b' when path is any of 'a//b', 'a/./b', or 'a/c/../b'. normpath converts path separators as appropriate for the current platform. For example, on Windows, the returned string uses \ as the separator.
split |
split(path) |
Returns a pair of strings (dir,base) such that join(dir,base) equals path. base is the last pathname component and never contains a path separator. If path ends in a separator, base is ''. dir is the leading part of path, up to the last path separator, shorn of trailing separators. For example, os.path.split('a/b/c/d') returns the pair ('a/b/c','d').
splitdrive |
splitdrive(path) |
Returns a pair of strings (drv,pth) such that drv+pth equals path. drv is either a drive specification or ''. drv is always '' on platforms that do not support drive specifications, such as Unix. For example, on Windows, os.path.splitdrive('c:d/e') returns the pair ('c:','d/e').
splitext |
splitext(path) |
Returns a pair of strings (root,ext) such that root+ext equals path. ext either is '', or starts with a '.' and has no other '.' or path separator. For example, os.path.splitext('a/b.c') returns the pair ('a/b','.c').
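The path-splitting functions fit together as in the following sketch. It imports posixpath, the Unix flavor of os.path, so that the results are the same on any platform; that choice, and the helper name decompose, are assumptions made for the example.

```python
import posixpath   # the Unix implementation of os.path

def decompose(path):
    """Return (dir, base, root, ext) for a Unix-style path."""
    # decompose is an illustrative helper, not part of os.path
    directory, base = posixpath.split(path)
    root, ext = posixpath.splitext(path)
    return directory, base, root, ext

print(decompose('a/b/c/d.e'))                   # ('a/b/c', 'd.e', 'a/b/c/d', '.e')
print(posixpath.normpath('a//b/./c/../d'))      # a/b/d
print(posixpath.join('a/b', '/c/d', 'e/f'))     # /c/d/e/f
print(posixpath.isabs('/c/d'))                  # True
```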
walk |
walk(path,func,arg) |
Calls func(arg,dirpath,namelist) for each directory in the tree whose root is directory path, starting with path itself. In each such call to func, dirpath is the path of the directory being visited, and namelist is the list of dirpath's contents as returned by os.listdir. func may modify namelist in-place (e.g., with del) to avoid visiting certain parts of the tree: walk further calls func only for subdirectories remaining in namelist after func returns, if any. arg is provided only for func's convenience: walk just receives arg, and passes arg back to func each time walk calls func. A typical use of os.path.walk is to print all files and subdirectories in a tree:
import os
def print_tree(tree_root_dir):
    def printall(junk, dirpath, namelist):
        for name in namelist:
            print os.path.join(dirpath, name)
    os.path.walk(tree_root_dir, printall, None)
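os.path.walk is not available in current Python versions. A comparable traversal can be sketched with the generator os.walk (a different function, present since Python 2.3), which yields a (dirpath, dirnames, filenames) triple for each directory. The function list_tree and its skip parameter are hypothetical names for this example.

```python
import os

def list_tree(tree_root_dir, skip=()):
    """Return the paths of all files and subdirectories under tree_root_dir."""
    found = []
    for dirpath, dirnames, filenames in os.walk(tree_root_dir):
        # Pruning dirnames in place keeps os.walk out of those subdirectories,
        # much as modifying namelist does for os.path.walk
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in dirnames + filenames:
            found.append(os.path.join(dirpath, name))
    return found
```

Calling list_tree(os.curdir, skip=('CVS',)) would, for example, list everything under the current directory except the contents of subdirectories named CVS.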
Accessing items in the tuple returned by os.stat by their numeric indices is not advisable. The order of the tuple's 10 items is guaranteed, but using numeric literals to index into the tuple is not readable. The stat module supplies attributes whose values are indices into the tuple returned by os.stat. Table 10-1 lists the attributes of module stat and the meaning of corresponding items.
Item | stat attribute | Meaning |
---|---|---|
0 | ST_MODE | Protection and other mode bits |
1 | ST_INO | Inode number |
2 | ST_DEV | Device ID |
3 | ST_NLINK | Number of hard links |
4 | ST_UID | User ID of owner |
5 | ST_GID | Group ID of owner |
6 | ST_SIZE | Size in bytes |
7 | ST_ATIME | Time of last access |
8 | ST_MTIME | Time of last modification |
9 | ST_CTIME | Time of last status change |
In Python 2.2 and later, os.stat returns an instance of type stat_result, whose 10 items are also accessible as attributes named st_mode, st_ino, and so on (the lowercase versions of the stat attributes listed in Table 10-1).
For example, to print the size in bytes of file path, you can use any of:
import os, stat
print os.path.getsize(path)
print os.stat(path)[6]
print os.stat(path)[stat.ST_SIZE]
print os.stat(path).st_size    # only in Python 2.2 and later
Time values are in seconds since the epoch, as covered in Chapter 12 (int on most platforms, float on the Macintosh). Platforms unable to give a meaningful value for an item use a dummy value for that item.
Module stat also supplies functions that examine the ST_MODE item to determine the kind of file. os.path also supplies functions for such tasks, which operate directly on the file's path. The functions supplied by stat are faster when performing several tests on the same file: they require only one os.stat call at the start of a series of tests, while the functions in os.path ask the operating system for the information at each test. Each function returns True if mode denotes a file of the given kind, otherwise False.
S_ISDIR(mode): Is the file a directory
S_ISCHR(mode): Is the file a special device-file of the character kind
S_ISBLK(mode): Is the file a special device-file of the block kind
S_ISREG(mode): Is the file a normal file (not a directory, special device-file, and so on)
S_ISFIFO(mode): Is the file a FIFO (i.e., a named pipe)
S_ISLNK(mode): Is the file a symbolic link
S_ISSOCK(mode): Is the file a Unix-domain socket
Except for stat.S_ISDIR and stat.S_ISREG, the other functions are meaningful only on Unix-like systems, since most other platforms do not keep special files such as devices in the same namespace as regular files.
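The benefit of calling os.stat once and then testing the mode several times can be sketched as follows. The function kind_of and the strings it returns are hypothetical, chosen only for this example, which targets current Python versions.

```python
import os
import stat

def kind_of(path):
    """Classify path with a single os.stat call followed by several mode tests."""
    mode = os.stat(path).st_mode     # one system call serves every test below
    if stat.S_ISDIR(mode):
        return 'directory'
    if stat.S_ISREG(mode):
        return 'regular file'
    if stat.S_ISFIFO(mode):
        return 'FIFO'
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return 'device'
    return 'other'

print(kind_of(os.curdir))   # directory
```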
Module stat supplies two more functions that extract relevant parts of a file's mode (x[ST_MODE], or x.st_mode, in the result x of function os.stat).
S_IFMT |
S_IFMT(mode) |
Returns those bits of mode that describe the kind of file (i.e., those bits that are examined by functions S_ISDIR, S_ISREG, etc.).
S_IMODE |
S_IMODE(mode) |
Returns those bits of mode that can be set by function os.chmod (i.e., the permission bits and, on Unix-like platforms, other special bits such as the set-user-id flag).
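To see how the two functions divide a mode between them, the following sketch builds a mode value by hand instead of reading one from disk. The value is an assumption made for the example: a regular file with permission bits 644.

```python
import stat

# A hand-built mode: the "regular file" kind bits plus permission bits 644
mode = stat.S_IFREG | 0o644

kind_bits = stat.S_IFMT(mode)          # only the bits describing the kind of file
permission_bits = stat.S_IMODE(mode)   # only the bits that os.chmod can set

print(kind_bits == stat.S_IFREG)       # True
print('%o' % permission_bits)          # 644
print(stat.S_ISREG(mode))              # True
```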
The filecmp module supplies functionality to compare files and directories.
cmp |
cmp(f1,f2,shallow=True,use_statcache=False) |
Compares the files named by path strings f1 and f2. If the files seem equal, cmp returns True, otherwise False. If shallow is true, files are deemed equal if their stat tuples are equal. If shallow is false, cmp reads and compares files with equal stat tuples. If use_statcache is false, cmp obtains file information via os.stat; if use_statcache is true, cmp calls statcache.stat instead. cmp remembers what files have already been compared and does not repeat comparisons unless some file has changed, but use_statcache makes cmp believe that no file ever changes.
cmpfiles |
cmpfiles(dir1,dir2,common,shallow=True,use_statcache=False) |
Loops on sequence common. Each item of common is a string naming a file present in both directories dir1 and dir2. cmpfiles returns a tuple with three lists of strings: (equal,diff,errs). equal is the list of names of files equal in both directories, diff the list of names of files that differ between directories, and errs the list of names of files that could not be compared (not existing in both directories or no permission to read them). Arguments shallow and use_statcache are just as for function cmp.
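A sketch of cmp and cmpfiles on two scratch directories follows. The file names and contents are made up for the example, and the calls omit use_statcache, which current Python versions no longer accept.

```python
import filecmp
import os
import shutil
import tempfile

def write_file(path, text):
    # Illustrative helper: create a small text file
    with open(path, 'w') as f:
        f.write(text)

dir1, dir2 = tempfile.mkdtemp(), tempfile.mkdtemp()
write_file(os.path.join(dir1, 'same.txt'), 'spam')
write_file(os.path.join(dir2, 'same.txt'), 'spam')
write_file(os.path.join(dir1, 'diff.txt'), 'spam')
write_file(os.path.join(dir2, 'diff.txt'), 'eggs')
write_file(os.path.join(dir1, 'only1.txt'), 'spam')   # absent from dir2

# shallow=False forces a comparison of the files' contents
same = filecmp.cmp(os.path.join(dir1, 'same.txt'),
                   os.path.join(dir2, 'same.txt'), shallow=False)
equal, diff, errs = filecmp.cmpfiles(
    dir1, dir2, ['same.txt', 'diff.txt', 'only1.txt'], shallow=False)

print(same, equal, diff, errs)

shutil.rmtree(dir1)
shutil.rmtree(dir2)
```

The name only1.txt lands in errs because it cannot be compared: it does not exist in both directories.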
dircmp |
class dircmp(dir1,dir2,ignore=('RCS','CVS','tags'), hide=('.','..')) |
Creates a new directory-comparison instance object, comparing directories named dir1 and dir2, ignoring names listed in ignore, and hiding names listed in hide. A dircmp instance d exposes three methods:
d.report(): Outputs to sys.stdout a comparison between dir1 and dir2
d.report_partial_closure(): Outputs to sys.stdout a comparison between dir1 and dir2 and their common immediate subdirectories
d.report_full_closure(): Outputs to sys.stdout a comparison between dir1 and dir2 and their common subdirectories, recursively
A dircmp instance d supplies several attributes, computed just in time (i.e., only if and when needed, thanks to a __getattr__ special method) so that using a dircmp instance suffers no unnecessary overhead. d's attributes are:
d.common: Files and subdirectories that are in both dir1 and dir2
d.common_dirs: Subdirectories that are in both dir1 and dir2
d.common_files: Files that are in both dir1 and dir2
d.common_funny: Names that are in both dir1 and dir2 for which os.stat reports an error or returns different kinds for the versions in the two directories
d.diff_files: Files that are in both dir1 and dir2 but with different contents
d.funny_files: Files that are in both dir1 and dir2 but could not be compared
d.left_list: Files and subdirectories that are in dir1
d.left_only: Files and subdirectories that are in dir1 and not in dir2
d.right_list: Files and subdirectories that are in dir2
d.right_only: Files and subdirectories that are in dir2 and not in dir1
d.same_files: Files that are in both dir1 and dir2 with the same contents
d.subdirs: A dictionary whose keys are the strings in common_dirs: the corresponding values are instances of dircmp for each subdirectory
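The following sketch builds two small scratch directories and inspects a few of these attributes. All file and directory names are invented for the example; the differing files are given different sizes so that the comparison does not depend on timestamps.

```python
import filecmp
import os
import shutil
import tempfile

def write_file(path, text):
    # Illustrative helper: create a small text file
    with open(path, 'w') as f:
        f.write(text)

left, right = tempfile.mkdtemp(), tempfile.mkdtemp()
for d in (left, right):
    os.mkdir(os.path.join(d, 'sub'))                  # subdirectory in both
    write_file(os.path.join(d, 'common.txt'), 'spam')
write_file(os.path.join(left, 'changed.txt'), 'spam')
write_file(os.path.join(right, 'changed.txt'), 'spam and eggs')
write_file(os.path.join(left, 'left.txt'), 'x')       # only in left
write_file(os.path.join(right, 'right.txt'), 'x')     # only in right

d = filecmp.dircmp(left, right)
summary = {
    'left_only': sorted(d.left_only),
    'right_only': sorted(d.right_only),
    'same_files': sorted(d.same_files),
    'diff_files': sorted(d.diff_files),
    'common_dirs': sorted(d.common_dirs),
    'subdirs': sorted(d.subdirs.keys()),
}
print(summary)

shutil.rmtree(left)
shutil.rmtree(right)
```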
The shutil module (an abbreviation for shell utilities) supplies functions to copy files and to remove an entire directory tree.
copy |
copy(src,dst) |
Copies the contents of file src, creating or overwriting file dst. If dst is a directory, the target is a file with the same base name as src in directory dst. copy also copies permission bits, but not last-access and modification times.
copy2 |
copy2(src,dst) |
Like copy, but also copies times of last access and modification.
copyfile |
copyfile(src,dst) |
Copies the contents only of file src, creating or overwriting file dst.
copyfileobj |
copyfileobj(fsrc,fdst,bufsize=16384) |
Copies file object fsrc, which must be open for reading, to file object fdst, which must be open for writing. Copies no more than bufsize bytes at a time if bufsize is greater than 0. File objects are covered later in this chapter.
copymode |
copymode(src,dst) |
Copies permission bits of file or directory src to file or directory dst. Both src and dst must exist. Does not modify dst's contents, nor any other aspect of file or directory status.
copystat |
copystat(src,dst) |
Copies permission bits and times of last access and modification of file or directory src to file or directory dst. Both src and dst must exist. Does not modify dst's contents, nor any other aspect of file or directory status.
copytree |
copytree(src,dst,symlinks=False) |
Copies the whole directory tree rooted at src into the destination directory named by dst. dst must not already exist, as copytree creates it. copytree copies each file by using function copy2. When symlinks is true, copytree creates symbolic links in the new tree when it finds symbolic links in the source tree. When symlinks is false, copytree follows each symbolic link it finds, and copies the linked-to file with the link's name. On platforms that do not have the concept of a symbolic link, such as Windows, copytree ignores argument symlinks.
rmtree |
rmtree(path,ignore_errors=False,onerror=None) |
Removes the directory tree rooted at path. When ignore_errors is true, rmtree ignores errors. When ignore_errors is false and onerror is None, any error raises an exception. When onerror is not None, it must be callable with parameters func, path, and excp. func is the function raising an exception (os.remove or os.rmdir), path the path passed to func, and excp the tuple of information that sys.exc_info( ) returns. If onerror raises any exception x, rmtree terminates, and exception x propagates.
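A brief sketch of copytree, copy, and rmtree working together in a scratch area follows. The directory layout and the file name data.txt are arbitrary choices for the example, written for current Python versions.

```python
import os
import shutil
import tempfile

work = tempfile.mkdtemp()
src_tree = os.path.join(work, 'src')
os.makedirs(os.path.join(src_tree, 'pkg'))
with open(os.path.join(src_tree, 'pkg', 'data.txt'), 'w') as f:
    f.write('payload')

dst_tree = os.path.join(work, 'dst')     # must not exist yet: copytree creates it
shutil.copytree(src_tree, dst_tree)

# When dst is a directory, copy creates a file there with src's base name
shutil.copy(os.path.join(dst_tree, 'pkg', 'data.txt'), work)
with open(os.path.join(work, 'data.txt')) as f:
    content = f.read()

shutil.rmtree(work)                      # removes the whole scratch tree
print(content)                           # payload
```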
The os module supplies functions to handle file descriptors, integers that the operating system uses as opaque handles to refer to open files. Python file objects, covered in the next section, are almost invariably better for input/output tasks, but sometimes working at file-descriptor level lets you perform some operation more rapidly or elegantly. Note that file objects and file descriptors are not interchangeable in any way.
You can get the file descriptor n of a Python file object f by calling n=f.fileno( ). You can wrap a new Python file object f around an open file descriptor fd by calling f=os.fdopen(fd). On Unix-like and Windows platforms, some file descriptors are preallocated when a process starts: 0 is the file descriptor for the process's standard input, 1 for the process's standard output, and 2 for the process's standard error.
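The relationship between a file object and its descriptor can be sketched as follows. The scratch file comes from tempfile.mkstemp, which is an assumption of the example (it conveniently returns an open descriptor); the code targets current Python versions.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()       # fd is an open file descriptor (an integer)
f = os.fdopen(fd, 'w')              # wrap a file object around the descriptor
same_descriptor = (f.fileno() == fd)
f.write('via file object')
f.close()                           # closing the file object also closes fd

with open(path) as check:
    text = check.read()
os.remove(path)

print(same_descriptor, text)
```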
os provides the following functions for working with file descriptors.
close |
close(fd) |
Closes file descriptor fd.
dup |
dup(fd) |
Returns a file descriptor that duplicates file descriptor fd.
dup2 |
dup2(fd,fd2) |
Duplicates file descriptor fd to file descriptor fd2. If file descriptor fd2 is already open, dup2 first closes fd2.
fdopen |
fdopen(fd,mode='r',bufsize=-1) |
Returns a Python file object wrapping file descriptor fd. mode and bufsize have the same meaning as for Python's built-in open, covered in the next section.
fstat |
fstat(fd) |
Returns a tuple x (x is a stat_result instance in Python 2.2 and later), with information about the file open on file descriptor fd. Section 10.2.5 earlier in this chapter covers the format of x's contents.
lseek |
lseek(fd,pos,how) |
Sets the current position of file descriptor fd to the signed integer byte offset pos, and returns the resulting byte offset from the start of the file. how indicates the reference (point 0): when how is 0, the reference is the start of the file; when 1, the current position; and when 2, the end of the file. In particular, lseek(fd,0,1) returns the current position's byte offset from the start of the file, without affecting the current position. Normal disk files support such seeking operations, but calling lseek on a file that does not support seeking (e.g., a file open for output to a terminal) raises an exception.
open |
open(file,flags,mode=0777) |
Returns a file descriptor, opening or creating a file named file. If open creates the file, it uses mode as the file's permission bits. flags is an int, normally obtained by bitwise ORing one or more of the following attributes of os:
O_RDONLY, O_WRONLY, O_RDWR: Opens file for read-only, write-only, or read-write respectively (mutually exclusive: exactly one of these attributes must be in flags)
O_NDELAY, O_NONBLOCK: Opens file in non-blocking (no-delay) mode, if the platform supports this
O_APPEND: Appends any new data to file's previous contents
O_DSYNC, O_RSYNC, O_SYNC: Sets synchronization mode accordingly, if the platform supports this
O_CREAT: Creates file, if file does not already exist
O_EXCL: Raises an exception if file already exists
O_TRUNC: Throws away previous contents of file (incompatible with O_RDONLY)
O_BINARY: Opens file in binary rather than text mode on non-Unix platforms (innocuous and without effect on Unix and Unix-like platforms)
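Combining flags with os.open, and then seeking on the resulting descriptor, can be sketched as follows. The file name demo.bin is arbitrary, os.write is the descriptor-level counterpart of read, and getattr guards O_BINARY because that attribute exists only on some platforms. The code is written for current Python versions.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo.bin')

# Create the file for reading and writing; fail if it already exists
flags = os.O_RDWR | os.O_CREAT | os.O_EXCL | getattr(os, 'O_BINARY', 0)
fd = os.open(path, flags, 0o600)
try:
    os.write(fd, b'0123456789')
    position = os.lseek(fd, 0, 1)    # offset 0 from the current position: just reports it
    os.lseek(fd, 3, 0)               # 3 bytes from the start of the file
    chunk = os.read(fd, 4)
    size = os.lseek(fd, 0, 2)        # offset 0 from the end: reports the file's size
finally:
    os.close(fd)

# O_EXCL makes a second creation attempt fail, since the file now exists
try:
    fd2 = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
except OSError:
    exclusive_failed = True
else:
    exclusive_failed = False
    os.close(fd2)

os.remove(path)
os.rmdir(workdir)
print(position, chunk, size, exclusive_failed)
```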
pipe |
pipe( ) |
Creates a pipe and returns a pair of file descriptors (r,w) open for reading and writing respectively.
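A minimal sketch of a pipe used within a single process follows; real programs normally hand one end to another process. os.write is the descriptor-level counterpart of read, and the message is arbitrary.

```python
import os

r, w = os.pipe()
os.write(w, b'ping')
os.close(w)                    # closing the write end lets the reader see end-of-file

data = os.read(r, 100)         # only 4 bytes are available, so fewer than 100 come back
at_eof = os.read(r, 100)       # nothing left and no writer: an empty result
os.close(r)

print(data, at_eof)
```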
read |
read(fd,n) |
Reads up to n bytes from file descriptor fd and returns them as a string. Reads and returns m<n bytes when only m more bytes are currently available for reading from the file. In particular, returns the empty