10.2 Filesystem Operations

Using the os module, you can manipulate the filesystem in a variety of ways: creating, copying, and deleting files and directories, comparing files, and examining filesystem information about files and directories. This section documents the attributes and methods of the os module that you use for these purposes, and also covers some related modules that operate on the filesystem.

10.2.1 Path-String Attributes of the os Module

A file or directory is identified by a string, known as its path, whose syntax depends on the platform. On both Unix-like and Windows platforms, Python accepts Unix syntax for paths, with slash (/) as the directory separator. On non-Unix-like platforms, Python also accepts platform-specific path syntax. On Windows, for example, you can use backslash (\) as the separator. However, you do need to double up each backslash to \\ in normal string literals or use raw-string syntax as covered in Chapter 4. In the rest of this chapter, for brevity, Unix syntax is assumed in both explanations and examples.

Module os supplies attributes that provide details about path strings on the current platform. You should typically use the higher-level path manipulation operations covered in Section 10.2.4 later in this chapter, rather than lower-level string operations based on these attributes. However, the attributes may still be useful at times:

curdir

The string that denotes the current directory ('.' on Unix and Windows)

defpath

The default search path used if the environment lacks a PATH environment variable

linesep

The string that terminates text lines ('\n' on Unix, '\r\n' on Windows)

extsep

The string that separates the extension part of a file's name from the rest of the name ('.' on Unix and Windows)

pardir

The string that denotes the parent directory ('..' on Unix and Windows)

pathsep

The separator between paths in lists of paths, such as those used for the environment variable PATH (':' on Unix, ';' on Windows)

sep

The separator of path components ('/' on Unix, '\\' on Windows)
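As a minimal sketch of how these attributes might be used, the following splits a PATH-style string on os.pathsep, falling back to os.defpath when no string is supplied and the environment lacks PATH. The helper name split_search_path is hypothetical (it is not part of module os), and the code is written so that it should also run on current Python 3 interpreters.

```python
import os

def split_search_path(value=None):
    """Split a PATH-style string into a list of directories.

    split_search_path is a hypothetical helper, not part of module os.
    """
    if value is None:
        # Fall back to os.defpath when the environment lacks PATH
        value = os.environ.get('PATH', os.defpath)
    # Drop empty entries produced by doubled or trailing separators
    return [d for d in value.split(os.pathsep) if d]

# Build a search path with the platform's own separator, then split it back
example = os.pathsep.join(['/usr/bin', '/bin'])
dirs = split_search_path(example)
```

Using os.pathsep rather than a literal ':' keeps the helper usable on platforms where the separator differs.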

10.2.2 Permissions

Unix-like platforms associate nine bits with each file or directory, three each for the file's owner (user), its group, and anybody else, indicating whether the file or directory can be read, written, and executed by the specified subject. These nine bits are known as the file's permission bits, part of the file's mode (a bit string that also includes other bits describing the file). These bits are often displayed in octal notation, which groups three bits in each digit. For example, a mode of 0664 indicates a file that can be read and written by its owner and group, but only read, not written, by anybody else. When any process on a Unix-like system creates a file or directory, the operating system applies to the specified mode a bit mask known as the process's umask, which can remove some of the permission bits.

Non-Unix-like platforms handle file and directory permissions in very different ways. However, the functions in Python's standard library that deal with permissions accept a mode argument according to the Unix-like approach described in the previous paragraph. The implementation on each platform maps the nine permission bits in a way appropriate for the given platform. For example, on versions of Windows that distinguish only between read-only and read-write files and do not distinguish file ownership, a file's permission bits show up as either 0666 (read-write) or 0444 (read-only). On such a platform, when a file is created, the implementation looks only at bit 0200 (the owner's write permission), making the file read-write if that bit is 1 or read-only if that bit is 0.
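The arithmetic behind permission bits and the umask can be sketched as follows. The function names rwx_string and apply_umask are hypothetical, chosen only for illustration. Octal values are spelled 0o664 rather than 0664 so that the sketch also runs on current Python 3 interpreters; the value is the same.

```python
def rwx_string(mode):
    """Render the nine permission bits of mode as text such as 'rw-rw-r--'."""
    letters = 'rwxrwxrwx'
    result = []
    for i in range(9):
        bit = 1 << (8 - i)          # from 0400 (owner read) down to 0001
        if mode & bit:
            result.append(letters[i])
        else:
            result.append('-')
    return ''.join(result)

def apply_umask(requested, umask):
    """Return the mode that remains after the umask removes its bits."""
    return requested & ~umask

shown = rwx_string(0o664)               # the example mode from the text
created = apply_umask(0o666, 0o022)     # a umask of 022 removes group/other write
```

Here shown is 'rw-rw-r--' and created is 0644: each bit set in the umask is cleared in the resulting mode.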

10.2.3 File and Directory Functions of the os Module

The os module supplies several functions to query and set file and directory status.

access

access(path,mode)

Returns True if file path has all of the permissions encoded in integer mode, otherwise False. mode can be os.F_OK to test for file existence, or one or more of os.R_OK, os.W_OK, and os.X_OK (with the bitwise-OR operator | joining them if more than one) to test permissions to read, write, and execute the file.

access does not use the standard interpretation for its mode argument, covered in Section 10.2.2 earlier in this chapter. access tests only if this specific process's real user and group identifiers have the requested permissions on the file. If you need to study a file's permission bits in more detail, see function stat in this section.
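A short sketch of access in use, assuming a scratch file obtained from tempfile.mkstemp (a standard library function not covered in this section). The helper name can_read_and_write is hypothetical.

```python
import os
import tempfile

def can_read_and_write(path):
    """Return True if this process may both read and write path."""
    return os.access(path, os.R_OK | os.W_OK)

# Create a scratch file so the checks have something real to inspect
fd, scratch = tempfile.mkstemp()
os.close(fd)
try:
    exists_before = os.access(scratch, os.F_OK)
    readable_writable = can_read_and_write(scratch)
finally:
    os.remove(scratch)
exists_after = os.access(scratch, os.F_OK)
```

Keep in mind that the answer reflects the moment of the call only: the file's status may change between the test and a later attempt to open the file.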

chdir

chdir(path)

Sets the current working directory to path.

chmod

chmod(path,mode)

Changes the permissions of file path, as encoded in integer mode. mode can be zero or more of os.R_OK, os.W_OK, and os.X_OK (with the bitwise-OR operator | joining them if more than one) to set permission to read, write, and execute. On Unix-like platforms, mode can also be a richer bit pattern, as covered in Section 10.2.2 earlier in this chapter.

getcwd

getcwd(  )

Returns the path of the current working directory.

listdir

listdir(path)

Returns a list whose items are the names of all files and subdirectories found in directory path. The returned list is in arbitrary order, and does not include the special directory names '.' and '..'.

The dircache module also supplies a function named listdir, which works like os.listdir, with two enhancements. First, dircache.listdir returns a sorted list. Further, dircache caches the list it returns, so repeated requests for lists of the same directory are faster if the directory's contents have not changed in the meantime. dircache automatically detects changes, so the list that dircache.listdir returns is always up to date.

makedirs, mkdir

makedirs(path,mode=0777)
mkdir(path,mode=0777)

makedirs creates all directories that are part of path and do not yet exist. mkdir creates only the rightmost directory of path. Both functions use mode as permission bits of directories they create. Both functions raise OSError if creation fails or if a file or directory named path already exists.
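The difference between the two functions can be sketched as follows, working inside a scratch directory from tempfile.mkdtemp (an assumption made to keep the example self-contained). The directory names are arbitrary.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
nested = os.path.join(base, 'a', 'b', 'c')

os.makedirs(nested)                 # creates a, a/b, and a/b/c
created = os.path.isdir(nested)

try:
    os.makedirs(nested)             # path already exists
    second_call_failed = False
except OSError:
    second_call_failed = True

try:
    os.mkdir(os.path.join(base, 'x', 'y'))   # parent x is missing
    mkdir_failed = False
except OSError:
    mkdir_failed = True

shutil.rmtree(base)                 # clean up the scratch tree
```

Both failures surface as OSError, so code that wants to tolerate an existing directory needs to catch that exception (or test with os.path.isdir first).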

remove, unlink

remove(path)
unlink(path)

Removes the file named path (see rmdir later in this section to remove a directory). unlink is a synonym of remove.

removedirs

removedirs(path)

Loops from right to left over the directories that are part of path, removing each one. The loop ends when a removal attempt raises an exception, generally because a directory is not empty. removedirs does not propagate the exception as long as it has removed at least one directory.
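To illustrate where the loop stops, this sketch builds a/b/c in a scratch directory and places a file in a, so that only c and b can be removed. The names used are arbitrary, and tempfile.mkdtemp is assumed only to keep the example self-contained.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, 'a', 'b', 'c'))

# A file in 'a' keeps that directory from being empty
keeper = os.path.join(base, 'a', 'keep.txt')
open(keeper, 'w').close()

os.removedirs(os.path.join(base, 'a', 'b', 'c'))

c_gone = not os.path.exists(os.path.join(base, 'a', 'b', 'c'))
b_gone = not os.path.exists(os.path.join(base, 'a', 'b'))
a_kept = os.path.isdir(os.path.join(base, 'a'))

shutil.rmtree(base)                 # clean up what removedirs left behind
```

No exception reaches the caller here, because at least the rightmost directory was removed before the loop met the non-empty directory a.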

rename

rename(source,dest)

Renames the file or directory named source to dest.

renames

renames(source,dest)

Like rename, except that renames attempts to create all intermediate directories needed for dest. After the renaming, renames tries to remove empty directories from path source using removedirs. It does not propagate any resulting exception, since it's not an error if the starting directory of source does not become empty after the renaming.

rmdir

rmdir(path)

Removes the directory named path (raises OSError if it is not empty).

stat

stat(path)

Returns a value x that is a tuple of 10 integers that provide information about a file or subdirectory path. See Section 10.2.5 later in this chapter for details about using the returned tuple. In Python 2.2 and later, x is of type stat_result. You can still use x as a tuple, but you can also access x's items as read-only attributes x.st_mode, x.st_ino, and so on, using as attribute names the lowercase versions of the names of constants listed later in Table 10-1.

A module named statcache also supplies a function named stat, like os.stat but with an enhancement: the returned tuple (or stat_result instance) is cached, so repeated requests about the same file run faster. statcache cannot detect changes automatically, so you should use it only for stable files that do not change in the time between stat requests.

tempnam, tmpnam

tempnam(dir=None,prefix=None)
tmpnam(  )

Returns an absolute path usable as the name of a new temporary file. If dir is None, the path uses the directory normally used for temporary files on the current platform; otherwise the path uses dir. If prefix is not None, it should be a short string to be prefixed to the temporary file's name. tempnam never returns the name of any already existing file. Your program must create the temporary file, use the file, and remove the file when done, as in the following snippet:

import os
def work_on_temporary_file(workfun):
    nam = os.tempnam(  )
    fil = open(nam, 'w+')        # create the file, open for reading and writing
    try:
        workfun(fil)
    finally:
        fil.close(  )
        os.remove(nam)           # removing the file is the caller's job

tmpnam works like tempnam, except that tmpnam accepts no arguments and always behaves like tempnam(None,None). tempnam and tmpnam are potential weaknesses in your program's security, since another process could create a file with the returned name before your program does. Recent versions of Python emit a warning the first time your program calls these functions to alert you to this fact. See Chapter 17 for information about ways in which your program can interact with warnings.

utime

utime(path,times=None)

Sets the accessed and modified times of file or directory path. If times is None, utime uses the current time. Otherwise, times must be a pair of numbers (in seconds since the epoch, as covered in Chapter 12) in the order (accessed, modified).
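A brief sketch of setting explicit times and reading them back with os.stat. The timestamps are arbitrary values chosen for illustration, and the scratch file comes from tempfile.mkstemp (an assumption, to keep the example self-contained).

```python
import os
import tempfile

fd, scratch = tempfile.mkstemp()
os.close(fd)
try:
    # times is a pair in the order (accessed, modified), in seconds since the epoch
    os.utime(scratch, (1000000000, 1000000100))
    info = os.stat(scratch)
    atime, mtime = int(info.st_atime), int(info.st_mtime)
finally:
    os.remove(scratch)
```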

10.2.4 The os.path Module

The os.path module supplies functions to analyze and transform path strings.

abspath

abspath(path)

Returns a normalized absolute path equivalent to path, just like:

os.path.normpath(os.path.join(os.getcwd(  ),path))

For example, os.path.abspath(os.curdir) always returns the same string as os.getcwd( ).

basename

basename(path)

Returns the base name part of path, just like os.path.split(path)[1]. For example, os.path.basename('b/c/d.e') returns 'd.e'.

commonprefix

commonprefix(list)

Accepts a list of strings and returns the longest string that is a prefix of all items in the list. Unlike other functions in os.path, commonprefix works on arbitrary strings, not just on paths.

dirname

dirname(path)

Returns the directory part of path, just like os.path.split(path)[0]. For example, os.path.dirname('b/c/d.e') returns 'b/c'.

exists

exists(path)

Returns True when path names an existing file or directory, otherwise False. In other words, os.path.exists(x) always returns the same result as os.access(x,os.F_OK).

expandvars

expandvars(path)

Returns a copy of string path, replacing each substring of the form "$name" or "${name}" with the value of environment variable name. The replacement is an empty string if name does not exist in the environment.

getatime, getmtime, getsize

getatime(path)
getmtime(path)
getsize(path)

Each of these functions returns an attribute from the result of os.stat(path), respectively the attributes st_atime, st_mtime, and st_size. See Section 10.2.5 later in this chapter for more information about these attributes.

isabs

isabs(path)

Returns True when path is absolute. A path is absolute when it starts with a slash /, or, on some non-Unix-like platforms, with a drive designator followed by os.sep. When path is not absolute, isabs returns False.

isfile

isfile(path)

Returns True when path names an existing regular file (in Unix, however, isfile also follows symbolic links), otherwise False.

isdir

isdir(path)

Returns True when path names an existing directory (in Unix, however, isdir also follows symbolic links), otherwise False.

islink

islink(path)

Returns True when path names a symbolic link. Otherwise (always, on platforms that don't support symbolic links) islink returns False.

ismount

ismount(path)

Returns True when path names a mount point. Otherwise (always, on platforms that don't support mount points) ismount returns False.

join

join(path,*paths)

Returns a string that joins the argument strings with the appropriate path separator for the current platform. For example, on Unix, exactly one slash character / separates adjacent path components. If any argument is an absolute path, join ignores all previous components. For example:

print os.path.join('a/b', 'c/d', 'e/f')
# on Unix prints: a/b/c/d/e/f
print os.path.join('a/b', '/c/d', 'e/f')
# on Unix prints: /c/d/e/f

The second call to os.path.join ignores its first argument 'a/b', since its second argument '/c/d' is an absolute path.

normcase

normcase(path)

Returns a copy of path with case normalized for the current platform. On case-sensitive filesystems (as typical in Unix), path is returned unchanged. On case-insensitive filesystems, all letters in the returned string are lowercase. On Windows, normcase also converts each / to a \.

normpath

normpath(path)

Returns a normalized pathname equivalent to path, removing redundant separators and path-navigation aspects. For example, on Unix, normpath returns 'a/b' when path is any of 'a//b', 'a/./b', or 'a/c/../b'. normpath converts path separators as appropriate for the current platform. For example, on Windows, the returned string uses \ as the separator.

split

split(path)

Returns a pair of strings (dir,base) such that join(dir,base) equals path. base is the last pathname component and never contains a path separator. If path ends in a separator, base is ''. dir is the leading part of path, up to the last path separator, shorn of trailing separators. For example, os.path.split('a/b/c/d') returns the pair ('a/b/c','d').

splitdrive

splitdrive(path)

Returns a pair of strings (drv,pth) such that drv+pth equals path. drv is either a drive specification or ''. drv is always '' on platforms that do not support drive specifications, such as Unix. For example, on Windows, os.path.splitdrive('c:d/e') returns the pair ('c:','d/e').

splitext

splitext(path)

Returns a pair of strings (root,ext) such that root+ext equals path. ext either is '', or starts with a '.' and has no other '.' or path separator. For example, os.path.splitext('a/b.c') returns the pair ('a/b','.c').
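The splitting functions compose naturally. As a small sketch, the hypothetical helper describe below breaks a path into its directory, base name, root, and extension, using only split and splitext; the sample path is arbitrary.

```python
import os.path

def describe(path):
    """Break path into directory, base name, root, and extension."""
    directory, base = os.path.split(path)
    root, ext = os.path.splitext(base)
    return directory, base, root, ext

parts = describe('a/b/report.tar.gz')
rejoined = os.path.join(parts[0], parts[1])
```

Note that splitext splits at the last dot only, so the extension of 'report.tar.gz' is '.gz' and the root is 'report.tar'.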

walk

walk(path,func,arg)

Calls func(arg,dirpath,namelist) for each directory in the tree whose root is directory path, starting with path itself. In each such call to func, dirpath is the path of the directory being visited, and namelist is the list of dirpath's contents as returned by os.listdir. func may modify namelist in-place (e.g., with del) to avoid visiting certain parts of the tree: walk further calls func only for subdirectories remaining in namelist after func returns, if any. arg is provided only for func's convenience: walk just receives arg, and passes arg back to func each time walk calls func. A typical use of os.path.walk is to print all files and subdirectories in a tree:

import os
def print_tree(tree_root_dir):
    def printall(junk, dirpath, namelist):
        for name in namelist: 
            print os.path.join(dirpath, name)
    os.path.walk(tree_root_dir, printall, None)
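The calling convention and the in-place pruning described above can be sketched with a hand-written traversal built on os.listdir. The function visit_tree is a hypothetical stand-in, not the library's own implementation of os.path.walk, and the directory names in the demonstration are arbitrary. It is written so that it should also run on current Python 3 interpreters.

```python
import os
import shutil
import tempfile

def visit_tree(top, func, arg):
    """Call func(arg, dirpath, namelist) for top and each directory below it.

    visit_tree is a hypothetical stand-in written with os.listdir.
    """
    try:
        names = os.listdir(top)
    except OSError:
        return                      # unreadable directory: skip it
    func(arg, top, names)           # func may prune names in place
    for name in names:
        child = os.path.join(top, name)
        if os.path.isdir(child) and not os.path.islink(child):
            visit_tree(child, func, arg)

def collect(found, dirpath, namelist):
    # Prune directories named 'skip' so the traversal never enters them
    if 'skip' in namelist:
        namelist.remove('skip')
    for name in namelist:
        found.append(os.path.join(dirpath, name))

# Build a small scratch tree, traverse it, and clean up
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'keep', 'inner'))
os.makedirs(os.path.join(root, 'skip', 'hidden'))
found = []
visit_tree(root, collect, found)
shutil.rmtree(root)
relative = sorted([os.path.relpath(p, root) for p in found])
```

Because collect removes 'skip' from namelist before returning, neither skip nor anything beneath it appears in the collected paths.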

10.2.5 The stat Module

Accessing items in the tuple returned by os.stat by their numeric indices is not advisable. The order of the tuple's 10 items is guaranteed, but using numeric literals to index into the tuple is not readable. The stat module supplies attributes whose values are indices into the tuple returned by os.stat. Table 10-1 lists the attributes of module stat and the meaning of corresponding items.

Table 10-1. Items of a stat tuple

Item   stat attribute   Meaning
0      ST_MODE          Protection and other mode bits
1      ST_INO           Inode number
2      ST_DEV           Device ID
3      ST_NLINK         Number of hard links
4      ST_UID           User ID of owner
5      ST_GID           Group ID of owner
6      ST_SIZE          Size in bytes
7      ST_ATIME         Time of last access
8      ST_MTIME         Time of last modification
9      ST_CTIME         Time of last status change

In Python 2.2 and later, os.stat returns an instance of type stat_result, whose 10 items are also accessible as attributes named st_mode, st_ino, and so on (the lowercase versions of the stat attributes listed in Table 10-1).

For example, to print the size in bytes of file path, you can use any of:

import os, stat

print os.path.getsize(path)
print os.stat(path)[6]
print os.stat(path)[stat.ST_SIZE]
print os.stat(path).st_size             # only in Python 2.2 and later

Time values are in seconds since the epoch, as covered in Chapter 12 (int on most platforms, float on the Macintosh). Platforms unable to give a meaningful value for an item use a dummy value for that item.

Module stat also supplies functions that examine the ST_MODE item to determine the kind of file. os.path also supplies functions for such tasks, which operate directly on the file's path. The functions supplied by stat are faster when performing several tests on the same file: they require only one os.stat call at the start of a series of tests, while the functions in os.path ask the operating system for the information at each test. Each function returns True if mode denotes a file of the given kind, otherwise False.

S_ISDIR(mode)

Is the file a directory

S_ISCHR(mode)

Is the file a special device-file of the character kind

S_ISBLK(mode)

Is the file a special device-file of the block kind

S_ISREG(mode)

Is the file a normal file (not a directory, special device-file, and so on)

S_ISFIFO(mode)

Is the file a FIFO (i.e., a named pipe)

S_ISLNK(mode)

Is the file a symbolic link

S_ISSOCK(mode)

Is the file a Unix-domain socket

Except for stat.S_ISDIR and stat.S_ISREG, the other functions are meaningful only on Unix-like systems, since most other platforms do not keep special files such as devices in the same namespace as regular files.

Module stat supplies two more functions that extract relevant parts of a file's mode (x[ST_MODE], or x.st_mode, in the result x of function os.stat).

S_IFMT

S_IFMT(mode)

Returns those bits of mode that describe the kind of file (i.e., those bits that are examined by functions S_ISDIR, S_ISREG, etc.).

S_IMODE

S_IMODE(mode)

Returns those bits of mode that can be set by function os.chmod (i.e., the permission bits and, on Unix-like platforms, other special bits such as the set-user-id flag).
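As a sketch of running several tests from a single os.stat call, the hypothetical helper kind_and_permissions below classifies a path and extracts its permission bits. The scratch directory and file come from module tempfile, an assumption made only to keep the example self-contained.

```python
import os
import stat
import tempfile

def kind_and_permissions(path):
    """Return (kind, permission_bits) for path from a single os.stat call."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        kind = 'directory'
    elif stat.S_ISREG(mode):
        kind = 'regular file'
    else:
        kind = 'other'
    return kind, stat.S_IMODE(mode)

workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, 'sample.txt')
open(sample, 'w').close()

dir_kind, dir_bits = kind_and_permissions(workdir)
file_kind, file_bits = kind_and_permissions(sample)
file_type_bits = stat.S_IFMT(os.stat(sample).st_mode)

os.remove(sample)
os.rmdir(workdir)
```

The exact value of file_bits depends on the platform and on the process's umask, so the sketch does not assume any particular permission pattern.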

10.2.6 The filecmp Module

The filecmp module supplies functionality to compare files and directories.

cmp

cmp(f1,f2,shallow=True,use_statcache=False)

Compares the files named by path strings f1 and f2. If the files seem equal, cmp returns True, otherwise False. If shallow is true, files are deemed equal if their stat tuples are equal. If shallow is false, cmp reads and compares files with equal stat tuples. If use_statcache is false, cmp obtains file information via os.stat; if use_statcache is true, cmp calls statcache.stat instead. cmp remembers what files have already been compared and does not repeat comparisons unless some file has changed, but use_statcache makes cmp believe that no file ever changes.

cmpfiles

cmpfiles(dir1,dir2,common,shallow=True,use_statcache=False)

Loops on sequence common. Each item of common is a string naming a file present in both directories dir1 and dir2. cmpfiles returns a tuple with three lists of strings: (equal,diff,errs). equal is the list of names of files equal in both directories, diff the list of names of files that differ between directories, and errs the list of names of files that could not be compared (not existing in both directories or no permission to read them). Arguments shallow and use_statcache are just as for function cmp.
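A minimal sketch of cmp and cmpfiles on two scratch directories. The file names and contents are arbitrary, the helper write is hypothetical, and use_statcache is simply left at its default. Passing shallow=False forces a comparison of contents, which matters here because the two versions of diff.txt have the same size.

```python
import filecmp
import os
import shutil
import tempfile

def write(path, text):
    """Hypothetical helper: create path holding text."""
    f = open(path, 'w')
    try:
        f.write(text)
    finally:
        f.close()

left = tempfile.mkdtemp()
right = tempfile.mkdtemp()
write(os.path.join(left, 'same.txt'), 'hello')
write(os.path.join(right, 'same.txt'), 'hello')
write(os.path.join(left, 'diff.txt'), 'hello')
write(os.path.join(right, 'diff.txt'), 'HELLO')   # same size, other content

# 'missing.txt' exists in neither directory, so it cannot be compared
names = ['same.txt', 'diff.txt', 'missing.txt']
equal, diff, errs = filecmp.cmpfiles(left, right, names, shallow=False)
one_pair = filecmp.cmp(os.path.join(left, 'same.txt'),
                       os.path.join(right, 'same.txt'), shallow=False)

shutil.rmtree(left)
shutil.rmtree(right)
```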

dircmp

class dircmp(dir1,dir2,ignore=('RCS','CVS','tags'),
             hide=('.','..'))

Creates a new directory-comparison instance object, comparing directories named dir1 and dir2, ignoring names listed in ignore, and hiding names listed in hide. A dircmp instance d exposes three methods:

d.report( )

Outputs to sys.stdout a comparison between dir1 and dir2

d.report_partial_closure( )

Outputs to sys.stdout a comparison between dir1 and dir2 and their common immediate subdirectories

d.report_full_closure( )

Outputs to sys.stdout a comparison between dir1 and dir2 and their common subdirectories, recursively

A dircmp instance d supplies several attributes, computed just in time (i.e., only if and when needed, thanks to a __getattr__ special method) so that using a dircmp instance suffers no unnecessary overhead. d's attributes are:

d.common

Files and subdirectories that are in both dir1 and dir2

d.common_dirs

Subdirectories that are in both dir1 and dir2

d.common_files

Files that are in both dir1 and dir2

d.common_funny

Names that are in both dir1 and dir2 for which os.stat reports an error or returns different kinds for the versions in the two directories

d.diff_files

Files that are in both dir1 and dir2 but with different contents

d.funny_files

Files that are in both dir1 and dir2 but could not be compared

d.left_list

Files and subdirectories that are in dir1

d.left_only

Files and subdirectories that are in dir1 and not in dir2

d.right_list

Files and subdirectories that are in dir2

d.right_only

Files and subdirectories that are in dir2 and not in dir1

d.same_files

Files that are in both dir1 and dir2 with the same contents

d.subdirs

A dictionary whose keys are the strings in common_dirs: the corresponding values are instances of dircmp for each subdirectory
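The attributes above can be explored with a small sketch that compares two scratch directories. File names and contents are arbitrary and the helper write is hypothetical. The two versions of changed.txt deliberately differ in size, so that they are reported as different regardless of how closely the comparison inspects file contents.

```python
import filecmp
import os
import shutil
import tempfile

def write(path, text):
    """Hypothetical helper: create path holding text."""
    f = open(path, 'w')
    try:
        f.write(text)
    finally:
        f.close()

left = tempfile.mkdtemp()
right = tempfile.mkdtemp()
write(os.path.join(left, 'shared.txt'), 'same')
write(os.path.join(right, 'shared.txt'), 'same')
write(os.path.join(left, 'changed.txt'), 'short')
write(os.path.join(right, 'changed.txt'), 'much longer text')
write(os.path.join(left, 'only_left.txt'), '')
write(os.path.join(right, 'only_right.txt'), '')
os.mkdir(os.path.join(left, 'sub'))
os.mkdir(os.path.join(right, 'sub'))

d = filecmp.dircmp(left, right)
# Attributes are computed on first access, so read them before cleaning up
summary = {
    'left_only': sorted(d.left_only),
    'right_only': sorted(d.right_only),
    'same_files': sorted(d.same_files),
    'diff_files': sorted(d.diff_files),
    'common_dirs': sorted(d.common_dirs),
}

shutil.rmtree(left)
shutil.rmtree(right)
```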

10.2.7 The shutil Module

The shutil module (an abbreviation for shell utilities) supplies functions to copy files and to remove an entire directory tree.

copy

copy(src,dst)

Copies the contents of file src, creating or overwriting file dst. If dst is a directory, the target is a file with the same base name as src in directory dst. copy also copies permission bits, but not last-access and modification times.

copy2

copy2(src,dst)

Like copy, but also copies times of last access and modification.

copyfile

copyfile(src,dst)

Copies the contents only of file src, creating or overwriting file dst.

copyfileobj

copyfileobj(fsrc,fdst,bufsize=16384)

Copies file object fsrc, which must be open for reading, to file object fdst, which must be open for writing. Copies no more than bufsize bytes at a time if bufsize is greater than 0. File objects are covered later in this chapter.

copymode

copymode(src,dst)

Copies permission bits of file or directory src to file or directory dst. Both src and dst must exist. Does not modify dst's contents, nor any other aspect of file or directory status.

copystat

copystat(src,dst)

Copies permission bits and times of last access and modification of file or directory src to file or directory dst. Both src and dst must exist. Does not modify dst's contents, nor any other aspect of file or directory status.

copytree

copytree(src,dst,symlinks=False)

Copies the whole directory tree rooted at src into the destination directory named by dst. dst must not already exist, as copytree creates it. copytree copies each file by using function copy2. When symlinks is true, copytree creates symbolic links in the new tree when it finds symbolic links in the source tree. When symlinks is false, copytree follows each symbolic link it finds, and copies the linked-to file with the link's name. On platforms that do not have the concept of a symbolic link, such as Windows, copytree ignores argument symlinks.

rmtree

rmtree(path,ignore_errors=False,onerror=None)

Removes the directory tree rooted at path. When ignore_errors is true, rmtree ignores errors. When ignore_errors is false and onerror is None, any error raises an exception. When onerror is not None, it must be callable with parameters func, path, and excp. func is the function raising an exception (os.remove or os.rmdir), path the path passed to func, and excp the tuple of information that sys.exc_info( ) returns. If onerror raises any exception x, rmtree terminates, and exception x propagates.
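A short sketch that copies a small tree with copytree and then removes everything with rmtree. The directory and file names are arbitrary, and the scratch area comes from tempfile.mkdtemp (an assumption, to keep the example self-contained).

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, 'src')
dst = os.path.join(base, 'dst')          # must not exist before copytree
os.makedirs(os.path.join(src, 'docs'))

f = open(os.path.join(src, 'docs', 'note.txt'), 'w')
f.write('remember')
f.close()

shutil.copytree(src, dst)                # creates dst and everything below it

g = open(os.path.join(dst, 'docs', 'note.txt'))
copied_text = g.read()
g.close()

shutil.rmtree(base)                      # removes src, dst, and base itself
gone = not os.path.exists(base)
```

Since rmtree deletes without confirmation, it is prudent to double-check the path argument before calling it on anything but a scratch directory.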

10.2.8 File Descriptor Operations

The os module supplies functions to handle file descriptors, integers that the operating system uses as opaque handles to refer to open files. Python file objects, covered in the next section, are almost invariably better for input/output tasks, but sometimes working at file-descriptor level lets you perform some operation more rapidly or elegantly. Note that file objects and file descriptors are not interchangeable in any way.

You can get the file descriptor n of a Python file object f by calling n=f.fileno( ). You can wrap a new Python file object f around an open file descriptor fd by calling f=os.fdopen(fd). On Unix-like and Windows platforms, some file descriptors are preallocated when a process starts: 0 is the file descriptor for the process's standard input, 1 for the process's standard output, and 2 for the process's standard error.

os provides the following functions for working with file descriptors.

close

close(fd)

Closes file descriptor fd.

dup

dup(fd)

Returns a file descriptor that duplicates file descriptor fd.

dup2

dup2(fd,fd2)

Duplicates file descriptor fd to file descriptor fd2. If file descriptor fd2 is already open, dup2 first closes fd2.

fdopen

fdopen(fd,mode='r',bufsize=-1)

Returns a Python file object wrapping file descriptor fd. mode and bufsize have the same meaning as for Python's built-in open, covered in the next section.

fstat

fstat(fd)

Returns a tuple x (x is a stat_result instance in Python 2.2 and later), with information about the file open on file descriptor fd. Section 10.2.5 earlier in this chapter covers the format of x's contents.

lseek

lseek(fd,pos,how)

Sets the current position of file descriptor fd to the signed integer byte offset pos, and returns the resulting byte offset from the start of the file. how indicates the reference (point 0): when how is 0, the reference is the start of the file; when 1, the current position; and when 2, the end of the file. In particular, lseek(fd,0,1) returns the current position's byte offset from the start of the file, without affecting the current position. Normal disk files support such seeking operations, but calling lseek on a file that does not support seeking (e.g., a file open for output to a terminal) raises an exception.
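The three values of how can be sketched on a scratch file. The example assumes tempfile.mkstemp, which returns an already open descriptor, and uses os.write, which belongs to module os but is not among the functions described so far. The b'...' literal is used so that the sketch also runs on current Python 3 interpreters.

```python
import os
import tempfile

fd, scratch = tempfile.mkstemp()        # descriptor is open for reading and writing
try:
    os.write(fd, b'0123456789')
    after_write = os.lseek(fd, 0, 1)    # how=1: report the current position
    os.lseek(fd, 2, 0)                  # how=0: two bytes from the start
    middle = os.read(fd, 3)             # reads the bytes at offsets 2, 3, 4
    end = os.lseek(fd, 0, 2)            # how=2: jump to the end of the file
finally:
    os.close(fd)
    os.remove(scratch)
```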

open

open(file,flags,mode=0777)

Returns a file descriptor, opening or creating a file named file. If open creates the file, it uses mode as the file's permission bits. flags is an int, normally obtained by bitwise ORing one or more of the following attributes of os:

O_RDONLY, O_WRONLY, O_RDWR

Opens file for read-only, write-only, or read-write respectively (mutually exclusive: exactly one of these attributes must be in flags)

O_NDELAY, O_NONBLOCK

Opens file in non-blocking (no-delay) mode, if the platform supports this

O_APPEND

Appends any new data to file's previous contents

O_DSYNC, O_RSYNC, O_SYNC, O_NOCTTY

Sets synchronization mode accordingly, if the platform supports this

O_CREAT

Creates file, if file does not already exist

O_EXCL

Raises an exception if file already exists

O_TRUNC

Throws away previous contents of file (incompatible with O_RDONLY)

O_BINARY

Opens file in binary rather than text mode on non-Unix platforms (innocuous and without effect on Unix and Unix-like platforms)
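One common combination of these flags is O_CREAT with O_EXCL, which creates a file only if it does not already exist. The sketch below wraps that in a hypothetical helper, create_exclusively; the file name and the mode 0600 (spelled 0o600 so the code also runs on current Python 3 interpreters) are arbitrary choices for illustration.

```python
import errno
import os
import tempfile

def create_exclusively(path):
    """Create path for writing and return its descriptor.

    Returns None if path already exists. create_exclusively is a
    hypothetical helper, not part of module os.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    try:
        return os.open(path, flags, 0o600)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return None             # somebody created the file first
        raise                       # any other failure is a real error

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'lock')
first = create_exclusively(target)   # succeeds: the file did not exist
second = create_exclusively(target)  # fails: the file exists now

os.close(first)
os.remove(target)
os.rmdir(workdir)
```

Because the existence test and the creation happen in a single call, this pattern avoids the gap between checking for a file and creating it.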

pipe

pipe(  )

Creates a pipe and returns a pair of file descriptors (r,w) open for reading and writing respectively.
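A minimal sketch of a pipe used within a single process, assuming os.write (part of module os, though not described above) to fill the writing end. In practice the two ends usually belong to different processes; a single process keeps the example short.

```python
import os

r, w = os.pipe()
try:
    os.write(w, b'ping')
    os.close(w)                  # closing the write end signals end of data
    w = None
    received = os.read(r, 100)   # only four bytes are available
    at_end = os.read(r, 100)     # nothing left, and no writer remains
finally:
    os.close(r)
    if w is not None:
        os.close(w)
```

The amount written here is small enough to fit in the pipe's buffer; writing a large amount before anything reads from the other end could block.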

read

read(fd,n)

Reads up to n bytes from file descriptor fd and returns them as a string. Reads and returns m<n bytes when only m more bytes are currently available for reading from the file. In particular, returns the empty


