14.8 The mmap Module

The mmap module supplies memory-mapped file objects. An mmap object behaves similarly to a plain (not Unicode) string, so you can often pass an mmap object where a plain string is expected. However, there are differences:

  • An mmap object does not supply the methods of a string object

  • An mmap object is mutable, while string objects are immutable

  • An mmap object also corresponds to an open file and can be used polymorphically in place of a Python file object (as covered in Chapter 10)

An mmap object m can be indexed or sliced, yielding plain strings. Since m is mutable, you can also assign to an indexing or slicing of m. However, when you assign to a slice of m, the right-hand side of the assignment statement must be a string of exactly the same length as the slice you're assigning to. Therefore, many of the useful tricks available with list slice assignment (covered in Chapter 4) do not apply to mmap slice assignment.
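The indexing and slice-assignment rules can be tried out with a short sketch. It assumes a Python 3 interpreter, where slices of an mmap object are bytes objects rather than plain strings, and it uses a scratch file created by tempfile purely for illustration.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")

m = mmap.mmap(fd, 11)        # map the first 11 bytes of the file
first_five = m[:5]           # slicing yields an immutable bytes object
m[0:5] = b"HELLO"            # right-hand side has the slice's length: allowed

try:
    m[0:5] = b"HI"           # different length: rejected
    length_error = None
except (ValueError, IndexError) as exc:
    length_error = exc

result = m[:]

m.close()
os.close(fd)
os.remove(path)
```

The failed assignment leaves m unchanged, so result still holds the contents produced by the same-length assignment.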

Module mmap supplies a factory function whose signature differs between Unix-like systems and Windows.

mmap

mmap(filedesc,length,tagname='')    # Windows
mmap(filedesc,length,flags=MAP_SHARED,
     prot=PROT_READ|PROT_WRITE)     # Unix

Creates and returns an mmap object m that maps into memory the first length bytes of the file indicated by file descriptor filedesc. filedesc must normally be a file descriptor opened for both reading and writing (except, on Unix-like platforms, when argument prot requests only reading or only writing). File descriptors are covered in Section 10.2.8. To get an mmap object m that refers to a Python file object f, use m=mmap.mmap(f.fileno( ),length).
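As a rough illustration of the two-argument call on a Python file object, the following sketch maps the first six bytes of a temporary file. It assumes Python 3 (so the data are bytes), and the temporary file is only a stand-in for whatever file f you actually want to map.

```python
import mmap
import tempfile

# tempfile.TemporaryFile opens the file for both reading and writing.
with tempfile.TemporaryFile() as f:
    f.write(b"memory mapped")
    f.flush()                        # make sure the bytes reach the file

    m = mmap.mmap(f.fileno(), 6)     # map only the first 6 bytes
    mapped = m[:]
    mapped_len = len(m)
    m.close()
```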

On Windows only, you can pass a string tagname to give an explicit tag name for the memory mapping. This tag name lets you have several memory mappings on the same file, but this functionality is rarely necessary. Calling mmap with only two arguments has the advantage of keeping your code portable between Windows and Unix-like platforms. On Windows, all memory mappings are readable and writable and shared between processes, so that all processes with a memory mapping on a file can see changes made by each such process.

On Unix-like platforms only, you can pass mmap.MAP_PRIVATE as the flags argument to get a mapping that is private to your process and copy-on-write. mmap.MAP_SHARED, the default, gets a mapping that is shared with other processes, so that all processes mapping the file can see changes made by one process (same as on Windows). You can pass mmap.PROT_READ as the prot argument to get a mapping that you can only read, not write. Passing mmap.PROT_WRITE gets a mapping that you can only write, not read. The bitwise-OR mmap.PROT_READ|mmap.PROT_WRITE, the default, gets a mapping that you can both read and write (same as on Windows).
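The copy-on-write behavior of a private mapping can be sketched as follows. This assumes Python 3 on a Unix-like platform; the hasattr guard and the scratch file are additions made for this illustration, so that the script simply skips the demonstration where MAP_PRIVATE is unavailable.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file for the demonstration.
fd, path = tempfile.mkstemp()
os.write(fd, b"original")

private_view = file_bytes = None
if hasattr(mmap, "MAP_PRIVATE"):             # Unix-like platforms only
    m = mmap.mmap(fd, 8, flags=mmap.MAP_PRIVATE)
    m[0:8] = b"modified"                     # changes only this process's copy
    private_view = m[:]
    m.close()

    os.lseek(fd, 0, os.SEEK_SET)
    file_bytes = os.read(fd, 8)              # the file itself is untouched

os.close(fd)
os.remove(path)
```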

14.8.1 Methods of mmap Objects

An mmap object m supplies the following methods.

close

m.close(  )

Closes the file of m.

find

m.find(str,start=0)

Returns the lowest index i greater than or equal to start such that str==m[i:i+len(str)]. If no such i exists, m.find returns -1. This is the same functionality as for the find method of string objects, covered in Chapter 9.
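A small sketch of find, assuming Python 3 (where the argument must be a bytes object) and a throwaway file created just for the example:

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()        # hypothetical scratch file
os.write(fd, b"spam and spam")
m = mmap.mmap(fd, 13)

first = m.find(b"spam")              # search from the start
second = m.find(b"spam", 1)          # search from index 1 onward
missing = m.find(b"eggs")            # no match anywhere in m

m.close()
os.close(fd)
os.remove(path)
```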

flush

m.flush([offset,n])

Ensures that all changes made to m also exist on m's file. Until you call m.flush, it's uncertain whether the file reflects the current state of m. You can pass a starting byte offset offset and a byte count n to limit the flushing effect's guarantee to a slice of m. You must pass both arguments, or neither: it is an error to call m.flush with exactly one argument.
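The following sketch shows flush being called before the file is read back through ordinary file I/O. It assumes Python 3 and a scratch file; on many platforms the change would be visible even without the call, but only flush gives the guarantee.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()        # hypothetical scratch file
os.write(fd, b"0123456789")
m = mmap.mmap(fd, 10)

m[0:4] = b"ABCD"                     # change the mapping
m.flush()                            # make sure the file reflects the change

with open(path, "rb") as f:          # read back through plain file I/O
    on_disk = f.read()

m.close()
os.close(fd)
os.remove(path)
```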

move

m.move(dstoff,srcoff,n)

Like the slicing m[dstoff:dstoff+n]=m[srcoff:srcoff+n], but potentially faster. The source and destination slices can overlap. Apart from such potential overlap, move does not affect the source slice (i.e., the move method copies bytes but does not move them, despite the method's name).
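To see that move copies rather than moves, and that overlapping ranges are handled, consider this sketch (Python 3 assumed; the scratch file is hypothetical):

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()        # hypothetical scratch file
os.write(fd, b"abcdefgh")
m = mmap.mmap(fd, 8)

m.move(4, 0, 4)                      # copy bytes 0..3 over bytes 4..7
after_copy = m[:]                    # the source bytes are still in place

m.move(1, 0, 4)                      # overlapping source and destination
after_overlap = m[:]

m.close()
os.close(fd)
os.remove(path)
```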

read

m.read(n)

Reads and returns a string s containing up to n bytes starting from m's file pointer, then advances m's file pointer by len(s). If there are fewer than n bytes between m's file pointer and m's length, returns the bytes available. In particular, if m's file pointer is at the end of m, returns the empty string ''.

read_byte

m.read_byte(  )

Returns a string of length 1 containing the character at m's file pointer, then advances m's file pointer by 1. m.read_byte( ) is similar to m.read(1). However, if m's file pointer is at the end of m, m.read(1) returns the empty string '', while m.read_byte( ) raises a ValueError exception.

readline

m.readline(  )

Reads and returns one line from the file of m, from m's current file pointer up to and including the next '\n' (or up to the end of m, if there is no '\n'), then advances m's file pointer to point just past the bytes just read. If m's file pointer is at the end of m, readline returns the empty string ''.
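The end-of-mapping behavior of read, read_byte, and readline can be compared in a short sketch. It assumes Python 3, where the returned values are bytes objects (and read_byte returns an integer rather than a one-character string); the scratch file is hypothetical.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()        # hypothetical scratch file
os.write(fd, b"ab\ncd")
m = mmap.mmap(fd, 5)

line = m.readline()                  # up to and including the newline
chunk = m.read(10)                   # fewer than 10 bytes remain
at_end = m.read(1)                   # file pointer is now at the end of m
empty_line = m.readline()            # readline at the end is also empty

try:
    m.read_byte()                    # read_byte at the end raises instead
    read_byte_raised = False
except ValueError:
    read_byte_raised = True

m.close()
os.close(fd)
os.remove(path)
```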

resize

m.resize(n)

Changes the length of m, so that len(m) becomes n. Does not affect the size of m's file. m's length and the file's size are independent. To set m's length to be equal to the file's size, call m.resize(m.size( )). If m's length is larger than the file's size, m is padded with null bytes (\x00).

seek

m.seek(pos,how=0)

Sets the file pointer of m to the integer byte offset pos. how indicates the reference point (point 0): when how is 0, the reference point is the start of the file; when 1, m's current file pointer; when 2, the end of m. A seek that tries to set m's file pointer to a negative byte offset, or to a positive offset beyond m's length, raises a ValueError exception.

size

m.size(  )

Returns the length (number of bytes) of the file of m, not the length of m itself. To get the length of m, use len(m).

tell

m.tell(  )

Returns the current position of the file pointer of m, as a byte offset from the start of m's file.
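The difference between len(m) and m.size(), and the three reference points accepted by seek, are sketched below. The example assumes Python 3 and maps only part of a hypothetical ten-byte scratch file.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()        # hypothetical scratch file
os.write(fd, b"0123456789")
m = mmap.mmap(fd, 4)                 # map only the first 4 of 10 bytes

mapped_len = len(m)                  # length of the mapping
file_size = m.size()                 # length of the underlying file

positions = []
m.seek(2)                            # relative to the start
positions.append(m.tell())
m.seek(1, 1)                         # relative to the current position
positions.append(m.tell())
m.seek(-1, 2)                        # relative to the end of m
positions.append(m.tell())

try:
    m.seek(5)                        # beyond the length of m
    seek_error = False
except ValueError:
    seek_error = True

m.close()
os.close(fd)
os.remove(path)
```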

write

m.write(str)

Writes the bytes in str into m at the current position of m's file pointer, overwriting the bytes that were there, and then advances m's file pointer by len(str). If there aren't at least len(str) bytes between m's file pointer and the length of m, write raises a ValueError exception.
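A brief sketch of write, including the ValueError raised when too little room remains. Python 3 is assumed (so the argument is a bytes object), and the scratch file is hypothetical.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()        # hypothetical scratch file
os.write(fd, b"........")            # 8 placeholder bytes
m = mmap.mmap(fd, 8)

m.write(b"abc")                      # overwrites bytes 0..2
pos_after_write = m.tell()           # file pointer advanced by 3

m.seek(6)                            # only 2 bytes left before the end of m
try:
    m.write(b"xyz")                  # needs 3 bytes: rejected
    overflow_raised = False
except ValueError:
    overflow_raised = True

contents = m[:]

m.close()
os.close(fd)
os.remove(path)
```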

write_byte

m.write_byte(byte)

Writes byte, which must be a single-character string, into mapping m at the current position of m's file pointer, overwriting the byte that was there, and then advances m's file pointer by 1. When x is a single-character string, m.write_byte(x) is similar to m.write(x). However, if m's file pointer is at the end of m, m.write_byte(x) silently does nothing, while m.write(x) raises a ValueError exception. Note that this is the reverse of the relationship between read and read_byte at end-of-file: write and read_byte raise ValueError, while read and write_byte don't.

14.8.2 Using mmap Objects for IPC

The way in which processes communicate using mmap is similar to IPC using files: one process can write data, and another process can later read the same data back. Since an mmap object rests on an underlying file, you can also have some processes doing I/O directly on the file, as covered in Chapter 10, while others use mmap to access the same file. You can choose between mmap and I/O on file objects on the basis of convenience: the functionality is the same. For example, here is a simple program that uses file I/O to make the contents of a file equal to the last line interactively typed by the user:

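fileob = open('xxx','w')
while True:
    data = raw_input('Enter some text:')
    # Overwrite from the start, then cut off any leftover bytes
    fileob.seek(0)
    fileob.write(data)
    fileob.truncate()
    # Push the buffered data out so other processes can see it
    fileob.flush()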

And here is another simple program that, when run in the same directory as the former, uses mmap (and the time.sleep function, covered in Chapter 12) to check every second for changes to the file and print out the file's new contents:

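import mmap, os, time
# Map one byte to start with; the mapping is resized on each pass
mx = mmap.mmap(os.open('xxx',os.O_RDWR), 1)
last = None
while True:
    # Make the mapping's length track the file's current size
    mx.resize(mx.size())
    data = mx[:]
    if data != last:
        # Contents changed since the last check
        print data
        last = data
    time.sleep(1)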
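For experimenting without two terminals, the same idea can be condensed into a single script: one side updates the file with ordinary file I/O, the other reads it through mmap. This is only a sketch under several assumptions: it targets Python 3, it runs both sides in one process, it uses a scratch file instead of 'xxx', and the helper read_via_mmap is hypothetical. Rather than resizing a long-lived mapping, the helper creates a fresh mapping on each call, passing length 0 to map the whole file.

```python
import mmap
import os
import tempfile


def read_via_mmap(path):
    """Return the file's current contents through a fresh mapping."""
    fd = os.open(path, os.O_RDWR)
    try:
        if os.fstat(fd).st_size == 0:
            return b""                   # an empty file cannot be mapped
        m = mmap.mmap(fd, 0)             # length 0 maps the whole file
        try:
            return m[:]
        finally:
            m.close()
    finally:
        os.close(fd)


fd, path = tempfile.mkstemp()            # hypothetical scratch file
os.close(fd)

seen = []
with open(path, "wb") as fileob:
    for text in (b"first line", b"second"):
        # Writer side: same seek/write/truncate/flush cycle as above.
        fileob.seek(0)
        fileob.write(text)
        fileob.truncate()
        fileob.flush()
        # Reader side: observe the new contents through mmap.
        seen.append(read_via_mmap(path))

os.remove(path)
```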

