2.2 Standard Modules

2.2.1 Basic String Transformations

The module string forms the core of Python's text manipulation libraries, and it is the first place to look before turning to other modules. Note that most of the functions in the string module have been copied to methods of string objects as of Python 1.6. Moreover, methods of string objects are a little bit faster to use than the corresponding module functions. A few new methods of string objects do not have equivalents in the string module, but are still documented here.

SEE ALSO: str 33; UserString 33;

string • A collection of string operations

There are a number of general things to notice about the functions in the string module (which is composed entirely of functions and constants; no classes).

  1. Strings are immutable (as discussed in Chapter 1). This means that there is no such thing as changing a string "in place" (as we might do in many other languages, such as C, by changing the bytes at certain offsets within the string). Whenever a string module function takes a string object as an argument, it returns a brand-new string object and leaves the original one as is. However, the very common pattern of binding the same name on the left of an assignment as was passed on the right side within the string module function somewhat conceals this fact. For example:

    >>> import string
    >>> str = "Mary had a little lamb"
    >>> str = string.replace(str, 'had', 'ate')
    >>> str
    'Mary ate a little lamb'
    

    The first string object never gets modified per se; but since the first string object is no longer bound to any name after the example runs, the object is subject to garbage collection and will disappear from memory. In short, calling a string module function will not change any existing strings, but rebinding a name can make it look like they changed.

  2. Many string module functions are now also available as string object methods. To use these string object methods, there is no need to import the string module, and the expression is usually slightly more concise. Moreover, using a string object method is usually slightly faster than the corresponding string module function. However, the most thorough documentation of each function/method that exists as both a string module function and a string object method is contained in this reference to the string module.

  3. The form string.join(string.split (...)) is a frequent Python idiom. A more thorough discussion is contained in the reference items for string.join() and string.split(), but in general, combining these two functions is very often a useful way of breaking down a text, processing the parts, then putting together the pieces.

  4. Think about clever string.replace() patterns. By combining multiple string.replace() calls with use of "place holder" string patterns, a surprising range of results can be achieved (especially when also manipulating the intermediate strings with other techniques). See the reference item for string.replace() for some discussion and examples.

  5. A mutable string of sorts can be obtained by using built-in lists, or the array module. Lists can contain a collection of substrings, each one of which may be replaced or modified individually. The array module can define arrays of individual characters, each position modifiable, including with slice notation. The function string.join() or the method "".join() may be used to re-create true strings; for example:

    >>> lst = ['spam','and','eggs']
    >>> lst[2] = 'toast'
    >>> print ''.join(lst)
    spamandtoast
    >>> print ' '.join(lst)
    spam and toast
    

    Or:

    >>> import array
    >>> a = array.array('c','spam and eggs')
    >>> print ''.join(a)
    spam and eggs
    >>> a[0] = 'S'
    >>> print ''.join(a)
    Spam and eggs
    >>> a[-4:] = array.array('c','toast')
    >>> print ''.join(a)
    Spam and toast
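
The rebinding behavior described in item 1 can be checked directly. The following is a minimal sketch, not taken from the original examples; it uses the string object method rather than string.replace() so that it also runs on later Python versions, and the names original and alias are purely illustrative:

```python
# Two names bound to the same string object.
original = "Mary had a little lamb"
alias = original

# .replace() returns a new string; rebinding 'original' leaves the
# object that 'alias' still refers to untouched.
original = original.replace("had", "ate")

print(original)   # Mary ate a little lamb
print(alias)      # Mary had a little lamb
```

Because alias keeps the first object alive, that object is not garbage collected here, which makes the "nothing was modified" point visible.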
    
CONSTANTS

The string module contains constants for a number of frequently used collections of characters. Each of these constants is itself simply a string (rather than a list, tuple, or other collection). As such, it is easy to define constants alongside those provided by the string module, should you need them. For example:

>>> import string
>>> string.brackets = "[]{}()<>"
>>> print string.brackets
[]{}()<>

string.digits

The decimal numerals ("0123456789").

string.hexdigits

The hexadecimal numerals ("0123456789abcdefABCDEF").

string.octdigits

The octal numerals ("01234567").

string.lowercase

The lowercase letters; can vary by language. In English versions of Python (most systems):

>>> import string
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz'

You should not modify string.lowercase for a source text language, but rather define a new attribute, such as string.spanish_lowercase with an appropriate string (some methods depend on this constant).

string.uppercase

The uppercase letters; can vary by language. In English versions of Python (most systems):

>>> import string
>>> string.uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

You should not modify string.uppercase for a source text language, but rather define a new attribute, such as string.spanish_uppercase with an appropriate string (some methods depend on this constant).

string.letters

All the letters (string.lowercase+string.uppercase).

string.punctuation

The characters normally considered as punctuation; can vary by language. In English versions of Python (most systems):

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

string.whitespace

The "empty" characters. Normally these consist of tab, linefeed, vertical tab, formfeed, carriage return, and space (in that order):

>>> import string
>>> string.whitespace
'\011\012\013\014\015 '

You should not modify string.whitespace (some methods depend on this constant).

string.printable

All the characters that can be printed to any device; can vary by language (string.digits+string.letters+string.punctuation+string.whitespace).
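
Since each constant is just a string, membership tests with the in operator are enough to classify characters. The sketch below is illustrative only: the helper names is_hex_literal and strip_digits are hypothetical, and only string.hexdigits and string.digits are used because those two constants keep the same names across Python versions.

```python
import string

def is_hex_literal(text):
    """Return True if text is non-empty and made only of hexadecimal digits."""
    if not text:
        return False
    for ch in text:
        if ch not in string.hexdigits:
            return False
    return True

def strip_digits(text):
    """Return text with every decimal digit removed."""
    return "".join([ch for ch in text if ch not in string.digits])

print(is_hex_literal("DeadBeef"))    # True
print(is_hex_literal("lamb"))        # False ('l' and 'm' are not hex digits)
print(strip_digits("lamb42chop7"))   # lambchop
```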

FUNCTIONS
string.atof(s=...)

Deprecated. Use float().

Converts a string to a floating point value.

SEE ALSO: eval() 445; float() 422;

string.atoi(s=...[,base=10])

Deprecated with Python 2.0. Use int() if no custom base is needed or if using Python 2.0+.

Converts a string to an integer value (if the string should be assumed to be in a base other than 10, the base may be specified as the second argument).

SEE ALSO: eval() 445; int() 421; long() 422;

string.atol(s=...[,base=10])

Deprecated with Python 2.0. Use long() if no custom base is needed or if using Python 2.0+.

Converts a string to an unlimited length integer value (if the string should be assumed to be in a base other than 10, the base may be specified as the second argument).

SEE ALSO: eval() 445; long() 422; int() 421;
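
As a brief sketch of the recommended replacements, assuming Python 2.0 or later (where int() accepts a base as its second argument), the built-ins cover the same conversions. The variable names are illustrative only:

```python
as_float = float("3.25")     # replaces string.atof("3.25")
as_decimal = int("255")      # replaces string.atoi("255")
as_hex = int("ff", 16)       # replaces string.atoi("ff", 16)
as_octal = int("777", 8)     # replaces string.atoi("777", 8)

# A string that is not a valid numeral raises ValueError.
try:
    int("12abc")
    bad_input_rejected = False
except ValueError:
    bad_input_rejected = True

print(as_float)              # 3.25
print(as_decimal)            # 255
print(as_hex)                # 255
print(as_octal)              # 511
print(bad_input_rejected)    # True
```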

string.capitalize(s=...)
"".capitalize()

Return a string consisting of the initial character converted to uppercase (if applicable), and all other characters converted to lowercase (if applicable):

>>> import string
>>> string.capitalize("mary had a little lamb!")
'Mary had a little lamb!'
>>> string.capitalize("Mary had a Little Lamb!")
'Mary had a little lamb!'
>>> string.capitalize("2 Lambs had Mary!")
'2 lambs had mary!'

For Python 1.6+, use of a string object method is marginally faster and is stylistically preferred in most cases:

>>> "mary had a little lamb".capitalize()
'Mary had a little lamb'

SEE ALSO: string.capwords() 133; string.lower() 138;

string.capwords(s=...)
"".title()

Return a string consisting of the capitalized words. An equivalent expression is:

string.join(map(string.capitalize,string.split(s)))

But string.capwords() is a clearer way of writing it. An effect of this implementation is that whitespace is "normalized" by the process:

>>> import string
>>> string.capwords("mary HAD a little lamb!")
'Mary Had A Little Lamb!'
>>> string.capwords("Mary     had a      Little Lamb!")
'Mary Had A Little Lamb!'

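With the creation of string methods in Python 1.6, the closest string method counterpart of string.capwords() became "".title(). The two are not identical, however: "".title() leaves whitespace as it finds it and starts a new word after any non-letter character, whereas string.capwords() splits on whitespace and rejoins the words with single spaces.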

SEE ALSO: string.capitalize() 132; string.lower() 138; "".istitle() 136;
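
The function and the method can give different results on the same input. The following sketch, which is not part of the original examples, compares them on a string containing an apostrophe and a run of spaces:

```python
import string

s = "mary's   little lamb"

by_capwords = string.capwords(s)   # split on whitespace, capitalize, rejoin
by_title = s.title()               # scans characters, keeps whitespace as is

print(by_capwords)   # Mary's Little Lamb
print(by_title)      # Mary'S   Little Lamb
```

Which behavior is wanted depends on the text; for prose containing contractions, the split-and-rejoin approach is usually the safer of the two.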

string.center(s=..., width=...)
"".center(width)

Return a string with s padded with symmetrical leading and trailing spaces (but not truncated) to occupy length width (or more).

>>> import string
>>> string.center(width=30,s="Mary had a little lamb")
'    Mary had a little lamb    '
>>> string.center("Mary had a little lamb", 5)
'Mary had a little lamb'

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> "Mary had a little lamb".center(25)
'  Mary had a little lamb '

SEE ALSO: string.ljust() 138; string.rjust() 141;

string.count(s, sub [,start [,end]])
"".count(sub [,start [,end]])

Return the number of nonoverlapping occurrences of sub in s. If the optional third or fourth arguments are specified, only the corresponding slice of s is examined.

>>> import string
>>> string.count("mary had a little lamb", "a")
4
>>> string.count("mary had a little lamb", "a", 3, 10)
2

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> 'mary had a little lamb'.count("a")
4

"".endswith(suffix [,start [,end]])

This string method does not have an equivalent in the string module. Return a Boolean value indicating whether the string ends with the suffix suffix. If the optional second argument start is specified, only consider the terminal substring after offset start. If the optional third argument end is given, only consider the slice [start:end].

SEE ALSO: "".startswith() 144; string.find() 135;
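
The effect of the optional start and end arguments is easiest to see on a concrete value. This is a small illustrative sketch; the filename is made up:

```python
name = "notes.txt.bak"

plain = name.endswith(".bak")            # whole string is examined
not_txt = name.endswith(".txt")          # the string ends with '.bak'
sliced = name.endswith(".txt", 0, 9)     # only name[0:9] == 'notes.txt'
tail_only = name.endswith("notes", 6)    # only name[6:] == 'txt.bak'

print(plain)       # True
print(not_txt)     # False
print(sliced)      # True
print(tail_only)   # False
```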

string.expandtabs(s=...[,tabsize=8])
"".expandtabs([tabsize=8])

Return a string with tabs replaced by a variable number of spaces. The replacement causes text blocks to line up at "tab stops." If no second argument is given, the new string will line up at multiples of 8 spaces. A newline implies a new set of tab stops.

>>> import string
>>> s = 'mary\011had a little lamb'
>>> print s
mary    had a little lamb
>>> string.expandtabs(s, 16)
'mary            had a little lamb'
>>> string.expandtabs(tabsize=1, s=s)
'mary had a little lamb'

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> 'mary\011had a little lamb'.expandtabs(25)
'mary                     had a little lamb'

string.find(s, sub [,start [,end]])
"".find(sub [,start [,end]])

Return the index position of the first occurrence of sub in s. If the optional third or fourth arguments are specified, only the corresponding slice of s is examined (but result is position in s as a whole). Return -1 if no occurrence is found. Position is zero-based, as with Python list indexing:

>>> import string
>>> string.find("mary had a little lamb", "a")
1
>>> string.find("mary had a little lamb", "a", 3, 10)
6
>>> string.find("mary had a little lamb", "b")
21
>>> string.find("mary had a little lamb", "b", 3, 10)
-1

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> 'mary had a little lamb'.find("ad")
6

SEE ALSO: string.index() 135; string.rfind() 140;
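
Because the start argument restricts the search while the result is still a position in the whole string, repeated calls can walk through every occurrence. The helper below is a hypothetical sketch (find_all is not a standard function), written with the string object method:

```python
def find_all(s, sub):
    """Return the start offsets of all nonoverlapping occurrences of sub in s."""
    if not sub:
        raise ValueError("empty substring")
    positions = []
    start = 0
    while True:
        pos = s.find(sub, start)
        if pos == -1:               # no further occurrence
            break
        positions.append(pos)
        start = pos + len(sub)      # resume just past this match
    return positions

print(find_all("mary had a little lamb", "a"))   # [1, 6, 9, 19]
print(find_all("mary had a little lamb", "z"))   # []
```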

string.index(s, sub [,start [,end]])
"".index(sub [,start [,end]])

Return the same value as does string.find() with same arguments, except raise ValueError instead of returning -1 when sub does not occur in s.

>>> import string
>>> string.index("mary had a little lamb", "b")
21
>>> string.index("mary had a little lamb", "b", 3, 10)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "d:/py20sl/lib/string.py", line 139, in index
    return s.index(*args)
ValueError: substring not found in string.index

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> 'mary had a little lamb'.index("ad")
6

SEE ALSO: string.find() 135; string.rindex() 141;

Several string methods return Boolean values indicating whether a string has a certain property. None of the .is*() methods, however, have equivalents in the string module:

"".isalpha()

Return a true value if all the characters are alphabetic.

"".isalnum()

Return a true value if all the characters are alphanumeric.

"".isdigit()

Return a true value if all the characters are digits.

"".islower()

Return a true value if all the characters are lowercase and there is at least one cased character:

>>> "ab123".islower(), '123'.islower(), 'Ab123'.islower()
(1, 0, 0)

SEE ALSO: "".lower() 138;

"".isspace()

Return a true value if all the characters are whitespace.

"".istitle()

Return a true value if the string has title casing (each word capitalized).

SEE ALSO: "".title() 133;

"".isupper()

Return a true value if all the characters are uppercase and there is at least one cased character.

SEE ALSO: "".upper() 146;

string.join(words=...[,sep=" "])
"".join(words)

Return a string that results from concatenating the elements of the list words together, with sep between each. The function string.join() differs from all other string module functions in that it takes a list (of strings) as a primary argument, rather than a string.

It is worth noting string.join() and string.split() are inverse functions if sep is specified to both; in other words, string.join(string.split(s,sep),sep)==s for all s and sep.
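
The inverse relationship can be checked with a short sketch. It uses the string object methods, and it also shows that the round trip is not exact when no sep is given, since splitting on whitespace discards the original spacing:

```python
s = "a,b,,c"
sep = ","

parts = s.split(sep)            # ['a', 'b', '', 'c']
round_trip = sep.join(parts)    # identical to s

t = "Mary   had\ta lamb"
normalized = " ".join(t.split())   # runs of whitespace collapse to one space

print(parts)
print(round_trip == s)          # True
print(normalized)               # Mary had a lamb
```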

Typically, string.join() is used in contexts where it is natural to generate lists of strings. For example, here is a small program to output the list of all-capital words from STDIN to STDOUT, one per line:

list_capwords.py
import string,sys
capwords = []

for line in sys.stdin.readlines():
    for word in line.split():
        if word == word.upper() and word.isalpha():
            capwords.append(word)
print string.join(capwords, '\n')

The technique in the sample list_capwords.py script can be considerably more efficient than building up a string by direct concatenation. However, Python 2.0's augmented assignment reduces the performance difference:

>>> import string
>>> s = "Mary had a little lamb"
>>> t = "its fleece was white as snow"
>>> s = s +" "+ t    # relatively "expensive" for big strings
>>> s += " " + t     # "cheaper" than Python 1.x style
>>> lst = [s]
>>> lst.append(t)    # "cheapest" way of building long string
>>> s = string.join(lst)

For Python 1.6+, use of a string object method is stylistically preferred in some cases. However, just as string.join() is special in taking a list as a first argument, the string object method "".join() is unusual in being an operation on the (optional) sep string, not on the (required) words list (this surprises many new Python programmers).

SEE ALSO: string.split() 142;

string.joinfields(...)

Identical to string.join().

string.ljust(s=..., width=...)
"".ljust(width)

Return a string with s padded with trailing spaces (but not truncated) to occupy length width (or more).

>>> import string
>>> string.ljust(width=30,s="Mary had a little lamb")
'Mary had a little lamb        '
>>> string.ljust("Mary had a little lamb", 5)
'Mary had a little lamb'

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> "Mary had a little lamb".ljust(25)
'Mary had a little lamb   '

SEE ALSO: string.rjust() 141; string.center() 133;

string.lower(s=...)
"".lower()

Return a string with any uppercase letters converted to lowercase.

>>> import string
>>> string.lower("mary HAD a little lamb!")
'mary had a little lamb!'
>>> string.lower("Mary had a Little Lamb!")
'mary had a little lamb!'

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> "Mary had a Little Lamb!".lower()
'mary had a little lamb!'

SEE ALSO: string.upper() 146;

string.lstrip(s=...)
"".lstrip([chars=string.whitespace])

Return a string with leading whitespace characters removed. For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> import string
>>> s = """
...     Mary had a little lamb      \011"""
>>> string.lstrip(s)
'Mary had a little lamb      \011'
>>> s.lstrip()
'Mary had a little lamb      \011'

Python 2.3+ accepts the optional argument chars to the string object method. Leading characters that occur in the string chars will be removed, up to the first character that is not in chars.

SEE ALSO: string.rstrip() 142; string.strip() 144;

string.maketrans(from, to)

Return a translation table string for use with string.translate() . The strings from and to must be the same length. A translation table is a string of 256 successive byte values, where each position defines a translation from the chr() value of the index to the character contained at that index position.

>>> import string
>>> ord('A')
65
>>> ord('z')
122
>>> string.maketrans('ABC','abc')[65:123]
'abcDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz'
>>> string.maketrans('ABCxyz','abcXYZ')[65:123]
'abcDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwXYZ'

SEE ALSO: string.translate() 145;
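
To make the layout of a translation table concrete, the sketch below builds one by hand and applies it character by character. It is a slow illustration of the idea, not how string.maketrans() or string.translate() are implemented; make_table and apply_table are hypothetical names, and the code assumes every character has an ordinal below 256.

```python
def make_table(frm, to):
    """Build a 256-character table mapping each char in frm to its peer in to."""
    if len(frm) != len(to):
        raise ValueError("frm and to must have the same length")
    table = [chr(i) for i in range(256)]    # start from the identity mapping
    for src, dst in zip(frm, to):
        table[ord(src)] = dst               # position ord(src) now holds dst
    return "".join(table)

def apply_table(s, table):
    """Translate s by looking up each character's ordinal in table."""
    return "".join([table[ord(c)] for c in s])

table = make_table("ABC", "abc")
print(len(table))                                     # 256
print(table[65:71])                                   # abcDEF
print(apply_table("MARY HAD a little LAMB", table))   # MaRY HaD a little LaMb
```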

string.replace(s=..., old=..., new=...[,maxsplit=...])
"".replace(old, new [,maxsplit])

Return a string based on s with occurrences of old replaced by new. If the fourth argument maxsplit is specified, only replace maxsplit initial occurrences.

>>> import string
>>> string.replace("Mary had a little lamb", "a little", "some")
'Mary had some lamb'

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> "Mary had a little lamb".replace("a little", "some")
'Mary had some lamb'

A common "trick" involving string.replace() is to use it multiple times to achieve a goal. Obviously, simply to replace several different substrings in a string, multiple string.replace() operations are almost inevitable. But there is another class of cases where string.replace() can be used to create an intermediate string with "placeholders" for an original substring in a particular context. The same goal can always be achieved with regular expressions, but sometimes staged string.replace() operations are both faster and easier to program:

>>> import string
>>> line = 'variable = val      # see comments #3 and #4'
>>> # we'd like '#3' and '#4' spelled out within comment
>>> string.replace(line,'#','number ')       # doesn't work
'variable = val      number  see comments number 3 and number 4'
>>> place_holder=string.replace(line,' # ',' !!! ') # insrt plcholder
>>> place_holder
'variable = val      !!! see comments #3 and #4'
>>> place_holder=place_holder.replace('#','number ') # almost there
>>> place_holder
'variable = val      !!! see comments number 3 and number 4'
>>> line = string.replace(place_holder,'!!!','#') # restore orig
>>> line
'variable = val      # see comments number 3 and number 4'

Obviously, for jobs like this, a placeholder must be chosen so as not ever to occur within the strings undergoing "staged transformation"; but that should be possible generally since placeholders may be as long as needed.
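
The staged transformation can be wrapped in a small function that also guards against the placeholder already being present. This is a sketch under stated assumptions: spell_out_hashes and the "@@HASH@@" placeholder are illustrative choices, and only the first occurrence of the marker is treated as the comment delimiter.

```python
def spell_out_hashes(line, marker=" # ", placeholder=" @@HASH@@ "):
    """Replace '#' with 'number ' everywhere except in the first marker."""
    if placeholder in line:
        raise ValueError("placeholder already occurs in the line")
    staged = line.replace(marker, placeholder, 1)   # protect the comment marker
    staged = staged.replace("#", "number ")         # expand every remaining '#'
    return staged.replace(placeholder, marker)      # restore the marker

line = 'variable = val      # see comments #3 and #4'
print(spell_out_hashes(line))
# variable = val      # see comments number 3 and number 4
```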

SEE ALSO: string.translate() 145; mx.TextTools.replace() 314;

string.rfind(s, sub [,start [,end]])
"".rfind(sub [,start [,end]])

Return the index position of the last occurrence of sub in s. If the optional third or fourth arguments are specified, only the corresponding slice of s is examined (but result is position in s as a whole). Return -1 if no occurrence is found. Position is zero-based, as with Python list indexing:

>>> import string
>>> string.rfind("mary had a little lamb", "a")
19
>>> string.rfind("mary had a little lamb", "a", 3, 10)
9
>>> string.rfind("mary had a little lamb", "b")
21
>>> string.rfind("mary had a little lamb", "b", 3, 10)
-1

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> 'mary had a little lamb'.rfind("ad")
6

SEE ALSO: string.rindex() 141; string.find() 135;

string.rindex(s, sub [,start [,end]])
"".rindex(sub [,start [,end]])

Return the same value as does string.rfind() with same arguments, except raise ValueError instead of returning -1 when sub does not occur in s.

>>> import string
>>> string.rindex("mary had a little lamb", "b")
21
>>> string.rindex("mary had a little lamb", "b", 3, 10)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "d:/py20sl/lib/string.py", line 148, in rindex
    return s.rindex(*args)
ValueError: substring not found in string.rindex

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> 'mary had a little lamb'.rindex("ad")
6

SEE ALSO: string.rfind() 140; string.index() 135;

string.rjust(s=..., width=...)
"".rjust(width)

Return a string with s padded with leading spaces (but not truncated) to occupy length width (or more).

>>> import string
>>> string.rjust(width=30,s="Mary had a little lamb")
'        Mary had a little lamb'
>>> string.rjust("Mary had a little lamb", 5)
'Mary had a little lamb'

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> "Mary had a little lamb".rjust(25)
'   Mary had a little lamb'

SEE ALSO: string.ljust() 138; string.center() 133;

string.rstrip(s=...)
"".rstrip([chars=string.whitespace])

Return a string with trailing whitespace characters removed. For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> import string
>>> s = """
...     Mary had a little lamb       \011"""
>>> string.rstrip(s)
'\012    Mary had a little lamb'
>>> s.rstrip()
'\012    Mary had a little lamb'

Python 2.3+ accepts the optional argument chars to the string object method. Trailing characters that occur in the string chars will be removed, back to the last character that is not in chars.

SEE ALSO: string.lstrip() 139; string.strip() 144;

string.split(s=...[,sep=...[,maxsplit=...]])
"".split([sep [,maxsplit]])

Return a list of nonoverlapping substrings of s. If the second argument sep is specified, the substrings are divided around the occurrences of sep. If sep is not specified, the substrings are divided around any whitespace characters. The dividing strings do not appear in the resultant list. If the third argument maxsplit is specified, everything "left over" after splitting maxsplit parts is appended to the list, giving the list length 'maxsplit'+1.

>>> import string
>>> s = 'mary had a little lamb    ...with a glass of sherry'
>>> string.split(s, ' a ')
['mary had', 'little lamb    ...with', 'glass of sherry']
>>> string.split(s)
['mary', 'had', 'a', 'little', 'lamb', '...with', 'a', 'glass',
'of', 'sherry']
>>> string.split(s,maxsplit=5)
['mary', 'had', 'a', 'little', 'lamb', '...with a glass of sherry']
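
A common use of maxsplit is to divide a line at the first separator only, leaving later occurrences of the separator inside the remainder. The sketch below is illustrative; the sample line is made up:

```python
line = "path = /usr/local = odd but legal"

# Split at the first '=' only; the second '=' stays in the value.
pieces = line.split("=", 1)
key, value = [part.strip() for part in pieces]

print(len(pieces))   # 2, that is, maxsplit + 1
print(key)           # path
print(value)         # /usr/local = odd but legal
```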

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> "Mary had a Little Lamb!".split()
['Mary', 'had', 'a', 'Little', 'Lamb!']

The string.split() function (and corresponding string object method) is surprisingly versatile for working with texts, especially ones that resemble prose. Its default behavior of treating all whitespace as a single divider allows string.split() to act as a quick-and-dirty word parser:

>>> wc = lambda s: len(s.split())
>>> wc("Mary had a Little Lamb")
5
>>> s = """Mary had a Little Lamb
... its fleece as white as snow.
... And everywhere that Mary went  ...  the lamb was sure to go."""
>>> print s
Mary had a Little Lamb
its fleece as white as snow.
And everywhere that Mary went  ...  the lamb was sure to go.
>>> wc(s)
23

The function string.split() is very often used in conjunction with string.join(). The pattern involved is "pull the string apart, modify the parts, put it back together." Often the parts will be words, but this also works with lines (dividing on \n) or other chunks. For example:

>>> import string
>>> s = """Mary had a Little Lamb
... its fleece as white as snow.
... And everywhere that Mary went   ...  the lamb was sure to go."""
>>> string.join(string.split(s))
'Mary had a Little Lamb its fleece as white as snow. And everywhere that Mary went ... the lamb was sure to go.'

A Python 1.6+ idiom for string object methods expresses this technique compactly:

>>> "-".join(s.split())
'Mary-had-a-Little-Lamb-its-fleece-as-white-as-snow.-And-everywhere-that-Mary-went-...-the-lamb-was-sure-to-go.'

SEE ALSO: string.join() 137; mx.TextTools.setsplit() 314; mx.TextTools.charsplit() 311; mx.TextTools.splitat() 315; mx.TextTools.splitlines() 315;
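
The "pull the string apart, modify the parts, put it back together" pattern for string.split() and string.join() usually has a processing step in the middle. A minimal sketch of the full pattern follows, using string object methods; shout_long_words is a hypothetical helper, and the length threshold is arbitrary:

```python
def shout_long_words(text, min_len=5):
    """Uppercase every word of at least min_len characters."""
    words = text.split()                  # pull the string apart
    changed = []
    for w in words:                       # modify the parts
        if len(w) >= min_len:
            changed.append(w.upper())
        else:
            changed.append(w)
    return " ".join(changed)              # put it back together

print(shout_long_words("Mary had a little lamb"))
# Mary had a LITTLE lamb
```

The same shape works for lines: split on "\n", process each line, and rejoin with "\n".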

string.splitfields(...)

Identical to string.split().

"".splitlines([keepends=0])

This string method does not have an equivalent in the string module. Return a list of lines in the string. The optional argument keepends determines whether line break character(s) are included in the line strings.
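
A short sketch of the keepends argument, and of how .splitlines() differs from splitting on "\n", may help. The sample text is made up:

```python
text = "one\ntwo\r\nthree"

without_ends = text.splitlines()       # line breaks dropped
with_ends = text.splitlines(True)      # line breaks kept on each line

# Unlike .split("\n"), a trailing newline yields no empty final element.
by_splitlines = "one\ntwo\n".splitlines()
by_split = "one\ntwo\n".split("\n")

print(without_ends)    # ['one', 'two', 'three']
print(with_ends)       # ['one\n', 'two\r\n', 'three']
print(by_splitlines)   # ['one', 'two']
print(by_split)        # ['one', 'two', '']
```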

"".startswith(prefix [,start [,end]])

This string method does not have an equivalent in the string module. Return a Boolean value indicating whether the string begins with the prefix prefix. If the optional second argument start is specified, only consider the terminal substring after the offset start. If the optional third argument end is given, only consider the slice [start:end].

SEE ALSO: "".endswith() 134; string.find() 135;

string.strip(s=...)
"".strip([chars=string.whitespace])

Return a string with leading and trailing whitespace characters removed. For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> import string
>>> s = """
...     Mary had a little lamb     \011"""
>>> string.strip(s)
'Mary had a little lamb'
>>> s.strip()
'Mary had a little lamb'

Python 2.3+ accepts the optional argument chars to the string object method. Leading and trailing characters that occur in the string chars will be removed; characters in the interior of the string are left alone.

>>> s = "MARY had a LITTLE lamb STEW"
>>> s.strip("ABCDEFGHIJKLMNOPQRSTUVWXYZ") # strip caps
' had a LITTLE lamb '

SEE ALSO: string.rstrip() 142; string.lstrip() 139;
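
One practical use of the chars argument is trimming punctuation from the edges of words. The sketch below assumes a Python version whose .strip() accepts chars, and the sample sentence is made up:

```python
import string

sentence = "Mary had a 'little' lamb, (really)!"

# Strip punctuation from both ends of each word; interior characters stay.
words = [w.strip(string.punctuation) for w in sentence.split()]

print(words)
# ['Mary', 'had', 'a', 'little', 'lamb', 'really']
```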

string.swapcase(s=...)
"".swapcase()

Return a string with any uppercase letters converted to lowercase and any lowercase letters converted to uppercase.

>>> import string
>>> string.swapcase("mary HAD a little lamb!")
'MARY had A LITTLE LAMB!'

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> "Mary had a Little Lamb!".swapcase()
'MARY had A LITTLE LAMB!'

SEE ALSO: string.upper() 146; string.lower() 138;

string.translate(s=..., table=...[,deletechars=""])
"".translate(table [,deletechars=""])

Return a string, based on s, with deletechars deleted (if the third argument is specified) and with any remaining characters translated according to the translation table.

>>> import string
>>> tab = string.maketrans('ABC','abc')
>>> string.translate('MARY HAD a little LAMB', tab, 'Atl')
'MRY HD a ie LMb'

For Python 1.6+, use of a string object method is stylistically preferred in many cases. However, if string.maketrans() is used to create the translation table, one will need to import the string module anyway:

>>> 'MARY HAD a little LAMB'.translate(tab, 'Atl')
'MRY HD a ie LMb'

The string.translate() function is a very fast way to modify a string. Setting up the translation table takes some getting used to, but the resultant transformation is much faster than a procedural technique such as:

>>> (new,frm,to,dlt) = ("",'ABC','abc','Alt')
>>> for c in 'MARY HAD a little LAMB':
...     if c not in dlt:
...         pos = frm.find(c)
...         if pos == -1: new += c
...         else:         new += to[pos]
...
>>> new
'MRY HD a ie LMb'

SEE ALSO: string.maketrans() 139;

string.upper(s=...)
"".upper()

Return a string with any lowercase letters converted to uppercase.

>>> import string
>>> string.upper("mary HAD a little lamb!")
'MARY HAD A LITTLE LAMB!'
>>> string.upper("Mary had a Little Lamb!")
'MARY HAD A LITTLE LAMB!'

For Python 1.6+, use of a string object method is stylistically preferred in many cases:

>>> "Mary had a Little Lamb!".upper()
'MARY HAD A LITTLE LAMB!'

SEE ALSO: string.lower() 138;

string.zfill(s=..., width=...)

Return a string with s padded with leading zeros (but not truncated) to occupy length width (or more). If a leading sign is present, it "floats" to the beginning of the return value. In general, string.zfill() is designed for alignment of numeric values, but no checking is done to see if a string looks number-like.

>>> import string
>>> string.zfill("this", 20)
'0000000000000000this'
>>> string.zfill("-37", 20)
'-0000000000000000037'
>>> string.zfill("+3.7", 20)
'+00000000000000003.7'

Based on the example of string.rjust(), one might expect a string object method "".zfill(). The earliest Python versions with string methods lack it, but Python 2.2.2 and later do provide "".zfill(width).

SEE ALSO: string.rjust() 141;
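
For genuinely numeric values, the % formatting operator gives the same zero padding. The sketch below places the two side by side; it assumes a Python version in which string objects have a .zfill() method, and the variable names are illustrative:

```python
padded_text = "42".zfill(6)        # pads any string, number-like or not
padded_sign = "-37".zfill(6)       # the sign "floats" to the front
padded_fmt = "%06d" % -37          # formatting an actual integer
too_long = "this".zfill(3)         # never truncates

print(padded_text)   # 000042
print(padded_sign)   # -00037
print(padded_fmt)    # -00037
print(too_long)      # this
```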

2.2.2 Strings as Files, and Files as Strings

In many ways, strings and files do a similar job. Both provide a storage container for an unlimited amount of (textual) information that is directly structured only by linear position of the bytes. A first inclination is to suppose that the difference between files and strings is one of persistence: files hang around when the current program is no longer running. But that distinction is not really tenable. On the one hand, standard Python modules like shelve, pickle, and marshal (and third-party modules like xml_pickle and ZODB) provide simple ways of making strings persist (but not thereby correspond in any direct way to a filesystem). On the other hand, many files are not particularly persistent: Special files like STDIN and STDOUT under Unix-like systems exist only for program life; other peculiar files like /dev/cua0 and similar "device files" are really just streams; and even files that live on transient memory disks, or get deleted with program cleanup, are not very persistent.

The real difference between files and strings in Python is no more or less than the set of techniques available to operate on them. File objects can do things like .read() and .seek() on themselves. Notably, file objects have a concept of a "current position" that emulates an imaginary "read-head" passing over the physical storage media. Strings, on the other hand, can be sliced and indexed (for example, str[4:10] or for c in str:) and can be processed with string object methods and by functions of modules like string and re. Moreover, a number of special-purpose Python objects act "file-like" without quite being files; for example, gzip.open() and urllib.urlopen() . Of course, Python itself does not impose any strict condition for just how "file-like" something has to be to work in a file-like context. A programmer has to figure that out for each type of object she wishes to apply techniques to (but most of the time things "just work" right).

Happily, Python provides some standard modules to make files and strings easily interoperable.
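
The contrast between string-style and file-style access can be sketched with an in-memory file-like object. The example is illustrative only and assumes that the io module's StringIO class is available (it is not present in the oldest Python versions); the variable names are made up.

```python
import io

text = u"Mary had a little lamb"

# String-style access: slicing and indexing, with no notion of position.
sliced = text[5:8]            # 'had'

# File-style access: each read advances a "current position".
buf = io.StringIO(text)
first = buf.read(4)           # 'Mary'
second = buf.read(4)          # ' had'
where = buf.tell()            # 8
buf.seek(0)                   # move the "read-head" back to the start
again = buf.read(4)           # 'Mary' once more

print(sliced)
print(first + second)
print(where)
print(again)
```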

mmap • Memory-mapped file support

The mmap module allows a programmer to create "memory-mapped" file objects. These special mmap objects enable most of the techniques you might apply to "true" file objects and simultaneously most of the techniques you might apply to "true" strings. Keep in mind the hinted caveat about "most," however: Many string module functions are implemented using the corresponding string object methods. Since a mmap object is only somewhat "string-like," it basically only implements the .find() method and those "magic" methods associated with slicing and indexing. This is enough to support most string object idioms.

When a string-like change is made to a mmap object, that change is propagated to the underlying file, and the change is persistent (assuming the underlying file is persistent, and that the object called .flush() before destruction). mmap thereby provides an efficient route to "persistent strings."

Some examples of working with memory-mapped file objects are worth looking at:

>>> # Create a file with some test data
>>> open('test','w').write(' #'.join(map(str, range(1000))))
>>> fp = open('test','r+')
>>> import mmap
>>> mm = mmap.mmap(fp.fileno(),1000)
>>> len(mm)
1000
>>> mm[-20:]
'218 #219 #220 #221 #'
>>> import string   # apply a string module method
>>> mm.seek(string.find(mm, '21'))
>>> mm.read(10)
'21 #22 #23'
>>> mm.read(10)     # next ten bytes
' #24 #25 #'
>>> mm.find('21')   # object method to find next occurrence
402
>>> try: string.rfind(mm, '21')
... except AttributeError: print "Unsupported string function"
...
Unsupported string function
>>> import re
>>> '/'.join(re.findall('..21..',mm))   # regexes work nicely
' #21 #/#121 #/ #210 / #212 / #214 / #216 / #218 /#221 #'

It is worth emphasizing that the bytes in a file on disk are in fixed positions. You may use the mmap.mmap.move() method to copy bytes into different portions of a file, but you cannot expand the file from the middle, only by adding to the end (as with mmap.mmap.resize()).
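
A small sketch of what "fixed positions" means in practice, assuming a current Python 3 interpreter (bytes rather than strings) and a hypothetical temporary demo file: a slice of a mmap object can be overwritten with data of the same length, but an assignment that would lengthen the slice is refused rather than shifting the later bytes.

```python
import mmap
import os
import tempfile

# Hypothetical demo file
demo_path = os.path.join(tempfile.mkdtemp(), "test")
with open(demo_path, "wb") as fp:
    fp.write(b"AAAAABBBBB")

with open(demo_path, "r+b") as fp:
    mm = mmap.mmap(fp.fileno(), 0)
    mm[0:5] = b"CCCCC"              # same length: bytes are overwritten in place
    try:
        mm[0:5] = b"DDDDDDD"        # longer: would have to push later bytes along
        grew = True
    except (IndexError, ValueError):
        grew = False                # the assignment is rejected; nothing moves
    contents = mm[:]
    mm.close()

print(grew, contents)
```

Catching both IndexError and ValueError is a defensive choice in this sketch; which exception is raised is an interpreter detail.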

CLASSES
mmap.mmap(fileno, length [,tagname]) (Windows)
mmap.mmap(fileno, length [,flags=MAP_SHARED, prot=PROT_READ|PROT_WRITE]) (Unix)

Create a new memory-mapped file object. fileno is the numeric file handle to base the mapping on. Generally this number should be obtained using the .fileno() method of a file object. length specifies the length of the mapping. Under Windows, the value 0 may be given for length to specify the current length of the file. If a length smaller than the current file size is specified, only the initial portion of the file will be mapped. If a length larger than the current file size is specified, the file can be extended with additional string content.

The underlying file for a memory-mapped file object must be opened for updating, using the "+" mode modifier.

According to the official Python documentation for Python 2.1, a third argument tagname may be specified. If it is, multiple memory-maps against the same file are created. In practice, however, each instance of mmap.mmap() creates a new memory-map whether or not a tagname is specified. In any case, this allows multiple file-like updates to the same underlying file, generally at different positions in the file.

>>> open('test','w').write(' #'.join([str(n) for n in range(1000)]))
>>> fp = open('test','r+')
>>> import mmap
>>> mm1 = mmap.mmap(fp.fileno(),1000)
>>> mm2 = mmap.mmap(fp.fileno(),1000)
>>> mm1.seek(500)
>>> mm1.read(10)
'122 #123 #'
>>> mm2.read(10)
'0 #1 #2 #3'
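
The same idea can be sketched for a current Python 3 interpreter (an assumption; the session above is from an older Python, and the temporary file below is hypothetical). Each map keeps its own current position, while a change made through one map is expected to be visible through the other when the mapping is shared, which is the default.

```python
import mmap
import os
import tempfile

# Hypothetical demo file with the same test data as above
demo_path = os.path.join(tempfile.mkdtemp(), "test")
with open(demo_path, "wb") as fp:
    fp.write(" #".join(str(n) for n in range(1000)).encode("ascii"))

with open(demo_path, "r+b") as fp:
    mm1 = mmap.mmap(fp.fileno(), 0)
    mm2 = mmap.mmap(fp.fileno(), 0)
    mm1.seek(500)
    positions = (mm1.tell(), mm2.tell())    # each map tracks its own position
    chunk1 = mm1.read(10)                   # read from offset 500 of the file
    chunk2 = mm2.read(10)                   # read from the start of the file
    mm1[0:5] = b"HELLO"                     # update through the first map...
    seen_by_mm2 = mm2[0:5]                  # ...and look through the second
    mm1.close()
    mm2.close()

print(positions, chunk1, chunk2, seen_by_mm2)
```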

Under Unix, the third argument flags may be MAP_PRIVATE or MAP_SHARED. If MAP_SHARED is specified for flags, all processes mapping the file will see the changes made to a mmap object. Otherwise, the changes are restricted to the current process. The fourth argument, prot, may be used to disallow certain types of access by other processes to the mapped file regions.

METHODS
mmap.mmap.close()

Close the memory-mapped file object. Subsequent calls to the other methods of the mmap object will raise an exception. Under Windows, the behavior of a mmap object after .close() is somewhat erratic, however. Note that closing the memory-mapped file object is not the same as closing the underlying file object. Closing the underlying file will make the contents inaccessible, but closing the memory-mapped file object will not affect the underlying file object.
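
A brief sketch of the first and last points, assuming a current Python 3 interpreter and a hypothetical temporary file: after .close(), other methods of the mmap object raise an exception (ValueError in this sketch), while the underlying file object remains usable.

```python
import mmap
import os
import tempfile

# Hypothetical demo file
demo_path = os.path.join(tempfile.mkdtemp(), "test")
with open(demo_path, "wb") as fp:
    fp.write(b"X" * 100)

with open(demo_path, "r+b") as fp:
    mm = mmap.mmap(fp.fileno(), 0)
    mm.close()
    try:
        mm.read(1)                  # any further use of the closed map fails
        raised = False
    except ValueError:
        raised = True
    fp.seek(0)
    still_readable = fp.read(5)     # the file object itself is unaffected

print(raised, still_readable)
```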

SEE ALSO: FILE.close() 16;

mmap.mmap.find(sub [,pos])

Similar to string.find(). Return the index position of the first occurrence of sub in the mmap object. If the optional second argument pos is specified, the result is the offset returned relative to pos. Return -1 if no occurrence is found:

>>> open('test','w').write(' #'.join([str(n) for n in range(1000)]))
>>> fp = open('test','r+')
>>> import mmap
>>> mm = mmap.mmap(fp.fileno(), 0)
>>> mm.find('21')
74
>>> mm.find('21',100)
-26
>>> mm.tell()
0
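
The relative offset shown above reflects the interpreter the session was run under. As far as we can tell, on a current Python 3 interpreter the second argument behaves like the start argument of a bytes .find() and the result is an absolute index. The following sketch, which assumes Python 3 and a hypothetical temporary file, compares the two directly; it also shows that searching does not move the current position.

```python
import mmap
import os
import tempfile

data = " #".join(str(n) for n in range(1000)).encode("ascii")

# Hypothetical demo file
demo_path = os.path.join(tempfile.mkdtemp(), "test")
with open(demo_path, "wb") as fp:
    fp.write(data)

with open(demo_path, "r+b") as fp:
    mm = mmap.mmap(fp.fileno(), 0)
    first = mm.find(b"21")              # search from the current position (0)
    later = mm.find(b"21", 100)         # search starting at index 100
    missing = mm.find(b"no such text")  # -1 when nothing is found
    position = mm.tell()                # .find() leaves the position alone
    mm.close()

# Compare against the same searches on an ordinary bytes object
same_as_bytes = (first == data.find(b"21") and later == data.find(b"21", 100))
print(first, later, missing, position, same_as_bytes)
```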

SEE ALSO: mmap.mmap.seek() 152; string.find() 135;

mmap.mmap.flush([offset, size])

Write changes made in memory to the mmap object back to disk. The first argument offset and second argument size must either both be specified or both be omitted. If offset and size are specified, only the region starting at offset and of length size will be written back to disk.

mmap.mmap.flush() is necessary to guarantee that changes are written to disk; however, no guarantee is given that changes will not be written to disk as part of normal Python interpreter housekeeping. mmap should not be used for systems with "cancelable" changes (since changes may not be cancelable).

SEE ALSO: FILE.flush() 16;

mmap.mmap.move(target, source, length)

Copy a substring within a memory-mapped file object. The length of the substring is the third argument length. The target location is the first argument target. The substring is copied from the position source. It is allowable to have the substring's original position overlap its target range, but it must not go past the last position of the mmap object.

>>> open('test','w').write(''.join([c*10 for c in 'ABCDE']))
>>> fp = open('test','r+')
>>> import mmap
>>> mm = mmap.mmap(fp.fileno(),0)
>>> mm[:]
'AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEE'
>>> mm.move(40,0,5)
>>> mm[:]
'AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDAAAAAEEEEE'
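
The two conditions on the ranges can be sketched as follows, assuming a current Python 3 interpreter and a hypothetical temporary file. The first call moves a block onto a range that overlaps its own source; the second asks for a target range that runs past the end of the map and is rejected (with ValueError in this sketch).

```python
import mmap
import os
import tempfile

# Hypothetical demo file
demo_path = os.path.join(tempfile.mkdtemp(), "test")
with open(demo_path, "wb") as fp:
    fp.write(b"ABCDEFGHIJKLMNOPQRST")

with open(demo_path, "r+b") as fp:
    mm = mmap.mmap(fp.fileno(), 0)
    mm.move(5, 0, 10)           # target 5, source 0, length 10: ranges overlap
    after_overlap = mm[:]
    try:
        mm.move(15, 0, 10)      # target range 15..24 exceeds the 20-byte map
        out_of_range = False
    except ValueError:
        out_of_range = True
    mm.close()

print(after_overlap, out_of_range)
```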

mmap.mmap.read(num)

Return a string containing num bytes, starting at the current file position. The file position is moved to the end of the read string. In contrast to the .read() method of file objects, mmap.mmap.read() always requires that a byte count be specified, which makes a memory-map file object not fully substitutable for a file object when data is read. However, the following is safe for both true file objects and memory-mapped file objects:

>>> open('test','w').write(' #'.join([str(n) for n in range(1000)]))
>>> fp = open('test','r+')
>>> import mmap
>>> mm = mmap.mmap(fp.fileno(),0)
>>> def safe_readall(file):
...     try:
...         length = len(file)
...         return file.read(length)
...     except TypeError:
...         return file.read()
...
>>> s1 = safe_readall(fp)
>>> s2 = safe_readall(mm)
>>> s1 == s2
1

SEE ALSO: mmap.mmap.read_byte() 151; mmap.mmap.readline() 151; mmap.mmap.write() 153; FILE.read() 17;

mmap.mmap.read_byte()

Return a one-byte string from the current file position and advance the current position by one. Same as mmap.mmap.read(1).

SEE ALSO: mmap.mmap.read() 150; mmap.mmap.readline() 151;

mmap.mmap.readline()

Return a string from the memory-mapped file object, starting from the current file position and going to the next newline character. Advance the current file position by the amount read.
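
One way to use .readline() is to loop until it returns an empty result, much as one might with a file object. The generator below is a sketch under the assumption of a current Python 3 interpreter (lines come back as bytes); the name iter_mmap_lines and the temporary file are hypothetical.

```python
import mmap
import os
import tempfile

def iter_mmap_lines(mm):
    """Yield successive lines from a mmap object, starting at its current position."""
    while True:
        line = mm.readline()
        if not line:            # empty result: end of the map was reached
            break
        yield line

# Hypothetical demo file; the last line has no trailing newline
demo_path = os.path.join(tempfile.mkdtemp(), "test")
with open(demo_path, "wb") as fp:
    fp.write(b"alpha\nbeta\ngamma")

with open(demo_path, "r+b") as fp:
    mm = mmap.mmap(fp.fileno(), 0)
    lines = list(iter_mmap_lines(mm))
    end_position = mm.tell()    # the position advanced by the amount read
    mm.close()

print(lines, end_position)
```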

SEE ALSO: mmap.mmap.read() 150; mmap.mmap.read_byte() 151; FILE.readline() 17;

mmap.mmap.resize(newsize)

Change the size of a memory-mapped file object. This may be used to expand the size of an underlying file or merely to expand the area of a file that is memory-mapped. An expanded file is padded with null bytes (\000) unless otherwise filled with content. As with other operations on mmap objects, changes to the underlying file system may not occur until a .flush() is performed.

SEE ALSO: mmap.mmap.flush() 150;

mmap.mmap.seek(offset [,mode])

Change the current file position. If a second argument mode is given, a different seek mode can be selected. The default is 0, absolute file positioning. Mode 1 seeks relative to the current file position. Mode 2 is relative to the end of the memory-mapped file (which may be smaller than the whole size of the underlying file). The first argument offset specifies the distance to move the current file position: in mode 0 it should be positive, in mode 2 it should be negative, in mode 1 the current position can be moved either forward or backward.
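
The three modes can be sketched side by side, assuming a current Python 3 interpreter and a hypothetical temporary file. The os.SEEK_CUR and os.SEEK_END constants are used here as readable names for modes 1 and 2.

```python
import mmap
import os
import tempfile

# Hypothetical 30-byte demo file
demo_path = os.path.join(tempfile.mkdtemp(), "test")
with open(demo_path, "wb") as fp:
    fp.write(b"0123456789" * 3)

positions = []
with open(demo_path, "r+b") as fp:
    mm = mmap.mmap(fp.fileno(), 0)
    mm.seek(10)                     # mode 0 (default): absolute position
    positions.append(mm.tell())
    mm.seek(5, os.SEEK_CUR)         # mode 1: relative to the current position
    positions.append(mm.tell())
    mm.seek(-4, os.SEEK_END)        # mode 2: relative to the end of the map
    positions.append(mm.tell())
    tail = mm.read(4)               # the last four bytes of the map
    mm.close()

print(positions, tail)
```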

SEE ALSO: FILE.seek() 17;

mmap.mmap.size()

Return the length of the underlying file. The size of the actual memory-map may be smaller if less than the whole file is mapped:

>>> open('test','w').write('X'*100)
>>> fp = open('test','r+')
>>> import mmap
>>> mm = mmap.mmap(fp.fileno(),50)
>>> mm.size()
100
>>> len(mm)
50

SEE ALSO: len() 14; mmap.mmap.seek() 152; mmap.mmap.tell() 152;

mmap.mmap.tell()

Return the current file position.

>>> open('test','w').write('X'*100)
>>> fp = open('test','r+')
>>> import mmap
>>> mm = mmap.mmap(fp.fileno(), 0)
>>> mm.tell()
0
>>> mm.seek(20)
>>> mm.tell()
20
>>> mm.read(20)
'XXXXXXXXXXXXXXXXXXXX'
>>> mm.tell()
40

SEE ALSO: FILE.tell() 17; mmap.mmap.seek() 152;

mmap.mmap.write(s)

Write s into the memory-mapped file object at the current file position. The current file position is updated to the position following the write. The method mmap.mmap.write() is useful for functions that expect to be passed a file-like object with a .write() method. However, for new code, it is generally more natural to use the string-like index and slice operations to write contents. For example:

>>> open('test','w').write('X'*50)
>>> fp = open('test','r+')
>>> import mmap
>>> mm = mmap.mmap(fp.fileno(), 0)
>>> mm.write('AAAAA')
>>> mm.seek(10)
>>> mm.write('BBBBB')
>>> mm[30:35] = 'SSSSS'
>>> mm[:]
'AAAAAXXXXXBBBBBXXXXXXXXXXXXXXXSSSSSXXXXXXXXXXXXXXX'
>>> mm.tell()
15

SEE ALSO: FILE.write() 17; mmap.mmap.read() 150;

mmap.mmap.write_byte(c)

Write a one-byte string to the current file position, and advance the current position by one. Same as mmap.mmap.write(c) where c is a one-byte string.
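
On a current Python 3 interpreter the byte-at-a-time methods differ slightly from the description above: as far as we know, .write_byte() takes an integer in the range 0 to 255 and .read_byte() returns an integer, rather than one-byte strings. A short sketch under that assumption, using a hypothetical temporary file:

```python
import mmap
import os
import tempfile

# Hypothetical demo file
demo_path = os.path.join(tempfile.mkdtemp(), "test")
with open(demo_path, "wb") as fp:
    fp.write(b"abc")

with open(demo_path, "r+b") as fp:
    mm = mmap.mmap(fp.fileno(), 0)
    mm.write_byte(ord("Z"))         # an integer, not a one-byte string
    after_write = mm.tell()         # the position advanced by one
    mm.seek(0)
    value = mm.read_byte()          # comes back as an integer as well
    contents = mm[:]
    mm.close()

print(after_write, value, contents)
```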

SEE ALSO: mmap.mmap.write() 153;

StringIO • File-like objects that read from or write to a string buffer

cStringIO • Fast, but incomplete, StringIO replacement

The StringIO and cStringIO modules allow a programmer to create "memory files," that is, "string buffers." These special StringIO objects enable most of the techniques you might apply to "true" file objects, but without any connection to a filesystem.

The most common use of string buffer objects is when some existing techniques for working with byte-streams in files are to be applied to strings that do not come from files. A string buffer object behaves in a file-like manner and can "drop in" to most functions that want file objects.
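
The "drop in" idea can be sketched as follows. The StringIO and cStringIO modules described here belong to Python 2; this sketch assumes a current Python 3 interpreter, where io.StringIO plays the corresponding role. The function count_words is a hypothetical stand-in for any existing code written against file objects.

```python
import io

def count_words(fileobj):
    """Count whitespace-separated words, reading line by line from a file-like object."""
    return sum(len(line.split()) for line in fileobj)

# Reading: a string buffer stands in for an open text file
source = io.StringIO("Mary had a little lamb\nits fleece was white as snow\n")
total = count_words(source)

# Writing: code that expects a .write() method can fill a buffer instead of a file
sink = io.StringIO()
print("total words:", total, file=sink)
captured = sink.getvalue()

print(total, repr(captured))
```

Nothing in count_words knows or cares that its argument is not backed by a filesystem, which is the whole point of a string buffer.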

cStringIO is much faster than StringIO and should be used in most cases. Both modules provide a StringIO class whose instances are the string buffer objects. cStringIO.StringIO cannot be subclassed (and therefore cannot provide additional methods), and it cannot handle Unicode strings. One rarely needs to subclass StringIO, but the absence of Unicode support in cStringIO could be a problem for many developers. As well, a cStringIO buffer that is initialized with a string does not support write operations, which makes such string buffers less general (the effect of a write against an in-memory file can be accomplished by normal string operations).

A string buffer object may be initialized with a string (or Unicode for StringIO) argument. If so, that is the initial content of the buffer. Below are examples of usage (including Unicode handling):

>>> from cStringIO import StringIO as CSIO
>>> from StringIO import StringIO as SIO
>>> alef, omega = unichr(1488), unichr(969)
>>> sentence = "In set theory, the Greek "+omega+" represents the \n"+\
...            "ordinal limit of the integers, while the Hebrew \n"+\
...