9.3 String Formatting

In Python, a string-formatting expression has the syntax:

format % values

where format is a plain or Unicode string containing format specifiers and values is any single object or a collection of objects in a tuple or dictionary. Python's string-formatting operator has roughly the same set of features as the C language's printf and operates in a similar way. Each format specifier is a substring of format that starts with a percent sign (%) and ends with one of the conversion characters shown in Table 9-1.

Table 9-1. String-formatting conversion characters

Character  Output format                                                        Notes
---------  -------------------------------------------------------------------  -------------------------------------------------
d, i       Signed decimal integer                                               Value must be a number
u          Unsigned decimal integer                                             Value must be a number
o          Unsigned octal integer                                               Value must be a number
x          Unsigned hexadecimal integer (lowercase letters)                     Value must be a number
X          Unsigned hexadecimal integer (uppercase letters)                     Value must be a number
e          Floating-point value in exponential form (lowercase e for exponent)  Value must be a number
E          Floating-point value in exponential form (uppercase E for exponent)  Value must be a number
f, F       Floating-point value in decimal form                                 Value must be a number
g, G       Like e or E when exp is less than -4 or greater than or equal to     exp is the exponent of the number being converted
           the precision; otherwise like f or F
c          Single character                                                     Value can be an integer or a single-character string
r          String                                                               Converts any value with repr
s          String                                                               Converts any value with str
%          Literal % character                                                  Consumes no value

Between the % and the conversion character, you can specify a number of optional modifiers, as we'll discuss shortly.
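As a quick illustration of Table 9-1, the following sketch applies several conversion characters to sample values. The variable names (as_octal, as_exp, and so on) are chosen here for illustration only, and the snippet relies on nothing but the % operator.

```python
# Integer conversions
as_decimal = '%d' % 255          # '255'
as_octal = '%o' % 255            # '377'
as_hex_lower = '%x' % 255        # 'ff'
as_hex_upper = '%X' % 255        # 'FF'

# Floating-point conversions (default precision is 6)
as_exp = '%e' % 1234.5           # '1.234500e+03'
as_fixed = '%f' % 1234.5         # '1234.500000'
as_general = '%g' % 1234.5       # '1234.5' (exponent 3 is below the precision)
as_general_small = '%g' % 0.00001234   # '1.234e-05' (exponent -5 is below -4)

# Character and string conversions
as_char = '%c' % 65              # 'A'
as_repr = '%r' % 'hi'            # "'hi'" (repr keeps the quotes)
as_str = '%s' % 'hi'             # 'hi'

# A doubled percent sign yields a literal % and consumes no value
as_percent = '%d%%' % 50         # '50%'
```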

The result of a formatting expression is a string that is a copy of format where each format specifier is replaced by the corresponding item of values converted to a string according to the specifier. Here are some simple examples:

x = 42
y = 3.14
z = "george"
print 'result = %d' % x                 # prints: result = 42
print 'answers are: %d %f' % (x,y)      # prints: answers are: 42 3.140000
print 'hello %s' % z                    # prints: hello george

9.3.1 Format Specifier Syntax

A format specifier can include numerous modifiers that control how the corresponding item in values is converted to a string. The components of a format specifier, in order, are:

  • The mandatory leading % character that marks the start of the specifier

  • An optional item name in parentheses (e.g., (name))

  • Zero or more optional conversion flags:

    • #, which indicates that the conversion uses an alternate form (if any exists for its type)

    • 0, which indicates that the conversion is zero-padded

    • -, which indicates that the conversion is left-justified

    • a space, which indicates that a space is placed before a positive number

    • +, which indicates that the numeric sign (+ or -) is included before any numeric conversion

  • An optional minimum width of the conversion, specified using one or more digits or an asterisk (*), which means that the width is taken from the next item in values

  • An optional precision for the conversion, specified with a dot (.) followed by zero or more digits or a *, which means that the precision is taken from the next item in values

  • A mandatory conversion type from Table 9-1
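The components listed above can be combined freely, as the following minimal sketch suggests. The sample values and variable names are illustrative assumptions; the trailing | in one format is there only to make the padding visible.

```python
value = 42
pi = 3.14159

padded = '%5d' % value        # '   42'   minimum width 5, right-justified
zeroed = '%05d' % value       # '00042'   0 flag pads with zeros
left = '%-5d|' % value        # '42   |'  - flag left-justifies
signed = '%+d' % value        # '+42'     + flag always shows the sign
spaced = '% d' % value        # ' 42'     space flag before a positive number
alt_hex = '%#x' % 255         # '0xff'    # flag selects the alternate form
fixed = '%8.3f' % pi          # '   3.142' width 8, precision 3
```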

Item names must be given either in all format specifiers in format or in none of them. When item names are present, values must be a mapping (often the dictionary of a namespace, e.g., vars()), and each item name is a key in values. In other words, each format specifier corresponds to the item in values keyed by the specifier's item name. When item names are present, you cannot use * in any format specifier.
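A short sketch of formatting with item names follows. The dictionary record and its keys are hypothetical, made up for this example; any mapping with the needed keys would do.

```python
# A hypothetical mapping; each item name in the format is a key in it.
record = {'name': 'george', 'count': 42, 'ratio': 0.5}

# Flags, width, and precision go after the parenthesized item name.
line = '%(name)s has %(count)d items (%(ratio).1f)' % record
# line is 'george has 42 items (0.5)'

# The same key may be used more than once, and keys that the
# format does not mention are simply ignored.
twice = '%(name)s, %(name)s!' % record
# twice is 'george, george!'
```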

When item names are absent, values must be a tuple; when there is just one item, values may be the item itself instead of a tuple. Each format specifier corresponds to an item in values by position, and values must have exactly as many items as format has specifiers (plus one extra for each width or precision given by *). When the width or precision component of a specifier is given by *, the * consumes one item in values, which must be an integer and is taken as the number of characters to use as minimum width or precision of the conversion.
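The positional rules can be sketched as follows, including widths and precisions supplied through *. The variable names are illustrative assumptions. The example also shows one consequence of the single-item shortcut: when the lone item is itself a tuple, it must be wrapped in a one-item tuple.

```python
width = 8
precision = 2

# Each * consumes one integer item from values, ahead of the value it formats.
cell = '%*.*f' % (width, precision, 3.14159)    # '    3.14'
label = '%-*s|' % (6, 'ab')                     # 'ab    |'

# A single item may stand alone, but a tuple to be formatted as one
# item must be wrapped so it is not mistaken for the values tuple.
pair = '%s' % ((1, 2),)                         # '(1, 2)'

# Too few (or too many) items raise TypeError.
try:
    '%d %d' % (1,)
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```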

9.3.2 Common String-Formatting Idioms

It is quite common for format to contain several occurrences of %s and for values to be a tuple with exactly as many items as format has occurrences of %s. The result is a copy of format where each %s is replaced with str applied to the corresponding item of values. For example:

'%s+%s is %s'%(23,45,68)                # results in: '23+45 is 68'

You can think of %s as a fast and concise way to put together a few values, converted to string form, into a larger string. For example:

oneway = 'x' + str(j) + 'y' + str(j) + 'z'
another = 'x%sy%sz' % (j, j)

After this code is executed, variables oneway and another will always be equal, but the computation of another, done via string formatting, is measurably faster. Which way is clearer and simpler is a matter of habit: get used to the string-formatting idiom, and it will come to look simpler and clearer.
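One way to check the speed comparison on your own interpreter is the standard timeit module, as in the sketch below. The value of j and the repetition count are arbitrary assumptions, and the relative timings depend on the Python version and platform, so no particular result is shown.

```python
import timeit

j = 7
oneway = 'x' + str(j) + 'y' + str(j) + 'z'
another = 'x%sy%sz' % (j, j)
same = (oneway == another)      # True: both are 'x7y7z'

# Time each approach; 100000 repetitions is an arbitrary choice.
concat_time = timeit.timeit("'x' + str(j) + 'y' + str(j) + 'z'",
                            setup='j = 7', number=100000)
format_time = timeit.timeit("'x%sy%sz' % (j, j)",
                            setup='j = 7', number=100000)

# Compare concat_time and format_time to see which is faster
# in your environment.
```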

Apart from %s, other reasonably common format specifiers are those used to format floating-point values: %f for decimal formatting, %e for exponential formatting, and %g for either decimal or exponential formatting, depending on the number's magnitude. When formatting floating-point values, you normally specify width and/or precision modifiers. A width modifier is a number right after the % that gives the minimum width for the resulting conversion; you generally use a width modifier if you're formatting a table for display in a fixed-width font. A precision modifier is a number following a dot (.) right before the conversion type letter; you generally use a precision modifier in order to fix the number of decimal digits displayed for a number, to avoid giving a misleading impression of excessive precision and wasting display space. For example:

'%.2f'%(1/3.0)                          # results in: '0.33'
'%s'%(1/3.0)                            # results in: '0.333333333333'

With %s, you cannot specify how many digits to display after the decimal point. When you know that your numeric results are accurate to only a few digits, displaying more digits can mislead people examining those results into believing they are much more accurate than is in fact the case.
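To illustrate width and precision modifiers working together, here is a minimal sketch that lines up a small table for a fixed-width font. The row data and the column widths (8 and 10) are made up for this example.

```python
# Hypothetical data: a label and a floating-point value per row.
rows = [('alpha', 1 / 3.0), ('beta', 200 / 3.0), ('gamma', 12345.678)]

# %-8s left-justifies the label in 8 columns; %10.2f right-justifies
# the value in 10 columns with exactly two digits after the point.
lines = ['%-8s%10.2f' % (label, value) for label, value in rows]
# lines[0] is 'alpha         0.33'
# lines[1] is 'beta         66.67'
# lines[2] is 'gamma     12345.68'

# With %e, the precision counts digits after the decimal point;
# with %g, it counts significant digits.
sci = '%.3e' % 12345.678        # '1.235e+04'
general = '%.3g' % 12345.678    # '1.23e+04'
```

Every entry in lines is 18 characters wide, so the columns stay aligned however many digits the underlying values carry.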


