4.2 Data Types

The operation of a Python program hinges on the data it handles. All data values in Python are represented by objects, and each object, or value, has a type. An object's type determines what operations the object supports, or, in other words, what operations you can perform on the data value. The type also determines the object's attributes and items (if any) and whether the object can be altered. An object that can be altered is known as a mutable object, while one that cannot be altered is an immutable object. I cover object attributes and items in detail later in this chapter.

The built-in type(obj) accepts any object as its argument and returns the type object that represents the type of obj. Another built-in function, isinstance(obj,type), returns True if object obj is represented by type object type; otherwise, it returns False (built-in names True and False were introduced in Python 2.2.1; in older versions, 1 and 0 are used instead).
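As a minimal sketch of these two built-ins (the names count and label are illustrative and do not come from the text), the following should behave the same on any interpreter that has True and False:

```python
# Illustrative values only; these names are assumptions for the example.
count = 42
label = 'spam'

# type(obj) returns the type object that represents the type of obj.
count_type = type(count)
label_type = type(label)

# isinstance(obj, type) tests whether obj is represented by that type object.
is_count_int = isinstance(count, int)      # True
is_label_int = isinstance(label, int)      # False
```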

Python has built-in objects for fundamental data types such as numbers, strings, tuples, lists, and dictionaries, as covered in the following sections. You can also create user-defined types, known as classes, as discussed in detail in Chapter 5.

4.2.1 Numbers

The built-in number objects in Python support integers (plain and long), floating-point numbers, and complex numbers. All numbers in Python are immutable objects, meaning that when you perform an operation on a number object, you always produce a new number object. Operations on numbers, called arithmetic operations, are covered later in this chapter.

Integer literals can be decimal, octal, or hexadecimal. A decimal literal is represented by a sequence of digits where the first digit is non-zero. An octal literal is specified with a 0 followed by a sequence of octal digits (0 to 7). To indicate a hexadecimal literal, use 0x followed by a sequence of hexadecimal digits (0 to 9 and A to F, in either upper- or lowercase). For example:

1, 23, 3493                  # Decimal integers
01, 027, 06645               # Octal integers
0x1, 0x17, 0xDA5             # Hexadecimal integers
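The following sketch checks that the different notations denote the same values. I use int with an explicit base in place of octal literals, so that the sketch does not depend on how a given interpreter version spells them. That substitution is my assumption, not part of the listing above.

```python
from_decimal = 23
from_hex = 0x17                  # hexadecimal literal, as in the listing above
from_octal = int('27', 8)        # the octal digits 27, parsed with base 8

all_equal = (from_decimal == from_hex == from_octal)    # True

# The last value on each line of the listing is also the same number.
big_hex = 0xDA5                  # 3493
big_octal = int('6645', 8)       # 3493
```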

Any kind of integer literal may be followed by the letter L or l to denote a long integer. For instance:

1L, 23L, 99999333493L        # Long decimal integers
01L, 027L, 01351033136165L   # Long octal integers
0x1L, 0x17L, 0x17486CBC75L   # Long hexadecimal integers

Use uppercase L here, not lowercase l, which may look like the digit 1. The difference between a long integer and a plain integer is that a long integer has no predefined size limit: it may be as large as memory allows. A plain integer takes up a few bytes of memory and has minimum and maximum values that are dictated by machine architecture. sys.maxint is the largest available plain integer, while -sys.maxint-1 is the smallest (most negative) one. On typical 32-bit machines, sys.maxint is 2147483647.

A floating-point literal is represented by a sequence of decimal digits that includes a decimal point (.), an exponent part (an e or E, optionally followed by + or -, followed by one or more digits), or both. The leading character of a floating-point literal cannot be e or E: it may be any digit or a period (.) (prior to Python 2.2, a leading 0 had to be immediately followed by a period). For example:

0., 0.0, .0, 1., 1.0, 1e0, 1.e0, 1.0e0

A Python floating-point value corresponds to a C double and shares its limits of range and precision, typically 53 bits of precision on modern platforms. (Python currently offers no way to find out this range and precision.)
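A short sketch of both points: the alternative spellings denote the same value, and precision is limited. The variable names are illustrative, and the exact size of the rounding error assumes the typical 53-bit double mentioned above.

```python
# Every spelling here denotes the same value, 1.0.
spellings_of_one = [1., 1.0, 1e0, 1.e0, 1.0e0]
all_one = all(value == 1.0 for value in spellings_of_one)    # True

# An exponent part alone is enough to make a literal floating-point.
exponent_only_type = type(1e0)

# Limited precision: 0.1 + 0.2 is close to, but not exactly, 0.3.
sum_is_exact = (0.1 + 0.2 == 0.3)                             # False
difference = abs((0.1 + 0.2) - 0.3)                           # tiny, but not zero
```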

A complex number is made up of two floating-point values, one each for the real and imaginary parts. You can access the parts of a complex object z as read-only attributes z.real and z.imag. You can specify an imaginary literal as a floating-point or decimal literal followed by a j or J:

0j, 0.j, 0.0j, .0j, 1j, 1.j, 1.0j, 1e0j, 1.e0j, 1.0e0j

The j at the end of the literal indicates the square root of -1, as commonly used in electrical engineering (some other disciplines use i for this purpose, but Python has chosen j). There are no other complex literals; constant complex numbers are denoted by adding or subtracting a floating-point literal and an imaginary one.
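For instance, a complex constant and its parts might look like the following sketch (z matches the name used above; the other names are illustrative):

```python
# A complex constant: a floating-point literal plus an imaginary literal.
z = 3.0 + 4.0j

real_part = z.real        # 3.0, a float
imag_part = z.imag        # 4.0, a float

# 1j stands for the square root of -1, so squaring it gives -1.
j_squared = 1j * 1j       # (-1+0j)
```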

Note that numeric literals do not include a sign: a leading + or -, if present, is a separate operator, as discussed later in this chapter.
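One visible consequence, sketched below with illustrative names, is that the minus sign is applied only after operators that bind more tightly than it does, such as exponentiation:

```python
# The literal is 2; the minus sign is a separate unary operator that is
# applied after exponentiation.
negated_square = -2 ** 2          # -(2 ** 2), that is, -4
square_of_negative = (-2) ** 2    # 4
```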

4.2.2 Sequences

A sequence is an ordered container of items, indexed by non-negative integers. Python provides built-in sequence types for strings (plain and Unicode), tuples, and lists. Library and extension modules provide other sequence types, and you can write yet others yourself (as discussed in Chapter 5). Sequences can be manipulated in a variety of ways, as discussed later in this chapter.

4.2.2.1 Strings

A built-in string object is an ordered collection of characters used to store and represent text-based information. Strings in Python are immutable, meaning that when you perform an operation on a string, you always produce a new string object rather than mutating the existing string. String objects provide numerous methods, as discussed in detail in Chapter 9.
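The following sketch illustrates immutability with one string method (upper) and an attempted item assignment. The names are illustrative.

```python
greeting = 'hello'
shouted = greeting.upper()      # a new string object, 'HELLO'
# greeting itself still refers to 'hello'.

try:
    greeting[0] = 'J'           # strings do not support item assignment
    mutated = True
except TypeError:
    mutated = False
```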

A string literal can be quoted or triple-quoted. A quoted string is a sequence of zero or more characters enclosed in matching quote characters, single (') or double ("). For example:

'This is a literal string'
"This is another string"

The two different kinds of quotes function identically; having both allows you to include one kind of quote inside of a string specified with the other kind without needing to escape them with the backslash character (\):

'I\'m a Python fanatic'           # a quote can be escaped
"I'm a Python fanatic"            # this way is more readable

To have a string span multiple lines, you can use a backslash as the last character of the line to indicate that the next line is a continuation:

"A not very long string\
that spans two lines"             # comment not allowed on previous line

To make the string output on two lines, you must embed a newline in the string:

"A not very long string\n\
that prints on two lines"         # comment not allowed on previous line

Another approach is to use a triple-quoted string, which is enclosed by matching triplets of quote characters (''' or """):

"""An even bigger
string that spans
three lines"""                    # comments not allowed on previous lines

In a triple-quoted string literal, line breaks in the literal are preserved as newline characters in the resulting string object.
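To compare the three approaches, this sketch counts the newline characters that end up in each resulting string object. The variable names are illustrative.

```python
one_line = "A not very long string\
that spans two lines"

two_lines = "A not very long string\n\
that prints on two lines"

three_lines = """An even bigger
string that spans
three lines"""

newline_counts = (one_line.count('\n'),
                  two_lines.count('\n'),
                  three_lines.count('\n'))     # (0, 1, 2)
```

Note that the backslash continuation joins the two physical lines directly, so one_line contains "stringthat" with no space in between.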

The only character that cannot be part of a triple-quoted string is an unescaped backslash, while a quoted string cannot contain an unescaped backslash, a line end, or the quote character that encloses it. The backslash character starts an escape sequence, which lets you introduce any character in either kind of string. Python's string escape sequences are listed in Table 4-1.

Table 4-1. String escape sequences

Sequence      Meaning                   ASCII/ISO code
\<newline>    End of line is ignored    None
\\            Backslash                 0x5c
\'            Single quote              0x27
\"            Double quote              0x22
\a            Bell                      0x07
\b            Backspace                 0x08
\f            Form feed                 0x0c
\n            Newline                   0x0a
\r            Carriage return           0x0d
\t            Tab                       0x09
\v            Vertical tab              0x0b
\DDD          Octal value DDD           As given
\xXX          Hexadecimal value XX      As given
\other        Any other character       0x5c + as given
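As a quick check of a few rows of Table 4-1, the following sketch uses the built-in ord to obtain the code of the single character that each escape sequence denotes. The dictionary and variable names are illustrative.

```python
# Each escape denotes a single character with the code listed in Table 4-1.
codes = {
    'backslash': ord('\\'),   # 0x5c
    'bell':      ord('\a'),   # 0x07
    'newline':   ord('\n'),   # 0x0a
    'tab':       ord('\t'),   # 0x09
}

# Octal and hexadecimal escapes give the character with that code.
octal_a = '\101'              # 'A' (octal 101 is 65)
hex_a = '\x41'                # 'A' (hexadecimal 41 is 65)
```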

A variant of a string literal is a raw string. The syntax is the same as for quoted or triple-quoted string literals, except that an r or R immediately precedes the leading quote. In raw strings, escape sequences are not interpreted as in Table 4-1, but are literally copied into the string, including backslashes and newline characters. Raw string syntax is handy for strings that include many backslashes, as in regular expressions (see Chapter 9). A raw string cannot end with an odd number of backslashes: the last one would be taken as escaping the terminating quote.
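A small sketch of the difference, using illustrative names: the same characters between the quotes yield different strings with and without the raw prefix.

```python
cooked = 'a\tb'       # three characters: 'a', a tab, 'b'
raw = r'a\tb'         # four characters: 'a', a backslash, 't', 'b'

lengths = (len(cooked), len(raw))        # (3, 4)

# The same text written without the raw prefix needs a doubled backslash.
same_text = (raw == 'a\\tb')             # True
```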

Unicode string literals have the same syntax as other string literals, plus a u or U immediately before the leading quote character. Unicode string literals can use \u followed by four hexadecimal digits to denote Unicode characters, and can also include the kinds of escape sequences listed in Table 4-1. Unicode literals can also include the escape sequence \N{name}, where name is a standard Unicode name as per the list at http://www.unicode.org/charts/. For example, \N{Copyright Sign} indicates a Unicode copyright sign character (©). Raw Unicode string literals start with ur, not ru.
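The two ways of denoting the copyright sign can be compared as in the following sketch, which assumes an interpreter that accepts the u prefix. The names are illustrative, and I spell the character name in uppercase, as in the Unicode charts.

```python
by_code = u'\u00a9'                 # four hexadecimal digits after \u
by_name = u'\N{COPYRIGHT SIGN}'     # standard Unicode character name

same_character = (by_code == by_name)    # True
code_point = ord(by_code)                # 169, that is, 0xa9
```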

Multiple string literals of any kind (quoted, triple-quoted, raw, Unicode) can be adjacent, with optional whitespace in between. The compiler concatenates such adjacent string literals into a single string object. If any literal in the concatenation is Unicode, the whole result is Unicode. Writing a long string literal in this way lets you present it readably across multiple physical lines, and gives you an opportunity to insert comments about parts of the string. For example:

marypop = ('supercalifragilistic'   # Open paren -> logical line continues
           'expialidocious')        # Indentation ignored in continuation

The result here is a single word of 34 characters.
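You can confirm this with the built-in len, as in this sketch, which repeats the assignment so that it is self-contained:

```python
marypop = ('supercalifragilistic'   # first literal
           'expialidocious')        # second literal, joined by the compiler

length = len(marypop)               # 34
same = (marypop == 'supercalifragilistic' + 'expialidocious')   # True
```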

4.2.2.2 Tuples

A tuple is an immutable ordered sequence of items. The items of a tuple are arbitrary objects and may be of different types. To specify a tuple, use a series of expressions (the items of the tuple) separated by commas (,). You may optionally place a redundant comma after the last item. You may group tuple items with parentheses, but the parentheses are needed only where the commas would otherwise have another meaning (e.g., in function calls) or to denote empty or nested tuples. A tuple with exactly two items is also often called a pair. To create a tuple of one item (a singleton), add a comma to the end of the expression. An empty tuple is denoted by an empty pair of parentheses. Here are some tuples, all enclosed in optional parentheses:

(100, 200, 300)            # Tuple with three items
(3.14,)                    # Tuple with one item
()                         # Empty tuple

You can also call the built-in tuple to create a tuple. For example:

tuple('wow')

This builds a tuple equal to:

('w', 'o', 'w')

tuple() without arguments creates and returns an empty tuple. When x is a sequence, tuple(x) returns a tuple whose items are the same as the items in sequence x.
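The following sketch gathers the tuple forms described in this section. The names are illustrative; note in particular that parentheses alone, without a comma, do not make a tuple.

```python
point = 100, 200, 300          # parentheses are optional here
single = (3.14,)               # the trailing comma makes this a tuple
not_a_tuple = (3.14)           # just a parenthesized float
empty = ()

letters = tuple('wow')         # ('w', 'o', 'w')
sizes = (len(point), len(single), len(empty))   # (3, 1, 0)
```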

4.2.2.3 Lists

A list is a mutable ordered sequence of items. The items of a list are arbitrary objects and may be of different types. To specify a list, use a series of expressions (the items of the list) separated by commas (,) and within brackets ([ ]). You may optionally place a redundant comma after the last item. An empty list is denoted by an empty pair of brackets. Here are some example lists:

[42, 3.14, 'hello']        # List with three items
[100]                      # List with one item
[]                         # Empty list

You can also call the built-in list to create a list. For example:

list('wow')

This builds a list equal to:

['w', 'o', 'w']

list() without arguments creates and returns an empty list. When x is a sequence, list(x) creates and returns a new list whose items are the same as the items in sequence x. You can also build lists with list comprehensions, as discussed later in this chapter.
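A brief sketch of mutability and of list(x) producing a new list object (all names are illustrative):

```python
items = [42, 3.14, 'hello']
items[0] = 'changed'            # lists are mutable: the item is replaced in place

letters = list('wow')           # ['w', 'o', 'w']

# list(x) builds a new list object with the same items as x.
original = [1, 2, 3]
duplicate = list(original)
duplicate.append(4)             # original is unaffected
```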

4.2.3 Dictionaries

A mapping is an arbitrary collection of objects indexed by nearly arbitrary values called keys. Mappings are mutable and, unlike sequences, are unordered.

Python provides a single built-in mapping type, the dictionary type. Library and extension modules provide other mapping types, and you can write others yourself (as discussed in Chapter 5). Keys in a dictionary may be of different types, but they must be hashable (see function hash in Section 8.2 in Chapter 8). Values in a dictionary are arbitrary objects and may be of different types. An item in a dictionary is a key/value pair. You can think of a dictionary as an associative array (also known in some other languages as a hash).
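To illustrate the hashability requirement, this sketch uses a tuple as a key and then attempts the same with a list, which is mutable and not hashable. The names are illustrative.

```python
by_tuple = {(1, 2): 'a tuple is hashable'}

try:
    by_list = {[1, 2]: 'a list is not'}
    list_key_accepted = True
except TypeError:
    list_key_accepted = False
```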

To specify a dictionary, use a series of pairs of expressions (the pairs are the items of the dictionary) separated by commas (,) within braces ({ }). You may optionally place a redundant comma after the last item. Each item in a dictionary is written key:value, where key is an expression giving the item's key and value is an expression giving the item's value. If a key appears more than once in a dictionary, only one of the items with that key is kept in the dictionary. In other words, dictionaries do not allow duplicate keys. An empty dictionary is denoted by an empty pair of braces. Here are some dictionaries:

{'x': 42, 'y': 3.14, 'z': 7}    # Dictionary with three items and string keys
{1: 2, 3: 4}                    # Dictionary with two items and integer keys
{}                              # Empty dictionary

In Python 2.2 and up, you can call the built-in dict to create a dictionary. For example:

dict([[1,2],[3,4]])

This builds a dictionary equal to:

{1:2,3:4}

dict() without arguments creates and returns an empty dictionary. When the argument x to dict is a mapping, dict returns a new dictionary object with the same keys and values as x. When x is a sequence, the items in x must be pairs, and dict(x) returns a dictionary whose items (key/value pairs) are the same as the items in sequence x. If a key appears more than once in x, only the last item with that key is kept in the resulting dictionary.
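The behaviors of dict described above can be sketched as follows, with illustrative names and values:

```python
literal = {'x': 42, 'y': 3.14, 'z': 7}
from_pairs = dict([[1, 2], [3, 4]])                # {1: 2, 3: 4}

# With a repeated key in a sequence of pairs, the last item wins.
repeated = dict([('a', 1), ('b', 2), ('a', 3)])    # {'a': 3, 'b': 2}

# dict(mapping) returns a new dictionary with the same keys and values.
clone = dict(literal)
clone['x'] = 0                                     # literal['x'] is still 42
```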

4.2.4 None

The built-in name None denotes a null object. None has no methods or other attributes. You can use None as a placeholder when you need a reference but you don't care about what object you refer to, or when you need to indicate that no object is there. Functions return None as their result unless they have specific return statements coded to return other values.
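For example, in the following sketch the function log (a hypothetical helper, as is the list collected) has no return statement, so calling it yields None:

```python
collected = []

def log(message):
    """Hypothetical helper with no return statement."""
    collected.append(message)

result = log('started')          # result is None
is_none = result is None         # True
```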

4.2.5 Callables

In Python, callable types are those whose instances support the function call operation (see Section 4.4 later in this chapter). Functions are obviously callable, and Python provides built-in functions (see Chapter 8) and also supports user-defined functions (see Section 4.10 later in this chapter). Generator functions, which are new as of Python 2.2, are also callable (see Section 4.10.8 later in this chapter).

Types are also callable. Thus, the dict, list, and tuple built-ins discussed earlier are in fact types. Prior to Python 2.2, these names referred to factory functions for creating objects of these types. As of Python 2.2, however, they refer to the type objects themselves. Since types are callable, this change does not break existing programs. See Chapter 8 for a complete list of built-in types.

As we'll discuss in Chapter 5, class objects are callable. So are methods, which are functions bound to class attributes. Finally, class instances whose classes supply __call__ methods are also callable.
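The built-in callable reports whether an object supports the call operation. The following sketch applies it to each kind of callable mentioned in this section. The class Greeter and the function plain_function are hypothetical, and I assume an interpreter that provides callable as a built-in.

```python
class Greeter:
    """Hypothetical class whose instances are callable."""
    def __call__(self, name):
        return 'Hello, ' + name

def plain_function():
    return None

greet = Greeter()                     # calling the class creates an instance
message = greet('world')              # calling the instance runs __call__

checks = [callable(plain_function),   # a user-defined function
          callable(len),              # a built-in function
          callable(list),             # a type
          callable(Greeter),          # a class
          callable(greet.__call__),   # a bound method
          callable(greet),            # an instance whose class has __call__
          callable(42)]               # a plain number is not callable
```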

4.2.6 Boolean Values

Prior to Python 2.3, there is no explicit Boolean type in Python. However, every data value in Python can be evaluated as a truth value: true or false. Any non-zero number or non-empty string, tuple, list, or dictionary evaluates as true. Zero (of any numeric type), None, and empty strings, tuples, lists, and dictionaries evaluate as false. Python also has a number of built-in functions that return Boolean results.

Built-in names True and False were introduced in Python 2.2.1 to represent true and false; in older versions of Python, 1 and 0 are used instead. Throughout the rest of this book, I will use True and False to represent true and false. If you are using a version of Python older than 2.2.1, you'll need to substitute 1 and 0 when using examples from this book.

Python 2.2.1 also introduced a new built-in function named bool. When this function is called with any argument, it considers the argument's value in a Boolean context and returns False or True accordingly.
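As a sketch of how bool evaluates the kinds of values listed earlier in this section (the two lists are illustrative):

```python
falsy = [0, 0.0, 0j, None, '', (), [], {}]
truthy = [1, -0.5, 'x', (0,), [0], {0: 0}]

all_false = [bool(value) for value in falsy]     # every entry is False
all_true = [bool(value) for value in truthy]     # every entry is True
```

Note that a container holding a single false item, such as (0,) or [0], is itself non-empty and therefore true.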

In Python 2.3, bool becomes a type (a subclass of int) and True and False are the values of that type. The only substantial effect of this innovation is that the string representations of Boolean values become 'True' and 'False', while in earlier versions they are '1' and '0'.
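The following sketch assumes Python 2.3 or later, where bool is a type. The predicate is_even is a hypothetical example of a function that returns a Boolean.

```python
def is_even(number):
    """Hypothetical predicate that returns a Boolean."""
    return number % 2 == 0

# Assumes Python 2.3 or later, where bool is a subclass of int.
bool_is_int = isinstance(True, int)      # True
equals_one = (True == 1)                 # True: True still behaves as 1
as_text = str(is_even(4))                # 'True' rather than '1'
```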

The 2.2.1 and 2.3 changes are handy because they let you speak of functions and expressions as "returning True or False" or "returning a Boolean." The changes also let you write clearer code when you want to return a truth value (e.g., return True instead of return 1).


