Chapter 2. Basic String Operations

The cheapest, fastest and most reliable components of a computer system are those that aren't there.

--Gordon Bell, Encore Computer Corporation

If you are writing programs in Python to accomplish text processing tasks, most of what you need to know is in this chapter. Sure, you will probably need to know how to do some basic things with pipes, files, and arguments to get your text to process (covered in Chapter 1); but for actually processing the text you have gotten, the string module and string methods, together with Python's basic data structures, do nearly everything you need done, almost all the time. To a lesser extent, the various custom modules to perform encodings, encryptions, and compressions are handy to have around (and you certainly do not want the work of implementing them yourself). But at the heart of text processing are basic transformations of bits of text. That's what string functions and string methods do.

There are a lot of interesting techniques elsewhere in this book. I wouldn't have written about them if I did not find them important. But be cautious before doing interesting things. Specifically, with a fixed task in mind, before cracking this book open to any of the other chapters, consider very carefully whether your problem can be solved using the techniques in this chapter. If it can, you should usually eschew the complications of using the higher-level modules and techniques that other chapters discuss. By all means read all of this book for the insight and edification that I hope it provides; but still focus on the "Zen of Python," and prefer simple to complex when simple is enough.
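As a minimal sketch of that preference (the sample line and the variable names here are hypothetical, chosen only for illustration), compare splitting a "key = value" line with plain string methods against doing the same job with a regular expression. Both give the same answer; the first is easier to read and to get right.

```python
import re

# Hypothetical input line, padded with stray whitespace.
line = "  name = Gordon Bell  "

# Plain string methods: split once on "=", then strip each piece.
key, value = (part.strip() for part in line.split("=", 1))

# The same extraction with a regular expression. It works, but the
# pattern takes more effort to read and to verify.
match = re.match(r"\s*(\w+)\s*=\s*(.*?)\s*$", line)
re_key, re_value = match.groups()

print(key, "->", value)
print(re_key, "->", re_value)
```

When a task grows beyond what `split()` and `strip()` handle comfortably, the heavier tool earns its place; until then, the string methods are enough.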

This chapter does several things. Section 2.1 looks at a number of common problems in text processing that can (and should) be solved using (predominantly) the techniques documented in this chapter. Each of these "Problems" presents working solutions that can often be adopted with little change to real-life jobs. But a larger goal is to provide readers with a starting point for adaptation of the examples. It is not my goal to provide mere collections of packaged utilities and modules. Plenty of those exist on the Web, and resources like the Vaults of Parnassus and the Python Cookbook are worth investigating as part of any project or task (and new and better utilities will be written between the time I write this and when you read it). It is better for readers to receive a solid foundation and starting point from which to develop the functionality they need for their own projects and tasks. And even better than spurring adaptation, these examples aim to encourage contemplation. In presenting examples, this book tries to embody a way of thinking about problems and an attitude towards solving them. More than any individual technique, such ideas are what I would most like to share with readers.

Section 2.2 is a "reference with commentary" on the Python standard library modules for doing basic text manipulations. The discussions interspersed with each module try to give some guidance on why you would want to use a given module or function, and the reference documentation tries to contain more examples of actual typical usage than does a plain reference. In many cases, the examples and discussion of individual functions address common and productive design patterns in Python. The cross-references are intended to contextualize a given function (or other thing) in terms of related ones (and to help you decide which is right for you). The actual listing of functions, constants, classes, and the like is in alphabetical order within each type of thing.

Section 2.3 in many ways continues Section 2.1, but also provides some aids for using this book in a learning context. The problems and solutions presented in Section 2.3 are somewhat more open-ended than those in Section 2.1. In addition, each section labeled "Discussion" is followed by one labeled "Questions." These questions are ones that could be assigned by a teacher to students; but they are also intended to be issues that general readers will enjoy and benefit from contemplating. In many cases, the questions point to limitations of the approaches initially presented, and ask readers to think about ways to address or move beyond these limitations, which is exactly what readers need to do when writing their own custom code to accomplish outside tasks. However, each Discussion in Section 2.3 should stand on its own, even if the reader skips the Questions.