Organization of This Book

Here's a quick summary of what the book covers. If you're still relatively new to Perl, you may want to work through the chapters in order. If you have some programming experience and are looking for ways to approach problems in bioinformatics with Perl, feel free to skip around.

Part I

Chapter 1

Modules are the standard Perl way of "packaging" useful programs so that other programmers can easily build on previous work. Standard modules such as CGI, for instance, put the power of interactive web site programming within reach of a programmer who knows basic Perl. Later chapters also discuss Bioperl, for manipulating biological data, and DBI, for accessing relational databases. Modules are sometimes considered the most important part of Perl, because that's where much of Perl's functionality lives. In this chapter I show how to write your own modules, as well as how to find useful modules and use them in your programs.

Chapter 2

Complex data structures and references are fundamentally important to Perl. The basic Perl data structures of scalar, array, and hash go a long way toward solving many (perhaps most) Perl programming problems. However, many commonly used data structures, such as multidimensional arrays, can't be built from these basic types alone; they require references. Perl enables you to define quite complex data structures, and we'll see how all that works.

String algorithms are standard techniques used in bioinformatics for finding important data in biological sequences. With them, you can compare two sequences, align two or more sequences, assemble a collection of sequence fragments, and so forth. String algorithms underlie many of the most commonly used programs in biology research, such as BLAST. This chapter presents a working Perl program that uses the technique of dynamic programming to find the closest match to a motif.

Chapter 3

Object-oriented programming is a standard approach to designing programs. I assume, as a prerequisite, that you are familiar with the programming style called procedural programming. (For example, C and FORTRAN are procedural; C++ and Java are object-oriented; Perl supports both.) It's important for the Perl programmer to be familiar with the object-oriented approach. For instance, many modules are defined in an object-oriented manner.

This chapter presents, step by step, the concepts and techniques of object-oriented Perl programming, in the context of a module that defines a simple class for keeping track of genes.

Chapter 4

In this chapter, object-oriented programming is explored further in the context of developing software to convert sequence files between formats (FASTA, GCG, etc.). The concept of class inheritance is introduced and implemented.

Chapter 5

This chapter develops object-oriented programming further with three classes: one that handles Rebase restriction enzyme data, one that calculates restriction maps, and one that draws restriction maps.

Part II

Chapter 6

Relational databases are important in programming because they save, organize, and retrieve data sets. This chapter introduces relational databases and the SQL language, and includes information on designing and administering databases. I take a close look at how one such relational database management system, the popular MySQL, is used from Perl programs.

Chapter 7

Web programming is one of Perl's areas of strength. In this chapter, I begin an example that puts a laboratory on the Web using Perl and the CGI module. The restriction mapping software developed in previous chapters is made accessible from the Web.

Chapter 8

Using computer graphics to display data is one of the most important programming skills in bioinformatics. In this chapter, graphics programs are used to display restriction maps, and other data presented as graphs, dynamically on the Web. The Perl module GD is discussed and used to generate maps on the fly in response to web page queries.

Chapter 9

Bioperl is a set of modules used by Perl programmers to write bioinformatics applications. In this chapter you'll see an introduction to the Bioperl project. Bioperl is open source (free under a very permissive license) and is developed by a group of volunteers, many based in supportive research organizations. In recent years it has achieved critical mass and is now adequately documented and fairly broad in scope. If you do Perl bioinformatics programming, you should certainly be aware of what Bioperl has to offer, so that you avoid reinventing the wheel.

Part III