3.2 Using Perl Classes (Without Writing Them)

Before you actually start writing classes, it's helpful to know how to use them. This section shows you how to use OO Perl classes, even if the syntax is new to you and you've never written one yourself.

Thanks to the large and active community of Perl programmers, there are many useful Perl classes already written and freely available to use in your programs. Very often, the class you want already exists. All you need to do is obtain it and use it.

First, you need to find the appropriate module or modules (CPAN is the most common source), install them, and read the documentation to learn how to use the class. Finding and installing OO modules follows the same process covered in Chapter 1.

What's different about OO modules is how they create data structures and call and pass arguments to subroutines. In short, there's some new syntax to learn that amounts to a slightly different style of programming.

Object-oriented code creates an object by naming the class and calling a special method in the class (usually called new). The newly created object is a reference to a data structure, usually a hash. The object is then used to call methods (or OO subroutines). You're used to subroutines that get their data passed in as arguments; by contrast, OO code has a data structure that calls subroutines to operate on it. My goal in this section is to explain enough of this new terminology and syntax so you can read and understand the documentation for a class, and use it in your own programs.
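As a small illustration of these three ideas (naming the class and calling new, getting back a reference, and using that reference to call methods), here is a sketch that uses the OO interface of Data::Dumper, a module in the standard Perl distribution. I chose it only as a convenient stand-in; the %codon hash and the variable names are invented for the example.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(blessed reftype);

# Made-up data to have something to display
my %codon = (ATG => 'M', TGG => 'W');

# Create an object by naming the class and calling its new method
my $dumper = Data::Dumper->new([ \%codon ], ['codon']);

# The object is a reference that remembers which class it belongs to
print "Class: ", blessed($dumper), "\n";
print "Underlying structure: ", reftype($dumper), "\n";

# Use the object to call methods in its class
$dumper->Indent(1);
$dumper->Sortkeys(1);
print $dumper->Dump;
```

The first two print statements report the class name Data::Dumper and the structure HASH, which matches the description above: the object is a reference to a hash that knows its own class.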

Let's begin with the documentation for the Carp module, a non-OO module that appears later in this chapter. This is a simple module to use; it defines four subroutines, and the documentation gives brief examples of their use. Because the Carp module comes installed with any recent release of Perl, you don't have to install it. To find out how to use it, type:

perldoc Carp

(Upper- and lowercase is significant; typing perldoc carp won't work.) Here's the beginning of the output:

NAME
       carp    - warn of errors (from perspective of caller)
       cluck   - warn of errors with stack backtrace
                 (not exported by default)
       croak   - die of errors (from perspective of caller)
       confess - die of errors with stack backtrace

SYNOPSIS
       use Carp;
       croak "We're outta here!";

       use Carp qw(cluck);
       cluck "This is how we got here!";

It shows which subroutines are available and how to use them in your code. The documentation contains additional details that are important and sometimes need careful reading, but you usually don't need to delve any further than the SYNOPSIS section, which gives examples. To use the croak subroutine, you first load it with the directive use Carp; and then call croak with a string containing a message as its argument. The program prints the message and dies.
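Here is a short sketch of croak in a complete script. The check_dna subroutine and the sequences passed to it are hypothetical, written only for this example. The eval block is there so the script can show the message instead of stopping at the first error.

```perl
use strict;
use warnings;
use Carp;

# check_dna is a hypothetical helper written for this sketch.
# It dies via croak if the string contains anything but A, C, G, or T.
sub check_dna {
    my ($seq) = @_;
    croak "Not a DNA sequence: '$seq'" if $seq =~ /[^ACGTacgt]/;
    return length $seq;
}

print "Length: ", check_dna('ACGTTGCA'), "\n";

# eval traps the fatal error raised by croak; the message ends up in $@
my $ok = eval { check_dna('ACGU'); 1 };
print "Caught: $@" unless $ok;
```

Without the eval, the second call would end the program with the croak message, which is the behavior the Carp documentation describes.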

The basic Perl documentation is found at http://www.perldoc.com and http://www.perl.com, and includes standard distribution modules such as Carp. A great many modules, such as Bioperl, aren't shipped with the standard Perl distribution. To find the documentation for Bioperl on the web, start at the CPAN web site: http://www.CPAN.org. Once you locate Bioperl in CPAN, you'll be directed to the Bioperl home page at http://www.bioperl.org. If the modules in question are already installed on your computer system, type at the command line:

perldoc bioperl

Depending on how up-to-date your version of Bioperl is, you'll get something like:

BIOPERL(1)     User Contributed Perl Documentation     BIOPERL(1)

NAME
       Bioperl - Coordinated OOP-Perl Modules for Biology

SYNOPSIS
       Not very appropriate to put a synopsis  - many different
       objects to use. Read on...

DESCRIPTION
       Bioperl contains a number of Perl objects which are useful
       in biology.  Examples include Sequence objects, Alignment
       objects and database searching objects. These objects not
       only do what they are advertised to do in the documenta
       tion, but they also interact - Alignment objects are made
       from the Sequence objects and so on. This means that the
       objects provide a coordinated framework to do computational 

       If you are new to bioperl, reading biostart.pod will get
       you aquainted with writing scripts and the main players
       for the objects.

       We now also have a cookbook tutorial in bptutorial.pl
       which has embedded documentation. Start there if learning-
       by-example suits you most.

       Bioperl development is focused on the objects themselves,
       and less on the scripts (programs) that put these objects
       together. There are some example scripts provided in the
       distribution, but it is not the focus of the objects that
       are distributed. Of course, as the objects do most of the
       hardwork for you, all you have to do is combine a number
       of objects together sensibly.

       The intent of the bioperl development effort is to make
       reusable tools that aid people in creating their own site
       or job specific applications.

       The bioperl.org (http://bioperl.org) website also attempts
       to maintain links and archives of standalone bio-related
       perl tools that are not affiliated or related to the core
       bioperl effort. Check the site for useful code ideas and
       contribute your own if possible.

The second paragraph of the DESCRIPTION advises you to read biostart.pod if you're new to Bioperl. Type:

perldoc biostart

and you get the following output (only the first page is reproduced here):

BIOSTART(1)    User Contributed Perl Documentation    BIOSTART(1)

NAME
       Bioperl - Getting Started

SYNOPSIS
         use Bio::Seq;
         use Bio::SeqIO;

         $seqin = Bio::SeqIO->new( '-format' => 'EMBL' , -file => 'myfile.dat');
         $seqout= Bio::SeqIO->new( '-format' => 'Fasta', -file => '>output.fa');

         while((my $seqobj = $seqin->next_seq(  ))) {
               print "Seen sequence ",$seqobj->display_id,", start of seq ",
                     substr($seqobj->seq,1,10),"\n";
               if( $seqobj->moltype eq 'dna') {
                   $rev = $seqobj->revcom;
                   $id  = $seqobj->display_id(  );
                   $id  = "$id.rev";
                   $rev->display_id($id);
                   $seqout->write_seq($rev);
               }

               foreach $feat ( $seqobj->top_SeqFeatures(  ) ) {
                  if( $feat->primary_tag eq 'exon' ) {
                     print STDOUT "Location ",$feat->start,":",
                           $feat->end," GFF[",$feat->gff_string,"]\n";
                  }
               }
         }

DESCRIPTION
       Bioperl is a set of Perl modules that represent useful
       biological objects. Some of the key objects represent:
       Sequences, features on sequences, databases of sequences,
       flat file representations of sequences and similarity
       search results.

       Because bioperl is formed from Perl modules, there are no
       actual useable programs in the distribution (this is not
       actually true.  In the scripts directory there are a few
       useful programs. But not a great deal...). You have to
       write the programs which use bioperl.

       It is very easy to write programs using the bioperl mod-
       ules, as a lot of the complex processing happens in the
       modules and not in the part of the program which you have
       to write. The idea is that you can connect up a number of
       the modules to do useful things. The synopsis above gives
       a simple script which uses bioperl. Stepping through this
       script, the lines mean the following things:


The file gives an example of code that uses Bioperl modules, and the rest of the biostart documentation explains the example in some detail. Let's take a closer look at the OO syntax of this example.

After the typical use statements (needed to load the modules), such as:

use Bio::SeqIO;

the documentation has the following line in the example:

$seqin = Bio::SeqIO->new( '-format' => 'EMBL' , -file => 'myfile.dat');

This line calls the new method. new is the name typically used in OO Perl for the subroutine that creates an object. The object that's returned from the method call is saved as a reference in a scalar variable. In this case, the new object in the Bio::SeqIO class is saved in the reference variable $seqin.

The new method is identified by giving the name of the class (Bio::SeqIO), followed by an arrow (->), and finally the method name:

Bio::SeqIO->new

This is the syntax for calling methods. If you're just interested in using the class, not in understanding its inner mechanisms, you simply have to remember to invoke the new method in this way to create a class object. The other methods in a class are typically called from a class object that has been previously created (by just such a call to a new method).

Later in the biostart example you see the line:

while((my $seqobj = $seqin->next_seq(  ))) {

The call to the method next_seq is done as follows:

$seqobj = $seqin->next_seq(  )

Here, the Bio::SeqIO class object $seqin is being used to call the method next_seq in the class Bio::SeqIO. Because $seqin was created as an object in the class Bio::SeqIO, it can be used with arrow notation (->) to call a method in the class, without specifically mentioning the class name. This is how methods other than new are typically called. The result here is saved as $seqobj, a new object.
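You may not have Bioperl installed yet, so here is a sketch of the same loop pattern that uses only modules from the standard Perl distribution: File::Temp and IO::File. These are stand-ins I chose for illustration, and the three short sequences are made up. One difference from next_seq is that the getline method returns a plain string rather than a new object.

```perl
use strict;
use warnings;
use File::Temp;
use IO::File;

# Write three made-up sequences to a temporary file
my $tmp = File::Temp->new;            # class name on the left: creates the object
print {$tmp} "ACGT\nGGCC\nTTAA\n";
$tmp->close;                          # object on the left: calls a method on it

# $in is an object in the class IO::File
my $in = IO::File->new($tmp->filename, 'r')
    or die "Cannot open ", $tmp->filename, ": $!";

# The object calls getline until the method returns undef at end of file
my $count = 0;
while (defined(my $line = $in->getline)) {
    chomp $line;
    $count++;
    print "Sequence $count: $line\n";
}
$in->close;
```

Notice that the class names appear only in the two calls to new. Every other call uses an object and the arrow.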

The important thing to remember about subroutines in OO code is that you create objects by calling the new method in the class. You call other methods by calling them on an object in the class. Both types of calls are accomplished with arrow notation.

Here's a new object being created in a class Myclass:

$myobject = Myclass->new(  );

Here's a method compute being called on that object:

$myobject->compute(  );
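Myclass is only a placeholder, so the two lines above won't run as they stand. For a runnable counterpart, here is a sketch using Time::Piece, a module in the standard Perl distribution since Perl 5.10. It is my choice of stand-in, and the date is invented. It also shows that the method that creates an object is not always named new; here it is called strptime.

```perl
use strict;
use warnings;
use Time::Piece;

# Class method call: the class name is on the left of the arrow.
# This constructor is named strptime rather than new.
my $submitted = Time::Piece->strptime('2004-03-01', '%Y-%m-%d');

# Object method call: the object is on the left of the arrow.
# add_months returns another Time::Piece object.
my $released = $submitted->add_months(1);

print "Submitted: ", $submitted->ymd, "\n";
print "Released:  ", $released->ymd, "\n";
print "Class of \$released: ", ref($released), "\n";
```

The value returned by add_months is itself an object, so it can call methods such as ymd in exactly the same way.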

One more item in the biostart example that may appear unfamiliar is the way arguments to the methods are specified. Consider this line from the example:

$seqin = Bio::SeqIO->new( '-format' => 'EMBL' , -file => 'myfile.dat');

The arguments are passed as named arguments, which are pairs with the name of the argument followed by the symbol => followed by a value for the argument. If this looks suspiciously like the notation used to initialize hashes, it's no accident. In this invocation of the new method, you're initializing a new object, which is implemented as a hash, so it makes sense that you'd pass your initial values to the object using hash notation.

The details of how arguments are passed to methods are covered later in the chapter. For now, if you just want to use this class, you'll need to pass your arguments to the new method in the style just shown. Methods in a class usually accept arguments in this hash-like, key => value notation, but not always. If you follow the syntax shown in the documentation, your code will be fine.

One advantage to using a hash for arguments is that the arguments can be given in any order: you don't have to give them in a prescribed order as is often the case when passing a list of scalars as arguments to a subroutine.
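The following sketch shows named arguments given in two different orders. It uses File::Temp from the standard Perl distribution as a stand-in for Bio::SeqIO, so it runs without Bioperl; the template and suffix values are made up. The last lines show that a named-argument list, including names with a leading dash as in the biostart example, is an ordinary list of pairs that can fill a hash.

```perl
use strict;
use warnings;
use File::Temp;

# The same three named arguments, given in two different orders
my $first  = File::Temp->new(SUFFIX => '.fa', TEMPLATE => 'seqXXXXXX', TMPDIR => 1);
my $second = File::Temp->new(TMPDIR => 1, TEMPLATE => 'seqXXXXXX', SUFFIX => '.fa');

print "First:  ", $first->filename, "\n";
print "Second: ", $second->filename, "\n";

# Named arguments are a list of pairs, so they can initialize a hash.
# The => operator quotes the word to its left, and -file becomes the string '-file'.
my %args = ('-format' => 'EMBL', -file => 'myfile.dat');
print "$_ => $args{$_}\n" for sort keys %args;
```

Both calls to new produce a temporary file whose name ends in .fa, regardless of the order in which the arguments appear.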

I'll return to this biostart example in Chapter 9. For now, you have enough of the syntax to use a Perl OO module. The next section shows how to start pulling it all together.