3.6 Details of the Gene1 Class

In this section, I introduce the OO features used to make a class in Perl. First, however, I explain the variable naming convention I use, as well as the handy Carp module.

3.6.1 Variable Names and Conventions

Using an underscore in front of a name is a programming convention that usually indicates that the item in question (e.g., a variable or hash key) isn't meant for the outside world but only for internal use.

This is just a convention; Perl doesn't require you to do it. It will, however, make your code easier to read and understand.

I generally follow this convention and put underscores in front of names that I don't want directly accessed by the programmer using the class. (In Perl, unlike some stricter OO languages, you can access data that's internal to a class, which makes this naming convention that distinguishes internal variables particularly useful.)

Thus, in my Gene1 class, the attributes _name, _organism, _chromosome, and _pdbref are used internally only as the hash keys for the attributes in the object. When you use the class, as I do in my example program testGene1, you don't even have to know these names exist.

The interface is through arguments that specify the initialization values of these attributes. These arguments are called name, organism, chromosome, and pdbref. I also have methods (subroutines also called name, organism, chromosome, and pdbref) that return the value of the actual attributes stored in the object.

3.6.2 Carp and croak

Near the top of Gene1.pm, the line use Carp; loads the Carp module.

Carp is a standard Perl module that provides informative error messages when problems occur. carp prints a warning message; croak prints an error message and dies. They are very much like the Perl functions warn and die, except that they report the file and line number of the code that called the subroutine, rather than the line inside the module where the problem was detected. I use croak in my code; it prints the error message provided, names the file and line number of the calling code, and then kills the program.

This function is certainly useful during development because it's another way to find errors in a program as it's being created. It also gives program users the ability to report the exact location of a problem, should one occur, to the programming staff (which may be just one programmer, you!).
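
The difference between die and croak can be seen in a minimal sketch. The package Demo and the subroutines with_die and with_croak are hypothetical names used only for illustration; they aren't part of Gene1.pm. Each error is trapped with eval so that both messages can be printed and compared:

```perl
use strict;
use warnings;

{
    package Demo;    # hypothetical package, for illustration only
    use Carp;

    # die reports the line inside this subroutine.
    sub with_die   { die "no name" }

    # croak reports the line of the code that called this subroutine.
    sub with_croak { croak "no name" }
}

# eval traps each fatal error; the message is left in $@.
eval { Demo::with_die() };
my $die_msg = $@;

eval { Demo::with_croak() };
my $croak_msg = $@;

print "die:   $die_msg";
print "croak: $croak_msg";
```

Both messages begin with "no name at", but the line numbers differ: the die message points into the Demo package, while the croak message points at the call in the main program, which is usually where the mistake was made.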

In my program output, the Carp message is:

no name at testGene1 line 35

It's produced by the line:

_name     => $arg{name}    || croak("no name"),

in the Gene1.pm module file. Line 35 of testGene1 is the beginning line of this part of the program:

my $obj3 = Gene1->new(
        organism        => "Homo sapiens",
        chromosome      => "23",
        pdbref          => "pdb9999.ent"
);

It's the part of the code that tries to do something bad: it's trying to initialize a new object without setting its name. You'll see how this works in more detail in the following sections.

3.6.3 The new Constructor Method

To create objects, I defined a special constructor method called new. A call to new returns a new object, properly initialized. The new object is also marked as a member of the class, in this case the class Gene1.

sub new {
        my ($class, %arg) = @_;
        return bless {
                _name       => $arg{name}        || croak("no name"),
                _organism   => $arg{organism}    || croak("no organism"),
                _chromosome => $arg{chromosome}  || "????",
                _pdbref     => $arg{pdbref}      || "????",
        }, $class;
}

Note that each class may have its own requirements for creating a class object, and so each class's constructor method may be different than that for another class.[3] For instance, a constructor may or may not provide default values for its attributes. Still, there's a lot of similarity between the constructor method of most classes.

[3] In particular, a constructor method may have any name in Perl; you could call it constructor, OverTheSun, or anything that you choose. Most programmers just use the very familiar name new.

Let's dissect the code of the constructor new. You'll see how objects are marked as members of a class, and initialized, by their constructor methods. Here are the main novelties:

  • The package name Gene1 is automatically passed to the subroutine new as its first argument, even though it isn't included in the argument list.

  • The returned hash reference is marked with the name Gene1 (using the bless function) thus making it an object in the Gene1 class.

Everything else here is straightforward Perl subroutine code.

Note that the call to new in the demonstration program testGene1 is made as follows:

my $obj = Gene1->new( ... );

The scalar variable $obj is a reference that points to the anonymous hash that's returned from the new method. The object is a hash that contains the attributes of the object, namely the key/value pairs of the hash. As usual the reference variable $obj is lexically scoped with my. And, as you see, $obj is marked with the class name Gene1.

The call Gene1->new includes the name of the package Gene1 in which the new subroutine is defined. The package name is the class name; the name of the module file in which the class is defined must be the package name with .pm added. So you have a class Gene1 in a module file Gene1.pm that has the declaration package Gene1;.

The call to new with its arguments is of the form:

Gene1->new( key1 => 'value1', key2 => 'value2', ... )

This call does two important things:

  1. It calls the new subroutine in the Gene1 package.

  2. It passes the name Gene1 of the package to the new subroutine as its first argument. Therefore, in the new subroutine, in the line that collects the arguments:

            my ($class, %arg) = @_;

    the first argument is automatically the string Gene1 and is assigned to the variable $class.

This first argument Gene1 isn't listed in the usual place in the parentheses after the subroutine name in the call to the subroutine:

new( key1 => 'value1', key2 => 'value2', ... )

It happens automatically when the package name is used with an arrow (->):

Gene1->new( key1 => 'value1', key2 => 'value2', ... )

This may seem a bit odd, but it has the desirable advantage of making it unnecessary to type the class name Gene1 twice: once to call the new method in the Gene1 package, and again to pass the class name Gene1 to the new method. Instead of typing:

Gene1->new( "Gene1", key1 => 'value1', key2 => 'value2', ... )

you can just type:

Gene1->new( key1 => 'value1', key2 => 'value2', ... )

It's simply a bit of handy syntax the designers of Perl added to save a bit of typing when writing OO code in Perl, nothing more or less.
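
As a small sketch of this equivalence, the following uses a simplified stand-in for the Gene1 constructor (the _class_seen key is a hypothetical addition that just records what the subroutine received as its first argument). It compares the arrow call with an ordinary subroutine call in which the class name is passed by hand:

```perl
use strict;
use warnings;

{
    package Gene1;

    # Simplified constructor for illustration only: it records the
    # first argument it received under the hypothetical key _class_seen.
    sub new {
        my ($class, %arg) = @_;
        return bless { _class_seen => $class, %arg }, $class;
    }
}

# Method call: Perl supplies "Gene1" as the first argument.
my $obj_a = Gene1->new( name => 'Aging' );

# Ordinary subroutine call: the class name must be passed explicitly.
my $obj_b = Gene1::new( 'Gene1', name => 'Aging' );

print $obj_a->{_class_seen}, "\n";    # Gene1
print $obj_b->{_class_seen}, "\n";    # Gene1
```

In this simple case, where no inheritance is involved, the two calls produce the same result.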

Now, let's examine the innards of the new constructor method. The new constructor method has the form:

sub new {
        my ($class, %arg) = @_;
        return bless {

        }, $class;
}

First, notice that in addition to assigning the first argument, the class name Gene1, to the variable $class, the subroutine captures the rest of the arguments in the hash variable %arg.

Recall, from your previous study of Perl, that initializing a hash by assigning a list to it causes the items in the list to be treated as key/value pairs in the hash. For example, if the arguments are:

('Myclass', mykey1 => 'myvalue1', mykey2 => 'myvalue2')

the scalar variable $class gets the value Myclass, and the hash variable %arg gets two key/value pairs initialized to the key 'mykey1' with the value 'myvalue1', and the key 'mykey2' with the value 'myvalue2'. Also recall that => is a synonym for a comma.[4]

[4] It also forces its left side to be interpreted as a string and removes the need to surround the string in quotes, which is exactly what I want here.
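
The argument handling just described can be tried on its own. In this sketch, the array @args is a stand-in for @_ as the constructor would see it:

```perl
use strict;
use warnings;

# A stand-in for @_ as it would arrive in the constructor.
my @args = ('Myclass', mykey1 => 'myvalue1', mykey2 => 'myvalue2');

# The scalar takes the first item; the hash absorbs the remaining
# items as key/value pairs.
my ($class, %arg) = @args;

print "class: $class\n";
for my $key (sort keys %arg) {
    print "$key => $arg{$key}\n";
}
```

The keys are sorted before printing only to make the output order predictable.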

3.6.4 Creating an Object with bless

The new constructor then returns the value of:

bless { ... }, $class;

The built-in Perl function bless does a very simple thing, but it's enough to take a data structure and make it an object in a class. It marks a reference with a class (package) name.

In this code, bless takes two arguments. The first, delimited by a pair of curly braces, is an anonymous hash, which you'll recall is a reference to an unnamed hash. This anonymous hash contains the data of the resulting object. The second argument to bless is just the name of the class, as it was saved in the $class scalar variable.

This call to bless returns the reference to the hash, which is now "marked" with the name of the class. That blessed reference is then given to the return function to serve as the returned value of the new method.

The object reference that is returned can now be identified as an object in the class Gene1. The object reference in this example is marked with the name Gene1 and has a hash as its top-level data structure. The new method in the class creates a new object in the class.

Although the first argument to bless in this code is an anonymous hash, in general it can be any reference to a data structure that serves as an object. It can be a reference to a scalar, an array, a hash, or a more complex data structure. In the example, I am just declaring an anonymous hash in place rather than providing a reference to an existing hash. So, for example, if I declare a hash and a reference to it like so:

%hash = ( key1 => 'value1', key2 => 'value2' );
$hashref = \%hash;

then I can bless the hash, mark it with the class name HashClass, and save the resulting object:

$hashobj = bless $hashref, 'HashClass';

Alternatively, the same object $hashobj can be created using an anonymous hash, and one call to bless:

$hashobj = bless { key1 => 'value1', key2 => 'value2' }, 'HashClass';
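
To illustrate that bless isn't limited to hashes, here is a brief sketch that blesses an array reference and a scalar reference. The class names ArrayClass and ScalarClass are hypothetical, like HashClass above:

```perl
use strict;
use warnings;

# An anonymous array blessed into the hypothetical class ArrayClass.
my $arrayobj = bless [ 'Aging', 'Homo sapiens' ], 'ArrayClass';

# A reference to a named scalar blessed into the hypothetical class ScalarClass.
my $sequence  = 'ACGT';
my $scalarobj = bless \$sequence, 'ScalarClass';

# The data is still reached with the usual dereferencing syntax.
print ref($arrayobj),  ": $arrayobj->[0]\n";    # ArrayClass: Aging
print ref($scalarobj), ": $$scalarobj\n";       # ScalarClass: ACGT
```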

3.6.5 Using ref to Report an Object's Class

The Perl function ref reports on the type of element referred to (variable, object, code, etc.). If the variable is blessed, ref reports on the class it is marked with.

After the call to new to create the Gene1 object $obj, the line:

print ref $obj, "\n";

prints out as Gene1.

The Perl function ref returns false if its argument isn't a reference. If it is a reference, it returns a string naming the type of thing referred to, such as SCALAR, ARRAY, HASH, CODE, REF, or GLOB.

If the reference has been blessed into a package, that package name is returned from the call to ref.
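
A short sketch shows ref at work on a few kinds of references, on a plain string, and on a blessed reference. The variable names are chosen only for illustration:

```perl
use strict;
use warnings;

my %hash      = ( key1 => 'value1' );
my @array     = ( 1, 2, 3 );
my $text      = 'ACGT';

my $hashref   = \%hash;
my $arrayref  = \@array;
my $coderef   = sub { return 1 };
my $scalarref = \$text;

# ref names the kind of thing each reference points to.
my @types = map { ref($_) } ($hashref, $arrayref, $coderef, $scalarref);
print "@types\n";    # HASH ARRAY CODE SCALAR

# For something that isn't a reference, ref returns a false value.
print "not a reference\n" unless ref($text);

# Once a reference is blessed, ref returns the package name instead.
my $obj = bless { _name => 'Aging' }, 'Gene1';
print ref($obj), "\n";    # Gene1
```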

3.6.6 Initialize an Object with an Anonymous Hash

Here again is the complete definition of the new method in the Gene1 class:

sub new {
    my ($class, %arg) = @_;
    return bless {
        _name          => $arg{name}         || croak("no name"),
        _organism      => $arg{organism}     || croak("no organism"),
        _chromosome    => $arg{chromosome}   || "????",
        _pdbref        => $arg{pdbref}       || "????",
    }, $class;
}

The first argument to bless is the following anonymous hash:

    {
        _name         => $arg{name}         || croak("no name"),
        _organism     => $arg{organism}     || croak("no organism"),
        _chromosome   => $arg{chromosome}   || "????",
        _pdbref       => $arg{pdbref}       || "????",
    }

As should be familiar (if not, see Appendix A for a Perl refresher), the key/value pairs are separated by the "syntactic sugar" symbol =>. The keys are in the first column; the names _name, _organism, _chromosome, and _pdbref are used as the keys.

The desired values are in the second column, following the => symbol. Each is given as an expression that uses the Perl logical OR operator ||. Either the value has been passed in, or the default value is used:

value || default

The values are the values assigned from the argument list to the hash %arg upon entry to the subroutine. If all these arguments are passed to the new method, the hash initializes its four keys (_name, _organism, _chromosome, and _pdbref) with those values.

If chromosome or pdbref isn't passed to the new method, its value in %arg isn't defined, and the subroutine assigns the default value (the string ????) to the missing key (_chromosome, _pdbref, or both).

If name or organism isn't passed as an argument to the new method, its value in %arg isn't defined, and the subroutine calls croak and the program exits with an error message.

Let's look closely at a line in the Gene1.pm module that calls croak:

_name            => $arg{name}            || croak("no name"),

This line is part of a hash initialization. It is initializing an entry with a key _name. The value to be associated with this key is given as:

$arg{name}        || croak("no name")

This sets the value of the key to the value $arg{name} if that value is true. If $arg{name} is false (in Perl that means undefined, the empty string, 0, or the string "0"), the expression croak("no name") is evaluated. The behavior of || (the Boolean or operator) is that the first argument is evaluated. The second argument is evaluated only if the first argument evaluates to false. In this code, the second argument kills the program and prints an error message when it is evaluated. This is a bit of a trick, but it's a common one that's used in several programming languages that have the Boolean or operator.
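
One consequence of the value || default idiom is worth seeing in a sketch: because || tests for truth rather than for existence, a legitimate value such as the string "0" is replaced by the default. The example below is not from Gene1.pm; it contrasts || with the defined-or operator // (available since Perl 5.10), which falls through only when the left side is undefined. The variable names are illustrative:

```perl
use strict;
use warnings;

# Hypothetical arguments: chromosome is passed as the string '0',
# and pdbref isn't passed at all.
my %arg = ( name => 'Aging', chromosome => '0' );

# || uses the default whenever the left side is false.
my $name       = $arg{name}       || '????';    # 'Aging'
my $chromosome = $arg{chromosome} || '????';    # '????' because '0' is false
my $pdbref     = $arg{pdbref}     || '????';    # '????' because it is undefined

# // uses the default only when the left side is undefined.
my $chromosome_kept = $arg{chromosome} // '????';    # '0'

print "name:            $name\n";
print "chromosome (||): $chromosome\n";
print "chromosome (//): $chromosome_kept\n";
print "pdbref:          $pdbref\n";
```

For the Gene1 attributes, a value of 0 is unlikely to be meaningful, so the || form used in the constructor is a reasonable choice there.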

Now that you've seen how the new constructor handles its arguments, let's look again at how the test program testGene1 calls the new method, which it does three times:

my $obj1 = Gene1->new(
        name            => "Aging",
        organism        => "Homo sapiens",
        chromosome      => "23",
        pdbref          => "pdb9999.ent"
);

my $obj2 = Gene1->new(
        organism        => "Homo sapiens",
        name            => "Aging",
);

my $obj3 = Gene1->new(
        organism        => "Homo sapiens",
        chromosome      => "23",
        pdbref          => "pdb9999.ent"
);

The key/value pairs (the keys are the attributes of the objects) are passed to the new method. Notice that, due to the use of the %arg hash to capture these arguments by new, the order in which the arguments are passed isn't important. This is a nice convenience when creating and initializing objects because there are often many attributes and some may or may not be initialized; being able to ignore the order of the arguments when you call new makes it easier to program. Recall that it's a general property of Perl hashes that the order of the keys isn't important; it has to do with how hashes are implemented, and why they're so fast at retrieving values.

You'll recall that the use of croak in the new method requires the initialization of the name and organism attributes. For instance, $obj3 isn't created with an initial value for the name attribute. The new subroutine was defined to require such an initial value, which makes sense because, at the least, I want every gene in my program to have a name and an originating organism. The output of the testGene1 program shows that this third call to new triggers the croak exit mechanism.
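
The behavior of the three calls can be reproduced in a single file. The following sketch defines the constructor inline rather than in a separate Gene1.pm, and it wraps the third call in eval (an addition not found in testGene1) so that the error can be captured and printed instead of ending the program:

```perl
use strict;
use warnings;

{
    package Gene1;
    use Carp;

    sub new {
        my ($class, %arg) = @_;
        return bless {
            _name       => $arg{name}       || croak("no name"),
            _organism   => $arg{organism}   || croak("no organism"),
            _chromosome => $arg{chromosome} || "????",
            _pdbref     => $arg{pdbref}     || "????",
        }, $class;
    }
}

# All four attributes supplied.
my $obj1 = Gene1->new(
    name       => "Aging",
    organism   => "Homo sapiens",
    chromosome => "23",
    pdbref     => "pdb9999.ent",
);

# Arguments in a different order; two attributes fall back to defaults.
my $obj2 = Gene1->new(
    organism => "Homo sapiens",
    name     => "Aging",
);

# Peeking at the internal keys here only to show the stored values.
print "obj2: $obj2->{_name}, $obj2->{_chromosome}, $obj2->{_pdbref}\n";

# No name supplied: croak fires, and eval traps the error.
my $obj3 = eval {
    Gene1->new(
        organism   => "Homo sapiens",
        chromosome => "23",
        pdbref     => "pdb9999.ent",
    );
};
my $error = $@;
print "third call failed: $error" unless defined $obj3;
```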

3.6.7 Accessor Methods

Accessor methods are subroutines in the class that return the values of the class attributes. These attributes are usually implemented as keys of the hash that serves as the class object. You can access the attributes of an object, and their values, directly; for example, given an object of the Gene1 class, you can print out its name like so:

print $obj->{_name};

This gives the value of the key _name in the anonymous hash pointed to by $obj. This works; however, it's not good OO style. It directly accesses the data in the object; good style requires you to access the data through subroutines defined for that purpose. It is preferable to restrict all access of an object's attributes to the use of specific methods.

The actual attribute is called _name. This is initialized from the value of the argument name in the initialization of the arguments, as in this line from new:

_name            => $arg{name}       || croak("no name"),

That was just a convenient way to pass arguments to new, so you can say:

new( name => 'Ecoli' ) instead of new( _name => 'Ecoli' )

But you can just define a subroutine called, conveniently, name that returns the value of the attribute $obj->{_name}.

In my program, I have defined a method for each key in the hash. I have method name, which accesses the value of the key _name; I also have a similar method for each other key. Here's how to define a method to access the value of the key _name:

sub name        { $_[0] -> {_name}        }

This is called by the following line in the testGene1 program:

print $obj1->name, "\n";

It calls the method name for the object, which then accesses the value of the key _name in the object. In this way the actual implementation of the data that is stored in the object is kept hidden from users of the class methods. If the data is retrieved with a method, and if the author of the class decides at a later date to change the way the object stores its data, the users of the class can still get at the data by making the same method call. Only the internals of the method call will change; the behavior of the method, namely what arguments you give it and what return values you expect from it, stays the same. When the interface remains the same, the code that uses the class can also remain the same, saving everybody time and trouble, even when new versions of the class are developed.

The method name receives the object as its first argument because it is called by:

$obj1->name

The body of the subroutine uses the Perl built-in @_ array to access its arguments. The first argument to the subroutine is referred to as $_[0]. That first argument is the object, a reference to a hash, so I give it the key _name to retrieve the desired value:

$_[0] -> {_name}

Finally, since by default a subroutine returns the value of the last statement executed, this subroutine returns the gene name it has retrieved from the object.
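
For comparison, here is a sketch that places the terse accessor next to a more explicit form that behaves the same way. The constructor is a cut-down stand-in without the croak checks, and writing organism in the longer style is a choice made here for illustration, not necessarily how Gene1.pm defines it:

```perl
use strict;
use warnings;

{
    package Gene1;

    # Cut-down constructor for illustration only (no croak checks).
    sub new {
        my ($class, %arg) = @_;
        return bless {
            _name     => $arg{name},
            _organism => $arg{organism},
        }, $class;
    }

    # Terse form: $_[0] is the object, and the value of the last
    # expression evaluated is returned implicitly.
    sub name { $_[0] -> {_name} }

    # Explicit form: copy the object into a named variable and
    # return the attribute with an explicit return.
    sub organism {
        my ($self) = @_;
        return $self->{_organism};
    }
}

my $obj = Gene1->new( name => 'Aging', organism => 'Homo sapiens' );

print $obj->name, "\n";        # Aging
print $obj->organism, "\n";    # Homo sapiens
```

Code that calls $obj->name or $obj->organism can't tell which style was used, which is the point of going through accessor methods.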