1.9 CPAN Modules

The Comprehensive Perl Archive Network (CPAN, http://www.cpan.org) is an impressively large collection of Perl code (mostly Perl modules). CPAN is easily accessible and searchable on the Web, and you can use its modules for a variety of programming tasks.

By now you should have the basic idea of how modules are defined and used, so let's take some time to explore CPAN to see what goodies are available.

There are two important points about CPAN. First, a large number of the things you might want your programs to do have already been programmed and are easily obtained in downloadable modules. You just have to go find them at CPAN, install them on your computer, and call them from your program. We'll take a look at an example of exactly that in this section.

Second, all code on CPAN is free of charge and available for use under a very unrestrictive copyright declaration. Sound good? Keep reading.

CPAN includes convenient ways to search for useful modules, and there's a CPAN.pm module bundled with Perl that makes downloading and installing modules quite easy (when things work well, which they usually do). If you can't find CPAN.pm, you should consider updating your version of Perl.

You can find more information by typing the following at the command line:

perldoc CPAN

You can also check the Frequently Asked Questions (FAQ) available at the CPAN web site.
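If you're not sure whether CPAN.pm is present on your system, a short test program can tell you. The following is only a minimal sketch: it tries to load the module and reports the version it finds, and the wording of the messages is my own.

```perl
use strict;
use warnings;

# Try to load CPAN.pm; eval traps the error if the module is not installed.
if (eval { require CPAN; 1 }) {
    # VERSION is a method every Perl package inherits; it returns $CPAN::VERSION.
    printf "CPAN.pm is installed (version %s)\n", CPAN->VERSION;
}
else {
    print "CPAN.pm was not found in \@INC\n";
}
```

If the second message appears, the module isn't in any of the directories Perl searches, which is the situation described above.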

1.9.1 What's Available at CPAN?

The CPAN web site offers several "views" of the CPAN collection of modules and several alternate ways of searching (by module name, category, full text search of the module documentation, etc.). Here is the top-level organization of the modules by overall category:

Development Support
Operating System Interfaces
Networking Devices IPC
Data Type Utilities
Database Interfaces
User Interfaces
Language Interfaces
File Names Systems Locking
String Lang Text Proc
Opt Arg Param Proc
Internationalization Locale
Security and Encryption
World Wide Web HTML HTTP CGI
Server and Daemon Utilities
Archiving and Compression
Images Pixmaps Bitmaps
Mail and Usenet News
Control Flow Utilities
File Handle Input Output
Microsoft Windows Modules
Miscellaneous Modules
Commercial Software Interfaces
Not In Modulelist

1.9.2 Searching CPAN

CPAN's main web page has a few ways to search the contents. Let's say you need to perform some statistics and are looking for code that's already available. We'll go through the steps necessary to search for the code, download and install it, and use the module in a program.

At the main CPAN page, look for "searching" and click on search.cpan.org. If you search for "statistics" in all locations, you'll get over 300 hits, so you should restrict your search to modules with the pull-down menu. You'll get 25 hits (more by the time you read this); here's what you'll see:

1. Statistics::Candidates
Statistics-MaxEntropy-0.9 - 26 Nov 1998 - Hugo WL ter Doest

2. Statistics::ChiSquare
How random is your data?
Statistics-ChiSquare-0.3 - 23 Nov 2001 - Jon Orwant

3. Statistics::Contingency
Calculate precision, recall, F1, accuracy, etc.
Statistics-Contingency-0.03 - 09 Aug 2002 - Ken Williams

4. Statistics::DEA
Discontiguous Exponential Averaging
Statistics-DEA-0.04 - 17 Aug 2002 - Jarkko Hietaniemi

5. Statistics::Descriptive
Module of basic descriptive statistical functions.
Statistics-Descriptive-2.4 - 26 Apr 1999 - Colin Kuskie

6. Statistics::Distributions
Perl module for calculating critical values of common statistical distributions
Statistics-Distributions-0.07 - 22 Jun 2001 - Michael Kospach

7. Statistics::Frequency
simple counting of elements
Statistics-Frequency-0.02 - 24 Apr 2002 - Jarkko Hietaniemi

8. Statistics::GaussHelmert
General weighted least squares estimation
Statistics-GaussHelmert-0.05 - 18 Apr 2002 - Stephan Heuel

9. Statistics::LTU
An implementation of Linear Threshold Units
Statistics-LTU-2.8 - 27 Feb 1997 - Tom Fawcett

10. Statistics::Lite
Small stats stuff.
Statistics-Lite-1.02 - 15 Apr 2002 - Brian Lalonde 

11. Statistics::MaxEntropy
Statistics-MaxEntropy-0.9 - 26 Nov 1998 - Hugo WL ter Doest

12. Statistics::OLS
perform ordinary least squares and associated statistics, v 0.07.
Statistics-OLS-0.07 - 13 Oct 2000 - Sanford Morton

13. Statistics::ROC
receiver-operator-characteristic (ROC) curves with nonparametric confidence bounds
Statistics-ROC-0.01 - 22 Jul 1998 - Hans A. Kestler

14. Statistics::Regression
weighted linear regression package (line+plane fitting)
StatisticsRegression - 26 May 2001 - ivo welch

15. Statistics::SparseVector
Perl5 extension for manipulating sparse bitvectors
Statistics-MaxEntropy-0.9 - 26 Nov 1998 - Hugo WL ter Doest

16. Statistics::Descriptive::Discrete
Compute descriptive statistics for discrete data sets.
Statistics-Descriptive-Discrete-0.07 - 13 Jun 2002 - Rhet Turnbull

17. Bio::Tree::Statistics
Calculate certain statistics for a Tree
bioperl-1.0.2 - 16 Jul 2002 - Ewan Birney

18. Device::ISDN::OCLM::Statistics
OCLM statistics superclass
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes

19. Device::ISDN::OCLM::CurrentStatistics
OCLM current call statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes

20. Device::ISDN::OCLM::ISDNStatistics
OCLM ISDN statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes 

21. Device::ISDN::OCLM::Last10Statistics
OCLM Last10 call statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes

22. Device::ISDN::OCLM::LastStatistics
OCLM last call statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes

23. Device::ISDN::OCLM::ManualStatistics
OCLM manual call statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes

24. Device::ISDN::OCLM::SPStatistics
OCLM service provider statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes

25. Device::ISDN::OCLM::SystemStatistics
OCLM system statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes

Let's check out the Statistics::ChiSquare module.

First, click on the link to Statistics::ChiSquare; you'll see a summary of the module, complete with a description, overview, discussion of the method, examples of use, and information about the author.

This module looks interesting; let's download and install it. How big is the source code? If you click on the source link, you'll find that the module is really just one short subroutine with the documentation defined right in the module. Here's the subroutine definition part of the module:

package Statistics::ChiSquare;

# ChiSquare.pm
#
# Jon Orwant, orwant@media.mit.edu
#
# 31 Oct 95, revised Mon Oct 18 12:16:47 1999, and again November 2001
# to fix an off-by-one error
#
# Copyright 1995, 1999, 2001 Jon Orwant.  All rights reserved.
# This program is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
# 
# Version 0.3.  Module list status is "Rdpf"

use strict;
use vars qw($VERSION @ISA @EXPORT);

require Exporter;
require AutoLoader;

@ISA = qw(Exporter AutoLoader);
# Items to export into callers namespace by default. Note: do not export
# names by default without a very good reason. Use EXPORT_OK instead.
# Do not simply export all your public functions/methods/constants.
@EXPORT = qw(chisquare);

$VERSION = '0.3';

my @chilevels = (100, 99, 95, 90, 70, 50, 30, 10, 5, 1);
my %chitable = (  );

# assume the expected probability distribution is uniform
sub chisquare {
    my @data = @_;
    @data = @{$data[0]} if @data == 1 and ref($data[0]);
    my $degrees_of_freedom = scalar(@data) - 1;
    my ($chisquare, $num_samples, $expected, $i) = (0, 0, 0, 0);
    if (! exists($chitable{$degrees_of_freedom})) {
        return "I can't handle ", scalar(@data), 
        " choices without a better table.";
    }
    foreach (@data) { $num_samples += $_ }
    $expected = $num_samples / scalar(@data);
    return "There's no data!" unless $expected;
    foreach (@data) {
        $chisquare += (($_ - $expected) ** 2) / $expected;
    }
    foreach (@{$chitable{$degrees_of_freedom}}) {
        if ($chisquare < $_) {
            return
             "There's a >$chilevels[$i+1]% and <$chilevels[$i]% chance that this data is random.";
        }
        $i++;
    }
    return "There's a <$chilevels[$#chilevels]% chance that this data is random.";
}
$chitable{1} = [0.00016, 0.0039, 0.016, 0.15, 0.46, 1.07, 2.71, 3.84, 6.64];
$chitable{2} = [0.020,   0.10,   0.21,  0.71, 1.39, 2.41, 4.60, 5.99, 9.21];
$chitable{3} = [0.12,    0.35,   0.58,  1.42, 2.37, 3.67, 6.25, 7.82, 11.34];
$chitable{4} = [0.30,    0.71,   1.06,  2.20, 3.36, 4.88, 7.78, 9.49, 13.28];
$chitable{5} = [0.55,    1.14,   1.61,  3.00, 4.35, 6.06, 9.24, 11.07, 15.09];
$chitable{6} = [0.87,    1.64,   2.20,  3.83, 5.35, 7.23, 10.65, 12.59, 16.81];
$chitable{7} = [1.24,    2.17,   2.83,  4.67, 6.35, 8.38, 12.02, 14.07, 18.48];
$chitable{8} = [1.65,    2.73,   3.49,  5.53, 7.34, 9.52, 13.36, 15.51, 20.09];
$chitable{9} = [2.09,    3.33,   4.17, 6.39, 8.34, 10.66, 14.68, 16.92, 21.67];
$chitable{10} = [2.56,   3.94,   4.86, 7.27, 9.34, 11.78, 15.99, 18.31, 23.21];
$chitable{11} = [3.05,   4.58,  5.58, 8.15, 10.34, 12.90, 17.28, 19.68, 24.73];
$chitable{12} = [3.57,   5.23, 6.30, 9.03, 11.34, 14.01, 18.55, 21.03, 26.22];
$chitable{13} = [4.11,   5.89, 7.04, 9.93, 12.34, 15.12, 19.81, 22.36, 27.69];
$chitable{14} = [4.66,   6.57, 7.79, 10.82, 13.34, 16.22, 21.06, 23.69, 29.14];
$chitable{15} = [5.23,   7.26, 8.55, 11.72, 14.34, 17.32, 22.31, 25.00, 30.58];
$chitable{16} = [5.81,   7.96, 9.31, 12.62, 15.34, 18.42, 23.54, 26.30, 32.00];
$chitable{17} = [6.41,  8.67, 10.09, 13.53, 16.34, 19.51, 24.77, 27.59, 33.41];
$chitable{18} = [7.00,  9.39, 10.87, 14.44, 17.34, 20.60, 25.99, 28.87, 34.81];
$chitable{19} = [7.63, 10.12, 11.65, 15.35, 18.34, 21.69, 27.20, 30.14, 36.19];
$chitable{20} = [8.26, 10.85, 12.44, 16.27, 19.34, 22.78, 28.41, 31.41, 37.57];

1;

Some of this code will look familiar; some may not. Check out the use of package, use strict, and require Exporter; they're parts of Perl you've just seen.

You'll also see references to $VERSION, AutoLoader, use vars, and an initialization of %chitable, a multidimensional data structure (a hash of arrays), all of which will be covered later. For now, you may want to take a quick read-through of the code and get some personal satisfaction at how much of it makes sense.
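As a preview of how a structure like %chitable is read, here is a small sketch. It copies just the first two rows of the module's table, and the helper values_exceeded is a hypothetical name of my own, not something the module provides.

```perl
use strict;
use warnings;

# Two rows copied from the module's table; each key is a number of
# degrees of freedom, each value a reference to an array of critical values.
my %chitable = (
    1 => [0.00016, 0.0039, 0.016, 0.15, 0.46, 1.07, 2.71, 3.84, 6.64],
    2 => [0.020,   0.10,   0.21,  0.71, 1.39, 2.41, 4.60, 5.99, 9.21],
);

# Hypothetical helper (not part of the module): count how many critical
# values in a row the given statistic meets or exceeds.
sub values_exceeded {
    my ($statistic, $degrees_of_freedom) = @_;
    return undef unless exists $chitable{$degrees_of_freedom};  # no such row
    my @row = @{ $chitable{$degrees_of_freedom} };               # dereference
    return scalar grep { $statistic >= $_ } @row;
}

print "Largest value for 1 degree of freedom: $chitable{1}[8]\n";
print "Values exceeded by 2.0: ", values_exceeded(2.0, 1), "\n";
print "Row for 99 degrees of freedom exists? ",
      (exists $chitable{99} ? "yes" : "no"), "\n";
```

The exists test plays the same role as the check at the top of chisquare: if there is no row for the requested degrees of freedom, there is nothing to look up.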

Indeed, one of the really nice things about most modules is that you don't really have to read the code very often. Usually you can just install the module, read enough of the documentation to see how to call it from your program, and you're off and running. Let's take that approach now.

1.9.3 Installing Modules Using CPAN.pm

Our next task is to install the module using CPAN.pm. This section contains a log from when I installed Statistics::ChiSquare on my Linux computer using CPAN.pm.

In fact, to make things easy, here's the section of the CPAN FAQ that addresses installing modules:

How do I install Perl modules?

Installing a new module can be as simple as typing

perl -MCPAN -e 'install Chocolate::Belgian'.

The CPAN.pm documentation has more complete instructions on how to use
this convenient tool.  If you are uncomfortable with having something
take that much control over your software installation, or it otherwise
doesn't work for you, the perlmodinstall documentation covers
module installation for UNIX, Windows and Macintosh in more familiar terms.

Finally, if you're using ActivePerl on Windows, the PPM (Perl Package Manager)
has much of the same functionality as CPAN.pm.

The following is my install log. Notice that all I have to do is type a couple of lines, and everything else that follows is automatic!

[tisdall@coltrane tisdall]$ perl -MCPAN -e 'install Statistics::ChiSquare'
CPAN: Storable loaded ok
mkdir /root/.cpan: Permission denied at /usr/local/lib/perl5/5.6.1/CPAN.pm line 2218
[tisdall@coltrane tisdall]$ su
Password: 
[root@coltrane tisdall]# perl -MCPAN -e 'install Statistics::ChiSquare'
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
  Database was generated on Wed, 20 Mar 2002 00:39:29 GMT
CPAN: LWP::UserAgent loaded ok
Fetching with LWP:
  ftp://cpan.cse.msu.edu/authors/01mailrc.txt.gz
Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
CPAN: Compress::Zlib loaded ok
Fetching with LWP:
  ftp://cpan.cse.msu.edu/modules/02packages.details.txt.gz
Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
  Database was generated on Mon, 26 Aug 2002 00:22:07 GMT

  There's a new CPAN.pm version (v1.62) available!
  [Current version is v1.59_54]
  You might want to try
    install Bundle::CPAN
    reload cpan
  without quitting the current session. It should be a seamless upgrade
  while we are running...

Fetching with LWP:
  ftp://cpan.cse.msu.edu/modules/03modlist.data.gz
Going to read /root/.cpan/sources/modules/03modlist.data.gz
Going to write /root/.cpan/Metadata
Running install for module Statistics::ChiSquare
Running make for J/JO/JONO/Statistics-ChiSquare-0.3.tar.gz
Fetching with LWP:
  ftp://cpan.cse.msu.edu/authors/id/J/JO/JONO/Statistics-ChiSquare-0.3.tar.gz
CPAN: MD5 loaded ok
Fetching with LWP:
  ftp://cpan.cse.msu.edu/authors/id/J/JO/JONO/CHECKSUMS
Checksum for /root/.cpan/sources/authors/id/J/JO/JONO/Statistics-ChiSquare-0.3.
     tar.gz ok
Scanning cache /root/.cpan/build for sizes
Deleting from cache: /root/.cpan/build/IO-stringy-2.108 (21.4>20.0 MB)
Deleting from cache: /root/.cpan/build/XML-Node-0.11 (20.8>20.0 MB)
Deleting from cache: /root/.cpan/build/bioperl-0.7.2 (20.7>20.0 MB)
Statistics/ChiSquare-0.3/
Statistics/ChiSquare-0.3/ChiSquare.pm
Statistics/ChiSquare-0.3/Makefile.PL
Statistics/ChiSquare-0.3/test.pl
Statistics/ChiSquare-0.3/Changes
Statistics/ChiSquare-0.3/MANIFEST
Package seems to come without Makefile.PL.
  (The test -f "/root/.cpan/build/Statistics/Makefile.PL" returned false.)
  Writing one on our own (setting NAME to StatisticsChiSquare)

  CPAN.pm: Going to build J/JO/JONO/Statistics-ChiSquare-0.3.tar.gz

Checking if your kit is complete...
Looks good
Writing Makefile for Statistics::ChiSquare
Writing Makefile for StatisticsChiSquare
make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
cp ChiSquare.pm ../blib/lib/Statistics/ChiSquare.pm
AutoSplitting ../blib/lib/Statistics/ChiSquare.pm (../blib/lib/auto/
     Statistics/ChiSquare)
Manifying ../blib/man3/Statistics::ChiSquare.3
make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
  /usr/bin/make  -- OK
Running make test
make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
PERL_DL_NONLAZY=1 /usr/bin/perl -I../blib/arch -I../blib/lib -I/usr/local/lib/
     perl5/5.6.1/i686-linux -I/usr/local/lib/perl5/5.6.1 test.pl
1..2
ok 1
ok 2
make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
  /usr/bin/make test -- OK
Running make install
make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
Installing /usr/local/lib/perl5/site_perl/5.6.1/Statistics/ChiSquare.pm
Installing /usr/local/lib/perl5/site_perl/5.6.1/auto/Statistics/ChiSquare/
     autosplit.ix
Installing /usr/local/man/man3/Statistics::ChiSquare.3
Writing /usr/local/lib/perl5/site_perl/5.6.1/i686-linux/auto/
    StatisticsChiSquare/.packlist
Appending installation info to /usr/local/lib/perl5/5.6.1/i686-linux/perllocal.pod
  /usr/bin/make install UNINST=1 -- OK
[root@coltrane tisdall]#

This may seem like a confusing amount of output, but, again, all you have to do is type a couple of lines, and the installation follows automatically.

You may get something like the following message when you try to install a CPAN module:

[tisdall@coltrane tisdall]$ perl -MCPAN -e 'install Statistics::ChiSquare'
CPAN: Storable loaded ok
mkdir /root/.cpan: Permission denied at /usr/local/lib/perl5/5.6.1/CPAN.pm line 2218

As you can see, it didn't work, and it produced an error message. On Unix machines, it's often necessary to become root to install things.[2] In that case, use the Unix su command and try the CPAN command again:

[tisdall@coltrane tisdall]$ su
Password: 
[root@coltrane tisdall]# perl -MCPAN -e 'install Statistics::ChiSquare'

[2] You may need to contact your system administrator about getting root permission. The CPAN documentation discusses how to do a non-root installation. If you're not on a Unix or Linux machine and are using ActiveState's Perl on a Windows machine, for instance, you need to consult ActiveState's documentation.

Great, it worked. If you look over the rather verbose output, you'll see that it finds the module, installs it, tests it, and logs the installation.

Pretty easy, huh?

It's usually this easy, but not always. Occasionally, errors result, and the module may not be installed. In that case, the error messages may be enough to explain the problem; for instance, the module may depend on another module you have to install first. Another problem is that some modules haven't been tested on, or even designed to work on, all operating systems; if you try to install a Windows-specific module on Linux, it is likely to complain. In extreme cases, you can contact the author; the module documentation usually provides the author's email address.

1.9.4 Using the Newly Installed CPAN Module

Now comes the payoff. Let's look again at the documentation for the module and see if we can use it from our own Perl code.

Now that the module is installed, you can see the documentation by typing:

perldoc Statistics::ChiSquare

You can also simply go back to the web documentation found at http://search.cpan.org. Either way, you'll find the following example using this ChiSquare module:

NAME
       "Statistics::ChiSquare" - How random is your data?

SYNOPSIS
        use Statistics::Chisquare;
        print chisquare(@array_of_numbers);
        Statistics::ChiSquare is available at a CPAN site near
        you.

DESCRIPTION
        Suppose you flip a coin 100 times, and it turns up heads
        70 times.  Is the coin fair?
        Suppose you roll a die 100 times, and it shows 30 sixes.
        Is the die loaded?
        In statistics, the chi-square test calculates "how random"
        a series of numbers is.  But it doesn't simply say "yes"
        or "no".  Instead, it gives you a confidence interval,
        which sets upper and lower bounds on the likelihood that
        the variation in your data is due to chance.  See the
        examples below.
...

The documentation continues with more discussion and some concrete examples that use the module and interpret the results.
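To make the coin example concrete, here is a minimal sketch of the arithmetic behind the test. It assumes, as the module does, that the expected distribution is uniform. The subroutine name chi_square_statistic is my own, and it returns only the raw statistic, not the module's message.

```perl
use strict;
use warnings;

# Compute the chi-square statistic for a list of observed counts,
# assuming every outcome is expected equally often.
sub chi_square_statistic {
    my @observed = @_;
    return unless @observed;

    my $total = 0;
    $total += $_ for @observed;

    my $expected = $total / @observed;   # same expected count for each outcome
    return unless $expected;

    my $statistic = 0;
    $statistic += ($_ - $expected) ** 2 / $expected for @observed;
    return $statistic;
}

# 100 coin flips: 70 heads and 30 tails, so 50 of each are expected.
print chi_square_statistic(70, 30), "\n";   # prints 16
```

Each outcome contributes (20 ** 2) / 50 = 8, giving 16 in total. That is larger than 6.64, the last entry in the module's table row for one degree of freedom, so for this data chisquare would fall through to its final "<1%" message.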

Very often, the SYNOPSIS part of the documentation is all you need to look at. It shows you specific examples of how to call the code in the module. In this case, because it's a very simple module, there is just one subroutine that can be used. As you see from the documentation excerpt, you just need to pass the chisquare subroutine an array of numbers and print out the return value to use the code. Let's try it. We'll take as our input an array of numbers that corresponds to the stops of the Broadway-7th Avenue local subway train on the west side of Manhattan, from 14th Street up to 137th Street in Harlem. (We'll assume you didn't run fast enough and missed the A train.) Let's see how random these stops really are:

use strict;
use warnings;

use Statistics::ChiSquare;

my(@subwaystops) = (14, 18, 23, 28, 34, 42, 50, 59, 66, 72, 79, 86, 96, 103, 110, 
116, 125, 137);

print chisquare(@subwaystops);

This produces the output:

There's a <1% chance that this data is random.

(Knowing firsthand the feelings of long-suffering New York City Subway riders, I predict that this result might provoke some spirited discussion. Nevertheless, we seem to have working code.)

1.9.5 Problems with CPAN Modules

Actually, the sharp-eyed reader may have noticed a problem in our mad dash uptown. In the first line of the SYNOPSIS section, there's the following:

use Statistics::Chisquare;

The name of the module is spelled Chisquare, whereas in all other places in the documentation the module is spelled ChiSquare with a capital S. In Perl, the case of a letter, uppercase or lowercase, is important, and this looks suspiciously like a typographical error in the documentation. If you try use Statistics::Chisquare, you'll discover that the module can't be found, whereas if you try use Statistics::ChiSquare, the module is there. This is a minor bug, but some modules have poor documentation, and it can be a time-consuming problem, especially if you are forced to wade into the module code or try various tests, to figure out how the module works.

Apart from bugs, I've also mentioned the problem that some modules are not tested, or designed, for all operating systems. In addition, many modules require other modules to be present. It's possible to configure CPAN to automatically install all the required modules a requested module uses, as described in the CPAN documentation, but you may need to intervene personally. It's useful to remember that if you have a program that uses a certain module running on one computer, and you move the program to another computer, you may have to install the required modules on the new computer as well.
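One way to catch a missing module before a program fails partway through is to test for each required module at startup. The following is only a sketch: module_available is a hypothetical helper name, and the list of modules is an assumption you would replace with the ones your own program uses.

```perl
use strict;
use warnings;

# Return true if the named module can be loaded on this machine.
sub module_available {
    my ($name) = @_;
    (my $file = "$name.pm") =~ s{::}{/}g;    # Foo::Bar becomes Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Illustrative list only; name the modules your program actually needs.
my @required = qw(List::Util Statistics::ChiSquare);

for my $module (@required) {
    printf "%-25s %s\n", $module,
        module_available($module) ? 'ok' : 'MISSING';
}
```

Because module names are case-sensitive, a misspelling such as Statistics::Chisquare will typically be reported as missing too, which makes a check like this a quick way to spot that kind of typo.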

Saving the worst for last, it's also important to remember that contributing to CPAN is open to one and all, and not all the code there is well-written or well-tested. The heavily used modules are, but counterexamples can be found. So, don't bet the farm on your code just because it uses a CPAN module; you should still carefully read the documentation for the module and test your program.

The CPAN FAQ explains in detail the way to be a good citizen when it comes to testing and reporting bugs that you discover in CPAN code.