Introduction

Imagine that you have two separate programs, both of which work fine by themselves, and you decide to make a third program that combines the best features from the first two. You copy both programs into a new file or cut and paste selected pieces. You find that the two programs had variables and functions with the same names that should remain separate. For example, both might have an init function or a global $count variable. When merged into one program, these separate parts would interfere with each other.

The solution to this problem is packages. Perl uses packages to partition the global namespace. The package is the basis for both traditional modules and object-oriented classes. Just as directories contain files, packages contain identifiers. Every global identifier (variables, functions, file and directory handles, and formats) has two parts: its package name and the identifier proper. These two pieces are separated from one another with a double colon. For example, the variable $CGI::needs_binmode is a global variable named $needs_binmode, which resides in package CGI.

Where the filesystem uses slashes to separate the directory from the filename, Perl uses a double colon. $Names::startup is the variable named $startup in the package Names, whereas $Dates::startup is the $startup variable in package Dates. Saying $startup by itself without a package name means the global variable $startup in the current package. (This assumes that no lexical $startup variable is currently visible. Lexical variables are explained in Chapter 10.) When looking at an unqualified variable name, a lexical takes precedence over a global. Lexicals live in scopes; globals live in packages. If you really want the global instead, you need to fully qualify it.
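The precedence rule can be seen in a minimal sketch. The package name Names and the variable $startup come from the text above; the string values are made up for illustration.

```perl
use strict;
use warnings;

package Names;
our $startup = "global in Names";   # the package variable $Names::startup

{
    my $startup = "lexical";        # hides the global until the closing brace
    print "unqualified:     $startup\n";          # the lexical wins
    print "fully qualified: $Names::startup\n";   # still reaches the global
}

print "after the block: $startup\n";   # the lexical is gone; global again
```

Inside the braces the unqualified name finds the lexical; once that scope ends, the same unqualified name means the package variable again.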

package is a compile-time declaration that sets the default package prefix for unqualified global identifiers, much as chdir sets the default directory prefix for relative pathnames. This effect lasts until the end of the current scope (a brace-enclosed block, file, or eval). The effect is also terminated by any subsequent package statement in the same scope. (See the following code.) All programs are in package main until they use a package statement to change this.

package Alpha;
$name = "first";

package Omega;
$name = "last";

package main;
print "Alpha is $Alpha::name, Omega is $Omega::name.\n";
Alpha is first, Omega is last.
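The scoping rule also means a package statement inside a block stops applying at the closing brace. The following is a small sketch of that behavior; the whoami function is a hypothetical helper added only to report the package it was compiled in.

```perl
use strict;

package main;
our $name = "main";           # $main::name

{
    package Alpha;            # in effect only until the closing brace
    our $name = "first";      # $Alpha::name
    sub whoami { return __PACKAGE__ }
}

# Back in package main here, without another package statement.
print __PACKAGE__, " sees $name; Alpha::whoami() says ", Alpha::whoami(), "\n";
```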

Unlike user-defined identifiers, built-in variables with punctuation names (like $_ and $.) and the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are all forced to be in package main when unqualified. That way things like STDIN, @ARGV, %ENV, and $_ are always the same no matter what package you're in; for example, @ARGV always means @main::ARGV, even if you've used package to change the default package. A fully qualified @ElseWhere::ARGV would not, and carries no special built-in meaning. Make sure to localize $_ if you use it in your module.
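Both points can be checked with a short sketch. The package name Elsewhere and the count_vowels function are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

package Elsewhere;

# Unqualified @ARGV is @main::ARGV, even though the current package is Elsewhere.
@ARGV = ("from", "Elsewhere");
my $same_array = \@ARGV == \@main::ARGV ? "yes" : "no";

# A module function that works on $_ localizes it first,
# so the caller's $_ is restored when the function returns.
sub count_vowels {
    local $_ = shift;
    return tr/aeiou//;      # counts the vowels in $_
}

package main;

$_ = "caller's value";
my $vowels = Elsewhere::count_vowels("package");
print "same array: $same_array; vowels: $vowels; \$_ is still '$_'\n";
```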

Modules

The unit of software reuse in Perl is the module, a file containing related functions designed to be used by programs and other modules. Every module has a public interface, a set of variables and functions that outsiders are encouraged to use. From inside the module, the interface is defined by initializing certain package variables that the standard Exporter module looks at. From outside the module, the interface is accessed by importing symbols as a side effect of the use statement. The public interface of a Perl module is whatever is documented to be public. When we talk about modules in this chapter, and traditional modules in general, we mean those that use the Exporter.

The require and use statements load a module into your program, although their semantics vary slightly. require loads modules at runtime, with a check to avoid the redundant loading of a given module. use is like require, with two added properties: compile-time loading and automatic importing.

Modules included with use are processed at compile time, but require processing happens at runtime. This is important because if a module needed by a program is missing, the program won't even start because the use fails during compilation of your script. Another advantage of compile-time use over runtime require is that function prototypes in the module's subroutines become visible to the compiler. This matters because only the compiler cares about prototypes, not the interpreter. (Then again, we don't usually recommend prototypes except for replacing built-in commands, which do have them.)
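The difference in timing can be sketched with a BEGIN block, which is how perlfunc describes use: a require followed by an import call, both done at compile time. The standard List::Util module is used here only as a convenient example; the @order array is a hypothetical log.

```perl
use strict;
use warnings;

our @order;

push @order, "run time: first ordinary statement";
BEGIN { push @order, "compile time: BEGIN block ran" }

# Equivalent to "use List::Util qw(sum);" according to perlfunc.
BEGIN {
    require List::Util;
    List::Util->import(qw(sum));
}

push @order, "sum(1..4) = " . sum(1 .. 4);
print "$_\n" for @order;
```

Although the BEGIN block appears second in the file, its entry is logged first, because it runs while the rest of the program is still being compiled.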

use is suitable for giving hints to the compiler because of its compile-time behavior. A pragma is a special module that acts as a directive to the compiler to alter how Perl compiles your code. A pragma's name is always all lowercase, so when writing a regular module instead of a pragma, choose a name that starts with a capital letter. Pragmas supported by the v5.8.1 release of Perl include attributes, autouse, base, bigint, bignum, bigrat, bytes, charnames, constant, diagnostics, fields, filetest, if, integer, less, locale, open, overload, sigtrap, sort, strict, subs, utf8, vars, vmsish, and warnings. Each has its own manpage.

The other difference between require and use is that use performs an implicit import on the included module's package. Importing a function or variable from one package to another is a form of aliasing; that is, it makes two different names for the same underlying thing. It's like linking files from another directory into your current one by the command ln /somedir/somefile. Once it's linked in, you no longer have to use the full pathname to access the file. Likewise, an imported symbol no longer needs to be fully qualified by package name (or declared with our or the older use vars if a variable, or with use subs if a subroutine). You can use imported variables as though they were part of your package. If you imported $English::OUTPUT_AUTOFLUSH in the current package, you could refer to it as $OUTPUT_AUTOFLUSH.
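The aliasing can be sketched by hand with typeglob assignments. This only approximates what an import does; it is not the Exporter's actual code. The Counter package and its contents are hypothetical.

```perl
use strict;
use warnings;

{
    package Counter;
    our $count = 0;
    sub bump { return ++$count }
}

package main;

# Make main's names refer to the very same things as Counter's names.
*main::count = \$Counter::count;    # alias the scalar
*main::bump  = \&Counter::bump;     # alias the subroutine
our $count;                         # declare the name for "use strict"

bump();                             # increments $Counter::count
$count += 10;                       # so does this, through the alias
print "\$Counter::count is $Counter::count\n";
```

Because both names refer to one underlying variable, a change made through either name is visible through the other.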

The required file extension for a Perl module is .pm. The module named FileHandle would be stored in the file FileHandle.pm. The full path to the file depends on your include path, which is stored in the global @INC variable. Recipe 12.8 shows how to manipulate this array for your own purposes.

If the module name itself contains any double colons, these are translated into your system's directory separator. That means that the File::Find module resides in the file File/Find.pm under most filesystems. For example:

require "FileHandle.pm";            # runtime load
require FileHandle;                 # ".pm" assumed; same as previous
use FileHandle;                     # compile-time load

require "Cards/Poker.pm";           # runtime load
require Cards::Poker;               # ".pm" assumed; same as previous
use Cards::Poker;                   # compile-time load
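To see the translation at work, the sketch below loads the standard File::Find module and looks it up in %INC, where require records each loaded file under its path-style name. The module_to_path function is a hypothetical helper that mimics the translation; it is not something Perl provides.

```perl
use strict;
use warnings;

require File::Find;                        # searches @INC for File/Find.pm

# Hypothetical helper: turn a module name into the key that %INC uses.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;     # %INC keys are written with "/"
    return "$path.pm";
}

my $key = module_to_path("File::Find");    # "File/Find.pm"
print "$key was loaded from $INC{$key}\n";
```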

Import/Export Regulations

The following is a typical setup for a hypothetical module named Cards::Poker that demonstrates how to manage its exports. The code goes in the file named Poker.pm within the directory Cards; that is, Cards/Poker.pm. (See Recipe 12.8 for where the Cards directory should reside.) Here's that file, with line numbers included for reference:

1    package Cards::Poker;
2    use Exporter;
3    @ISA = ("Exporter");
4    @EXPORT = qw(&shuffle @card_deck);
5    @card_deck = ( );                       # initialize package global
6    sub shuffle { }                        # fill-in definition later
7    1;                                     # don't forget this

Line 1 declares the package that the module will put its global variables and functions in. Typically, a module first switches to a particular package so that it has its own place for global variables and functions, one that won't conflict with that of another program. This package name must be written exactly as in the corresponding use statement when the module is loaded.

Don't say package Poker just because the basename of your file is Poker.pm. Rather, say package Cards::Poker because your users will say use Cards::Poker. This common problem is hard to debug. If you don't make the package names specified by the package and use statements identical, you won't see a problem until you try to call imported functions or access imported variables, which will be mysteriously missing.

Line 2 loads in the Exporter module, which manages your module's public interface as described later. Line 3 initializes the special, per-package array @ISA to contain the word "Exporter". When a user says use Cards::Poker, Perl implicitly calls a special method, Cards::Poker->import( ). You don't have an import method in your package, but that's okay, because the Exporter package does, and you're inheriting from it because of the assignment to @ISA (is a). Perl looks at the package's @ISA for resolution of undefined methods. Inheritance is a topic of Chapter 13. You may ignore it for now, so long as you put code like that in lines 2 and 3 into each module you write.

Line 4 assigns the list ('&shuffle', '@card_deck') to the special, per-package array @EXPORT. When someone imports this module, variables and functions listed in that array are aliased into the caller's own package. That way they don't have to call the function Cards::Poker::shuffle(23) after the import. They can just write shuffle(23) instead. This won't happen if they load Cards::Poker with require Cards::Poker; only a use imports.

Lines 5 and 6 set up the package global variables and functions to be exported. (We presume you'll actually flesh out their initializations and definitions more than in these examples.) You're free to add other variables and functions to your module, including ones you don't put in the public interface via @EXPORT. See Recipe 12.1 for more about using the Exporter.
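From the caller's side, the effect of the export list might look like the following sketch. Two things here are assumptions made so the example runs on its own: the package is defined in the same file rather than in Cards/Poker.pm, with an entry placed in %INC by hand so that use does not search the disk, and shuffle has a placeholder body that merely reverses its arguments.

```perl
use strict;
use warnings;

# Stand-in for Cards/Poker.pm, kept in this file so the sketch is self-contained.
BEGIN {
    package Cards::Poker;
    use Exporter;
    our @ISA       = ("Exporter");
    our @EXPORT    = qw(&shuffle @card_deck);
    our @card_deck = (1 .. 5);                # placeholder contents
    sub shuffle { return reverse @_ }         # placeholder body
    $INC{"Cards/Poker.pm"} = __FILE__;        # mark the module as already loaded
}

use Cards::Poker;      # sees the %INC entry, then calls Cards::Poker->import

my @shuffled = shuffle(@card_deck);           # no Cards::Poker:: prefix needed
print "deck: @card_deck\nshuffled: @shuffled\n";
```

The unqualified @card_deck passes use strict because it was imported from another package.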

Finally, line 7 is a simple 1, indicating the overall return value of the module. If the last evaluated expression in the module doesn't produce a true value, an exception will be raised. Trapping this is the topic of Recipe 12.2.
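What happens without that true value can be sketched by writing a throwaway module that ends in 0 and then trying to load it. The module name NoTrueValue and the temporary directory are hypothetical, and the sketch assumes the default behavior, where nothing has asked Perl to treat modules as implicitly true.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write a throwaway module whose last evaluated expression is false.
my $dir = tempdir(CLEANUP => 1);
open(my $fh, ">", "$dir/NoTrueValue.pm") or die "can't write module: $!";
print $fh "package NoTrueValue;\nsub hello { 'hi' }\n0;\n";
close($fh) or die "can't close module: $!";

unshift @INC, $dir;                       # let require find the module

my $loaded = eval { require NoTrueValue; 1 };
my $error  = $@;
print "load failed: $error" unless $loaded;
```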

Packages group and organize global identifiers. They have nothing to do with privacy. Code compiled in package Church can freely examine and alter variables in package State. Package variables are always global and are used for sharing. But that's okay, because a module is more than just a package; it's also a file, and files count as their own scope. So if you want privacy, use lexical variables instead of globals. This is the topic of Recipe 12.4.
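The contrast can be sketched as follows. The package names Church and State come from the text; the variable and function names are hypothetical, and a bare block stands in for the file scope that a real module would provide.

```perl
use strict;
use warnings;

{
    package State;
    our $treasury = "open to everyone";       # package variable: shared
    my  $secret   = "visible only in here";   # lexical: private to this scope
    sub secret { return $secret }             # the only way in from outside
}

{
    package Church;
    $State::treasury = "changed by Church";   # nothing prevents this
}

my $value = State::secret();
print "$State::treasury\n$value\n";
```

Code in Church can rewrite $State::treasury at will, but it has no name by which to reach $secret except through the function that State chooses to offer.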

Other Kinds of Library Files

A library is a collection of loosely related functions designed to be used by other programs. It lacks the rigorous semantics of a Perl module. The file extension .pl indicates that it's a Perl library file. Examples include syslog.pl and abbrev.pl. These are included with the standard release for compatibility with prehistoric scripts written under Perl v4 or below.

Perl libraries (or in fact, any arbitrary file with Perl code in it) can be loaded in using do "file.pl" or with require "file.pl". The latter is preferred in most situations, because unlike do, require does implicit error checking. It raises an exception if the file can't be found in your @INC path, doesn't compile, or if it doesn't return a true value when any initialization code is run (the last part is what the 1 was for earlier). Another advantage of require is that it keeps track of which files have already been loaded in the global hash %INC. It doesn't reload the file if %INC indicates that the file has already been read.
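The bookkeeping difference between the two can be sketched with a throwaway library file that counts how many times it has been run. The filename counter.pl and the $loads variable are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a small library that bumps a counter each time it is executed.
my $dir = tempdir(CLEANUP => 1);
my $lib = "$dir/counter.pl";
open(my $fh, ">", $lib) or die "can't write $lib: $!";
print $fh "\$main::loads++;\n1;\n";
close($fh) or die "can't close $lib: $!";

our $loads = 0;

for (1 .. 3) { require $lib; }    # runs the file once; %INC remembers it
my $after_require = $loads;

for (1 .. 3) { do $lib; }         # runs the file every time
my $after_do = $loads;

print "after require: $after_require, after do: $after_do\n";
```

Three calls to require execute the file only once, while each do executes it again.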

Libraries work well when used by a program, but problems arise when libraries use one another. Consequently, simple Perl libraries have been rendered mostly obsolete, replaced by the more modern modules. But some programs still use libraries, usually loading them in with require instead of do.

Other file extensions are occasionally seen in Perl. A .ph is used for C header files that have been translated into Perl libraries using the h2ph tool, as discussed in Recipe 12.17. A .xs indicates an augmented C source file, possibly created by the h2xs tool, which will be compiled by the xsubpp tool and your C compiler into native machine code. This process of creating mixed-language modules is discussed in Recipe 12.18.

So far we've talked only about traditional modules, which export their interface by allowing the caller direct access to particular subroutines and variables. Most modules fall into this category. But some problems, and some programmers, lend themselves to more intricately designed modules: those involving objects. An object-oriented module seldom uses the import-export mechanism at all. Instead, it provides an object-oriented interface full of constructors, destructors, methods, inheritance, and operator overloading. This is the subject of Chapter 13.

Not Reinventing the Wheel

CPAN, the Comprehensive Perl Archive Network, is a gigantic repository of nearly everything about Perl you could imagine, including source, documentation, alternate ports, and above all, modules: some 4,500 of them as of spring of 2003. Before you write a new module, check with CPAN to see whether one already exists that does what you need. Even if one doesn't, something close enough might give you ideas.

CPAN is a replicated archive, currently mirrored on nearly 250 sites. Access CPAN via http://www.cpan.org/. If you just want to poke around, you can manually browse through the directories there. There are many indices, including listings of just new modules and of all modules organized by name, author, or category.

A convenient alternative to picking through thousands of modules is the search engine available at http://search.cpan.org/. You can search for modules by their name or author, but the facility for grepping through all registered modules' documentation is often more useful. That way you don't have to download and install a module just to see what it's supposed to do.

See Also

Chapters 10, 11, and 22 of Programming Perl; perlmod(1)