Introduction

The mod_perl project (http://perl.apache.org/) integrates Perl with the Apache web server. That way, you can use Perl to configure Apache, manipulate and respond to requests, write to log files, and much more.

Most people begin using mod_perl to avoid the performance penalty of CGI. With CGI programs, the web server starts a separate process for each request. This can be a costly business on most operating systems, with lots of kernel data structures to copy and file I/O to load the new process's binary. If you serve a lot of requests, the operating system may be unable to keep up with the demand for new processes, leaving your web server (and indeed the whole machine) unresponsive.

By embedding the Perl interpreter within the Apache process, mod_perl removes the need to start a separate process to generate dynamic content. Indeed, the Apache::Registry and Apache::PerlRun modules provide a CGI environment within this persistent Perl interpreter (and form the basis of Recipe 21.12). This gives you an immediate performance boost over CGI (some report 10-100x performance) but doesn't take full advantage of the integration of Perl with Apache. For that, you need to write your own handlers.

Handlers

Because Apache has access to Perl at every step as it processes a request (and vice versa), you can write code (handlers) for every phase of a request-response cycle. There are 13 phases for which you can write handlers, and each phase has a default handler (so you don't have to install a handler for every phase).

You must do three things to install a handler for a specific phase: write the code, load the code into mod_perl, and tell mod_perl to call the code.

Handlers are simply subroutines. They're passed an Apache request object as the first argument, and through that object they can learn about the request, change Apache's information about the request, log errors, generate the response, and more. The return value of a handler determines whether the current phase continues with other handlers, the current phase ends successfully and execution proceeds to the next phase, or the current phase ends with an error. The return values are constants from the Apache::Constants module.

Although you can put your handler code in Apache's httpd.conf file, it's tidier to put your handlers in a module:

# in MyApp/Content.pm
package MyApp::Content;
use strict;
use Apache::Constants ':common';

sub handler {
  my $r = shift;   # get the request object
  # ...
  return OK;       # for example
}

1;                 # a module must end with a true value

The subroutine can be named anything, but mod_perl makes it convenient to name every handler subroutine handler and to store different handlers in different modules. So MyApp::Content holds the handler for content generation, whereas MyApp::Logging might hold the handler that logs the request.
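As a rough illustration of what the body of a content handler might contain, here is a minimal sketch. The module name MyApp::Hello is hypothetical. So that the sketch also runs outside Apache, it defines OK locally with the value Apache 1.x gives it (0); in a real handler you would import OK from Apache::Constants, as in the previous listing.

```perl
package MyApp::Hello;    # hypothetical module name
use strict;
use warnings;

# Under mod_perl, replace this line with: use Apache::Constants ':common';
use constant OK => 0;    # stand-in with the value Apache 1.x uses for OK

sub handler {
    my $r = shift;                      # the Apache request object
    $r->content_type('text/plain');     # set the response Content-Type
    $r->send_http_header;               # emit the HTTP headers (mod_perl 1.x)
    $r->print("Hello from ", $r->uri, "\n");    # write the response body
    return OK;                          # this phase ended successfully
}

1;
```

Everything the handler learns about the request, and everything it sends back, goes through the request object $r.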

Because the Perl interpreter doesn't go away after each request, you have to program tidily if you want to use mod_perl. This means using lexical (my) variables instead of globals and closing filehandles when done with them (or using lexical filehandles). Unclosed filehandles remain open until the next time that process runs your CGI script (when they are reopened), and global variables whose values aren't undefed will still have those values the next time that process runs your CGI script. The mod_perl_traps manpage that comes with mod_perl contains details of common mod_perl gotchas.
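The effect of a persistent interpreter on global variables can be seen without Apache at all. The following sketch simulates one child process serving three requests in a row; the subroutine names leaky_handler and tidy_handler are invented for the illustration.

```perl
use strict;
use warnings;

our @seen_global;    # package global: survives between calls

sub leaky_handler {
    push @seen_global, shift;    # keeps growing across "requests"
    return scalar @seen_global;
}

sub tidy_handler {
    my @seen = (shift);          # lexical: created fresh on every call
    return scalar @seen;
}

# Simulate one child process handling three requests in a row.
my @leaky = map { leaky_handler($_) } qw(a b c);
my @tidy  = map { tidy_handler($_) }  qw(a b c);

print "leaky: @leaky\n";    # prints "leaky: 1 2 3"
print "tidy:  @tidy\n";     # prints "tidy:  1 1 1"
```

Under CGI both versions would report 1 every time, because each request would start with a new interpreter.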

Load your handler module with a PerlModule directive in httpd.conf:

PerlModule MyApp::Content

This behaves like use in a Perl script: it loads and runs the module. Now that mod_perl has your code loaded, tell Apache to call it.

Directives used in httpd.conf to install handlers are:

PerlChildInitHandler
PerlPostReadRequestHandler
PerlInitHandler
PerlTransHandler
PerlHeaderParserHandler
PerlAccessHandler
PerlAuthenHandler
PerlAuthzHandler
PerlTypeHandler
PerlFixupHandler
PerlHandler
PerlLogHandler
PerlCleanupHandler
PerlChildExitHandler
PerlDispatchHandler
PerlRestartHandler
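
For example, a minimal httpd.conf fragment that loads MyApp::Content and installs it for the content phase might look like the following sketch. The /hello URL is a hypothetical choice; SetHandler perl-script asks Apache to hand the content phase for that location to mod_perl.

```apache
# Load the module once, when the server starts
PerlModule MyApp::Content

# Call MyApp::Content::handler for requests under /hello (hypothetical URL)
<Location /hello>
    SetHandler perl-script
    PerlHandler MyApp::Content
</Location>
```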

Apache Phases

Understanding the phases of a request-response transaction requires some knowledge of how Apache works and of the consequences of the various ways of configuring it. Apache keeps a pool of server processes (children) to handle requests in parallel. The ChildInit and ChildExit phases represent the start and end of a child process, respectively.

A PerlPostReadRequestHandler is called as soon as Apache reads the request from the client. Apache extracts the URL and virtual host name, but doesn't yet attempt to figure out to which file the request maps. Therefore you can't install such a handler from a .htaccess file or the <Location>, <Directory>, or <Files> sections (or their *Match variants) of httpd.conf.

The translation phase is responsible for decoding the incoming request and guessing the file that corresponds to the URL. It is here that you could implement your own aliases and redirects. Once Apache knows the requested URL and the corresponding file to look for, it can check the <Location>, <Directory>, and <Files> sections of httpd.conf and begin looking for .htaccess files. Install a translation handler with PerlTransHandler.

The header parsing phase is misleadingly named. The headers have already been parsed and stored in the request object. The intent of this phase is to give you an opportunity to act based on the headers once you know the file that the URL corresponds to. You can examine headers within a PostReadRequestHandler, but the file isn't known yet. PostReadRequestHandler is per-server, whereas HeaderParserHandler can be per-location, per-file, or per-directory. This is the first phase of the request for which you can install a handler from any part of an httpd.conf or .htaccess file.

The PerlInitHandler is an alias for "the first available handler." Inside the <Location>, <Directory>, and <Files> sections of httpd.conf or anywhere in a .htaccess file, it is an alias for PerlHeaderParserHandler. Everywhere else, PerlInitHandler is an alias for PerlPostReadRequestHandler.

Next come the access control, authentication, and authorization phases. Add a PerlAccessHandler to limit access without requiring usernames and passwords. The authentication phase decodes the username and password from the request and decides whether the user is a valid one. The authorization phase determines whether the user is allowed to access the requested resource. Apache splits authentication from authorization so separate areas of your web site can share a user database but grant different types of access to each area. We talk about writing authentication and authorization handlers in Recipe 21.1. Most people stick to basic authentication, which trivially encodes the password as part of the request header. If you want more secure authentication, you can use digest authentication (which is tricky to implement in a way that works on all browsers) or simply encrypt the entire request by using https:// URLs to a secure server.
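
To give a feel for access control without usernames and passwords, here is a minimal sketch of a handler that admits only clients on an internal network. The module name MyApp::Access and the 192.168 address policy are assumptions made for the example. As a stand-in so the code also runs outside Apache, OK and FORBIDDEN are defined locally with their Apache 1.x values; in a real handler, import them from Apache::Constants.

```perl
package MyApp::Access;    # hypothetical module name
use strict;
use warnings;

# Under mod_perl, replace this with: use Apache::Constants qw(OK FORBIDDEN);
use constant { OK => 0, FORBIDDEN => 403 };

my $ALLOWED = qr/^192\.168\./;    # assumed policy: internal network only

sub handler {
    my $r  = shift;
    my $ip = $r->connection->remote_ip;    # client address (mod_perl 1.x API)
    return $ip =~ $ALLOWED ? OK : FORBIDDEN;
}

1;
```

You would install such a handler with a PerlAccessHandler MyApp::Access line in the section of httpd.conf that you want to protect.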

Once Apache has established that the client is allowed to access the requested document, the type determination phase occurs. Here Apache checks httpd.conf and .htaccess to see whether a specific content type has been forced on the requested file. If not, it uses the filename and its list of MIME types to figure out the file type. You can install a PerlTypeHandler to determine your own types.

Apache then offers you the chance to make any last-minute changes to the request via PerlFixupHandler. We use it in Recipe 21.10 to reinsert part of the URL removed earlier in a PerlHeaderParserHandler.

Then a handler must generate content. This is such a common use for mod_perl that the directive to install a content handler is simply PerlHandler. Once the content is generated, the logging phase begins, and it is normally here that the access log entry is written. You can, of course, write your own logging code to replace or augment Apache's (for example, logging to a database). This is the subject of Recipe 21.9.

The logging phase occurs before the connection to the client is closed. You can install code to run after the response is sent through a PerlCleanupHandler. Because a slow logging handler keeps the connection open (and thus the child waiting for more responses), a common mod_perl idiom is to use the cleanup phase for logging when the act of logging could take a long time (for example, when it involves a lot of I/O). Using the cleanup phase to actually clean up turns out to be rare.

That concludes the main phases and handlers. There are other handlers you can install. We don't use PerlDispatchHandler in this chapter, but it is an alternative mechanism to the system of registering handlers for every phase. If you register a PerlDispatchHandler, that handler is called for every phase. A PerlRestartHandler lets you run code whenever the Apache server restarts.

Much of the difficulty in getting started with mod_perl resides in learning how to do what you already knew how to do with CGI.pm. Cookies and form parameters are cumbersome to manipulate with pure mod_perl. This is why Recipe 21.2 and Recipe 21.3 discuss these seemingly simple topics.

mod_perl 2

As this chapter goes to press, developers are putting the finishing touches on mod_perl 2.0. This is a major revision and rewrite of mod_perl for the Apache 2.0 system. The changes between 1.0 and 2.0 are too numerous to list: they affect configuration directives and Perl classes. There's an Apache::compat module that emulates the 1.0 handler API, but (as with using Apache::Registry to emulate CGI) there's a cost to the emulation. For maximum performance and flexibility, modify your modules to use the 2.0 API.

One of the biggest changes in 2.0 is the support for threads. Not only can you now have multiple Apache processes running at once, you can also have multiple threads of execution within each process. Some tasks are easier with threads, and you may see better performance with threads. However, it's trickier to write code (especially correct code) under threading.

For more on mod_perl 2.0, see http://perl.apache.org/docs/2.0/.