The preceding sections of this chapter have presented a brief overview of the design of the Web, the principal components of the programming environment such as HTTP, HTML, and URLs, and of the essential request-response nature of web interactions between web browsers and web servers. Now, it's time to look at CGI and a specific Perl module, CGI.pm, that is widely used to create interactive web pages on servers.
CGI, the Common Gateway Interface, is an interface between a web server and some other program that produces web content such as HTML documents or images.
A web browser may request from the web server the output from a CGI program. In this case, the web server finds the program or script, runs the program, and sends the output of the program back to the web browser. The output of the program may be HTML, just as it may be found in a static file, but it is created by the CGI script dynamically, so it may be different each time; for instance, it may include the time of day in its display.
In other words, a CGI script is just a program that produces web content that can be displayed in a web browser. It also can read information passed to it by the web server, usually parameters filled out by the user in a form displayed on a web browser that asks for the specifics of a query. For example, the parameters might be the name of a sequence file and the name of a restriction enzyme to map in the sequence. The CGI program takes the parameters, runs, and outputs a dynamically created web page to be returned to the user's web browser.
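To make that exchange concrete, here is a minimal sketch of a CGI script that reads its parameters without CGI.pm. It assumes a GET request, in which the web server passes the form data in the QUERY_STRING environment variable as name=value pairs joined by &. The parameter names seqfile and enzyme are hypothetical, chosen to match the restriction-mapping example. The sketch does not handle POST requests, repeated parameters, or the other cases that CGI.pm takes care of for you.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one URL-encoded component: '+' means space, %XX is a hex byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split a query string such as "seqfile=sample.fa&enzyme=EcoRI" into a hash.
# If a name appears twice, the later value overwrites the earlier one.
sub parse_query {
    my ($query) = @_;
    my %param;
    for my $pair (split /[&;]/, $query // '') {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $param{ url_decode($name) } = url_decode($value // '');
    }
    return %param;
}

# Escape characters that are special in HTML before echoing user input.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical parameter names; a real form would define its own.
my %param   = parse_query($ENV{QUERY_STRING});
my $seqfile = html_escape($param{seqfile} // '(none)');
my $enzyme  = html_escape($param{enzyme}  // '(none)');

print "Content-type: text/html\n\n";
print "<html><body>\n";
print "<p>Sequence file: $seqfile</p>\n";
print "<p>Restriction enzyme: $enzyme</p>\n";
print "</body></html>\n";
```

A request for a URL ending in ?seqfile=sample.fa&enzyme=EcoRI would produce a page echoing those two values.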
A CGI program can be written in just about any language, but the most common for CGI programs on the Web is Perl. Perl has a very nice module called CGI.pm that eases the task of writing CGI scripts; it's a popular way to create dynamic web sites with a minimum of bother.
So, how do you write a CGI Perl program? Basically, you write a Perl program that includes the line:
use CGI qw/:standard/;
You then use the CGI.pm methods so that your Perl program outputs the code for the web page you want to return. After that, it's simply a matter of placing your new CGI script in the proper place in the web server's directory structure, namely, in a directory that the web server knows is supposed to contain CGI scripts. Finally, you enter the script's URL in a web browser. The web browser sends the request to the web server, which executes the CGI script, collects its output, and returns it to your web browser to be displayed as a web page (or image, sound, or whatever).
You can actually write a Perl program that just prints out HTML code (without ever using the CGI.pm module) and install that as a CGI program. For instance, you can take the HTML page shown earlier and create a CGI Perl script that dynamically outputs that page. To prove it's dynamic, I'll add a little code that includes the time of day:
#!/usr/bin/perl

use strict;
use warnings;

my $time = localtime;

print "Content-type: text/html\n\n";

print <<EndOfHTML;
<html>
<head>
<title>Double stranded RNA can regulate genes</title>
</head>
<body>
<h2>Double stranded RNA can regulate genes</h2>
<p>A recent article in <b>Nature</b> describes the important discovery of <i>RNA interference</i>, the action of snippets of double-stranded RNA in suppressing gene expression.
</p>
<p>
The discovery has provided a powerful new tool in investigating gene function, and has raised many questions about the nature of gene regulation in a wide variety of organisms.
</p>
<p>
This page was created $time.
</p>
</body>
</html>
EndOfHTML
Notice that the program just prints out the HTML code. Before the HTML, which forms the body of the response, it prints a header line with the statement print "Content-type: text/html\n\n";. The first \n ends the header line and the second prints a blank line, which separates the header from the body of the response, as described earlier in this chapter.
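The blank line is what lets the web server tell where the headers stop and the document begins. The following plain-Perl sketch, which needs no web server, builds a response the same way the script above does and then splits it at the first blank line. The subroutine names build_response and split_response are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a response the same way the script above does:
# header line, blank line, then the HTML body.
sub build_response {
    my ($body) = @_;
    return "Content-type: text/html\n\n" . $body;
}

# Split a CGI response at the first empty line.
# Returns (header block, body); the body is undef if there is no blank line,
# meaning the header block was never finished.
sub split_response {
    my ($response) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $response, 2;
    return ($head, $body);
}

my ($head, $body) = split_response(build_response("<p>Hello</p>\n"));
print "Header: $head\n";
print "Body:   $body";
```

If the script printed only one \n, the split would find no blank line and the body would come back undefined.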
Also notice the new final paragraph, which reports the time the page was created.
After writing the program, you need to install it in the CGI script directory of your web server. Because there are so many web servers and operating systems, I can't cover every setup here. On my Linux system, using the Apache web server, I simply became superuser (root), copied the script (called cgiex1) into the directory /var/www/cgi-bin, and then typed:
chmod 755 /var/www/cgi-bin/cgiex1
If you're working on Mac OS X, the procedure is similar. On a Microsoft Windows machine, the details are a little different; consult the documentation for your web server to see how to install a CGI script in the appropriate place.
Once I'd installed the script, I entered the following URL into my web browser. Notice that the URL gives the hostname as localhost, which means the web server is on the same computer on which I'm running the web browser.
I hit the Enter or Return key, and the web server returned the web page that's displayed in my web browser (see Figure 7-2).
Notice that it's the same as the previous version, which simply displayed a static file, except that this time the current date and time on the web server are also reported. Each time you run this program, the output differs (in the time stamp, that is), which is what makes the program dynamic.
I'll go into more detail about CGI installation in the next section.
The following program was written using CGI.pm; it produces the same web page as the example in the previous section. Notice that almost the entire script is a single Perl print statement with a list of arguments, ending with the argument end_html. Each argument to print is either a CGI.pm function call or a text string:
#!/usr/bin/perl

use strict;
use warnings;
use CGI qw/:standard/;

my $time = localtime;

print header,
      start_html('Double stranded RNA can regulate genes'),
      h2('Double stranded RNA can regulate genes'),
      p,
      "A recent article in <b>Nature</b> describes the important discovery of <i>RNA interference</i>, the action of snippets of double-stranded RNA in suppressing gene expression.",
      p,
      "The discovery has provided a powerful new tool in investigating gene function, and has raised many questions about the nature of gene regulation in a wide variety of organisms.",
      p,
      "This page was created $time.",
      p,
      end_html;
This program uses the most common routines defined in the CGI.pm module, as imported into the program's namespace by the directive use CGI qw/:standard/;.
The function header returns the header information discussed earlier in this chapter; it can take the document type as an argument and uses text/html by default. The function start_html begins the HTML document and sets its title (which most web browsers display in their titlebar above the document). The functions h1, h2, and so forth produce the different levels of HTML headings in the document structure. The function p starts a new paragraph of text. Finally, the function end_html closes the HTML document. Each of these functions returns a string of HTML; the print statement is what actually sends the strings to the web browser.
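Because these CGI.pm functions return strings rather than printing them, you can store, inspect, or combine their output before sending it. This sketch assumes CGI.pm is installed (it was removed from the core Perl distribution in Perl 5.22 and is now installed separately from CPAN). It captures the header and the page in variables and shows the -type argument to header.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw/:standard/;

# Each function returns a string; nothing is sent until we print.
my $html_header  = header();                       # text/html by default
my $plain_header = header(-type => 'text/plain');  # explicit document type

my $page = start_html('Double stranded RNA can regulate genes')
         . h2('Double stranded RNA can regulate genes')
         . p('This page was created ' . localtime() . '.')
         . end_html();

# Send the HTML version to the browser; $plain_header is unused here and
# only shows how a different document type would be requested.
print $html_header, $page;
```

Building the page in a variable like this is handy when you want to check or log the output before it goes to the browser.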
Here is the body of the document the web browser receives for display from the CGI program cgiex1.cgi:
<html>
<head>
<title>Double stranded RNA can regulate genes</title>
</head>
<body>
<h2>Double stranded RNA can regulate genes</h2>
<p>A recent article in <b>Nature</b> describes the important discovery of <i>RNA interference</i>, the action of snippets of double-stranded RNA in suppressing gene expression.
</p>
<p>
The discovery has provided a powerful new tool in investigating gene function, and has raised many questions about the nature of gene regulation in a wide variety of organisms.
</p>
<p>
This page was created Tue Apr 15 09:42:49 2003.
</p>
</body>
</html>
For a simple web page like this one, the CGI.pm version is not much simpler than the previous version, cgiex1, which didn't use CGI.pm but simply printed the HTML code. However, as you write more complicated web pages, with forms to be filled in by the user, choices to click on, and a button to push to submit a request, you'll see that using CGI.pm can significantly ease your programming work.
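As a small illustration of that convenience, the sketch below reads two hypothetical form parameters, seqfile and enzyme (matching the sequence-file and restriction-enzyme example at the start of this section), using CGI.pm's param method, with no parsing code of its own. It assumes CGI.pm is installed. Passing a literal query string to CGI->new is a documented way to try a script from the command line; under a web server you would call CGI->new with no arguments so that it reads the real request.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Under a web server, CGI->new() with no arguments reads the actual request
# (GET or POST). The literal query string here is only for command-line testing.
my $q = CGI->new('seqfile=sample.fa&enzyme=EcoRI');

# Hypothetical parameter names, as a form might define them.
# param() is called in scalar context to get a single value.
my $seqfile = $q->param('seqfile') // '';
my $enzyme  = $q->param('enzyme')  // '';

# escapeHTML keeps user-supplied text from being interpreted as markup.
print $q->header,
      $q->start_html('Restriction map request'),
      $q->p('Sequence file: ' . $q->escapeHTML($seqfile)),
      $q->p('Enzyme: '        . $q->escapeHTML($enzyme)),
      $q->end_html;
```

The same param call works whether the browser sent the data with GET or POST, which is one of the details CGI.pm hides for you.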
Let's assume the CGI scripts are where they should be, ownership and permissions have been assigned, and now your web server can find and attempt to execute them. So how do you test a CGI web program?
First, check the basic syntax by running:
perl -c cgiex1.cgi
and, hopefully, getting the message:
cgiex1.cgi syntax OK
If not, fix the syntax errors before installing the program in the web space. That saves a bit of trouble, because once it's installed you have to test it with the web browser, the web server logs, and so forth, as I'll demonstrate in a moment.
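Beyond the syntax check, you can exercise a script before installing it by imitating what the web server does: set the CGI environment variables, run the program, and look at what it prints. The harness below is a sketch using only core Perl on a Unix-like system. So that it runs on its own, it writes a tiny stand-in CGI script to a temporary file; the request method and query string it sets are assumptions you would adapt when pointing it at your own script, such as cgiex1.cgi.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A tiny stand-in CGI script; replace $script with the path to your own.
my ($fh, $script) = tempfile(SUFFIX => '.cgi', UNLINK => 1);
print $fh <<'END_SCRIPT';
use strict;
use warnings;
print "Content-type: text/html\n\n";
print "<p>query was: $ENV{QUERY_STRING}</p>\n";
END_SCRIPT
close $fh;

# Imitate a GET request the way a web server would: through environment
# variables. Returns the script's standard output and its exit status.
sub run_cgi {
    my ($path, $query) = @_;
    local $ENV{GATEWAY_INTERFACE} = 'CGI/1.1';
    local $ENV{REQUEST_METHOD}    = 'GET';
    local $ENV{QUERY_STRING}      = $query;
    # Run the script with the same perl that runs this harness ($^X).
    open my $out, '-|', $^X, $path or die "Cannot run $path: $!";
    my $response = do { local $/; <$out> };
    close $out;    # waits for the child and sets $?
    return ($response, $? >> 8);
}

my ($response, $status) = run_cgi($script, 'enzyme=EcoRI');
my ($head, $body) = split /\r?\n\r?\n/, $response, 2;
if ($status == 0 && defined $body && $head =~ /^Content-type:/mi) {
    print "Headers OK:\n$head\n\nBody:\n$body";
} else {
    print "Problem: exit status $status, output:\n$response";
}
```

A script with a compilation error would show up here as a nonzero exit status and no header block, with Perl's error message on the terminal rather than buried in the server's log.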
Copy your CGI program cgiex1.cgi into your CGI directory (on my system, it's /var/www/cgi-bin). I did this as the user root. Then, still as root, I made the program executable by typing:
chmod 755 /var/www/cgi-bin/cgiex1.cgi
Let's try the program out. Start up a web browser, type in the URL http://localhost/cgi-bin/cgiex1.cgi, and hit the Enter key. Figure 7-3 shows what it looks like.
It worked! But what would you do if it hadn't? Even if the syntax check passes, a program can still fail once it's installed. For this demonstration, I made another version of the program, called cgiex1ouch.cgi, with a missing semicolon near the beginning (an error perl -c would have caught, but it serves to show how to track down a failure). When I ran it, I saw something like what's in Figure 7-4.
The best way to proceed is to check the error logs for the web server to see if they give any useful hints as to why the program failed.
Again, this will be different on different systems. On Linux, Unix, and Mac OS X, some variation of the following will work. On my system, the web server's error logs are kept in the directory /etc/httpd/logs, and the current error log there is called error_log. I opened a new command window and typed the following command:
tail -n +1 -f /etc/httpd/logs/error_log
This prints out the error file, and then waits at the end. Whenever anything new is printed to the error file, it prints it out too. So, by hitting the Return key a few times to make space after what went before, and then trying to run the program again from the web browser, I can see clearly what new error messages resulted.
Here's what I saw in this example:
syntax error at /var/www/cgi-bin/cgiex1ouch.cgi line 6, near "use warnings
use CGI "
Execution of /var/www/cgi-bin/cgiex1ouch.cgi aborted due to compilation errors.
[Tue Apr 15 21:23:10 2003] [error] [client 127.0.0.1] Premature end of script headers: /var/www/cgi-bin/cgiex1ouch.cgi
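When the error log is long, it can help to pull out only the lines that mention the script you're testing. The sketch below assumes the log format shown above: Apache's own entries begin with [date] [level] [client address], while the messages Perl writes on the script's behalf appear without that prefix. The subroutine name errors_for_script is hypothetical. It reads a log file named on the command line, or, if none is given, the sample lines shown above, so that it runs anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the log lines that mention $script, splitting off Apache's
# "[date] [level] [client addr] message" prefix when it is present.
sub errors_for_script {
    my ($log_fh, $script) = @_;
    my @found;
    while (my $line = <$log_fh>) {
        chomp $line;
        next unless index($line, $script) >= 0;
        if ($line =~ /^\[([^\]]+)\] \[(\w+)\] \[client ([^\]]+)\] (.*)$/) {
            push @found, { date => $1, level => $2, client => $3, message => $4 };
        } else {
            # Lines written directly by the script, e.g. Perl's own errors.
            push @found, { message => $line };
        }
    }
    return @found;
}

# The sample log lines from this section.
my $sample = <<'END_LOG';
syntax error at /var/www/cgi-bin/cgiex1ouch.cgi line 6, near "use warnings
use CGI "
Execution of /var/www/cgi-bin/cgiex1ouch.cgi aborted due to compilation errors.
[Tue Apr 15 21:23:10 2003] [error] [client 127.0.0.1] Premature end of script headers: /var/www/cgi-bin/cgiex1ouch.cgi
END_LOG

my $script = 'cgiex1ouch.cgi';
my $fh;
if (@ARGV) {
    open $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
} else {
    open $fh, '<', \$sample or die "Cannot open sample: $!";  # in-memory file
}

for my $entry (errors_for_script($fh, $script)) {
    my $when = $entry->{date} ? "[$entry->{date}] " : '';
    print "$when$entry->{message}\n";
}
close $fh;
```

On a live system you would pass the log path as an argument, for example /etc/httpd/logs/error_log on the system described here. Note that a continuation line such as use CGI " above is skipped, because it does not itself mention the script.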
Sure enough, I'd removed the semicolon at the end of the use warnings statement.
For most web programming jobs, this is all you'll need, because the error log will show you the error output of the program. If it's a difficult problem, you can even put extra print statements in your CGI script, in the standard way such statements are used to debug a misbehaving program. For instance, you can check whether a program ever reaches a certain line by placing this statement directly after that line:
print STDERR "Got to here!\n";
This message will appear in the error logs (if your program gets to that point before it dies).
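If you sprinkle several such checkpoints through a script, a small helper that also reports where it was called from makes the error log easier to read. This is a hypothetical sketch, not part of CGI.pm: debug_here uses Perl's built-in caller to find the calling file and line, and warn, which writes to STDERR just as print STDERR does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write a checkpoint message to STDERR, tagged with
# the file name and line number of the call.
sub debug_here {
    my ($message) = @_;
    my (undef, $file, $line) = caller;
    # The trailing "\n" stops warn from appending its own " at ... line ...".
    warn "[debug] $file line $line: $message\n";
}

debug_here('Got to here!');
my $time = localtime;
debug_here("time is $time");
```

Each call produces one tagged line in the error log, so you can see not only that the program got somewhere but exactly which checkpoint it reached last.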