This section covers getting information into programs and receiving data back from them.
Perl has several convenient ways to get information into a program. The non-CGI programs in this book usually get input by opening and reading files. I've emphasized this way of getting input because it behaves very much the same way on any computer you may be using. You've seen the open and close functions and how opening a file associates a filehandle with it, which is then used to read in the data. As an example:
open(FILEHANDLE, "informationfile");
@data_from_informationfile = <FILEHANDLE>;
close(FILEHANDLE);
This code opens the file informationfile and associates the filehandle FILEHANDLE with it. The filehandle is then used within angle brackets to actually read in the contents of the file and store the contents in the array @data_from_informationfile. Finally, the file is closed by referring once again to the opened filehandle.
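A slightly more defensive version of the same pattern is sketched below. It is an illustration, not the code used elsewhere in this book: the lexical filehandle $fh, the three-argument form of open, and the "or die" checks are stylistic assumptions, and the temporary file with its made-up contents exists only so that the example can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file with made-up contents to stand in
# for informationfile; it is deleted automatically when the program exits
my ($out, $filename) = tempfile(UNLINK => 1);
print $out "ACGT\nTTGA\n";
close($out) or die "Cannot close $filename: $!\n";

# Open the file for reading; $! holds the system's reason if open fails
open(my $fh, '<', $filename) or die "Cannot open $filename: $!\n";

# In list context, the angle brackets return all remaining lines,
# each with its trailing newline still attached
my @data_from_informationfile = <$fh>;

close($fh);

# Strip the trailing newline from every line
chomp(@data_from_informationfile);

print "Read ", scalar(@data_from_informationfile), " lines\n";
```

Checking the return value of open is usually worthwhile, because a failed open otherwise just leaves you with an empty array.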
Perl allows you to read in any input that is automatically sent to your program via standard input (STDIN). STDIN is a filehandle that by default is always open. Your program may be expecting some input that way. For instance, on a Mac, you can drag and drop a file icon onto the Perl applet for your program to make the file's contents appear in STDIN. On Unix systems, you can pipe the output of some other program into the STDIN of your program with shell commands such as:
someprog | my_perl_program
You can also pipe the contents of a file into your program with:
cat file | my_perl_program
or with:
my_perl_program < file
Your program can then read in the data (from program or file) that comes as STDIN just as if it came from a file that you've opened:
@data_from_stdin = <STDIN>;
You can name your input files on the command line. Perl places all command-line arguments into the array @ARGV. <> is shorthand for <ARGV>: the ARGV filehandle treats the array @ARGV as a list of filenames and returns the contents of all those files, one line at a time. Some of the arguments may be special flags, which should be read and removed from @ARGV if data files are also named, because Perl assumes that anything still in @ARGV refers to an input filename when it reaches the <> operator. If @ARGV is empty, <> reads from STDIN instead. The contents of the file or files are then available to the program using the angle brackets without a filehandle, like so:
@data_from_files = <>;
For example, on Microsoft Windows, Unix, or Mac OS X, you specify input files at the command line, like so:
% my_program file1 file2 file3
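The handling of flags described above might look something like the following sketch. The -v flag, the $verbose variable, and the temporary files are all hypothetical; the script fills in @ARGV itself only so that it can run without real command-line arguments.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small temporary files to stand in for file1 and file2
my @files;
for my $text ("ACGT\n", "GGCC\nTTAA\n") {
    my ($tmp, $name) = tempfile(UNLINK => 1);
    print $tmp $text;
    close($tmp) or die "Cannot close $name: $!\n";
    push @files, $name;
}

# Simulate the command line:  my_program -v file1 file2
# (-v is a hypothetical flag, not one that Perl itself defines)
@ARGV = ('-v', @files);

# Read and remove the flag so that only filenames remain in @ARGV
my $verbose = 0;
if (@ARGV and $ARGV[0] eq '-v') {
    $verbose = 1;
    shift @ARGV;
}

# <> now reads every line of every file still named in @ARGV
my @data_from_files = <>;

print "Read ", scalar(@data_from_files), " lines\n" if $verbose;
```

For programs with more than one or two flags, the standard Getopt::Long module does this parsing for you.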
The print statement is the most common way to output data from a Perl program. The print statement takes as arguments a list of scalars separated by commas. An array can be an argument, in which case, the elements of the array are all printed one after the other:
@array = ('DNA', 'RNA', 'Protein');
print @array;
This prints:
DNARNAProtein
If you want to put spaces between the elements of an array, place it between double quotes in the print statement, like this:
@array = ('DNA', 'RNA', 'Protein');
print "@array";
This prints:
DNA RNA Protein
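If you want a separator other than a single space, two options are sketched here as an illustration: the join function, and the special variable $", which holds the string Perl places between array elements inside double quotes. The variable names are made up for the example.

```perl
use strict;
use warnings;

my @array = ('DNA', 'RNA', 'Protein');

# Interpolating in double quotes separates the elements with the
# value of $", which is a single space by default
my $with_spaces = "@array";

# join builds a string with any separator you choose
my $with_commas = join(', ', @array);

# Changing $" inside a block with local affects only that block
my $with_dashes;
{
    local $" = '-';
    $with_dashes = "@array";
}

print "$with_spaces\n";   # DNA RNA Protein
print "$with_commas\n";   # DNA, RNA, Protein
print "$with_dashes\n";   # DNA-RNA-Protein
```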
The print statement can specify a filehandle as an optional indirect object between the print statement and the arguments, like so:
print FH "@array";
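Here is a small, self-contained sketch of printing to a filehandle. To avoid creating a file on disk, it assumes a filehandle opened on a scalar variable (an in-memory file), which is a choice made for this illustration; $fh and $buffer are made-up names.

```perl
use strict;
use warnings;

my @array = ('DNA', 'RNA', 'Protein');

# Open a filehandle that writes into the scalar $buffer
# instead of into a file on disk
my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory filehandle: $!\n";

# No comma between the filehandle and the list of things to print
print $fh "@array", "\n";

# Braces around the filehandle make the indirect object explicit
print {$fh} "Done\n";

close($fh);

# Show what was written through the filehandle
print $buffer;
```

The same print statements work unchanged with a filehandle opened on an ordinary file.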
The printf function gives more control over the formatting of the output of numbers. For instance, you can specify field widths; the precision, or number of places after the decimal point; and whether the value is right- or left-justified in the field. Check the Perl documentation that comes with your copy of Perl for all the details.
The sprintf function is related to the printf function; it formats a string instead of printing it out.
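As a brief illustration of field widths, precision, and justification, the sketch below formats one line of a table with sprintf and prints another with printf. The names and values are made up for the example.

```perl
use strict;
use warnings;

# Made-up values for illustration
my $name        = 'geneA';
my $gc_fraction = 0.41837;
my $length      = 1863;

# %-8s   string, left-justified in a field 8 characters wide
# %6.2f  number, right-justified in 6 characters, 2 decimal places
# %5d    integer, right-justified in a field 5 characters wide
my $line = sprintf("%-8s|%6.2f|%5d", $name, $gc_fraction * 100, $length);
print "$line\n";                      # geneA   | 41.84| 1863

# %05d pads an integer with leading zeros to 5 digits
my $padded = sprintf("%05d", 42);
print "$padded\n";                    # 00042

# printf takes the same format string but prints the result directly
printf("%-8s|%6.2f|%5d\n", 'geneB', 55.5, 920);
```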
The format and write commands are a way to format multiline output, as when generating reports. format can be a useful command, but in practice it isn't used much. The full details are available in your Perl documentation, and O'Reilly's Programming Perl contains an entire chapter on format.
Standard output, with the filehandle STDOUT, is the default destination for output from a Perl program, so it doesn't have to be named. The following two statements are equivalent unless you used select to change the default output filehandle:
print "Hello biology world!\n";
print STDOUT "Hello biology world!\n";
Note that STDOUT isn't followed by a comma. STDOUT is usually directed to the computer screen, but it may be redirected at the command line to other programs or files. This Unix command pipes the STDOUT of my_program to the STDIN of your_program:
my_program | your_program
This Unix command directs the output of my_program to the file outputfile:
my_program > outputfile
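The select function mentioned earlier changes which filehandle print uses when none is named. The following sketch shows the usual pattern of saving and restoring the previous default; it assumes an in-memory filehandle so that no file is created, and $report, $fh, and $previous are made-up names.

```perl
use strict;
use warnings;

# Filehandle that collects output in the scalar $report
my $report = '';
open(my $fh, '>', \$report) or die "Cannot open in-memory filehandle: $!\n";

# select returns the previously selected filehandle (normally STDOUT)
my $previous = select($fh);

# With no filehandle named, print now writes to $fh
print "Hello biology world!\n";

# Restore the original default before doing anything else
select($previous);
close($fh);

# This goes to STDOUT again
print "Captured: $report";
```

Restoring the previous filehandle promptly helps avoid surprising other parts of the program that expect print to write to STDOUT.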
It's also common to direct certain error messages to the predefined standard error filehandle STDERR, or to send output to a file you've opened for output and named with a particular filehandle. Here are examples of these two tasks:
print STDERR "If you reached this part of the program, something is terribly wrong!\n";
open(OUTPUTFD, ">output_file") or die "Cannot open output_file: $!\n";
print OUTPUTFD "Here is the first line in the output file output_file\n";
STDERR is also usually directed to the computer screen by default, but it can be directed into a file from the command line. The syntax differs between systems; on Unix with the sh or bash shells, for example:
myprogram 2>myprogram.error
You can also direct STDERR to a file from within your Perl program by including code such as the following before the first output to STDERR. This is the most portable way to redirect STDERR:
open (STDERR, ">myprogram.error") or die "Cannot open error file myprogram.error:$!\n";
The problem with this is that the original STDERR is lost. This method, taken from Programming Perl, saves and restores the original STDERR:
open ERRORFILE, ">myprogram.error" or die "Can't open myprogram.error";
open SAVEERR, ">&STDERR";
open STDERR, ">&ERRORFILE";
print STDERR "This will appear in error file myprogram.error\n";
# now, restore STDERR
close STDERR;
open STDERR, ">&SAVEERR";
print STDERR "This will appear on the computer screen\n";
There are a lot of details concerning filehandles not covered in this book, and redirecting one of the predefined filehandles such as STDERR can cause problems, especially as your programs get bigger and rely more on modules and libraries of subroutines. One safe way is to define a new filehandle associated with an error file and to send all your error messages to it:
open (ERRORMESSAGES, ">myprogram.error") or die "Cannot open myprogram.error:$!\n";
print ERRORMESSAGES "This is an error message\n";
Note that the die function, and the closely related warn function, print their error messages to STDERR.
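The practical difference between the two is that warn lets the program continue, while die ends it. The sketch below illustrates this; wrapping die in an eval block to trap the error is an addition made here so that the example can keep running, and the messages and variable names are made up.

```perl
use strict;
use warnings;

# warn prints its message to STDERR and the program keeps running
warn "Warning: sequence is empty\n";
my $still_running = 1;

# die also prints to STDERR and then exits, unless it is trapped
# by eval, in which case the message is placed in $@ instead
my $survived = eval {
    die "Fatal: cannot continue\n";
    1;    # never reached
};
my $message = $@;

# Report the trapped error on STDERR ourselves
print STDERR "Trapped: $message" unless $survived;
```

Ending a message with a newline, as here, stops Perl from appending the script name and line number to it.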