Perl as a Scripting Language


Perl stands for Practical Extraction and Report Language. Larry Wall created Perl to extract information from text files and to use that information to prepare reports. Programs written in Perl, the language, are interpreted and executed by perl, the program. This book's companion CD-ROMs include Perl, and you can install it at the same time as you install Red Hat Linux (simply select the Development Tools package group).

Perl is available on a wide variety of computer systems because, like Linux, Perl can be distributed freely. In addition, Perl is popular as a scripting language among many users and system administrators, which is why I introduce Perl and describe its strengths. In Chapter 25, you learn about another scripting language (Tcl/Tk) that provides the capability to create GUIs for the scripts.

Determining Whether You Have Perl

Before you proceed with the Perl tutorial, check whether you have Perl installed on your system. Type the following command:

which perl

The which command tells you whether it finds a specified program in the directories listed in the PATH environment variable. If perl is installed, you should see the following output:

/usr/bin/perl

If the which command complains that no such program exists in the current PATH, this does not necessarily mean you do not have perl installed; it may mean that you do not have the /usr/bin directory in PATH. Ensure that /usr/bin is in PATH; either type echo $PATH or look at the message displayed by the which command (that message includes the directories in PATH). If /usr/bin is not in PATH, use the following command to redefine PATH:

export PATH=$PATH:/usr/bin

Now, try the which perl command again. If you still get an error, you may not have installed Perl. You can install Perl from the companion CD-ROMs by performing the following steps:

  1. Log in as root.

  2. Mount each CD and look for the perl RPM package. Mount the CD with the mount /dev/cdrom command or wait until GNOME's magicdev device mounts the CD. Then search for the perl RPM with the following commands:

    cd /mnt/cdrom/RedHat/RPMS
    ls perl*.rpm
  3. After you find the perl RPM file, type the following rpm (Red Hat Package Manager) command to install Perl:

    rpm -ivh perl*

After you have perl installed on your system, type the following command to see its version number:

perl -v

Following is typical output from that command:

This is perl, v5.8.0 built for i386-linux-thread-multi
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2002, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'.  If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.

This output tells you that you have Perl Version 5.8, patch Level 0, and that Larry Wall, the originator of Perl, holds the copyright. Perl is distributed freely, however, under either the Artistic License or the GNU General Public License.

You can get the latest version of Perl by pointing your World Wide Web browser to the Comprehensive Perl Archive Network (CPAN). The following address connects you to the CPAN site nearest to you:

http://www.perl.com/CPAN/

Writing Your First Perl Script

Perl has many features of C, and, as you may know, most books on C start with an example program that displays Hello, World! on your terminal. Because Perl is an interpreted language, you can accomplish this task directly from the command line. If you enter:

perl -e 'print "Hello, World!\n";'

the system responds

Hello, World!

This command uses the -e option of the perl program to pass the Perl program as a command-line argument to the Perl interpreter. In this case, the following line constitutes the Perl program:

print "Hello, World!\n";

To convert this line to a script, simply place the line in a file, and start the file with a directive to run the perl program (as you do in shell scripts, when you place a line such as #!/bin/sh to run the Bourne shell to process the script).

To try a Perl script, follow these steps:

  1. Use a text editor, such as vi or emacs, to save the following lines in the file named hello:

    #!/usr/bin/perl
    # This is a comment.
    print "Hello, World!\n";
  2. Make the hello file executable by using the following command:

    chmod +x hello
  3. Run the Perl script by typing the following at the shell prompt:

    ./hello
    Hello, World!

That's it! You have written and tried your first Perl script.

Learning More about Perl

I devote a few sections of this chapter to giving you an overview of Perl and to showing a few simple examples. However, this discussion does not do justice to Perl. If you want to use Perl as a tool, consult one of the following books:

  • Larry Wall, Tom Christiansen, and Jon Orwant, Programming Perl, 3rd Edition (O'Reilly & Associates, 2000)

  • Randal L. Schwartz and Tom Phoenix, Learning Perl, 3rd Edition (O'Reilly & Associates, 2001)

  • Paul E. Hoffman, Perl For Dummies (John Wiley & Sons, 2000)

Programming Perl, 3rd Edition, is the authoritative guide to Perl (although it may not be the best resource for learning Perl). The book by Randal Schwartz focuses more on teaching Perl programming. Paul Hoffman's book is a good introduction for nonprogrammers wanting to learn Perl.

Getting an Overview of Perl

Most programming languages, including Perl, have some common features:

  • Variables to store different types of data. You can think of each variable as a placeholder for data, somewhat like a mailbox with a name and room to store data. The content of the variable is its value.

  • Expressions that combine variables by using operators. One expression might add several variables; another might extract a part of a string.

  • Statements that perform some action, such as assigning a value to a variable or printing a string.

  • Flow-control statements that enable statements to be executed in various orders, depending on the value of some expression. Typically, flow-control statements include for, do-while, while, and if-then-else statements.

  • Functions (also called subroutines or routines) that enable you to group several statements and give them a name. This feature enables you to execute the same set of statements by invoking the function that represents those statements. Typically, a programming language provides some predefined functions.

  • Packages and modules that enable you to organize a set of related Perl subroutines that are designed to be reusable. (Modules were introduced in Perl 5.)

The next few sections provide an overview of these major features of Perl and illustrate the features through simple examples.

Learning Basic Perl Syntax

Perl is free-form, like C; no constraints exist on the exact placement of any keyword. Often, Perl programs are stored in files with names that end in .pl, but there is no restriction on the filenames you use.

As in C, each Perl statement ends with a semicolon (;). A number sign or hash mark (#) marks the start of a comment; the perl program disregards the rest of the line beginning with the number sign.

Groups of Perl statements are enclosed in braces ({...}). This feature is also similar to C.

Using Variables in Perl

You don't have to declare Perl variables before using them, as you do in C. You can recognize a variable in a Perl script easily, because each variable name begins with a special character: an at symbol (@), a dollar sign ($), or a percent sign (%). These special characters denote the variable's type.

Using Scalars

A scalar variable can store a single value, such as a number or a text string. Scalar variables are the basic data type in Perl. Each scalar's name begins with a dollar sign ($). Typically, you start using a scalar with an assignment statement that initializes it. You can even use a variable without initializing it; an uninitialized scalar holds the undefined value, which behaves as zero in a numeric context and as an empty string in a string context. If you want to see whether a scalar is defined, use the defined function as follows:

print "Name undefined!\n" if !(defined $name);

The expression (defined $name) is 1 if $name is defined. You can 'undefine' a variable by using the undef function. You can undefine $name, for example, as follows:

undef $name;
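
Following is a minimal sketch of the behavior just described. The names $count and $label are made up for illustration and are deliberately never assigned, so the script shows how an undefined scalar acts in arithmetic and in a string, and how defined reports the effect of undef:

```perl
#!/usr/bin/perl
# $count and $label are hypothetical names; neither has been assigned.
$sum  = $count + 5;            # undefined value acts as 0 in arithmetic
$text = "[" . $label . "]";    # undefined value acts as "" in a string

$name   = "Larry";
$before = defined($name) ? "defined" : "undefined";
undef $name;                   # make $name undefined again
$after  = defined($name) ? "defined" : "undefined";

print "sum=$sum text=$text before=$before after=$after\n";
```

If you run perl with the -w option, it warns about each use of an uninitialized value, which is a handy way to catch misspelled variable names.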

Variables are evaluated according to context. Following is a script that initializes and prints a few variables:

#!/usr/bin/perl
$title = "Red Hat Linux Professional Secrets";
$count1 = 650;
$count2 = 425;

$total = $count1 + $count2;

print "Title: $title -- $total pages\n";

When you run the preceding Perl program, it produces the following output:

Title: Red Hat Linux Professional Secrets -- 1075 pages

As the Perl statements show, when the two numeric variables are added, their numeric values are used; but when the $total variable is printed, its string representation is displayed.

Another interesting aspect of Perl is that it evaluates all variables in a string within double quotation marks ("..."). However, if you write a string inside single quotation marks ('...'), Perl leaves that string untouched. If you write

print 'Title: $title -- $total pages\n';

with single quotes instead of double quotes, Perl displays

Title: $title -- $total pages\n

and does not generate a new line.

Insider Insight 

A useful Perl variable is $_ (the dollar sign followed by the underscore character). This special variable is known as the default argument. The Perl interpreter determines the value of $_ depending on the context. When a while loop reads a line from the standard input with <STDIN>, $_ holds the current input line; when the interpreter searches for a pattern of text and you name no string to search, it searches the contents of $_.
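
As a small sketch of $_ at work, the following script loops over a few made-up sample lines (standing in for input) and relies on $_ three times: as the loop variable, as the string that the pattern is matched against, and as the argument to print. The @lines data and the $hits counter are assumptions added for illustration:

```perl
#!/usr/bin/perl
# Sample lines stand in for standard input (made-up data).
@lines = ("SoundBlaster 16\n", "OmniCD\n", "soundblaster pro\n");
$hits = 0;

foreach (@lines)            # each element is placed in $_ in turn
{
    if (/[bB]laster/)       # no =~ given, so the match tests $_
    {
        print;              # no argument given, so print writes $_
        $hits++;
    }
}
print "$hits matching lines\n";
```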

Using Arrays

An array is a collection of scalars. The array name begins with an at symbol (@). As in C, array subscripts start at zero. You can access the elements of an array with an index. Perl allocates space for arrays dynamically.

Consider the following simple script:

#!/usr/bin/perl
@commands = ("start", "stop", "draw" , "exit");

$numcmd = @commands;
print "There are $numcmd commands.\n";
print "The first command is: $commands[0]\n";

When you run the script, it produces the following output:

There are 4 commands.
The first command is: start

You can print an entire array with a simple print statement like this:

print "@commands\n";

When Perl executes this statement for the @commands array used in this section's examples, it displays the following output:

start stop draw exit
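
To illustrate how Perl sizes arrays dynamically, here is a short sketch that extends the @commands array from this section. The $#commands notation (the index of the last element) and the push function are standard Perl; the added values "help" and "undo" are made up:

```perl
#!/usr/bin/perl
@commands = ("start", "stop", "draw", "exit");

$count = @commands;          # array in scalar context gives its length
$last  = $#commands;         # index of the last element (length - 1)
push(@commands, "help");     # the array grows on demand
$commands[6] = "undo";       # assigning past the end extends it too

$newcount = @commands;
print "$count elements at first, last index $last\n";
print "$newcount elements after growing\n";
```

Note that assigning to index 6 leaves index 5 in place but undefined, so the array reports seven elements.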

Using Associative Arrays

Associative array variables, whose names begin with a percent sign (%), are a distinctive feature of Perl (Perl programmers also call them hashes). Using associative arrays, you can index an array with a string, such as a name. A good example of an associative array is the %ENV array, which Perl automatically defines for you. In Perl, %ENV is the array of environment variables you can access by using the environment-variable name as an index. The following Perl statement prints the current PATH environment variable:

print "PATH = $ENV{PATH}\n";

When Perl executes this statement, it prints the current setting of PATH. In contrast to indexing regular arrays, you have to use braces to index an associative array.

Perl has many built-in functions, such as delete, each, keys, and values, that enable you to access and manipulate associative arrays.
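
Following is a brief sketch of two of these functions in use. The %port array and its contents are hypothetical, chosen only to have something to store; the exists function, which tests for a key, is also standard Perl:

```perl
#!/usr/bin/perl
# %port is a hypothetical associative array mapping service names to ports.
%port = ("ftp", 21, "ssh", 22, "http", 80);

$port{"smtp"} = 25;                 # add a new key-value pair
delete $port{"ftp"};                # remove a pair by key

@names = sort keys %port;           # keys come back in no fixed order, so sort

foreach $name (@names)
{
    print "$name uses port $port{$name}\n";
}
print "ftp is gone\n" unless exists $port{"ftp"};
```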

Listing the Predefined Variables in Perl

Perl has several predefined variables that contain useful information you may need in a Perl script. Following are a few important predefined variables:

  • @ARGV is an array of strings that contains the command-line options to the script. The first option is $ARGV[0], the second one is $ARGV[1], and so on.

  • %ENV is an associative array that contains the environment variables. You can access this array by using the environment-variable name as a key. Thus, $ENV{HOME} is the home directory, and $ENV{PATH} is the current search path that the shell uses to locate commands.

  • $_ is the default argument for many functions. If you see a Perl function used without any argument, the function probably is expecting its argument in the $_ variable.

  • @_ is the list of arguments passed to a subroutine.

  • $0 is the name of the file containing the Perl program.

  • $^V is the version number of Perl you are using (for example, if you use Perl Version 5.8.0, $^V will be v5.8.0).

  • $< is the user ID (an identifying number) of the user running the script. This is useful on UNIX and Linux, where each user has an ID.

  • $$ is the script's process ID.

  • $? is the status returned by the last external command the script ran (for example, through the system function, backquotes, or a closed pipe).
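
The following sketch prints several of the predefined variables from the preceding list. The output depends entirely on how and where you run the script, so treat it as a way to explore the variables rather than as a fixed result; the $argc and $i names are made up for this example:

```perl
#!/usr/bin/perl
# Report a few predefined variables; values depend on how the script is run.
$argc = @ARGV;                       # number of command-line arguments

print "Script name : ", $0, "\n";
print "Process ID  : ", $$, "\n";
print "User ID     : ", $<, "\n";
print "Arguments   : ", $argc, "\n";
print "Home        : $ENV{HOME}\n" if defined $ENV{HOME};

foreach $i (0 .. $#ARGV)
{
    print "  ARGV[$i] = $ARGV[$i]\n";
}
```

If you save this script in a file (for example, one named showvars) and run it with a few arguments, each argument appears on its own ARGV line.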

Using Operators and Expressions

Operators are used to combine and compare Perl variables. Typical mathematical operators are addition (+), subtraction (-), multiplication (*), and division (/). Perl and C provide nearly the same set of operators. When you use operators to combine variables, you end up with expressions. Each expression has a value.

Following are some typical Perl expressions:

$error < 0
$count == 10
$count + $i
$users[$i]

These expressions are examples of the comparison operator (the first two lines), the arithmetic operator, and the array-index operator.

You can initialize an array to an empty list by using (), the null-list operator, as follows:

@commands = ();

The dot operator (.) enables you to concatenate two strings, as follows:

$part1 = "Hello, ";
$part2 = "World!";
$message = $part1.$part2;  # Now $message = "Hello, World!"

The repetition operator, denoted by x, is curious but useful; it repeats a string a specified number of times. Its assignment form, x=, repeats the string stored in a variable and assigns the result back to that variable. Suppose that you want to initialize a string to 65 asterisks (*). The following example shows how you can initialize the string with the x= operator:

$marker = "*";
$marker x= 65;  # Now $marker is a string of 65 asterisks.

Another powerful operator in Perl is range, which is represented by two periods (..). You can initialize an array easily by using the range operator. Following are some examples:

@numerals = (0..9); # @numerals = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
@alphabet = ('A'..'Z'); # @alphabet = capital letters A through Z
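
As a quick sketch, the following script combines the repetition, concatenation, and range operators. The variable names are invented for this example, and join is a standard Perl function that glues list elements together with a separator:

```perl
#!/usr/bin/perl
$rule   = "-" x 20;                 # x returns a new string of 20 hyphens
$marker = "*";
$marker x= 3;                       # x= repeats in place: $marker is "***"

@digits = (0..9);                   # range operator builds the list
$joined = join("", @digits);        # glue the elements into one string
$banner = $marker . " Perl " . $marker;

print "$rule\n$banner\n$joined\n";
```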

Learning Regular Expressions

If you have used Linux (or any variant of UNIX) for a while, you probably know about the grep command, which enables you to search files for a pattern of strings. Following is a typical use of grep to locate every line that contains the string blaster or Blaster in all files with names that end in .c:

cd /usr/src/linux*/drivers/cdrom
grep "[bB]laster"  *.c

The preceding commands produce the following output on my system:

sbpcd.c: *          Works with SoundBlaster compatible cards and with "no-sound"
sbpcd.c:        0x230, 1, /* Soundblaster Pro and 16 (default) */
sbpcd.c:        0x250, 1, /* OmniCD default, Soundblaster Pro and 16 */
sbpcd.c:        0x270, 1, /* Soundblaster 16 */
sbpcd.c:        0x290, 1, /* Soundblaster 16 */
sbpcd.c:static const char *str_sb_l = "soundblaster";
sbpcd.c:static const char *str_sb = "SoundBlaster";
sbpcd.c: *                 sbpcd=0x230,SoundBlaster
sbpcd.c:        msg(DBG_INF,"   LILO boot: ... sbpcd=0x230,SoundBlaster\n");
sjcd.c: *  the SoundBlaster/Panasonic style CDROM interface. But today, the

As you can see, grep has found all occurrences of blaster and Blaster in the files with names ending in .c.

The grep command's "[bB]laster" argument is known as a regular expression, a pattern that matches a set of strings. You construct a regular expression with a small set of operators and rules that resemble the ones for writing arithmetic expressions. A list of characters inside brackets ([...]), for example, matches any single character in the list. Thus, the regular expression "[bB]laster" matches a set of two strings, as follows:

blaster   Blaster

So far, this section has summarized the syntax of regular expressions, but you have not seen how to use regular expressions in Perl. Typically, you place a regular expression within a pair of slashes and use the match (=~) or not-match (!~) operators to test a string. You can write a Perl script that performs the same search as the one done with grep earlier in this section. The following steps help you complete this exercise:

  1. Use a text editor to type and save the following script in a file named lookup:

    #!/usr/bin/perl
    
    while (<STDIN>)
    {
        if ( $_ =~ /[bB]laster/ ) { print $_; }
    }
  2. Make the lookup file executable by using the following command:

    chmod +x lookup
  3. Try the script by using the following command:

    cat /usr/src/linux*/drivers/cdrom/sbpcd.c | ./lookup

    My system responds with this:

     *    Works with SoundBlaster compatible cards and with "no-sound"
            0x230, 1, /* Soundblaster Pro and 16 (default) */
            0x250, 1, /* OmniCD default, Soundblaster Pro and 16 */
            0x270, 1, /* Soundblaster 16 */
            0x290, 1, /* Soundblaster 16 */
    static const char *str_sb_l = "soundblaster";
    static const char *str_sb = "SoundBlaster";
     *                 sbpcd=0x230,SoundBlaster
                    msg(DBG_INF,"   LILO boot: ... sbpcd=0x230,SoundBlaster\n");
     *    Works with SoundBlaster compatible cards and with "no-sound"
            0x230, 1, /* Soundblaster Pro and 16 (default) */
            0x250, 1, /* OmniCD default, Soundblaster Pro and 16 */
            0x270, 1, /* Soundblaster 16 */
            0x290, 1, /* Soundblaster 16 */
    static const char *str_sb_l = "soundblaster";
    static const char *str_sb = "SoundBlaster";
     *                 sbpcd=0x230,SoundBlaster
                    msg(DBG_INF,"   LILO boot: ... sbpcd=0x230,SoundBlaster\n");

    The cat command feeds the contents of a specific file (which, as you know from the grep example, contains some lines that match the regular expression) to the lookup script. The script simply applies Perl's regular expression-match operator (=~) and prints any matching line.

The $_ variable in the script needs some explanation. The <STDIN> expression gets a line from the standard input and, by default, stores that line in the $_ variable. Inside the while loop, the regular expression is matched against the $_ string. The following single Perl statement completes the lookup script's work:

if ( $_ =~ /[bB]laster/ ) { print $_; }

This example illustrates how you might use a regular expression to search for occurrences of strings in a file.

After you use regular expressions for a while, you can better appreciate their power. The trick is to find the regular expression that performs the task you want. Following is a search that looks for all lines that begin with exactly seven spaces and end with a right parenthesis:

while (<STDIN>)
{
    if ( $_ =~ /\)\n/ && $_ =~ /^ {7}\S/ )  { print $_; }
}
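
To see why both tests are needed, here is a sketch that applies the same two patterns to three made-up lines instead of standard input. The sample strings are assumptions built with the x operator so the number of leading spaces is unambiguous:

```perl
#!/usr/bin/perl
# Made-up sample lines; only the first has exactly seven leading spaces
# and a right parenthesis just before the newline.
@samples = (
    (" " x 7) . "foo(bar)\n",     # 7 spaces, ends with ")"  -> match
    (" " x 8) . "foo(bar)\n",     # 8 spaces                 -> no match
    (" " x 7) . "foo(bar);\n",    # ends with ";"            -> no match
);

@matched = ();
foreach (@samples)
{
    if ( $_ =~ /\)\n/ && $_ =~ /^ {7}\S/ ) { push(@matched, $_); }
}
$count = @matched;
print "$count line(s) matched\n";
print @matched;
```

The \S after the seven spaces is what enforces "exactly seven": an eighth space is not a nonspace character, so the second sample fails.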

Using Flow-Control Statements

So far, you have seen Perl statements intended to execute in a serial fashion, one after another. Perl also includes statements that enable you to control the flow of execution of the statements. You already have seen the if statement and a while loop. Perl includes a complete set of flow-control statements just like those in C, but with a few extra features.

In Perl, all conditional statements take the following form:

conditional-statement
{ Perl code to execute if conditional is true }

Notice that you must enclose within braces ({...}) the code that follows the conditional statement. The conditional statement checks the value of an expression to determine whether to execute the code within the braces. In Perl, as in C, a zero value is false and any other number is true; Perl also treats the empty string, the string "0", and the undefined value as false.
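
The following sketch tests a handful of values in an if statement to show which ones Perl treats as true. The list of values is made up for illustration; notice that strings such as "0.0" and "00" are true, because only the exact string "0" and the empty string count as false:

```perl
#!/usr/bin/perl
# Classify a handful of values as true or false in a condition.
@values  = (0, 1, -1, "", "0", "0.0", "00", " ", "a");
@results = ();

foreach $v (@values)
{
    if ($v) { push(@results, "true");  }
    else    { push(@results, "false"); }
}

foreach $i (0 .. $#values)
{
    print "'$values[$i]' is $results[$i]\n";
}
```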

The following sections briefly describe the syntax of the major conditional statements in Perl.

Using if and unless Statements

The Perl if statement resembles the C if statement. For example, an if statement might check a count to see whether the count exceeds a threshold, as follows:

if ( $count > 25 ) { print "Too many errors!\n"; }

You can add an else clause to the if statement, as follows:

if ($user eq "root")
{
    print "Starting simulation...\n";
}
else
{
    print "Sorry $user, you must be \"root\" to run this program.\n";
    exit;
}

If you know C, you can see that Perl's syntax looks quite a bit like that in C. Conditionals with the if statement can have zero or more elsif clauses to account for more alternatives, such as the following:

print "Enter version number:"; # prompt user for version number
$os_version = <STDIN>;         # read from standard input
chomp $os_version; # get rid of the newline at the end of the line
# Check version number
if ($os_version >= 10 ) { print "No upgrade necessary\n";}
elsif ($os_version >= 6 && $os_version < 10)
                                    { print "Standard upgrade\n";}
elsif ($os_version > 3 && $os_version < 6) { print "Reinstall\n";}
else { print "Sorry, cannot upgrade\n";}
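
The unless statement is the mirror image of if: it executes its block only when the condition is false. Following is a minimal sketch that rewrites the earlier root check with unless; the value assigned to $user is hypothetical, and the $message variable is added only to hold the result:

```perl
#!/usr/bin/perl
# $user is a hypothetical value; a real script might take it from %ENV.
$user = "guest";

# unless runs its block only when the condition is false
unless ($user eq "root")
{
    $message = "Sorry $user, you must be \"root\" to run this program.";
}
else
{
    $message = "Starting simulation...";
}
print "$message\n";

# The same test written as a statement modifier
print "Not running as root.\n" unless $user eq "root";
```

The modifier form on the last line is the same style used earlier with if and the defined function.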

Using the while Statement

Use Perl's while statement for looping, that is, repeating some processing until a condition becomes false. To read a line at a time from standard input and to process that line, you might use the following:

while ($in = <STDIN>)
{
# Code to process the line
    print $in;
}

You can skip the rest of the current iteration with the next keyword, which jumps to the next test of the loop condition; the last keyword exits the loop. The following while loop adds the numbers from 1 to 10, skipping 5:

while (1)
{
    $i++;
    if($i == 5) { next;}  # Jump to the next iteration if $i is 5
    if($i > 10) { last;}  # When $i exceeds 10, end the loop
    $sum += $i;           # Add the numbers
}
# At this point $sum should be 50.

Using for and foreach Statements

Perl and C's for statements have similar syntax. Use the for statement to execute a statement any number of times, based on the value of an expression. The syntax of the for statement is as follows:

for (expr_1; expr_2; expr_3) { statement block }

expr_1 is evaluated one time, at the beginning of the loop; the statement block is executed repeatedly as long as expr_2 evaluates to true (nonzero). The third expression, expr_3, is evaluated after each execution of the statement block. You can omit any of the expressions, but you must include the semicolons. In addition, the braces around the statement block are required. Following is an example that uses a for loop to add the numbers from 1 to 10:

for($i=0, $sum=0; $i <= 10; $sum += $i, $i++) {}

In this example, the actual work of adding the numbers is done in the third expression, and the statement the for loop controls is an empty block ({}).
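
The foreach statement steps through the elements of a list, placing each one in a loop variable in turn. Following is a small sketch; the @pages array and its values are made up for illustration:

```perl
#!/usr/bin/perl
@pages = (650, 425, 120);     # made-up page counts
$sum = 0;

foreach $count (@pages)       # $count takes each element's value in turn
{
    $sum += $count;
}
print "Total: $sum pages\n";

# The loop variable is optional; without it, each element lands in $_
$list = "";
foreach (1..5) { $list .= $_; }
print "$list\n";
```

Compared with the for loop, foreach needs no index variable or termination test, which makes it the natural choice when you simply want to visit every element of an array.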

Using the goto Statement

The goto statement transfers control to a statement label. Following is an example that prompts the user for a value and repeats the request, if the value is not acceptable:

ReEnter:
print "Enter offset: ";
$offset = <STDIN>;
chomp $offset;
unless ($offset > 0 && $offset < 512)
{
    print "Bad offset: $offset\n";
    goto ReEnter;
}

Accessing Linux Commands

You can execute any Linux command from Perl in several ways:

  • Call the system function with a string that contains the Linux command you want to execute.

  • Enclose a Linux command within backquotes (`command`), which also are known as grave accents. You can run a Linux command this way and capture its output.

  • Call the fork function to copy the current script and process new commands in the child process. (If a process starts another process, the new one is known as a child process.)

  • Call the exec function to overlay the current script with a new script or Linux command.

  • Use fork and exec to provide shell-like behavior. (Monitor user input, and process each user-entered command through a child process.) This section presents a simple example of how to accomplish this task.
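
As a quick sketch of the backquote approach from the preceding list, the following script captures the output of a command in a variable. The echo commands are only harmless stand-ins for whatever Linux command you want to run, and the variable names are made up:

```perl
#!/usr/bin/perl
# Capture a command's output with backquotes. "echo" is used only as a
# harmless stand-in for a real Linux command.
$output = `echo Hello from the shell`;
$status = $?;                  # status of the command that just ran
chomp $output;                 # drop the trailing newline

print "Captured: $output\n";
print "Status  : $status\n";

# In list context, backquotes return one element per output line.
@lines = `echo one; echo two`;
$count = @lines;
print "Got $count lines\n";
```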

The simplest way to execute a Linux command in your script is to use the system function with the command in a string. After the system function returns, the command's status is in the $? variable; a value of zero means success, and the command's exit code is $? >> 8. You can easily write a simple Perl script that reads a string from the standard input and processes that string with the system function. Follow these steps:

  1. Use a text editor to enter and save the following script in a file named rcmd.pl:

    #!/usr/bin/perl
    # Read user input and process command
    
    $prompt = "Command (\"exit\" to quit): ";
    print $prompt;
    
    while (<STDIN>)
    {
        chomp;
        if ($_ eq "exit") { exit 0;}
    
    # Execute command by calling system
        system $_;
        unless ($? == 0) {print "Error executing: $_\n";}
        print $prompt;
    }
  2. Make the rcmd.pl file executable by using the following command:

    chmod +x rcmd.pl
  3. Run the script by typing ./rcmd.pl at the shell prompt in a terminal window. The following listing shows some sample output from the rcmd.pl script (the output depends on what commands you enter):

    Command ("exit" to quit): ps
      PID TTY          TIME CMD
      767 pts/0    00:00:00 bash
      940 pts/0    00:00:00 rcmd.pl
      945 pts/0    00:00:00 ps
    Command ("exit" to quit): exit      
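
The rcmd.pl script only checks whether $? is zero. If you need the command's actual exit code, shift $? right by eight bits. The following sketch assumes a UNIX-like system and runs a second copy of perl (located through the standard $^X variable, which holds the path of the running interpreter) that does nothing but exit with code 3:

```perl
#!/usr/bin/perl
# Run a command that exits with a known code. $^X is the path of the
# running perl, used here so the example needs no other program.
system($^X, "-e", "exit 3");

$raw  = $?;          # wait status: exit code is in the high byte
$code = $? >> 8;     # shift right by 8 bits to get the exit code

print "Raw status: $raw, exit code: $code\n";
```

Passing the command to system as a list, as this sketch does, runs the program directly without handing the string to a shell.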

You can also run Linux commands by using fork and exec in your Perl script. Following is an example script, psh.pl, that uses fork and exec to execute commands the user enters:

#!/usr/bin/perl

# This is a simple script that uses "fork" and "exec" to
# run a command entered by the user

$prompt = "Command (\"exit\" to quit): ";
print $prompt;

while (<STDIN>)
{
    chomp;   # remove trailing newline
    if($_ eq "exit") { exit 0;}

    $status = fork;
    if($status)
    {
# In parent... wait for child process to finish...
        wait;
        print $prompt;
        next;
    }
    else
    {
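# In child... exec returns only if the command could not be started
        exec $_ or die "Cannot execute $_: $!\n";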
    }
}

The following example shows how the psh.pl script executes the ps command (remember to type chmod +x psh.pl before typing ./psh.pl):

Command ("exit" to quit