7.10 Important Shell Script Utilities

There are several programs that are particularly useful in shell scripts. Some utilities (such as basename) are really only practical when used in conjunction with other programs, and therefore usually don't find a place outside of shell scripts, but others, such as awk, can be quite useful on the command line too.

7.10.1 basename

If you need to strip the extension from a filename or remove the directories from a full pathname, use the basename command.

Try these examples on the command line to get a feel for how the command works:

basename example.html .html
basename /usr/local/bin/example

In both cases, basename returns example. The first command strips the .html suffix from example.html, and the second removes the directories from the full pathname.

Here is an example of how you can use basename in a script to convert GIF image files to the PNG format:

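#!/bin/sh
for file in *.gif; do
    # if no .gif files exist, *.gif stays unexpanded and -f fails; exit
    if [ ! -f "$file" ]; then
        exit
    fi
    # quote every expansion so names containing spaces stay intact
    b=$(basename "$file" .gif)
    echo "Converting $b.gif to $b.png..."
    giftopnm "$b.gif" | pnmtopng > "$b.png"
done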

7.10.2 awk

You may have seen the awk command already. This is not a simple single-purpose command — it's a powerful programming language. Perhaps unfortunately, awk is now something of a lost art due to larger languages such as Perl.

However, entire books do exist on the subject of awk, including The AWK Programming Language [Aho 1988]. That said, many people use awk only to pick a single field out of an input stream, like this:

ls -l | awk '{print $5}'

This command prints the fifth field of the ls output (the file size); the result is a list of file sizes.
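Because awk is a full language, it can do more than print a field. As a minimal sketch, the following sums a numeric column using a variable and an END block. The two-column sample input is hypothetical; it stands in for ls -l output so the example runs anywhere.

```shell
#!/bin/sh
# Hypothetical two-column input: a filename and a size in bytes
input='report.txt 120
notes.txt 30
todo.txt 50'

# The first block runs once per input line, adding field 2 to total;
# the END block runs once after the last line has been read
total=$(printf '%s\n' "$input" | awk '{ total += $2 } END { print total }')
echo "Total bytes: $total"
```

Against a real directory, the same idea reads ls -l | awk '{ total += $5 } END { print total }'.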

7.10.3 sed

sed stands for stream editor, and it is an automatic text editor that takes an input stream (a file or the standard input), alters it according to some expression, and prints the results to standard output. In many respects, sed is similar to ed, the original Unix text editor. It has dozens of operations, matching tools, and addressing capabilities. As with awk, there are books about sed, including one that covers both utilities, sed & awk [Dougherty].

Although sed is a big program, and an in-depth analysis is beyond the scope of this book, it's easy to see how it works. In general, sed takes an address and an operation as one argument. The address is a set of lines, and the operation determines what to do with those lines.

For example, the following command reads /etc/passwd, deletes lines three through six and sends the result to the standard output:

sed 3,6d /etc/passwd

In this example, 3,6 is the address (a range of lines), and d is the operation (delete). If you omit the address, sed operates on all lines in its input stream. The two most common sed operations are probably s (search and replace) and d.

Let's go through a few more sed examples. In all of the examples, single quotes are necessary to prevent the shell from expanding special characters like * and $.

The following command replaces the regular expression exp with text (see Section 1.5.1 for basic information on regular expressions):

sed 's/exp/text/'

The preceding command replaces only one instance of the expression exp per line. To replace all instances of exp, use the g modifier at the end of the operation:

sed 's/exp/text/g'
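The difference between the two forms shows up as soon as a line contains the expression more than once. In this sketch, the sample line is invented for illustration.

```shell
#!/bin/sh
line='one two one two'
# Without g, only the first match on each line is replaced
first=$(echo "$line" | sed 's/one/1/')
# With g, every match on the line is replaced
all=$(echo "$line" | sed 's/one/1/g')
echo "$first"
echo "$all"
```

The first command prints 1 two one two, and the second prints 1 two 1 two.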

You can also use a regular expression as the address. The following command deletes any line that matches the regular expression exp:

sed '/exp/d'
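A common use of a regular-expression address is stripping comment lines from a configuration file. The sketch below assumes a hypothetical file format in which comments begin with #; the address /^#/ matches those lines and d deletes them.

```shell
#!/bin/sh
# Hypothetical configuration text with comment lines starting with #
config=$(printf '# comment\nname=alpha\n# another comment\nport=80\n')
# /^#/ selects lines beginning with #; d deletes them
cleaned=$(printf '%s\n' "$config" | sed '/^#/d')
echo "$cleaned"
```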

7.10.4 xargs

If you ever have to run one command on a huge number of files, the command or shell sometimes responds that it can't fit all of the arguments in its buffer. Use xargs to get around this problem. xargs reads filenames from its standard input and runs a command on them, passing as many filenames to each invocation as will fit.

Many people use xargs in conjunction with the find command. Here is an example that can help you verify that every file in the current directory tree that ends with .gif is actually in the GIF format:

find . -name '*.gif' -print | xargs file

In the preceding example, xargs runs the file command. However, this invocation can cause errors or leave your system open to security problems, because filenames can include spaces and newlines. If you're writing a script or need extra security, use the following form instead, which makes find end each filename with a NUL character instead of a newline and makes xargs split its input only at NUL characters:

find . -name '*.gif' -print0 | xargs -0 file
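The following sketch shows the problem and the fix side by side. It works in a scratch directory from mktemp, and it uses ls -d rather than file (an assumption, so the example does not depend on file being installed). The plain form splits "my picture.gif" at the space, so ls sees two nonexistent names; the NUL-separated form passes it through as one argument.

```shell
#!/bin/sh
# Scratch directory so the example touches nothing else
dir=$(mktemp -d)
touch "$dir/my picture.gif" "$dir/logo.gif"

# Plain form: xargs splits at the space, so only logo.gif is listed
broken=$(find "$dir" -name '*.gif' -print | xargs ls -d 2>/dev/null | wc -l | tr -d ' ')
# NUL form: each name arrives intact, so both files are listed
safe=$(find "$dir" -name '*.gif' -print0 | xargs -0 ls -d | wc -l | tr -d ' ')

echo "plain: $broken file(s), NUL-separated: $safe file(s)"
rm -r "$dir"
```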

Keep in mind that if you have a large list of files, xargs starts a lot of processes. Don't expect great performance.
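You can watch xargs split its input into separate runs with the -n option, which caps the number of arguments per invocation. In this sketch the seven names are placeholders; each output line comes from one separate run of echo.

```shell
#!/bin/sh
# Seven placeholder names; -n 3 passes at most three to each run of echo
batches=$(printf '%s\n' a b c d e f g | xargs -n 3 echo)
echo "$batches"
```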

Note

You may need to add two hyphens (--) to the end of your xargs command if there is a possibility that any of the target files start with -. The -- is a way to tell a program that any arguments that follow the -- are filenames, not options. However, keep in mind that not all programs support --.
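Here is a minimal sketch of the -- form with rm. The filename -draft.txt is hypothetical. Without --, rm would try to parse it as a set of options; with -- at the end of the xargs command, xargs appends the filename after --, so rm treats it as a file.

```shell
#!/bin/sh
# Scratch directory containing a file whose name starts with a hyphen
dir=$(mktemp -d)
touch "$dir/-draft.txt"

# xargs appends the filename after --, so rm sees: rm -- -draft.txt
(cd "$dir" && printf '%s\n' -draft.txt | xargs rm --)

if [ -e "$dir/-draft.txt" ]; then removed=no; else removed=yes; fi
echo "removed: $removed"
rmdir "$dir"
```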

7.10.5 expr

If you need arithmetic operations in your shell scripts, the expr command can help you (and even do some string operations). The command expr 1 + 2 prints 3; run expr --help for a full list of operations. expr is a clumsy, slow way of doing math. If you find yourself using expr frequently, it probably means that you should be using a language like Perl, Python, or awk instead of a shell script.
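The sketch below illustrates why expr feels clumsy. Every operand and operator must be a separate argument, the * operator must be escaped from the shell, and only integer arithmetic is available. The counting loop at the end typically runs expr as a separate program on every pass.

```shell
#!/bin/sh
# Spaces matter: each operand and operator is a separate argument
sum=$(expr 1 + 2)
# * is special to the shell, so it must be escaped
product=$(expr 6 \* 7)
# Integer division only; the remainder is discarded
quotient=$(expr 7 / 2)

# A typical counting loop that calls expr once per iteration
i=0
while [ "$i" -lt 3 ]; do
    i=$(expr "$i" + 1)
done

echo "$sum $product $quotient $i"
```

This prints 3 42 3 3.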

7.10.6 exec

The exec command is a built-in shell feature that replaces the current shell process with the program you name after exec. This is a feature for saving system resources, but remember that there's no return; once you run exec in a shell script, the script and shell running the script are gone, replaced by the new command.

You can test it in a shell window. Try running exec cat. After you press CONTROL-D or CONTROL-C to terminate the cat program, your window disappears: exec replaced the window's shell with cat, so when cat exits, the window has no child process left.
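To see the "no return" behavior without losing a window, you can run exec inside a throwaway shell started with sh -c. In this sketch, exec replaces that child shell with echo, so the command after it never runs.

```shell
#!/bin/sh
# exec replaces the child shell with echo; the second echo never runs
out=$(sh -c 'exec echo "replaced by echo"; echo "never printed"')
echo "$out"
```

Only replaced by echo appears in the output.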