The GNU text utility package includes a large number of utilities to manipulate the contents of text files. These utilities are patterned after the UNIX commands of the same name. The GNU versions of the programs usually have additional options and are optimized for speed. In general, the GNU utilities also avoid the arbitrary limits of their UNIX counterparts, such as fixed maximum line lengths.
Table 8-4 briefly describes each text utility. The best way to learn these utilities is to try each one out. Before trying out a program, type info progname or man progname (where progname is the name of the program) to view the online help information.
Cross Ref: A few selected text utilities are described in the following sections. You can find reference pages for many of these utilities in Appendix A.
Program | Description |
---|---|
cat | Concatenates files and writes them to standard output |
cksum | Prints the cyclic redundancy check (CRC) checksums and byte counts of files (used to verify that files have not been corrupted in transmission, by comparing the cksum output for the received files with the cksum output for the original files) |
comm | Compares two sorted files line by line. (The sort command sorts files.) |
csplit | Splits a file into sections determined by text patterns in the file and places each section in a separate file named xx00, xx01, and so on |
cut | Selects sections (fields, characters, or bytes) from each line of files and writes them to standard output |
expand | Converts tabs in each file to spaces and writes the result to standard output |
fmt | Fills and joins lines, making each line roughly the same length, and writes the formatted lines to standard output |
fold | Breaks lines in a file so that each line is no wider than a specified width, and writes the lines to standard output |
head | Prints the first part of files |
join | Joins corresponding lines of two files using a common field and writes each line to standard output |
md5sum | Computes and checks the MD5 message digest (a 128-bit checksum using the MD5 algorithm) |
nl | Numbers each line in a file and writes the lines to standard output |
od | Writes the contents of files to standard output in octal and other formats. (This is used to view the contents of binary files.) |
paste | Merges corresponding lines of one or more files into vertical columns separated by tabs, and writes each line to standard output |
pr | Formats text files for printing |
ptx | Produces a permuted index of file contents |
sort | Sorts lines of text files |
split | Splits a file into pieces |
sum | Computes and prints a 16-bit checksum for each file and counts the number of 1,024-byte blocks in the file |
tac | Writes each file to standard output, last line first |
tail | Prints the last part of files |
tr | Translates or deletes characters in files |
tsort | Performs a topological sort (used to organize a library for efficient handling by the ar and ld commands) |
unexpand | Converts spaces into tabs |
uniq | Removes adjacent duplicate lines (the input is normally sorted first so that all duplicates are adjacent) |
wc | Prints the number of lines, words, and bytes in files |
For example, suppose that you want to use the wc command to display the line, word, and byte counts of a text file. Try the following:
```
wc /etc/inittab
54 236 1698 /etc/inittab
```
This causes wc to display the number of lines (54), words (236), and bytes (1698) in the /etc/inittab file. If you simply want to see the number of lines in a file, use the -l option:
```
wc -l /etc/inittab
54 /etc/inittab
```
As you can see, in this case, wc simply displays the line count.
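To experiment with predictable output, the following sketch creates a small sample file (the name sample.txt and its contents are made up for illustration) and runs wc with its most common options. It also shows a small trick: when the file is supplied through input redirection (<), wc reads standard input and prints only the number, without a filename.

```shell
# Create a three-line sample file (hypothetical name: sample.txt)
printf 'one two three\nfour five\nsix\n' > sample.txt

# Default output: line, word, and byte counts, followed by the filename
wc sample.txt

# Individual counts: -l lines, -w words, -c bytes, -m characters
wc -l sample.txt
wc -w sample.txt
wc -c sample.txt
wc -m sample.txt

# With input redirection, wc reads standard input and prints no filename
wc -l < sample.txt
```

For this file, the counts are 3 lines, 6 words, and 28 bytes. Because the file is plain ASCII, -m (characters) and -c (bytes) agree.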
If you don’t specify a filename, the wc command reads from standard input, so you can use the shell's pipe feature to feed it the output of another command. Suppose that you want a rough count of the processes running on your system. You can get a list of all processes with the ps ax command, but instead of counting the lines manually, just pipe the output of ps to wc to get a rough count, as follows:
```
ps ax | wc -l
65
```
That means that the ps command has produced 65 lines of output. Because the first line simply shows the headings for the tabular columns, you can estimate that about 64 processes are running on your system. (Of course, this count probably includes the processes used to run the ps and wc commands as well, but who’s counting?)
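Rather than subtracting the heading line by hand, you can drop it with tail -n +2, which prints its input starting at the second line. Because ps output differs from system to system, this sketch first demonstrates the idea on a small, made-up sample of ps-style output (the file ps-sample.txt and its rows are hypothetical), then applies the same pipeline to ps itself only if ps is installed.

```shell
# Made-up ps-style output: one heading line followed by three rows (sample data, not real processes)
printf '  PID TTY STAT TIME COMMAND\n    1 ?   Ss   0:01 init\n    2 ?   S    0:00 kthreadd\n    3 ?   I<   0:00 rcu_gp\n' > ps-sample.txt

# Counting every line includes the heading (4 lines in total)
wc -l < ps-sample.txt

# tail -n +2 starts output at line 2, so only the 3 data rows are counted
tail -n +2 ps-sample.txt | wc -l

# On a live system, the same pipeline works directly on ps (skipped if ps is absent)
if command -v ps >/dev/null 2>&1; then
    ps ax | tail -n +2 | wc -l
fi
```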
You can sort the lines in a text file by using the sort command. To see how the sort command works, first type more /etc/passwd to see the current contents of the /etc/passwd file. Now type sort /etc/passwd to see the lines sorted alphabetically. To sort a file and save the sorted version in another file, you can use the Bash shell's output redirection feature (the sort command's -o option does the same job), as follows:
```
sort /etc/passwd > ~/sorted.text
```
This command sorts the lines in the /etc/passwd file and saves the output in a file named sorted.text in your home directory.
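The sketch below goes a step further and works on a small sample file in /etc/passwd format, so it does not depend on your real accounts (the file name passwd-sample and its entries are hypothetical). It writes the sorted result with -o. Unlike `sort file > file`, which empties the file because the shell truncates it before sort runs, GNU sort reads all of its input before opening the -o file, so `sort -o F F` is safe. The sketch also sorts numerically on one colon-separated field, and combines cut, sort, and uniq -c. Because uniq only merges adjacent duplicates, the lines must be sorted first.

```shell
# Sample lines in /etc/passwd format (hypothetical accounts; file name passwd-sample is assumed)
cat > passwd-sample <<'EOF'
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
root:x:0:0:root:/root:/bin/bash
bin:x:2:2:bin:/bin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
EOF

# Alphabetical sort, saved with -o instead of output redirection
sort -o sorted.text passwd-sample

# Numeric sort on field 3 (the user ID), using : as the field separator
sort -t: -k3,3n passwd-sample

# Count the accounts that use each login shell (field 7):
# uniq -c merges only adjacent lines, so sort the shells first
cut -d: -f7 passwd-sample | sort | uniq -c
```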
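Another interesting command is tr, which substitutes one group of characters for another (or deletes selected characters) throughout its input. Note that tr reads only from standard input, so you supply a file with input redirection (<) rather than as an argument. The tr command is useful when you want to convert a text file from one operating system's format to another's, because different operating systems use different special characters to mark the end of a line of text. For example, MS-DOS and Windows end each line with a carriage return and a linefeed, whereas Linux uses only a linefeed.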
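As an illustration, this sketch converts a file with MS-DOS line endings to Linux line endings by deleting every carriage return, and then shows a character-class substitution. The file names dosfile.txt and unixfile.txt are hypothetical, and the sample file is generated so the example is self-contained.

```shell
# Create a sample file with MS-DOS/Windows line endings (CR LF); file name is hypothetical
printf 'first line\r\nsecond line\r\n' > dosfile.txt

# tr reads only standard input, so redirect the file in and the result out.
# -d '\r' deletes every carriage return, leaving Linux-style linefeed endings.
tr -d '\r' < dosfile.txt > unixfile.txt

# Substitution: map each lowercase letter to its uppercase counterpart
tr '[:lower:]' '[:upper:]' < unixfile.txt

# od -c displays each byte, so you can confirm that no \r remains
od -c unixfile.txt
```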
The split command is handy when you want to copy a file to floppy disks but the file is too large to fit on a single floppy: split breaks up the file into smaller files, each of which can fit on a floppy.
By default, split puts 1,000 lines into each file and names the files xaa, xab, xac, and so on. You can specify a different prefix to replace the x, and the -b option splits by size in bytes instead of by lines. For example, to split a large file called hugefile.tar into smaller files that fit onto several high-density 3.5-inch floppy disks, use split as follows:
```
split -b 1440k hugefile.tar part.
```
This command splits the hugefile.tar file into 1,440K chunks so that each can fit onto a floppy disk. The command creates files named part.aa, part.ab, part.ac, and so on.
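To try split without a real archive or floppy disks, the following sketch scales the example down. The files bigfile.txt and hugefile.tar are generated stand-ins (hypothetical names and sizes), and 2k takes the place of 1440k. In GNU split, the k suffix means 1,024 bytes, so 2k is 2,048 bytes (and 1440k is 1,474,560 bytes).

```shell
# Generated stand-ins: a 2,500-line text file and a 5,000-byte file (hypothetical names)
seq 1 2500 > bigfile.txt
head -c 5000 /dev/zero > hugefile.tar

# Default behavior: 1,000 lines per piece, named xaa, xab, and xac (the last holds 500 lines)
split bigfile.txt
wc -l xaa xab xac

# Size-based split with a custom prefix: part.aa and part.ab hold 2,048 bytes, part.ac the last 904
split -b 2k hugefile.tar part.
ls -l part.*
```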
To combine the split files back into a single file, use the cat command as follows:
```
cat part.?? > hugefile.tar
```
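Because cksum exists to detect corrupted copies, it offers a natural way to confirm that the reassembled file matches the original. The self-contained sketch below generates a random 5,000-byte stand-in for hugefile.tar and uses the hypothetical file names original.cksum and hugefile.orig. It relies on the shell expanding part.?? in sorted order (part.aa, part.ab, part.ac), which is what puts the pieces back in the right sequence.

```shell
# Hypothetical 5,000-byte file of random data standing in for hugefile.tar
head -c 5000 /dev/urandom > hugefile.tar

# Record the checksum before splitting (reading stdin keeps the filename out of the output)
cksum < hugefile.tar > original.cksum

# Split into 2,048-byte pieces, then set the original aside for comparison
split -b 2k hugefile.tar part.
mv hugefile.tar hugefile.orig

# Rejoin the pieces; the shell expands part.?? in sorted order
cat part.?? > hugefile.tar

# The new checksum should match the recorded one, and cmp should find no differences
cksum < hugefile.tar | diff - original.cksum && echo "checksums match"
cmp hugefile.orig hugefile.tar && echo "files are identical"
```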