Recipe 9.7 Processing All Files in a Directory Recursively

9.7.1 Problem

You want to do something to each file and subdirectory in a particular directory.

9.7.2 Solution

Use the standard File::Find module.

use File::Find;
sub process_file {
    # do whatever;
}
find(\&process_file, @DIRLIST);

9.7.3 Discussion

File::Find provides a convenient way to process a directory recursively. It does the directory scans and recursion for you. All you do is pass find a code reference and a list of directories. For each file in those directories, recursively, find calls your function.

Before calling your function, find by default changes to the directory being visited, whose path relative to the starting directory is stored in the $File::Find::dir variable. $_ is set to the basename of the file being visited, and the full path of that file can be found in $File::Find::name. Your code can set $File::Find::prune to true to tell find not to descend into the directory just seen.
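As a rough illustration of those variables, the following sketch builds a small throwaway tree and prunes one directory. The directory and file names (keep, skip, a.txt, and so on) and the wanted function are hypothetical, chosen only for this demonstration:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a small temporary tree; every name below is made up for the demo.
my $top = tempdir(CLEANUP => 1);
make_path("$top/keep/deep", "$top/skip/hidden");
for my $file ("$top/keep/a.txt", "$top/keep/deep/b.txt", "$top/skip/hidden/c.txt") {
    open(my $fh, ">", $file) or die "can't create $file: $!";
    close($fh);
}

my @seen;
sub wanted {
    # $_ holds the basename; $File::Find::name holds the full path.
    if (-d $_ && $_ eq "skip") {
        $File::Find::prune = 1;    # tell find not to descend into skip/
        return;
    }
    push @seen, $File::Find::name if -f $_;
}

find(\&wanted, $top);
print "$_\n" for sort @seen;
```

Only the two files under keep should be listed, because find never enters the pruned skip directory.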

This simple example demonstrates File::Find. We give find an anonymous subroutine that prints the name of each file visited and adds a / to the names of directories:

@ARGV = qw(.) unless @ARGV;
use File::Find;
find sub { print $File::Find::name, -d && "/", "\n" }, @ARGV;

The -d file test operator returns the empty string '' if it fails, making the && return that, too. But if -d succeeds, the && returns "/", which is then printed.
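The behavior of -d combined with && can be checked in isolation. This is a minimal sketch, assuming a temporary directory and a hypothetical file named plain.txt inside it:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A temporary directory and one ordinary file inside it (names are made up).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/plain.txt";
open(my $fh, ">", $file) or die "can't create $file: $!";
close($fh);

# -d binds more tightly than &&, so each line reads as (-d PATH) && "/".
my $dir_suffix  = (-d $dir  && "/");   # -d succeeds, so && yields "/"
my $file_suffix = (-d $file && "/");   # -d fails, so && yields ''

print "directory suffix: '$dir_suffix'\n";
print "file suffix: '$file_suffix'\n";
```

The first line printed shows a slash between the quotes and the second shows nothing between them, which is why the one-liner can pass the expression straight to print.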

The following program prints the total bytes occupied by everything in a directory, including subdirectories. It gives find an anonymous subroutine that keeps a running sum of the size of each file it visits. That sum covers all inode types, so the sizes of directories and symbolic links are counted along with those of regular files. Once the find function returns, the accumulated sum is displayed.

use File::Find;
@ARGV = (".") unless @ARGV;
my $sum = 0;
find sub { $sum += -s }, @ARGV;
print "@ARGV contains $sum bytes\n";
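If only regular files should count toward the total, one possible variation guards the sum with -f. The sketch below is an assumption about how that might look, not part of the original program. It creates two hypothetical files of 10 and 25 bytes in a temporary tree so the result is easy to check:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Temporary tree with two files of known size (names and sizes are made up).
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "can't mkdir $top/sub: $!";
my %content = ("$top/one" => "x" x 10, "$top/sub/two" => "y" x 25);
for my $path (keys %content) {
    open(my $fh, ">", $path) or die "can't create $path: $!";
    print $fh $content{$path};
    close($fh) or die "can't close $path: $!";
}

my $total = 0;
# -f stats $_ first; -s _ then reuses that stat buffer instead of statting again.
find(sub { $total += -s _ if -f }, $top);
print "regular files under the tree hold $total bytes\n";
```

Directories are still visited, but they no longer contribute to the total.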

This code finds the largest single file within a set of directories:

use File::Find;
@ARGV = (".") unless @ARGV;
my ($saved_size, $saved_name) = (-1, "");
sub biggest {
    return unless -f && -s _ > $saved_size;
    $saved_size = -s _;
    $saved_name = $File::Find::name;
}
find(\&biggest, @ARGV);
print "Biggest file $saved_name in @ARGV is $saved_size bytes long.\n";

We use $saved_size and $saved_name to keep track of the name and the size of the largest file visited. If we find a file bigger than the largest seen so far, we replace the saved name and size with the current ones. When the find finishes, the largest file and its size are printed out, rather verbosely. A more general tool would probably just print the filename, its size, or both. This time we used a named function rather than an anonymous one because the function was getting big.

It's simple to change this to find the most recently changed file:

use File::Find;
@ARGV = (".") unless @ARGV;
my ($age, $name);
sub youngest {
    return if defined $age && $age > (stat($_))[9];
    $age = (stat(_))[9];
    $name = $File::Find::name;
}
find(\&youngest, @ARGV);
print "$name " . scalar(localtime($age)) . "\n";

The File::Find module doesn't export its $name variable, so always refer to it by its fully qualified name. Example 9-2 is more a demonstration of namespace munging than of recursive directory traversal, although it does find all directories. It makes $name in our current package an alias for the one in File::Find, which is essentially how Exporter works. Then it declares its own version of find with a prototype so it can be called like grep or map.

Example 9-2. fdirs
  #!/usr/bin/perl -lw
  # fdirs - find all directories
  @ARGV = qw(.) unless @ARGV;
  use File::Find ( );
  sub find(&@) { &File::Find::find }
  *name = *File::Find::name;
  find { print $name if -d } @ARGV;

Our own find only calls the find in File::Find, which we were careful not to import by specifying an ( ) empty list in the use statement. Rather than write this:

find sub { print $File::Find::name if -d }, @ARGV;

we can write the more pleasant:

find { print $name if -d } @ARGV;

9.7.4 See Also

The documentation for the standard File::Find and Exporter modules (also in Chapter 32 of Programming Perl); your system's find(1) manpage; Recipe 9.6.