Recipe 9.4 Recognizing Two Names for the Same File

9.4.1 Problem

You want to determine whether two filenames in a list correspond to the same file on disk (because of hard and soft links, two filenames can refer to a single file). You might do this to make sure that you don't change a file you've already worked with.

9.4.2 Solution

Maintain a hash keyed by the device and inode numbers of the files you've seen. The values count how many times each file has been encountered:

my %seen = ();

sub do_my_thing {
    my $filename = shift;
    my ($dev, $ino) = stat $filename
        or return;              # stat failed, so there is nothing to compare

    unless ($seen{$dev, $ino}++) {
        # do something with $filename because we haven't
        # seen it before
    }
}
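To see the solution catch both kinds of link, here is a minimal self-contained sketch. It assumes a Unix-like system where link and symlink are supported, and it builds a throwaway fixture (the names original.txt, hardlink.txt and symlink.txt are hypothetical) in a File::Temp directory. Because stat follows symbolic links, the symlink reports the same device and inode as the file it points to.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical fixture: one real file, a hard link to it, and a symlink to it.
my $dir  = tempdir(CLEANUP => 1);
my $orig = File::Spec->catfile($dir, 'original.txt');
my $hard = File::Spec->catfile($dir, 'hardlink.txt');
my $soft = File::Spec->catfile($dir, 'symlink.txt');

open my $fh, '>', $orig or die "can't create $orig: $!";
print $fh "hello\n";
close $fh;
link    $orig, $hard or die "can't link: $!";
symlink $orig, $soft or die "can't symlink: $!";

my %seen;
my @processed;

sub do_my_thing {
    my $filename = shift;
    my ($dev, $ino) = stat $filename
        or return;                      # unreadable name: skip it
    unless ($seen{$dev, $ino}++) {
        push @processed, $filename;     # first name seen for this file
    }
}

do_my_thing($_) for $orig, $hard, $soft;
print "processed: @processed\n";        # only the original name
print "distinct files: ", scalar(keys %seen), "\n";
```

All three names map to one key, so only the first name passed in is processed.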

9.4.3 Discussion

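A key in %seen is made by combining the device number ($dev) and inode number ($ino) of each file. Inode numbers are unique only within a single filesystem, so both numbers are needed: two names refer to the same file exactly when they have the same device and inode numbers, and therefore the same key. Because stat follows symbolic links, a soft link produces the key of the file it points to.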
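The same test works for a single pair of names without any hash: compare the two (device, inode) pairs directly. This sketch uses a hypothetical helper, same_file, and assumes a Unix-like system that supports hard links; the files a, b and a_link are throwaway fixtures.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# same_file is a hypothetical helper: true when both names
# stat() to the same (device, inode) pair.
sub same_file {
    my ($x, $y) = @_;
    my ($dev1, $ino1) = stat $x or return;   # missing file: not "same"
    my ($dev2, $ino2) = stat $y or return;
    return $dev1 == $dev2 && $ino1 == $ino2;
}

my $dir = tempdir(CLEANUP => 1);
for my $name ("$dir/a", "$dir/b") {
    open my $fh, '>', $name or die "can't create $name: $!";
    close $fh;
}
link "$dir/a", "$dir/a_link" or die "can't link: $!";

my $linked   = same_file("$dir/a", "$dir/a_link");
my $separate = same_file("$dir/a", "$dir/b");
print $linked   ? "same\n" : "different\n";   # same
print $separate ? "same\n" : "different\n";   # different
```

A and b are distinct files even though both are empty; only the hard link shares a's inode.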

If you want to keep a list of all the names that refer to the same file, instead of counting how many times the file has been seen, save each filename in an anonymous array:

my %seen;
foreach my $filename (@files) {
    my ($dev, $ino) = stat $filename or next;   # skip names stat can't read
    push( @{ $seen{$dev,$ino} }, $filename);
}

foreach my $devino (sort keys %seen) {
    my ($dev, $ino) = split(/$;/o, $devino);
    if (@{$seen{$devino}} > 1) {
        # @{$seen{$devino}} is a list of filenames for the same file
    }
}
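Here is the grouping loop run end to end against a small, hypothetical fixture: a file named one with two extra hard links (one_again and one_more) and an unrelated file named two. It assumes a Unix-like system with hard-link support; the @groups array is added only so the result can be inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical fixture: "one" has two extra hard links, "two" stands alone.
for my $name (qw(one two)) {
    open my $fh, '>', "$dir/$name" or die "can't create $dir/$name: $!";
    close $fh;
}
link "$dir/one", "$dir/one_again" or die "can't link: $!";
link "$dir/one", "$dir/one_more"  or die "can't link: $!";

my @files = map { "$dir/$_" } qw(one two one_again one_more);

# Group every name under its composite (device, inode) key.
my %seen;
foreach my $filename (@files) {
    my ($dev, $ino) = stat $filename or next;
    push @{ $seen{$dev, $ino} }, $filename;
}

# Report only keys shared by more than one name.
my @groups;
foreach my $devino (sort keys %seen) {
    my ($dev, $ino) = split(/$;/o, $devino);
    if (@{ $seen{$devino} } > 1) {
        push @groups, $seen{$devino};
        print "device $dev, inode $ino: @{ $seen{$devino} }\n";
    }
}
```

Only one group is reported, and it lists the three names for one in the order they appeared in @files.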

The $; variable holds the separator string used by the old syntax for emulating multidimensional associative arrays, $hash{$x,$y,$z}. It's still a one-dimensional hash, but it has composite keys: the key is really join($; => $x, $y, $z), and split separates the parts again. Although you'd normally use a real multilevel hash for multidimensional data, here there's no need for one, and a flat composite key is cheaper.
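The equivalence between the comma form and an explicit join can be checked directly. This sketch uses made-up device and inode numbers, relies on $; having its default value ("\034", not a regex metacharacter, so it is safe in split), and ends with the multilevel-hash alternative for comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %h;
my ($x, $y) = (2049, 131_074);          # made-up device and inode numbers

$h{$x, $y} = 'multidimensional emulation';

# The comma inside the braces is shorthand for join($;, ...).
my $key  = join($; => $x, $y);
my $same = exists $h{$key};
print $same ? "same key\n" : "different key\n";    # same key

# split recovers the original parts.
my ($dev, $ino) = split /$;/, $key;
print "$dev $ino\n";                               # 2049 131074

# The alternative: a real two-level hash, one anonymous hash per device.
my %by_dev;
$by_dev{$x}{$y} = 'nested';
print join(',', keys %{ $by_dev{$x} }), "\n";      # 131074
```

The nested version allocates an extra anonymous hash for every device, which is the overhead the flat composite key avoids.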

9.4.4 See Also

The $; ($SUBSEP) variable in perlvar(1), and in the "Special Variables" section of Chapter 28 of Programming Perl; the stat function in perlfunc(1) and in Chapter 29 of Programming Perl; Chapter 5