4.8 Autovivification and Hashes

Autovivification also works for hash references. If a variable containing undef is dereferenced as if it were a hash reference, a reference to an empty anonymous hash is inserted, and the operation continues.
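
As a minimal sketch of that behavior, consider a scalar that starts out undefined. The variable name $castaway and its key are hypothetical, chosen only for this illustration:

```perl
use strict;
use warnings;

# $castaway is a hypothetical name used only for this illustration.
my $castaway;                     # starts out as undef

# Assigning through undef as if it were a hash reference autovivifies:
# Perl first stores a reference to a new anonymous hash in $castaway.
$castaway->{name} = 'Gilligan';

print ref($castaway), "\n";       # prints "HASH"
print $castaway->{name}, "\n";    # prints "Gilligan"
```

Note that this works even under use strict, because the variable held undef rather than some non-reference value.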

One place this comes in very handy is in a typical data reduction task. For example, let's say the Professor gets an island-area network up and running (perhaps using Coco-Net or maybe Vines) and now wants to track the traffic from host to host. He begins recording each transfer in a log file, giving the source host, the destination host, and the number of transferred bytes:

professor.hut gilligan.crew.hut 1250
professor.hut lovey.howell.hut 910
thurston.howell.hut lovey.howell.hut 1250
professor.hut lovey.howell.hut 450
professor.hut laser3.copyroom.hut 2924
ginger.girl.hut professor.hut 1218
ginger.girl.hut maryann.girl.hut 199
...

Now the Professor wants to produce a summary of the source host, the destination host, and the total number of transferred bytes for the day. Tabulating the data is as simple as:

my %total_bytes;
while (<>) {
  my ($source, $destination, $bytes) = split;
  $total_bytes{$source}{$destination} += $bytes;
}

Let's see how this works on the first line of data. You'll be executing:

$total_bytes{"professor.hut"}{"gilligan.crew.hut"} += 1250;

Because %total_bytes is initially empty, the first key of professor.hut is not found, so the lookup yields undef, and that undef is then dereferenced as a hash reference. (Keep in mind that an implicit arrow sits between the two sets of curly braces here.) Perl stores a reference to an empty anonymous hash in that element, and the new hash is immediately extended to include an element with a key of gilligan.crew.hut. That element's initial value is undef, which acts like a zero when you add 1250 to it, and the result of 1250 is stored back into the hash.
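
To make those implicit steps visible, here is a rough sketch of what you might write by hand if Perl did not autovivify. It is only an approximation of the effect, not a description of Perl's internals, and the three scalars simply hold the values from the first line of data:

```perl
use strict;
use warnings;

my %total_bytes;
my ($source, $destination, $bytes) =
  ('professor.hut', 'gilligan.crew.hut', 1250);

# Step 1: a new source host needs an empty inner hash.
$total_bytes{$source} = {}
  unless defined $total_bytes{$source};

# Step 2: a new destination host starts with a count of zero.
$total_bytes{$source}{$destination} = 0
  unless defined $total_bytes{$source}{$destination};

# Step 3: add this line's bytes to the running total.
$total_bytes{$source}{$destination} += $bytes;

print "$total_bytes{$source}{$destination}\n";   # prints 1250
```

With autovivification, steps 1 and 2 disappear, leaving only the single += statement.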

Any later data line that contains this same source host and destination host will re-use that same value, adding more bytes to the running total. But each new destination host extends a hash to include a new initially undef byte count, and each new source host uses autovivification to create a destination host hash. In other words, Perl does the right thing, as always.
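
A quick way to convince yourself of this is to run the same loop over a few of the sample lines held in an array rather than read from a file. The @log_lines array is an assumption made for this sketch:

```perl
use strict;
use warnings;

# @log_lines is a hypothetical stand-in for the Professor's log file.
my @log_lines = (
  'professor.hut gilligan.crew.hut 1250',
  'professor.hut lovey.howell.hut 910',
  'thurston.howell.hut lovey.howell.hut 1250',
  'professor.hut lovey.howell.hut 450',
);

my %total_bytes;
for (@log_lines) {
  my ($source, $destination, $bytes) = split;
  $total_bytes{$source}{$destination} += $bytes;
}

# The repeated source/destination pair accumulates: 910 + 450.
print $total_bytes{'professor.hut'}{'lovey.howell.hut'}, "\n";   # prints 1360

# Two distinct source hosts were autovivified.
print scalar(keys %total_bytes), "\n";                           # prints 2

# professor.hut talked to two distinct destinations.
print scalar(keys %{ $total_bytes{'professor.hut'} }), "\n";     # prints 2
```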

Once you've processed the file, it's time to display the summary. First, you determine all the sources:

for my $source (keys %total_bytes) {
...

Next, you need all the destinations for each source. The syntax for this is a bit tricky: you want all the keys of the hash that results from dereferencing the value of the element in the first structure:

for my $source (keys %total_bytes) {
  for my $destination (keys %{ $total_bytes{$source} }) {
    ...
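
If the inner expression looks dense, it may help to see it split into two steps. The sketch below uses a small hand-built %total_bytes and a hypothetical temporary variable, $destinations_ref, to show that both forms yield the same keys. The lists are sorted here only so that the output order is predictable:

```perl
use strict;
use warnings;

# A small hand-built structure, standing in for the tabulated data.
my %total_bytes = (
  'professor.hut' => {
    'gilligan.crew.hut' => 1250,
    'lovey.howell.hut'  => 1360,
  },
);

my $source = 'professor.hut';

# Two steps: fetch the hash reference, then dereference it.
# $destinations_ref is a hypothetical name for this illustration.
my $destinations_ref = $total_bytes{$source};
my @two_step = sort keys %$destinations_ref;

# One step: dereference the element directly inside %{ ... }.
my @one_step = sort keys %{ $total_bytes{$source} };

print "@two_step\n";   # prints "gilligan.crew.hut lovey.howell.hut"
print "@one_step\n";   # prints the same line
```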

For good measure, you should probably sort both lists, because keys returns its results in no particular order and sorting keeps the report consistent from run to run:

for my $source (sort keys %total_bytes) {
  for my $destination (sort keys %{ $total_bytes{$source} }) {
    print "$source => $destination:",
     " $total_bytes{$source}{$destination} bytes\n";
  }
  print "\n";
}

This is a typical strategy for generating data-reduction reports. Simply create a hash-of-hashrefs (perhaps nested even deeper, as you'll see later), using autovivification to fill in the gaps in the upper data structures as needed, and then walk through the resulting data structure to display the results.