Recipe 5.16 Representing Relationships Between Data

5.16.1 Problem

You want to represent relationships between elements of data: for instance, the "mother of" relationship in a family tree or the "parent process" relationship in a process table. This is closely related to representing tables in relational databases (tables represent relationships between information) and to representing computer science graph structures (edges represent relationships between nodes).

5.16.2 Solution

Use a hash to represent the relationship.

5.16.3 Discussion

Here's part of the family tree from the Bible:

%father = ( 'Cain'      => 'Adam',
            'Abel'      => 'Adam',
            'Seth'      => 'Adam',
            'Enoch'     => 'Cain',
            'Irad'      => 'Enoch',
            'Mehujael'  => 'Irad',
            'Methusael' => 'Mehujael',
            'Lamech'    => 'Methusael',
            'Jabal'     => 'Lamech',
            'Jubal'     => 'Lamech',
            'Tubalcain' => 'Lamech',
            'Enos'      => 'Seth' );

This lets us, for instance, easily trace a person's lineage:

while (<>) {
    chomp;                  # strip the newline so the name matches a hash key
    do {
        print "$_ ";        # print the current name
        $_ = $father{$_};   # set $_ to $_'s father
    } while defined;        # until we run out of fathers
    print "\n";
}
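
The same walk up the tree can be packaged as a subroutine that returns the lineage as a list instead of printing it. What follows is a minimal sketch: the name lineage is our own, not part of the recipe, and the hash is a small excerpt of %father so the example runs on its own.

```perl
use strict;
use warnings;

# A small excerpt of the %father hash shown above.
my %father = (
    'Cain'  => 'Adam',
    'Seth'  => 'Adam',
    'Enoch' => 'Cain',
    'Irad'  => 'Enoch',
);

# lineage() is a hypothetical helper: it returns the given name followed
# by each successive father, stopping when the hash has no entry.
sub lineage {
    my ($name) = @_;
    my @line;
    while (defined $name) {
        push @line, $name;
        $name = $father{$name};
    }
    return @line;
}

print join(' ', lineage('Irad')), "\n";   # prints "Irad Enoch Cain Adam"
```

Returning a list keeps the lookup separate from input and output, so the same routine can feed a report, a test, or another data structure.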

We can already ask questions like "Who begat Seth?" by checking the %father hash. By inverting this hash, we invert the relationship. This lets us use Recipe 5.9 to answer questions like "Whom did Lamech beget?"

while ( ($k,$v) = each %father ) {
    push( @{ $children{$v} }, $k );
}

$" = ', ';                  # separate output with commas
while (<>) {
    chomp;                  # strip the newline so the name matches a hash key
    if ($children{$_}) {
        @children = @{$children{$_}};
    } else {
        @children = "nobody";
    }
    print "$_ begat @children.\n";
}
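
For a version that does not depend on standard input, the inversion and lookup might be sketched as below. The helper name children_of is an assumption of ours, the hash is only an excerpt of %father, and the names are sorted because each returns pairs in no particular order.

```perl
use strict;
use warnings;

# A small excerpt of the %father hash shown above.
my %father = (
    'Jabal'     => 'Lamech',
    'Jubal'     => 'Lamech',
    'Tubalcain' => 'Lamech',
    'Enos'      => 'Seth',
);

# Invert the relationship: each father maps to an array reference
# holding the names of his children.
my %children;
while (my ($child, $dad) = each %father) {
    push @{ $children{$dad} }, $child;
}

# children_of() is a hypothetical helper. It returns a sorted list, or an
# empty list when the name has no recorded children.
sub children_of {
    my ($name) = @_;
    return sort @{ $children{$name} || [] };
}

for my $name ('Lamech', 'Enos') {
    my @kids = children_of($name);
    @kids = ('nobody') unless @kids;
    print "$name begat ", join(', ', @kids), ".\n";
}
```

The `|| []` fallback avoids dereferencing an undefined value for people with no children, and it does not create an empty entry in %children as a side effect.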

Hashes can also represent relationships such as the C language #includes. A includes B if A contains #include B. This code builds the hash (it doesn't look for files in /usr/include as it should, but that's a minor change):

foreach $file (@files) {
    local *FH;
    unless (open(FH, "<", $file)) {
        warn "Couldn't read $file: $!; skipping.\n";
        next;                        # move on to the next file
    }

    while (<FH>) {
        next unless /^\s*#\s*include\s*<([^>]+)>/;
        push(@{$includes{$1}}, $file);   # $1 is included by $file
    }
    close FH;
}

This shows which files with include statements are not included in other files:

@include_free = ( );                 # files that include others but are not themselves included
@uniq{map { @$_ } values %includes} = undef;
foreach $file (sort keys %uniq) {
    push( @include_free , $file ) unless $includes{$file};
}

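The keys of %includes are the included files, and each value is an anonymous array of the files that include that key. The values are arrays because a single file can be (and often is) included by more than one other file. We use map to build up one big list of the files that contain include statements and remove duplicates using a hash.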
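
To try the idea without touching the filesystem, the sketch below runs the same logic over file contents held in a hash. The %source hash and its file names are invented for illustration; %includes has the same shape as in the code above, with each included file mapped to the list of files that include it.

```perl
use strict;
use warnings;

# Hypothetical file contents keyed by file name, standing in for real
# files on disk.
my %source = (
    'main.c'  => "#include <util.h>\n#include <stdio.h>\nint main(void) { return 0; }\n",
    'util.h'  => "#include <stdio.h>\n",
    'stdio.h' => "/* no includes */\n",
);

# Build %includes: included file => [ files that include it ].
my %includes;
for my $file (sort keys %source) {
    for my $line (split /\n/, $source{$file}) {
        next unless $line =~ /^\s*#\s*include\s*<([^>]+)>/;
        push @{ $includes{$1} }, $file;
    }
}

# Collect every file that contains an include statement, using a hash
# slice to discard duplicates.
my %uniq;
@uniq{ map { @$_ } values %includes } = ();

# Keep only those that no other file includes.
my @include_free = grep { !$includes{$_} } sort keys %uniq;

print "@include_free\n";   # prints "main.c"
```

Here util.h contains an include statement but is itself included by main.c, so only main.c is reported.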

5.16.4 See Also

Recipe 4.7; the more complex data structures in Recipe 11.9 through Recipe 11.14