Recipe 20.13 Processing Server Logs

20.13.1 Problem

You need to summarize your server logs, but you don't have a customizable program to do it.

20.13.2 Solution

Parse the access log yourself with regular expressions, or use the Logfile modules from CPAN.
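Parsing a log with regular expressions begins with taking a single line apart. The sketch below is minimal and assumes Apache's Common Log Format (host, ident, user, [time], "request", status, bytes). It also needs Perl 5.10 or later, because it uses named captures. The parse_clf function and its field names are our own illustrative choices, not a standard API:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: split one Common Log Format (CLF) line into named fields.
# The field names are our own labels, not part of any module's API.
my $CLF_RE = qr{
    ^(?<host>\S+) \s+ (?<ident>\S+) \s+ (?<user>\S+) \s+
    \[(?<time>[^\]]+)\] \s+
    "(?<method>[A-Z]+) \s+ (?<path>\S+) \s+ (?<protocol>[^"]+)" \s+
    (?<status>\d{3}) \s+ (?<bytes>\d+|-)
}x;

sub parse_clf {
    my ($line) = @_;
    return unless $line =~ $CLF_RE;      # not a CLF line
    my %f = %+;                          # copy the named captures
    $f{bytes} = 0 if $f{bytes} eq '-';   # "-" means no body was sent
    return \%f;
}

# made-up sample line, for illustration only
my $rec = parse_clf(
    q{10.0.0.1 - - [19/May/1998:10:01:02 -0700] "GET /index.html HTTP/1.0" 200 2326});
printf "%s requested %s (%d bytes)\n", @$rec{qw(host path bytes)} if $rec;
```

Once each line is a hash of fields, tallying by host, URL, or date is ordinary hash bookkeeping.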

20.13.3 Discussion

Example 20-9 is a sample report generator for Apache access logs in the Common Log Format (CLF).

Example 20-9. sumwww
  #!/usr/bin/perl -w
  # sumwww - summarize web server log activity
  
  $lastdate = "";
  $posts = $count = $bytesum = $home = 0;
  daily_logs( );
  summary( );
  exit;
  
  # read CLF files and tally hits from each host and to each URL
  sub daily_logs {
      while (<>) {
          ($type, $what) = /"(GET|POST)\s+(\S+?) \S+"/ or next;
          ($host, undef, undef, $datetime) = split;
          ($bytes) = /\s(\d+)\s*$/ or next;
          ($date)  = ($datetime =~ /\[([^:]*)/);
          # on a date change, report the previous day *before* tallying
          # this line, so that this line is counted under its own day
          if ($date ne $lastdate) {
              if ($lastdate) { write_report( )     }
              else           { $lastdate = $date  }
          }
          $posts  += ($type eq 'POST');
          $home++ if m{ / };              # a request for "/" itself
          $count++;
          $hosts{$host}++;
          $what{$what}++;
          $bytesum += $bytes;
      }
      write_report( ) if $count;
  }
  
  # use typeglob aliasing of global variables for a cheap copy:
  # after *count = *sumcount, $count and $sumcount are the same
  # variable, so the STDOUT format prints the grand totals
  sub summary  {
      $lastdate = "Grand Total";
      *count   = *sumcount;
      *bytesum = *bytesumsum;
      *hosts   = *allhosts;
      *posts   = *allposts;
      *what    = *allwhat;
      *home    = *allhome;
      write;
  }
  
  # display the tallies of hosts and URLs, using formats
  sub write_report {
      write;
  
      # add to summary data
      $lastdate    = $date;
      $sumcount   += $count;
      $bytesumsum += $bytesum;
      $allposts   += $posts;
      $allhome    += $home;
  
      # reset daily data, remembering every host and URL seen so far
      $posts = $count = $bytesum = $home = 0;
      @allwhat{keys %what}   = keys %what;
      @allhosts{keys %hosts} = keys %hosts;
      %hosts = %what = ( );
  }
  
  format STDOUT_TOP =
  @|||||||||| @|||||| @||||||| @||||||| @|||||| @|||||| @|||||||||||||
  "Date",     "Hosts", "Accesses", "Unidocs", "POST", "Home", "Bytes"
  ----------- ------- -------- -------- ------- ------- --------------
  .
  
  # an argument list spanning several lines must be enclosed in braces
  format STDOUT =
  @>>>>>>>>>> @>>>>>> @>>>>>>> @>>>>>>> @>>>>>> @>>>>>> @>>>>>>>>>>>>>
  { $lastdate, scalar(keys %hosts),
    $count,    scalar(keys %what),
    $posts,    $home,   $bytesum }
  .
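The summary subroutine works because assigning one typeglob to another turns every package variable of the first name ($x, @x, %x) into an alias for the same-named variable of the second. This stand-alone sketch reuses two of sumwww's variable names to show that no data is copied. Both names now refer to the very same variable:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: typeglob aliasing, as used by sumwww's summary().
our ($count, $sumcount, %hosts, %allhosts);

$sumcount = 42;
%allhosts = (a => 1, b => 1, c => 1);

*count = *sumcount;   # $count is now another name for $sumcount
*hosts = *allhosts;   # %hosts is now another name for %allhosts

print "count=$count hosts=", scalar(keys %hosts), "\n";   # count=42 hosts=3

$count++;             # also changes $sumcount: they are one variable
print "sumcount=$sumcount\n";                              # sumcount=43
```

The format was compiled against $count and %hosts. After the aliasing it therefore prints the grand totals without any change to the format.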

Here's sample output from that program:

     Date      Hosts  Accesses Unidocs   POST    Home       Bytes
  ----------- ------- -------- -------- ------- ------- --------------
  19/May/1998     353     6447     3074     352      51       16058246
  20/May/1998    1938    23868     4288     972     350       61879643
  21/May/1998    1775    27872     6596    1064     376       64613798
  22/May/1998    1680    21402     4467     735     285       52437374
  23/May/1998    1128    21260     4944     592     186       55623059
  Grand Total    6050   100849    10090    3715    1248      250612120
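sumwww stores its tallies in global variables, and the typeglob trick in summary depends on that. If you would rather print with printf, you can keep each day's tallies and the grand total in hashes instead, along the lines of the sketch below. The tally_days and print_row names are our own. Like sumwww, the sketch assumes Common Log Format lines. Because it files each line under its own date key, it does not need the log to be sorted by date:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: sumwww's per-day tallies kept in hashes instead of globals.
# tally_days() returns the per-day tallies, the grand total, and the
# dates in the order they were first seen.
sub tally_days {
    my (%day, %total, @order);
    for (@_) {
        my ($type, $what) = /"(GET|POST)\s+(\S+?) \S+"/ or next;
        my ($host, undef, undef, $datetime) = split;
        my ($bytes) = /\s(\d+)\s*$/ or next;
        my ($date)  = $datetime =~ /\[([^:]*)/ or next;
        push @order, $date unless exists $day{$date};
        # update this day's tally and the grand total the same way
        for my $t ($day{$date} ||= {}, \%total) {
            $t->{hosts}{$host}++;
            $t->{what}{$what}++;
            $t->{count}++;
            $t->{posts} += $type eq 'POST' ? 1 : 0;
            $t->{home}  += $what eq '/'    ? 1 : 0;
            $t->{bytes} += $bytes;
        }
    }
    return (\%day, \%total, \@order);
}

# print one row in the same column layout as sumwww's format
sub print_row {
    my ($label, $t) = @_;
    printf "%11s %7d %8d %8d %7d %7d %14d\n", $label,
        scalar(keys %{ $t->{hosts} }), $t->{count},
        scalar(keys %{ $t->{what} }),  $t->{posts}, $t->{home}, $t->{bytes};
}

# made-up sample lines, for illustration only
my @sample = (
    q{a.example.com - - [19/May/1998:10:00:00 -0700] "GET / HTTP/1.0" 200 1200},
    q{b.example.edu - - [19/May/1998:11:00:00 -0700] "POST /cgi-bin/form HTTP/1.0" 200 300},
    q{a.example.com - - [20/May/1998:09:00:00 -0700] "GET /docs/ HTTP/1.0" 200 4500},
);
my ($day, $total, $order) = tally_days(@sample);
print_row($_, $day->{$_}) for @$order;
print_row('Grand Total', $total);
```

The grand total is updated in the same loop as each day's tally, so no copying or aliasing is needed at the end.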

Example 20-10 uses the Logfile::Apache module from CPAN to produce a similar, but more general, report. This module is distributed with the other Logfile modules in a single Logfile distribution (Logfile-0.115.tar.gz at the time of this writing).

Example 20-10. aprept
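  #!/usr/bin/perl -w
  # aprept - report on Apache logs
  
  use strict;
  use Logfile::Apache;
  
  my $l = Logfile::Apache->new(
      File  => "-",                          # read the log from STDIN
      Group => [ 'Domain', 'File' ]);        # indices to build
  
  # hits per domain, sorted by number of records
  $l->report(Group => 'Domain', Sort => 'Records');
  # bytes and hits for each file requested
  $l->report(Group => 'File',   List => [ 'Bytes', 'Records' ]);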

The new constructor reads a log file and builds its indices internally. Supply the filename with the File parameter and the fields to index with the Group parameter. The possible fields are Date (date of the request), Hour (hour of the day the request was received), File (file requested), User (username parsed from the request), Host (hostname of the client requesting the document), and Domain (Host translated into a name such as "France" or "Germany").
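The Domain index groups clients by where their hostnames point. The following is only a guess at the idea and does not use Logfile::Apache's own lookup table. It maps a hostname to a label by its top-level domain and gives numeric (unresolved) addresses a label of their own. The domain_label function, the %TLD_LABEL table, and the 'Unknown' fallback are all hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustration only: NOT Logfile::Apache's table. Map a client hostname
# to a coarse label by its top-level domain.
my %TLD_LABEL = (
    com => 'US Commercial',
    edu => 'US Educational',
    net => 'Network',
    au  => 'Australia',
    ca  => 'Canada',
    fr  => 'France',
    de  => 'Germany',
    uk  => 'United Kingdom',
);

sub domain_label {
    my ($host) = @_;
    # a dotted-quad address means the hostname was never resolved
    return 'Unresolved' if $host =~ /^\d{1,3}(?:\.\d{1,3}){3}$/;
    my ($tld) = $host =~ /\.([A-Za-z]+)$/ or return 'Unknown';
    return $TLD_LABEL{ lc $tld } // 'Unknown';   # hypothetical fallback
}

# made-up hostnames, for illustration only
my %count;
$count{ domain_label($_) }++
    for qw(www.example.com cs.example.edu 192.0.2.7 ftp.example.de);
printf "%-16s %d\n", $_, $count{$_} for sort keys %count;
```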

To produce a report on STDOUT, call the report method. Name the index to use with the Group parameter. Optionally, use Sort to choose the ordering (Records sorts by number of hits, Bytes by number of bytes transferred) and List to choose which columns to show (Bytes, Records, or both).
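To see what a grouped report computes, the following is a hand-rolled sketch, not the module's code. It counts records (hits) and bytes per key, sorts the keys by record count, and expresses each figure as a percentage of the total. The group_report function and its row layout are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Logfile::Apache: tally records (hits)
# and bytes per key and return one row per key, sorted by record count
# (ties broken alphabetically). Each row is
#   [ key, bytes, bytes %, records, records % ].
sub group_report {
    my @hits = @_;                       # each hit is [ key, bytes ]
    my (%records, %bytes);
    my ($total_records, $total_bytes) = (0, 0);
    for my $hit (@hits) {
        my ($key, $size) = @$hit;
        $records{$key}++;
        $bytes{$key}  += $size;
        $total_records++;
        $total_bytes  += $size;
    }
    return map {
        [ $_,
          $bytes{$_},   $total_bytes ? 100 * $bytes{$_} / $total_bytes : 0,
          $records{$_}, 100 * $records{$_} / $total_records ]
    } sort { $records{$b} <=> $records{$a} || $a cmp $b } keys %records;
}

# made-up requests, for illustration only
my @hits = (
    ['/deckmaster', 60000], ['/deckmaster', 40000],
    ['/',            5000], ['/cgi-bin/pickcards', 2500],
);
printf "%-20s %8s %7s %8s %7s\n", 'File', 'Bytes', '', 'Records', '';
printf "%-20s %8d %6.2f%% %8d %6.2f%%\n", @$_ for group_report(@hits);
```

Sorting by bytes instead would only change the sort block to compare %bytes.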

Here's some sample output:

  Domain                  Records
  ===============================
  US Commercial        222 38.47%
  US Educational       115 19.93%
  Network               93 16.12%
  Unresolved            54  9.36%
  Australia             48  8.32%
  Canada                20  3.47%
  Mexico                 8  1.39%
  United Kingdom         6  1.04%
  
  File                               Bytes          Records
  =========================================================
  /                           13008  0.89%         6  1.04%
  /cgi-bin/MxScreen           11870  0.81%         2  0.35%
  /cgi-bin/pickcards          39431  2.70%        48  8.32%
  /deckmaster                143793  9.83%        21  3.64%
  /deckmaster/admin           54447  3.72%         3  0.52%

20.13.4 See Also

The documentation for the CPAN module Logfile::Apache; perlform(1) and Chapter 7 of Programming Perl