25.5 Rules Check Header Contents

Recall that a header line declaration looks like the following:

H?flags?name:field   

Here, the H begins the line and tells sendmail that a header definition follows. The ?flags? expression causes sendmail to include the header only if one of the flags is found in the selected delivery agent's F= equate. As you saw in the previous section, beginning with V8.10, a macro name can replace the flags. The name and a colon then follow.

Beginning with V8.10, sendmail allows the name of a rule set to replace the field value. That rule set declaration can come in two forms:

Hname: $>ruleset
Hname: $>+ruleset        <- don't strip comments

Both forms basically say the same thing: if sendmail finds a header name already in a message it is processing, it passes the existing header field to the rule set indicated. The + in the second form tells sendmail to leave intact (not strip) parenthesized RFC2822 comments from the passed field:

text (comments)

The $> in the earlier declaration passes just text to the rule set, while $>+ passes the unstripped text with RFC2822 comments intact.

If the rule set specified is not a legal rule set name, or if it is missing, the following error will be printed and logged:

cf file name: line number: invalid rule set name: "bad name" 

If the named rule set does not exist in the configuration file, the effect is the same as if it did exist and had returned a legal value.

Rule sets called to process headers can return two possible rejection values, a $#error or a $#discard. If a $#error is returned, the entire message is rejected. If a $#discard is returned, the message is accepted, then silently discarded. If anything else is returned, the message and that header are both allowed. To illustrate, consider the following code, which rejects spam messages addressed with a To: header that contains unwanted usernames:

LOCAL_CONFIG
C{SpamUserNames} investor adult friend you ValuedCustomer Valued-Customer
HTo: $>ScreenTo

LOCAL_RULESETS
SScreenTo
R $* $={SpamUserNames} @ $*      $#error $: "553 To: header rejected"
R $*                             $: OK

In the LOCAL_CONFIG part of your mc file, the line beginning with C declares a class and assigns values to that class. The class name is {SpamUserNames} and the class contains as its values six usernames that commonly appear as the user part of addresses in the To: header.

The line beginning with H declares a To: header and a rule set to handle that header. The $> tells sendmail to strip parenthesized RFC2822 comments from the address that followed the To: in the message, and to pass that stripped address to the ScreenTo rule set.

The LOCAL_RULESETS part of this mc file contains a single rule set, the ScreenTo rule set, which contains two rules. The first rule asks if the address in the workspace has a user part that matches any of the names listed in the class $={SpamUserNames}. If the address contains an objectionable username, the entire message is rejected by returning the error delivery agent with the expression $#error.

The last rule (the $*) causes all other addresses to return OK. Technically, the last rule is not needed because, even in its absence, the original workspace will be returned, and because that original workspace will contain neither $#error nor $#discard, the message will be allowed.

The $: part following the $#error is required. It tells sendmail how to reject the message. See the description of the error delivery agent for how this process works.
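
The same screening can silently discard a message instead of rejecting it. The following is a minimal sketch, not a tested configuration. The class {DropUserNames} and the rule set DiscardTo are hypothetical names chosen for illustration, the text after the $: serves only as a placeholder, and the HTo: line would take the place of the ScreenTo declaration shown earlier. As in any rule, the LHS and RHS must be separated by tab characters:

LOCAL_CONFIG
C{DropUserNames} investor ValuedCustomer
HTo: $>DiscardTo

LOCAL_RULESETS
SDiscardTo
# Accept, then silently drop, mail whose To: user part is in the class.
R $* $={DropUserNames} @ $*	$#discard $: discard

Because a $#discard gives the sender no indication that the message was dropped, a $#error is usually the safer choice while a new rule is still being evaluated.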

25.5.1 Use $>+ to Include RFC2822 Comments

Some headers contain addresses along with other important information that appears as RFC2822 commentary. The Received: header is one such header. In the following example, the first item of RFC2822 commentary is the parenthesized text that follows some.other.domain:
                                                                        
 Received: from some.other.domain (root@some.other.domain [29.22.14.17])
         by your.domain (8.12.4/8.12.4) with ESMTP id g5CMW6KF010979
         for <you@your.domain>; Wed, 12 Jun 2002 16:32:09 -0600 (MDT)

Other headers, such as the Subject: header, do not contain addresses:

Subject: Make money now (Adult Triple-X web site)

When screening such headers, it is important that they are not interpreted as addresses, or information might be lost.

Consider the previous Subject: header's value. If such a header were screened with an H configuration file line like this:

HSubject: $>ScreenSubject

the rule set named ScreenSubject would be given the following value to parse:

Make money now

Beginning with V8.10, sendmail offers the $>+ operator to prevent parenthetical RFC2822 comments from being stripped out of headers that do not contain addresses as values:

HSubject: $>+ScreenSubject
            ^
            note

By using this new operator, the original subject is passed to the ScreenSubject rule set in a form that is much more intact:

Make money now(Adult Triple-X web site)

Note that because of the way sendmail splits up addresses and pastes them back together, the space between the now and the ( has been lost. But this does not matter because of the way rule matching operates.

As a side benefit, the ${currHeader} sendmail macro is filled with the header's value, and so will contain the original header value unchanged and quoted. The fact that it is quoted is important because quoting prevents the value from being viewed by sendmail as tokens.

Consider the need to screen out messages that contain the text Adult Triple-X anywhere in the Subject: header:

LOCAL_CONFIG
KRegxxx regex -a@MATCH Adult Triple-X
HSubject: $>+ScreenSubject

LOCAL_RULESETS
SScreenSubject
R$*           $: $( Regxxx $&{currHeader} $)
R@MATCH       $#error $@ 5.7.0 $: "553 pornographic subject"

Here, the LOCAL_CONFIG part of this mc file contains two configuration commands. The first creates a regular expression database map (regex) called Regxxx. It says to return (the -a) the value @MATCH if the value looked up contains the text Adult Triple-X surrounded by any other text.

The second declares a header with the H configuration command. This tells sendmail to pass the value of all Subject: headers to the rule set named ScreenSubject. The addition of the + to the $> prevents sendmail from stripping RFC2822 parenthetical comments from the value.

The LOCAL_RULESETS part of this mc file contains a single rule set, the ScreenSubject rule set, which contains two rules. The first rule looks up the unaltered Subject:'s value in the ${currHeader} sendmail macro using the Regxxx database map. If the value in the ${currHeader} macro contains the text Adult Triple-X anywhere in it, the first rule returns the new workspace value @MATCH. If the text Adult Triple-X is not found, the value of the ${currHeader} macro is returned as the workspace.

The second rule looks for a match by detecting a workspace that contains only @MATCH. If there is a match, the message is rejected with the error message "553 pornographic subject".

25.5.1.1 Check the header's length

Sometimes it can be desirable to reject headers based on their length. As we described in the previous section, when a header is screened with $> or $>+, the unaltered value of the header is stored in the ${currHeader} macro. At the same time, the length of the header's value is also stored in the ${hdrlen} macro.

To illustrate one possible use for this macro, consider the following abstract from your mc file:

LOCAL_CONFIG
Kcompute arith              <- V8.10 and above
HSubject: $>ScreenSubject

LOCAL_RULESETS
SScreenSubject
R$*           $: $(compute l $@ 200 $@ $&{hdrlen} $)
RTRUE         $#error $@ 5.7.0 $: "553 Subject too long"

The LOCAL_CONFIG part of this mc file contains two configuration commands. The first declares an arith database map (arith) named compute. The second tells sendmail to screen all Subject: headers using the ScreenSubject rule set.

The LOCAL_RULESETS part of this mc file contains a single rule set, the ScreenSubject rule set, which has two rules. The first rule uses the compute database map to compare the value in the ${hdrlen} macro with the constant 200. The l asks if 200 is less than the value in ${hdrlen}. If it is, this rule will return TRUE in the workspace. Otherwise, it will return FALSE.

The second rule says that if the first rule returned TRUE (200 is less than the header's length, that is, the header's length is greater than 200), reject the message.

25.5.2 H* a Default for All Headers

The previous two sections have shown it is possible to screen specific headers for properties to accept or reject. There will be times, however, when you might wish to screen all headers that do not have their own rule sets. Using an * in place of the header name provides just such a mechanism:

H*: $>ScreenAll

The * tells sendmail to pass all headers, except those that have their own H configuration line rule set, to the ScreenAll rule set. Use $>+ instead of $> if you want to prevent sendmail from stripping RFC2822 parenthetical comments from each header's value.
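
As one possible use of the wildcard form, the following sketch rejects any header, other than those with their own rule sets, whose value has a length greater than 1000. It is an illustration under assumptions rather than a recommended policy: the limit of 1000 is arbitrary, ScreenAll is the placeholder name used above, and the ${hdrlen} macro and the arith database map are the ones described in the previous section:

LOCAL_CONFIG
Kcompute arith
H*: $>+ScreenAll

LOCAL_RULESETS
SScreenAll
# TRUE if 1000 is less than the length of the current header value.
R $*	$: $(compute l $@ 1000 $@ $&{hdrlen} $)
R TRUE	$#error $@ 5.7.0 $: "553 Header too long"

A limit this low could reject legitimate mail, so a real site would choose its own threshold.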

Consider a site that sends email only to mailing lists. On such a site, it is desirable to prevent mail that is considered spam from going out. One way to do this is to reject all mail that contains addresses that are either in Cc: or Bcc: headers (good addresses should only be in To: headers). Such a site might have an mc file that contains the following:

LOCAL_CONFIG
C{BannedRecipientHeaders} Cc Bcc
H*:     $>CheckBanned

LOCAL_RULESETS
SCheckBanned
R $*                              $: $&{hdr_name}
R $={BannedRecipientHeaders}      $#error $@ 5.7.0 $: "553 Banned recipient header"

The LOCAL_CONFIG part of this mc file contains two configuration commands. The first declares a class called {BannedRecipientHeaders} and assigns to that class a list of header names that should be banned, those being the Cc: and Bcc: headers with the colon removed.

The second configuration command starts with the wildcard form of the H configuration command. The * in place of a header's name causes all headers, other than those that have their own H configuration commands, to be screened by the CheckBanned rule set.

The LOCAL_RULESETS part of this mc file contains a single rule set, the CheckBanned rule set, which contains two rules. The first rule simply replaces the workspace with the value in the ${hdr_name} sendmail macro. That macro contains as its current value the name of the header passed to this rule set.

The second rule checks, on its LHS, to see if the header name is one of those listed in the class $={BannedRecipientHeaders}. If the header is found, the entire message is rejected.

Note that this example will also reject inbound mail that contains Cc: or Bcc: headers. A better design would include a test to be sure the message originated from the local machine.
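
One way to restrict the check to locally originated mail is to look at the ${client_addr} macro, which holds the IP address of the connecting SMTP client. The following sketch assumes that local origin is indicated by an empty ${client_addr} (mail submitted from the command line) or by the loopback address 127.0.0.1. That assumption, and the omission of the IPv6 loopback address, should be verified for your own site before use:

LOCAL_RULESETS
SCheckBanned
# Pair the client address with the name of the current header.
R $*	$: < $&{client_addr} > $&{hdr_name}
# Reject only when the mail has no client address or came from loopback.
R < > $={BannedRecipientHeaders}	$#error $@ 5.7.0 $: "553 Banned recipient header"
R < 127.0.0.1 > $={BannedRecipientHeaders}	$#error $@ 5.7.0 $: "553 Banned recipient header"

Mail from any other address matches neither rule, so the workspace is returned unchanged and the message is allowed.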

25.5.3 The check_eoh Rule Set

After all headers have been processed by sendmail, a couple of statistics become available that can be of use in screening messages. One is the number of headers found. The other is the total number of bytes in all the headers (including the names, colons, whitespace, and values). If you should ever need this information, you can process it by declaring a special rule set named check_eoh. If that rule set exists, it will be passed the number of headers, and the total number of bytes in all the headers:

number of headers  $|   total bytes 

If it exists, sendmail will call the check_eoh rule set after all headers have otherwise been processed.

Some users have been known to bury information in headers that should not leave a security-conscious site. Clearly, it is not possible to individually screen all possible headers. Instead, one approach might simply be to reject messages that contain more than 25 headers or more than 10000 bytes of headers. The following extract from a site's mc file does just that:

LOCAL_CONFIG
Kcompute arith

LOCAL_RULESETS
Scheck_eoh
R $* $| $*           $: $(compute l $@ 25 $@ $1 $) $| $2
R TRUE $| $*         $#error $@ 5.7.0 $: "553 Too many headers"
R $* $| $*           $: $(compute l $@ 10000 $@ $2 $)
R TRUE               $#error $@ 5.7.0 $: "553 Too many header bytes"

The LOCAL_CONFIG part of this mc file declares an arith database map (arith) named compute.

The LOCAL_RULESETS part of this mc file declares the specially named rule set check_eoh, which has four rules.

The first rule passes $1, the value to the left of the $| in the workspace, to the compute database map. A comparison is made to see if 25 is less than that value. If it is, this rule will return TRUE, followed by a $| and $2, in the workspace. Otherwise, it will return FALSE, followed by a $| and $2.

The second rule checks to see if the comparison was true. If it was (if 25 is less than the number of headers, that is, if the number of headers is greater than 25), the message is rejected.

The third rule passes the value to the right of the $| in the workspace to the compute database map. A comparison is made to see if 10000 is less than that value, that is, less than the total number of bytes in all the headers. If it is, this rule will return TRUE. Otherwise, it will return FALSE.

The fourth rule checks to see if the comparison was true. If it was (if 10000 is less than the number of bytes, that is, if the number of bytes is greater than 10000), the message is rejected.

Note that this example could wrongly reject inbound mail. A better design would include a test to be sure the message originated from the local network.

25.5.3.1 Check for missing headers

The check_eoh rule set can also be used to detect missing headers. Although the Message-Id: header is not mandatory, its absence often indicates that a message is spam.[5] The following excerpt from an mc file shows one way to detect a missing header, and to reject a message based on that absence:

[5] But be aware that header checks are also performed for command-line submitted mail. If a program such as cron(8) or lpd generates mail lacking a Message-Id: header, that mail will also be rejected. So avoid placing rules such as these in your submit.cf file.

LOCAL_CONFIG
Kstorage macro
HMessage-Id: $>ScreenMessageId

LOCAL_RULESETS
SScreenMessageId
R $*                     $: $(storage {GotMessageId} $@ YES $) $1

Scheck_eoh
R $*                     $: < $&{GotMessageId} >
R $*                     $: $(storage {GotMessageId} $) $1
R < YES >                $@ OK
R < >                    $#error $@ 5.7.0 $: 553 Missing Header

The LOCAL_CONFIG part of this mc file contains two configuration commands. The first declares a macro-type database map (macro) which is used to store a value into a sendmail macro via a rule set. The second configuration command causes the Message-Id: header to be screened by the ScreenMessageId rule set.

The LOCAL_RULESETS part of this mc file declares two rule sets. The ScreenMessageId rule set has a single rule which simply stores the literal value YES into the ${GotMessageId} macro. This means that the Message-Id: header was found.

The check_eoh rule set, which contains four rules, is called after all headers have been processed. The first rule fetches the current value (the $& prefix) found in the {GotMessageId} macro and places that value (surrounded by angle braces) into the workspace. If the {GotMessageId} macro lacks a value (if no Message-Id: header was found), the workspace will contain angle braces with nothing between them.

The second rule clears the value from the ${GotMessageId} macro so that it can be reused for the next message that is processed by sendmail.

The third rule looks for a literal <YES> in the workspace, which would appear if the Message-Id: header had been found, and causes the message to be accepted by returning a $@OK on the RHS.

The last rule looks for nothing between the angle braces, which means there was no Message-Id: header in the message. The $#error causes the message to be rejected with the error 553 5.7.0 Missing Header.

You probably should not use these rules as is, because email that originates internally might not have a Message-Id: header, and you will need to allow for such mail.
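
One way to allow for such mail is to also consult the ${client_addr} macro in check_eoh. The following sketch replaces the check_eoh rule set shown above and leaves the rest of that example unchanged. It assumes that internally originated mail can be recognized by an empty ${client_addr} or by the loopback address 127.0.0.1, which might not hold at your site:

LOCAL_RULESETS
Scheck_eoh
R $*	$: < $&{GotMessageId} > $&{client_addr}
R $*	$: $(storage {GotMessageId} $) $1
# A Message-Id: header was found.
R < YES > $*	$@ OK
# No Message-Id:, but the mail is treated as internal.
R < >	$@ OK
R < > 127.0.0.1	$@ OK
# No Message-Id: on mail from anywhere else.
R < > $*	$#error $@ 5.7.0 $: "553 Missing Header"

The rules that accept internal mail must come before the final rule, because the $* in that rule would otherwise match those workspaces too.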


