3.2 Preventing Attacks on Formatting Functions

3.2.1 Problem

You use functions such as printf( ) or syslog( ) in your program, and you want to ensure that you use them in such a way that an attacker cannot coerce them into behaving in ways that you do not intend.

3.2.2 Solution

Functions in the printf( ) family provide a flexible and powerful way to format data. Unfortunately, they can also be extremely dangerous. Following the guidelines in Section 3.2.3 will help you avoid many of the problems with these functions.

3.2.3 Discussion

The printf( ) family of functions (and other functions that use them, such as syslog( ) on Unix systems) all require an argument that specifies a format, as well as a variable number of additional arguments that are substituted at various locations in the format string to produce formatted output. The functions come in two major varieties:

  • Those that output to a file (printf( ) outputs to stdout)

  • Those that output to a string

Both can be dangerous, but the latter variety is significantly more so.

The format string is copied, character by character, until a percent (%) symbol is encountered. The characters that immediately follow the percent symbol determine what will be output in their place. For each substitution in the format string, the next argument in the variable argument list is used. Because of the way that variable-sized argument lists work in C (see Recipe 13.4), the functions have no way to verify how many arguments were actually passed; they simply assume that the number of arguments present in the argument list is equal to the number of substitutions required by the format string. The GCC compiler in particular will recognize calls to the functions in the printf( ) family, and, when format checking is enabled (for example, with -Wall or -Wformat), it will emit warnings if it detects data type mismatches or an incorrect number of arguments in the variable argument list. These checks generally require the format string to be a literal that the compiler can inspect.
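GCC's format checking can also be requested for your own printf-style wrappers. The following is a minimal sketch, assuming a GCC-compatible compiler; the PRINTF_LIKE macro and the log_message( ) function are hypothetical names invented for this illustration. On other compilers the macro expands to nothing and the function still works, but without the compile-time checks.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical macro: on GCC-compatible compilers, ask for printf-style
 * checking of the format argument; elsewhere, expand to nothing. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Argument 3 is the format string; the variable arguments start at 4. */
int log_message(char *dst, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

/* Hypothetical wrapper: formats into dst and returns vsnprintf()'s result. */
int log_message(char *dst, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(dst != NULL && size > 0 && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(dst, size, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place and format warnings enabled, a call such as log_message(buf, sizeof(buf), "%d %s", 1) should draw a warning about the missing argument.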

If you adhere to the following guidelines when using the printf( ) family of functions, you can be reasonably certain that you are using the functions safely:

Beware of the "%n" substitution.

All but one of the substitutions recognized by the printf( ) family of functions use arguments from the variable argument list as data to be substituted into the output. The lone exception is "%n", which writes the number of bytes written to the output buffer or file into the memory location pointed to by the next argument in the argument list.
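A benign use of "%n" makes its behavior concrete. This is only a sketch: bytes_before_n( ) is a hypothetical helper, and the format string is a literal so that only the data is externally supplied. Some C libraries restrict or disable "%n", so the call may not behave this way everywhere.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats "<word> tail" into buf and returns the value
 * that "%n" stored, i.e. the number of bytes produced before the "%n". */
int bytes_before_n(char *buf, size_t size, const char *word)
{
    int count = -1;

    assert(buf != NULL && size > 0 && word != NULL);
    /* "%n" prints nothing; it writes the running byte count into count. */
    snprintf(buf, size, "%s%n tail", word, &count);
    return count;
}
```

Note that the formatting function writes through the pointer it is given. That is harmless here because the pointer really is the address of count.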

While the "%n" substitution has its place, few programmers are aware of it and its implications. In particular, if external input is used for the format string, an attacker can embed a "%n" substitution into the format string to overwrite portions of the stack. The real problem occurs when all of the arguments in the variable argument list have been exhausted. Because arguments are typically passed on the stack in C, the formatting function keeps reading past the real arguments, treats whatever value it finds there as the pointer for "%n", and writes through it.

To combat malicious uses of "%n", Immunix has produced a set of patches for glibc 2.2 (the standard C runtime library for Linux) known as FormatGuard. The patches take advantage of a GCC compiler extension that allows the preprocessor to distinguish between macros having the same name, but different numbers of arguments. FormatGuard essentially consists of a large set of macros for the syslog( ), printf( ), fprintf( ), sprintf( ), and snprintf( ) functions; the macros call safe versions of the respective functions. The safe functions count the number of substitutions in the format string, and ensure that the proper number of arguments has been supplied.

Do not use a string from an external source directly as the format specification.

Strings obtained from an external source may contain unexpected percent symbols, causing the formatting function to attempt to substitute arguments that do not exist. If you simply need to output the string str (to stdout using printf( ), for example), do the following:

printf("%s", str);
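The same rule applies when formatting into a buffer. In the sketch below, format_untrusted( ) is a hypothetical helper; because the external string is passed as data for "%s" rather than as the format, any percent symbols it contains are copied verbatim instead of being interpreted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies an externally supplied string into dst.
 * The format is a fixed literal, so substitutions inside str are inert.
 * Returns the length the full output requires, as snprintf() does. */
int format_untrusted(char *dst, size_t size, const char *str)
{
    assert(dst != NULL && size > 0 && str != NULL);
    return snprintf(dst, size, "%s", str);
}
```

Passing a hostile-looking string such as "100%n%s%x" produces exactly those nine characters in dst.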

Following this rule to the letter is not always desirable. In particular, your program may need to obtain format strings from a data file as a consequence of internationalization requirements. The format strings will vary to some extent depending on the language in use, but they should always have identical substitutions.
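One way to enforce the "identical substitutions" requirement is to compare a format string loaded from a data file against a trusted one compiled into the program. The following is a simplified sketch, not a complete printf( ) parser: same_substitutions( ) and next_signature( ) are hypothetical helpers, positional arguments such as "%1$s" are not handled, and any "%n" in the candidate is rejected outright.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the next conversion in *p and store its "signature" in sig: one '*'
 * per star field, then the length modifier, then the conversion character.
 * Flags, width, and precision digits are ignored because they do not change
 * which arguments are consumed. Returns 0 when no conversions remain. */
static int next_signature(const char **p, char sig[8])
{
    const char *s = *p;
    size_t n = 0;

    while ((s = strchr(s, '%')) != NULL) {
        s++;
        if (*s == '%') {            /* "%%" consumes no argument */
            s++;
            continue;
        }
        for (; *s && strchr("-+ #0123456789.*", *s); s++)
            if (*s == '*' && n < 6)
                sig[n++] = '*';
        for (; *s && strchr("hlLjzt", *s); s++)
            if (n < 6)
                sig[n++] = *s;
        if (*s)
            sig[n++] = *s++;        /* conversion character */
        sig[n] = '\0';
        *p = s;
        return 1;
    }
    return 0;
}

/* Returns 1 if candidate consumes the same arguments, in the same order,
 * as trusted; returns 0 otherwise or if candidate contains "%n". */
int same_substitutions(const char *trusted, const char *candidate)
{
    char a[8], b[8];
    int more_a, more_b;

    assert(trusted != NULL && candidate != NULL);
    for (;;) {
        more_a = next_signature(&trusted, a);
        more_b = next_signature(&candidate, b);
        if (!more_a || !more_b)
            return more_a == more_b;    /* both must run out together */
        if (strcmp(a, b) != 0 || strchr(b, 'n') != NULL)
            return 0;
    }
}
```

A translated string would be passed to a formatting function only if same_substitutions( ) returns 1; otherwise the program can fall back to the built-in format string.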

When using vsprintf( ) or sprintf( ) to output to a string, be very careful of using the "%s" substitution without specifying a precision.

The vsprintf( ) and sprintf( ) functions both assume an infinite amount of space is available in the buffer into which they write their output. It is especially common to use these functions with a statically allocated output buffer. If a string substitution is made without specifying the precision, and that string comes from an external source, there is a good chance that an attacker may attempt to overflow the static buffer by forcing a string that is too long to be written into the output buffer. (See Recipe 3.3 for a discussion of buffer overflows.)

One solution is to check the length of the string to be substituted into the output before using it with vsprintf( ) or sprintf( ). Unfortunately, this solution is error-prone, especially later in your program's life when another programmer has to make a change to the size of the buffer or the format string, necessitating a change to the check.

A better solution is to use a precision modifier in the format string. For example, if no more than 12 characters from a string should ever be substituted into the output, use "%.12s" instead of simply "%s". The advantage to this solution is that it is part of the formatting function call; thus, it is less likely to be overlooked in the event of a later change to the format string.
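As a sketch of the precision approach, the hypothetical format_greeting( ) below limits the substituted string to 12 characters, which makes the worst-case output length known in advance. The macro names and the greeting text are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_MAX_SHOWN 12
/* sizeof("Hello, !") counts the fixed text plus the terminating NUL;
 * adding the precision gives the worst-case buffer size (21 bytes). */
#define GREETING_SIZE (sizeof("Hello, !") + NAME_MAX_SHOWN)

/* Hypothetical helper: out must hold at least GREETING_SIZE bytes.
 * "%.12s" copies at most 12 characters of name, however long name is. */
int format_greeting(char out[GREETING_SIZE], const char *name)
{
    assert(out != NULL && name != NULL);
    return sprintf(out, "Hello, %.12s!", name);
}
```

The precision in the format string and NAME_MAX_SHOWN must still agree, so a change to one requires a change to the other.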

Avoid using vsprintf( ) and sprintf( ). Use vsnprintf( ) and snprintf( ) or vasprintf( ) and asprintf( ) instead. Alternatively, use a secure string library such as SafeStr (see Recipe 3.4).

The functions vsprintf( ) and sprintf( ) assume that the buffer into which they write their output is large enough to hold it all. This is never a safe assumption to make and frequently leads to buffer overflow vulnerabilities. (See Recipe 3.3.)

The functions vasprintf( ) and asprintf( ) dynamically allocate a buffer to hold the formatted output that is exactly the required size. There are two problems with these functions, however. The first is that they're not portable. Most modern BSD derivatives (Darwin, FreeBSD, NetBSD, and OpenBSD) have them, as does Linux. Unfortunately, older Unix systems and Windows do not. The other problem is that they're slower because they need to make two passes over the format string, one to calculate the required buffer size, and the other to actually produce output in the allocated buffer.
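On a system that lacks asprintf( ), the two-pass idea can be approximated with the C99 vsnprintf( ) function. This is a sketch under that assumption, not the implementation any particular library uses; format_alloc( ) is a hypothetical name, and it relies on vsnprintf( ) accepting a zero-length buffer and returning the length the output requires, as C99 specifies.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc()ed string holding the formatted
 * output, or NULL on failure. The caller must free() the result. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    int needed;
    char *buf;

    assert(fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap2, ap);                       /* second pass needs its own copy */

    needed = vsnprintf(NULL, 0, fmt, ap);   /* pass 1: measure only */
    va_end(ap);
    if (needed < 0) {
        va_end(ap2);
        return NULL;
    }

    buf = malloc((size_t)needed + 1);       /* +1 for the terminating NUL */
    if (buf != NULL)
        vsnprintf(buf, (size_t)needed + 1, fmt, ap2);  /* pass 2: format */
    va_end(ap2);
    return buf;
}
```

As with asprintf( ), fmt should be a trusted format string; the helper addresses buffer sizing only.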

The functions vsnprintf( ) and snprintf( ) are just as fast as vsprintf( ) and sprintf( ), but like vasprintf( ) and asprintf( ), they are not yet portable. They are defined in the C99 standard, and they typically enjoy the same availability as vasprintf( ) and asprintf( ). They both require an additional argument that specifies the size of the output buffer, and they will never write more data into the buffer than will fit, including the terminating NUL character.
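Because snprintf( ) silently truncates output that does not fit, it is worth checking its return value. The sketch below assumes C99 semantics, in which the return value is the length the complete output would have required; format_login( ) and its message text are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats "login: <user>" into dst.
 * Returns 0 on success, or -1 if an error occurred or the output was
 * truncated. dst is NUL-terminated in the truncated case as well. */
int format_login(char *dst, size_t size, const char *user)
{
    int n;

    assert(dst != NULL && size > 0 && user != NULL);
    n = snprintf(dst, size, "login: %s", user);
    if (n < 0 || (size_t)n >= size)
        return -1;      /* encoding error, or output did not fit */
    return 0;
}
```

Treating truncation as an error is a design choice; a truncated string can be just as damaging as an overflow if, for example, it is later used as a filename or a command.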

3.2.4 See Also

  • FormatGuard from Immunix: http://www.immunix.org/formatguard.html

  • Recipe 3.3, Recipe 13.4