1.1 Sanitizing the Environment

1.1.1 Problem

Attackers can often control the value of important environment variables, sometimes even remotely?for example, in CGI scripts, where invocation data is passed through environment variables.

You need to make sure that an attacker does not set environment variables to malicious values.

1.1.2 Solution

Many programs and libraries, including the shared library loader on both Unix and Windows systems, depend on environment variable settings. Because environment variables are inherited from the parent process when a program is executed, an attacker can easily sabotage variables, causing your program to behave in an unexpected and even insecure manner.

Typically, Unix systems are considerably more dependent on environment variables than are Windows systems. In fact, the only scenario common to both Unix and Windows is that there is an environment variable defining the path that the system should search to find an executable or shared library (although differently named variables are used on each platform). On Windows, one environment variable controls the search path for finding both executables and shared libraries. On Unix, these are controlled by separate environment variables. Generally, you should not specify a filename and then rely on these variables for determining the full path. Instead, you should always use absolute paths to known locations.[1]

[1] Note that the shared library environment variable can be relatively benign on modern Unix-based operating systems, because the environment variable will get ignored when a program that can change permissions (i.e., a setuid program) is invoked. Nonetheless, it is better to be safe than sorry!

Certain variables expected to be present in the environment can cause insecure program behavior if they are missing or improperly set. Make sure, therefore, that you never fully purge the environment and leave it empty. Instead, variables that should exist should be forced to sane values or, at the very least, treated as highly suspect and examined closely before they're used. Remove any unknown variables from the environment altogether.

1.1.3 Discussion

The standard C runtime library defines a global variable,[2] environ, as a NULL-terminated array of strings, where each string in the array is of the form "name=value".

[2] The use of the term "variable" can quickly become confusing because C defines variables and the environment defines variables. In this recipe, when we are referring to a C variable, we simply say "variable," and when we are referring to an environment variable, we say "environment variable."

Most systems do not declare the variable in any standard header file, Linux being the notable exception, providing a declaration in unistd.h. You can gain access to the variable by including the following extern statement in your code:

extern char **environ;

Several functions defined in stdlib.h, such as getenv( ) and putenv( ), provide access to environment variables, and they all operate on this variable. You can therefore make changes to the contents of the array or even build a new array and assign it to the variable.

This variable also exists in the standard C runtime library on Windows; however, the C runtime on Windows is not as tightly bound to the operating system as it is on Unix. Directly manipulating the environ variable on Windows will not necessarily produce the same effects as it will on Unix; in the majority of Windows programs, the C runtime is never used at all, instead favoring the Win32 API to perform the same functions as those provided by the C runtime. Because of this, and because of Windows' lack of dependence on environment variables, we do not recommend using the code in this recipe on Windows. It simply does not apply. However, we do recommend that you at least skim the textual content of this recipe so that you're aware of potential pitfalls that could affect you on Windows.

On a Unix system, if you invoke the command printenv at a shell prompt, you'll likely see a sizable list of environment variables as a result. Many of the environment variables you will see are set by whichever shell you're using (i.e., bash or tcsh). You should never use nor trust any of the environment variables that are set by the shell. In addition, a malicious user may be able to set other environment variables.

In most cases, the information contained in the environment variables set by the shell can be determined by much more reliable means. For example, most shells set the HOME environment variable, which is intended to be the user's home directory. It's much more reliable to call getuid( ) to determine who the user is, and then call getpwuid( ) to get the user's password file record, which will contain the user's home directory. For example:

#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <pwd.h>
   
int main(int argc, char *argv[  ]) {
  uid_t         uid;
  struct passwd *pwd;
   
  uid = getuid(  );
  printf("User's UID is %d.\n", (int)uid);
  if (!(pwd = getpwuid(uid))) {
    printf("Unable to get user's password file record!\n");
    endpwent(  );
    return 1;
  }
  printf("User's home directory is %s\n", pwd->pw_dir);
  endpwent(  );
   
  return 0;
}

The code above is not thread-safe. Be sure multiple threads do not try to manipulate the password database at the same time.

In many cases, it is reasonably safe to throw away most of the environment variables that your program will inherit from its parent process, but you should make it a point to be aware of any environment variables that will be used by code you're using, including the operating system's dynamic loader and the standard C runtime library. In particular, dynamic loaders on ELF-based Unix systems (among the Unix variants we're explicitly supporting in this book, Darwin is the major exception here because it does not use ELF (Executable and Linking Format) for its executable format) and most standard implementations of malloc( ) all recognize a wide variety of environment variables that control their behavior.

In most cases, you should never be doing anything in your programs that will make use of the PATH environment variable. Circumstances do exist in which it may be reasonable to do so, but make sure to weigh your options carefully beforehand. Indeed, you should consider carefully whether you should be using any environment variable in your programs. Regardless, if you launch external programs from within your program, you may not have control over what the external programs do, so you should take care to provide any external programs you launch with a sane and secure environment.

In particular, the two environment variables IFS and PATH should always be forced to sane values. The IFS environment variable is somewhat obscure, but it is used by many shells to determine which character separates command-line arguments. Modern Unix shells use a reasonable default value for IFS if it is not already set. Nonetheless, you should defensively assume that the shell does nothing of the sort. Therefore, instead of simply deleting the IFS environment variable, set it to something sane, such as a space, tab, and newline character.

The PATH environment variable is used by the shell and some of the exec*( ) family of standard C functions to locate an executable if a path is not explicitly specified. The search path should never include relative paths, especially the current directory as denoted by a single period. To be safe, you should always force the setting of the PATH environment variable to _PATH_STDPATH, which is defined in paths.h. This value is what the shell normally uses to initialize the variable, but an attacker or naïve user could change it later. The definition of _PATH_STDPATH differs from platform to platform, so you should generally always use that value so that you get the right standard paths for the system your program is running on.

Finally, the TZ environment variable denotes the time zone that the program should use, when relevant. Because users may not be in the same time zone as the machine (which will use a default whenever the variable is not set), it is a good idea to preserve this variable, if present. Note also that this variable is generally used by the OS, not the application. If you're using it at the application level, make sure to do proper input validation to protect against problems such as buffer overflow.

Finally, a special environment variable,, is defined to be the time zone on many systems. All systems will use it if it is defined, but while most systems will get along fine without it, some systems will not function properly without its being set. Therefore, you should preserve it if it is present.

Any other environment variables that are defined should be removed unless you know, for some reason, that you need the variable to be set. For any environment variables you preserve, be sure to treat them as untrusted user input. You may be expecting them to be set to reasonable values?and in most cases, they probably will be?but never assume they are. If for some reason you're writing CGI code in C, the list of environment variables passed from the web server to your program can be somewhat large, but these are largely trustworthy unless an attacker somehow manages to wedge another program between the web server and your program.

Of particular interest among environment variables commonly passed from a web server to CGI scripts are any environment variables whose names begin with HTTP_ and those listed in Table 1-1.

Table 1-1. Environment variables commonly passed from web servers to CGI scripts

Environment variable name

Comments

AUTH_TYPE

If authentication was required to make the request, this contains the authentication type that was used, usually "BASIC".

CONTENT_LENGTH

The number of bytes of content, as specified by the client.

CONTENT_TYPE

The MIME type of the content sent by the client.

GATEWAY_INTERFACE

The version of the CGI specification with which the server complies.

PATH_INFO

Extra path information from the URL.

PATH_TRANSLATED

Extra path information from the URL, translated by the server.

QUERY_STRING

The portion of the URL following the question mark.

REMOTE_ADDR

The IP address of the remote client in dotted decimal form.

REMOTE_HOST

The host name of the remote client.

REMOTE_IDENT

If RFC1413 identification was used, this contains the user name that was retrieved from the remote identification server.

REMOTE_USER

If authentication was required to make the request, this contains the user name that was authenticated.

REQUEST_METHOD

The method used to make the current request, usually either "GET" or "POST".

SCRIPT_NAME

The name of the script that is running, canonicalized to the root of the web site's document tree (e.g., DocumentRoot in Apache).

SERVER_NAME

The host name or IP address of the server.

SERVER_PORT

The port on which the server is running.

SERVER_PROTOCOL

The protocol used to make the request, typically "HTTP/1.0" or "HTTP/1.1".

SERVER_SOFTWARE

The name and version of the server.

The code presented in this section defines a function called spc_sanitize_environment( ) that will build a new environment with the IFS and PATH environment variables set to sane values, and with the TZ environment variable preserved from the original environment if it is present. You can also specify a list of environment variables to preserve from the original in addition to the TZ environment variable.

The first thing that spc_sanitize_environment( ) does is determine how much memory it will need to allocate to build the new environment. If the memory it needs cannot be allocated, the function will call abort( ) to terminate the program immediately. Otherwise, it will then build the new environment and replace the old environ pointer with a pointer to the newly allocated one. Note that the memory is allocated in one chunk rather than in smaller pieces for the individual strings. While this is not strictly necessary (and it does not provide any specific security benefit), it's faster and places less strain on memory allocation. Note, however, that you should be performing this operation early in your program, so heap fragmentation shouldn't be much of an issue.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <paths.h>
   
extern char **environ;
   
/* These arrays are both NULL-terminated. */
static char *spc_restricted_environ[  ] = {
  "IFS= \t\n",
  "PATH=" _PATH_STDPATH,
  0
};
   
static char *spc_preserve_environ[  ] = {
  "TZ",
  0
};
   
void spc_sanitize_environment(int preservec, char **preservev) {
  int    i;
  char   **new_environ, *ptr, *value, *var;
  size_t arr_size = 1, arr_ptr = 0, len, new_size = 0;
   
  for (i = 0;  (var = spc_restricted_environ[i]) != 0;  i++) {
    new_size += strlen(var) + 1;
    arr_size++;
  }
  for (i = 0;  (var = spc_preserve_environ[i]) != 0;  i++) {
    if (!(value = getenv(var))) continue;
    new_size += strlen(var) + strlen(value) + 2; /* include the '=' */
    arr_size++;
  }
  if (preservec && preservev) {
    for (i = 0;  i < preservec && (var = preservev[i]) != 0;  i++) {
      if (!(value = getenv(var))) continue;
      new_size += strlen(var) + strlen(value) + 2; /* include the '=' */
      arr_size++;
    }
  }
   
  new_size += (arr_size * sizeof(char *));
  if (!(new_environ = (char **)malloc(new_size))) abort(  );
  new_environ[arr_size - 1] = 0;
   
  ptr = (char *)new_environ + (arr_size * sizeof(char *));
  for (i = 0;  (var = spc_restricted_environ[i]) != 0;  i++) {
    new_environ[arr_ptr++] = ptr;
    len = strlen(var);
    memcpy(ptr, var, len + 1);
    ptr += len + 1;
  }
  for (i = 0;  (var = spc_preserve_environ[i]) != 0;  i++) {
    if (!(value = getenv(var))) continue;
    new_environ[arr_ptr++] = ptr;
    len = strlen(var);
    memcpy(ptr, var, len);
    *(ptr + len + 1) = '=';
    memcpy(ptr + len + 2, value, strlen(value) + 1);
    ptr += len + strlen(value) + 2; /* include the '=' */
  }
  if (preservec && preservev) {
    for (i = 0;  i < preservec && (var = preservev[i]) != 0;  i++) {
      if (!(value = getenv(var))) continue;
      new_environ[arr_ptr++] = ptr;
      len = strlen(var);
      memcpy(ptr, var, len);
      *(ptr + len + 1) = '=';
      memcpy(ptr + len + 2, value, strlen(value) + 1);
      ptr += len + strlen(value) + 2; /* include the '=' */
    }
  }
   
  environ = new_environ;
}

1.1.4 See Also

Recipe 1.7, Recipe 1.8