5.3 Using kill to Control Processes

Linux and other Unix-like operating systems support a form of interprocess communication called signals. The kill command is used to send a signal to a running process. How a process responds to a signal, if it responds at all, depends on the specific signal sent and on the handler set by the process. If you are familiar with Unix signal handling, you will find that Apache adheres to the usual conventions, and you can probably skip this section. This section describes the use of kill in relation to Apache for readers who aren't accustomed to working with signals.

The name "kill" is a misnomer; it sounds as if the command is inherently destructive, but kill simply sends signals to programs. Many signals do terminate a process by default, but only if the process has not arranged to handle them: every signal except KILL and STOP can be caught by the process, which may then either perform a specific action or ignore the signal. A process in the zombie state has already exited and cannot respond to signals at all, and a process in an uninterruptible sleep state will not act on a signal, even KILL, until it wakes up.
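A minimal POSIX shell sketch shows that the reaction to a signal is up to the receiving process. The script below catches INT with its own action, ignores TERM, and then restores the default behavior for both. The variable caught is purely illustrative. No such handler can be installed for KILL or STOP.

```shell
#!/bin/sh
# Catch INT: run our own action instead of the default (terminate).
caught=no
trap 'caught=yes' INT
kill -s INT $$                # send INT to this shell itself
echo "after INT: caught=$caught"

# Ignore TERM: an empty action string tells the shell to ignore it.
trap '' TERM
kill -s TERM $$
echo "after TERM: still running"

# Restore the default disposition of both signals.
trap - INT TERM
```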

The following example will help dispel any fear of using this command. Most people who are familiar with the command line know that pressing Ctrl-C will usually terminate a process running in a console. For example, it is common to execute:

panic% tail -f /home/httpd/httpd_perl/logs/error_log

to monitor the Apache server's error_log file. The usual way to stop tail is to press Ctrl-C in the console in which the process is running. The same result can be achieved by sending the INT (interrupt) signal to this process. For example:

panic% kill -INT 17084

When this command is run, the tail process is aborted, assuming that the process identifier (PID) of the tail process is 17084.

Every process running in the system has its own PID. kill identifies processes by their PIDs. If kill were to use process names and there were two tail processes running, it might send the signal to the wrong process. The most common way to determine the PID of a process is to use ps to display information about the current processes on the machine. The arguments to this utility vary depending on the operating system. For example, on BSD-family systems, the following command works:

panic% ps auxc | grep tail

On a System V Unix flavor such as Solaris, the following command may be used instead:

panic% ps -eaf | grep tail

In the first part of the command, ps prints information about all the current processes. This is then piped to a grep command that prints lines containing the text "tail". Assuming only one such tail process is running, we get the following output:
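Matching with grep can also pick up unrelated lines whose text merely contains the name (a tailor process, say). With some ps options it can also match the grep command itself. The sketch below avoids both problems. It asks ps for just the PID and command-name columns, using the POSIX -o option, and compares the name exactly with awk. The function name pids_named is made up for this example.

```shell
#!/bin/sh
# pids_named NAME (hypothetical helper): read "PID COMMAND" lines on
# standard input and print the PID of every line whose command name is
# exactly NAME. Some systems report the command as a full path, so only
# the part after the last slash is compared.
pids_named() {
    awk -v name="$1" '{
        cmd = $2
        sub(/.*\//, "", cmd)      # strip any leading directories
        if (cmd == name) print $1
    }'
}

# Typical use; the trailing "=" signs suppress the column headers:
#   ps -e -o pid= -o comm= | pids_named tail
```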

root  17084  0.1  0.1  1112  408  pts/8  S  17:28  0:00  tail

The first column shows the username of the account running the process, the second column shows the PID, and the last column shows the name of the command. The other columns vary between operating systems.

Processes are free to catch or ignore almost all of the signals they receive, and some do. Let's run the less command on the same error_log file:

panic% less /home/httpd/httpd_perl/logs/error_log

Neither pressing Ctrl-C nor sending the INT signal will kill this process, because the implementers of less chose to catch that signal instead of letting it terminate the program. To quit less, type q.

Sometimes numerical signal values are used instead of their symbolic names. For example, 2 is normally the numeric equivalent of the symbolic name INT. Hence, these two commands are equivalent on Linux:

panic% kill -2 17084
panic% kill -INT 17084

On Solaris, the -s option is used when working with symbolic signal names:

panic% kill -s INT 17084

To find the numerical equivalents, either refer to the signal(7) manpage, or ask Perl to help you:

panic% perl -MConfig -e 'printf "%6s %2d\n", $_, $sig++ for split / /, $Config{sig_name}'
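The kill command built into the shell can answer the same question without Perl. kill -l on its own lists the signal names, and kill -l followed by a number prints the name for that number, without the SIG prefix. The wrapper below, whose name signame is made up, labels each lookup.

```shell
#!/bin/sh
# signame N... (hypothetical helper): print "N = NAME" for each signal
# number given, using the shell's built-in kill -l.
signame() {
    for n in "$@"; do
        printf '%s = %s\n' "$n" "$(kill -l "$n")"
    done
}

signame 2 9 15
# Prints 2 = INT, 9 = KILL and 15 = TERM, one per line.
```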

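If you want to send a signal to all processes with the same name, you can use pkill, which is available on Solaris and on most Linux systems, or killall on Linux. Be careful with killall on Solaris, though: there it does not select processes by name, but sends the signal to all active processes.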

5.3.1 kill Signals for Stopping and Restarting Apache

Apache performs certain actions in response to the TERM, HUP, and USR1 signals (as arguments to kill), and, like any other process, it is terminated unconditionally by the KILL signal. All Apache system administrators should be familiar with the use of these signals to control the Apache web server.

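By referring to the signal.h file, we learn the numerical equivalents of these signals. The excerpt below is from a BSD-derived system. HUP, KILL, and TERM have the same numbers on all common Unix systems, but some other signals do not. SIGUSR1, for example, is 10 on Linux/x86 and 16 on Solaris, so check your own system's headers (or the output of kill -l) before relying on numbers: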

#define SIGHUP     1    /* hangup, generated when terminal disconnects */
#define SIGKILL    9    /* last resort */
#define SIGTERM   15    /* software termination signal */
#define SIGUSR1   30    /* user defined signal 1 */

The four types of signal are:

KILL signal: forcefully shut down

The KILL (9) signal should never be used unless absolutely necessary, because it will unconditionally kill Apache, without allowing it to clean up properly. For example, the httpd.pid file will not be deleted, and any existing requests will simply be terminated halfway through. Although failure to delete httpd.pid is harmless, if code was registered to run upon child exit but was not executed because Apache was sent the KILL signal, you may have problems. For example, a database connection may be closed incorrectly, leaving the database in an inconsistent state.

The three other signals have safe and legitimate uses, and the next sections will explain what happens when each of them is sent to an Apache server process.

It should be noted that these signals should be sent only to the parent process, not to any of the child processes. The parent process PID may be found either by using ps auxc | grep apache (where it will usually be the lowest-numbered Apache process) or by executing cat on the httpd.pid file. See Section 5.3.3, later in this chapter, for more information.

TERM signal: stop now

Sending the TERM signal to the parent causes it to attempt to kill off all its children immediately. Any requests in progress are terminated, and no further requests are accepted. This operation may take tens of seconds to complete. To stop the children, the parent sends each of them a TERM signal. If a child does not die within a predetermined amount of time, the parent sends it a second TERM signal. If the child still does not die, the parent sends the KILL signal as a last resort. Each failed attempt to kill a child generates an entry in the error_log file.

Before each process is terminated, the Perl cleanup stage happens, in which Perl END blocks and global objects' DESTROY methods are run.

When all child processes have been terminated, all open log files are closed and the parent itself exits.

Unless an explicit signal name is provided, kill sends the TERM signal by default. Therefore:

panic# kill -TERM 1640

and:

panic# kill 1640

will do the same thing.
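You can confirm that a bare kill sends TERM by looking at the exit status a shell reports for a process killed by a signal. In shells such as bash and dash, that status is 128 plus the signal number, and kill -l translates it back into a signal name. In this sketch, sleep merely stands in for any long-running process.

```shell
#!/bin/sh
sleep 60 &                   # stand-in for a long-running process
pid=$!
kill "$pid"                  # no signal named, so TERM is sent
wait "$pid"
status=$?                    # 128 + signal number in bash and dash
echo "exit status $status, signal $(kill -l "$status")"
```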

HUP signal: restart now

Sending the HUP signal to the parent causes it to kill off its children as if the TERM signal had been sent. That is, any requests in progress are terminated, but the parent does not exit. Instead, the parent rereads its configuration files, spawns a new set of child processes, and continues to serve requests. It is almost equivalent to stopping and then restarting the server.

If the configuration files contain errors when restart is signaled, the parent will exit, so it is important to check the configuration files for errors before issuing a restart. We'll cover how to check for errors shortly.

Using this approach to restart mod_perl-enabled Apache may cause the processes' memory consumption to grow after each restart. This happens when Perl code loaded in memory is not completely torn down, leading to a memory leak.

USR1 signal: gracefully restart now

The USR1 signal causes the parent process to advise the children to exit after serving their current requests, or to exit immediately if they are not serving a request. The parent rereads its configuration files and reopens its log files. As each child dies off, the parent replaces it with a child from the new generation (the new children use the new configuration) and the new child processes begin serving new requests immediately.

The only difference between USR1 and HUP is that USR1 allows the children to complete any current requests prior to terminating. There is no interruption in the service, unlike with the HUP signal, where service is interrupted for the few (and sometimes more) seconds it takes for a restart to complete.
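A restart with a broken configuration makes the parent exit, so it is worth guarding the restart signal with a configuration check. The sketch below is hypothetical: safe_restart and its argument order are made up. The check command is passed in, so any syntax checker can be used. For Apache this would typically be the server binary run with -t, which tests the configuration files without starting the server.

```shell
#!/bin/sh
# safe_restart PIDFILE SIGNAL CHECK_COMMAND [ARGS...]
# Hypothetical helper: run CHECK_COMMAND and send SIGNAL (HUP or USR1)
# to the PID recorded in PIDFILE only if the check succeeds.
safe_restart() {
    pidfile=$1
    sig=$2
    shift 2
    case $sig in
        HUP|USR1) ;;
        *) echo "safe_restart: $sig is not a restart signal" >&2; return 2 ;;
    esac
    if ! "$@"; then
        echo "safe_restart: configuration check failed, not restarting" >&2
        return 1
    fi
    read -r pid < "$pidfile" || return 1
    case $pid in
        ''|0|1|*[!0-9]*) echo "safe_restart: bad PID in $pidfile" >&2; return 1 ;;
    esac
    kill -s "$sig" "$pid"
}

# Example; the PID file path is the one used in this chapter, while the
# binary path is an assumption to adjust for your installation:
#   safe_restart /home/httpd/httpd_perl/logs/httpd.pid USR1 \
#       /home/httpd/httpd_perl/bin/httpd_perl -t
```

Passing the check command in, rather than hardcoding it, keeps the helper testable and lets you substitute whatever checker your installation uses.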

By default, if a server is restarted using the USR1 or the HUP signal and mod_perl is not compiled as a DSO, Perl scripts and modules are not reloaded. To reload modules pulled in via PerlRequire, PerlModule, or use, and to flush the Apache::Registry cache, either completely stop the server and then start it again, or use this directive in httpd.conf:

PerlFreshRestart On

(This directive is not always recommended. See Chapter 22 for further details.)

5.3.2 Speeding Up Apache's Termination and Restart

Restart or termination of a mod_perl server may sometimes take quite a long time, perhaps even tens of seconds. The reason for this is a call to the perl_destruct() function during the child exit phase, which is also known as the cleanup phase. In this phase, the Perl END blocks are run and the DESTROY method is called on any global objects that are still around.

Sometimes this will produce a series of messages in the error_log file, warning that certain child processes did not exit as expected. This happens when a child process, after a few attempts have been made to terminate it, is still in the middle of perl_destruct(). So when you shut down the server, you might see something like this:

[warn]   child process 7269 still did not exit,
         sending a SIGTERM
[error]  child process 7269 still did not exit,
         sending a SIGKILL
[notice] caught SIGTERM, shutting down

First, the parent process sends the TERM signal to all of its children, without logging anything. If any of the processes still doesn't quit after a short period, the parent sends it a second TERM, logs the PID of the process, and marks the event as a warning. Finally, if the process still hasn't terminated, the parent sends the KILL signal, which unconditionally terminates the process, aborting any operation in progress in the child. This event is logged as an error.
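The same escalation can be sketched in shell for an arbitrary process you need to stop by hand. This illustrates the idea and is not Apache's actual code. stop_escalating is a made-up name, and the waits are whole seconds because POSIX sleep only accepts integers. The sketch checks liveness with kill -0, which sends no signal. kill -0 also succeeds on a dead child that its parent has not yet reaped (a zombie), so the helper is meant for processes that are not children of the calling shell.

```shell
#!/bin/sh
# stop_escalating PID GRACE_SECONDS
# Hypothetical helper: send TERM, then a second TERM, then KILL, waiting
# up to GRACE_SECONDS after each TERM. Prints the name of the signal that
# stopped the process; returns 1 if PID did not exist to begin with.

is_alive() {
    kill -0 "$1" 2>/dev/null       # signal 0: existence/permission check only
}

wait_for_exit() {                   # wait_for_exit PID SECONDS
    left=$2
    while [ "$left" -gt 0 ]; do
        is_alive "$1" || return 0
        sleep 1
        left=$((left - 1))
    done
    ! is_alive "$1"
}

stop_escalating() {
    target=$1
    grace=$2
    is_alive "$target" || return 1
    for attempt in 1 2; do          # first TERM, then a second TERM
        kill -s TERM "$target" 2>/dev/null
        if wait_for_exit "$target" "$grace"; then
            echo TERM
            return 0
        fi
    done
    # KILL cannot be caught or ignored, so nothing further is tried.
    kill -s KILL "$target" 2>/dev/null
    echo KILL
}
```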

If the mod_perl scripts do not contain any END blocks or DESTROY methods that need to be run during shutdown, or if the ones they have are nonessential, this step can be avoided by setting the PERL_DESTRUCT_LEVEL environment variable to -1. (The -1 value for PERL_DESTRUCT_LEVEL is special to mod_perl.) For example, add this setting to the httpd.conf file:

PerlSetEnv PERL_DESTRUCT_LEVEL -1

What constitutes a significant cleanup? Any change of state outside the current process that cannot be handled by the operating system itself. Committing database transactions and removing the lock on a resource are significant operations, but closing an ordinary file is not. For example, if DBI is used for persistent database connections, Perl's destructors should not be switched off.

5.3.3 Finding the Right Apache PID

In order to send a signal to a process, its PID must be known. But in the case of Apache, there are many httpd processes running. Which one should be used? The parent process is the one that must be signaled, so it is the parent's PID that must be identified.

The easiest way to find the Apache parent PID is to read the httpd.pid file. To locate this file, open httpd.conf and look for the PidFile directive. Here is the line from our httpd.conf file:

PidFile /home/httpd/httpd_perl/logs/httpd.pid

When Apache starts up, it writes its own process ID in httpd.pid in a human-readable format. When the server is stopped, httpd.pid should be deleted, but if Apache is killed abnormally, httpd.pid may still exist even if the process is not running any more.
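Because a stale httpd.pid can outlive the server, a script should check that the recorded process still exists before signaling it. The minimal sketch below uses a made-up function name, apache_pid, and relies on kill -0, which delivers no signal and only reports whether the PID exists and may be signaled. Run it as the same user who will send the real signal (usually root), since kill -0 also fails when permission is denied.

```shell
#!/bin/sh
# apache_pid PIDFILE
# Hypothetical helper: print the PID stored in PIDFILE if that process is
# still running; report a missing, malformed, or stale file otherwise.
apache_pid() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "no readable PID file: $pidfile" >&2
        return 1
    fi
    read -r pid < "$pidfile"
    case $pid in
        ''|*[!0-9]*) echo "malformed PID file: $pidfile" >&2; return 1 ;;
    esac
    if [ "$pid" -le 1 ]; then      # 0 means "my process group"; 1 is init
        echo "refusing suspicious PID $pid in $pidfile" >&2
        return 1
    fi
    # kill -0 delivers no signal; it only checks that the PID exists and
    # that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "stale PID file: process $pid is not running (or cannot be signaled)" >&2
        return 1
    fi
    echo "$pid"
}

# Typical use, with the PidFile path from httpd.conf:
#   pid=$(apache_pid /home/httpd/httpd_perl/logs/httpd.pid) && kill -s USR1 "$pid"
```

Note that kill -0 cannot tell whether the PID has since been reused by an unrelated process, so this check catches the common stale-file case but is not a guarantee.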

Of course, the PID of the running Apache can also be found using the ps(1) and grep(1) utilities (as shown previously). Assuming that the binary is called httpd_perl, the command would be:

panic% ps auxc | grep httpd_perl

or, on System V:

panic% ps -ef | grep httpd_perl

This will produce a list of all the httpd_perl (parent and child) processes. If the server was started by the root user account, it will be easy to locate, since it will belong to root. Here is an example of the sort of output produced by one of the ps command lines given above:

root   17309 0.9 2.7 8344 7096 ?  S 18:22 0:00 httpd_perl
nobody 17310 0.1 2.7 8440 7164 ?  S 18:22 0:00 httpd_perl
nobody 17311 0.0 2.7 8440 7164 ?  S 18:22 0:00 httpd_perl
nobody 17312 0.0 2.7 8440 7164 ?  S 18:22 0:00 httpd_perl

In this example, it can be seen that all the child processes are running as user nobody whereas the parent process runs as user root. There is only one root process, and this must be the parent process. Any kill signals should be sent to this parent process.

If the server is started under some other user account (e.g., when the user does not have root access), the processes will belong to that user. In that case, a reliable way to identify the parent is by its parent process ID (PPID), which ps can display (it is the third column of ps -ef output). The Apache parent detaches from the shell that started it, so its PPID is normally 1, while the PPID of each child is the PID of the Apache parent.
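That check can be automated. The sketch below, with the made-up name apache_parent, reads PID, PPID, and command-name columns, as printed by the POSIX form ps -e -o pid= -o ppid= -o comm=. It reports every process with the given name whose own parent has a different name. Unlike testing for a PPID of 1, this does not depend on the parent having been adopted by init.

```shell
#!/bin/sh
# apache_parent NAME (hypothetical helper): read "PID PPID COMMAND" lines
# on standard input and print the PID of each NAME process whose parent is
# not itself a NAME process, i.e., the top of each NAME process tree.
apache_parent() {
    awk -v name="$1" '
        { cmd = $3; sub(/.*\//, "", cmd) }   # ignore any leading path
        cmd == name { ppid_of[$1] = $2 }
        END {
            for (pid in ppid_of)
                if (!(ppid_of[pid] in ppid_of))
                    print pid
        }'
}

# Typical use:
#   ps -e -o pid= -o ppid= -o comm= | apache_parent httpd_perl
```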

If pstree is installed on your system (on Linux it usually comes from the psmisc package), there is an even easier way to discover the parent process. pstree lists all the processes showing the family hierarchy, so if we grep the output for the wanted process's family, we can see the parent process right away. Running it with the -p option, which adds PIDs to the output, and grepping for httpd_perl, we get:

panic% pstree -p | grep httpd_perl
  |-httpd_perl(17309)-+-httpd_perl(17310)
  |                   |-httpd_perl(17311)
  |                   |-httpd_perl(17312)

This variant is more selective, showing only the line on which one httpd_perl process appears as the parent of another:

panic% pstree -p | grep 'httpd_perl.*httpd_perl'
  |-httpd_perl(17309)-+-httpd_perl(17310)

In both cases, we can see that the parent process has the PID 17309.

ps's f option, available in the procps version of ps used on Linux, also produces a tree-like report of the processes. For example, you can run ps axfwwww to get a tree of all processes.