Managing Processes

Now that you have examined what processes are, you can look at some special features of processes as implemented in Solaris. One of the most innovative characteristics of processes under Solaris is the process file system (PROCFS), which is mounted as the /proc file system. An image of each currently active process is stored in the /proc file system, in a directory named after its PID.

Here’s an example. First, identify a process; in this example, it is the current Korn shell for the user pwatters:

# ps -eaf | grep pwatters
 pwatters 310   291  0   Mar 20 ?        0:04 /usr/openwin/bin/Xsun
 pwatters 11959 11934  0 09:21:42 pts/1    0:00 grep pwatters
 pwatters 11934 11932  1 09:20:50 pts/1    0:00 -ksh

Now that you have a target PID (11934), you can change to the /proc/11934 directory and you will be able to view the image of this process:

# cd /proc/11934
# ls -l
total 3497
-rw-------   1 pwatters     other    1769472 Mar 30 09:20 as
-r--------   1 pwatters     other        152 Mar 30 09:20 auxv
-r--------   1 pwatters     other         32 Mar 30 09:20 cred
--w-------   1 pwatters     other          0 Mar 30 09:20 ctl
lr-x------   1 pwatters     other          0 Mar 30 09:20 cwd ->
dr-x------   2 pwatters     other       1184 Mar 30 09:20 fd
-r--r--r--   1 pwatters     other        120 Mar 30 09:20 lpsinfo
-r--------   1 pwatters     other        912 Mar 30 09:20 lstatus
-r--r--r--   1 pwatters     other        536 Mar 30 09:20 lusage
dr-xr-xr-x   3 pwatters     other         48 Mar 30 09:20 lwp
-r--------   1 pwatters     other       2016 Mar 30 09:20 map
dr-x------   2 pwatters     other        544 Mar 30 09:20 object
-r--------   1 pwatters     other       2552 Mar 30 09:20 pagedata
-r--r--r--   1 pwatters     other        336 Mar 30 09:20 psinfo
-r--------   1 pwatters     other       2016 Mar 30 09:20 rmap
lr-x------   1 pwatters     other          0 Mar 30 09:20 root ->
-r--------   1 pwatters     other       1440 Mar 30 09:20 sigact
-r--------   1 pwatters     other       1232 Mar 30 09:20 status
-r--r--r--   1 pwatters     other        256 Mar 30 09:20 usage
-r--------   1 pwatters     other          0 Mar 30 09:20 watch
-r--------   1 pwatters     other       3192 Mar 30 09:20 xmap

Each directory named after a PID contains files and subdirectories that hold state information and related control functions for that process. In addition, a watchpoint facility is provided, which makes it possible to monitor and control access to areas of the process's memory.
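The permission column in the listing above shows that only a few entries (lpsinfo, lusage, psinfo, usage, and the lwp directory) are readable by users other than the owner. As a minimal sketch, the following shell function picks those entries out of an ls -l listing. The function name world_readable is hypothetical, and the sketch assumes the usual nine-column ls -l layout with the mode string in the first field.

```shell
# Print the names of entries in an `ls -l` listing that are readable by "other".
# Reads the listing on stdin. Character 8 of the mode string is the
# other-read bit; field 9 is assumed to be the entry name.
world_readable() {
    awk 'substr($1, 8, 1) == "r" { print $9 }'
}

# Hypothetical usage on a live system:
#   ls -l /proc/11934 | world_readable
```

The "total" line is skipped automatically because its first field is too short to have an eighth character.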

Tip 

A series of proc tools are available to interpret the information contained in the /proc subdirectories.

Using proc tools

The proc tools are designed to operate on data contained within the /proc file system. Each utility takes a PID as its argument and performs operations associated with the PID. For example, the pflags command prints the flags and data model details for the PID in question.

For a Korn shell process like the one in the preceding example (here, PID 29081), you can easily print out this status information:

# /usr/proc/bin/pflags 29081
29081:  /bin/ksh
        data model = _ILP32  flags = PR_ORPHAN
  /1:   flags = PR_PCINVAL|PR_ASLEEP [ waitid(0x7,0x0,0x804714c,0x7) ]

You can also print the credential information for this process, including the effective and real UID and GID of the process owner, by using the pcred command:

$ /usr/proc/bin/pcred 29081
29081:  e/r/suid=100  e/r/sgid=10

Here, the effective, real, and saved UIDs are all 100 (user pwatters), and the effective, real, and saved GIDs are all 10 (group staff).
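When the credentials are needed in a script, the one-line output can be reduced to the two numbers. The following is a rough sketch only: the function name pcred_ids is hypothetical, and it assumes the compact single-line form shown above, in which one UID value and one GID value are printed. Output that lists the effective, real, and saved IDs separately would need a different pattern.

```shell
# Extract the UID and GID from compact pcred-style output on stdin.
# Assumption: the line contains "...uid=N ... gid=N" exactly once each
# (matched case-insensitively); other pcred layouts are not handled.
pcred_ids() {
    sed -n 's/.*[uU][iI][dD]=\([0-9][0-9]*\).*[gG][iI][dD]=\([0-9][0-9]*\).*/uid=\1 gid=\2/p'
}

# Hypothetical usage on a live system:
#   /usr/proc/bin/pcred 29081 | pcred_ids
```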

To examine the address space map of the target process, including all of the libraries it requires to execute, you can use the pmap command:

# /usr/proc/bin/pmap 29081
29081:  /bin/ksh
08046000      8K read/write/exec     [ stack ]
08048000    160K read/exec         /usr/bin/ksh
08070000      8K read/write/exec   /usr/bin/ksh
08072000     28K read/write/exec     [ heap ]
DFAB4000     16K read/exec         /usr/lib/locale/en_AU/en_AU.so.2
DFAB8000      8K read/write/exec   /usr/lib/locale/en_AU/en_AU.so.2
DFABB000      4K read/write/exec     [ anon ]
DFABD000     12K read/exec         /usr/lib/libmp.so.2
DFAC0000      4K read/write/exec   /usr/lib/libmp.so.2
DFAC4000    552K read/exec         /usr/lib/libc.so.1
DFB4E000     24K read/write/exec   /usr/lib/libc.so.1
DFB54000      8K read/write/exec     [ anon ]
DFB57000    444K read/exec         /usr/lib/libnsl.so.1
DFBC6000     20K read/write/exec   /usr/lib/libnsl.so.1
DFBCB000     32K read/write/exec     [ anon ]
DFBD4000     32K read/exec         /usr/lib/libsocket.so.1
DFBDC000      8K read/write/exec   /usr/lib/libsocket.so.1
DFBDF000      4K read/exec         /usr/lib/libdl.so.1
DFBE1000      4K read/write/exec     [ anon ]
DFBE3000    100K read/exec         /usr/lib/ld.so.1
DFBFC000     12K read/write/exec   /usr/lib/ld.so.1
 total     1488K

It’s always surprising to see how many libraries are loaded when an application is executed, especially something as complicated as a shell, leading to a total of 1488KB of mapped memory. You can obtain a list of the dynamic libraries linked to a process by using the pldd command:

# /usr/proc/bin/pldd 29081
29081:  /bin/ksh
/usr/lib/libsocket.so.1
/usr/lib/libnsl.so.1
/usr/lib/libc.so.1
/usr/lib/libdl.so.1
/usr/lib/libmp.so.2
/usr/lib/locale/en_AU/en_AU.so.2
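The total reported by pmap can be cross-checked by adding up the size column. The sketch below does this with awk; the function name pmap_sum_kb is hypothetical, and it assumes the layout shown in the pmap listing above (a hexadecimal address, then a size ending in K).

```shell
# Sum the size column of pmap-style output (read from stdin), in KB.
# Only lines that start with a hexadecimal address followed by "<digits>K"
# are counted, so the header and the trailing "total" line are skipped.
pmap_sum_kb() {
    awk '$1 ~ /^[0-9A-Fa-f]+$/ && $2 ~ /^[0-9]+K$/ {
             sub(/K$/, "", $2)    # strip the unit before adding
             sum += $2
         }
         END { print sum + 0 }'
}

# Hypothetical usage on a live system:
#   /usr/proc/bin/pmap 29081 | pmap_sum_kb
```

Applied to the pmap listing above, the individual mappings add up to 1488, which agrees with the total line.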

As discussed in the previous section “Sending Signals,” signals are the way in which processes communicate with each other, and they can also be used from shells to communicate with spawned processes (usually to suspend or kill them).

By using the psig command, it is possible to list the signals associated with each process:

$ /usr/proc/bin/psig 29081
29081:  /bin/ksh
HUP     caught  RESTART
INT     caught  RESTART
QUIT    ignored
ILL     caught  RESTART
TRAP    caught  RESTART
ABRT    caught  RESTART
EMT     caught  RESTART
FPE     caught  RESTART
KILL    default
BUS     caught  RESTART
SEGV    default
SYS     caught  RESTART
PIPE    caught  RESTART
ALRM    caught  RESTART
TERM    ignored
USR1    caught  RESTART
USR2    caught  RESTART
CLD     default NOCLDSTOP
PWR     default
WINCH   default
URG     default
POLL    default
STOP    default
TSTP    ignored
CONT    default
TTIN    ignored
TTOU    ignored
VTALRM  default
PROF    default
XCPU    caught  RESTART
XFSZ    ignored
WAITING default
LWP     default
FREEZE  default
THAW    default
CANCEL  default
LOST    default
RTMIN   default
RTMIN+1 default
RTMIN+2 default
RTMIN+3 default
RTMAX-3 default
RTMAX-2 default
RTMAX-1 default
RTMAX   default
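A listing this long is easier to read when filtered by disposition. As a small sketch (the function name psig_filter is hypothetical, and the two-column layout is assumed to match the output above), the following prints only the signals that have a given disposition:

```shell
# List the signals with a given disposition (caught, ignored, or default)
# from psig-style output on stdin. Field 1 is the signal name and
# field 2 the disposition; the header line never matches.
psig_filter() {
    awk -v want="$1" '$2 == want { print $1 }'
}

# Hypothetical usage on a live system:
#   /usr/proc/bin/psig 29081 | psig_filter ignored
```

For the shell shown above, filtering on "ignored" would return QUIT, TERM, TSTP, TTIN, TTOU, and XFSZ.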

It is also possible to print a hexadecimal-format stack trace for each lightweight process (LWP) in a process by using the pstack command. This can be useful in the same way as the truss command:

$ /usr/proc/bin/pstack 29081
29081:  /bin/ksh
 dfaf5347 waitid   (7, 0, 804714c, 7)
 dfb0d9db _waitpid (ffffffff, 8047224, 4) + 63
 dfb40617 waitpid  (ffffffff, 8047224, 4) + 1f
 0805b792 job_wait (719d) + 1ae
 08064be8 sh_exec  (8077270, 14) + af0
 0805e3a1 ???????? ()
 0805decd main     (1, 8047624, 804762c) + 705
 0804fa78 ???????? ()

Perhaps the most commonly used proc tool is the pfiles command, which displays all of the open files for each process. This is useful for determining operational dependencies between data files and applications:

$ /usr/proc/bin/pfiles 29081
29081:  /bin/ksh
  Current rlimit: 64 file descriptors
   0: S_IFCHR mode:0620 dev:102,0 ino:319009 uid:6049 gid:7 rdev:24,8
      O_RDWR|O_LARGEFILE
   1: S_IFCHR mode:0620 dev:102,0 ino:319009 uid:6049 gid:7 rdev:24,8
      O_RDWR|O_LARGEFILE
   2: S_IFCHR mode:0620 dev:102,0 ino:319009 uid:6049 gid:7 rdev:24,8
      O_RDWR|O_LARGEFILE
  63: S_IFREG mode:0600 dev:174,2 ino:990890 uid:6049 gid:1 size:3210
      O_RDWR|O_APPEND|O_LARGEFILE FD_CLOEXEC
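One practical use of this output is checking how close a process is to its descriptor limit. The following is a sketch under the assumption that the output has the shape shown above (a "Current rlimit:" line, and one "N: S_IF..." line per descriptor); the function name pfiles_usage is hypothetical.

```shell
# Compare the number of open descriptors with the rlimit reported in
# pfiles-style output on stdin.
pfiles_usage() {
    awk '
        /Current rlimit:/   { limit = $3 }    # third word is the limit
        /^ *[0-9]+: S_IF/   { open++ }        # one line per open descriptor
        END { printf "%d of %s descriptors in use\n", open, limit }
    '
}

# Hypothetical usage on a live system:
#   /usr/proc/bin/pfiles 29081 | pfiles_usage
```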

In addition, it is possible to obtain the current working directory of the target process by using the pwdx command:

$ /usr/proc/bin/pwdx 29081
29081:  /home/paul

If you need to examine the process tree for all parent and child processes containing the target PID, you can use the ptree command. This is useful for determining dependencies between processes that are not apparent by consulting the process list:

$ /usr/proc/bin/ptree 29081
247   /usr/dt/bin/dtlogin -daemon
  28950 /usr/dt/bin/dtlogin -daemon
    28972 /bin/ksh /usr/dt/bin/Xsession
      29012 /usr/dt/bin/sdt_shell -c       unset DT;      DISPLAY=lion:0;
        29015 ksh -c       unset DT;      DISPLAY=lion:0;
                /usr/dt/bin/dt
          29026 /usr/dt/bin/dtsession
            29032 dtwm
              29079 /usr/dt/bin/dtterm
                29081 /bin/ksh
                  29085 /usr/local/bin/bash
                    29230 /usr/proc/bin/ptree 29081

Here, ptree has been executed from the Bourne again shell (bash), which was started from the Korn shell (ksh), spawned from the dtterm terminal window, which was spawned from the dtwm window manager, and so on.
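The ancestry just described can also be pulled out of the tree mechanically. The sketch below prints the chain of PIDs from the top of the tree down to a target PID. The function name ptree_chain is hypothetical, and it relies on the layout shown above, where every line before the target is one of its ancestors.

```shell
# Print the chain of PIDs from the top of ptree-style output down to a
# target PID. Reads the tree on stdin; wrapped continuation lines
# (which do not start with a PID) are skipped.
ptree_chain() {
    awk -v target="$1" '
        $1 ~ /^[0-9]+$/ {
            chain = chain (chain == "" ? "" : " ") $1
            if ($1 == target) { print chain; exit }
        }
    '
}

# Hypothetical usage on a live system:
#   /usr/proc/bin/ptree 29081 | ptree_chain 29081
```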

Tip 

Although many of these proc tools will seem obscure, they are often very useful when trying to debug process-related application errors, especially in large applications like database management systems.

Using the lsof Command

lsof stands for "list open files" and lists information about files that are currently opened by the active processes running on Solaris. It is not included in the Solaris distribution; however, the current version can always be downloaded from ftp://vic.cc.purdue.edu/pub/tools/unix/lsof. Keep in mind that lsof is very sensitive to changes in OS releases, and recompilation may be necessary between Solaris 8 and 9.

What can you use lsof for? The answer largely depends on how many problems you encounter that relate to processes and files. Often, administrators are interested in knowing which processes are currently using a target file or files from a particular directory. This can occur when a file is locked by one application but is required by another application (again, a database system’s data files are one example where this might happen, if two database instances attempt to write to the files at once). If you know the path to a file or directory of interest, you can use lsof to determine which processes are using it.

To examine the processes that are using files in the /tmp file system, use this:

$ lsof /tmp
COMMAND    PID USER      FD   TYPE DEVICE SIZE/OFF      NODE NAME
ssion      338 pwatters  txt   VREG    0,1   271596 471638794 /tmp (swap)
(unknown)  345 pwatters  txt   VREG    0,1   271596 471638794 /tmp (swap)
le        2295 pwatters  txt   VREG    0,1   271596 471638794 /tmp (swap)
le        2299 pwatters  txt   VREG    0,1   271596 471638794 /tmp (swap)

Obviously, there’s a bug in the routines that obtain the command name (the first four characters are missing!), but since the PID is correct, this is enough information to identify the four applications that are currently using files in /tmp. For example, dtsession (PID 338) manages the CDE session for the user pwatters, who is using a temporary text file in the /tmp directory. Later versions of lsof have fixed this bug.

Another common problem that lsof is used for, with respect to the /tmp file system, is the identification of processes that continue to write to unlinked files: space is being consumed, but it may appear that no files are growing any larger! This confusing activity can be traced back to a process by using lsof. However, rather than using lsof on the /tmp directory directly, you would need to examine the root directory ("/") on which /tmp is mounted. If the size of a file is changing across several different sampling epochs (for example, by running the command once a minute), you’ve probably found the culprit, and the process that is writing to the open file can then be killed:

# lsof /
COMMAND    PID   USER   FD  TYPE DEVICE SIZE/OFF   NODE NAME
(unknown)    1   root  txt  VREG  102,0   446144 118299 / (/dev/dsk/c0d0s0)
(unknown)    1   root  txt  VREG  102,0     4372 293504 / (/dev/dsk/c0d0s0)
(unknown)    1   root  txt  VREG  102,0   173272 293503 / (/dev/dsk/c0d0s0)
sadm        62   root  txt  VREG  102,0   954804 101535 / (/dev/dsk/c0d0s0)
sadm        62   root  txt  VREG  102,0   165948 101569 / (/dev/dsk/c0d0s0)
sadm        62   root  txt  VREG  102,0    16132 100766 / (/dev/dsk/c0d0s0)
sadm        62   root  txt  VREG  102,0     8772 100765 / (/dev/dsk/c0d0s0)
sadm        62   root  txt  VREG  102,0   142652 101571 / (/dev/dsk/c0d0s0)
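Comparing sampling epochs by eye is error-prone, so the comparison can be scripted. The following is a minimal sketch that compares two saved lsof listings and reports entries whose SIZE/OFF value changed. The function name lsof_grown and the snapshot file names are hypothetical, and the column positions assume the default nine-column layout shown above with a one-word COMMAND field.

```shell
# Report entries whose SIZE/OFF changed between two saved lsof listings.
# Entries are keyed on PID, FD, and NODE (columns 2, 4, and 8); the size
# is taken from column 7. Other lsof layouts would need other columns.
lsof_grown() {
    awk '
        FNR == 1 { next }                         # skip each header line
        { key = $2 " " $4 " " $8 }
        NR == FNR { size[key] = $7; next }        # first file: remember sizes
        (key in size) && $7 != size[key] {        # second file: compare
            print "PID " $2 " node " $8 ": " size[key] " -> " $7
        }
    ' "$1" "$2"
}

# Hypothetical usage, sampling once a minute:
#   lsof / > /var/tmp/lsof.1; sleep 60; lsof / > /var/tmp/lsof.2
#   lsof_grown /var/tmp/lsof.1 /var/tmp/lsof.2
```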

One of the restrictions on unmounting a file system is that you can’t unmount it while files are open on it: if it were unmounted anyway, any changes made to those files might not be saved, resulting in data loss. Looking at a process list may not always reveal which processes have which files open, and this can be very frustrating if Solaris refuses to unmount a file system because some files are open. Again, lsof can be used to identify the processes that have files open on a specific file system.

The first step is to consult the output of the df command to obtain the names of currently mounted file systems:

$ df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/proc                      0       0       0     0%    /proc
/dev/dsk/c0d0s0      2510214  929292 1530718    38%    /
fd                         0       0       0     0%    /dev/fd
/dev/dsk/c0d0s3      5347552  183471 5110606     4%    /usr/local
swap                  185524   12120  173404     7%    /tmp

If you want to unmount the /dev/dsk/c0d0s3 file system but are prevented from doing so because of open files, you can obtain a list of all open files under /usr/local by using this command:

$ lsof /dev/dsk/c0d0s3
COMMAND PID   USER  FD TYPE DEVICE SIZE/OFF   NODE NAME
httpd   981   root txt VREG  102,3  1747168 457895 /usr/local
httpd   982   root txt VREG  102,3   333692  56455 /usr/local
httpd   983   root txt VREG  102,3   333692  56455 /usr/local
httpd   984   root txt VREG  102,3   333692  56455 /usr/local
javac   985   root txt VREG  102,3   333692  56455 /usr/local
httpd   986   root txt VREG  102,3   333692  56455 /usr/local
httpd   987   root txt VREG  102,3   333692  56455 /usr/local
httpd   988   root txt VREG  102,3   333692  56455 /usr/local
httpd   989   root txt VREG  102,3   333692  56455 /usr/local
httpd   990   root txt VREG  102,3   333692  56455 /usr/local
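A listing like this often repeats the same PID many times on a busy file system, so it helps to reduce it to a list of unique PIDs before deciding what to stop. The sketch below does this with standard tools; the function name lsof_pids is hypothetical, and it assumes the PID is in the second column, as in the output above.

```shell
# Reduce lsof-style output on stdin to a sorted list of unique PIDs.
# The first line is assumed to be the column header and is skipped.
lsof_pids() {
    awk 'NR > 1 { print $2 }' | sort -n -u
}

# Hypothetical usage on a live system:
#   lsof /dev/dsk/c0d0s3 | lsof_pids
```

Many lsof versions also provide a -t option that prints PIDs only; check the local man page before relying on it.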

Obviously, all of these processes will need to stop using the open files before the file system can be unmounted. If you’re not sure where a particular command is running from, or on which file system its data files are stored, you can also use lsof to check open files by passing the PID on the command line. First, you need to identify a PID by using the ps command:

$ ps -eaf | grep apache
  nobody  4911  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
  nobody  4910  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
  nobody  4912  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
  nobody  4905     1  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
  nobody  4907  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
  nobody  4908  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
  nobody  4913  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
  nobody  4909  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
  nobody  4906  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd

Now examine the process 4905 for Apache to see what files are currently being opened by it:

$ lsof -p 4905
COMMAND  PID  USER   FD   TYPE DEVICE  SIZE/OFF       NODE NAME
d       4905 nobody txt   VREG  102,3   333692  56455 /usr/local (/dev/dsk/c0d0s3)
d       4905 nobody txt   VREG  102,0    17388 100789 / (/dev/dsk/c0d0s0)
d       4905 nobody txt   VREG  102,0   954804 101535 / (/dev/dsk/c0d0s0)
d       4905 nobody txt   VREG  102,0   693900 101573 / (/dev/dsk/c0d0s0)
d       4905 nobody txt   VREG  102,0    52988 100807 / (/dev/dsk/c0d0s0)
d       4905 nobody txt   VREG  102,0     4396 100752 / (/dev/dsk/c0d0s0)
d       4905 nobody txt   VREG  102,0   175736 100804 / (/dev/dsk/c0d0s0)

Apache obviously has a number of open files!

The ps Command

The following table summarizes the main options used with ps.

Option    Description
-a        Lists most frequently requested processes.
-A, -e    Lists all processes.
-c        Lists processes in scheduler format.
-d        Lists all processes except session leaders.
-f        Prints comprehensive process information.
-g        Prints process information on a group basis for a single group.
-G        Prints process information on a group basis for a list of groups.
-j        Includes SID and PGID in printout.
-l        Prints complete process information.
-L        Displays LWP details.
-p        Lists process details for a list of specified processes.
-P        Lists the CPU ID to which a process is bound.
-s        Lists session leaders.
-t        Lists all processes associated with a specific terminal.
-u        Lists all processes for a specific user.

kill

The following table summarizes the main signals used to communicate with processes using kill.

Signal     Code   Action   Description
SIGHUP     1      Exit     Hang up
SIGINT     2      Exit     Interrupt
SIGQUIT    3      Core     Quit
SIGILL     4      Core     Illegal instruction
SIGTRAP    5      Core     Trace
SIGABRT    6      Core     Abort
SIGEMT     7      Core     Emulation trap
SIGFPE     8      Core     Arithmetic exception
SIGKILL    9      Exit     Killed
SIGBUS     10     Core     Bus error
SIGSEGV    11     Core     Segmentation fault
SIGSYS     12     Core     Bad system call
SIGPIPE    13     Exit     Broken pipe
SIGALRM    14     Exit     Alarm clock
SIGTERM    15     Exit     Terminate
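The Action column describes what happens only when a process has not installed its own handler. As a small sketch of the difference, the two functions below (with hypothetical names) each start a child shell that sends SIGTERM to itself: once with the default action, and once with a trap handler in place. Running each case in a child shell keeps the calling shell unaffected.

```shell
# Case 1: SIGTERM with its default action (Exit) ends the child shell.
# Prints the exit status seen by the caller. Most Bourne-style shells
# report 128 + the signal number, but this is shell-dependent.
term_default_status() {
    sh -c 'kill -s TERM $$; echo "not reached"' >/dev/null 2>&1
    echo $?
}

# Case 2: a trap handler catches SIGTERM, so the child shell runs the
# handler and then carries on with the next command.
term_caught_output() {
    sh -c 'trap "echo caught" TERM; kill -s TERM $$; echo "still running"'
}
```

SIGKILL (9) is the exception: it cannot be caught or ignored, which is why it is used as a last resort.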

pgrep

The pgrep command is used to search for a list of processes whose names match a pattern specified on the command line. The command returns a list of corresponding PIDs. This list can then be passed to another command, such as kill, to perform some action on the processes or send them a signal.

For example, to kill all processes associated with the name “java,” the following command would be used:

$ kill -9 `pgrep java`
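Sending signal 9 immediately gives a process no chance to clean up, and the command above calls kill with no arguments when pgrep finds nothing. A gentler pattern, sketched below under stated assumptions, is to send SIGTERM (15) first and follow with SIGKILL (9) only for processes that are still present after a grace period. The function name term_then_kill and the default grace period of 5 seconds are hypothetical choices, not part of any Solaris tool.

```shell
# Read PIDs from stdin; send SIGTERM, wait, then send SIGKILL to survivors.
# $1 is the grace period in seconds (default 5).
term_then_kill() {
    grace=${1:-5}
    pids=$(cat)                       # whitespace-separated PIDs from stdin
    [ -n "$pids" ] || return 0        # nothing matched: do not call kill at all
    kill -s TERM $pids 2>/dev/null    # unquoted on purpose: one argument per PID
    sleep "$grace"
    for pid in $pids; do
        # kill -0 sends no signal; it only tests whether the PID still exists
        if kill -0 "$pid" 2>/dev/null; then
            kill -s KILL "$pid" 2>/dev/null
        fi
    done
    return 0
}

# Hypothetical usage:
#   pgrep java | term_then_kill 10
```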

pkill

The pkill command can be used to send signals to processes that have the same name. It is a more specific version of pgrep, since it can be used only to send signals, and the list of PIDs is not printed for use by another program.

To kill all processes associated with the name “java,” the following command would be used:

$ pkill -9 java

killall

The killall command is used to kill all processes running on a system. It is called by shutdown when the system is being brought to run level 0. However, since a signal can be passed to the killall command, it is possible for a superuser to send a different signal (other than 15) to all processes. For example, to send a SIGHUP signal to all processes, the following command could be used:

# killall 1

