Now that you have examined what processes are, you can look at some special features of processes as implemented in Solaris. One of the most innovative characteristics of processes under Solaris is the process file system (PROCFS), which is mounted as the /proc file system. Images of all currently active processes are stored in the /proc file system, in directories named by PID.
Here’s an example. First, a process is identified—in this example, the current Korn shell for the user pwatters:
# ps -eaf | grep pwatters
pwatters   310   291  0   Mar 20 ?        0:04 /usr/openwin/bin/Xsun
pwatters 11959 11934  0 09:21:42 pts/1    0:00 grep pwatters
pwatters 11934 11932  1 09:20:50 pts/1    0:00 -ksh
Now that you have a target PID (11934), you can change to the /proc/11934 directory to view the image of this process:
# cd /proc/11934
# ls -l
total 3497
-rw-------   1 pwatters other    1769472 Mar 30 09:20 as
-r--------   1 pwatters other        152 Mar 30 09:20 auxv
-r--------   1 pwatters other         32 Mar 30 09:20 cred
--w-------   1 pwatters other          0 Mar 30 09:20 ctl
lr-x------   1 pwatters other          0 Mar 30 09:20 cwd ->
dr-x------   2 pwatters other       1184 Mar 30 09:20 fd
-r--r--r--   1 pwatters other        120 Mar 30 09:20 lpsinfo
-r--------   1 pwatters other        912 Mar 30 09:20 lstatus
-r--r--r--   1 pwatters other        536 Mar 30 09:20 lusage
dr-xr-xr-x   3 pwatters other         48 Mar 30 09:20 lwp
-r--------   1 pwatters other       2016 Mar 30 09:20 map
dr-x------   2 pwatters other        544 Mar 30 09:20 object
-r--------   1 pwatters other       2552 Mar 30 09:20 pagedata
-r--r--r--   1 pwatters other        336 Mar 30 09:20 psinfo
-r--------   1 pwatters other       2016 Mar 30 09:20 rmap
lr-x------   1 pwatters other          0 Mar 30 09:20 root ->
-r--------   1 pwatters other       1440 Mar 30 09:20 sigact
-r--------   1 pwatters other       1232 Mar 30 09:20 status
-r--r--r--   1 pwatters other        256 Mar 30 09:20 usage
-r--------   1 pwatters other          0 Mar 30 09:20 watch
-r--------   1 pwatters other       3192 Mar 30 09:20 xmap
Each directory named after a PID contains files and additional subdirectories, which hold state information and provide related control functions. In addition, a watchpoint facility is provided, which is responsible for controlling memory access.
Tip: A series of proc tools are available to interpret the information contained in the /proc subdirectories.
The proc tools are designed to operate on data contained within the /proc file system. Each utility takes a PID as its argument and performs operations associated with the PID. For example, the pflags command prints the flags and data model details for the PID in question.
For a Korn shell process like the one in the preceding example (PID 29081 in the examples that follow), you can easily print out this status information:
# /usr/proc/bin/pflags 29081
29081:  /bin/ksh
        data model = _ILP32  flags = PR_ORPHAN
  /1:   flags = PR_PCINVAL|PR_ASLEEP [ waitid(0x7,0x0,0x804714c,0x7) ]
You can also print the credential information for this process, including the effective and real UID and GID of the process owner, by using the pcred command:
$ /usr/proc/bin/pcred 29081
29081:  e/r/suid=100  e/r/sgid=10
Here, both the effective and the real UID are 100 (user pwatters), and both the effective and the real GID are 10 (group staff).
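For the current shell, a similar comparison of effective and real IDs can be sketched with the standard id command. This is only an illustration, not a replacement for pcred: it assumes a POSIX-style id that accepts the -u, -g, and -r options (on older Solaris releases that may mean /usr/xpg4/bin/id), and the output format is this sketch's own.

```shell
# Effective and real IDs of the process running this script.
# Assumes a POSIX id(1) that supports -u, -g and -r.
euid=$(id -u)        # effective UID
ruid=$(id -u -r)     # real UID
egid=$(id -g)        # effective GID
rgid=$(id -g -r)     # real GID

printf 'effective/real UID = %s/%s\n' "$euid" "$ruid"
printf 'effective/real GID = %s/%s\n' "$egid" "$rgid"

# The two UIDs normally match; they differ inside a setuid program.
if [ "$euid" = "$ruid" ]; then
  echo "effective and real UID match"
else
  echo "effective and real UID differ"
fi
```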
To examine the address space map of the target process, including all of the libraries it requires to execute, you can use the pmap command:
# /usr/proc/bin/pmap 29081
29081:  /bin/ksh
08046000    8K read/write/exec  [ stack ]
08048000  160K read/exec        /usr/bin/ksh
08070000    8K read/write/exec  /usr/bin/ksh
08072000   28K read/write/exec  [ heap ]
DFAB4000   16K read/exec        /usr/lib/locale/en_AU/en_AU.so.2
DFAB8000    8K read/write/exec  /usr/lib/locale/en_AU/en_AU.so.2
DFABB000    4K read/write/exec  [ anon ]
DFABD000   12K read/exec        /usr/lib/libmp.so.2
DFAC0000    4K read/write/exec  /usr/lib/libmp.so.2
DFAC4000  552K read/exec        /usr/lib/libc.so.1
DFB4E000   24K read/write/exec  /usr/lib/libc.so.1
DFB54000    8K read/write/exec  [ anon ]
DFB57000  444K read/exec        /usr/lib/libnsl.so.1
DFBC6000   20K read/write/exec  /usr/lib/libnsl.so.1
DFBCB000   32K read/write/exec  [ anon ]
DFBD4000   32K read/exec        /usr/lib/libsocket.so.1
DFBDC000    8K read/write/exec  /usr/lib/libsocket.so.1
DFBDF000    4K read/exec        /usr/lib/libdl.so.1
DFBE1000    4K read/write/exec  [ anon ]
DFBE3000  100K read/exec        /usr/lib/ld.so.1
DFBFC000   12K read/write/exec  /usr/lib/ld.so.1
total    1488K
It’s always surprising to see how many libraries are loaded when an application is executed, especially something as complicated as a shell, leading to a total of 1488KB memory used. You can obtain a list of the dynamic libraries linked to each process by using the pldd command:
# /usr/proc/bin/pldd 29081
29081:  /bin/ksh
/usr/lib/libsocket.so.1
/usr/lib/libnsl.so.1
/usr/lib/libc.so.1
/usr/lib/libdl.so.1
/usr/lib/libmp.so.2
/usr/lib/locale/en_AU/en_AU.so.2
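Returning to the pmap listing, the 1488K total can be checked by adding up the size column. The following is a minimal sketch that assumes the layout shown above (a hexadecimal address in the first field and a size ending in K in the second); pmap_total is a hypothetical helper name, not one of the proc tools.

```shell
# pmap_total: read pmap output on stdin and print the sum of the mapping sizes.
# Only lines that start with a hexadecimal address are counted, so the
# header line and the trailing "total" line are skipped.
pmap_total() {
  awk '$1 ~ /^[0-9A-Fa-f]+$/ && $2 ~ /^[0-9]+K$/ {
         sub(/K$/, "", $2)
         total += $2
       }
       END { print total + 0 "K" }'
}

# The mappings from the listing above, fed in as sample input.
pmap_total <<'EOF'
29081:  /bin/ksh
08046000    8K read/write/exec  [ stack ]
08048000  160K read/exec        /usr/bin/ksh
08070000    8K read/write/exec  /usr/bin/ksh
08072000   28K read/write/exec  [ heap ]
DFAB4000   16K read/exec        /usr/lib/locale/en_AU/en_AU.so.2
DFAB8000    8K read/write/exec  /usr/lib/locale/en_AU/en_AU.so.2
DFABB000    4K read/write/exec  [ anon ]
DFABD000   12K read/exec        /usr/lib/libmp.so.2
DFAC0000    4K read/write/exec  /usr/lib/libmp.so.2
DFAC4000  552K read/exec        /usr/lib/libc.so.1
DFB4E000   24K read/write/exec  /usr/lib/libc.so.1
DFB54000    8K read/write/exec  [ anon ]
DFB57000  444K read/exec        /usr/lib/libnsl.so.1
DFBC6000   20K read/write/exec  /usr/lib/libnsl.so.1
DFBCB000   32K read/write/exec  [ anon ]
DFBD4000   32K read/exec        /usr/lib/libsocket.so.1
DFBDC000    8K read/write/exec  /usr/lib/libsocket.so.1
DFBDF000    4K read/exec        /usr/lib/libdl.so.1
DFBE1000    4K read/write/exec  [ anon ]
DFBE3000  100K read/exec        /usr/lib/ld.so.1
DFBFC000   12K read/write/exec  /usr/lib/ld.so.1
total    1488K
EOF
# prints 1488K
```

On a live system the same function could be fed directly, for example with /usr/proc/bin/pmap 29081 | pmap_total.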
As discussed in the previous section “Sending Signals,” signals are the way in which processes communicate with each other, and they can also be used from shells to communicate with spawned processes (usually to suspend or kill them).
By using the psig command, it is possible to list the action (caught, ignored, or default) associated with each signal for a process:
$ /usr/proc/bin/psig 29081
29081:  /bin/ksh
HUP     caught  RESTART
INT     caught  RESTART
QUIT    ignored
ILL     caught  RESTART
TRAP    caught  RESTART
ABRT    caught  RESTART
EMT     caught  RESTART
FPE     caught  RESTART
KILL    default
BUS     caught  RESTART
SEGV    default
SYS     caught  RESTART
PIPE    caught  RESTART
ALRM    caught  RESTART
TERM    ignored
USR1    caught  RESTART
USR2    caught  RESTART
CLD     default NOCLDSTOP
PWR     default
WINCH   default
URG     default
POLL    default
STOP    default
TSTP    ignored
CONT    default
TTIN    ignored
TTOU    ignored
VTALRM  default
PROF    default
XCPU    caught  RESTART
XFSZ    ignored
WAITING default
LWP     default
FREEZE  default
THAW    default
CANCEL  default
LOST    default
RTMIN   default
RTMIN+1 default
RTMIN+2 default
RTMIN+3 default
RTMAX-3 default
RTMAX-2 default
RTMAX-1 default
RTMAX   default
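The three kinds of action in this listing (ignored, caught, and default) can be reproduced with the shell's trap builtin. The sketch below is only an illustration under stated assumptions: it runs each case in a throwaway child sh so that the signals never reach your login shell, and the variable names are arbitrary.

```shell
# "ignored": an empty trap makes the child shell ignore SIGTERM,
# much like the "TERM ignored" entry reported by psig.
ignored_result=$(sh -c 'trap "" TERM; kill -TERM $$; echo "survived TERM"')

# "caught": the handler runs when SIGUSR1 arrives, then the script carries on.
caught_result=$(sh -c 'trap "echo caught USR1" USR1; kill -USR1 $$; echo "carried on"')

# "default": with no trap set, SIGTERM terminates the child shell,
# so the echo is never reached and the exit status reflects the signal.
default_status=0
sh -c 'kill -TERM $$; echo "not reached"' || default_status=$?

echo "ignored: $ignored_result"
echo "caught:  $caught_result"
echo "default: child exited with status $default_status"
```

Inside each child, $$ expands to the PID of that child, which is why the sketch uses sh -c rather than signalling the current shell.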
It is also possible to print a hexadecimal format stack trace for each lightweight process (LWP) in a process by using the pstack command. This can be useful in the same way that the truss command was used:
$ /usr/proc/bin/pstack 29081
29081:  /bin/ksh
 dfaf5347 waitid   (7, 0, 804714c, 7)
 dfb0d9db _waitpid (ffffffff, 8047224, 4) + 63
 dfb40617 waitpid  (ffffffff, 8047224, 4) + 1f
 0805b792 job_wait (719d) + 1ae
 08064be8 sh_exec  (8077270, 14) + af0
 0805e3a1 ???????? ()
 0805decd main     (1, 8047624, 804762c) + 705
 0804fa78 ???????? ()
Perhaps the most commonly used proc tool is the pfiles command, which displays all of the open files for each process. This is useful for determining operational dependencies between data files and applications:
$ /usr/proc/bin/pfiles 29081
29081:  /bin/ksh
  Current rlimit: 64 file descriptors
   0: S_IFCHR mode:0620 dev:102,0 ino:319009 uid:6049 gid:7 rdev:24,8
      O_RDWR|O_LARGEFILE
   1: S_IFCHR mode:0620 dev:102,0 ino:319009 uid:6049 gid:7 rdev:24,8
      O_RDWR|O_LARGEFILE
   2: S_IFCHR mode:0620 dev:102,0 ino:319009 uid:6049 gid:7 rdev:24,8
      O_RDWR|O_LARGEFILE
  63: S_IFREG mode:0600 dev:174,2 ino:990890 uid:6049 gid:1 size:3210
      O_RDWR|O_APPEND|O_LARGEFILE FD_CLOEXEC
In addition, it is possible to obtain the current working directory of the target process by using the pwdx command:
$ /usr/proc/bin/pwdx 29081
29081:  /home/paul
If you need to examine the process tree for all parent and child processes containing the target PID, you can use the ptree command. This is useful for determining dependencies between processes that are not apparent by consulting the process list:
$ /usr/proc/bin/ptree 29081
247   /usr/dt/bin/dtlogin -daemon
  28950 /usr/dt/bin/dtlogin -daemon
    28972 /bin/ksh /usr/dt/bin/Xsession
      29012 /usr/dt/bin/sdt_shell -c unset DT; DISPLAY=lion:0;
        29015 ksh -c unset DT; DISPLAY=lion:0; /usr/dt/bin/dt
          29026 /usr/dt/bin/dtsession
            29032 dtwm
              29079 /usr/dt/bin/dtterm
                29081 /bin/ksh
                  29085 /usr/local/bin/bash
                    29230 /usr/proc/bin/ptree 29081
Here, ptree has been executed from the Bourne again shell (bash), which was started from the Korn shell (ksh), spawned from the dtterm terminal window, which was spawned from the dtwm window manager, and so on.
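The parent-to-child chain that ptree reports can be rebuilt from PID and PPID pairs, which is roughly what you would do by hand with a process list. The following is a minimal sketch, not how ptree itself is implemented: ancestors is a hypothetical helper, it expects "PID PPID COMMAND" lines on standard input, and the parent PIDs in the sample are inferred from the ptree output above.

```shell
# ancestors PID: read "PID PPID COMMAND" lines on stdin and print the chain
# of parents leading down to PID, indented two spaces per level.
ancestors() {
  awk -v target="$1" '
    { parent[$1] = $2; cmd[$1] = $3 }
    END {
      depth = 0
      # Walk upward from the target; stop at an unknown parent, or after
      # 64 levels as a guard against malformed input containing a cycle.
      for (p = target; (p in parent) && depth < 64; p = parent[p])
        chain[depth++] = p
      indent = ""
      for (i = depth - 1; i >= 0; i--) {
        print indent chain[i] " " cmd[chain[i]]
        indent = indent "  "
      }
    }'
}

# Sample input: the lower part of the tree shown above.
ancestors 29085 <<'EOF'
29032 29026 dtwm
29079 29032 dtterm
29081 29079 ksh
29085 29081 bash
29230 29085 ptree
EOF
```

For PID 29085 this prints dtwm, dtterm, ksh, and bash at increasing indentation; the ptree process itself is omitted because it is a child of the target rather than an ancestor.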
Tip: Although many of these proc tools will seem obscure, they are often very useful when trying to debug process-related application errors, especially in large applications like database management systems.
lsof stands for "list open files" and lists information about files that are currently opened by the active processes running on Solaris. It is not included in the Solaris distribution; however, the current version can always be downloaded from ftp://vic.cc.purdue.edu/pub/tools/unix/lsof. Keep in mind that lsof is very sensitive to changes in OS releases, and recompilation may be necessary between Solaris 8 and 9.
What can you use lsof for? The answer largely depends on how many problems you encounter that relate to processes and files. Often, administrators are interested in knowing which processes are currently using a target file or files from a particular directory. This can occur when a file is locked by one application but is required by another application (again, a database system’s data files are one example where this might happen, if two database instances attempt to write to the files at once). If you know the path to a file or directory of interest, you can use lsof to determine which processes are using it.
To examine the processes that are using files in the /tmp file system, use this:
$ lsof /tmp
COMMAND    PID  USER     FD   TYPE  DEVICE  SIZE/OFF  NODE       NAME
ssion      338  pwatters txt  VREG  0,1     271596    471638794  /tmp (swap)
(unknown)  345  pwatters txt  VREG  0,1     271596    471638794  /tmp (swap)
le         2295 pwatters txt  VREG  0,1     271596    471638794  /tmp (swap)
le         2299 pwatters txt  VREG  0,1     271596    471638794  /tmp (swap)
Obviously, there’s a bug in the routines that obtain the command name (the first four characters are missing!), but since the PID is correct, this is enough information to identify the four applications that are currently using files in /tmp. For example, dtsession (PID 338) manages the CDE session for the user pwatters, who is using a temporary text file in the /tmp directory. Later versions of lsof have fixed this bug.
Another common problem that lsof is used for, with respect to the /tmp file system, is identifying processes that continue to write to unlinked files: space is being consumed, but it may appear that no files are growing any larger! This confusing activity can be traced back to a process by using lsof. However, rather than using lsof on the /tmp directory directly, you would need to examine the root directory ("/") on which /tmp is mounted. If the size of a file is changing across several different sampling epochs (for example, when you run the command once a minute), you’ve probably found the culprit, and the process that is writing to the open file can then be killed:
# lsof /
COMMAND    PID USER FD   TYPE  DEVICE  SIZE/OFF  NODE    NAME
(unknown)  1   root txt  VREG  102,0   446144    118299  / (/dev/dsk/c0d0s0)
(unknown)  1   root txt  VREG  102,0   4372      293504  / (/dev/dsk/c0d0s0)
(unknown)  1   root txt  VREG  102,0   173272    293503  / (/dev/dsk/c0d0s0)
sadm       62  root txt  VREG  102,0   954804    101535  / (/dev/dsk/c0d0s0)
sadm       62  root txt  VREG  102,0   165948    101569  / (/dev/dsk/c0d0s0)
sadm       62  root txt  VREG  102,0   16132     100766  / (/dev/dsk/c0d0s0)
sadm       62  root txt  VREG  102,0   8772      100765  / (/dev/dsk/c0d0s0)
sadm       62  root txt  VREG  102,0   142652    101571  / (/dev/dsk/c0d0s0)
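Comparing sizes across sampling epochs by eye is error-prone, so the comparison can be sketched as a small script that takes two saved lsof listings and reports entries whose SIZE/OFF value has grown. This is an illustration under stated assumptions: growing_files is a hypothetical helper, it assumes the column layout shown above with command names that contain no spaces, and the changed size in the second sample is invented for the demonstration.

```shell
# growing_files OLD NEW: compare two saved lsof listings and report every
# PID/NODE pair whose SIZE/OFF value is larger in NEW than in OLD.
growing_files() {
  awk '
    FNR == 1  { next }                           # skip each header line
    NR == FNR { size[$2 "/" $8] = $7; next }     # first file: remember sizes
    {
      key = $2 "/" $8                            # second file: compare sizes
      if ((key in size) && $7 + 0 > size[key] + 0)
        print "PID " $2 " node " $8 " grew from " size[key] " to " $7
    }
  ' "$1" "$2"
}

old=$(mktemp)
new=$(mktemp)

# First sample: two lines taken from the listing above.
cat > "$old" <<'EOF'
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
sadm 62 root txt VREG 102,0 954804 101535 / (/dev/dsk/c0d0s0)
sadm 62 root txt VREG 102,0 165948 101569 / (/dev/dsk/c0d0s0)
EOF

# Second sample: the size of node 101569 is a made-up value.
cat > "$new" <<'EOF'
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
sadm 62 root txt VREG 102,0 954804 101535 / (/dev/dsk/c0d0s0)
sadm 62 root txt VREG 102,0 170044 101569 / (/dev/dsk/c0d0s0)
EOF

growing_files "$old" "$new"
rm -f "$old" "$new"
```

In practice the two files would be produced by redirecting the output of lsof / to a file at each sampling epoch.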
One of the restrictions on unmounting a file system is that you can’t unmount it while files are open on it: if a file system were unmounted with files still open, any changes made to those files might not be saved, resulting in data loss. Looking at a process list may not always reveal which processes have which files open, and this can be very frustrating if Solaris refuses to unmount a file system because some files are open. Again, lsof can be used to identify the processes that have files open on a specific file system.
The first step is to consult the output of the df command to obtain the names of currently mounted file systems:
$ df -k
Filesystem         kbytes    used    avail capacity  Mounted on
/proc                   0       0        0     0%    /proc
/dev/dsk/c0d0s0   2510214  929292  1530718    38%    /
fd                      0       0        0     0%    /dev/fd
/dev/dsk/c0d0s3   5347552  183471  5110606     4%    /usr/local
swap               185524   12120   173404     7%    /tmp
If you want to unmount the /dev/dsk/c0d0s3 file system, but are prevented from doing so because of open files, you can obtain a list of all open files under /usr/local by using this command:
$ lsof /dev/dsk/c0d0s3
COMMAND  PID USER FD   TYPE  DEVICE  SIZE/OFF  NODE    NAME
httpd    981 root txt  VREG  102,3   1747168   457895  /usr/local
httpd    982 root txt  VREG  102,3   333692    56455   /usr/local
httpd    983 root txt  VREG  102,3   333692    56455   /usr/local
httpd    984 root txt  VREG  102,3   333692    56455   /usr/local
javac    985 root txt  VREG  102,3   333692    56455   /usr/local
httpd    986 root txt  VREG  102,3   333692    56455   /usr/local
httpd    987 root txt  VREG  102,3   333692    56455   /usr/local
httpd    988 root txt  VREG  102,3   333692    56455   /usr/local
httpd    989 root txt  VREG  102,3   333692    56455   /usr/local
httpd    990 root txt  VREG  102,3   333692    56455   /usr/local
Obviously, all of these processes will need to stop using the open files before the file system can be unmounted. If you’re not sure where a particular command is running from, or on which file system its data files are stored, you can also use lsof to check open files by passing the PID on the command line. First, you need to identify a PID by using the ps command:
$ ps -eaf | grep apache
nobody  4911  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
nobody  4910  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
nobody  4912  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
nobody  4905     1  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
nobody  4907  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
nobody  4908  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
nobody  4913  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
nobody  4909  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
nobody  4906  4905  0   Mar 22 ?        0:00 /usr/local/apache/bin/httpd
Now examine the Apache parent process (PID 4905) to see which files it currently has open:
$ lsof -p 4905
COMMAND  PID  USER   FD   TYPE  DEVICE  SIZE/OFF  NODE    NAME
d        4905 nobody txt  VREG  102,3   333692    56455   /usr/local (/dev/dsk/c0d0s3)
d        4905 nobody txt  VREG  102,0   17388     100789  / (/dev/dsk/c0d0s0)
d        4905 nobody txt  VREG  102,0   954804    101535  / (/dev/dsk/c0d0s0)
d        4905 nobody txt  VREG  102,0   693900    101573  / (/dev/dsk/c0d0s0)
d        4905 nobody txt  VREG  102,0   52988     100807  / (/dev/dsk/c0d0s0)
d        4905 nobody txt  VREG  102,0   4396      100752  / (/dev/dsk/c0d0s0)
d        4905 nobody txt  VREG  102,0   175736    100804  / (/dev/dsk/c0d0s0)
Apache obviously has a number of open files!
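When the listing is long, it can help to count the open files per file system, which shows at a glance which mount points a process would block from being unmounted. A minimal sketch follows, assuming the nine-column layout shown above with the mount point in the NAME column and no spaces in command names; files_per_fs is a hypothetical helper, not an lsof option.

```shell
# files_per_fs: read lsof output on stdin and print the number of open files
# per mount point (the NAME column), largest count first.
files_per_fs() {
  awk 'NR > 1 { count[$9]++ }
       END    { for (fs in count) print count[fs], fs }' | sort -rn
}

# The listing for PID 4905 from above, fed in as sample input.
files_per_fs <<'EOF'
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
d 4905 nobody txt VREG 102,3 333692 56455 /usr/local (/dev/dsk/c0d0s3)
d 4905 nobody txt VREG 102,0 17388 100789 / (/dev/dsk/c0d0s0)
d 4905 nobody txt VREG 102,0 954804 101535 / (/dev/dsk/c0d0s0)
d 4905 nobody txt VREG 102,0 693900 101573 / (/dev/dsk/c0d0s0)
d 4905 nobody txt VREG 102,0 52988 100807 / (/dev/dsk/c0d0s0)
d 4905 nobody txt VREG 102,0 4396 100752 / (/dev/dsk/c0d0s0)
d 4905 nobody txt VREG 102,0 175736 100804 / (/dev/dsk/c0d0s0)
EOF
```

For this sample the function reports six open files on / and one on /usr/local.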
The following table summarizes the main options used with ps.
Option | Description
---|---
-a | Lists most frequently requested processes.
-A, -e | Lists all processes.
-c | Lists processes in scheduler format.
-d | Lists all processes except session leaders.
-f | Prints comprehensive process information.
-g | Prints process information on a group basis for a single group.
-G | Prints process information on a group basis for a list of groups.
-j | Includes SID and PGID in printout.
-l | Prints complete process information.
-L | Displays LWP details.
-p | Lists process details for a list of specified processes.
-P | Lists the CPU ID to which a process is bound.
-s | Lists session leaders.
-t | Lists all processes associated with a specific terminal.
-u | Lists all processes for a specific user.
The following table summarizes the main signals used to communicate with processes using kill.
Signal | Code | Action | Description
---|---|---|---
SIGHUP | 1 | Exit | Hang up
SIGINT | 2 | Exit | Interrupt
SIGQUIT | 3 | Core | Quit
SIGILL | 4 | Core | Illegal instruction
SIGTRAP | 5 | Core | Trace
SIGABRT | 6 | Core | Abort
SIGEMT | 7 | Core | Emulation trap
SIGFPE | 8 | Core | Arithmetic exception
SIGKILL | 9 | Exit | Killed
SIGBUS | 10 | Core | Bus error
SIGSEGV | 11 | Core | Segmentation fault
SIGSYS | 12 | Core | Bad system call
SIGPIPE | 13 | Exit | Broken pipe
SIGALRM | 14 | Exit | Alarm clock
SIGTERM | 15 | Exit | Terminate
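If you are unsure which name belongs to a signal code on the system in front of you, the kill -l option of a POSIX shell can translate a number into a name. The sketch below checks only codes 1, 2, 3, 9, and 15, which are the same on most UNIX systems; several other codes in the table are specific to Solaris (Linux, for instance, uses 7 for SIGBUS), so the output for those will differ between platforms.

```shell
# Translate a few signal codes into names using the shell's kill builtin.
for code in 1 2 3 9 15; do
  printf '%2d  SIG%s\n' "$code" "$(kill -l "$code")"
done
```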
The pgrep command is used to search for processes whose names match a pattern specified on the command line. The command returns a list of corresponding PIDs. This list can then be passed to another command, such as kill (for example, by command substitution), to perform some action on the processes or send them a signal.
For example, to kill all processes associated with the name “java,” the following command would be used:
$ kill -9 `pgrep java`
The pkill command can be used to send signals to processes that have the same name. It is a more specific version of pgrep, since it can be used only to send signals, and the list of PIDs cannot be piped to another program.
To kill all processes associated with the name “java,” the following command would be used:
$ pkill -9 java
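Because SIGKILL (signal 9) cannot be caught or ignored, the target gets no chance to clean up. A gentler pattern is to send SIGTERM first and fall back to SIGKILL only if the process is still present after a grace period. The following is a minimal sketch of that pattern for a single PID; stop_pid and its timeout argument are hypothetical names, and the sleep process is just a stand-in for a real application.

```shell
# stop_pid PID [TIMEOUT]: send SIGTERM, wait up to TIMEOUT seconds (default 5)
# for the process to disappear, then send SIGKILL as a last resort.
# Note: plain POSIX sh has no local variables, so pid and timeout are global.
stop_pid() {
  pid=$1
  timeout=${2:-5}
  kill -TERM "$pid" 2>/dev/null || return 1   # no such process, or no permission
  while [ "$timeout" -gt 0 ]; do
    kill -0 "$pid" 2>/dev/null || return 0    # the process has gone
    sleep 1
    timeout=$((timeout - 1))
  done
  kill -KILL "$pid" 2>/dev/null               # still there: force it
  return 0
}

# Demonstration with a harmless background process.
sleep 60 &
victim=$!
stop_pid "$victim" 3
status=0
wait "$victim" 2>/dev/null || status=$?
echo "sleep ended with status $status"
```

A status above 128 from wait indicates that the process was terminated by a signal rather than exiting on its own.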
The killall command is used to kill all processes running on a system. It is called by shutdown when the system is being brought to run level 0. However, since a signal can be passed to the killall command, it is possible for a superuser to send a different signal (other than 15) to all processes. For example, to send a SIGHUP signal to all processes, the following command could be used:
# killall 1