Measuring performance is necessary to determine whether current utilization levels require a system to be upgraded and whether user applications and system services are executing as quickly and efficiently as possible. Solaris provides a wide variety of tools to tune and monitor the operation of individual devices and core system elements, as well as tools that can be applied to improve performance. These tools work with the kernel, disk, memory, network, compilers, applications, and system services. An alternative to the tools provided with Solaris is the SymbEL toolkit developed by Adrian Cockcroft and Richard Pettit (www.sun.com/sun-on-net/performance/se3), which is fully described in their book, Sun Performance and Tuning, published by Sun Microsystems Press (1998). In this chapter, we examine how to use some of the standard Solaris tools to monitor performance, identify performance issues and bottlenecks, and implement new settings.
The following applications are commonly used to measure system performance:
iostat | Collects data about input/output operations for CPUs, disks, terminals, and tapes from the command line.
vmstat | Collects data on virtual memory performance from the command line and prints a summary.
mpstat | Breaks down CPU usage per operation type.
sar | Runs through cron or the command line to collect statistics on disk, tape, CPU, buffering, input/output, system calls, interprocess communication, and many other variables.
The following sections examine how each of these commands is used.
The kernel maintains low-level counters to measure various operations, which you can access by using iostat. When first executed, iostat reports statistics gathered since booting; subsequently, it reports the difference between the first report and the current state. Thus, by running iostat at regular intervals (such as every minute), you can obtain high-resolution samples that establish system performance over a specific period. This can be very useful for gaining an accurate picture of how system resources are allocated.
To display disk usage statistics, the following command produces 10 reports at intervals of 60 seconds:
# iostat -x 60 10
device       r/s  w/s   kr/s  kw/s  wait  actv  svc_t  %w  %b
sd0          0.2  0.4   12.2   9.0   1.0   2.0   38.6   0   1
...
device       r/s  w/s   kr/s  kw/s  wait  actv  svc_t  %w  %b
sd0          0.3  0.3   12.5   8.0   2.0   1.0   33.2   0   1
...
Let’s review what each column indicates for the disk device:
device | Shows the device name (sd0 indicates a disk).
r/s | Displays the number of disk reads per second.
w/s | Prints the number of disk writes per second.
kr/s | Shows the total amount of data read per second (in kilobytes).
kw/s | Displays the total amount of data written per second (in kilobytes).
wait | Prints the mean number of transactions waiting for service.
actv | Shows the mean number of transactions being processed.
svc_t | Displays the mean service time in milliseconds.
%w | Prints the percentage of time that transactions are waiting for service.
%b | Shows the percentage of time that the disk is busy.
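If you want to automate this kind of check, a minimal sketch is to post-process iostat output with awk; the 60 percent threshold and the sd device-name filter here are arbitrary assumptions for illustration:

# iostat -x 60 10 | awk '$1 ~ /^sd/ && $10 > 60 { print $1 " is " $10 "% busy" }'

This prints a line for any disk whose %b column (field 10 in the -x layout shown previously) exceeds 60 percent during a sample.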
To display statistics for the CPU at one-second intervals 20 times, you could use the following command:
# iostat -c 1 20
The output would display four columns, showing user time, system time, I/O wait, and idle time, respectively, in percentage terms.
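The output resembles the following (the values shown here are illustrative only):

     cpu
 us sy wt id
 12  5  3 80
  8  4  2 86
 ...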
One of the greatest performance issues in system tuning is virtual memory capacity and performance. Obviously, if your server is using large amounts of swap space on a slow disk, the time taken to perform various operations will increase. One application that reports on the current state of virtual memory is the vmstat command, which displays a large collection of statistics concerning virtual memory performance. As you can see from the following display, the virtual memory report on this server is not encouraging: 1,346,736,431 total address translation faults were recorded, as well as 38,736,546 major faults, 1,346,736,431 minor faults, and 332,163,181 copy-on-write faults. This suggests that more memory is required to support operations or, at least, that the disk holding the swap partition should be upgraded to 10,000 rpm:
# vmstat -s
       253 swap ins
       237 swap outs
       253 pages swapped in
    705684 pages swapped out
1346736431 total address trans. faults taken
  56389345 page ins
  23909231 page outs
 152308597 pages paged in
  83982504 pages paged out
  26682276 total reclaims
  26199677 reclaims from free list
         0 micro (hat) faults
1346736431 minor (as) faults
  38736546 major faults
 332163181 copy-on-write faults
 316702360 zero fill page faults
  99616426 pages examined by the clock daemon
       782 revolutions of the clock hand
 126834545 pages freed by the clock daemon
  14771875 forks
   3824010 vforks
  29303326 execs
 160142153 cpu context switches
2072002374 device interrupts
3735561061 traps
2081699655 system calls
1167634213 total name lookups (cache hits 70%)
  46612294 toolong
 964665958 user   cpu
 399229996 system cpu
1343911025 idle   cpu
 227505892 wait   cpu
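Cumulative totals are most useful when compared against interval samples. A quick check for current memory pressure is to run vmstat with an interval and watch the sr (scan rate) column; sustained non-zero values indicate that the page scanner is working to reclaim memory. The figures below are illustrative:

# vmstat 5 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 842344 18764   3  12  5  2  4  0  0  1  0  0  0  214  812  120 12  5 83
 ...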
Another factor influencing performance is the system load: obviously, a system that runs a large number of processes and consistently has a load greater than 1.0 cannot be relied upon to give adequate performance in times of need. You can use the mpstat command to examine a number of system parameters, including the system load, over a number of regular intervals. Many administrators take several hundred samples using mpstat and compute an average system load for specific times of the day when a peak load is expected (for example, 9:00 A.M.). This can greatly assist in planning CPU capacity to support expanding operations.
Tip: SPARC hardware architectures support large numbers of CPUs, so it's not difficult to scale up to meet demand.
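To see how many processors are currently installed and online before planning such an upgrade, you can use psrinfo; the dates and times shown here are illustrative:

# psrinfo
0       on-line   since 01/20/02 11:23:06
1       on-line   since 01/20/02 11:23:08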
The output from mpstat contains several columns, which measure the following parameters:
Context switches
Cross-calls between CPUs
Idle percentage of CPU time
Interrupts
Minor and major faults
Sys percentage of CPU time
Thread migrations
User percentage of CPU time
For the server output shown next, the proportion of system time consumed is well below 100 percent—the peak value is 57 percent for only one of the CPUs in this dual-processor system. Sustained values of sys at or near the 100-percent level indicate that you should add more CPUs to the system:
# mpstat 5
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   46   1  250    39  260  162   94   35  104    0    75   31  14   8  47
  1   45   1   84   100  139  140   92   35  102    0    14   35  13   7  45
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  141   3  397   591  448  539  233   38  111    0 26914   64  35   1   0
  1  119   0 1136   426  136  390  165   40  132    0 21371   67  33   0   0
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  317   303  183  367  163   28   63    0  1110   94   6   0   0
  1    0   0    4   371  100  340  148   27   86    0 56271   43  57   0   0
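To turn such samples into the averages described earlier, a minimal sketch is to accumulate a column with awk; the sample count of 100 and the use of field 16 (idl in the layout above) are assumptions tied to this output format:

# mpstat 5 100 | awk '$1 != "CPU" { sum += $16; n++ } END { if (n) print "mean idle: " sum/n "%" }'

Subtracting the mean idle percentage from 100 gives a rough figure for average CPU utilization across the sampling period.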
The sar command is the most versatile method for collecting system performance data. From the command line, it produces a number of snapshots of current system activity over a specified number of time intervals. Or, if you don’t specify an interval, the current day’s data extracted from sar’s regular execution by cron is used. For example, to display a summary of disk activity for the current day, you can use the following command:
# sar -d

SunOS 5.9 sun4u    01/25/02

09:54:33   device   %busy   avque   r+w/s   blk/s   avwait   avserv
           sd01        27     5.8       6       8     21.6     28.6
           sd03        17     2.4       4       7     14.2     21.2
           sd05        13     1.7       3       6      9.3     18.3
           sd06        35     6.9       8      10     25.7     31.8
In this example, you can see several disk devices with varying percentages of busy time, mean number of transaction requests in the queue, mean number of disk reads and writes per second, mean number of disk blocks transferred per second, mean time spent waiting in the queue, and mean time for service in the queue.
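The daily data file that sar reads when invoked without an interval is produced by the sa1 and sa2 collection scripts, driven from the sys crontab. Entries similar to the following stock Solaris entries (normally shipped commented out) enable collection; the exact schedule is site-specific:

0 * * * 0-6 /usr/lib/sa/sa1
20,40 8-17 * * 1-5 /usr/lib/sa/sa1
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A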
When a new disk, memory module, or CPU is added to the system, you should take a baseline sar report to determine the effect on performance. For example, after adding 128MB of RAM to the system, you should be able to quantify the effect on mean system performance by comparing sar output before and after the event during a typical day's workload.
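A simple way to capture such a baseline (the file names here are arbitrary) is to redirect interval samples to a file before the change and again afterward; for a RAM upgrade, sar -r, which reports free memory and swap, is a reasonable choice:

# sar -r 60 60 > /var/tmp/sar-before-upgrade
(install the new RAM, then, during a comparable workload)
# sar -r 60 60 > /var/tmp/sar-after-upgrade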
In previous sections, we’ve examined how to use tools such as sar, vmstat, and iostat to measure system performance before and after key events such as adding new RAM or CPUs or upgrading disks to faster speeds. In addition to these hardware changes, it is possible to increase the performance of an existing system by tuning the kernel. This could involve switching from a 32-bit to a 64-bit kernel, if supported by hardware, and setting appropriate parameters for shared memory, semaphores, and message queues in /etc/system. However, note that the Solaris 9 kernel is self-tuning to some extent for normal operations. Once database servers with special requirements are installed, or many users must be supported on a single system, it may be necessary to tweak some parameters and reboot.
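You can check whether a system is currently running the 64-bit kernel with isainfo; on a 64-bit UltraSPARC system, the output resembles the following:

# isainfo -kv
64-bit sparcv9 kernel modules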
If a system is slow, the process list is the first place to look, as described in Chapter 8. One of the reasons that so much space is devoted to process management in this book is that it is often user processes, rather than system CPU time, that adversely impact system performance. The only time that kernel tuning will really assist is when shared memory and other parameters need to be adjusted for database applications or other large applications, or when system time for processes far exceeds user time. You can generally establish this by using the time command. Shortly, we examine some commonly modified parameters in the /etc/system file that you can use to improve system performance. After you make changes to /etc/system, you need to reboot the system.
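For example, you could run a suspect command under time and compare the user and sys figures; the command and values below are illustrative, and the output format varies between shells:

# time du -s /home > /dev/null

real    0m12.4s
user    0m1.2s
sys     0m8.7s

Here sys far exceeds user, which is the pattern that suggests kernel tuning may help.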
Note: If a syntax error is detected in /etc/system, the system may not be able to boot except with the boot -as command.
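When you boot with boot -as, the kernel prompts interactively for the name of the system file; entering /dev/null at that prompt (the prompt text may vary slightly between releases) bypasses the damaged /etc/system so that you can log in and repair it:

ok boot -as
...
Name of system file [etc/system]: /dev/null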
The first step in tuning the kernel is generally to set the maximum number of processes permitted per user to a sensible value. This is a hard limit that prevents individual users from circumventing limits imposed by quotas and nice values set by the superuser. To set a maximum of 100 processes per user, make the following entry in /etc/system:
set maxuprc=100
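After rebooting, you can verify that the new limit has taken effect with sysdef, which reports the per-user process limit among its tunable parameters; the grep pattern and output formatting here are illustrative:

# sysdef | grep 'processes per user'
     100  maximum processes per user id (v.v_maxup)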
If you are running a database server, your manual will no doubt supply minimum requirements for shared memory for the server. Shared memory is memory that can be locked into RAM but shared between processes, thereby reducing the overhead of memory allocation. You can set the following parameters to determine how shared memory is allocated:
shmmax | The maximum shared memory segment size.
shmmin | The minimum shared memory segment size.
shmmni | The maximum number of concurrent shared memory identifiers permitted.
shmseg | The number of segments permitted for each process.
semmap | The initial number of entries in the semaphore map.
semmni | The maximum number of semaphore sets permitted.
semmns | The total number of semaphores permitted.
semmsl | The maximum number of semaphores permitted in each semaphore set.
The following example entries for /etc/system allocate 128MB of shared memory and set the other parameters appropriately:
set shmsys:shminfo_shmmax=134217728
set shmsys:shminfo_shmmin=100
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=100
set semsys:seminfo_semmap=125
set semsys:seminfo_semmni=250
set semsys:seminfo_semmns=250
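After rebooting, you can confirm that applications are actually obtaining segments under the new limits by using ipcs. For example, a database server that has attached a 128MB segment might appear as follows; the owner, key, and ID shown are illustrative:

# ipcs -mb
IPC status from <running system> as of Mon Jan 28 10:11:02 2002
T         ID      KEY        MODE        OWNER    GROUP      SEGSZ
Shared Memory:
m          0   0x5000081c --rw-r-----   oracle      dba  134217728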