Performance

Measuring performance is a necessary task for determining whether current utilization levels require a system upgrade and whether user applications and system services are executing as quickly and efficiently as possible. Solaris provides a wide variety of tools to tune and monitor the operation of individual devices and core system elements, as well as other tools that can be applied to improve performance. These tools work with the kernel, disk, memory, network, compilers, applications, and system services. An alternative to the tools provided with Solaris is to use the SymbEL tools developed by Adrian Cockcroft and Richard Pettit (www.sun.com/sun-on-net/performance/se3), which are fully described in their book, Sun Performance and Tuning, published by Sun Microsystems Press (1998). In this chapter, we examine how to use some of the standard Solaris tools to monitor performance, identify performance issues and bottlenecks, and implement new settings.

Collecting Performance Data

The following applications are commonly used to measure system performance:

iostat

Collects data about input/output operations for CPUs, disks, terminals, and tapes from the command line.

vmstat

Collects data on virtual memory performance from the command line and prints a summary.

mpstat

Breaks down CPU usage per operation type.

sar

Runs through cron or the command line to collect statistics on disk, tape, CPU, buffering, input/output, system calls, interprocess communication, and many other variables.

The following sections examine how each of these commands is used.

iostat

The kernel maintains low-level counters to measure various operations, which you can access by using iostat. When you first execute it, iostat reports statistics gathered since booting. Each subsequent report then covers only the activity since the previous report. Thus, when you run iostat at regular intervals (such as every minute), you obtain high-resolution samples of system performance within each interval. This can be very useful for gaining an accurate picture of how system resources are allocated.

To display disk usage statistics, the following command produces 10 reports at 60-second intervals:

# iostat -x 60 10
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd0    0.2 0.4 12.2 9.0  1.0  2.0  38.6   0  1
...
device r/s w/s kr/s kw/s wait actv svc_t %w %b
sd0    0.3 0.3 12.5 8.0  2.0  1.0  33.2   0  1
...

Let’s review what each column indicates for the disk device:

device

Shows the device name (sd0 indicates a SCSI disk).

r/s

Displays the number of disk reads per second.

w/s

Prints the number of disk writes per second.

kr/s

Shows the total amount of data read per second (in kilobytes).

kw/s

Displays the total amount of data written per second (in kilobytes).

wait

Prints the mean number of waiting transactions.

actv

Shows the mean number of transactions being processed.

svc_t

Displays the mean period for service in milliseconds.

%w

Prints the percentage of time spent waiting.

%b

Shows the percentage of time that the disk is working.
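
Once you know the column layout, you can filter iostat output in a pipeline. As a minimal sketch, the following command reports any device that is more than 20 percent busy in each 60-second interval; the threshold and interval are arbitrary assumptions, and the fields $1 and $10 correspond to the device and %b columns described above:

# iostat -x 60 | awk '$10 ~ /^[0-9]/ && $10+0 > 20 { print $1, $10 "% busy" }'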

To display CPU statistics at one-second intervals 20 times, you could use the following command:

# iostat -c 1 20

The output would display four columns, showing user time, system time, I/O wait, and idle time, respectively, in percentage terms.
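
The report takes the following form, with one line per interval (the header is standard, but the figures shown here are purely illustrative):

     cpu
 us sy wt id
  4  2  1 93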

vmstat

One of the greatest performance issues in system tuning is virtual memory capacity and performance. Obviously, if your server is using large amounts of swap on a slow disk, the time taken to perform various operations will increase. One application that reports on the current state of virtual memory is the vmstat command, which displays a large collection of statistics concerning virtual memory performance. As you can see from the following display, the virtual memory report on the server is not encouraging: 1,346,736,431 total address translation faults were recorded, as well as 38,736,546 major faults, 1,346,736,431 minor faults, and 332,163,181 copy-on-write faults. This suggests that more physical memory is required to support operations or, at the very least, that the disk holding the swap partition should be upgraded to 10,000 rpm:

# vmstat -s
          253 swap ins
      237 swap outs
      253 pages swapped in
   705684 pages swapped out
1346736431 total address trans. faults taken
 56389345 page ins
 23909231 page outs
152308597 pages paged in
 83982504 pages paged out
 26682276 total reclaims
 26199677 reclaims from free list
        0 micro (hat) faults
1346736431 minor (as) faults
 38736546 major faults
332163181 copy-on-write faults
316702360 zero fill page faults
 99616426 pages examined by the clock daemon
      782 revolutions of the clock hand
126834545 pages freed by the clock daemon
 14771875 forks
  3824010 vforks
 29303326 execs
160142153 cpu context switches
2072002374 device interrupts
3735561061 traps
2081699655 system calls
1167634213 total name lookups (cache hits 70%)
 46612294 toolong
964665958 user   cpu
399229996 system cpu
1343911025 idle   cpu
227505892 wait   cpu
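
The cumulative counters reported by vmstat -s are best suited to spotting long-term trends. To watch paging activity as it happens, run vmstat in interval mode instead; for example, the following command prints a summary line every 5 seconds, and a page scan rate (the sr column) that stays high across samples indicates that physical memory itself is under pressure:

# vmstat 5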

mpstat

Another factor influencing performance is the system load: a system that runs a large number of processes and consistently has a load greater than 1.0 cannot be relied upon to give adequate performance in times of need. You can use the mpstat command to examine a number of system parameters, including the system load, over a number of regular intervals. Many administrators take several hundred samples using mpstat and compute an average system load for specific times of the day when a peak load is expected (for example, at 9:00 A.M.); a scripted sketch of this approach follows the sample mpstat output later in this section. This can greatly assist in capacity planning of CPUs to support expanding operations.

Tip 

SPARC hardware architectures support large numbers of CPUs, so it’s not difficult to scale up to meet demand.

The output from mpstat contains several columns, which measure the following parameters:

  • Context switches

  • Cross-calls between CPUs

  • Idle percentage of CPU time

  • Interrupts

  • Minor and major faults

  • Sys percentage of CPU time

  • Thread migrations

  • User percentage of CPU time

For the server output shown next, the proportion of system time consumed is well below 100 percent—the peak value is 57 percent for only one of the CPUs in this dual-processor system. Sustained values of sys at or near the 100-percent level indicate that you should add more CPUs to the system:

# mpstat 5
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   46   1  250   39  260  162   94   35  104    0    75   31  14   8  47
  1   45   1   84  100  139  140   92   35  102    0    14   35  13   7  45
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  141   3  397  591  448  539  233   38  111    0 26914   64  35   1   0
  1  119   0 1136  426  136  390  165   40  132    0 21371   67  33   0   0
CPU minf mjf xcal intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  317  303  183  367  163   28   63    0  1110   94   6   0   0
  1    0   0    4  371  100  340  148   27   86    0 56271   43  57   0   0
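
As a minimal sketch of the sampling approach described earlier, the following pipeline takes 60 one-minute samples and averages the usr and sys columns across all CPUs. The interval and sample count are arbitrary assumptions, the column positions ($13 and $14) match the 16-column output shown above, and mpstat's first report, which covers the period since boot, slightly skews the result:

# mpstat 60 60 | awk '$1 ~ /^[0-9]+$/ { usr += $13; sys += $14; n++ }
      END { if (n > 0) printf("average usr %.1f%%  sys %.1f%%\n", usr/n, sys/n) }'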

sar

The sar command is the most versatile method for collecting system performance data. From the command line, it produces a number of snapshots of current system activity over a specified number of time intervals. If you don't specify an interval, the current day's data, collected by sar's regular execution from cron, is used instead. For example, to display a summary of disk activity for the current day, you can use the following command:

# sar -d
SunOS 5.9 sun4u     01/25/02
09:54:33   device   %busy    avque   r+w/s  blk/s  avwait  avserv
             sd01      27      5.8       6      8    21.6    28.6
             sd03      17      2.4       4      7    14.2    21.2
             sd05      13      1.7       3      6     9.3    18.3
             sd06      35      6.9       8     10    25.7    31.8

In this example, several disk devices are shown with their percentage of busy time, mean number of transaction requests in the queue, mean number of disk reads and writes per second, mean number of disk blocks transferred per second, mean time waiting in the queue, and mean service time.

When a new disk, memory, or CPU is added to the system, you should take a baseline sar report to determine the effect on performance. For example, after adding 128MB of RAM to the system, you should be able to quantify the effect on mean system performance by comparing sar output before and after the event during a typical day's workload.
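
The current day's data that sar draws on is collected by the sa1 and sa2 scripts, which run from the sys user's crontab; the stock entries ship commented out. Once uncommented (for example, by running crontab -e sys as root), the typical default schedule looks like the following, although you should adapt it to your site:

0 * * * 0-6 /usr/lib/sa/sa1
20,40 8-17 * * 1-5 /usr/lib/sa/sa1
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A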

Performance Tuning

In previous sections, we’ve examined how to use tools such as sar, vmstat, and iostat to measure system performance before and after key events such as adding new RAM or CPUs or upgrading disks to faster speeds. In addition to these hardware changes, it is possible to increase the performance of an existing system by tuning the kernel. This could involve switching from a 32-bit to a 64-bit kernel, if supported by hardware, and setting appropriate parameters for shared memory, semaphores, and message queues in /etc/system. However, note that the Solaris 9 kernel is self-tuning to some extent for normal operations. Once database servers with special requirements are installed, or many users must be supported on a single system, it may be necessary to tweak some parameters and reboot.
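
Before switching kernels, you can confirm what the hardware and the running kernel support by using the isainfo command; on a system already running the 64-bit kernel, the output looks like this:

# isainfo -kv
64-bit sparcv9 kernel modules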

If a system is slow, the process list is the first place to look, as described in Chapter 8. One of the reasons that so much space is devoted to process management in this book is that it is often user processes, rather than system CPU time, that adversely impact system performance. The only time that kernel tuning will really assist is when shared memory and other parameters need to be adjusted for database or other large applications, or when the system time consumed by processes far exceeds the user time. You can generally establish this by using the time command, as shown in the example that follows. Shortly afterward, we examine some commonly modified parameters in the /etc/system file that you can use to improve system performance. After you make changes to /etc/system, you need to reboot the system.
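
As a simple illustration of this check (the file name and the figures here are purely hypothetical), prefix the suspect command with /usr/bin/time and compare the user and sys fields; a sys value that dwarfs user suggests that kernel activity, rather than the computation itself, is the problem:

# /usr/bin/time sort /var/tmp/bigfile > /dev/null

real       12.3
user        2.1
sys         9.8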

Note 

If a syntax error is detected in /etc/system, the system may not be able to boot except with the boot -as command.

The first step in tuning the kernel is generally to set the maximum number of processes permitted per user to a sensible value. This is a hard limit that prevents individual users from circumventing the limits imposed by quotas and the nice values set by the superuser. To enforce a maximum of 100 processes per user, make the following entry in /etc/system:

set maxuprc=100
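
After rebooting, you can confirm that the new limit is active by searching the sysdef output for the per-user process limit (the exact formatting of the line varies slightly between releases):

# sysdef | grep v_maxup
     100  maximum processes per user id (v.v_maxup)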

If you are running a database server, its manual will no doubt specify the server's minimum shared memory requirements. Shared memory is memory that can be locked in place yet shared between processes, thereby reducing memory-allocation overhead. You can set the following parameters to determine how shared memory and semaphores are allocated:

shmmax

The maximum size (in bytes) of a shared memory segment.

shmmin

The minimum size of a shared memory segment.

shmmni

The largest number of shared memory identifiers permitted concurrently.

shmseg

The quantity of segments permitted for each process.

semmap

The initial quantity of entries in the semaphore map.

semmni

The largest number of semaphore sets permitted.

semmns

The total number of semaphores permitted.

semmsl

The largest number of semaphores in each semaphore set.

The following example entries for /etc/system set the maximum shared memory segment size to 128MB (134,217,728 bytes) and configure the other parameters appropriately:

set shmsys:shminfo_shmmax=134217728
set shmsys:shminfo_shmmin=100
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=100
set semsys:seminfo_semmap=125
set semsys:seminfo_semmni=250
set semsys:seminfo_semmns=250
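
After the reboot, one way to verify that the kernel accepted these values is to inspect the IPC sections of the sysdef output; note that these sections are populated only once the shared memory and semaphore modules have been loaded (for example, by starting the database), and the exact wording below may vary between releases:

# sysdef | grep SHMMAX
134217728  max shared memory segment size (SHMMAX)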

