Solaris provides a centralized auditing service known as system accounting. This service is useful for accounting for the various tasks your system performs: you can use it to monitor resource usage, troubleshoot system failures, isolate bottlenecks, and assist in system security. In addition, system accounting acts as a true accounting service, and you can use it for billing in the commercial world. In this section, we review the major components of system accounting, including the applications and scripts responsible for preparing daily reports on connections, processes, and disk load, as well as usage statements for individual users.
Tip
Once you enable the appropriate script in /etc/init.d, system accounting does not typically involve administrator intervention.
Collecting data for accounting is simple: create a startup script (/etc/rc2.d/S22acct) so that data collection begins soon after the system enters multiuser mode, and optionally create a kill script (/etc/rc0.d/K22acct) to turn off data collection cleanly before the system shuts down. As per standard System V practice, you should create a single script in /etc/init.d (for example, /etc/init.d/accounting) and link it symbolically to both of those filenames, ensuring that the script can interpret both a start and a stop parameter. When accounting is enabled, details of processes, storage, and user activity are recorded in specially created log files, which are then processed daily by the /usr/lib/acct/runacct program. The output from runacct is also processed by /usr/lib/acct/prdaily (which generates the reports described in the next section). Finally, a separate billing program, /usr/lib/acct/monacct, is executed monthly and generates accounts for individual users.
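The linking convention described above can be sketched as follows. This is an illustrative sketch, not an official install procedure: the scratch ROOT directory, the stub script body, and the file mode (744) are assumptions; on a real system you would operate on /etc directly, as root.

```shell
# Hedged sketch of the System V script-linking convention described above.
# ROOT is a scratch directory so the sketch can run safely; on a real system
# you would operate on /etc directly, as root.
ROOT=/tmp/acct-demo.$$
mkdir -p $ROOT/etc/init.d $ROOT/etc/rc2.d $ROOT/etc/rc0.d

# the control script itself (a stub here; the full start/stop script appears
# later in this section)
cat > $ROOT/etc/init.d/accounting <<'EOF'
#!/sbin/sh
# stub only
EOF
chmod 744 $ROOT/etc/init.d/accounting

# S22acct receives "start" when entering run level 2; K22acct receives "stop"
# during shutdown to run level 0
ln -s ../init.d/accounting $ROOT/etc/rc2.d/S22acct
ln -s ../init.d/accounting $ROOT/etc/rc0.d/K22acct
```

Because both rc entries are links to the same file, a single script services both run-level transitions.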
The accounting startup and shutdown script should look like this:
parameter=$1

case $parameter in
'start')
        echo "Initializing process accounting"
        /usr/lib/acct/startup
        ;;
'stop')
        echo "Halting process accounting"
        /usr/lib/acct/shutacct
        ;;
esac
When called with the start parameter, this script executes another script, /usr/lib/acct/startup, which is responsible for executing the /usr/lib/acct/acctwtmp program—which sets up record-writing utilities in the /var/adm/wtmp file. It then starts a script called turnacct, which is called with the name of the file in which the kernel records the process accounting details (usually named /var/adm/pacct). Finally, the startup section of the script removes all the temporary files associated with previous accounting activities.
Once you have enabled data collection, generating reports is a simple matter of setting up a cron job for a nonprivileged user (usually adm), typically at a time of low system load. In the following example, accounting runs are performed at 6:00 A.M.:
0 6 * * * /usr/lib/acct/runacct 2>> /var/adm/acct/nite/fd2log
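One hedged way to install this entry for the adm user is shown below. This assumes root access and standard Solaris crontab semantics; your site's procedure for editing another user's crontab may differ.

```shell
# Append the runacct entry to adm's crontab (sketch; assumes root privileges).
crontab -l adm > /tmp/adm.cron 2>/dev/null   # capture existing entries, if any
echo '0 6 * * * /usr/lib/acct/runacct 2>> /var/adm/acct/nite/fd2log' >> /tmp/adm.cron
su adm -c "crontab /tmp/adm.cron"            # install the file as adm
rm /tmp/adm.cron
```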
Accounting runs involve several discrete stages, which are executed in the following order:
SETUP        Prepares accounting files for running the report.
WTMPFIX      Checks the soundness of the wtmpx file and repairs it, if necessary.
CONNECT      Gathers data for user connect time.
PROCESS      Gathers data for process usage.
MERGE        Integrates the connection and process data.
FEES         Gathers fee information and applies it to the connection and process data.
DISK         Gathers data on disk usage and integrates it with the fee, connection, and process data.
MERGETACCT   Integrates accounting data for the past 24 hours (daytacct) with the total accounting data (/var/adm/acct/sum/tacct).
CMS          Generates command summaries.
CLEANUP      Removes transient data and cleans up before terminating.
After each stage of runacct completes successfully, the statefile (/var/adm/acct/nite/statefile) is overwritten with the name of that stage. Thus, if an accounting run is disrupted for any reason, it can easily be resumed by rereading the statefile. For example, on January 23, if the statefile contained FEES but the run terminated during DISK, you could restart the accounting run for the day by using the following command:
# runacct 2301 DISK 2>> /var/adm/acct/nite/fd2log
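The checkpoint-and-resume behavior can be modeled in a few lines of shell. This is purely an illustrative sketch, not the real implementation: the statefile path, the bookkeeping variables, and the echoed messages are assumptions; runacct itself performs the actual work at each stage.

```shell
# Illustrative model of runacct's checkpointing (not the real implementation).
STATEFILE=/tmp/acct_statefile.$$
echo FEES > "$STATEFILE"     # pretend the last run died during the DISK stage

STAGES="SETUP WTMPFIX CONNECT PROCESS MERGE FEES DISK MERGETACCT CMS CLEANUP"
last=`cat "$STATEFILE"`      # last successfully completed stage
resume=0
ran=""
for stage in $STAGES; do
    if [ $resume -eq 1 ]; then
        ran="$ran$stage "               # the real runacct does the work here
        echo "$stage" > "$STATEFILE"    # checkpoint after the stage succeeds
    elif [ "$stage" = "$last" ]; then
        resume=1                        # everything up to FEES is already done
    fi
done
echo "resumed stages: $ran"
rm -f "$STATEFILE"
```

Because the checkpoint is written only after a stage succeeds, rereading it always resumes at the first incomplete stage.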
Once the daily run has been completed, the lastdate file is updated with the current date in ddmm format, where dd is the day and mm is the month of the last run. In addition, you can review a number of files manually to obtain usage summaries. For example, the daily report is stored in a file called rprtddmm, where dd is the day and mm is the month of the run. This contains the cms and lastlogin data, as well as a connection usage summary:
Jan 26 02:05 2002  DAILY REPORT FOR johnson  Page 1

from Fri Jan 25 02:05:23 2002
to   Sat Jan 26 02:05:54 2002

TOTAL DURATION IS 46 MINUTES
LINE        MINUTES  PERCENT  # SESS  # ON  # OFF
/dev/pts/1  0        0        0       0     0
pts/1       46       0        8       8     8
TOTALS      46       --       8       8     8
Here you can see that the total connection time for the previous day was 46 minutes.
The loginlog file contains a list of the last login dates for all local users. Some system accounts appear as never having logged in, which is expected:
00-00-00  adm
00-00-00  bin
00-00-00  daemon
00-00-00  listen
00-00-00  lp
00-00-00  noaccess
00-00-00  nobody
00-00-00  nuucp
00-00-00  smtp
00-00-00  sys
02-01-20  root
02-01-26  pwatters
You should check the loginlog file for access to system accounts, which should never be accessed, and for unexpected usage of user accounts.
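A quick way to automate that check is to flag any system account whose last-login date is not 00-00-00. The sketch below runs against sample data modeled on the listing above; the list of system accounts and the location of the real loginlog file (commonly under /var/adm/acct/sum) are assumptions to verify for your site.

```shell
# Hedged sketch: flag system accounts whose last-login date is not 00-00-00.
# The here-document is sample data modeled on the listing above; on a live
# system, point awk at the real loginlog file instead.
flagged=$(awk '
    BEGIN {
        n = split("adm bin daemon listen lp noaccess nobody nuucp smtp sys", a)
        for (i = 1; i <= n; i++) isys[a[i]] = 1
    }
    $1 != "00-00-00" && ($2 in isys) { print $2 }   # system account was used
' <<EOF
00-00-00 adm
00-00-00 bin
02-01-25 sys
02-01-26 pwatters
EOF
)
echo "system accounts with logins: $flagged"
```

Here only sys is flagged: pwatters is an ordinary user, and the remaining system accounts have never logged in.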
A typical command summary (cms) statement generated by the runacct program is shown in Table 20-1.
Once you know what each column in this report represents, it becomes obvious that in this example, reading, sending, and receiving mail are the main uses of this server, on a daily basis at least, while the runacct command, which actually performs the accounting, was one of the least-used programs. Here is an explanation of the columns in the preceding report:
COMMAND NAME Shows the command as executed. This can lead to some ambiguity, because different commands could have the same filename. In addition, any shell or Perl scripts executed would be displayed under the shell and Perl interpreter, respectively, rather than showing up as a process on their own.
NUMBER CMDS Displays the number of times that the command named under COMMAND NAME was executed during the accounting period.
TOTAL KCOREMIN Shows the cumulative memory usage of the process named under COMMAND NAME, measured in kilobyte-minutes (kilobytes of memory in use per minute of execution time).
TOTAL CPU-MIN Prints the accumulated processing time for the program named under COMMAND NAME.
TOTAL REAL-MIN Shows the actual time in minutes that the program named in COMMAND NAME consumed during the accounting period.
MEAN SIZE-K Indicates the average of the cumulative sum of consumed memory segments (TOTAL KCOREMIN) over the set of invocations denoted by NUMBER CMDS.
MEAN CPU-MIN The average CPU time per invocation, computed as TOTAL CPU-MIN divided by NUMBER CMDS.
HOG FACTOR The total CPU time divided by the total elapsed time. This ratio indicates how intensively a command uses the CPU while it runs. The hog factor is often used as a metric to determine overall load levels for a system, and it is useful for planning upgrades and expansion.
CHARS TRNSFRD Displays the sum of the characters transferred by system calls.
BLOCKS READ Shows the number of physical block reads and writes that the program named under COMMAND NAME accounted for.
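Under the column definitions above, the derived figures can be recomputed from the totals. The sketch below uses the pine and sendmail invocation counts and KCOREMIN figures from the discussion that follows; the CPU-minute and real-minute values are invented purely for illustration.

```shell
# Hedged recomputation of the derived cms columns from the totals, following
# the definitions above. Input columns: name, TOTAL KCOREMIN, NUMBER CMDS,
# TOTAL CPU-MIN, TOTAL REAL-MIN (the last two are made-up sample values).
out=$(awk '{
    printf "%-9s MEAN SIZE-K=%.2f MEAN CPU-MIN=%.4f HOG FACTOR=%.3f\n",
           $1, $2 / $3, $4 / $3, $4 / $5
}' <<EOF
pine 1426.41 5 2.50 400.0
sendmail 176.44 171 1.20 3.0
EOF
)
echo "$out"
```

Note how the sample pine figures yield a large mean size but a tiny hog factor: a long-running interactive program holds memory all day while using little CPU relative to its elapsed time.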
Often, the values of these parameters can be confusing. For example, compare the characteristics of pine, a mail client, and sendmail, a mail transport agent. pine was executed only five times but accounted for 1426.41 KCOREMIN, whereas sendmail was executed 171 times with a KCOREMIN of only 176.44. The explanation for this apparent anomaly is that users probably log in once in the morning and leave their pine mail client running all day: the users sent an average of 34.2 messages per pine session during this day, many of which contained attachments, thus accounting for the high resource overhead.
When examined over a number of days, accounting figures provide a useful means of understanding how processes are making use of the system’s resources. When examined in isolation, however, they can sometimes misrepresent the dominant processes that the machine is used for. This is a well-known aspect of statistical sampling: Before you can make any valid generalizations about a phenomenon, your observations must be repeated and sampled randomly. Thus, it is useful to compare the day-to-day variation of a system’s resource use with the monthly figures that are generated by /usr/lib/acct/monacct. Compare these daily values with the previous month’s values generated by monacct in Table 20-2.
As you can see in Table 20-2, the individual day’s figures were misleading. In fact, spread over a whole month, the Netscape program tended to use more resources than the pine mail client, being invoked 1,538 times, and using 163985.79 KCOREMIN, compared to 165 invocations and 43839.27 KCOREMIN for pine.
Tip
It is very useful to examine monthly averages for a more reliable, strategic overview of system activity, while daily summaries are useful for making tactical decisions about active processes.
In the previous section, we looked at the output from monacct, the monthly accounting program. To enable monacct, you need to create a cron job for the adm account, similar to the entry for the runacct command shown earlier:
0 5 1 * * /usr/lib/acct/monacct
In addition to computing per-process statistics, monacct also computes usage information on a per-user basis, which you can use to bill customers according to the number of CPU minutes they used. Examine the user reports in Table 20-3 for the same month that was reviewed in the previous section.
Of the nonsystem users, obviously pwatters is going to have a large bill this month, with 65 prime CPU minutes consumed. Billing could also proceed on the basis of KCOREMINS utilized; pwatters, in this case, used 104572 KCOREMINS. How an organization bills its users is probably already well established, but even if users are not billed for cash payment, examining how the system is used is very valuable for planning expansion and for identifying rogue processes that reduce the availability of a system for legitimate processes.
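As a sketch of such billing, per-user charges can be computed directly from the prime CPU minutes. The $0.50-per-minute rate below is an assumption chosen for illustration, and the 65 minutes for pwatters come from the discussion above; a real site would parse the monacct output itself and apply its own tariff.

```shell
# Hedged billing sketch: charge each user a flat rate per prime CPU minute.
# The rate and the here-document are illustrative assumptions only.
bill=$(awk -v rate=0.50 '{ printf "%s owes $%.2f\n", $1, $2 * rate }' <<EOF
pwatters 65
EOF
)
echo "$bill"
```

Extending the same one-liner to bill on KCOREMINS instead is simply a matter of feeding it a different column of the monacct report.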