15.3 Measurements: What, Where, and How

Measuring performance is the key to improving it. You need reliable metrics to gauge performance and to compare results from before you begin tuning with those from after you make changes. Before getting into the specifics of measurements, let's look at a study that shows where bottlenecks tend to be.

15.3.1 An Instructive Analysis

A Mercury Interactive Corporation analysis of thousands of load tests on company web sites[1] found that enterprise performance problems come from four main areas: databases, web servers, application servers, and the network. Each area typically causes about a quarter of the performance problems.

[1] Drew Robb, "Stopping Web Performance Thieves," Datamation, June 24, 2002, http://itmanagement.earthweb.com/ecom/article/0,,11952_1370691,00.html.

The most common database problems were insufficient indexing, fragmented databases, out-of-date statistics, and faulty application design. Solutions included adding or tuning indexes, compacting the database, updating the statistics, and rewriting the application so that the database server controlled the query process.

The most common web-server problems were poor design algorithms, incorrect configurations, poorly written code, memory problems, and overloaded CPUs.

The most common application-server problems were poor cache management, nonoptimized database queries, incorrect software configuration, and poor concurrent handling of client requests.

The most common network problems included inadequate bandwidth somewhere along the communication route, and undersized, misconfigured, or incompatible routers, switches, firewalls, and load balancers.

15.3.2 Suggested Measurements

The results from this useful study may help you focus on the most likely problems. However, not all bottlenecks are listed here, and even if yours are, pinpointing their precise location can be difficult. Taking these suggested measurements may help you isolate the main bottlenecks.

Note that different tools take different measurements, and it is not always possible to match one tool's measurements with another's or with this list. For example, some tools cannot measure the time from the (simulated) user click, but instead start measuring only when they send the HTTP request. A typical J2EE monitoring tool either uses the application server's performance-monitoring API to get server-side information or automatically adds a measurement wrapper using techniques like code injection.

JVM heap size

Eliminate memory leaks before undertaking any other tuning; otherwise much of the tuning effort can be wasted. Fixing leaks is absolutely necessary for J2EE applications, and doing so can change the bottlenecks you see, removing some and exposing others. See Section 2.5.
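
One quick way to watch for a steadily growing heap is to log used memory periodically with the java.lang.Runtime API. The following is a minimal sketch (the class name and the one-minute interval are illustrative choices, not a standard API):

// Sketch: log used heap periodically to spot steady growth (a possible leak).
public class HeapLogger extends Thread {
  public HeapLogger() { setDaemon(true); }
  public void run() {
    Runtime rt = Runtime.getRuntime();
    while (true) {
      long used = rt.totalMemory() - rt.freeMemory();
      System.out.println("Used heap: " + (used / 1024) + " KB");
      try {
        Thread.sleep(60000);   // sample once a minute
      } catch (InterruptedException e) {
        return;
      }
    }
  }
}

Start it early with new HeapLogger( ).start( ); a used-heap figure that keeps climbing even after full garbage collections is a strong hint of a leak.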

Total response time

Measure the time taken from presentation start to presentation completion, i.e., from when the (simulated) user clicks the button to when the information is displayed.
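
If your load-test tool does not report this figure directly, you can approximate it in a simple test client by timing the full request/response round trip and draining the response. Here is a rough sketch using java.net.HttpURLConnection (the URL is a placeholder, and browser rendering time is of course not included):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResponseTimer {
  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    HttpURLConnection conn =
        (HttpURLConnection) new URL("http://localhost:8080/myapp/page").openConnection();
    InputStream in = conn.getInputStream();
    byte[] buf = new byte[8192];
    while (in.read(buf) != -1) { /* drain the response so transfer time is included */ }
    in.close();
    long elapsed = System.currentTimeMillis() - start;
    System.out.println("Total response time: " + elapsed + " ms");
  }
}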

Total server-side service time

Measure the total time it takes to service the request on the server. Try not to include transfer time to and from the client. You can obtain this measurement by wrapping the doGet( ) and doPost( ) servlet methods or by using a servlet filter (javax.servlet.Filter) that logs execution times. Here is a simple filter:

public void doFilter(ServletRequest request, ServletResponse response,
                FilterChain chain) throws IOException, ServletException {
  long before = System.currentTimeMillis();
  chain.doFilter(request, response);
  long after = System.currentTimeMillis();
  //log the elapsed time (after - before) in your logger
}

Naturally, this filter should be the first in the filter chain so the time taken for any other filter is included in the total time recorded. The time measured will include some network transfer time, since the server-to-client socket write does not complete on the server until the last portion of data is written to the server's network buffer.
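
If you prefer to wrap the servlet methods themselves rather than use a filter, a small superclass that times service( ) achieves the same effect, since service( ) dispatches to doGet( ) and doPost( ). A sketch (the class name is illustrative):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: servlets extend this class instead of HttpServlet to get timing for free.
public abstract class TimedServlet extends HttpServlet {
  protected void service(HttpServletRequest req, HttpServletResponse resp)
      throws ServletException, IOException {
    long before = System.currentTimeMillis();
    super.service(req, resp);   // dispatches to doGet( ), doPost( ), etc.
    long after = System.currentTimeMillis();
    log(getServletName() + " service time: " + (after - before) + " ms");
  }
}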

JDBC requests

This measurement is fully covered in Section 16.1, which explains how to create and use wrappers to measure JDBC performance.
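
As a taste of the technique, the following sketch uses a dynamic proxy (java.lang.reflect.Proxy) to time every executeXxx( ) call made on a java.sql.Statement; the class name and logging destination are placeholders, and Section 16.1 shows a more complete wrapper:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Statement;

// Sketch: a dynamic proxy that times every execute call on a Statement.
public class TimedStatementHandler implements InvocationHandler {
  private final Statement target;

  private TimedStatementHandler(Statement target) { this.target = target; }

  // Wrap a real Statement in a timing proxy.
  public static Statement wrap(Statement s) {
    return (Statement) Proxy.newProxyInstance(s.getClass().getClassLoader(),
        new Class[] { Statement.class }, new TimedStatementHandler(s));
  }

  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    long before = System.currentTimeMillis();
    try {
      return method.invoke(target, args);
    } catch (InvocationTargetException e) {
      throw e.getTargetException();   // rethrow the underlying SQLException
    } finally {
      if (method.getName().startsWith("execute")) {
        long after = System.currentTimeMillis();
        System.out.println(method.getName() + " took " + (after - before) + " ms");
      }
    }
  }
}

A matching Connection wrapper would return TimedStatementHandler.wrap( ) proxies from its createStatement( ) methods.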

RMI communications

Turn on RMI logging with the java.rmi.server.logCalls property:

% java -Djava.rmi.server.logCalls=true ...

Section 2.6 details this technique.

A second technique uses smart proxies to monitor the performance of RMI calls. This technique wraps the remote references (stubs) on which remote calls are made with proxy objects (created with java.lang.reflect.Proxy) that surround each remote call with timing logic.
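
A minimal sketch of such a smart proxy, assuming the stub directly implements the remote interface (as RMI-generated stubs do), might look like this (class name and output destination are illustrative):

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Sketch: wrap a remote stub in a proxy that times each remote call.
public class TimingHandler implements InvocationHandler {
  private final Object stub;

  private TimingHandler(Object stub) { this.stub = stub; }

  // The returned proxy implements the same interfaces as the stub.
  public static Object wrap(Object stub) {
    return Proxy.newProxyInstance(stub.getClass().getClassLoader(),
        stub.getClass().getInterfaces(), new TimingHandler(stub));
  }

  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    long before = System.currentTimeMillis();
    try {
      return method.invoke(stub, args);
    } catch (InvocationTargetException e) {
      throw e.getTargetException();   // rethrow the underlying RemoteException
    } finally {
      System.out.println("Remote call " + method.getName() + " took "
          + (System.currentTimeMillis() - before) + " ms");
    }
  }
}

Wrapping a stub once with TimingHandler.wrap( ) then times every remote call transparently.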

File descriptors

The number of available file descriptors is limited in each process and in the overall system. Each open file and open socket requires a file descriptor. Use ulimit (Unix only) to monitor the number of file descriptors available to the processes, and make sure this number is high enough to service all connections. In Windows, you can monitor open files and sockets from the performance monitor.
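
For example, in most Unix shells the per-process file-descriptor limit is printed with:

% ulimit -n

How the limit is raised is shell- and system-dependent, so consult your system documentation.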

Bean life cycle

Essentially, all methods that handle the life cycle of the bean need to be wrapped, including the bean constructor, setEntityContext( ), ejbHome( ), ejbCreate( ), ejbActivate( ), ejbLoad( ), ejbStore( ), ejbPassivate( ), ejbRemove( ), ejbFind( ), and unsetEntityContext( ). Look for too many calls to these methods, which can occur with excessive cycling of objects (too many creates) or excessive passivation.
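
If your application server provides no monitor for these counts, a crude but effective alternative is a shared counter class that each wrapped life-cycle method calls; a sketch (the class name is illustrative, and pre-1.5 Java is assumed, hence the manual Integer handling):

import java.util.HashMap;
import java.util.Map;

// Sketch: shared counters incremented from each bean life-cycle method.
public class LifecycleCounter {
  private static final Map counts = new HashMap();

  public static synchronized void count(String beanAndMethod) {
    Integer n = (Integer) counts.get(beanAndMethod);
    counts.put(beanAndMethod, new Integer(n == null ? 1 : n.intValue() + 1));
  }

  public static synchronized String report() {
    return counts.toString();
  }
}

Each wrapped method then adds a single line such as LifecycleCounter.count("AccountBean.ejbLoad") (the bean name here is hypothetical), and report( ) can be logged periodically; unexpectedly high counts for ejbCreate( ) or ejbPassivate( ) point to excessive cycling or passivation.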

Transaction boundaries

Begin, commit, and abort calls need to be wrapped. Wrapping can be difficult because the container may be responsible for making these calls. It might be easiest to rely on the JDBC wrapper to catch transaction boundaries, but in that case you first need to verify that all transaction boundaries correspond to database transaction boundaries.
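
To illustrate the JDBC route, the transaction-boundary methods of a delegating Connection wrapper need only a few lines; the fragment below shows commit( ) and rollback( ) (a real wrapper must implement and delegate every other Connection method as well, which is why the implements clause is commented out):

import java.sql.Connection;
import java.sql.SQLException;

// Fragment: the transaction-boundary methods of a delegating Connection wrapper.
public class LoggingConnection /* implements Connection */ {
  private final Connection real;
  public LoggingConnection(Connection real) { this.real = real; }

  public void commit() throws SQLException {
    long before = System.currentTimeMillis();
    real.commit();
    System.out.println("commit took " + (System.currentTimeMillis() - before) + " ms");
  }

  public void rollback() throws SQLException {
    long before = System.currentTimeMillis();
    real.rollback();
    System.out.println("rollback took " + (System.currentTimeMillis() - before) + " ms");
  }
}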

Cache sizes

Cache sizes, both the number of objects held and the physical size used, should be monitored. There is no generic method to do this.
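
If the cache is your own code, the easiest approach is to have it report its own object count; a sketch:

import java.util.HashMap;
import java.util.Map;

// Sketch: a cache that can report how many objects it currently holds.
public class MonitoredCache {
  private final Map cache = new HashMap();

  public synchronized void put(Object key, Object value) { cache.put(key, value); }
  public synchronized Object get(Object key) { return cache.get(key); }
  public synchronized int size() { return cache.size(); }
}

Estimating the physical size is harder; one rough approach is to compare heap usage with the cache enabled and disabled.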

CPU utilization

Use operating-system utilities to measure CPU utilization (no Java API measures CPU utilization of the JVM). Windows has a performance monitor, and Unix has the sar, vmstat, and iostat utilities.
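
For example, on most Unix systems sar can sample CPU utilization at regular intervals:

% sar -u 5 10

This takes ten samples at five-second intervals; consistently high utilization points to a CPU bottleneck.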

Stack traces

Generate stack dumps on Unix by sending a kill -QUIT signal (kill -3) to the JVM process or by pressing Ctrl-\ in the window where the Java program was started. On Windows, press Ctrl-Break in the window where the Java program is running or (prior to SDK 1.3) click the Close button on the command window. The stack dump lists the state and Java stack of every currently running thread. In the case of a deadlock, two or more threads will be in the "W" (wait) state, indicating that they are waiting for locks to be released. The method at the top of the stack listing is the "current" method, i.e., the method that requested a lock and caused the thread to move into a wait state as it waits for the lock to be granted. Thus, identifying which methods are causing the deadlock is easy.

GC pauses

When garbage collection kicks in, most current VMs stop all other processing activity. These perceptible pauses can result in unacceptable performance. Use the latest available VMs, and try to tune the garbage collection to minimize "stop the world" pauses. Chapter 2 and Chapter 3 discuss garbage-collection algorithms and tuning. Concurrent garbage collection (-Xconcgc in Version 1.4.1 of the Sun VM) allows pause times to be minimized.
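
To see how long the pauses actually are, run with garbage-collection logging enabled:

% java -verbose:gc ...

The -verbose:gc option prints a line for each collection, including how long it took.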

Network bandwidth

Use netperf (Unix) or the performance monitor (Windows) to measure network bandwidth. See also Section 14.4 in Chapter 14.
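
In its simplest form, netperf measures throughput to a machine running the netserver daemon (the hostname here is a placeholder):

% netperf -H serverhost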

15.3.3 Symptoms of Performance Problems

Any of the following symptoms can indicate a performance problem:

  • Slow response times

  • Excessive database table scans

  • Database deadlocks

  • Pages not available

  • Memory leaks

  • High CPU usage (consistently over 85%)

15.3.4 Useful Statistics to Analyze from Measurements

After taking the measurements described here, you may want to analyze several statistics, including the number of users, the number of components, throughput (queries per minute), transaction rates, average and maximum response times, and CPU utilization. You should look for trends and anomalies, and try to identify whether any resource is limited in the current system. For example, increasing the number of concurrent users over time may show that throughput flattens out, indicating that the current maximum throughput was reached.
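
As a simple illustration of turning raw measurements into these statistics, the following sketch derives the average and maximum response times and the throughput from a set of logged per-request times (the class, method, and parameter names are illustrative):

// Sketch: derive summary statistics from logged per-request service times.
public class ResponseStats {
  // responseTimesMs holds one elapsed time per request;
  // testDurationMs is the wall-clock length of the measurement period.
  public static void report(long[] responseTimesMs, long testDurationMs) {
    if (responseTimesMs.length == 0) return;
    long total = 0;
    long max = 0;
    for (int i = 0; i < responseTimesMs.length; i++) {
      total += responseTimesMs[i];
      if (responseTimesMs[i] > max) max = responseTimesMs[i];
    }
    double average = (double) total / responseTimesMs.length;
    double perMinute = responseTimesMs.length * 60000.0 / testDurationMs;
    System.out.println("Average: " + average + " ms, maximum: " + max
        + " ms, throughput: " + perMinute + " requests/minute");
  }
}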