15.4 Load Testing

The discussion of load testing in Section 1.6 of Chapter 1 is relevant to J2EE application tuning. Here's a summary of the steps involved:

  1. Specify performance targets and benchmarks, including scaling requirements. Include all user types, such as information-gathering requests and transaction clients, in your benchmarks. Performance requirements should include the required response times for end users, the perceived steady state and peak user loads, the average and peak amount of data transferred per request, and the expected growth in user load over the first or next 12 months.

  2. Create a testing environment that mirrors the expected real-world environment as closely as possible. Generally, there will be differences, but the most critical aspects to simulate closely are the expected client activity, the application data, and the peak scaling requirements (amount of data and number of users). The only reliable way to determine a system's scalability is to perform load tests in which the volume and characteristics of the anticipated traffic are simulated as realistically as possible. Characterize the anticipated load as objectively and systematically as possible: use existing log files where they are available, and describe user sessions in measurable terms (such as the number and types of pages viewed or the duration of sessions). Determine the range and distribution of session variation. Don't reduce the load to averages; use representative profiles that reflect that distribution.

  3. Load-test the system, find bottlenecks, and eliminate them.
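The advice in step 2 to use representative profiles rather than averages can be sketched in code. The following is a minimal illustration, not part of any load-testing product: the class name SessionProfileMix, the profile names, and the weights are all hypothetical, and in practice the weights would come from analyzing your own log files.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: class, profile names, and weights are hypothetical. */
final class SessionProfileMix {

    /** One representative kind of user session, e.g. as observed in log files. */
    static final class Profile {
        final String name;
        final int pagesViewed;
        final int weight; // relative share of sessions, e.g. a percentage

        Profile(String name, int pagesViewed, int weight) {
            this.name = name;
            this.pagesViewed = pagesViewed;
            this.weight = weight;
        }
    }

    private final List<Profile> profiles = new ArrayList<>();
    private long totalWeight;

    SessionProfileMix add(String name, int pagesViewed, int weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        profiles.add(new Profile(name, pagesViewed, weight));
        totalWeight += weight;
        return this;
    }

    /**
     * Maps a uniform random value in [0, 1) to a profile, so that each
     * profile is chosen in proportion to its weight.
     */
    Profile pick(double uniform) {
        if (profiles.isEmpty()) {
            throw new IllegalStateException("no profiles defined");
        }
        double threshold = uniform * totalWeight;
        long cumulative = 0;
        for (Profile p : profiles) {
            cumulative += p.weight;
            if (threshold < cumulative) {
                return p;
            }
        }
        // Guards against rounding when uniform is very close to 1.0.
        return profiles.get(profiles.size() - 1);
    }
}
```

A load generator could call pick(new java.util.Random().nextDouble()) each time it starts a simulated session, so that the mix of session types in the test follows the observed distribution instead of a single "average" session.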

Load testing should be repeatable. Use load-test suites and frameworks. Many products are available, including free tools (see http://www.JavaPerformanceTuning.com/resources.shtml). Continuously retest and measure against established benchmarks to ensure that application performance hasn't degraded as changes are made. The server must be designed to handle peak loads, so tests including expected peak loads should be scrutinized. Peak user loads are the number of concurrent sessions managed by the application server, not the number of possible users.
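One way to make "measure against established benchmarks" concrete is an automated check that compares each run with a stored baseline. The sketch below is an assumption-laden example rather than a prescribed method: the class name, the choice of the 90th percentile, and the tolerance parameter are all illustrative, and the samples would come from whatever load-test tool you use.

```java
import java.util.Arrays;

/** Illustrative only: names and the choice of percentile are assumptions. */
final class BenchmarkRegressionCheck {

    /** Nearest-rank percentile (1..100) of the given response times. */
    static long percentile(long[] samplesMillis, int percent) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be 1..100");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        // Integer form of ceil(percent / 100 * n), avoiding floating point.
        int rank = (percent * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }

    /**
     * Returns true if the 90th-percentile response time of this run exceeds
     * the baseline by more than the allowed tolerance (in percent).
     */
    static boolean hasRegressed(long[] samplesMillis, long baselineP90Millis,
                                int tolerancePercent) {
        long p90 = percentile(samplesMillis, 90);
        return p90 * 100 > baselineP90Millis * (100L + tolerancePercent);
    }
}
```

Running such a check after every build turns the benchmark into a repeatable pass/fail test. A percentile is used here instead of the mean because a few very slow responses can hide behind an acceptable average.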

The key elements of a load-test design are the test objective (e.g., can the server handle N sessions per hour at peak load level?), pass/fail criteria (e.g., pass if response times stay within a certain range), script description (e.g., user1: page1, page2, ...; user2: page1, page3, start transaction1, etc.), and scenario description (which scripts at which frequency and how the load increases). One stress-test methodology requires the following steps:

  1. Determine the maximum acceptable response time for getting a page.

  2. Estimate the maximum number of simultaneous users.

  3. Simulate user requests, gradually adding simulated users until the application response delay becomes greater than the acceptable response time.

  4. Optimize until you reach the desired number of users.
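The ramp-up in step 3 can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, and the measurement itself is passed in as a function, because how you start simulated users and time their requests depends on your load-test tool. The synthetic server model in main is purely a stand-in for a real measurement.

```java
import java.util.function.IntToLongFunction;

/** Illustrative only: names are hypothetical, not from any load-test tool. */
final class RampUpStressTest {

    /**
     * Adds simulated users in fixed steps until the measured response time
     * exceeds the acceptable limit or the target user count is reached.
     * Returns the last user count that stayed within the limit
     * (0 if even the first step fails).
     */
    static int findSustainableUsers(IntToLongFunction measureResponseMillis,
                                    int step, int targetUsers,
                                    long maxAcceptableMillis) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        int lastPassing = 0;
        for (int users = step; users <= targetUsers; users += step) {
            long millis = measureResponseMillis.applyAsLong(users);
            if (millis > maxAcceptableMillis) {
                break; // response delay is now greater than acceptable
            }
            lastPassing = users;
        }
        return lastPassing;
    }

    public static void main(String[] args) {
        // Stand-in for a real measurement: an assumed 200 ms base cost
        // plus 2 ms per concurrent user.
        IntToLongFunction syntheticServer = users -> 200 + 2L * users;
        int sustained = findSustainableUsers(syntheticServer, 50, 1000, 1000);
        System.out.println("Sustained users: " + sustained);
    }
}
```

If the value returned is lower than the estimate from step 2, that gap is what step 4 has to close: tune, rerun the ramp-up, and repeat until the sustainable user count reaches the target.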

When testing performance, run tests overnight and on weekends to generate longer-term trends. Even well-designed tests can generate inaccurate results. Consider these potential pitfalls:

  • Testing without a real network connection can give false measures.

  • Results from simulating a small number of users can be markedly different from results with a large number of users.

  • Network throughput in the test environment may be higher than in the deployed environment.

  • Performance with nonpersistent messages depends on processor and memory.

  • Disk speed is crucial for persistent messages.

Performance testing should continue even after the application is deployed. For applications expected to perform 24/7, seemingly inconsequential issues like database logging can degrade performance. Continuous monitoring is the key to spotting even the slightest abnormality. Set performance capacity thresholds, monitor them, and look for trends. When application transaction volumes reach 40% of maximum expected volumes, you should execute plans to expand the capacity of the system. Note that 40% is an arbitrary choice, but it's a good place to start; if you're at 40% and don't see the first hints of more serious problems, like significant spikes in usage profiles, you might relax and set a new, higher threshold. The point is that you should watch for signs that your application is outgrowing the system and make plans for an upgrade well before the upgrade is needed.
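The capacity threshold described above amounts to a simple comparison that a monitoring job can run on each sample. The sketch below assumes a hypothetical CapacityMonitor class; the 40% figure is the arbitrary starting point from the text, and the volume numbers in main are made up for illustration.

```java
/** Illustrative only: the class name and sample numbers are hypothetical. */
final class CapacityMonitor {

    private final long maxExpectedVolume; // e.g. transactions per hour
    private final int thresholdPercent;   // e.g. 40, adjustable over time

    CapacityMonitor(long maxExpectedVolume, int thresholdPercent) {
        if (maxExpectedVolume <= 0) {
            throw new IllegalArgumentException("maxExpectedVolume must be positive");
        }
        if (thresholdPercent <= 0 || thresholdPercent > 100) {
            throw new IllegalArgumentException("thresholdPercent must be 1..100");
        }
        this.maxExpectedVolume = maxExpectedVolume;
        this.thresholdPercent = thresholdPercent;
    }

    /** True once the observed volume reaches the threshold share of the maximum. */
    boolean shouldPlanExpansion(long currentVolume) {
        // Integer arithmetic avoids floating-point rounding at the boundary.
        return currentVolume * 100 >= maxExpectedVolume * thresholdPercent;
    }

    public static void main(String[] args) {
        CapacityMonitor monitor = new CapacityMonitor(10_000, 40);
        System.out.println(monitor.shouldPlanExpansion(3_500)); // below threshold
        System.out.println(monitor.shouldPlanExpansion(4_200)); // threshold reached
    }
}
```

Keeping the threshold as a constructor parameter reflects the point that 40% is only a starting value: if monitoring shows no early warning signs at that level, the same check can be reused with a higher threshold.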