Specify the required performance.
Ensure performance objectives are clear.
Specify target response times for as much of the system as possible.
Specify all variations in benchmarks, including expected response ranges (e.g., 80% of responses for X must fall within 3 seconds).
Include benchmarks for the full range of scaling expected (e.g., low to high numbers of users, data, files, file sizes, objects, etc.).
Specify and use a benchmark suite based on real user behavior. This is particularly important for multiuser benchmarks.
Agree on all target times with users, customers, managers, etc., before tuning.
Make your benchmarks long enough: over five seconds is a good target.
Use elapsed time (wall-clock time) for the primary time measurements.
Ensure the benchmark harness does not interfere with the performance of the application.
Run benchmarks before starting tuning, and again after each tuning exercise.
Take care that you are not measuring artificial situations, such as full caches containing exactly the data needed for the test.
Break down distributed application measurements into components, transfer layers, and network transfer times.
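The items above on wall-clock timing and response ranges (such as "80% of responses for X must fall within 3 seconds") can be checked with a very small harness. The following is only a minimal sketch in Python: the function names `measure_elapsed`, `fraction_within`, and `meets_target` are hypothetical, and a real harness would also need warm-up runs and realistic user workloads.

```python
import time


def measure_elapsed(task, runs):
    """Return the elapsed (wall-clock) seconds for each of `runs` calls to task()."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # high-resolution elapsed-time clock
        task()
        samples.append(time.perf_counter() - start)
    return samples


def fraction_within(samples, limit_seconds):
    """Return the fraction of samples that finished within limit_seconds."""
    if not samples:
        raise ValueError("no samples recorded")
    within = sum(1 for s in samples if s <= limit_seconds)
    return within / len(samples)


def meets_target(samples, limit_seconds, required_fraction):
    """True if at least required_fraction of samples are within limit_seconds."""
    return fraction_within(samples, limit_seconds) >= required_fraction
```

For example, `meets_target(measure_elapsed(task, 50), 3.0, 0.8)` expresses the "80% within 3 seconds" target for a hypothetical callable `task`.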
Tune systematically: understand what affects the performance; define targets; tune; monitor and redefine targets when necessary.
Approach tuning scientifically: measure performance; identify bottlenecks; hypothesize on causes; test each hypothesis; make changes; measure the resulting performance.
Determine which resources are limiting performance: CPU, memory, or I/O.
Accurately identify the causes of the performance problems before trying to tune them.
Use the strategy of identifying the main bottlenecks, fixing the easiest, then repeating.
Don't tune what does not need tuning. Avoid "fixing" nonbottlenecked parts of the application.
Measure again to confirm that the tuning exercise has actually improved speed.
Target one bottleneck at a time. The application running characteristics can change after each alteration.
Improve a CPU limitation with faster code, better algorithms, and fewer short-lived objects.
Improve a system-memory limitation by using fewer objects or smaller long-lived objects.
Improve I/O limitations by targeted redesigns or speeding up I/O, perhaps by multithreading the I/O.
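The measure, identify, change, and re-measure cycle described above can be illustrated with a short sketch. It assumes the application can be split into named stages that are callable on their own; the names `time_stages`, `slowest_stage`, and `percent_improvement` are hypothetical, and a profiler would normally give finer detail than this.

```python
import time


def time_stages(stages):
    """Run each named stage once and return its elapsed seconds by name."""
    timings = {}
    for name, stage in stages.items():
        start = time.perf_counter()
        stage()
        timings[name] = time.perf_counter() - start
    return timings


def slowest_stage(timings):
    """Return the name of the stage with the largest elapsed time."""
    return max(timings, key=timings.get)


def percent_improvement(before_seconds, after_seconds):
    """Return the percentage reduction in elapsed time after a change."""
    return (before_seconds - after_seconds) / before_seconds * 100.0
```

Timing the stages before a change, tuning only the stage that `slowest_stage` reports, and then timing again keeps the work focused on one bottleneck at a time.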
Work with user expectations to provide the appearance of better performance.
Hold back releasing tuning improvements until there is at least a 20% improvement in response times.
Avoid giving users a false expectation that a task will be finished sooner than it will.
Reduce the variation in response times. Bear in mind that what users perceive as the average response time is actually close to the 90th-percentile value of the response times.
Keep the user interface responsive at all times.
Aim to always give user feedback. The interface should not be dead for more than two seconds when carrying out tasks.
Provide the ability to abort a task or to carry on with alternative tasks.
Provide user-selectable tuning parameters where this makes sense.
Use threads to separate potentially blocking functions.
Calculate "look-ahead" possibilities while the user response is awaited.
Provide partial data for viewing as soon as possible, without waiting for all requested data to be received.
Locally cache items that may be looked at again or that would otherwise be recalculated.
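Two of the items above, local caching and delivering partial data early, can be sketched briefly. This is an illustration under assumptions rather than a recommended design: `expensive_lookup` is a hypothetical stand-in for slow work, and `in_batches` is a hypothetical helper that hands results over in small groups so that a user interface could display the first group without waiting for the rest.

```python
from functools import lru_cache
from itertools import islice


@lru_cache(maxsize=128)
def expensive_lookup(key):
    """Stand-in for a slow calculation; repeat calls are served from the cache."""
    return key * key


def in_batches(items, batch_size):
    """Yield successive lists of up to batch_size items as they become available."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

A caller can loop over `in_batches(results, 20)` and display each batch as it arrives, while repeated calls to `expensive_lookup` with the same key avoid recalculation.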
Quality-test the application after any optimizations have been made.
Document optimizations fully in the code. Retain old code in comments.