It is important to understand that the user has a particular view of performance that allows you to cut some corners. The user of an application sees on-screen changes as part of the performance. A browser that gives a running countdown of the amount left to be downloaded from a server is perceived as faster than one that just sits there, apparently hung, until all the data is downloaded. People expect to see something happening, and a good rule of thumb is that if an application is unresponsive for more than three seconds, it is seen as slow. Some Human-Computer Interface authorities put the user patience limit at just two seconds; an IBM study from the early '70s suggested people's attention began to wander after waiting for more than just one second. For performance improvements, it is also useful to know that users are not generally aware of response time improvements of less than 20%. This means that when tuning for user perception, you should not deliver any changes to the users until you have made improvements that add up to more than a 20% speedup.
A few long response times make a bigger impression on the memory than many shorter ones. According to Arnold Allen,[1] the perceived value of the average response time is not the average, but the 90th percentile value: the value that is greater than 90% of all observed response times. With a typical exponential distribution, the 90th percentile value is 2.3 times the average value. Consequently, as long as you reduce the variation in response times so that the 90th percentile value is smaller than before, you can actually increase the average response time, and the user will still perceive the application as faster. For this reason, you may want to target variation in response times as a primary goal. Unfortunately, this is one of the more complex targets in performance tuning: it can be difficult to determine exactly why response times are varying.
[1] Introduction to Computer Performance Analysis with Mathematica (Academic Press).
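The point about averages and the 90th percentile can be checked with a few lines of Java. The following is only a minimal sketch: the class name `ResponseTimeStats`, the nearest-rank percentile definition, and the two sets of response times are illustrative assumptions, not measurements. The second data set has a higher average than the first but a much lower 90th percentile, which is the situation described above.

```java
import java.util.Arrays;

// Hypothetical helper for comparing the average and the 90th percentile
// of a set of response times (in seconds).
final class ResponseTimeStats {

    /** Arithmetic mean of the samples. */
    static double mean(double[] samples) {
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Nearest-rank percentile: the smallest sample that is >= p percent of all samples. */
    static double percentile(double[] samples, int p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Invented data: mostly fast responses with two very slow ones...
        double[] before = {0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5, 3.0, 4.2};
        // ...versus uniformly moderate responses with little variation.
        double[] after = {1.0, 1.0, 1.1, 1.1, 1.2, 1.2, 1.2, 1.3, 1.3, 1.4};

        System.out.printf("before: mean=%.2f p90=%.2f%n", mean(before), percentile(before, 90));
        System.out.printf("after:  mean=%.2f p90=%.2f%n", mean(after), percentile(after, 90));
    }
}
```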
If the interface provides feedback and allows the user to carry on with other tasks or abort and start another function (preferably both), the user sees this as a responsive interface and does not consider the application as slow as they otherwise might. If you give users an expectation of how long a particular task might take and why, they often accept this and adjust accordingly. Modern web browsers provide an excellent example of this strategy in practice. People realize that the browser is limited by the bandwidth of their connection to the Internet and that downloading cannot happen faster than a given speed. Good browsers always try to show the parts they have already received so that the user is not blocked, and they also allow the user to terminate downloading or go off to another page at any time, even while a page is partly downloaded. Generally, it is not the browser that is seen to be slow, but rather the Internet or the server site. In fact, browser creators have made a number of tradeoffs so that their browsers appear to run faster in a slow environment. I have measured browser display of identical pages under identical conditions and found browsers that are actually faster at full page display but seem slower because they do not display partial pages, do not download embedded links concurrently, and so on.
However, one area in which some web browsers have misjudged user expectation is when they give users a momentary false expectation that an operation has finished when in fact another is about to start immediately. This false expectation is perceived as slow performance. For example, when downloading a page with embedded links such as images, the browser status bar often shows reports like "20% of 34K," which moves up to "56% of 34K," etc., until it reaches 100% and indicates that the page has finished downloading. However, at this point, when the user expects that all the downloading has finished, the status bar starts displaying "26% of 28K" and so on, as the browser reports separately on each embedded graphic as it is downloaded. This frustrates users who initially judged the completion time from the first download report and had geared themselves up to do something, only to have to wait again (often repeatedly). A better practice would be to report on how many elements need to be downloaded as well as the current download status, giving the user a clearer expectation of the full download time.
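As a rough illustration of that better practice, the sketch below keeps track of the number of elements on the page as well as the progress of the current one, so that the status line never implies that everything has finished when only one element has. The class `PageDownloadProgress`, its method names, and the status format are all hypothetical; no real browser is claimed to work this way.

```java
// Hypothetical progress tracker that reports the element count as well as
// the byte progress of the element currently being downloaded.
final class PageDownloadProgress {
    private final int totalElements;
    private int completedElements;
    private long currentSize;      // size in bytes of the element now downloading
    private long currentReceived;  // bytes of that element received so far

    PageDownloadProgress(int totalElements) {
        this.totalElements = totalElements;
    }

    /** Call when the next element (page or embedded image) starts downloading. */
    void startElement(long sizeInBytes) {
        currentSize = sizeInBytes;
        currentReceived = 0;
    }

    /** Call as data arrives for the current element. */
    void received(long bytes) {
        currentReceived = Math.min(currentSize, currentReceived + bytes);
    }

    /** Call when the current element has been fully downloaded. */
    void finishElement() {
        completedElements++;
        currentSize = 0;
        currentReceived = 0;
    }

    /** Status text that always shows where the user is in the whole download. */
    String status() {
        if (completedElements >= totalElements) {
            return "Done: " + totalElements + " of " + totalElements + " elements";
        }
        long percent = (currentSize == 0) ? 0 : currentReceived * 100 / currentSize;
        return "Element " + (completedElements + 1) + " of " + totalElements
                + ": " + percent + "% of " + (currentSize / 1024) + "K";
    }
}
```

With three elements on a page, the status line would read along the lines of "Element 1 of 3: 50% of 34K", so reaching 100% on the first element does not suggest that the whole page is complete.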
Where there are varying possibilities for performance tradeoffs (e.g., resolution versus frame rate for animation, compression size versus speed of compression for compression utilities, etc.), the best strategy is to put the user in control. It is better to provide the option to choose between faster performance and better functionality. When users have made the choice themselves, they are often more willing to put up with actions taking longer in return for better functionality. When users do not have this control, their response is usually less tolerant.
This strategy also allows those users who have strong performance requirements to be provided for at their own cost. But it is always important to provide a reasonable default in the absence of any choice from the user. Where there are many different parameters, consider providing various levels of user-controlled tuning parameters, e.g., an easy set of just a few main parameters, a middle level, and an expert level with access to all parameters. This must, of course, be well documented to be really useful.
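For the compression tradeoff mentioned above, putting the user in control might look something like the following sketch. The class `CompressionChoice` and its `Preference` enum are hypothetical names; the compression itself uses the standard `java.util.zip.Deflater` levels. A missing choice falls back to a balanced default, in line with the advice to always provide a reasonable default.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical wrapper that lets the user trade compression speed against size.
final class CompressionChoice {

    /** The user-visible choice, mapped onto the standard Deflater levels. */
    enum Preference {
        FASTEST(Deflater.BEST_SPEED),
        BALANCED(Deflater.DEFAULT_COMPRESSION),
        SMALLEST(Deflater.BEST_COMPRESSION);

        final int level;

        Preference(int level) {
            this.level = level;
        }
    }

    /** Compresses the input; a null preference means the user made no choice. */
    static byte[] compress(byte[] input, Preference preference) {
        Preference chosen = (preference == null) ? Preference.BALANCED : preference;
        Deflater deflater = new Deflater(chosen.level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native resources held by the Deflater
        }
    }
}
```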
A lot of time (in CPU cycles) passes while the user is reacting to the application interface. This time can be used to anticipate what the user wants to do (using a background low-priority thread), so that precalculated results are ready to assist the user immediately. This makes an application appear blazingly fast.
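A minimal sketch of this idea, assuming you can guess the user's likely next request, is shown below. The class `Anticipator` is a hypothetical name; it simply runs the guessed work on a low-priority daemon thread while the user is reading or typing, and hands over the result when (and if) the user actually asks for it. Note that thread priorities are only a hint to the scheduler, so how much the background work yields to the interface thread depends on the platform.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical helper: precalculates a likely-needed result in the background.
final class Anticipator<T> {
    private final FutureTask<T> task;

    /** Starts precalculating immediately on a low-priority daemon thread. */
    Anticipator(Callable<T> likelyNextRequest) {
        task = new FutureTask<>(likelyNextRequest);
        Thread worker = new Thread(task, "anticipator");
        worker.setPriority(Thread.MIN_PRIORITY); // a hint only; scheduling is platform dependent
        worker.setDaemon(true);                  // never keep the application alive for a guess
        worker.start();
    }

    /** True once the precalculated result is available. */
    boolean isReady() {
        return task.isDone();
    }

    /**
     * Returns the precalculated result, waiting only if it is not ready yet.
     * Throws CancellationException if discard() was called first.
     */
    T result() {
        try {
            return task.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("precalculation failed", e.getCause());
        }
    }

    /** Call when the guess turned out to be wrong, to stop wasting cycles. */
    void discard() {
        task.cancel(true);
    }
}
```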
Similarly, ensuring that your application remains responsive to the user, even while it is executing some other function, makes it seem fast. For example, I find that applications that draw themselves on screen quickly at startup and respond to repaint requests even while still initializing (you can test this by putting the window in the background and then bringing it to the foreground) give the impression of being much faster than applications that seem to be chugging away unresponsively. Starting different word-processing applications with a large file to open can be instructive, especially if the file is on the network or a slow (removable) disk. Some act very nicely, responding almost immediately while the file is still loading; others just hang unresponsively with windows only partially refreshed until the file is loaded; others don't even fully paint themselves until the file has finished loading. This illustrates what can happen if you do not use threads appropriately.
In Java, the key to making an application responsive is multithreading. Use threads to ensure that any particular service is available and unblocked when needed. Of course, this can be difficult to program correctly and manage. Handling interthread communication with maximal responsiveness (and minimal bugs) is a complex task, but it does tend to make for a very snappy application.
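The following sketch shows the basic shape of keeping a service unblocked, under simplifying assumptions: the slow work (for example, loading a large file) runs on its own thread, while a status query that could be called from a repaint handler returns immediately. The class `ResponsiveLoader` and its methods are hypothetical, it supports a single load only, and a real GUI application would also need to follow its toolkit's threading rules.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical loader that never makes the interface thread wait for slow work.
final class ResponsiveLoader {
    private volatile String status = "Idle";
    private final CountDownLatch loaded = new CountDownLatch(1);

    /** Starts the slow work on its own thread and returns immediately. */
    void loadInBackground(Runnable slowLoad) {
        status = "Loading...";
        Thread loader = new Thread(() -> {
            try {
                slowLoad.run();
                status = "Ready";
            } catch (RuntimeException e) {
                status = "Failed: " + e.getMessage();
            } finally {
                loaded.countDown();
            }
        }, "loader");
        loader.setDaemon(true);
        loader.start();
    }

    /** Safe to call from the interface thread at any time; never blocks. */
    String statusLine() {
        return status;
    }

    /** For callers that do want to wait, with an upper bound on the wait. */
    boolean awaitLoaded(long timeoutMillis) {
        try {
            return loaded.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```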
When you display the results of some activity on the screen, there is often more information than can fit on a single screen. For example, a request to list all the details on all the files in a particular large directory may not fit on one display screen. The usual way to display this is to show as much as will fit on a single screen and indicate that there are more items available with a scrollbar. Other applications or other information may use a "more" button or have other ways of indicating how to display or move on to the extra information.
In these cases, you initially need to display only a partial result of the activity. This tactic can work very much in your favor. For activities that take too long and for which some of the results can be returned more quickly than others, it is certainly possible to show just the first set of results while continuing to compile more results in the background. This gives the user an apparently much quicker response than if you were to wait for all the results to be available before displaying them.
This situation is often the case for distributed applications. A well-known example is (again!) found in web browsers that display the initial screenful of a page as soon as it is available, without waiting for the whole page to be downloaded. The general case is when you have a long activity that can provide results in a stream so that the results can be accessed a few at a time. For distributed applications, sending all the data is often what takes a long time; in this case, you can build streaming into the application by sending one screenful of data at a time. Also, bear in mind that when there is a really large amount of data to display, the user often views only some of it and aborts, so be sure to build in the ability to stop the stream and release its resources at any time.
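One simple way to structure this, sketched below under stated assumptions, is to pull results from the source one screenful at a time and to make the stream closeable so that an abort releases whatever the source was holding. The class `ScreenfulStream` and the `releaseResources` callback are hypothetical; the sketch pulls results lazily on the caller's thread and leaves out the network transport and any background prefetching.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stream that hands out results one screenful at a time.
final class ScreenfulStream<T> implements AutoCloseable {
    private final Iterator<T> source;
    private final int screenSize;
    private final Runnable releaseResources;
    private boolean closed;

    ScreenfulStream(Iterator<T> source, int screenSize, Runnable releaseResources) {
        this.source = source;
        this.screenSize = screenSize;
        this.releaseResources = releaseResources;
    }

    /**
     * Returns up to one screenful of results. An empty list means the
     * source is exhausted or the stream has been closed.
     */
    List<T> nextScreenful() {
        List<T> page = new ArrayList<>(screenSize);
        while (!closed && page.size() < screenSize && source.hasNext()) {
            page.add(source.next());
        }
        return page;
    }

    /** Stops the stream and releases its resources; safe to call more than once. */
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            releaseResources.run();
        }
    }
}
```

The caller displays the first screenful as soon as `nextScreenful()` returns and asks for more only when the user scrolls or presses "more"; if the user aborts, `close()` ends the stream without the remaining results ever being fetched.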
This section briefly covers the general principles of caching. Caching is an optimization technique I return to in several different sections of this book when appropriate to the problem under discussion. For example, in the area of disk access, there are several caches that apply: from the lowest-level hardware cache up through the operating-system disk read and write caches, cached filesystems, and file reading and writing classes that provide buffered I/O. Some caches cannot be tuned at all; others are tuneable at the operating-system level or in Java. Where it is possible for a developer to take advantage of or tune a particular cache, I provide suggestions and approaches that cover the caching technique appropriate to that area of the application. In cases where caches are not directly tuneable, it is still worth knowing the effect of using the cache in different ways and how this can affect performance. For example, disk hardware caches almost always apply a read-ahead algorithm: the cache is filled with the next block of data after the one just read. This means that reading backward through a file (in chunks) is not as fast as reading forward through the file.
Caches are effective because it is expensive to move data from one place to another or to calculate results. If you need to do this more than once to the same piece of data, it is best to hang onto it the first time and refer to the local copy in the future. This applies, for example, to remote access of files such as browser downloads. The browser caches the downloaded file locally on disk to ensure that a subsequent access does not have to reach across the network to reread the file, thus making it much quicker to access a second time. It also applies, in a different way, to reading bytes from the disk. Here, for the operating system, the cost of reading one byte is the same as the cost of reading a page (usually 4 or 8 KB), as data is read into memory a page at a time. If you are going to read more than one byte from a particular disk area, it is better to read in a whole page (or all the data if it fits on one page) and access bytes through your local copy of the data.
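The effect of keeping a local copy of a whole page can be illustrated with the standard `java.io.BufferedInputStream`. The sketch below is an illustration only: the class `PageAtATime` and its `CountingInputStream` are hypothetical, an in-memory array stands in for the disk, and the 4096-byte buffer is an assumed page size. It counts how many read requests reach the underlying source when the caller reads one byte at a time, with and without the buffer; it does not measure time.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demonstration of how buffering cuts the number of source reads.
final class PageAtATime {

    /** Counts how many read requests reach the underlying source. */
    static final class CountingInputStream extends FilterInputStream {
        int requests;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            requests++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            requests++;
            return super.read(b, off, len);
        }
    }

    /** Reads every byte one at a time and returns how many requests hit the source. */
    static int requestsNeeded(byte[] data, boolean buffered) {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        try (InputStream in = buffered ? new BufferedInputStream(source, 4096) : source) {
            while (in.read() != -1) {
                // consume the data one byte at a time, as a naive caller would
            }
            return source.requests;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the buffer, every single-byte read is passed straight to the source; with it, the source is asked for a page-sized chunk at a time and the remaining single-byte reads are served from the local copy.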
General aspects of caching are covered in more detail in Section 11.7. Caching is an important performance-tuning technique that trades space for time, and it should be used whenever extra memory space is available to the application.