14.1 Hard Disks

In most cases, applications can be tuned so that disk I/O does not cause any serious performance problems. But if, after application tuning, you find that disk I/O is still causing a performance problem, your best bet may be to upgrade the system disks. The first step is to identify whether the system has a problem with disk utilization. Each system provides its own tools to identify disk usage (Windows has a performance monitor, and Unix has the sar, vmstat, and iostat utilities). At minimum, you need to identify whether paging is an issue (look at page-scan rates) and assess the overall utilization of your disks (e.g., performance monitor on Windows, output from iostat -D on Unix). It may be that the system has a problem independent of your application (e.g., unbalanced disks), and correcting this problem may resolve the performance issue.

If the disk analysis does not identify an obvious system problem that is causing the I/O overhead, you could try upgrading or reconfiguring the disks. This type of tuning can consist of any of the following:

  • Upgrading to faster disks

  • Adding more swap space to handle larger buffers

  • Changing the disks to be striped (where files are striped across several disks, thus providing parallel I/O, e.g., with a RAID system)

  • Running the data on raw partitions when this is shown to be faster

  • Distributing simultaneously accessed files across multiple disks to gain parallel I/O

  • Using memory-mapped disks or files (see Section 14.1.3 later in this chapter)

If you have applications that run on many systems and you do not know the specification of the target system, bear in mind that you can never be sure that any particular disk is local to the user. There is a significant possibility that the disk being used by the application is a network-mounted disk. This adds the network's variability to that of the disk, in both response times and throughput. The weakest link, whether it is the network or the disk, is the limiting factor in this case, and which link is weakest will probably not even be constant. A network disk is a shared resource, as is the network itself, so performance is hugely and unpredictably affected by other users and network load.

14.1.1 Disk I/O

Do not underestimate the impact of disk writes on the system as a whole. For example, all database vendors strongly recommend that the system swap files[1] be placed on a separate disk from their databases. The impact of not doing so can decrease database throughput (and system activity) by an order of magnitude. This performance decrease comes from not splitting the I/O of two disk-intensive applications (in this case, OS paging and database I/O).

[1] The disk files for the virtual memory of the operating system; see Section 14.3 later in this chapter.

Identifying that there is an I/O problem is usually fairly easy. The most basic symptom is that things take longer than expected while the CPU is only lightly loaded. The disk-monitoring utilities will also tell you that a lot of work is being done by the disks. At the system level, you should determine the average and peak requirements on the disks. Your disks will have some statistics that are supplied by the vendor, including:

  • The average and peak transfer rates, normally in megabytes (MB) per second, e.g., 5MB/sec. From this, you can calculate how long an 8K page takes to be transferred from disk; for example, 5MB/sec is about 5K/ms, so an 8K page takes just under 2 ms to transfer.

  • Average seek time, normally in milliseconds (ms), e.g., 10 ms. This is the time required for the disk head to move radially to the correct track on the disk.

  • Rotational speed, normally in revolutions per minute (rpm), e.g., 7200 rpm. From this, you can calculate the average rotational delay, i.e., the average time spent waiting for the required data to rotate under the disk head, which is the time taken for half a revolution. For example, at 7200 rpm, one revolution takes 60,000 ms (60 seconds) divided by 7200, which is about 8.3 ms. So half a revolution takes just over 4 ms, which is consequently the average rotational delay.

This list allows you to calculate the actual time it takes to load a random 8K page from the disk: seek time + rotational delay + transfer time. Using the examples given in the list, you have 10 + 4 + 2 = 16 ms to load a random 8K page (almost an order of magnitude slower than the raw disk throughput). This calculation gives you a worst-case scenario for the disk-transfer rates for your application, allowing you to determine whether the system is up to the required performance. Note that if you are reading data stored sequentially on disk (as when reading a large file), the seek time and rotational delay are incurred less than once per 8K page loaded. Essentially, these two delays are incurred only when reading of the file begins and wherever the file is fragmented. However, this calculation is confounded by other processes also executing I/O to the disk at the same time. This contention is part of the reason why swap files and other I/O-intensive files should not be put on the same disk.
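The arithmetic above can be sketched in Java as follows. This is a minimal sketch, assuming the example figures from the list (10 ms seek, 7200 rpm, 5MB/sec, 8K pages) and the same simplification the text uses, that 1MB/sec is about 1K per millisecond; the class and method names are illustrative only.

```java
/**
 * Estimates disk page-load times from vendor figures.
 * Uses the simplification that 1MB/sec is about 1K per millisecond.
 */
public class DiskPageTiming {

    /** Time in ms to transfer one page once the head is positioned. */
    static double transferMs(double pageKB, double mbPerSec) {
        return pageKB / mbPerSec;                 // 8K at 5MB/sec: 1.6 ms
    }

    /** Average rotational delay in ms: the time for half a revolution. */
    static double rotationalDelayMs(double rpm) {
        double msPerRevolution = 60_000.0 / rpm;  // 7200 rpm: about 8.3 ms
        return msPerRevolution / 2.0;
    }

    /** Seek + rotational delay + transfer for one randomly located page. */
    static double randomPageMs(double seekMs, double rpm, double pageKB, double mbPerSec) {
        return seekMs + rotationalDelayMs(rpm) + transferMs(pageKB, mbPerSec);
    }

    /** Effective throughput (MB/sec) when every page read is a random access. */
    static double randomReadMBps(double seekMs, double rpm, double pageKB, double mbPerSec) {
        return pageKB / randomPageMs(seekMs, rpm, pageKB, mbPerSec);
    }

    public static void main(String[] args) {
        double seekMs = 10, rpm = 7200, pageKB = 8, mbPerSec = 5;
        System.out.printf("transfer time:    %.2f ms%n", transferMs(pageKB, mbPerSec));
        System.out.printf("rotational delay: %.2f ms%n", rotationalDelayMs(rpm));
        System.out.printf("random 8K page:   %.2f ms%n", randomPageMs(seekMs, rpm, pageKB, mbPerSec));
        System.out.printf("random-read rate: %.2f MB/sec%n", randomReadMBps(seekMs, rpm, pageKB, mbPerSec));
    }
}
```

Run as written, it reports a random-page time of about 15.8 ms (the text's 16 ms before rounding) and an effective random-read rate of roughly 0.5MB/sec, about a tenth of the rated 5MB/sec.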

One mechanism for speeding up disk I/O is to stripe disks. Disk striping allows data from a particular file to be spread over several disks. Striping allows reads and writes to be performed in parallel across the disks without requiring any application changes. This can speed up disk I/O quite effectively. However, be aware that the seek and rotational overhead previously listed still applies, and if you are making many small random reads, there may be no performance gain from striping disks.
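To see why a large sequential read benefits from striping while a small random read may not, here is a minimal sketch of the block-to-disk mapping used by simple RAID 0 striping. The disk count and stripe-unit size are hypothetical parameters, and StripeMap is an illustrative class; real controllers and volume managers choose their own layouts.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical RAID-0-style mapping of a file's logical blocks onto striped disks. */
public class StripeMap {
    final int disks;         // number of disks in the stripe set (assumed)
    final int stripeBlocks;  // consecutive blocks placed on one disk (assumed)

    StripeMap(int disks, int stripeBlocks) {
        if (disks <= 0 || stripeBlocks <= 0) throw new IllegalArgumentException();
        this.disks = disks;
        this.stripeBlocks = stripeBlocks;
    }

    /** Disk holding the given logical block: stripe units rotate round-robin. */
    int diskFor(long block) {
        long stripeUnit = block / stripeBlocks;
        return (int) (stripeUnit % disks);
    }

    /** Block offset of the logical block within its disk. */
    long offsetOnDisk(long block) {
        long stripeUnit = block / stripeBlocks;
        return (stripeUnit / disks) * stripeBlocks + block % stripeBlocks;
    }

    /** Number of distinct disks (i.e., parallel I/O streams) a read of count blocks touches. */
    int disksTouched(long first, int count) {
        Set<Integer> used = new HashSet<>();
        for (long b = first; b < first + count; b++) {
            used.add(diskFor(b));
        }
        return used.size();
    }

    public static void main(String[] args) {
        StripeMap m = new StripeMap(4, 16);
        // A large sequential read spans all four disks, which can work in parallel.
        System.out.println(m.disksTouched(0, 64));   // 4
        // A small random read falls inside one stripe unit, so only one disk works
        // and it still pays the full seek and rotational delay.
        System.out.println(m.disksTouched(37, 2));   // 1
    }
}
```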

Finally, note again that using remote disks adversely affects I/O performance. You should not be using remote disks mounted from the network with any I/O-intensive operations if you need good performance.

14.1.2 Clustering Files

Reading many files sequentially is faster if the files are clustered together on the disk, allowing the disk head to move directly from one file to the next. This clustering is best done in conjunction with defragmenting the disks. The overhead in locating a file on the disk (the seek and rotational delays detailed in the previous section) is also minimized for sequential reads if the files are clustered.

If you cannot cluster files at the disk level, you can still provide similar functionality by putting all the files together into one large file (as is done with ZIP files). This works well if all the files are read-only or if just one file is writable (place that one at the end). However, when more than one file is writable, you need to manage the locations of the internal files as one or more of them grow. This quickly becomes complicated and is not usually worth the effort. (If the files have a known bounded size, you can pad each file internally to that size, thus regaining single-file efficiency.)
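The padding approach in the parenthetical can be sketched with java.io.RandomAccessFile. This is a minimal sketch under assumed conditions: a fixed number of logical files, each known never to exceed maxBytes, stored in one container file with a hypothetical slot layout of a 4-byte length header followed by the padded data. SlotFile is an illustrative class, not a library API.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical container that packs several bounded-size logical files into one
 * physical file. Each slot is a 4-byte length header plus maxBytes of padded data,
 * so any logical file can grow (up to the bound) without moving its neighbours.
 */
public class SlotFile implements AutoCloseable {
    private final RandomAccessFile raf;
    private final int maxBytes;

    public SlotFile(String path, int slots, int maxBytes) throws IOException {
        this.raf = new RandomAccessFile(path, "rw");
        this.maxBytes = maxBytes;
        raf.setLength(slots * slotSize());   // reserve every slot up front
    }

    private long slotSize() {
        return 4L + maxBytes;
    }

    /** Overwrites a logical file in place; its neighbours never move. */
    public void write(int slot, byte[] data) throws IOException {
        if (data.length > maxBytes) {
            throw new IllegalArgumentException("data exceeds slot bound of " + maxBytes);
        }
        raf.seek(slot * slotSize());
        raf.writeInt(data.length);
        raf.write(data);
    }

    /** Reads a logical file. Reading a slot never written is undefined in this sketch. */
    public byte[] read(int slot) throws IOException {
        raf.seek(slot * slotSize());
        byte[] data = new byte[raf.readInt()];
        raf.readFully(data);
        return data;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("slots", ".dat");
        tmp.deleteOnExit();
        try (SlotFile f = new SlotFile(tmp.getPath(), 3, 64)) {
            f.write(0, "config".getBytes(StandardCharsets.UTF_8));
            f.write(1, "log entry".getBytes(StandardCharsets.UTF_8));
            // Slot 0 grows in place; slot 1 is unaffected.
            f.write(0, "config, version 2".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(f.read(0), StandardCharsets.UTF_8));
            System.out.println(new String(f.read(1), StandardCharsets.UTF_8));
        }
    }
}
```

The trade-off is wasted space in every slot that is not full, in exchange for never having to relocate one internal file when another grows.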

14.1.3 Cached Filesystems (RAM Disks, tmpfs, cachefs)

Most operating systems provide the ability to map a filesystem into the system memory. When you control your target environment, this ability can speed up reads and writes to certain files. Typically, this technique has been used to speed up the reading and writing of temporary files. For example, some compilers (of languages in general, not specifically Java) generate many temporary files during compilation. If these files are created and written directly to the system memory, compilation is greatly sped up. Similarly, if your application needs a set of external files, it is possible to map these directly into the system memory, greatly speeding up their reads and writes.
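A Java application can take advantage of a memory-backed filesystem for its own temporary files simply by creating them there. The sketch below assumes that /dev/shm is a tmpfs (RAM-backed) mount, which is common on Linux but not guaranteed; the directory choice and the ScratchFiles class are illustrative. If that directory is unusable, it falls back to the default temporary directory named by the java.io.tmpdir system property.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Writes scratch files into a memory-backed directory when one is available. */
public class ScratchFiles {

    /** Returns preferred if it is a writable directory, else the JVM's default temp dir. */
    static Path scratchDir(Path preferred) {
        if (preferred != null && Files.isDirectory(preferred) && Files.isWritable(preferred)) {
            return preferred;
        }
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: /dev/shm is a tmpfs mount, as on many Linux systems.
        Path dir = scratchDir(Paths.get("/dev/shm"));
        Path tmp = Files.createTempFile(dir, "compile-", ".tmp");
        try {
            Files.write(tmp, "intermediate output".getBytes(StandardCharsets.UTF_8));
            System.out.println("scratch file: " + tmp);
        } finally {
            Files.deleteIfExists(tmp);   // its contents would not survive a reboot anyway
        }
    }
}
```

Alternatively, starting the JVM with -Djava.io.tmpdir set to a memory-backed mount redirects File.createTempFile, and the Files.createTempFile overload that takes no directory, to that mount without any code changes.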

But note that these types of filesystems are not persistent. Just as the machine's system memory is cleared when it is rebooted, these filesystems are removed on reboot. If the system crashes, anything in a memory-mapped filesystem is lost. For this reason, these types of filesystems are usually suitable only for temporary files or read-only copies of disk-based files (such as mapping a CD-ROM into a memory-resident filesystem).

Remember that you do not have the same degree of fine control over these filesystems that you have over your application. A memory-mapped filesystem does not use memory resources as efficiently as working directly from your application. If you have direct control over the files you are reading and writing, it is usually better to optimize this within your application rather than outside it. A memory-mapped filesystem takes space directly from system memory, so you should consider whether it would be better to let your application grow in memory instead of letting the filesystem take up that memory. For multiuser applications, it is usually more efficient for the system to map shared files directly into memory, as a particular file then occupies just one memory location rather than being duplicated in each process.

Note that from SDK 1.4, memory-mapped files are directly supported by the java.nio package, as discussed in Chapter 8. Memory-mapped files are slightly different from memory-mapped filesystems. A memory-mapped file uses system resources to read the file into system memory, and that data can then be accessed from Java through the appropriate java.nio buffer. A memory-mapped filesystem does not require the java.nio package and, as far as Java is concerned, files in that filesystem are simply files like any others. The operating system transparently handles the memory mapping.
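For comparison with a memory-mapped filesystem, here is a minimal sketch of reading a file through a java.nio mapping. It is written against a current JDK (FileChannel.open and try-with-resources need Java 7; under SDK 1.4 the channel would come from RandomAccessFile.getChannel()), and because it maps the whole file into a single MappedByteBuffer, it assumes files smaller than 2GB. MappedRead and checksum are illustrative names.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Reads a file through a java.nio memory mapping rather than stream reads. */
public class MappedRead {

    /** Sums all bytes of a file (as unsigned values) via one read-only mapping. */
    static long checksum(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF;   // the OS pages data in as it is touched
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("mapped", ".bin");
        f.toFile().deleteOnExit();
        Files.write(f, "hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(checksum(f));   // 104+101+108+108+111 = 532
    }
}
```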

The creation of memory-mapped filesystems is completely system-dependent, and there is no guarantee that it is available on any particular system (though most modern operating systems do support this feature). On Unix systems, the administrator needs to look at the documentation of the mount command and its subsections on cachefs and tmpfs. Under Windows, you should find details by looking at the documentation on how to set up a RAM disk, a portion of memory mapped to a logical disk drive.

In a similar way, there are products available that precache shared libraries (DLLs) and even executables in memory (for example, Norton SpeedStart caches DLLs and device drivers in memory on Windows systems). This usually means only that an application starts, or a shared library loads, more quickly, and so may not be much help in speeding up a running system.

But you can apply the technique of memory-mapping filesystems directly and quite usefully for applications in which processes are frequently started. Copy the Java distribution and all class files (all JDK, application, and third-party class files) onto a memory-mapped filesystem and ensure that all executions and classloading take place from that filesystem. Since everything (executables, shared libraries, class files, resources, etc.) is already in memory, startup is much faster. Because only the startup (and classloading) time is affected, this technique gives only a small boost to applications that do not frequently start processes, but it can be usefully applied if startup time is a problem.
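The copying step can be scripted in many ways. As one possibility, this Java sketch copies a directory tree, such as a JDK installation or an application's class directory, onto a mount point that is assumed to be memory-backed (for example, a tmpfs or RAM-disk mount set up as described above). Both paths are supplied on the command line; PreloadTree is a hypothetical class, and symbolic links receive no special handling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Iterator;
import java.util.stream.Stream;

/** Copies a directory tree onto a (presumed) memory-backed mount point. */
public class PreloadTree {

    /** Recursively copies src into dest; returns the number of regular files copied. */
    static long copyTree(Path src, Path dest) throws IOException {
        long copied = 0;
        try (Stream<Path> paths = Files.walk(src)) {   // directories come before their entries
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                Path target = dest.resolve(src.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    // COPY_ATTRIBUTES attempts to keep timestamps and, where the platform
                    // supports it, permissions such as the executable bit on bin/java.
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING,
                            StandardCopyOption.COPY_ATTRIBUTES);
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PreloadTree <sourceDir> <memoryMountDir>");
            return;
        }
        long n = copyTree(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("copied " + n + " files");
    }
}
```

After copying, launch the java executable from the copied tree and point the classpath at the copied class directories, so that executables, libraries, and classes are all read from memory.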

14.1.4 Disk Fragmentation

When files are stored on disk, the bytes in the files are not necessarily stored contiguously: their storage depends on file size and contiguous space available on the disk. This noncontiguous disk storage is called fragmentation. Any particular file may have some chunks in one place, and a pointer to the next chunk that may be quite a distance away on the disk.

Hard disks tend to get fragmented over time. This fragmentation delays both reads from files (including loading applications into computer memory on startup) and writes to files. The delay occurs because the disk head must move to the next chunk at each fragment boundary, incurring another seek and rotational delay each time.
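The cost of fragmentation can be estimated with the same example disk used in Section 14.1.1 (10 ms seek, 7200 rpm, 5MB/sec). This rough model assumes each fragment costs one average seek plus one average rotational delay, and it ignores caching, read-ahead, and the fact that fragments are often near each other (so real seeks may be shorter). FragmentedRead is an illustrative name.

```java
/** Rough model of how fragmentation adds positioning delays to a whole-file read. */
public class FragmentedRead {

    /**
     * Estimated time (ms) to read a whole file.
     * fileKB: file size; fragments: separate chunks (1 = contiguous);
     * mbPerSec: transfer rate, treating 1MB/sec as 1K per millisecond.
     */
    static double readMs(double fileKB, int fragments, double seekMs, double rpm, double mbPerSec) {
        double positioning = seekMs + (60_000.0 / rpm) / 2.0;   // seek + half a revolution
        return fragments * positioning + fileKB / mbPerSec;     // one positioning per chunk
    }

    public static void main(String[] args) {
        for (int frags : new int[] {1, 10, 100}) {
            System.out.printf("1MB file in %3d fragments: %.0f ms%n",
                    frags, readMs(1000, frags, 10, 7200, 5));
        }
    }
}
```

With these figures, a 1MB file takes roughly 214 ms when contiguous, about 342 ms in 10 fragments, and about 1617 ms in 100 fragments, even though the transfer time itself stays at 200 ms.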

For optimum performance on any system, it is a good idea to periodically defragment the disks. This reunites files that have been split up so that the disk heads do not spend so much time searching for data once the file-header locations have been identified, thus speeding up data access. Defragmenting may not be effective on all systems, however.

14.1.5 Disk Sweet Spots

Most disks have a location from which data is transferred faster than from other locations. Usually, the closer the data is to the outside edge of the disk, the faster it can be read from the disk. Most hard disks rotate at a constant angular speed, which means that the linear speed of the disk surface passing under the head is greater the farther the head is from the center of the disk. Thus, data at the edge of the disk can be read from (and written to) at the fastest possible rate commensurate with the maximum density of data storable on disk.
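The geometry can be made concrete with a short calculation. This sketch assumes constant angular speed and the same linear recording density on every track, so the transfer rate is proportional to the radius; real disks approximate this by grouping tracks into zones with more sectors per track toward the outside. The 20 mm and 46 mm radii are hypothetical values chosen only for illustration, and SweetSpot is an illustrative class.

```java
/** Compares transfer rates at inner and outer tracks of a constant-angular-speed disk. */
public class SweetSpot {

    /** Linear speed (mm/sec) of the track passing under the head at the given radius. */
    static double linearSpeedMmPerSec(double radiusMm, double rpm) {
        return 2 * Math.PI * radiusMm * rpm / 60.0;
    }

    /** Outer-track vs. inner-track transfer rate, assuming equal linear bit density. */
    static double outerToInnerRatio(double innerMm, double outerMm, double rpm) {
        return linearSpeedMmPerSec(outerMm, rpm) / linearSpeedMmPerSec(innerMm, rpm);
    }

    public static void main(String[] args) {
        double rpm = 7200;
        double innerMm = 20, outerMm = 46;   // hypothetical radii
        System.out.printf("inner track: %.0f mm/sec%n", linearSpeedMmPerSec(innerMm, rpm));
        System.out.printf("outer track: %.0f mm/sec%n", linearSpeedMmPerSec(outerMm, rpm));
        System.out.printf("outer/inner transfer ratio: %.1f%n", outerToInnerRatio(innerMm, outerMm, rpm));
    }
}
```

Under these assumptions the outer track delivers data 2.3 times faster than the inner one, which is why small, heavily used data such as database indexes is worth placing near the edge.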

This location with faster transfer rates is usually termed the disk sweet spot. Some (commercial) utilities provide mapped access to the underlying disk and allow you to reorganize files to optimize access. On most server systems, the administrator has control over how logical partitions of the disk apply to the physical layout, and how to position files to the disk sweet spots. Experts for high-performance database systems sometimes try to position the index tables of the database as close as possible to the disk sweet spot. These tables consist of relatively small amounts of data that affect the performance of the system in a disproportionately large way, so that any speed improvement in manipulating these tables is significant.

Note that some of the latest operating systems are beginning to include "awareness" of disk sweet spots, and attempt to move executables to sweet spots when defragmenting the disk. You may need to ensure that the defragmentation procedure does not disrupt your own use of the disk sweet spot.