Most of these suggestions apply only after a bottleneck has been identified:
Ensure that performance tests are run with the same amount of I/O as the expected finished application. Specifically, turn off any extra logging, tracing, and debugging I/O.
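Use Runtime.traceMethodCalls(), when supported, to count I/O calls. (The method was deprecated in Java 9 and removed in Java 13, so it is available only on older JVMs.)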
Redefine the I/O classes to count I/O calls if necessary.
Include logging statements next to all basic I/O calls in the application.
Parallelize I/O by splitting data into multiple files.
Execute I/O in a background thread.
Avoid the filesystem file-growing overhead by preallocating files.
Try to minimize the number of I/O calls.
Buffer to reduce the number of I/O operations by increasing the amount of data each I/O operation transfers.
Cache to replace repeated I/O operations with much faster memory or local disk access.
Avoid or reduce I/O calls in loops.
Replace System.out and System.err with customized PrintStream classes to control console output.
Use logger objects for tight control in specifying logging destinations.
Try to eliminate duplicate and unproductive I/O statements.
Keep files open and navigate around them rather than repeatedly opening and closing the files.
Consider optimizing the Java byte-to-char (and char-to-byte) conversion.
Handle serialization explicitly, rather than using the default serialization mechanisms.
Use transient fields to avoid serialization.
Use the java.io.Externalizable interface if overriding the default serialization routines.
Use change logs for small changes, rather than reserializing the whole object.
Minimize the work done in the no-arg constructor that is invoked during deserialization.
Consider partitioning objects into multiple sets and serializing each set concurrently in different threads.
Use lazy initialization to move or spread the deserialization overhead to other times.
Consider indexing an object table for selective access to stored serialized objects.
Optimize network transfers by transferring only the data and objects needed, and no more.
Cluster serialized objects that are used together by putting them into the same file.
Put objects next to each other if they are required together.
Consider using an object-storage system (such as an object database) if your object-storage requirements are at all sophisticated.
Use compression when the overhead of compression is outweighed by the benefit of reducing I/O.
Avoid compression when the system has a heavily loaded CPU.
Consider using "intelligent" I/O classes that can decide to use compression on the fly.
Consider searching directly against compressed data without decompressing.
NIO provides I/O mechanisms mainly targeted at high-performance servers, but is also of use in other situations.
Use nonblocking SocketChannels to connect asynchronously to servers.
Nondirect Buffers provide an efficient mechanism for converting arrays of one primitive data type to another primitive data type.
Direct Buffers provide options for optimizing I/O, especially when using multiple Buffers with scatter-gather I/O operations.
High-performance scalable servers should use NIO multiplexing and asynchronous I/O.
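
Two of the items above, counting I/O calls by redefining an I/O class and buffering to reduce the number of I/O operations, can be illustrated together. The following is a minimal sketch, not a definitive measurement harness: the class names IoCallCountDemo and CountingOutputStream and the methods countUnbuffered and countBuffered are hypothetical names introduced for this illustration. The counting stream stands in for a real file or socket, and it discards the data, so the sketch shows only how many calls reach the underlying stream, not how long they take.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class IoCallCountDemo {

    /** Hypothetical stand-in for a real device: discards data, counts write calls. */
    static final class CountingOutputStream extends OutputStream {
        long calls;

        @Override
        public void write(int b) {
            calls++; // one call per single-byte write
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++; // one call per block write, whatever its length
        }
    }

    /** Writes byteCount bytes one at a time straight to the counting stream. */
    static long countUnbuffered(int byteCount) {
        CountingOutputStream sink = new CountingOutputStream();
        for (int i = 0; i < byteCount; i++) {
            sink.write(i);
        }
        return sink.calls;
    }

    /** Writes the same bytes through a BufferedOutputStream of the given size. */
    static long countBuffered(int byteCount, int bufferSize) {
        CountingOutputStream sink = new CountingOutputStream();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < byteCount; i++) {
                out.write(i);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before the count is read.
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + countUnbuffered(4096));
        System.out.println("buffered calls:   " + countBuffered(4096, 1024));
    }
}
```

In this sketch every unbuffered single-byte write reaches the underlying stream, while the buffered version forwards data only when its buffer fills or when the stream is closed. The same wrapping approach could be applied to an input stream to count reads.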