SDK 1.4 includes a new set of packages called NIO (New I/O), in the java.nio package space. The NIO classes provide I/O functionality that is available from most modern operating systems but was previously missing from Java. Much of the NIO functionality is aimed at highly scalable, efficient server technology, but some aspects are useful for many applications, as we'll see in the next few sections. For a good introduction to NIO, see Learning Java (O'Reilly); Michael Nygard's excellent JavaWorld article also provides a basic introduction.[15]
[15] Michael Nygard, "Master Merlin's new I/O classes," JavaWorld, September 2001, http://www.javaworld.com/javaworld/jw-09-2001/jw-0907-merlin.html.
When you create a client socket to connect to a server, the underlying TCP connection procedure involves a handshake between the two hosts. The Socket.connect() call is a blocking operation, which normally stops the thread from proceeding until the connection is complete.
The NIO package allows you to initiate the socket connection procedure and carry on processing in the thread while the connection is being established. To achieve this, you use a SocketChannel set to nonblocking mode. A call to SocketChannel.connect() then returns immediately, with the connection attempted in the background. You can register the channel with a Selector using the OP_CONNECT flag to find out when the connection is ready to be completed (SocketChannel.isConnectionPending() tells you only that a connection operation is in progress on the channel). The connection is finished by calling SocketChannel.finishConnect(), which tells you either that the connection is established (by returning true), that it is still pending (by returning false), or that the connection attempt failed (by throwing an exception).
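Pulled together, the sequence looks something like the following sketch. The host, port, and doOtherWork() call are placeholders for illustration; production code would bound the polling loop and would more usually register the channel with a Selector rather than poll:

    SocketChannel channel = SocketChannel.open();
    channel.configureBlocking(false);
    channel.connect(new InetSocketAddress("example.com", 8000));
    while (!channel.finishConnect()) {
      //Still pending: the thread is free to do other work here
      //rather than blocking on the connection
      doOtherWork();    //hypothetical placeholder
    }
    //Reaching here means the connection is established; a failed
    //attempt would have thrown an IOException from finishConnect()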
The NIO package introduces Buffer objects, which are essentially containers for primitive data types. For example, a FloatBuffer holds a collection of floats. Buffers come in two types, direct and nondirect. A direct Buffer directly wraps a section of system memory; a nondirect Buffer essentially wraps a Java array object (e.g., a float[] array for FloatBuffer). Buffers are an integral part of the NIO package, and many NIO objects use Buffers in various ways.
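To make the distinction concrete, here is a minimal sketch of creating each kind (the sizes are arbitrary):

    //Nondirect: wraps an ordinary Java float[] array on the heap
    FloatBuffer nondirect = FloatBuffer.wrap(new float[1024]);
    //Direct: wraps a block of system memory outside the Java heap
    //(allocated in bytes; each float occupies 4 bytes)
    FloatBuffer direct = ByteBuffer.allocateDirect(1024*4).asFloatBuffer();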
In this section we'll look at an example of using nondirect Buffers independently from the rest of NIO. An application I am familiar with needed to send an array of floats across the network to another Java process, and did so by looping through the array, writing each float to the socket using DataOutput.writeFloat() and reading each float at the other end. The whole sequence was preceded by the number of floats in the array, like this:
    dataOut.writeInt(floatArray.length);
    for (int i = 0; i < floatArray.length; i++)
      dataOut.writeFloat(floatArray[i]);
    dataOut.flush();
    dataOut.close();
The read at the other end was simply the reverse procedure:
    int len = dataIn.readInt();
    float[] floatArray = new float[len];
    for (int i = 0; i < len; i++)
      floatArray[i] = dataIn.readFloat();
    dataIn.close();
Let's examine this technique for inefficiencies. Using what you have learned from the rest of this chapter, you should immediately see two potential speedups. First, the amount of data written could be compressed. Second, the number of I/O operations is large; fewer I/O operations should make the procedure faster.
Let's deal with compression first. Applying compression would actually slow down the read and write, though the overall transfer might be faster if the connection were slow, with significant fragmentation. Compression in this situation is heavily network-dependent. I'll assume we are on a LAN, in which case compression would only slow down the transfer.
Next, we can reduce the number of I/O operations. With the current procedure we have one I/O operation executed for every float in the array. The underlying network stack might batch some of the I/O calls, but you cannot rely on that. One obvious way to reduce the number of I/O calls is to treat the array as a single object and use an ObjectInputStream/ObjectOutputStream pair to make the transfer:
    objectOut.writeObject(floatArray);
    objectOut.flush();
    objectOut.close();
The read at the other end is simply the reverse procedure:
    float[] floatArray = (float[]) objectIn.readObject();
    objectIn.close();
Measuring this new procedure, for very large arrays of floats (hundreds of thousands of elements), I obtain times that are more than a hundred times faster: a gain of two orders of magnitude. The gain is due to reducing the number of I/O calls from hundreds of thousands to just one.
Unfortunately for the developers of this application, ObjectStreams cannot easily be reused: because of the initialization data an ObjectStream writes and reads on the stream, you cannot simply reset one and start over. ObjectStreams are fine if you are doing all of your stream I/O through one ObjectStream. But if you are using other stream-writing procedures on the same stream as the ObjectStreams, you have to be extremely careful, or you need to create and release ObjectStreams for each group of objects sent, as sketched below. And, unfortunately, ObjectStreams have a high creation overhead. The particular application that was transferring the float arrays could not use ObjectStreams, but needed the speed they could have gained from them.
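The create-and-release pattern just mentioned would look something like the following sketch on the writing side, where socketOut is assumed to be the socket's long-lived OutputStream; the reader must create a matching ObjectInputStream for each batch, since every new ObjectOutputStream writes a fresh stream header:

    //One ObjectOutputStream per batch of objects; flush but do NOT close,
    //since closing the ObjectOutputStream would close the socket stream too
    ObjectOutputStream objectOut = new ObjectOutputStream(socketOut);
    objectOut.writeObject(floatArray);
    objectOut.flush();
    //objectOut is now discarded; the next batch pays the (high)
    //ObjectOutputStream creation cost again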
Next we consider Buffers. Buffer objects allow you to treat an array of one primitive data type as an array of another type. This is equivalent to being able to cast an array from one data type to another. In this case, we would like to treat a float[] array as a byte[] array, because byte[] arrays can be read and written very efficiently in single I/O operations.
Specifically, you can create a ByteBuffer wrapping a byte[] array. The byte[] array is going to be used for efficient I/O, using the InputStream.read(byte[]) and OutputStream.write(byte[]) methods. ByteBuffer provides methods to view the ByteBuffer as another type of Buffer, in this case as a FloatBuffer:
    //The byte array for output
    byte[] byteArray = new byte[floatArray.length*4];
    //Create a FloatBuffer 'view' on the ByteBuffer that wraps the byte array
    FloatBuffer outFloatBuffer = ByteBuffer.wrap(byteArray).asFloatBuffer();
    //Write the array of floats into the byte array. FloatBuffer does this efficiently
    outFloatBuffer.put(floatArray, 0, floatArray.length);
    //And write the length, then the byte array
    dataOut.writeInt(floatArray.length);
    dataOut.write(byteArray, 0, floatArray.length*4);
    dataOut.flush();
    dataOut.close();
The read is very similar:
    int len = dataIn.readInt();
    //The byte array for input
    byte[] byteArray = new byte[len*4];
    //Create a FloatBuffer 'view' on the ByteBuffer that wraps the byte array
    FloatBuffer inFloatBuffer = ByteBuffer.wrap(byteArray).asFloatBuffer();
    float[] floatArray = new float[len];
    //Read the data into the byte array
    dataIn.readFully(byteArray);
    //And copy the array of floats from the byte array. FloatBuffer does this efficiently
    inFloatBuffer.get(floatArray, 0, floatArray.length);
    dataIn.close();
As a result, we achieve the same speed as the single I/O object transfer, without the need for ObjectStreams.
Direct Buffers wrap a portion of system memory. They yield optimal I/O efficiency by allowing system I/O operations to operate directly between system memory and an external medium (e.g., the disk or network). In contrast, nondirect Buffers require an extra copy operation to move the data between the Java heap and the external medium. The NIO I/O operations are optimized for dealing with direct Buffers. Note, however, that the "old" I/O (java.io) classes are also optimized, but for operating on Java arrays: the InputStream and OutputStream classes that operate directly on external media (for example, FileInputStream) also require no extra copy operations to move data to and from Java arrays. So nondirect Buffers are at a disadvantage compared with both of the other options, but it is not obvious whether direct Buffers with NIO operations or Java arrays with the "old" I/O classes make the more efficient combination.
So let's test out the possibilities. I'll use a simple file-copying operation to test the various options. I've chosen file copying because NIO includes an extra operation for enabling file copies in the FileChannel class, which gives us one further optimization option. First, we have the good old java.io technique of reading chunks from the file into a byte[ ] array buffer and writing those chunks out. You should be fairly familiar with this by now:
    public static void explicitBufferInputStreamCopy(String f1, String f2)
        throws Exception {
      long time = System.currentTimeMillis();
      byte[] buffer = new byte[1024*16];
      FileInputStream rdr = new FileInputStream(f1);
      FileOutputStream wrtr = new FileOutputStream(f2);
      int readLen;
      while( (readLen = rdr.read(buffer)) != -1) {
        wrtr.write(buffer, 0, readLen);
      }
      rdr.close();
      wrtr.close();
      time = System.currentTimeMillis() - time;
      System.out.println(" explicitBufferInputStreamCopy time: " + time);
    }
Next, we have the equivalent technique using a direct Buffer and FileChannels. This technique may be unfamiliar, but it is straightforward. We allocate a direct Buffer using the ByteBuffer.allocateDirect() method, open the files for reading and writing to obtain FileChannel objects, then simply repeatedly read into the Buffer and write out the Buffer until the file has been copied. Conceptually, this is exactly the same series of operations as in the last method we defined, explicitBufferInputStreamCopy().
    public static void directBufferCopy(String f1, String f2)
        throws Exception {
      long time = System.currentTimeMillis();
      ByteBuffer buffer = ByteBuffer.allocateDirect(16*1024);
      FileChannel rdr = (new FileInputStream(f1)).getChannel();
      FileChannel wrtr = (new FileOutputStream(f2)).getChannel();
      while( rdr.read(buffer) > 0) {
        buffer.flip();
        wrtr.write(buffer);
        buffer.clear();
      }
      rdr.close();
      wrtr.close();
      time = System.currentTimeMillis() - time;
      System.out.println(" directBufferCopy time: " + time);
    }
For completeness, I also test using a nondirect Buffer. To use a nondirect Buffer, the only difference from the last method is that ByteBuffer.allocate( ) is used instead of ByteBuffer.allocateDirect( ).
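For reference, the nondirect variant (nonDirectBufferCopy()) differs only in that one allocation line:

    //Heap-based (nondirect) buffer instead of system memory
    ByteBuffer buffer = ByteBuffer.allocate(16*1024);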
The directBufferCopy( ) method we just defined uses ByteBuffer.allocateDirect( ) to obtain a direct Buffer, but NIO gives us another option to get a direct Buffer. NIO supports the memory mapping of files. The FileChannel.map( ) operation uses the operating system to map a portion of a file or the whole file into system memory. Using this method we can obtain a direct Buffer containing the entire file. In "old" I/O terms, this is equivalent to creating a byte[ ] array buffer the same size as the file, reading the entire file into that byte[ ] array, then writing it out to the new file copy. For "old" I/O, this procedure would normally be less efficient than using a smaller byte[ ] buffer as we did in the explicitBufferInputStreamCopy( ) method, but here we are using operating-system memory mapping, which may make a difference.
    public static void mappedBufferCopy(String f1, String f2)
        throws Exception {
      long time = System.currentTimeMillis();
      FileChannel rdr = (new FileInputStream(f1)).getChannel();
      FileChannel wrtr = (new FileOutputStream(f2)).getChannel();
      ByteBuffer buffer = rdr.map(FileChannel.MapMode.READ_ONLY, 0, (int) rdr.size());
      wrtr.write(buffer);
      rdr.close();
      wrtr.close();
      time = System.currentTimeMillis() - time;
      System.out.println(" mappedBufferCopy time: " + time);
    }
Note that the FileChannel API documentation indicates that the procedure for memory-mapping files, and its efficiency, are highly system-dependent. The API also states:
"For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory."
Finally, as I said earlier, FileChannel also provides the transferTo() and transferFrom() methods. Once again, these methods are intended for maximal efficiency in transferring bytes between FileChannels and other Channels, by using the underlying operating system's filesystem cache. The API states:
"Bytes can be transferred from a file to some other channel in a way that can be optimized by many operating systems into a very fast transfer directly to or from the filesystem cache."
Using FileChannel.transferTo() is simpler than any of the previous methods: you obtain the two FileChannels and then execute transferTo(). No need for looping, or even reading or writing!
    public static void directTransferCopy(String f1, String f2)
        throws Exception {
      long time = System.currentTimeMillis();
      FileChannel rdr = (new FileInputStream(f1)).getChannel();
      FileChannel wrtr = (new FileOutputStream(f2)).getChannel();
      rdr.transferTo(0, (int) rdr.size(), wrtr);
      rdr.close();
      wrtr.close();
      time = System.currentTimeMillis() - time;
      System.out.println(" directTransferCopy time: " + time);
    }
This series of tests is hugely system-dependent. Because NIO operations are much closer to, and more reliant on, the operating system than most other Java classes, we have not only the usual VM variability but also operating-system differences to take into account. Additionally, since we are testing file copying, disk efficiencies also affect these tests. Table 8-4 shows the results of running under one brand of Windows (NT 4), using the 1.4 VM in various modes with various repetitions. I also tested these methods under Windows 98 and Solaris 8. Generally, the NIO results were much more variable than the "old" I/O results. It seems that on average the direct-Buffer copy (directBufferCopy( )) was the fastest operation for the Windows test machines, followed closely by the Stream+byte[ ] copy (explicitBufferInputStreamCopy( )). On Solaris, the FileChannel transfer (directTransferCopy( )) seemed to be the fastest, again followed closely by the Stream+byte[ ] copy (explicitBufferInputStreamCopy( )).
Table 8-4. Normalized times for the file-copying methods

Copy method                   | Normalized time range
------------------------------|----------------------
explicitBufferInputStreamCopy | 100%-145%
nonDirectBufferCopy           | 454%-674%
directBufferCopy              | 67%-241%
mappedBufferCopy              | 240%-916%
directTransferCopy            | 238%-514%
Clearly, though, there is no huge advantage to NIO in this situation compared with using the "old" I/O. But bear in mind that the NIO Buffers are not specifically designed to replace the old I/O classes; NIO provides additional capabilities. For example, we haven't tested the Buffer classes with scatter-gather operations, which work on multiple Buffers simultaneously. The ScatteringByteChannel.read(ByteBuffer[], int, int) method reads from a Channel directly into multiple Buffers in one I/O operation; similarly, GatheringByteChannel.write() writes the contents of multiple Buffers in one I/O operation. When is this useful? A common example is an HTTP server. When an HTTP server downloads a page (file) to a browser, it writes out a header and then the page. The header itself consists of several different sections that need to be amalgamated. It is convenient to write the parts of the header to the stream in multiple separate I/O operations, followed by the page body, letting the network buffer the response. Unfortunately, this turns out to be suboptimal, because you are increasing the number of I/O operations and allowing the network stack to set the pace. Acme's thttpd developers ran a series of performance tests of various HTTP servers, and identified that the amount of data sent in the first network packet was crucial to optimal performance:
"Turns out the change that made the difference was sending the response headers and the first load of data as a single packet, instead of as two separate packets. Apparently this avoids triggering TCP's "delayed ACK," a 1/5th second wait to see if more packets are coming in."[16]
[16] This quotation can be found at http://www.acme.com/software/thttpd/benchmarks.html.
GatheringByteChannel.write() is not the only way to optimize this situation, but it is efficient and avoids the intermediate buffers that would be necessary with stream I/O.
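A sketch of what such a gathering write might look like follows; headerBytes and bodyBytes are assumed to already hold the prepared response header and the first chunk of the page, and clientChannel is a connected SocketChannel (which implements GatheringByteChannel):

    //Headers and the first chunk of the body go out in one I/O operation,
    //giving the network stack the chance to send them in a single packet
    ByteBuffer[] response = {
      ByteBuffer.wrap(headerBytes),
      ByteBuffer.wrap(bodyBytes)
    };
    clientChannel.write(response);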
Possibly the most important features of NIO from a performance standpoint are nonblocking channels and the ability to multiplex channels. Multiplexing I/O allows you to handle multiple I/O channels from one thread without having the thread block on any one channel. Without NIO, you have no certain way to know that a read or write will block the thread. The InputStream.available() method is the only stream method for determining whether a read will not block, and it is not reliable; there is no method at all to determine whether a stream write would block. NIO provides the Selector class to reliably determine which I/O channels are ready to perform nonblocking I/O. Currently, NIO does not support the multiplexing of FileChannels (though most operating systems do), so multiplexing with JDK 1.4 is primarily for socket communications.
It is useful to understand nonblocking mode in a little more detail. OutputStream.write() has a return type of void. But at the operating-system level, I/O write operations do return a value, normally the number of bytes that were written by the call. This is efficient: most calls to an operating-system write send data to some buffer (disk buffer, network-stack buffer, filesystem cache, etc.), so any call to write normally just fills a buffer. If the call supplies more bytes than the buffer can take, the buffer is filled with as many bytes as it can hold, the remaining bytes are not written, and the number of bytes written is returned. The buffer is then emptied by sending the data on, making it ready for the next chunk of bytes. The buffer emptying proceeds at I/O speeds, typically several orders of magnitude slower than the write to the in-memory buffer.
Consequently, the Java OutputStream.write() cannot just fill the buffer and return, because it has no way to report how many bytes were written. Instead, the buffer is filled, emptied, and filled again until all the bytes have been written: OutputStream.write() is actually a looped call to the underlying operating-system write call. Usually this is very convenient. But because the write can block for so long, you need to give it a separate thread of its own until it completes. You are probably familiar with doing this for reads, but it may not have occurred to you that Java writes are in the same category.
NIO writes are much closer to operating-system writes. In all cases where the data fits into the network buffer, a write to the socket should return immediately. And where there are too many bytes for the buffer, SocketChannel.write() still returns immediately if the SocketChannel is in nonblocking mode, returning the number of bytes written to the buffer (see the sketch below). The actual network I/O proceeds asynchronously, leaving the Java thread free for other operations. In fact, the thread typically has time for thousands more operations before the buffer is ready to accept more data (i.e., before the next write can succeed without blocking). So nonblocking mode gives you asynchronous I/O, and because the thread can execute thousands of operations for each I/O call, one thread can effectively handle thousands of I/O channels simultaneously; in other words, multiplexing. But for effective multiplexing, you also need to know which Channels are ready to be written to or read from, and it is the Selector class that reliably tells you when any channel is ready to perform its next I/O operation. The Selector class determines from the operating system which subset of a set of Channels is ready to perform I/O.
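For illustration, a nonblocking write might look like the following sketch, where channel is assumed to be a SocketChannel already configured with configureBlocking(false) and mySelector is an open Selector:

    //Returns immediately with the number of bytes the network buffer accepted
    int written = channel.write(buffer);
    if (buffer.hasRemaining()) {
      //The network buffer filled before the whole ByteBuffer was consumed;
      //ask the Selector to tell us when the channel can take more data
      channel.register(mySelector, SelectionKey.OP_WRITE);
    }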
The Selector class differentiates between different types of I/O; there are currently four. The first two cover reading and writing: the Selector can inform you when a Channel is ready to be read from or written to. In addition, a client socket can be trying to connect to a server socket: the Selector can tell you when the connection attempt has completed. And lastly, server sockets can accept new connections: the Selector can tell you when a connection is pending that will allow the server socket to execute an accept() call without blocking.
Note that multiplexed asynchronous I/O does not necessarily make I/O any faster. What you get is the ability to handle many I/O channels in one thread. For most Java applications, which have only a few open I/O streams at one time, there is no need to multiplex because a few extra blocked threads are easily managed by the VM. If you have many threads blocked on I/O, then multiplexing your I/O can significantly reduce your resource requirements. Ten I/O threads are probably okay; a hundred is too many. Multiplexed I/O is a definite requirement for scalable high-performance server applications, but most other applications do not need it.
Working with NIO multiplexing takes a little getting used to. You obtain a Selector using Selector.open(), and the equivalent of new ServerSocket(int port) is to obtain an unbound ServerSocketChannel using ServerSocketChannel.open() and then bind its socket using ServerSocket.bind():
    Selector mySelector = Selector.open();
    ServerSocketChannel serverChannel = ServerSocketChannel.open();
    serverChannel.socket().bind(new InetSocketAddress(port));
Client SocketChannels are obtained from ServerSocketChannel.accept( ) . It is perfectly possible to multiplex on all client sockets as well as the accept calls of the server socket, but it is also fairly common to have one extra thread dedicated to accepting connections:
    while(true) {
      try {
        //This is a blocking call; we operate the accepts in their own
        //separate thread to keep the code a bit cleaner
        SocketChannel client = serverChannel.accept();
        addToClientList(client);
      } catch (Exception e) {
        //If it is a problem with the ServerSocketChannel, we
        //may need to close it and restart the ServerSocketChannel.
        //Otherwise we should simply log and ignore the error
        ...
      }
    }
If you wanted to multiplex the ServerSocketChannel too, it would be:
    ServerSocketChannel serverChannel = ServerSocketChannel.open();
    serverChannel.configureBlocking(false);
    serverChannel.socket().bind(new InetSocketAddress(port));
    SelectionKey serverChannelAcceptKey =
      serverChannel.register(mySelector, SelectionKey.OP_ACCEPT);
And you could accept connections by querying the Selector using Selector.selectedKeys() to see when serverChannelAcceptKey is ready. If the key is in the set returned, the ServerSocketChannel can accept a new connection immediately without blocking. Similarly, the client SocketChannels created from the ServerSocketChannel.accept() call should be registered with a Selector (it doesn't have to be the same Selector; you could use one for reads, one for writes, and one for accepts if you prefer). The Selector is then queried for Channels that are ready to perform I/O, as sketched below.
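A minimal version of that central selection loop might look like the following sketch; readFrom() and writeTo() are hypothetical handlers standing in for your application's I/O processing:

    while (true) {
      mySelector.select();    //blocks until at least one channel is ready
      Iterator it = mySelector.selectedKeys().iterator();
      while (it.hasNext()) {
        SelectionKey key = (SelectionKey) it.next();
        it.remove();          //ready keys are not removed automatically
        if (key.isAcceptable()) {
          //accept() will not block here
          SocketChannel client = serverChannel.accept();
          client.configureBlocking(false);
          client.register(mySelector, SelectionKey.OP_READ);
        } else if (key.isReadable()) {
          readFrom((SocketChannel) key.channel());    //hypothetical handler
        } else if (key.isWritable()) {
          writeTo((SocketChannel) key.channel());     //hypothetical handler
        }
      }
    }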
Links to detailed examples, including full code for a high-performance NIO-based HTTP server, can be found at http://www.JavaPerformanceTuning.com/tips/nio.shtml.