14.4 Network I/O

At the network level, many things can affect performance. The bandwidth (the amount of data that can be carried by the network) tends to be the first culprit checked. Assuming you have determined that bad performance is attributable to the network component of an application, there are more likely causes for the poor performance than network bandwidth. The most likely cause of bad network performance is the application itself and how it is handling distributed data and functionality. I consider distributed-application tuning in several chapters (notably Chapter 12), but this section provides lower-level information to assist you in tuning your application and also considers nonapplication causes of bad performance.

The overall speed of a particular network connection is limited by the slowest link in the connection chain and the length of the chain. Identifying the slowest link is difficult and may not even be consistent: it can vary at different times of the day or for different communication paths. A network communication path can lead from an application through a TCP/IP stack (which adds various layers of headers, possibly encrypting and compressing data as well), then through the hardware interface, through a modem, over a phone line, through another modem, over to a service provider's router, through many heavily congested data lines of various carrying capacities and multiple routers with differing maximum throughputs and configurations, to a machine at the other end with its own hardware interface, TCP/IP stack, and application. A typical web download route is just like this. In addition, there are dropped packets, acknowledgments, retries, bus contention, and so on.

Because so many possible causes of bad network performance are external to an application, one option you can consider including in an application is a network speed-testing facility that reports to the user. This should test the speed of data transfer from the machine to various destinations: to itself, to another machine on the local network, to the Internet service provider, to the target server across the network, and to any other destinations appropriate. This type of diagnostic report can tell your users that they are obtaining bad performance from something other than your application. If you feel that the performance of your application is limited by the actual network communication speed, and not by other (application) factors, this facility will report the maximum possible speeds to your users (and put the blame for poor network performance outside your application, where it belongs).

14.4.1 Latency

Latency is different from the load-carrying capacity (bandwidth) of a network. Bandwidth refers to how much data can be sent down the communication channel in a given period of time (e.g., 64 kilobits per second) and is limited by the link in the communication chain that has the lowest bandwidth. Latency is the amount of time a particular data packet takes to get from one end of the communication channel to the other. Bandwidth tells you the limits within which your application can operate before performance is affected by the volume of data being transmitted. Latency often affects the user's view of the performance even when bandwidth isn't a problem. For example, on a LAN, latency might be 10 milliseconds. In this case, you can ignore latency considerations unless your application is making a large number of transmissions. If it is, you need to tune the application to reduce the number of transmissions being made. (That 10 ms overhead added to every transmission can add up if you just ignore it and treat the application as if it were not distributed.)

In most cases, especially for Internet traffic, latency is an important concern. You can determine the basic round-trip time for data packets between any two machines using the ping utility.^[3] This utility provides a measure of the time it takes a packet of data to reach another machine and be returned. However, the time measured is for a basic underlying protocol packet (an ICMP packet) to travel between the machines. If the communication channel is congested and the overlying protocol requires retransmissions (often the case for Internet traffic), one transmission at the application level can actually be equivalent to many round trips.

^[3] ping may not always give a good measure of the round-trip time because ICMP has a low priority in some routers.
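If you want a rough ping-like measurement from inside a Java application, one option is to time InetAddress.isReachable( ). The following is a minimal sketch, not a replacement for ping: the class and method names (ReachabilityTimer, probeMillis) are hypothetical, and the JVM may use an ICMP echo request or fall back to a TCP connection attempt on the echo port, depending on the privileges it has, so the figures are only indicative.

```java
import java.io.IOException;
import java.net.InetAddress;

// Hypothetical helper: times a single reachability probe to a host.
final class ReachabilityTimer {
    private ReachabilityTimer() {}

    /**
     * Returns the elapsed time of one probe in milliseconds, or -1 if the
     * host did not answer within timeoutMs or the probe failed.
     */
    static long probeMillis(InetAddress target, int timeoutMs) {
        long start = System.nanoTime();
        try {
            if (!target.isReachable(timeoutMs)) {
                return -1;
            }
        } catch (IOException e) {
            return -1; // treat a network error as "unreachable"
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // With no argument, probe the loopback address.
        InetAddress target = args.length > 0
                ? InetAddress.getByName(args[0])
                : InetAddress.getLoopbackAddress();
        for (int i = 0; i < 5; i++) {
            System.out.println(target + ": " + probeMillis(target, 2000) + " ms");
        }
    }
}
```

Repeating the probe several times, as main( ) does, gives a better feel for the variation than a single sample.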

If, for instance, the round-trip time is 400 ms (not unusual for an Internet link), this is the basic overhead time for any request sent to a server and the reply to return, without even adding any processing time for the request. If you are using TCP/IP and retransmissions are needed because some packets are dropped (TCP automatically handles this as needed), each retransmission adds another 400 ms to the request response time. If the application is conversational, requiring many data transmissions to be sent back and forth before the request is satisfied, each intermediate transmission adds a minimum of 400 ms of network delay, again without considering TCP retransmissions. The time can easily add up if you are not careful.
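The arithmetic in this paragraph can be written down as a small calculation. This is only a sketch under a simplifying assumption: every exchange and every retransmission is taken to cost exactly one full round trip, and processing time is ignored. The class and method names are hypothetical.

```java
// Hypothetical helper: lower bound on the network delay of one request.
final class LatencyEstimate {
    private LatencyEstimate() {}

    /**
     * Minimum network delay in milliseconds, assuming each exchange and
     * each retransmission costs one full round trip.
     */
    static long minimumNetworkDelayMs(long roundTripMs, int exchanges, int retransmissions) {
        if (roundTripMs < 0 || exchanges < 0 || retransmissions < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return roundTripMs * ((long) exchanges + retransmissions);
    }

    public static void main(String[] args) {
        // One request and reply over a 400 ms link: prints 400
        System.out.println(minimumNetworkDelayMs(400, 1, 0));
        // A conversational request with 10 exchanges and 2 retransmissions: prints 4800
        System.out.println(minimumNetworkDelayMs(400, 10, 2));
    }
}
```

The second case shows how a conversational protocol turns a 400 ms link into nearly five seconds of pure network delay before any work is done on the server.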

It is important to be aware of these limitations. It is often possible to tune the application to minimize the number of transfers by packaging data together, caching, and redesigning the distributed-application protocol to aim for a less conversational mode of operation. At the network level, you need to monitor the transmission statistics (using the ping and netstat utilities and packet sniffers) and consider tuning any network parameters that you have access to in order to reduce retransmissions.

14.4.2 TCP/IP Stacks

The TCP/IP stack is the section of code that is responsible for translating each application-level network request (send, receive, connect, etc.) through the transport layers down to the wire and back up to the application at the other end of the connection. Because the stacks are usually delivered with the operating system and performance-tested before delivery (since a slow network connection on an otherwise fast machine and fast network is pretty obvious), it is unlikely that the TCP/IP stack itself is a performance problem.

Some older versions of Windows TCP/IP stacks, both those delivered with the OS and others, had performance problems, as did some versions of TCP/IP stacks on the Macintosh OS (up to and including System 7.1). Stack performance can be difficult to trace. If the TCP/IP stack is causing a performance problem, it affects all network applications running on that machine. In the past I have seen isolated machines on a lightly loaded network with an unexpectedly low transfer speed for FTP transfers compared to other machines on the same network. Once you suspect the TCP/IP stack, you need to probe the speed of the stack. Testing the loopback address (127.0.0.1) may be a good starting point, though this address may be optimized by the stack. The easiest way to avoid the problem is to ensure you are using recent versions of TCP/IP stacks.
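A loopback probe of the kind suggested here could look something like the following sketch. It pushes a fixed number of bytes through a TCP connection to a throwaway sink on the same machine and reports bytes per second. The class name, the 8 KB buffer, and the sink thread are all assumptions made for illustration, and because the stack may optimize loopback traffic, treat the figure as an upper bound rather than a measure of real network speed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical probe: measures TCP throughput over the loopback interface.
final class LoopbackSpeedTest {
    private LoopbackSpeedTest() {}

    /** Sends totalBytes to a local sink and returns the rate in bytes per second. */
    static double measureBytesPerSecond(int totalBytes)
            throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            // Sink thread: accepts one connection and discards all data until EOF.
            Thread sink = new Thread(() -> {
                try (Socket peer = server.accept();
                     InputStream in = peer.getInputStream()) {
                    byte[] buffer = new byte[8192];
                    while (in.read(buffer) != -1) {
                        // discard
                    }
                } catch (IOException ignored) {
                    // the probe simply ends early if the sink fails
                }
            });
            sink.start();

            byte[] chunk = new byte[8192];
            long start = System.nanoTime();
            try (Socket socket = new Socket(loopback, server.getLocalPort());
                 OutputStream out = socket.getOutputStream()) {
                int remaining = totalBytes;
                while (remaining > 0) {
                    int n = Math.min(chunk.length, remaining);
                    out.write(chunk, 0, n);
                    remaining -= n;
                }
                out.flush();
            }
            // Wait until the sink has drained everything before stopping the clock.
            sink.join();
            long elapsedNanos = Math.max(1, System.nanoTime() - start);
            return totalBytes * 1e9 / elapsedNanos;
        }
    }

    public static void main(String[] args) throws Exception {
        double rate = measureBytesPerSecond(16 * 1024 * 1024);
        System.out.printf("Loopback throughput: %.1f MB/s%n", rate / (1024 * 1024));
    }
}
```

Comparing the result across machines on the same network is more informative than the absolute number: a machine whose loopback rate is far below its peers is a candidate for a stack problem.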

In addition to the stack itself, stacks include several tuneable parameters. Most of these parameters deal with transmission details beyond the scope of this book. One parameter worth mentioning is the maximum packet size. When your application sends data, the underlying protocol breaks the data into packets that are transmitted. There is an optimal size for packets transmitted over a particular communication channel, and the packet size actually used by the stack is a compromise. Smaller packets are less likely to be dropped, but they introduce more overhead, as data probably has to be broken up into more packets with more header overhead.

If your communication takes place over a particular set of endpoints, you may want to alter the packet sizes. For a LAN segment with no router involved, the packets can be big (e.g., 8KB). For a LAN with routers, you probably want to set the maximum packet size to the size the routers allow to pass unbroken. (Routers can break up the packets into smaller ones; 1500 bytes is the typical maximum packet size and the standard for Ethernet. The maximum packet size is configurable by the router's network administrator.) If your application is likely to be sending data over the Internet and you cannot guarantee the route and quality of routers it will pass through, 500 bytes per packet is likely to be optimal.
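To get a feel for the header overhead side of this compromise, you can count packets and header bytes for a given payload. The sketch below assumes 40 bytes of headers per packet (an IPv4 header plus a TCP header, both without options) and assumes that the maximum packet size includes those headers. Both are simplifications, and the class and method names are hypothetical.

```java
// Hypothetical calculator: header overhead for a payload at a given maximum packet size.
final class PacketOverhead {
    /** Assumed IPv4 (20 bytes) plus TCP (20 bytes) headers, without options. */
    static final int HEADER_BYTES = 40;

    private PacketOverhead() {}

    /** Number of packets needed to carry payloadBytes. */
    static long packetCount(long payloadBytes, int maxPacketBytes) {
        int dataPerPacket = maxPacketBytes - HEADER_BYTES;
        if (dataPerPacket <= 0 || payloadBytes < 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        // Ceiling division: a partly filled final packet still counts.
        return (payloadBytes + dataPerPacket - 1) / dataPerPacket;
    }

    /** Total header bytes sent alongside payloadBytes. */
    static long headerBytes(long payloadBytes, int maxPacketBytes) {
        return packetCount(payloadBytes, maxPacketBytes) * HEADER_BYTES;
    }

    public static void main(String[] args) {
        long payload = 1_000_000;
        // 1500-byte packets: prints "685 packets, 27400 header bytes"
        System.out.println(packetCount(payload, 1500) + " packets, "
                + headerBytes(payload, 1500) + " header bytes");
        // 500-byte packets: prints "2174 packets, 86960 header bytes"
        System.out.println(packetCount(payload, 500) + " packets, "
                + headerBytes(payload, 500) + " header bytes");
    }
}
```

Under these assumptions, dropping from 1500-byte to 500-byte packets roughly triples the header overhead for the same megabyte of data. That cost is what you trade against the lower chance of a packet being dropped or broken up en route.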

14.4.3 Network Bottlenecks

Other causes of slow network I/O can be attributed directly to the load or configuration of the network. For example, a LAN may become congested when many machines are simultaneously trying to communicate over the network. The potential throughput of the network could handle the load, but the algorithms to provide communication channels slow the network, resulting in a lower maximum throughput. A congested Ethernet network has an average throughput approximately one-third the potential maximum throughput. Congested networks have other problems, such as dropped network packets. If you are using TCP, the communication rate on a congested network is much slower as the protocol automatically resends the dropped packets. If you are using UDP, the protocol does not resend anything, so your application must detect the loss and resend the data itself (or send multiple copies of each transfer). Dropping packets in this way is common for the Internet.

For LANs, you need to coordinate closely with network administrators to alert them to the problems. For single machines connected by a service provider, there are several things you can do. First, there are some commercial utilities available that probe your configuration and the connection to the service provider, suggesting improvements. The phone line to the service provider may be noisier than expected: if so, you also need to speak to the phone line provider. It is also worth checking with the service provider, who should have optimal configurations they can demonstrate.

Dropped packets and retransmissions are a good indication of network congestion problems, and you should be on constant lookout for them. Dropped packets often occur when routers are overloaded and find it necessary to drop some of the packets being transmitted as the router's buffers overflow. This means that the overlying protocol will request the packets to be resent. The netstat utility lists retransmission and other statistics that can identify these sorts of problems. Retransmissions may indicate that the maximum packet size is too large.

14.4.4 DNS Lookups

Looking up network addresses is an often-overlooked cause of bad network performance. When your application tries to connect to a network address such as foo.bar.something.org (e.g., downloading a web page from http://foo.bar.something.org), your application first translates foo.bar.something.org into a four-byte network IP address such as 10.33.6.45. This is the actual address that the network understands and uses for routing network packets. The way this translation works is that your system is configured with some seldom-used files that can specify this translation, and a more frequently used Domain Name System (DNS) server that can dynamically provide you with the address from the given string. DNS translation works as follows:

1. The machine running the application sends the text string of the hostname (e.g., foo.bar.something.org) to the DNS server.
2. The DNS server checks its cache to find an IP address corresponding to that hostname. If the server does not find an entry in the cache, it asks its own DNS server (usually further up the Internet domain-name hierarchy) until ultimately the name is resolved. (This may be by components of the name being resolved, e.g., first .org, then something.org, etc., each time asking another machine as the search request is successively resolved.) This resolved IP address is added to the DNS server's cache.
3. The IP address is returned to the original machine running the application.
4. The application uses the IP address to connect to the desired destination.

The address lookup does not need to be repeated once a connection is established, but any other connections (within the same session of the application or in other sessions at the same time and later) need to repeat the lookup procedure to start another connection.^[4]

^[4] A session can cache the IP address explicitly after the first lookup, but this needs to be done at the application level by holding on to the InetAddress object.
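Holding on to the InetAddress object can be as simple as keeping a map from hostname to resolved address. The following is a minimal sketch of that idea. The AddressCache class is hypothetical, and its entries never expire, so it is suitable only where the addresses are not expected to change during the session.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical application-level cache of resolved addresses.
final class AddressCache {
    private final ConcurrentMap<String, InetAddress> cache = new ConcurrentHashMap<>();

    /** Returns the cached address for hostname, resolving it on first use only. */
    InetAddress lookup(String hostname) throws UnknownHostException {
        InetAddress cached = cache.get(hostname);
        if (cached != null) {
            return cached;
        }
        // The first request for this name pays the cost of the lookup.
        InetAddress resolved = InetAddress.getByName(hostname);
        // If another thread resolved the same name first, keep its entry.
        InetAddress previous = cache.putIfAbsent(hostname, resolved);
        return previous != null ? previous : resolved;
    }

    /** Number of hostnames currently cached. */
    int size() {
        return cache.size();
    }

    public static void main(String[] args) throws UnknownHostException {
        AddressCache addresses = new AddressCache();
        InetAddress first = addresses.lookup("localhost");
        InetAddress second = addresses.lookup("localhost"); // served from the map
        System.out.println(first + " cached: " + (first == second));
    }
}
```

Passing the cached InetAddress (rather than the hostname string) to the Socket constructor is what avoids the repeated lookup. Note that some JVMs also keep their own cache of lookups inside InetAddress, controlled by the networkaddress.cache.ttl security property, so measure before assuming an explicit cache is needed.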

You can improve this situation by running a DNS server locally on the machine, or on a local server if the application uses a LAN. A DNS server can be run as a "caching only" server that resets its cache each time the machine is rebooted. There would be little point in doing this if the machine used only one or two connections per hostname between successive reboots. For more frequent connections, a local DNS server can provide a noticeable speedup to connections. nslookup is useful for investigating how a particular system does translations.