The ping command tests whether a remote host can be reached from your computer. This simple function is extremely useful for testing the network connection, independent of the application in which the original problem was detected. ping allows you to determine whether further testing should be directed toward the network connection (the lower layers) or the application (the upper layers). If ping shows that packets can travel to the remote system and back, the user's problem is probably in the upper layers. If packets can't make the round trip, lower protocol layers are probably at fault.
Frequently a user reports a network problem by stating that he can't telnet (or ftp, or send email, or whatever) to some remote host. He then immediately qualifies this statement with the announcement that it worked before. In cases like this, where the ability to connect to the remote host is in question, ping is a very useful tool.
Using the hostname provided by the user, ping the remote host. If your ping is successful, have the user ping the host. If the user's ping is also successful, concentrate your further analysis on the specific application that the user is having trouble with. Perhaps the user is attempting to telnet to a host that provides only anonymous ftp. Perhaps the host was down when the user tried his application. Have the user try it again, while you watch or listen to every detail of what he is doing. If he is doing everything right and the application still fails, detailed analysis of the application with snoop and coordination with the remote system administrator may be needed.
If your ping is successful and the user's ping fails, concentrate testing on the user's system configuration, and on those things that are different about the user's path to the remote host when compared to your path to the remote host.
If your ping fails, or the user's ping fails, pay close attention to any error messages. The error messages displayed by ping are helpful guides for planning further testing. The details of the messages may vary from implementation to implementation, but there are only a few basic types of errors:
The remote host's name cannot be resolved by name service into an IP address. The name servers could be at fault (either your local server or the remote system's server), the name could be incorrect, or something could be wrong with the network between your system and the remote server. If you know the remote host's IP address, try to ping that. If you can reach the host using its IP address, the problem is with name service. Use nslookup or dig to test the local and remote servers, and to check the accuracy of the hostname the user gave you.
The local system does not have a route to the remote system. If the numeric IP address was used on the ping command line, re-enter the ping command using the hostname. This eliminates the possibility that the IP address was entered incorrectly, or that you were given the wrong address. If a routing protocol is being used, make sure it is running and check the routing table with netstat. If a static default route is being used, reinstall it. If everything seems fine on the host, check its default gateway for routing problems.
The remote system did not respond. Most network utilities have some version of this message. Some ping implementations print the message "100% packet loss." telnet prints the message "Connection timed out" and sendmail returns the error "cannot connect." All of these errors mean the same thing. The local system has a route to the remote system, but it receives no response from the remote system to any of the packets it sends.
There are many possible causes of this problem. The remote host may be down. Either the local or the remote host may be configured incorrectly. A gateway or circuit between the local host and the remote host may be down. The remote host may have routing problems. Only additional testing can isolate the cause of the problem. Carefully check the local configuration using netstat and ifconfig. Check the route to the remote system with traceroute. Contact the administrator of the remote system and report the problem.
All of the tools mentioned here will be discussed later in this chapter. However, before leaving ping, let's look more closely at the command and the statistics it displays.
The basic format of the ping command on a Solaris system is:[1]
[1] Check your system's documentation. ping varies slightly from system to system. On Linux, the format shown above would be: ping [-c count] [-s packetsize] host.
ping host [packetsize] [count]
The hostname or IP address of the remote host being tested. Use the hostname or address provided by the user in the trouble report.
Defines the size in bytes of the test packets. This field is required only if the count field is going to be used. Use the default packetsize of 56 bytes.
The number of packets to be sent in the test. Use the count field, and set the value low. Otherwise, the ping command may continue to send test packets until you interrupt it, usually by pressing Ctrl-C (^C). Sending excessive numbers of test packets is not a good use of network bandwidth and system resources. Usually five packets are sufficient for a test.
To check that ns.uu.net can be reached from crab, we send five 56-byte packets with the following command:
% ping -s ns.uu.net 56 5
PING ns.uu.net: 56 data bytes
64 bytes from ns.uu.net (137.39.1.3): icmp_seq=0. time=32.8 ms
64 bytes from ns.uu.net (137.39.1.3): icmp_seq=1. time=15.3 ms
64 bytes from ns.uu.net (137.39.1.3): icmp_seq=2. time=13.1 ms
64 bytes from ns.uu.net (137.39.1.3): icmp_seq=3. time=32.4 ms
64 bytes from ns.uu.net (137.39.1.3): icmp_seq=4. time=28.1 ms
----ns.uu.net PING Statistics----
5 packets transmitted, 5 packets received, 0% packet loss
round trip (ms) min/avg/max = 13.1/24.3/32.8
The -s option is included because crab is a Solaris workstation, and we want packet-by-packet statistics. Without the -s option, Sun's ping command prints only a summary line saying "ns.uu.net is alive." Other ping implementations do not require the -s option; they display the statistics by default, as the Linux example below shows:
$ ping -c5 ns.uu.net
PING ns.uu.net (137.39.1.3) from 172.16.12.3 : 56(84) bytes of data.
64 bytes from ns.UU.NET (137.39.1.3): icmp_seq=0 ttl=244 time=98.283 msec
64 bytes from ns.UU.NET (137.39.1.3): icmp_seq=1 ttl=244 time=94.114 msec
64 bytes from ns.UU.NET (137.39.1.3): icmp_seq=2 ttl=244 time=66.565 msec
64 bytes from ns.UU.NET (137.39.1.3): icmp_seq=3 ttl=244 time=24.301 msec
64 bytes from ns.UU.NET (137.39.1.3): icmp_seq=4 ttl=244 time=37.060 msec
--- ns.uu.net ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round trip min/avg/max/mdev = 24.301/64.064/98.283/29.634 ms
Both tests show a good wide area network link to ns.uu.net with no packet loss and a fast response. The round trip between almond and ns.uu.net took an average of only 24.3 milliseconds. A small packet loss, and a round trip time an order of magnitude higher, would not be abnormal for a connection made across a wide area network. The statistics displayed by the ping command can indicate low-level network problems. The key statistics are:
The sequence in which the packets are arriving, as shown by the ICMP sequence number (icmp_seq) displayed for each packet.
How long it takes a packet to make the round trip, displayed in milliseconds after the string time=.
The percentage of packets lost, displayed in a summary line at the end of the ping output.
If the packet loss is high, the response time is very slow, or packets are arriving out of order, there could be a network hardware problem. If you see these conditions when communicating over great distances on a wide area network, there is nothing to worry about. TCP/IP was designed to deal with unreliable networks, and some wide area networks suffer a lot of packet loss. But if these problems are seen on a local area network, they indicate trouble.
On a local network cable segment, the round trip time should be near 0, there should be little or no packet loss, and the packets should arrive in order. If these things are not true, there is a problem with the network hardware. On an Ethernet, the problem could be improper cable termination, a bad cable segment, or a bad piece of "active" hardware, such as a hub, switch, or transceiver. Check the cable with a cable tester as described earlier. Good hubs and switches often have built-in diagnostic software that can be checked. Cheap hubs and transceivers may require the "brute force" method of disconnecting individual pieces of hardware until the problem goes away.
The results of a simple ping test, even if the ping is successful, can help you direct further testing toward the most likely causes of the problem. But other diagnostic tools are needed to examine the problem more closely and find the underlying cause.