D.1 Network Basics

This section briefly introduces the function of the TCP and IP protocols, and explains how IP addresses and ports are used for communication on the Internet.

D.1.1 TCP/IP

The Transmission Control Protocol (TCP) and the Internet Protocol (IP) manage the sending and receiving of messages over the Internet.

The Web is a network application that uses the services of TCP and IP. When a web browser requests a page from a web server, the TCP/IP services provide a virtual connection, or virtual circuit, between the two communicating systems. (The connection is virtual because the Internet doesn't operate like an old telephone network. It doesn't create an actual circuit dedicated to a particular call.)

Once a connection is established and acknowledged, the two systems can communicate by sending messages. These messages can be large, such as the binary representation of an image, and TCP may split the data into pieces that are each carried in an IP datagram. An IP datagram is like a postal envelope: it holds all or part of a message, and it's labeled with a destination address and other fields that manage its transmission from the sender to the receiver.

Each node in the network runs IP software, and the software moves the datagrams through the network. When an IP node receives a datagram, it inspects the address and other header fields, consults a table of routing information, and sends the datagram on to the next node. Often these nodes are dedicated routers (systems that form interconnections between networks), but the nodes can also include standard machines. IP datagrams are totally independent of each other: the IP software just moves them from node to node through a network.
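The table lookup at each node can be sketched in a few lines of Python. The text doesn't say how the table is searched; choosing the most specific matching network (longest-prefix match) is the usual rule and is assumed here. The routes and next-hop names are invented for illustration.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
# The networks and hop names are illustrative only.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "gateway"),         # default route
    (ipaddress.ip_network("134.148.0.0/16"), "router-a"),
    (ipaddress.ip_network("134.148.250.0/24"), "router-b"),
]


def next_hop(destination):
    """Return the next hop for the most specific route that contains destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if address in net]
    if not matches:
        return None  # no route to this destination
    # The longest prefix is the most specific match.
    _, hop = max(matches, key=lambda item: item[0].prefixlen)
    return hop


if __name__ == "__main__":
    for target in ("134.148.250.28", "134.148.1.1", "8.8.8.8"):
        print(target, "->", next_hop(target))
```

Each node makes this decision on its own for every datagram, which is why two datagrams from the same message can take different paths.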

TCP software glues the pieces carried in IP datagrams back together at the destination. For example, if a large image is broken into ten parts that are each stored in a datagram, then TCP reassembles those parts into a whole. TCP also makes sure the process is robust: if a datagram goes missing or is corrupted, TCP arranges for the data to be sent again; if duplicates arrive, it throws the extras away; and if datagrams arrive out of order, it puts the data back in sequence.
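The following toy model illustrates the reassembly idea. It is a sketch only, not how TCP is implemented: real TCP numbers bytes rather than pieces, uses acknowledgements and timers, and has its own checksum. The `Fragment` class, the function names, and the use of CRC-32 are assumptions made for the example.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    seq: int        # position of this piece in the original message
    payload: bytes
    checksum: int   # CRC-32 of the payload, standing in for a real checksum


def split_message(message, size):
    """Break a message into numbered, checksummed pieces."""
    pieces = [message[i:i + size] for i in range(0, len(message), size)]
    return [Fragment(n, p, zlib.crc32(p)) for n, p in enumerate(pieces)]


def reassemble(received, total):
    """Return (message, missing); message is None while pieces are missing."""
    good = {}
    for frag in received:
        if zlib.crc32(frag.payload) != frag.checksum:
            continue                             # corrupted: throw it away
        good.setdefault(frag.seq, frag.payload)  # duplicate: keep the first copy
    missing = [n for n in range(total) if n not in good]
    if missing:
        return None, missing                     # these would need to be resent
    # Joining by sequence number sorts out any reordering.
    return b"".join(good[n] for n in range(total)), []
```

Passing the ten pieces of an image in reverse order, with one piece repeated, still yields the original bytes; replacing one piece with corrupted data yields `None` and the sequence number that has to be sent again.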

D.1.2 IP Addresses

To allow communication over heterogeneous networks, each with its own addressing standard, every location in a network needs a globally unique IP address. A computer that is connected to the Internet needs at least one IP address; a node that interconnects two networks needs two.

IP (version 4) addresses are 32-bit numbers that are usually written as a series of four decimal numbers between 0 and 255, separated by periods. An example IP address is 134.148.250.28. Some IP addresses have special meanings. For example, addresses that begin with 127 are reserved for loopback on a host, and 127.0.0.1 is the one conventionally used. If a connection is to be made from a client to a server that are both running on the same machine, the address 127.0.0.1 can be used. Traffic sent to this address never leaves the machine: it loops back to the local host, which is why 127.0.0.1 is also known by the name localhost.
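To make the link between the dotted notation and the underlying 32-bit number concrete, here is a short Python sketch. The helper names are invented for the example; the standard `ipaddress` module is used only as a cross-check.

```python
import ipaddress


def dotted_to_int(dotted):
    """Pack four decimal numbers (0 to 255) into one 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def int_to_dotted(value):
    """Unpack a 32-bit integer into dotted notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    example = "134.148.250.28"
    number = dotted_to_int(example)
    print(example, "=", hex(number))
    # The standard library agrees with the hand-rolled conversion.
    print(number == int(ipaddress.IPv4Address(example)))
    # 127.0.0.1 is recognized as a loopback address.
    print(ipaddress.ip_address("127.0.0.1").is_loopback)
```

Each of the four decimal numbers is simply one byte of the 32-bit value, which is why none of them can exceed 255.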

If you've got your own private network at home or at work, then it's likely you're using addresses such as 192.168.1.1 and 192.168.1.2. Addresses in the range 192.168.0.0 to 192.168.255.255 are reserved for this purpose and are never routed across the public Internet. (There are also other ranges that are not used on the Internet, but we don't discuss this in detail here.)
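A quick way to check whether an address falls in this reserved range is sketched below in Python, using the standard `ipaddress` module. The constant and function names are invented for the example, and only the 192.168 range mentioned above is tested.

```python
import ipaddress

# 192.168.0.0/16 covers 192.168.0.0 through 192.168.255.255.
HOME_RANGE = ipaddress.ip_network("192.168.0.0/16")


def in_home_range(address):
    """Return True if address lies in the reserved 192.168 range."""
    return ipaddress.ip_address(address) in HOME_RANGE


if __name__ == "__main__":
    for candidate in ("192.168.1.1", "192.168.1.2", "134.148.250.28"):
        print(candidate, in_home_range(candidate))
```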

It's inconvenient to remember IP addresses. For example, it's much easier for Hugh to remember that hugh.cs.rmit.edu.au is his machine at work than to remember its IP address is 131.170.27.120. For this reason, most IP addresses have one or more equivalent domain names that we use to log in, access web sites, and so on. The mapping of IP addresses to names and back again is managed by domain name servers. When you set up a new domain and host it on a server, the domain name servers responsible for finding the system are usually informed about the mapping between the machine's IP address and your new domain name.
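From a program's point of view, the name-to-address lookup is a single call to the system resolver. The Python sketch below wraps the standard `socket.gethostbyname` function; the `resolve` helper is invented for the example, and what any particular name resolves to (or whether it resolves at all) depends on the domain name servers your machine consults.

```python
import socket


def resolve(name):
    """Return the IPv4 address for a host name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # Raised when the resolver cannot find the name.
        return None


if __name__ == "__main__":
    # "localhost" normally maps to a loopback address without using the network.
    print(resolve("localhost"))
```

A string that is already an IPv4 address is returned unchanged, so the same helper can be handed either a name or an address.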

D.1.2.1 Ports

When a virtual connection is set up between two communicating systems, each end is tied to a port. The port is an identifier used by the TCP software rather than an actual physical device, and it allows multiple network connections to be made on one machine by different applications.
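The Python sketch below shows both ends of one connection on a single machine, each tied to its own port. It assumes the loopback address 127.0.0.1 is available; the function names are invented for the example, and the server handles just one short message.

```python
import socket
import threading


def start_echo_server():
    """Listen on a free loopback port, echo one message, and return the port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0 asks the system for any free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(data)
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return port


def connect_and_describe(port):
    """Connect to the server and report the (address, port) pair at each end."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        local = client.getsockname()    # the client's end of the connection
        remote = client.getpeername()   # the server's end of the connection
        client.sendall(b"hello")
        reply = client.recv(1024)
    return local, remote, reply


if __name__ == "__main__":
    local, remote, reply = connect_and_describe(start_echo_server())
    print("client end:", local)
    print("server end:", remote)
    print("reply:", reply)
```

The server's port is the one the client asked for, while the client's own port is picked by the system; the two differ even though both ends share the same IP address.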

When a message is received by the TCP software running on a host computer, the data is passed to the correct application based on the port number. By convention, a server providing a widely adopted service uses a well-known port. A list of well-known ports for various applications is maintained by the Internet Assigned Numbers Authority (IANA) and can be found at http://www.iana.org/assignments/port-numbers. For example, the File Transfer Protocol (FTP) uses port 21, and a web server uses port 80.

Systems with TCP/IP software installed have a services file that lists the ports used on that machine. This file is often preconfigured for common applications and is maintained by the system administrator to reflect the actual port usage on the machine. On a Unix system, this file is usually /etc/services.
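Each entry in a Unix services file gives a service name, a port and protocol written as port/protocol, optional aliases, and an optional comment after a # character. The Python sketch below parses that layout; the sample text is made up for the example (using the FTP and web server ports mentioned earlier) rather than copied from a real machine, and the function name is an assumption.

```python
# Illustrative sample in the layout of /etc/services; not a real file.
SAMPLE = """\
# service-name  port/protocol  [aliases]
ftp             21/tcp
http            80/tcp          www     # web server
"""


def parse_services(text):
    """Map (name, protocol) to a port number for each entry and its aliases."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue                            # not a service entry
        name, port_proto, *aliases = fields
        port, proto = port_proto.split("/", 1)
        if not port.isdigit():
            continue
        for label in (name, *aliases):
            table[(label, proto)] = int(port)
    return table


if __name__ == "__main__":
    for (name, proto), port in sorted(parse_services(SAMPLE).items()):
        print(name, proto, port)
```

To try it on a real system, read the contents of /etc/services and pass the text to `parse_services` in place of `SAMPLE`.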