TCP/IP and the Internet

TCP/IP has become the protocol of choice on the Internet, the “network of networks” that evolved from ARPAnet, a packet-switching network that itself grew out of research the U.S. government’s Advanced Research Projects Agency (ARPA) initiated in the 1960s. Subsequently, ARPA acquired a Defense prefix and became DARPA. Under the auspices of DARPA, the TCP/IP protocols emerged as a popular collection of protocols for internetworking, a term used to describe communication among networks.

TCP/IP has flourished for several reasons. A significant reason is that the protocol is open, which means the technical descriptions of the protocol appear in public documents, so anyone can implement TCP/IP on specific hardware and software.

Another, more important, reason for TCP/IP’s success is the availability of sample implementations. Instead of existing only as a description of network architecture and protocols on paper, each component of the TCP/IP protocol suite began life as a specification with a sample implementation.

Taking Stock of RFCs

The details of each TCP/IP protocol (including TCP and IP, as well as specific service protocols such as SMTP and FTP) are described in documents known as Requests for Comments (RFCs). These documents are freely distributed on the Internet. You can get RFCs from http://www.cis.ohio-state.edu/hypertext/information/rfc.html (click the Index link for a complete index of the RFCs, or search by keyword). Another good URL for RFCs is http://www.faqs.org/rfcs/.

In fact, the notation used to name Internet resources in a uniform manner is itself documented in an RFC. The notation, known as the Uniform Resource Locator (URL), is described in RFC 1738, “Uniform Resource Locators (URL),” written by, among others, T. Berners-Lee, the originator of the World Wide Web (WWW).

You can think of RFCs as the working papers of the Internet research-and-development community. All Internet standards are published as RFCs. However, many RFCs do not specify any standards; they are informational documents only.

The following are some RFCs you may find interesting:

RFC 768, “User Datagram Protocol (UDP)”
RFC 791, “Internet Protocol (IP)”
RFC 792, “Internet Control Message Protocol (ICMP)”
RFC 793, “Transmission Control Protocol (TCP)”
RFC 854, “TELNET Protocol Specification”
RFC 950, “Internet Standard Subnetting Procedure”
RFC 959, “File Transfer Protocol (FTP)”
RFC 1034, “Domain Names: Concepts and Facilities”
RFC 1058, “Routing Information Protocol (RIP)”
RFC 1112, “Host Extensions for IP Multicasting”
RFC 1155, “Structure and Identification of Management Information for TCP/IP-based Internets”
RFC 1157, “Simple Network Management Protocol (SNMP)”
RFC 1519, “Classless Inter-Domain Routing (CIDR) Assignment and Aggregation Strategy”
RFC 1661, “The Point-to-Point Protocol (PPP)”
RFC 1738, “Uniform Resource Locators (URL)”
RFC 1796, “Not All RFCs Are Standards”
RFC 1855, “Netiquette Guidelines”
RFC 1886, “DNS Extensions to Support IP Version 6”
RFC 1918, “Address Allocation for Private Internets”
RFC 1939, “Post Office Protocol, Version 3 (POP3)”
RFC 1945, “HyperText Transfer Protocol—HTTP/1.0”
RFC 2026, “The Internet Standards Process—Revision 3”
RFC 2028, “The Organizations Involved in the IETF Standards Process”
RFC 2045 through 2049, “Multipurpose Internet Mail Extensions (MIME)” (Parts One through Five)
RFC 2060, “Internet Message Access Protocol—Version 4rev1 (IMAP4)”
RFC 2131, “Dynamic Host Configuration Protocol (DHCP)”
RFC 2146, “U.S. Government Internet Domain Names”
RFC 2151, “A Primer on Internet and TCP/IP Tools and Utilities”
RFC 2305, “A Simple Mode of Facsimile Using Internet Mail”
RFC 2328, “Open Shortest Path First Routing (OSPF) Version 2”
RFC 2368, “The mailto URL scheme”
RFC 2373, “IP Version 6 Addressing Architecture”
RFC 2396, “Uniform Resource Identifiers (URI): Generic Syntax”
RFC 2460, “Internet Protocol, Version 6 (IPv6) Specification”
RFC 2535, “Domain Name System Security Extensions”
RFC 2616, “HyperText Transfer Protocol—HTTP/1.1”
RFC 2660, “The Secure HyperText Transfer Protocol”
RFC 2821, “Simple Mail Transfer Protocol (SMTP)”
RFC 2822, “Internet Message Format”
RFC 2853, “Generic Security Service API Version 2: Java Bindings”
RFC 2854, “The ‘text/html’ Media Type”
RFC 2865, “Remote Authentication Dial In User Service (RADIUS)”
RFC 2870, “Root Name Server Operational Requirements”
RFC 2871, “A Framework for Telephony Routing over IP”
RFC 2900, “Internet Official Protocol Standards”
RFC 2910, “Internet Printing Protocol/1.1: Encoding and Transport”
RFC 2911, “Internet Printing Protocol/1.1: Model and Semantics”
RFC 3013, “Recommended Internet Service Provider Security Services and Procedures”
RFC 3022, “Traditional IP Network Address Translator (Traditional NAT)”
RFC 3076, “Canonical XML Version 1.0”
RFC 3130, “Notes from the State-Of-The-Technology: DNSSEC”
RFC 3174, “US Secure Hash Algorithm 1 (SHA1)”
RFC 3196, “Internet Printing Protocol/1.1: Implementer’s Guide”
RFC 3275, “(Extensible Markup Language) XML-Signature Syntax and Processing”
RFC 3330, “Special-Use IPv4 Addresses”
RFC 3344, “IP Mobility Support for IPv4”

Insider Insight

The RFCs continue to evolve as new technology and techniques emerge. If you work in networking, you should keep an eye on the RFCs to monitor emerging networking protocols. You can check up on the RFCs at http://www.faqs.org/rfcs/.

Understanding IP Addresses

When you have many computers on a network, you need a way to identify each one uniquely. In TCP/IP networking, the address of a computer is known as the IP address. Because TCP/IP deals with internetworking, the address is based on the concepts of a network address and a host address. In effect, an IP address combines two pieces of information that together identify a computer uniquely:

Network address—Indicates the network on which the computer is located
Host address—Indicates a specific computer on that network

The IP address is a 4-byte (32-bit) value with some of the bits devoted to the network address and the rest used for the host address. The convention is to write each byte as a decimal value and to separate the four numbers with dots (.). Thus, you see IP addresses such as 132.250.112.52. This way of writing IP addresses is known as dotted-decimal or dotted-quad notation. Each of the four decimal numbers in the dotted-decimal notation is often referred to as an octet. Note that each decimal number in the dotted-decimal notation must be between 0 and 255 because that’s the range of values a byte can hold.
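As a small illustration of the notation, the following Python sketch converts a dotted-decimal string to its 32-bit value and back. The function names dotted_to_int and int_to_dotted are hypothetical, chosen only for this example.

```python
def dotted_to_int(address):
    """Convert a dotted-decimal IPv4 string to a 32-bit integer."""
    octets = address.split(".")
    if len(octets) != 4:
        raise ValueError("expected four octets, got %d" % len(octets))
    value = 0
    for text in octets:
        octet = int(text)
        if not 0 <= octet <= 255:
            raise ValueError("octet out of range: %d" % octet)
        # Shift the bytes collected so far and append the new one.
        value = (value << 8) | octet
    return value


def int_to_dotted(value):
    """Convert a 32-bit integer back to dotted-decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

Calling int_to_dotted(dotted_to_int("132.250.112.52")) returns the original string, and an octet such as 256 raises ValueError because it does not fit in one byte.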

The bits in an IP address are organized in the following manner:

<Network Address, Host Address>

In other words, a specified number of bits of the 32-bit IP address are used as a network address, and the rest of the bits are interpreted as a host address. The network address identifies the LAN to which your PC is connected, and the host address identifies your PC as one of many hosts within the LAN. Other PCs in the LAN share the same network address, but have different bits in the host address portion of their IP addresses.

When IP addresses were initially allocated to organizations, a class system was devised to accommodate networks of various sizes (the network size is the number of computers in that network). Although the network classes are largely ignored nowadays, it is important to understand what the network classes are and how they work because they linger on in the backbone of the Internet. In place of predefined classes, IP networks now use classless addressing schemes with network masks to separate the network address from the host address.

There are five classes of IP addresses, named class A through class E, as shown in Figure 6-3.

Figure 6-3: Classes of IP Addresses.

Of the five address classes, only classes A, B, and C are used for addressing networks and hosts; classes D and E addresses are reserved for special use.

Class A addresses support 126 networks, each with up to 16 million hosts. Although the network address is 7-bit, two values (0 and 127) have special meaning; therefore, you can have only 1 through 126 as Class A network addresses. There can be approximately 2 billion class A hosts.

Class B addresses are for networks with up to 65,534 hosts. There can be at most 16,384 class B networks. All class B networks, taken together, can have approximately 1 billion hosts.

Class C addresses are meant for small organizations. Each class C address allows up to 254 hosts, and there can be approximately 2 million class C networks. Therefore, there can be at most approximately 500 million class C hosts. If you are in a small company, you probably have a class C address. Nowadays, it is customary to aggregate multiple class C addresses into a single block and use them for efficient routing.

Altogether, class A, B, and C networks can support at most approximately 3.7 billion hosts.

You can tell the class of an IP address by the first number in the dotted-decimal notation, as follows:

Class A addresses: 1.xxx.xxx.xxx through 126.xxx.xxx.xxx
Class B addresses: 128.xxx.xxx.xxx through 191.xxx.xxx.xxx
Class C addresses: 192.xxx.xxx.xxx through 223.xxx.xxx.xxx
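The first-octet rule can be sketched in a few lines of Python. The function name address_class is hypothetical, and the ranges for classes D (224 through 239) and E (240 through 255) are not listed in the text above; they are included here as the conventional classful definitions.

```python
def address_class(address):
    """Return the historical class of a dotted-decimal IPv4 address."""
    first = int(address.split(".")[0])
    if 1 <= first <= 126:
        return "A"
    if 128 <= first <= 191:
        return "B"
    if 192 <= first <= 223:
        return "C"
    if 224 <= first <= 239:
        return "D"  # assumption: conventional class D range
    if 240 <= first <= 255:
        return "E"  # assumption: conventional class E range
    # 0 and 127 have special meaning and belong to no usable class A network.
    return "special"
```

For example, address_class("132.250.112.52") returns "B" and address_class("206.197.168.200") returns "C".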

Even within the five address classes, the following IP addresses have special meaning:

An address with all zeros in its network portion indicates the local network—the network where the data packet with this IP address originated. Thus, the address 0.0.0.200 means host number 200 on this class C network.
The class A address 127.xxx.xxx.xxx is used for loopback—communications within the same host. Conventionally, 127.0.0.1 is used as the loopback address. Processes that need to communicate through TCP with other processes on the same host use the loopback address to avoid having to send packets out on the network.
Turning on all the bits in any part of the address indicates a broadcast message. The address 128.18.255.255, for example, means all hosts on the class B network 128.18. The address 255.255.255.255 is known as a limited broadcast; all workstations on the current network segment will receive the packet.
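Python's standard ipaddress module can confirm two of these special cases. This is only a sketch; the /16 prefix is an assumption standing in for the default class B mask of the 128.18 network mentioned above.

```python
import ipaddress

loopback = ipaddress.ip_address("127.0.0.1")
# Assumption: /16 corresponds to the default class B network mask.
class_b = ipaddress.ip_network("128.18.0.0/16")

print(loopback.is_loopback)       # True
print(class_b.broadcast_address)  # 128.18.255.255
```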

Getting IP Addresses for Your Network

If you are setting up an independent network of your own that will be connected to the Internet, you need unique IP addresses for your network. You would typically get a range of IP addresses for your network from the ISP who connects your network to the Internet. You can get the domain name from one of the Internet domain name registration services. For example, for the .com domain, you can obtain domain names from VeriSign located on the Web at http://www.networksolutions.com/. To learn more about domain name and IP address services, point your Web browser to the InterNIC website at http://www.internic.net/.

ISPs typically get their IP address allocation in large blocks from regional Internet registries such as ARIN (American Registry for Internet Numbers, http://www.arin.net/) in the United States, RIPE (Réseaux IP Européens, http://www.ripe.net/) in Europe, and APNIC (Asia Pacific Network Information Centre, http://www.apnic.net/) for the Asia-Pacific region. For more information about IP address allocation services, visit the Internet Assigned Numbers Authority (IANA) website at http://www.iana.org/ipaddress/ip-addresses.htm.

If you don’t plan to connect your network to the Internet, you really don’t need a unique IP address. RFC 1918, “Address Allocation for Private Internets,” provides guidance about what IP addresses you can use within private networks (the term private Internet refers to any network not connected to the Internet). Three blocks of IP addresses are reserved for private Internets:

10.0.0.0 to 10.255.255.255
172.16.0.0 to 172.31.255.255
192.168.0.0 to 192.168.255.255

You can use addresses from these blocks for your private network without having to coordinate with any organization. For example, I (and many others) use the 192.168.0.0 class C address for a home network. Additionally, cable/DSL routers typically use one of these private class C addresses for the local network interface.
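To check whether a given address falls in one of the three private blocks, a minimal Python sketch might look like the following. The prefix lengths /8, /12, and /16 are the CIDR equivalents of the three ranges listed above, and the function name is_rfc1918 is hypothetical.

```python
import ipaddress

# The three private blocks, written as network/prefix-length.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0 to 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 to 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 to 192.168.255.255
]


def is_rfc1918(address):
    """Return True if the address lies in one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)
```

With this sketch, is_rfc1918("192.168.0.100") is True, while is_rfc1918("172.32.0.1") is False because 172.32 lies just past the end of the second block.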

If you have only one public Internet address from your ISP, you can still use a Network Address Translation (NAT) router to connect your network with private IP addresses to the public Internet.

Figuring Out Network Masks

The network mask is an IP address that has 1s in the bits that correspond to the network address, and 0s in all other bit positions. The class of your network address determines the network mask.

If you have a class C address, for example, the network mask is 255.255.255.0. Similarly, class B networks have a network mask of 255.255.0.0, and class A networks have 255.0.0.0 as the network mask. Of course, you do not have to use the historical class A, B, or C network masks. Nowadays, you can use any other network mask that’s appropriate for your network address.
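Because a network mask is just a run of 1 bits followed by 0 bits, it can be generated from the number of network bits. The following Python sketch shows one way to do that; the function name mask_from_bits is hypothetical.

```python
def mask_from_bits(network_bits):
    """Build a dotted-decimal network mask with the given number of 1 bits."""
    if not 0 <= network_bits <= 32:
        raise ValueError("network_bits must be between 0 and 32")
    # Shift 32 one-bits left, then keep only the low 32 bits.
    value = (0xFFFFFFFF << (32 - network_bits)) & 0xFFFFFFFF
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

Here mask_from_bits(8), mask_from_bits(16), and mask_from_bits(24) give the class A, B, and C masks, and any other bit count yields a classless mask.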

Extracting Network Addresses

The network address is the bitwise AND of the network mask with any IP address in your network. If the IP address of a system on your network is 206.197.168.200, and the network mask is 255.255.255.0, the network address is 206.197.168.0. The network address is written with zero bits in the part of the address that’s supposed to be for the host address.
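The bitwise AND can be sketched in Python with the standard ipaddress module, which converts between dotted-decimal strings and integers. The function name network_address is hypothetical.

```python
import ipaddress


def network_address(address, mask):
    """Return the network address: the bitwise AND of address and mask."""
    ip_bits = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_bits & mask_bits))


print(network_address("206.197.168.200", "255.255.255.0"))  # 206.197.168.0
```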

Using Subnets

If your site has a class B address, you get one network number, and that network can have up to 65,534 hosts. Even if you work for a megacorporation that has thousands of hosts, you may want to divide your network into smaller subnetworks (or subnets). If your organization has offices in several locations, for example, you may want each office to be on a separate network. You can do this by taking some bits from the host-address portion of the IP address and assigning those bits to the network address. This procedure is known as defining a subnet mask.

Caution

Do not confuse an IP subnet, which is a logical division of a network, with Ethernet segments, which refer to physical divisions of an Ethernet network.

Essentially, when you define a subnet mask, you add more bits to the default network mask for that address class. If you have a class B network, for example, the default network mask would be 255.255.0.0. Then, if you decide to divide your network into 128 subnetworks, each of which has 512 host addresses, you would designate 7 bits from the host address space as the subnet address (2^7 = 128), leaving 9 bits for hosts (2^9 = 512). Thus, the subnet mask becomes 255.255.254.0.
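The arithmetic of this example can be checked with Python's standard ipaddress module. The network 128.18.0.0 is used only as a sample class B address; it is an assumption for illustration.

```python
import ipaddress

# Assumption: 128.18.0.0/16 stands in for "a class B network".
class_b = ipaddress.ip_network("128.18.0.0/16")

# Borrow 7 bits from the host portion for the subnet address.
subnets = list(class_b.subnets(prefixlen_diff=7))

print(len(subnets))              # 128
print(subnets[0].netmask)        # 255.255.254.0
print(subnets[0].num_addresses)  # 512
print(subnets[1])                # 128.18.2.0/23
```

Note that num_addresses counts every address in the subnet, including the all-zeros and all-ones host values that have special meaning.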

Using Supernets or CIDR

There are so few class A and B network addresses that they are becoming scarce. Class C addresses are more plentiful, but the proliferation of class C addresses has introduced a unique problem. Each class C address needs an entry in the network routing tables, which contain information about how to locate any network on the Internet. Too many class C addresses means too many entries in the routing tables, which causes the router’s performance to deteriorate. One way to get around this problem is to ignore the predefined address classes and let the network address be any number of bits. All you need is the network mask to figure out which part of the 32-bit IP address is the network address. Based on this idea, Classless Inter-Domain Routing (CIDR), documented in RFC 1519, was developed to enable routing of contiguous blocks of class C addresses with a single entry in the routing table. CIDR is used in the Internet as the primary mechanism to improve scalability of the Internet routing system.

The basis of CIDR is the idea of supernets—arbitrarily-sized networks created by combining contiguous class C addresses that satisfy some criteria. For example, to create a supernet from two class C networks, the two network addresses must satisfy the following properties:

The network addresses must be consecutive (for example, 198.41.18.0 and 198.41.19.0 are consecutive class C addresses).
The third number of the first network address must be divisible by 2 (for example, the third number of 198.41.18.0 is 18, and 18 is divisible by 2).

Thus, you could combine 198.41.18.0 and 198.41.19.0 into a single block, but you cannot combine 198.41.15.0 with 198.41.16.0 because 15 is not divisible by 2. When you create a supernet of two class C networks, the network can have up to 512 host addresses, and the network mask becomes 255.255.254.0, which leaves 9 bits for the host address.

You can also supernet any number of class C networks in powers of two. The only requirement is that the third number (in the dotted-decimal notation) of the first address must be divisible by the number of networks you are combining. Thus, if you are supernetting eight networks, the third number of the first address must be divisible by 8. For example, you could supernet the following eight consecutive class C networks:
```
198.41.16.0
198.41.17.0
198.41.18.0
198.41.19.0
198.41.20.0
198.41.21.0
198.41.22.0
198.41.23.0
```
The network mask of this supernet would be 255.255.248.0, which provides for 21 bits of network address and leaves 11 bits for 8x256 = 2,048 host addresses.

Such a network address is written with the notation /21 to indicate that there are 21 bits in the network address.
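The supernetting rules can be tried out with the collapse_addresses function from Python's standard ipaddress module, which merges adjacent networks only when they form a properly aligned block. This is a sketch for checking the examples in this section; the helper name supernet is hypothetical.

```python
import ipaddress


def supernet(third_octets):
    """Collapse class C networks 198.41.N.0/24 into the fewest CIDR blocks."""
    networks = [ipaddress.ip_network("198.41.%d.0/24" % n) for n in third_octets]
    return [str(block) for block in ipaddress.collapse_addresses(networks)]


print(supernet(range(16, 24)))  # ['198.41.16.0/21']
print(supernet([18, 19]))       # ['198.41.18.0/23']
print(supernet([15, 16]))       # ['198.41.15.0/24', '198.41.16.0/24']
```

The last call shows that 198.41.15.0 and 198.41.16.0 stay separate, because 15 is not divisible by 2.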

Learning about IPv6

When the 4-byte IP address was created, the number of addresses seemed to be adequate. By now, however, class A and B addresses are running out, and class C addresses are being depleted at a fast rate. The Internet Engineering Task Force (IETF) recognized the potential for running out of IP addresses in 1991, and work began then on the next-generation IP addressing scheme, named IPng, which will eventually replace the old 4-byte addressing scheme (called IPv4, for IP Version 4).

Several alternative addressing schemes for IPng were proposed and debated. The final contender, with a 128-bit (16-byte) address, was dubbed IPv6 (for IP Version 6). On September 18, 1995, the IETF declared the core set of IPv6 addressing protocols to be an IETF Proposed Standard. By now, there are many RFCs dealing with various aspects of IPv6, from IPv6 over PPP to the transmission of IPv6 packets over Ethernet.

IPv6 is designed to be an evolutionary step from IPv4. The proposed standard provides direct interoperability between hosts using the older IPv4 addresses and any new IPv6 hosts. The idea is that users can upgrade their systems to use IPv6 when they want and that network operators are free to upgrade their network hardware to use IPv6 without affecting current users of IPv4. Sample implementations of IPv6 are being developed for many operating systems, including Linux. For more information about IPv6 in Linux, consult the Linux IPv6 FAQ/HOWTO at http://www.linuxhq.com/IPv6/.

The IPv6 128-bit addressing scheme allows for 2^128, or 340,282,366,920,938,463,463,374,607,431,768,211,456, unique addresses! That should last us for a while!
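Because Python integers have arbitrary precision, the size of a 128-bit address space is easy to compute exactly and to compare with the 32-bit IPv4 space. The variable names below are illustrative only.

```python
ipv4_total = 2 ** 32    # number of distinct 32-bit addresses
ipv6_total = 2 ** 128   # number of distinct 128-bit addresses

# Print both totals with thousands separators.
print("{:,}".format(ipv4_total))
print("{:,}".format(ipv6_total))
```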

Routing TCP/IP Packets

Routing refers to the task of forwarding information from one network to another. Consider the two networks 206.197.168.0 and 164.109.10.0. You need a routing device to send packets from one of these networks to the other.

Because a routing device facilitates data exchange between two networks, it has two physical network connections, one on each network. Each network interface has its own IP address, and the routing device essentially passes packets back and forth between the two network interfaces. Figure 6-4 illustrates how a routing device has a physical presence in two networks and how each network interface has its own IP address.

Figure 6-4: A Routing Device Allows Packet Exchange between Two Networks.

The generic term “routing device” can refer to a general-purpose computer with two network interfaces or a dedicated device designed specifically for routing. Such dedicated routing devices are known as routers.

Insider Insight

The generic term “gateway” also refers to any routing device regardless of whether the device is another PC or a router. For good performance (a high packet-transfer rate), you want a dedicated router, whose sole purpose is to route packets of data in a network.

Later, when you learn how to set up a TCP/IP network in Linux, you’ll have to specify the IP address of your network’s gateway. If your Linux system gets its IP address from a DHCP (Dynamic Host Configuration Protocol) server, then that DHCP server can also provide the gateway address.

A single routing device, of course, does not connect all the networks in the world; packets get around in the Internet from one gateway to another. Any network connected to another network has a designated gateway. You can even have specific gateways for specific networks. As you’ll learn, a routing table keeps track of the gateway associated with an external network and the type of physical interface (such as Ethernet or Point-to-Point Protocol over serial lines) for that network. A default gateway gets packets that are addressed to any unknown network.

In your local area network, all packets addressed to another network go to your network’s default gateway, except for those addresses that are explicitly directed elsewhere by static routes. If that gateway is physically connected to the destination network, the story ends there because the gateway can physically send the packets to the destination host. If that gateway does not know the destination network, however, it sends the packets to the next default gateway (the gateway for the other network on which your gateway also “lives”). In this way, packets travel from one gateway to the next until they reach the destination network (or you get an error message saying that the destination network is unreachable).
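The decision a host or gateway makes for each packet can be sketched as a table lookup. The following Python example is a simplified model, not how any particular system implements it: the table entries, the gateway addresses 206.197.168.1 and 206.197.168.254, and the /24 masks are all assumptions made for illustration. It picks the most specific matching route and falls back to the default gateway.

```python
import ipaddress

# Hypothetical routing table: (destination network, gateway).
ROUTES = [
    (ipaddress.ip_network("206.197.168.0/24"), "direct"),        # local network
    (ipaddress.ip_network("164.109.10.0/24"), "206.197.168.1"),  # static route
    (ipaddress.ip_network("0.0.0.0/0"), "206.197.168.254"),      # default gateway
]


def next_hop(destination):
    """Return the gateway for a destination, preferring the longest prefix."""
    ip = ipaddress.ip_address(destination)
    matches = [(net, gateway) for net, gateway in ROUTES if ip in net]
    # The default route 0.0.0.0/0 matches every address, so matches is never empty.
    best_net, best_gateway = max(matches, key=lambda entry: entry[0].prefixlen)
    return best_gateway
```

In this model, next_hop("164.109.10.7") follows the static route, while an address on any unknown network, such as next_hop("132.250.112.52"), goes to the default gateway.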

To send packets around in the network efficiently, routers exchange information (in the form of routing tables), so that each router can have a “map” of the network in its vicinity. Routers exchange information by using a routing protocol from a family of protocols known as Interior Gateway Protocols (IGPs). A commonly used Interior Gateway Protocol is the Routing Information Protocol (RIP). Another, more recent, Interior Gateway Protocol is the Open Shortest Path First (OSPF) protocol.

In TCP/IP routing, any time a packet passes through a router, it has made what is considered a hop. In RIP, the maximum path length is 15 hops. A network is considered to be unreachable from your network if a packet cannot reach it within 15 hops. In other words, any network more than 15 routers away is considered to be unreachable. The newer OSPF routing protocol uses a different metric for measuring the quality of different network paths; therefore, it can handle paths longer than 15 hops.

Within a single network, you don’t need a router as long as you do not use a subnet mask to break the single IP network into several subnets. If you do create subnets, however, you have to set up routers to send packets from one subnet to another.

Understanding the Domain Name System (DNS)

You can access any host computer in a TCP/IP network with an IP address. Remembering the IP addresses of even a few hosts of interest, however, is tedious. This fact was recognized from the beginning of TCP/IP, and the association between a hostname and IP address was created. The concept is similar to that of a phone book, in which you can look up a telephone number by searching for a person’s name.

In the early days of the Internet, the association between names and IP addresses was maintained in a text file named HOSTS.TXT at the Network Information Center (NIC), which was located in the Stanford Research Institute (SRI). This file contained the names and corresponding IP addresses of networks, hosts, and routers on the Internet. All hosts on the Internet used to transfer that file by FTP. (Can you imagine all hosts getting a file from a single source in today’s Internet?) As the number of Internet hosts increased, the single file idea became unmanageable. The hosts file was becoming difficult to maintain, and it was hard for all the hosts to update their hosts file in a timely manner. To alleviate the problem, RFC 881 introduced the concept of and plans for domain names in November 1983. Eventually, in 1987 this led to the Domain Name System (DNS) as we know it today (documented in RFCs 1032, 1033, 1034, and 1035).

DNS provides a hierarchical naming system much like your postal address, which you can read as “your name” at “your street address” in “your city” in “your state” in “your country.” If I know your full postal address, I can locate you by starting with your city in your country. Then, I’d locate the street address to find your home, ring the doorbell, and ask for you by name.

DNS essentially provides an addressing scheme for an Internet host that is much like the postal address. The entire Internet is subdivided into several domains, such as gov, edu, com, mil, and net. Each domain is further subdivided into subdomains. Finally, within a subdomain, each host is given a symbolic name. To write a host’s fully qualified domain name (FQDN), string together the hostname, subdomain names, and domain name with dots (.) as separators. Following is the full domain name of a host named ADDLAB in the subdomain NWS within another subdomain NOAA in the GOV domain: ADDLAB.NWS.NOAA.GOV. Note that domain names are not case sensitive. By the way, a single dot (.) represents the topmost level (root) of the domain-name hierarchy.
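The structure of a fully qualified domain name can be illustrated by splitting it at the dots. In this Python sketch, the function names fqdn_labels and same_name are hypothetical; the comparison lowercases both names because domain names are not case sensitive, and it ignores a trailing dot, which stands for the root.

```python
def fqdn_labels(name):
    """Split a domain name into labels, hostname first, top-level domain last."""
    return name.rstrip(".").lower().split(".")


def same_name(first, second):
    """Compare two domain names, ignoring case and a trailing root dot."""
    return fqdn_labels(first) == fqdn_labels(second)


print(fqdn_labels("ADDLAB.NWS.NOAA.GOV"))  # ['addlab', 'nws', 'noaa', 'gov']
```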

Figure 6-5 illustrates part of the Internet Domain Name System, showing the location of the host ADDLAB.NWS.NOAA.GOV.

Figure 6-5: Part of the Internet Domain-Name Hierarchy.

For a commercial system in the COM domain, the name of a host might be as simple as REDHAT.COM. Of course, within REDHAT.COM you could have many subdomains such as FTP.REDHAT.COM, MAIL.REDHAT.COM, and so forth.

The convention for the email address of a user on a system is to append an at sign (@) to the user name (the name under which the user logs in) and then append the system’s fully qualified domain name. Thus, refer to the user named webmaster at the host gao.gov as webmaster@GAO.GOV (unlike hostnames, user names are case sensitive).

TCP/IP network applications resolve a hostname to an IP address by consulting a name server, which is another host that’s accessible from your network. If you decide to use the Domain Name System (DNS) on your network, you have to set up a name server in your network or indicate a name server (by an IP address).

Cross Ref

Later sections of this chapter discuss the configuration files /etc/host.conf and /etc/resolv.conf, through which you specify how hostnames are converted to IP addresses. In particular, you specify the IP address of each name server in the /etc/resolv.conf file.

If you do not use DNS, you still can have host name-to-IP address mapping through a text file named /etc/hosts. The entries in a typical /etc/hosts file might look like the following example:

# Lines like these are comments
# You must have the localhost line in /etc/hosts file
127.0.0.1       localhost.localdomain localhost
192.168.0.100   lnbp933  lnbp933.local.net
192.168.0.60    lnbp600
192.168.0.200   lnbp200  lnbp200.local.net
192.168.0.40    lnbp400  lnbp400.local.net
192.168.0.25    mac      lnbmac  lnbmac.local.net

As the example shows, the file lists one or more hostnames for each IP address. The IP addresses and hostnames are different for your system, of course.
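To make the file format concrete, the following Python sketch parses text in the /etc/hosts layout into a name-to-address table. It is a simplified model for illustration, not the resolver's actual implementation; the function name parse_hosts and the shortened SAMPLE text are assumptions.

```python
SAMPLE = """\
# Lines like these are comments
127.0.0.1       localhost.localdomain localhost
192.168.0.100   lnbp933  lnbp933.local.net
192.168.0.60    lnbp600
"""


def parse_hosts(text):
    """Map each hostname or alias to the IP address on its line."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace; skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first address seen for a name.
            table.setdefault(name.lower(), address)
    return table


print(parse_hosts(SAMPLE)["lnbp933.local.net"])  # 192.168.0.100
```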

Insider Insight

One problem with relying on the /etc/hosts file for name lookup is that you have to replicate this file on each system on your network. This procedure can become a nuisance even in a network that has only five or six systems.