Between autonomous systems, exterior gateway protocols (EGPs) distribute interdomain routing information, or (to be more precise) network layer reachability information (NLRI). The purpose of this approach is to create a loop-free view of the Internet in terms of AS paths and related path attributes. The term EGP refers both to the generic family of exterior routing protocols and to a particular archaic protocol also called EGP, the ancestor of today's predominant signaling protocol, the Border Gateway Protocol version 4 (BGPv4).
The following subsections introduce general aspects of interdomain EGP routing and gradually concentrate on BGPv4 signaling and operation.
BGPv4: Introductory Thoughts
BGP prefix routes carry multiple attributes, most notably the AS_Path attribute, which serves both loop prevention and administrative granularity. Because of this rich set of attributes, BGP offers extended capabilities for policy-based routing, which is of paramount importance for representing the complex policies of interprovider communication. In this sense, BGP is the glue that holds the Internet together. The Internet itself essentially consists of transit autonomous systems and stub autonomous systems (as shown in Figure 10-1).
Figure 10-1. The Architecture of the Internet
Carriers form the heart of the Internet and are classified into tier 1 carriers (no further upstream) and tier 2 carriers. They usually interconnect at commercial exchange points: metropolitan-area exchanges (MAEs), Internet exchanges (IXs), or network access points (NAPs). Today, these interconnection points are switched Ethernet colocation centers with frequent deployments of route servers, looking-glass access, and connectivity to the Internet Routing Registry (IRR).
Peering, upstream, and subscriber agreements govern neighborship relations. A tier 1 carrier is a telco or Internet service provider (ISP) that is at the top of the Internet telecommunication hierarchy and owns its own network cable infrastructure. These are global players such as Cable & Wireless, AT&T, Sprint, and British Telecom, just to mention a few. Tier 1s do not pay anyone for transit; they are paid to provide transit and peer with other tier 1s. Tier 2s typically buy transit from at least one tier 1, while peering with as many tier 2s as they can technically realize and afford. Tier 2s also own their network infrastructure, but they are not big enough to peer with all tier 1s.
In contrast to the IGPs we investigated, which use unicast, multicast, broadcast, and even data-link addresses (Intermediate System-to-Intermediate System, IS-IS) for communication, BGP uses TCP as its transport protocol (port 179) for reliable sessions between neighbors or peers. This communication is intrinsically connection-oriented and monitored via keepalive messages. It is established practice to secure these TCP connections with MD5 hashes. On UNIX systems, providing MD5 capabilities for TCP connections is a responsibility of the kernel, but such provision is still missing or in experimental stages with regard to the BGP implementations used in this book. Other approaches are the use of firewall chains on Linux or divert sockets/netgraph hooks on BSD operating systems. Two BGP peers run through several steps of a finite state machine until a neighborship becomes established and messages or notifications can be passed back and forth. Then NLRI can be exchanged and ultimately a BGP table (Routing Information Base, RIB) derived.
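The session setup just described can be sketched as a small state machine. The six state names follow the BGP-4 specification; the event names and the rule that any unexpected event drops the session back to Idle are simplifications made for this sketch, not the complete set of events and timers that a real implementation handles.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# (current state, event) -> next state.
# Event names are illustrative labels chosen for this sketch.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_established"): State.OPEN_SENT,    # OPEN is sent
    (State.CONNECT, "tcp_failed"): State.ACTIVE,
    (State.ACTIVE, "tcp_established"): State.OPEN_SENT,     # OPEN is sent
    (State.OPEN_SENT, "open_received"): State.OPEN_CONFIRM,  # KEEPALIVE is sent
    (State.OPEN_CONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "update_received"): State.ESTABLISHED,
}


def step(state, event):
    """Return the next state; any unexpected event resets the session to Idle."""
    return TRANSITIONS.get((state, event), State.IDLE)


def run(events, state=State.IDLE):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    happy_path = ["start", "tcp_established", "open_received", "keepalive_received"]
    print(run(happy_path).value)
```

Only in the Established state are UPDATE messages exchanged; an error or a received NOTIFICATION (modeled here as any event without a table entry) tears the session down.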
BGP always places a single best path in the actual routing (forwarding) table. Initially, after peering establishment, the two peering routers exchange their full BGP tables (flash update). Later on, only incremental updates are sent, and the related BGP table version number is incremented. The table version number is an indicator of topological stability or volatility.
Limitations of IGPs
Why can't we use IGPs throughout the Internet? EGPs serve entirely different purposes than IGPs, both technically and from an administrative point of view (policy enforcement). The global routing table is approaching 130,000 prefixes, and the Internet consists of myriad nodes (network elements). The growth rate of new prefixes appears to have slowed down, however, most likely due to aggregation improvements, stricter policies, Network Address Translation (NAT) deployments, and improved management. A table of this size cannot be handled with the specialized approaches of IGPs.
IGP strengths turn into limits and weaknesses when it comes to managing the vast Internet "playground." Just imagine the Shortest Path First (SPF) flooding, database maintenance and calculation burden, and complicated area topologies with Open Shortest Path First (OSPF). The Routing Information Protocol (RIP) hop-count limit would not get us very far either. However, RIP and BGP share a common approach: They are both distance-vector protocols. More precisely, BGP is referred to as a path-vector routing protocol because it transports a sequence of AS numbers (ASNs) that identifies the path that the network prefix has traversed, sometimes referred to as an AS tree or path.
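The path-vector idea lends itself to a short illustration. The following minimal sketch assumes an AS path is simply a list of ASNs; the function names are hypothetical, and the ASNs are taken from the range reserved for documentation. A router prepends its own ASN when it advertises a route to an external neighbor and rejects any route whose path already contains its own ASN.

```python
def advertise_to_ebgp_peer(local_asn, as_path):
    """Prepend our own ASN before passing the route to an external neighbor."""
    return [local_asn] + list(as_path)


def accept_route(local_asn, as_path):
    """Loop prevention: reject a route that has already traversed our AS."""
    return local_asn not in as_path


if __name__ == "__main__":
    # AS 64496 originates a prefix; AS 64497 passes it on.
    path = advertise_to_ebgp_peer(64496, [])
    path = advertise_to_ebgp_peer(64497, path)
    print(path)
    print(accept_route(64498, path))  # a third AS accepts the route
    print(accept_route(64496, path))  # the originating AS detects the loop
```

The length of the resulting list is also what makes the AS path usable as a coarse distance metric, which is why BGP is still grouped with the distance-vector family.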
The essential idea of the BGP designers was that it is practically impossible to coordinate interconnected realms without a protocol that has rich capabilities to reflect and transport policies and control ingress and egress flows in terms of transit. This is the reason why BGP strongly depends on regular expressions and powerful filtering and tagging capabilities. BGP explicitly does not propagate information about the internal structure of autonomous systems. Remember that the primary design goal of the Internet and its predecessors NSFNET, ARPANET, and MILNET was dynamic recovery from link or node failure. BGP has hooks to accommodate this requirement.
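To give a feeling for regular expressions applied to AS paths, here is a minimal sketch that uses Python's re module on a space-separated path string. Routing daemons have their own regular-expression dialects and filter syntax, so the patterns below are illustrative only; the route list, prefixes, and ASNs are invented documentation values.

```python
import re

# (prefix, AS path) pairs; an empty path stands for a locally originated route.
ROUTES = [
    ("192.0.2.0/24", []),
    ("198.51.100.0/24", [64497, 64496]),
    ("203.0.113.0/24", [64498]),
]


def as_path_str(as_path):
    """Render an AS path as a space-separated string, leftmost ASN = neighbor."""
    return " ".join(str(asn) for asn in as_path)


def filter_routes(routes, pattern):
    """Return the prefixes whose AS path string matches the regular expression."""
    rx = re.compile(pattern)
    return [prefix for prefix, path in routes if rx.search(as_path_str(path))]


if __name__ == "__main__":
    print(filter_routes(ROUTES, r"^$"))            # locally originated routes
    print(filter_routes(ROUTES, r"(^| )64496$"))   # routes originated by AS 64496
    print(filter_routes(ROUTES, r"^64498( |$)"))   # routes learned from neighbor AS 64498
```

Anchoring at the end of the string selects the originating AS, whereas anchoring at the start selects the neighbor from which the route was received.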
BGP itself intrinsically does not load balance. However, one can tune the egress and ingress behavior to some extent to achieve what is referred to as "pseudo" load/flow balancing later in this chapter. This usually includes cooperation of your peering AS, upstream or downstream provider, or carrier. This is the art of attracting certain traffic at a certain ingress point and directing traffic to certain egress gateways.
Flavors of BGPv4
BGPv4 supports two different types of peering sessions: IBGP (Internal BGP) is used within one and the same AS, and EBGP (External BGP) is used between neighboring autonomous systems.
IBGP is used widely to configure transit autonomous systems and BGP-based Multiprotocol Label Switching (MPLS) virtual private network (VPN) architectures. In the MPLS VPN context, IBGP with multiprotocol extensions is referred to as Multiprotocol BGP. BGP is entirely a signaling protocol, even more than OSPF or IS-IS are; in a strict sense, it is incapable of delivering traffic within an AS solely by its own means. For this purpose, it relies on an underlying IGP or on static or connected routes to actually forward traffic and resolve next hops.
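The dependency on an IGP for next-hop resolution can be sketched as a two-step lookup. The tables, addresses, and interface strings below are invented for illustration, and the linear longest-match search is a simplification of what a real forwarding table does.

```python
import ipaddress

# Routes known from the IGP or from connected interfaces: prefix -> forwarding action.
IGP_ROUTES = {
    "10.0.0.0/30": "eth0 (connected)",
    "10.255.0.2/32": "eth0 via 10.0.0.2",  # loopback of the IBGP peer
}

# Routes learned via BGP: prefix -> BGP next hop (not necessarily directly connected).
BGP_ROUTES = {
    "198.51.100.0/24": "10.255.0.2",
    "192.0.2.0/24": "172.16.0.1",  # next hop unknown to the IGP
}


def longest_match(table, address):
    """Return the entry of the most specific prefix that contains the address."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, entry in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry)
    return best[1] if best else None


def resolve(destination):
    """Resolve a destination through BGP first, then the BGP next hop through the IGP."""
    bgp_next_hop = longest_match(BGP_ROUTES, destination)
    if bgp_next_hop is None:
        return None
    return longest_match(IGP_ROUTES, bgp_next_hop)


if __name__ == "__main__":
    print(resolve("198.51.100.7"))  # resolvable through the IGP
    print(resolve("192.0.2.9"))     # BGP route exists, but its next hop is unreachable
```

A BGP route whose next hop cannot be resolved, like the second entry, is of no use for forwarding, no matter how attractive its attributes are.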
EBGP is just the formal protocol used between neighboring (directly connected) autonomous systems to exchange aggregated routing information and to reflect macroscopic routing policies on an AS scale.
BGPv4 is a powerful and feature-rich protocol, but not necessarily complicated. To use it fully, you must understand regular expressions, classless interdomain routing (CIDR), and aggregation. A complete discussion therefore goes beyond the scope of almost any book. The lab section of this chapter predominantly uses Zebra/Quagga and occasionally GateD for demonstration purposes. The BGP configuration of MRTd is almost equivalent and similar to the Cisco IOS architecture; MRTd also supports multiple BGP views and has the added benefit of being multithreaded. You will read more about BGP later in this chapter.
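As a quick refresher on CIDR aggregation, the following minimal sketch uses Python's standard ipaddress module to collapse contiguous prefixes into the shortest covering set. The helper name and the prefixes are chosen for illustration; routing daemons perform aggregation through their own configuration statements.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse a list of prefixes into the smallest equivalent set of CIDR blocks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    # Four contiguous /24 networks fit exactly into one /22.
    print(aggregate(["192.168.0.0/24", "192.168.1.0/24",
                     "192.168.2.0/24", "192.168.3.0/24"]))
    # Non-contiguous networks cannot be summarized without covering foreign space.
    print(aggregate(["192.168.0.0/24", "192.168.2.0/24"]))
```

Advertising the single /22 instead of four /24 prefixes is exactly the kind of aggregation that keeps the global routing table manageable.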
BGP Message Types
BGP systems use four different types of messages (see Table 10-1). During normal operation, only UPDATE and KEEPALIVE messages are exchanged. OPEN messages govern connection establishment with optional capabilities negotiation. NOTIFICATION messages gracefully terminate the BGP/TCP session in case of malformed information, errors, or manual session resets.
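All four message types share a common fixed header as defined in the BGP-4 specification: a 16-byte marker, a 2-byte total length, and a 1-byte type code. The following minimal sketch parses that header with Python's struct module; the function names are hypothetical, and the marker check assumes the all-ones marker used on sessions without the older marker-based authentication.

```python
import struct

MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

# Network byte order: 16-byte marker, 16-bit length, 8-bit type (19 bytes in total).
HEADER = struct.Struct("!16sHB")
MARKER = b"\xff" * 16


def parse_header(data):
    """Return (message type name, total message length) for a raw BGP message."""
    if len(data) < HEADER.size:
        raise ValueError("truncated BGP header")
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("unexpected marker")
    if not 19 <= length <= 4096:
        raise ValueError("invalid message length")
    if msg_type not in MESSAGE_TYPES:
        raise ValueError("unknown message type")
    return MESSAGE_TYPES[msg_type], length


def build_keepalive():
    """A KEEPALIVE consists of the 19-byte header only."""
    return HEADER.pack(MARKER, HEADER.size, 4)


if __name__ == "__main__":
    print(parse_header(build_keepalive()))
```

Because a KEEPALIVE carries no payload, it is the smallest possible BGP message and a convenient test case for a header parser.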
As described in RFC 3392, "Capabilities Advertisement with BGP-4," capability negotiation was added to the BGPv4 protocol behavior to enable peers to negotiate certain additional capabilities, especially with the success of Multiprotocol BGP extensions. This is done via OPEN/NOTIFICATION messages, as demonstrated in Example 10-1 (highlighted text). When a BGP speaker that supports capability negotiation does not support a particular capability, it should respond with a notification error and a corresponding error subcode. This scheme was introduced to leave the UPDATE message mechanism untouched.
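To make the capability mechanism more tangible, the following minimal sketch decodes the capabilities carried in the optional parameters of an OPEN message. The field layout (optional parameter type 2 for capabilities, each capability encoded as code, length, and value) follows the capabilities advertisement specification; the sample message built by the hypothetical helper build_sample_open_body uses invented values and is not taken from the capture in Example 10-1.

```python
import struct

# OPEN body after the common header:
# version, my AS, hold time, BGP identifier, optional parameters length.
OPEN_FIXED = struct.Struct("!BHH4sB")
CAPABILITIES_PARAM = 2  # optional parameter type that carries capabilities


def parse_capabilities(open_body):
    """Return a list of (capability code, value bytes) found in an OPEN body."""
    _version, _my_as, _hold, _bgp_id, opt_len = OPEN_FIXED.unpack_from(open_body)
    params = open_body[OPEN_FIXED.size:OPEN_FIXED.size + opt_len]
    capabilities = []
    i = 0
    while i + 2 <= len(params):
        param_type, param_len = params[i], params[i + 1]
        value = params[i + 2:i + 2 + param_len]
        i += 2 + param_len
        if param_type != CAPABILITIES_PARAM:
            continue  # other optional parameter types are ignored in this sketch
        j = 0
        while j + 2 <= len(value):
            code, cap_len = value[j], value[j + 1]
            capabilities.append((code, value[j + 2:j + 2 + cap_len]))
            j += 2 + cap_len
    return capabilities


def build_sample_open_body():
    """Build an OPEN body with invented values for demonstration purposes."""
    multiprotocol = bytes([1, 4]) + struct.pack("!HBB", 1, 0, 1)  # AFI 1, SAFI 1
    route_refresh = bytes([2, 0])                                 # no value
    caps = multiprotocol + route_refresh
    param = bytes([CAPABILITIES_PARAM, len(caps)]) + caps
    fixed = OPEN_FIXED.pack(4, 64496, 180, bytes([192, 0, 2, 1]), len(param))
    return fixed + param


if __name__ == "__main__":
    for code, value in parse_capabilities(build_sample_open_body()):
        print(code, value.hex())
```

Capability code 1 announces the multiprotocol extensions for one address family, and code 2 announces route refresh; a peer that does not recognize the capabilities parameter at all would answer with a NOTIFICATION instead of proceeding.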
Example 10-1. Packet Capture to Demonstrate Capabilities Negotiation
[root@callisto:~#] tethereal -i eth0 -V
Frame 5 (111 bytes on wire, 111 bytes captured)
    Arrival Time: May 17, 2003 10:37:28.533785000
    Time delta from previous packet: 0.000059000 seconds
    Time relative to first packet: 0.000442000 seconds
    Frame Number: 5
    Packet Length: 111 bytes
    Capture Length: 111 bytes
Ethernet II, Src: 00:60:08:6a:18:45, Dst: 00:10:5a:d7:93:60
    Destination: 00:10:5a:d7:93:60 (3com_d7:93:60)
    Source: 00:60:08:6a:18:45 (3Com_6a:18:45)
    Type: IP (0x0800)
Internet Protocol, Src Addr: 192.168.14.3 (192.168.14.3), Dst Addr: 192.168.14.1 (192.168.14.1)
    Version: 4
    Header length: 20 bytes
    Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
        0000 00.. = Differentiated Services Codepoint: Default (0x00)
        .... ..0. = ECN-Capable Transport (ECT): 0
        .... ...0 = ECN-CE: 0
    Total Length: 97
    Identification: 0x064f
    Flags: 0x04
        .1.. = Don't fragment: Set
        ..0. = More fragments: Not set
    Fragment offset: 0
    Time to live: 1
    Protocol: TCP (0x06)
    Header checksum: 0xd5f3 (correct)
    Source: 192.168.14.3 (192.168.14.3)
    Destination: 192.168.14.1 (192.168.14.1)
Transmission Control Protocol, Src Port: 34665 (34665), Dst Port: bgp (179), Seq: