Ethernet Channel Bonding

Ethernet channel bonding describes the physical bundling of multiple full-duplex Fast/Giga Ethernet interfaces (usually two or four) into a virtual pipe of multiplied bandwidth. The resulting channel is transparent to Layer 2 configuration issues and can sustain single- or multiple-link failures of its constituent links.

Experimental channel-bonding drivers for Linux and BSD are available for selected Fast Ethernet NICs. Channel bonding proves particularly useful with quad Fast Ethernet NICs. In the Cisco context, this feature is referred to as Fast/Giga EtherChannel; in the Sun Solaris world, it is known as Fast/Giga Ethernet trunking. It offers some scalability and resilience in the gap between 100-Mbps and 1-Gbps interfaces, especially when the platform architecture (system bus) is not capable of driving full-duplex Gigabit Ethernet interfaces.

Channel bonding in the UNIX world is often deployed in the context of cluster architectures such as Beowulf (http://www.beowulf.org/software/bonding.html). A FreeBSD kernel patch that implements channel bonding via the Netgraph facility is available as well (http://people.freebsd.org/~wpaul/FEC/; see Appendix B, "The FreeBSD Netgraph Facility"). It compiled on my system without a glitch, but I cannot offer advice or reports beyond this statement because of equipment constraints. On Linux, you have to enable bonding driver support in the Network Device Support section of the kernel configuration. It is essential to compile this as a module!
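The bandwidth and resilience properties described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the Link and BondedChannel classes are hypothetical names invented for illustration, the round-robin distribution is one possible policy, and nothing here reflects how a particular bonding driver is implemented.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One member interface of the bundle (hypothetical model)."""
    name: str
    mbps: int
    up: bool = True


class BondedChannel:
    """Toy bonded channel: frames go round-robin over links that are up."""

    def __init__(self, links):
        self.links = list(links)
        self._next = 0

    def capacity_mbps(self):
        # Aggregate bandwidth is the sum over the surviving member links.
        return sum(link.mbps for link in self.links if link.up)

    def pick_link(self):
        # Only live links are candidates; fail if the whole channel is down.
        live = [link for link in self.links if link.up]
        if not live:
            raise RuntimeError("all member links are down")
        link = live[self._next % len(live)]
        self._next += 1
        return link


# A quad Fast Ethernet NIC bundled into one channel.
bond = BondedChannel(Link(f"eth{i}", 100) for i in range(4))
print(bond.capacity_mbps())   # 400
bond.links[1].up = False      # simulate a single-link failure
print(bond.capacity_mbps())   # 300
```

The channel keeps forwarding after the failure, only at reduced aggregate capacity, which is the behavior the section describes.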

Interface Cloning

Usually special interfaces (pseudo interfaces) such as tunnels require provisioning at kernel compile-time. However, as a feature of modern UNIX operating systems, these can be added dynamically at runtime when required via, for example, the ifconfig create command sequence on BSD or certain dedicated user-space utilities. This has nothing in common with the interface-cloning approaches used by Cisco IOS Software (cloning from a template). Cloned routes are a different concept as well and are discussed in Chapter 8, "Static Routing Concepts."

ECMP (Equal-Cost Multi-Path)

ECMP is an important requirement to enable per-packet/per-destination (per-flow) multipath traffic balancing over multiple equal-cost interfaces. This is of a different nature than the previously discussed interface bonding and is important with regard to load-balancing/sharing and redundancy architectures. The term equal cost refers to an identical metric from the point of view of involved static or dynamic routing schemes.

In contrast to Cisco routers, UNIX IP stacks intrinsically have no notion of per-destination load sharing and generally balance on a per-packet basis unless configured otherwise (for example, via policy routing). Cisco IOS Software defaults to per-destination (per-flow) traffic balancing, as does Cisco Express Forwarding (CEF).
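The difference between the two balancing modes can be sketched in a few lines. This is an illustration only: the next-hop addresses are hypothetical documentation addresses, and hashing the source/destination address pair with SHA-256 is an assumption chosen for clarity, not the algorithm any particular IP stack or Cisco IOS release uses.

```python
import hashlib

# Hypothetical equal-cost next hops (documentation address range).
NEXT_HOPS = ["192.0.2.1", "192.0.2.2"]


def per_packet(next_hops, seq):
    """Round-robin: packet number seq simply alternates between the paths."""
    return next_hops[seq % len(next_hops)]


def per_destination(next_hops, src, dst):
    """Hash the address pair so all packets of one pair take the same path."""
    key = f"{src}|{dst}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Per-packet: consecutive packets of one connection alternate paths.
print([per_packet(NEXT_HOPS, seq) for seq in range(4)])

# Per-destination: the same address pair always maps to the same path.
print({per_destination(NEXT_HOPS, "10.0.0.1", "10.0.1.1") for _ in range(4)})
```

Per-packet balancing spreads load evenly but can reorder the packets of a single connection; per-destination balancing keeps each address pair on one path at the price of a less even distribution.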

Driver Support for LAN/WAN Interface Cards

The following list offers a quick overview of important interface types supported by popular UNIX operating systems:

10/100-Mbps Ethernet and 4/16-Mbps Token Ring adapters have been supported for a long time on all discussed operating systems.
Although an intriguing concept, 100-Mbps Token Ring has never really generated enough customer interest to penetrate the market.
Gigabit Ethernet support is sufficiently available as well, but you must take into consideration whether the gateway's bus architecture can feed traffic to these high-performance full-duplex cards. At the time of this writing, 10-Gigabit adapters make no sense on these systems and will be more of a feature of 64-bit architectures.
Fibre Channel adapters are available as well, and they are used primarily to build storage-area networks (SANs).
Wireless network cards (IEEE 802.11b) are supported for the most prominent chipsets, especially the Cisco Aironet product line, with the newer 802.11g driver support catching up in Linux 2.6.x and FreeBSD 5.x.

An attractive feature of the discussed UNIX operating systems is the option to use various PCI/ISA WAN interface cards. These include clear-channel or channelized E1 synchronous serial adapters, T3 adapters, PRI cards, and ATM interfaces. The cards come in various flavors with regard to clocking, channelized or clear-channel operation, CSU/DSU integration, duplex transmission, and fractional bandwidths. Well-known producers of these adapters are Sangoma, Cyclades, ImageStream, Stallion, Prosum, and Fore Systems (now Marconi). Usually the vendors provide firmware updates, kernel modules, and utilities for BSD, Linux, and sometimes Sun Solaris. Some of these cards are also gaining popularity for use in software private branch exchange (PBX) systems for enterprise fax and telephony services (FXS/FXO/E1/PRI/BRI interfaces).

Encapsulation Support for WAN Interface Cards

The WAN interfaces I am aware of essentially support the following Layer 2 encapsulations:

Frame Relay
X.25
ATM
HDLC
PPP

The supported features with regard to ATM and Frame Relay vary depending on the vendor of these interface cards. I discuss certain aspects in Chapter 4, "Gateway WAN/Metro Interfaces," in a restricted fashion because of limited access to test equipment.

Support for Bridging Interfaces

Running a UNIX workstation in bridging mode offers two interesting possibilities.

The first is the ability to reduce the traffic on the broadcast domain by bridge segmentation. Because of the availability of cheap switches, this is rarely done anymore.

Second, and more interesting, is the ability to add a transparent IP-filtering and traffic-shaping bridge that is nearly impossible to attack from a remote IP address. Such a bridge inspects all forwarded frames without IP addresses configured on its interfaces; as a consequence, IP masquerading (Network Address Translation, or NAT) is not possible. It is okay to assign an IP address for administrative purposes, but you must bear in mind that it is the purpose of a bridge to forward all traffic, not just IP datagrams. To filter non-IP protocols, you can either match on protocol types or use the blocknonip option of the OpenBSD brconfig(8) utility. Bridging requires that the interfaces be in promiscuous mode; therefore, the NICs will experience heavier load.
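Filtering by protocol type boils down to inspecting the EtherType field of each frame. The following sketch shows the idea for untagged Ethernet II frames. The function names are hypothetical, and the set of permitted EtherTypes (IPv4, IPv6, ARP, RARP) is an assumption made for illustration; consult brconfig(8) for what blocknonip actually permits on your release.

```python
import struct

# Registered EtherType values.
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
ETHERTYPE_RARP = 0x8035
ETHERTYPE_IPV6 = 0x86DD

# Assumption: treat these as the "IP family" the bridge may forward.
ALLOWED = {ETHERTYPE_IPV4, ETHERTYPE_ARP, ETHERTYPE_RARP, ETHERTYPE_IPV6}


def ethertype(frame: bytes) -> int:
    """Return the EtherType of an untagged Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType.
    (etype,) = struct.unpack("!H", frame[12:14])
    return etype


def should_forward(frame: bytes) -> bool:
    """True if the bridge forwards the frame, False if it drops it."""
    return ethertype(frame) in ALLOWED


def make_frame(etype: int, payload: bytes = b"") -> bytes:
    """Build a broadcast test frame with a zeroed source MAC."""
    return b"\xff" * 6 + b"\x00" * 6 + struct.pack("!H", etype) + payload


print(should_forward(make_frame(ETHERTYPE_IPV4)))  # True
print(should_forward(make_frame(0x8137)))          # False (IPX)
```

Note that the decision is made purely at Layer 2; the bridge needs no IP address of its own to apply it.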

Loop protection in the bridging context is crude. Of the systems discussed, only Linux supports the 802.1D spanning-tree algorithm; elsewhere there usually exists no or only rudimentary Spanning Tree Protocol (STP) support. UNIX gateways were never designed to act as bridges or switches in complicated switch hierarchies/topologies. Therefore, you should prevent loops by design and not rely on the bridging code and its crude loop-protection mechanism to prevent disaster.

Linux and BSD-like operating systems support bridging modes on Ethernet-type interfaces. FreeBSD has expanded the bridging concept to support clustering and VLAN trunks. You will learn more about this feature in Chapter 5. Example 3-3 enables bridging support with a single FreeBSD kernel configuration line.

Example 3-3. BSD Kernel Bridging Support


options BRIDGE        # for all BSD OSs

TCP Tuning

The Transmission Control Protocol (TCP) is a far more complicated transport protocol than the User Datagram Protocol (UDP) because of its reliable (connection-oriented) character, more complex header, windowing mechanism, and three-way handshake. Therefore, most IP stacks allow manipulation of TCP behavior to a large extent. This is becoming more and more of an issue because, unfortunately, several heavy-load protocols such as HTTP rely on TCP for transport. Example 3-4 demonstrates several TCP-related kernel parameters that Linux exposes via sysctl.

Example 3-4. TCP sysctl Parameters


[root@callisto:~#] sysctl -a | grep tcp
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_frto = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_adv_win_scale = 2
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_rmem = 4096        87380   174760
net.ipv4.tcp_wmem = 4096        16384   131072
net.ipv4.tcp_mem = 48128        48640   49152
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_fack = 1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_max_tw_buckets = 180000
net.ipv4.tcp_max_orphans = 8192
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syn_retries = 5
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
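
When auditing or comparing gateways, it can help to load such output into a data structure. The following is a minimal sketch that parses the "key = value" format shown in Example 3-4; parse_sysctl is a hypothetical helper name, and the sample text is a short excerpt of the listing above. The three tcp_rmem fields are the minimum, default, and maximum receive buffer sizes in bytes.

```python
def _convert(value):
    """Turn '1' into 1 and '4096 87380 174760' into a tuple of ints."""
    fields = value.split()
    try:
        numbers = tuple(int(field) for field in fields)
    except ValueError:
        return value.strip()      # keep non-numeric values as text
    if len(numbers) == 1:
        return numbers[0]
    return numbers if numbers else value.strip()


def parse_sysctl(text):
    """Parse 'key = value' lines as printed by sysctl -a into a dict."""
    params = {}
    for line in text.splitlines():
        if "=" not in line:
            continue              # skip blank lines and the shell prompt
        key, _, value = line.partition("=")
        params[key.strip()] = _convert(value)
    return params


SAMPLE = """\
[root@callisto:~#] sysctl -a | grep tcp
net.ipv4.tcp_rmem = 4096        87380   174760
net.ipv4.tcp_wmem = 4096        16384   131072
net.ipv4.tcp_window_scaling = 1
"""

params = parse_sysctl(SAMPLE)
rmem_min, rmem_default, rmem_max = params["net.ipv4.tcp_rmem"]
print(rmem_default)                           # 87380
print(params["net.ipv4.tcp_window_scaling"])  # 1
```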

Tunnel Support

The open-source operating systems under consideration offer a large variety of kernel- and user-space tunnel solutions, with or without protocol transparency, and with or without encryption/compression. The most widely known are as follows:

IPSec (standard)
IP-IP (standard)
GRE/Mobile IP (standard)
PPTP (standard)
L2TP (standard)
CIPE (no standard, kernel and user space)
VTun (no standard, user space)
Stunnel (HTTPS) (no standard, user space)

As of this writing, not all of the operating systems support all of these approaches. FreeBSD, for example, only offers early user-space Generic Routing Encapsulation (GRE) support. The safest bet still is to use the same solution for both tunnel endpoints.

What most tunnel solutions have in common is that they reduce the available maximum transmission unit (MTU) size because of encapsulation overhead. You must take this into consideration to prevent fragmentation problems or breaking path MTU discovery (PMTUD).
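
The effect of encapsulation overhead on the usable MTU is simple arithmetic. The sketch below covers only the two cases with a fixed, well-known overhead over IPv4: IP-IP (one additional 20-byte IPv4 header) and basic GRE (a 20-byte outer IPv4 header plus a 4-byte GRE header without optional fields). The function names are hypothetical, and other tunnel types, IPSec in particular, have an overhead that depends on the configuration and are deliberately left out.

```python
# Per-packet overhead in bytes for tunnels carried over IPv4.
OVERHEAD_BYTES = {
    "ipip": 20,  # one extra IPv4 header without options
    "gre": 24,   # outer IPv4 header (20) + basic GRE header (4)
}


def tunnel_mtu(link_mtu, encapsulation):
    """MTU available to packets travelling inside the tunnel."""
    return link_mtu - OVERHEAD_BYTES[encapsulation]


def tcp_mss(mtu):
    """Largest TCP payload per segment: MTU minus IPv4 and TCP headers
    (20 bytes each, assuming no IP or TCP options)."""
    return mtu - 40


print(tunnel_mtu(1500, "ipip"))          # 1480
print(tunnel_mtu(1500, "gre"))           # 1476
print(tcp_mss(tunnel_mtu(1500, "gre")))  # 1436
```

A host that keeps sending 1500-byte packets with the Don't Fragment bit set into such a tunnel depends on path MTU discovery to learn the smaller value, which is why filtering the corresponding ICMP messages causes the trouble mentioned above.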