BGP confederations are another mechanism for controlling the explosion of iBGP meshing. They can be used instead of or in combination with route reflectors. The basic functionality of BGP confederations is to split up the autonomous system into smaller, more manageable, autonomous systems, which are represented as one single autonomous system to BGP peers external to the confederation.
Note
For an in-depth discussion of BGP confederations, refer to the Cisco Press title Internet Routing Architectures, Second Edition, by Bassam Halabi, or RFC 1965, "Autonomous System Confederations for BGP."
By creating smaller autonomous system domains, or sub-ASs, it is possible to restrict the number of iBGP sessions that are required: a full mesh is required only within each sub-AS. A further advantage to this method, which may be useful in very large-scale topologies, is that separate IGPs can be deployed across the service provider backbone, which helps with the scaling of the IGP. This feature is not possible when route reflectors are used, unless the BGP next-hop addresses are leaked across sub-AS boundaries, because you cannot reset the BGP next-hop on a route reflector. (However, you can do this on a confederation boundary router.)
Note
Although confederations reduce the size of the iBGP mesh, they make it harder to partition the routing information if a separate IGP is run in each sub-AS. This is because packets must be forwarded across the confederation boundary, which in an MPLS/VPN environment means that each of the edge confederation boundary routers must have access to all VPN routes (unless complex partitioning of the routes between multiple edges is deployed).
To help explain this complex subject, we will consider the topology shown in Figure 12-13. This topology depicts a service provider, Confed.Com, that has three regional POPs, located in San Jose, Paris, and London, connected in a full mesh. All relevant IP address assignments are shown in Table 12-7.
POP | Site | Subnet |
---|---|---|
San Jose | San Jose (Loopback0) | 194.17.1.1/32 |
San Jose | San Francisco (Loopback0) | 194.17.1.2/32 |
San Jose | Santa Clara (Loopback0) | 194.17.1.3/32 |
London | Reading (Loopback0) | 197.58.27.3/32 |
London | Heathrow (Loopback0) | 197.58.27.2/32 |
London | London (Loopback0) | 197.58.27.1/32 |
Paris | Paris (Loopback0) | 195.12.14.1/32 |
Paris | Chartres (Loopback0) | 195.12.14.3/32 |
Paris | Lyon (Loopback0) | 195.12.14.2/32 |
Figure 12-13 shows that MPLS and MP-iBGP have been deployed within each regional POP and that a full mesh of eBGP is used to connect each regional POP. Each POP is a separate sub-AS, so our requirement of an iBGP full mesh among all BGP-speaking routers is relaxed. The exchange of routes and labels between the sub-ASs will differ, depending on which deployment option is chosen.
Note
Although a full mesh of eBGP is used to connect each regional POP, it should be noted that within a confederation environment, the eBGP session between sub-ASs differs from normal eBGP. The attributes of any routes advertised across the session are not changed, including the BGP next-hop of the route. In addition, normal iBGP rules apply within each sub-AS.
When confederations are used, we have a couple of choices on how to design and deploy the IGP. Which choice is taken affects the use of the MPLS/VPN architecture and how it functions in this type of environment. Example 12-9 provides configuration for PE-routers San Francisco, San Jose, London, and Reading, which will be used in the examples that follow. For the sake of simplicity, the Paris POP will not be considered within the examples.
    ! Reading PE-router (sub-AS 65001)
    hostname Reading
    !
    ip vrf EuroBank
     rd 1:27
     route-target export 100:27
     route-target import 100:27
    !
    interface loopback0
     ip address 197.58.27.3 255.255.255.255
    !
    router bgp 65001
     no bgp default ipv4-unicast
     bgp confederation identifier 100
     bgp confederation peers 65002
     neighbor 197.58.27.1 remote-as 65001
     neighbor 197.58.27.1 update-source Loopback0
     neighbor 197.58.27.1 activate
     !
     address-family ipv4 vrf EuroBank
     redistribute connected
     no auto-summary
     no synchronization
     exit-address-family
     !
     address-family vpnv4
     neighbor 197.58.27.1 activate
     neighbor 197.58.27.1 send-community extended
     exit-address-family

    ! London PE-router (sub-AS 65001, confederation boundary router)
    hostname London
    !
    ip vrf EuroBank
     rd 1:27
     route-target export 100:27
     route-target import 100:27
    !
    interface loopback0
     ip address 197.58.27.1 255.255.255.255
    !
    router bgp 65001
     no synchronization
     no bgp default ipv4-unicast
     bgp confederation identifier 100
     bgp confederation peers 65002
     neighbor 197.58.27.3 remote-as 65001
     neighbor 197.58.27.3 update-source Loopback0
     neighbor 197.58.27.3 activate
     neighbor 10.1.1.14 remote-as 65002
     neighbor 10.1.1.14 activate
     !
     address-family ipv4 vrf EuroBank
     no auto-summary
     no synchronization
     exit-address-family
     !
     address-family vpnv4
     neighbor 197.58.27.3 activate
     neighbor 197.58.27.3 send-community extended
     neighbor 10.1.1.14 activate
     neighbor 10.1.1.14 send-community extended
     exit-address-family

    ! San Jose PE-router (sub-AS 65002, confederation boundary router)
    hostname San Jose
    !
    ip vrf EuroBank
     rd 1:27
     route-target export 100:27
     route-target import 100:27
    !
    interface loopback0
     ip address 194.17.1.1 255.255.255.255
    !
    router bgp 65002
     no synchronization
     no bgp default ipv4-unicast
     bgp confederation identifier 100
     bgp confederation peers 65001
     neighbor 194.17.1.2 remote-as 65002
     neighbor 194.17.1.2 update-source Loopback0
     neighbor 194.17.1.2 activate
     neighbor 10.1.1.13 remote-as 65001
     neighbor 10.1.1.13 activate
     !
     address-family ipv4 vrf EuroBank
     no auto-summary
     no synchronization
     exit-address-family
     !
     address-family vpnv4
     neighbor 194.17.1.2 activate
     neighbor 194.17.1.2 send-community extended
     neighbor 10.1.1.13 activate
     neighbor 10.1.1.13 send-community extended
     exit-address-family

    ! San Francisco PE-router (sub-AS 65002)
    hostname San Francisco
    !
    ip vrf EuroBank
     rd 1:27
     route-target export 100:27
     route-target import 100:27
    !
    interface loopback0
     ip address 194.17.1.2 255.255.255.255
    !
    router bgp 65002
     no bgp default ipv4-unicast
     bgp confederation identifier 100
     redistribute connected
     neighbor 194.17.1.1 remote-as 65002
     neighbor 194.17.1.1 update-source Loopback0
     neighbor 194.17.1.1 activate
     !
     address-family ipv4 vrf EuroBank
     redistribute connected
     no auto-summary
     no synchronization
     exit-address-family
     !
     address-family vpnv4
     neighbor 194.17.1.1 activate
     neighbor 194.17.1.1 send-community extended
     exit-address-family
When a single IGP process runs across the whole BGP confederation, normal iBGP rules apply, so no BGP attributes are changed, including the next-hop for each route that is exchanged between sub-AS boundaries. We have already discussed how an MPLS LSR assigns a label for each internal route that it learns through its IGP. This does not change within a BGP confederation environment, so a label for every BGP next-hop, as assigned by each PE-router within the backbone, should exist. Packets destined for any customer VPN route can therefore be label-switched across the backbone.
An example of this type of connectivity can be seen in Figure 12-14. This example also shows the advertisement of a VPN route and the relevant label distribution to obtain connectivity.
Figure 12-14 shows that each sub-AS runs the same IGP process as all other sub-ASs. eBGP is used between sub-ASs, but normal iBGP rules apply across the sub-AS boundaries. This means that the next-hop of any route is not changed and must be reachable by each sub-AS. In the case of MPLS, a label must exist for the BGP next-hop so that packets can be label-switched to the egress LSR for the external destination.
In our example, the Confed.Com San Francisco PE-router receives an update for 195.12.2.0/24 from the EuroBank VPN customer. This update is populated into the EuroBank VRF and is advertised using MP-iBGP to the Confed.Com San Jose PE-router with a next-hop of 194.17.1.2/32 and a VRF label of 11. This route is then advertised across the confederation sub-AS boundary to the Confed.Com London PE-router, with the next-hop and VRF label unchanged. The London router then advertises the route to the Reading PE-router, which installs it into the EuroBank VRF.
When a packet is sent from one EuroBank site to the other, because a label exists for the BGP next-hop of the route (194.17.1.2/32, in our example), the VRF label is prepended/pushed on to the packet and is label-switched across the Confed.Com backbone to the San Francisco PE-router. The packet will arrive at the San Francisco router with a one-level label stack (the top level will have been popped at the Santa Clara P-router); this label will have a value of 11, as originally set by the San Francisco router.
When BGP confederations are deployed and each sub-AS uses its own IGP process, the next-hop for all BGP routes is still unchanged across sub-AS boundaries. This means that the BGP next-hop addresses must be reachable from within each sub-AS, or connectivity will be broken. You might think that by redistributing the BGP next-hop addresses between sub-AS IGP processes, connectivity could be restored. This is certainly the case in a non-MPLS environment. However, in the case of MPLS/VPN, we need to consider an example to understand whether this redistribution will allow connectivity between sub-ASs. Figure 12-15 shows a sample topology with a different IGP process in each sub-AS.
In Figure 12-15, we can see that the San Jose PE-router learns routes from other PE-routers within its own sub-AS, AS65002, through the use of MP-iBGP. These routes are advertised across the sub-AS boundary MP-eBGP session to the London PE-router with the next-hop and label information unchanged.
The first thing to notice is that the London PE-router is incapable of advertising these routes to other PE-routers because the next-hop for the routes is inaccessible: It belongs to the IGP running within the AS65002 sub-AS. This is because of the requirement within BGP that the next-hop of the route be accessible. To rectify this situation, we could try to redistribute the BGP next-hop addresses for each route across the sub-AS boundaries, or we could configure static routes on the London PE-router. This would allow us to label-switch packets to the egress LSR.
Note
A careful reader might notice that BGP was not mentioned as an option for the distribution of next-hop addresses between sub-AS boundaries. This is because labels are not assigned to BGP routes. Therefore, if this protocol were used to advertise the next-hop addresses from AS65002 to AS65001, label switching would not work, because no label would be assigned to the next-hop addresses of VPN routes.
The problem with this approach is that multiple static host routes would be required, or redistribution between IGP processes would need to be configured. It is arguable that the reason to deploy confederations is to help scale the IGP and to hide instability in one POP from other POP sites. If this is the case, then redistribution is not a desirable function. On the other hand, the initial scope of confederations was not to hide the IGP information, but rather to reduce the number of iBGP sessions. If this is the requirement, then redistribution may be an option.
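As a rough sketch of the static-route option, assuming the addressing from Example 12-9 (where 10.1.1.14 is the San Jose end of the London-to-San Jose link), the London PE-router could carry a host route to the San Francisco loopback. This fragment is an illustration and is not taken from the original examples. One such route would be needed for every PE-router loopback in the remote sub-AS, and the Reading router would still need to learn it as well, for example through redistribution into the sub-AS 65001 IGP (not shown, because the IGP in use is not named here).

```
! London PE-router (sub-AS 65001) - illustrative only
! Host route to the San Francisco loopback, which is the BGP next-hop
! of the EuroBank VPN routes learned from sub-AS 65002
ip route 194.17.1.2 255.255.255.255 10.1.1.14
```

The need to repeat this for each remote next-hop is the scaling cost that the paragraph above describes.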
The next option that we have is to set the next-hop of all the VPN routes to the advertising sub-AS router. This would cause all routes to be advertised with a BGP next-hop that pointed to the San Jose router. Figure 12-16 illustrates this.
At first glance, this appears to solve the problem. However, if we consider how the Reading router will forward traffic destined for the EuroBank VPN via the San Francisco PE-router, we can see that a problem exists with this mechanism. When the Reading router received a packet, it would append a two-level label stack. The first label would be the VPN label (label 11, in our case), and the second label would be for the egress PE-router. The egress PE-router is actually the BGP next-hop of the route, so the Reading router would apply a label that corresponded to the San Jose router.
The problem with this approach is that packets would be label-switched to the San Jose router, but that router would not be capable of forwarding the packets because it would not understand the second level label (the VPN label). Therefore, a mechanism is required to allow label exchange to occur between sub-AS boundaries, but without the requirement of injecting IGP information between the sub-ASs. This is achieved by allocating a new stack of labels at the sub-AS boundary when next-hop-self is configured. The act of resetting the next-hop causes the PE-router to assign a new label to represent the route and advertise it outside the region across the MP-eBGP connection between confederation peers. An illustration of this technique can be seen in Figure 12-17.
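A minimal sketch of how the next-hop might be reset on the sending boundary router, assuming the neighbor addressing from Example 12-9 (10.1.1.13 is the London end of the inter-sub-AS link). The fragment is an illustration rather than a configuration taken from the figures; only the added command and its enclosing context are shown.

```
! San Jose PE-router (sub-AS 65002) - illustrative only
router bgp 65002
 !
 address-family vpnv4
 ! Advertise VPN-IPv4 routes to the London confederation peer with
 ! San Jose as the BGP next-hop, so a new label is allocated per route
 neighbor 10.1.1.13 next-hop-self
 exit-address-family
```

With this in place, the receiving sub-AS needs reachability (and a label) only for the boundary router's address, not for every PE-router loopback in sub-AS 65002.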
Note
The label can also be reset by the receiving sub-AS PE-router through use of the next-hop-self command. This has the advantage of not having to keep a /32 route for the confederation peer within the receiving sub-AS.
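A comparable sketch for the receiving side, again assuming the addressing from Example 12-9 (197.58.27.3 is the Reading loopback). It is illustrative only and assumes that London resets the next-hop toward its own iBGP peers instead of San Jose resetting it across the boundary.

```
! London PE-router (sub-AS 65001) - illustrative only
router bgp 65001
 !
 address-family vpnv4
 ! Advertise VPN-IPv4 routes learned from sub-AS 65002 to Reading with
 ! London as the BGP next-hop; London's loopback is already known to
 ! the sub-AS 65001 IGP
 neighbor 197.58.27.3 next-hop-self
 exit-address-family
```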
In Figure 12-17, the San Jose PE-router has assigned a label of 12 to a VPN-IPv4 update that it has sent to the London PE-router. The San Jose router is capable of mapping this label to a two-level label stack to reach the San Francisco PE-router, where the VPN-IPv4 route originated. In this type of topology, if the next-hop of a VPN route is changed, one level of label is used across the sub-AS boundary; this label represents the VPN route as seen by the boundary router. The IGP label that got the packet to the boundary router will have been removed.
On the return path, the boundary router replaces this label with a two-level label stack that consists of the original VPN label, as assigned by the originating PE-router, and an IGP label to carry the packet to the originating PE-router. The subsequent forwarding sequence for the example shown in Figure 12-17 can be seen in Figure 12-18.
Note
Only one label is used between the sub-AS peers shown in Figure 12-18 because the IGP label associated with the BGP next-hop address has a value of 3 (implicit-null) and therefore has been popped by the upstream hop, per the penultimate hop popping rules discussed in Chapter 2.