Exterior Gateway Protocols: EGP and BGPv4
Between autonomous systems, exterior gateway protocols (EGPs) distribute interdomain routing information, or (to be more precise) network layer reachability information (NLRI). The purpose of this approach is to create a loop-free view of the Internet in terms of AS paths and related path attributes. The term EGP refers both to the generic family of exterior routing protocols and to a particular archaic protocol also called EGP, the ancestor of today's predominant signaling protocol, the Border Gateway Protocol version 4 (BGPv4).
The following subsections introduce general aspects of interdomain EGP routing and gradually concentrate on BGPv4 signaling and operation.
BGPv4: Introductory Thoughts
BGP prefix routes carry multiple attributes, most notably the AS_Path, which serves both loop prevention and administrative granularity. Because of this rich set of attributes, BGP offers extended capabilities for policy-based routing, which is of paramount importance to represent complex policies of interprovider communication. Therefore, BGP is the glue that holds the Internet together. The Internet itself essentially consists of transit autonomous systems and stub autonomous systems (as shown in Figure 10-1).
Carriers form the heart of the Internet and are classified into tier 1 (no further upstream) and tier 2 carriers that usually interconnect at commercial exchange points: MAEs (metropolitan-area exchanges), IXs (Internet exchanges), or NAPs (network access points). Today, these interconnection points are switched Ethernet colocation centers with frequent deployments of route servers, looking-glass access, and connectivity to the Internet Routing Registry (IRR).
Neighboring Relations
Peering, upstream, and subscriber agreements govern neighborship relations. A tier 1 carrier is a telco or Internet service provider (ISP) that is at the top of the Internet telecommunication hierarchy and owns its own network cable infrastructure. These are global players such as Cable & Wireless, AT&T, Sprint, and British Telecom, just to mention a few. Tier 1s do not pay anyone for transit; they are paid to provide transit and peer with other tier 1s. Tier 2s typically buy transit from at least one tier 1, while peering with as many tier 2s as they can technically realize and afford. Tier 2s also own their network infrastructure, but they are not big enough to peer with all tier 1s.
In contrast to the IGPs we investigated, which use unicast, multicast, broadcast, and even data-link addresses (Intermediate System-to-Intermediate System, IS-IS) for communication, BGP uses TCP (port 179) as its transport protocol for reliable sessions between neighbors or peers. It is established practice to secure these TCP connections with MD5 hashes. On UNIX systems, providing MD5 capabilities for TCP connections is a responsibility of the kernel, but such provision is still missing or in experimental stages with regard to the BGP implementations used in this book. Other approaches are the use of firewall chains on Linux or divert sockets/netgraph hooks on BSD operating systems. BGP communication is intrinsically connection-oriented and monitored via keepalive messages. Two BGP peers run through several states of a finite state machine until a neighborship becomes established and messages or notifications can be passed back and forth. Then NLRI can be exchanged and ultimately a BGP table (Routing Information Base, RIB) derived.
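The progression of a peering toward the established state can be sketched as a small state table. The six state names below are the standard BGP ones; the event names and the reduced, happy-path transition set are illustrative assumptions for this sketch and leave out timers and collision handling.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# Reduced transition table; the event names are hypothetical labels chosen
# for this sketch, not identifiers from any BGP implementation.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_established"): State.OPEN_SENT,
    (State.CONNECT, "tcp_failed"): State.ACTIVE,
    (State.ACTIVE, "tcp_established"): State.OPEN_SENT,
    (State.OPEN_SENT, "open_received"): State.OPEN_CONFIRM,
    (State.OPEN_CONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "update_received"): State.ESTABLISHED,
}


def step(state, event):
    """Return the next state; any unlisted event drops the session to Idle."""
    return TRANSITIONS.get((state, event), State.IDLE)


def run(events, state=State.IDLE):
    """Feed a sequence of events through the state table."""
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    session = ["start", "tcp_established", "open_received", "keepalive_received"]
    print(run(session).value)
```

Treating every unlisted event as a reset to Idle mirrors the idea that errors and notifications tear the session down, at the cost of ignoring the retry logic a real speaker needs.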
BGP always places a single best path in the actual routing (forwarding) table. Initially, after peering establishment, the two peering routers exchange their full BGP table (flash update). Later on, only incremental updates are sent, and the related BGP table version number is incremented. The table version number is an indicator of topological stability or volatility.
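To make the "single best path" and table-version ideas concrete, here is a minimal sketch. The ordering used (highest local preference, then shortest AS path, then lowest MED, then next-hop string as an arbitrary stable tie-breaker) is a deliberately reduced assumption and not the full decision process of any real implementation; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Path:
    next_hop: str
    local_pref: int
    as_path: List[int]
    med: int = 0


def preference_key(path):
    """Smaller key means more preferred (reduced rule set, see lead-in)."""
    return (-path.local_pref, len(path.as_path), path.med, path.next_hop)


def best_path(paths):
    """Pick the single path that would be handed to the forwarding table."""
    return min(paths, key=preference_key)


class BgpTable:
    """Tracks one best path per prefix and bumps a version on each change."""

    def __init__(self):
        self.version = 0
        self.best: Dict[str, Path] = {}

    def update(self, prefix, paths):
        new_best = best_path(paths)
        if self.best.get(prefix) != new_best:
            self.best[prefix] = new_best
            self.version += 1  # a fast-growing version hints at instability
        return new_best


if __name__ == "__main__":
    table = BgpTable()
    candidates = [
        Path("166.63.210.41", 100, [701, 668, 7170, 1455]),
        Path("166.63.210.40", 100, [701, 668, 7170, 1455]),
    ]
    print(table.update("6.1.0.0/16", candidates).next_hop, table.version)
```

Re-running `update` with the same candidates leaves the version untouched, which is why a slowly moving version number suggests a quiet topology.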
Limitations of IGPs
Why can't we use IGPs throughout the Internet? EGPs serve entirely different purposes than IGPs, both technically and from an administrative point of view (policy enforcement). The global routing table is approaching 130,000 prefixes and consists of myriad nodes (network elements). The increase rate of new prefixes appears to have slowed down, however, most likely due to aggregation improvements, stricter policies, Network Address Translation (NAT) deployments, and improved management. This number cannot be handled with the specialized approaches of IGPs.
IGP strengths turn into limits and weaknesses in the case of managing the vast Internet "playground"; just imagine the Shortest Path First (SPF) flooding, database maintenance and calculation burden, and complicated area topologies with Open Shortest Path First (OSPF); the Routing Information Protocol (RIP) hop-count limit would not get us very far either. However, RIP and BGP share a common approach: they are both distance-vector protocols. More precisely, BGP is referred to as a path-vector routing protocol because it transports a sequence of AS numbers (ASNs) that identifies the path that the network prefix has traversed, sometimes referred to as an AS tree or path.
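The loop-prevention role of the AS sequence can be illustrated in a few lines. This is a minimal sketch with hypothetical function names: a speaker rejects any route whose path already contains its own ASN and prepends its ASN before passing the route to an external neighbor.

```python
def accept_route(local_as, as_path):
    """Reject a route that has already traversed the local AS (a loop)."""
    return local_as not in as_path


def advertise_to_ebgp(local_as, as_path):
    """Prepend the local ASN before sending the route to an external peer."""
    return [local_as] + list(as_path)


if __name__ == "__main__":
    # AS 300 originates a prefix; AS 400 and then AS 500 pass it along.
    path = advertise_to_ebgp(300, [])
    path = advertise_to_ebgp(400, path)
    path = advertise_to_ebgp(500, path)
    print(path)
    # If the route came back to AS 300, it would be dropped.
    print(accept_route(300, path))
```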
The essential idea of the BGP designers was that it is practically impossible to coordinate interconnected realms without a protocol that has rich capabilities to reflect and transport policies and control ingress and egress flows in terms of transit. This is the reason why BGP strongly depends on regular expressions and powerful filtering and tagging capabilities. BGP explicitly does not propagate information about the internal structure of autonomous systems. Remember that the primary design goal of the Internet and its predecessors NSFNET, ARPANET, and MILNET was dynamic recovery from link or node failure. BGP has hooks to accommodate this requirement.
BGP itself intrinsically does not load balance. However, one can tune the egress and ingress behavior to some extent to achieve what is referred to as "pseudo" load/flow balancing later in this chapter. This usually includes cooperation of your peering AS, upstream or downstream provider, or carrier. This is the art of attracting certain traffic at a certain ingress point and directing traffic to certain egress gateways.
Flavors of BGPv4
BGPv4 supports two different types of peering sessions: IBGP (Internal BGP) is used within one and the same AS, and EBGP (External BGP) is used between neighboring autonomous systems.
IBGP is used widely to configure transit autonomous systems and BGP-based Multiprotocol Label Switching (MPLS) virtual private network (VPN) architectures. In the MPLS VPN context, IBGP with multiprotocol extensions is referred to as Multiprotocol BGP (MP-BGP). BGP is entirely a signaling protocol, even more than OSPF or IS-IS are; in a strict sense, it is incapable of delivering traffic within an AS solely by its own means. For this purpose, it relies on an underlying IGP and static or connected routes to actually forward traffic and resolve next hops.
EBGP is just the formal protocol used between neighboring (directly connected) autonomous systems to exchange aggregated routing information and to reflect macroscopic routing policies on an AS scale.
BGPv4 is a powerful and feature-rich protocol, but not necessarily complicated. To use it fully, you must understand regular expressions, classless interdomain routing (CIDR), and aggregation. Therefore, a complete discussion goes beyond the scope of almost any book. For this reason, the lab section of this chapter predominantly uses Zebra/Quagga and occasionally GateD for demonstration purposes. The BGP configuration of MRTd is almost equivalent, similar to the Cisco IOS architecture, and supports multiple BGP views; it also has the added benefit of being multithreaded. You will read more about BGP later in this chapter.
BGP Message Types
BGP systems use four different types of messages (see Table 10-1). During normal operation, only UPDATE and KEEPALIVE messages are exchanged. OPEN messages govern connection establishment with optional capabilities negotiation. NOTIFICATION messages gracefully terminate the BGP/TCP session in case of malformed information, errors, or manual session resets.
Message | Explanation |
---|---|
OPEN | Exchange connection parameters, session establishment, optional capabilities negotiation |
UPDATE | Routing updates/withdrawals/replacement routes |
NOTIFICATION | Handling error conditions and closing the BGP/TCP session |
KEEPALIVE | BGP speaker monitoring/heartbeat |
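All four message types share a fixed 19-byte header: a 16-byte marker (all ones), a 2-byte length, and a 1-byte type code. The following sketch, with hypothetical helper names, builds a KEEPALIVE (which consists of the header only) and parses the header back; it is an illustration of the layout rather than a complete message parser.

```python
import struct

MARKER = b"\xff" * 16
HEADER = struct.Struct("!16sHB")  # marker, length, type: 19 bytes in total
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
MAX_MESSAGE_LENGTH = 4096


def build_keepalive():
    """A KEEPALIVE is just the common header with type code 4."""
    return HEADER.pack(MARKER, HEADER.size, 4)


def parse_header(data):
    """Return (length, type name) for the message starting at data[0]."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 19 bytes for a BGP header")
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("marker is not all ones")
    if not HEADER.size <= length <= MAX_MESSAGE_LENGTH:
        raise ValueError("length field out of range")
    return length, MESSAGE_TYPES.get(msg_type, "UNKNOWN")


if __name__ == "__main__":
    print(parse_header(build_keepalive()))
```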
Capabilities Negotiation
As described in RFC 3392, "Capabilities Advertisement with BGP-4," capability negotiation was added to the BGPv4 protocol behavior to enable peers to negotiate certain additional capabilities, especially with the success of Multiprotocol BGP extensions. This is done via OPEN/NOTIFICATION messages, as demonstrated in Example 10-1 (highlighted text). When a BGP speaker that supports capability negotiation does not support a particular capability, it should respond with a notification error and a corresponding error subcode. This scheme was introduced to leave the UPDATE message mechanism untouched.
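As a rough illustration of what travels in the OPEN message, capabilities are carried as an optional parameter (parameter type 2) holding code/length/value triples. The sketch below encodes a multiprotocol capability (code 1, carrying AFI, a reserved byte, and SAFI) and a route-refresh capability (code 2, empty value), then decodes them again. The helper names are hypothetical, and the code handles only the optional-parameter bytes, not a full OPEN message.

```python
import struct

CAPABILITIES_PARAM = 2   # optional parameter type for capabilities
CAP_MULTIPROTOCOL = 1    # multiprotocol extensions
CAP_ROUTE_REFRESH = 2    # route refresh


def encode_capability(code, value=b""):
    """Wrap one capability (code, length, value) in an optional parameter."""
    capability = struct.pack("!BB", code, len(value)) + value
    return struct.pack("!BB", CAPABILITIES_PARAM, len(capability)) + capability


def multiprotocol(afi, safi):
    """Multiprotocol capability value: AFI (2 bytes), reserved, SAFI."""
    return encode_capability(CAP_MULTIPROTOCOL, struct.pack("!HBB", afi, 0, safi))


def decode_capabilities(params):
    """Return a list of (code, value) pairs found in the optional parameters."""
    capabilities = []
    offset = 0
    while offset < len(params):
        param_type, param_len = struct.unpack_from("!BB", params, offset)
        offset += 2
        end = offset + param_len
        if param_type == CAPABILITIES_PARAM:
            pos = offset
            while pos < end:
                code, cap_len = struct.unpack_from("!BB", params, pos)
                capabilities.append((code, params[pos + 2:pos + 2 + cap_len]))
                pos += 2 + cap_len
        offset = end
    return capabilities


if __name__ == "__main__":
    # AFI 1 / SAFI 1 is IPv4 unicast.
    params = multiprotocol(1, 1) + encode_capability(CAP_ROUTE_REFRESH)
    print(params.hex())
    print(decode_capabilities(params))
```

Because the capabilities live entirely inside OPEN, a peer that cannot honor one can refuse the session before any UPDATE is exchanged.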
Example 10-1. Packet Capture to Demonstrate Capabilities Negotiation
[root@callisto:~#] tethereal -i eth0 -V
Frame 5 (111 bytes on wire, 111 bytes captured)
Arrival Time: May 17, 2003 10:37:28.533785000
Time delta from previous packet: 0.000059000 seconds
Time relative to first packet: 0.000442000 seconds
Frame Number: 5
Packet Length: 111 bytes
Capture Length: 111 bytes
Ethernet II, Src: 00:60:08:6a:18:45, Dst: 00:10:5a:d7:93:60
Destination: 00:10:5a:d7:93:60 (3com_d7:93:60)
Source: 00:60:08:6a:18:45 (3Com_6a:18:45)
Type: IP (0x0800)
Internet Protocol, Src Addr: 192.168.14.3 (192.168.14.3), Dst Addr: 192.168.14.1 (192.168.14.1)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..0. = ECN-Capable Transport (ECT): 0
.... ...0 = ECN-CE: 0
Total Length: 97
Identification: 0x064f
Flags: 0x04
.1.. = Don't fragment: Set
..0. = More fragments: Not set
Fragment offset: 0
Time to live: 1
Protocol: TCP (0x06)
Header checksum: 0xd5f3 (correct)
Source: 192.168.14.3 (192.168.14.3)
Destination: 192.168.14.1 (192.168.14.1)
Transmission Control Protocol, Src Port: 34665 (34665), Dst Port: bgp (179), Seq:
Internet Exchange Points
The purpose of exchange points is to constitute a regional Internet network segment where ISPs can gather (peer) to exchange local traffic. This measure essentially reduces the number of AS hops that traffic is required to traverse to reach a particular destination prefix. In the old days, the lack of regional exchange points resulted in suboptimal routing via the nearest international network access point (NAP). In the worst case, traffic destined for the same metropolitan network was routed via another continent. Today, with a network of IXs in almost every metropolis of the world, local traffic can be kept local.
The number of AS hops and presence at major exchanges have become metrics often brought up by customers to rate the interconnection quality of carrier services. Modern exchange points provide route servers (and their own AS) to avoid scalability problems with any-to-any peerings; an IX participant therefore only has to set up a peering with the route server. You will read more about this instrument in the section "Route Server and Routing Registries."
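The scalability argument for route servers is simple arithmetic: an any-to-any mesh of n participants needs n(n-1)/2 sessions, whereas peering with a route server needs one session per participant and server. A short sketch with hypothetical function names:

```python
def full_mesh_sessions(participants):
    """Sessions needed when every participant peers with every other one."""
    return participants * (participants - 1) // 2


def route_server_sessions(participants, route_servers=1):
    """Sessions needed when each participant peers only with the route servers."""
    return participants * route_servers


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, full_mesh_sessions(n), route_server_sessions(n))
```

For 100 participants this is 4950 sessions versus 100, or 200 with two redundant route servers.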
Historically, exchange points have been known by different names, including the following:
- Metropolitan area exchanges (MAEs) (for example, MAE West)
- Network access points (NAPs)
- Commercial Internet exchanges (CIXs)
- Internet exchanges (IXs) (for example, LINX = London Internet Exchange)
In the United States, these exchange points are usually referred to as MAEs/CIXs; in Europe, IX is used more commonly. NAP (network access/attachment point) is a generic term for a location where one can hook up a BGP speaker to other Internet routers. Participants usually acquire dedicated point-to-point circuits to this exchange point (resembling simple network segments; in general, redundant Ethernet switches).
In general, there are two kinds of Internet exchange points: commercial and noncommercial. ISPs can use exchange points to exchange traffic at a national or international level. At the largest exchange points (usually U.S. MAEs), tier 1 and tier 2 carriers gather. These exchange points offer ATM or switched Ethernet ports up to 1 Gbps as an exchange medium.
NOTE
In the beginning, Fiber Distributed Data Interface (FDDI) rings were the exchange medium of choice.
An exchange point (network segment) often constitutes an AS by itself but does not necessarily have to. In 2001, the Euro-IX was founded, an organization that includes almost all IXs in Europe. Internet exchanges usually provide looking glasses and traffic statistics via web interfaces and unprivileged Telnet access to route servers.
Figure 10-2 shows an example of the Vienna Internet Exchange (VIX) web-based looking-glass interface; Figure 10-3 shows the result of this query. The corresponding traffic statistics of this exchange are shown in Figure 10-4. As an alternative, Telnet access to route servers is a convenient way of grasping the way the world sees your prefixes. This is demonstrated in Example 10-2.
Example 10-2. Exodus Route Server Telnet Access
[root@callisto~#] telnet route-server-eu.exodus.net
####################### route-server-eu.cw.net ########################
################## European Backbone Route Monitor ###################
166.63.210.40 London
166.63.210.41 London
This is the European view of the routes.
For a North American view, telnet to route-server.cw.net
For an Asian view, telnet to route-server-ap.cw.net
This router should be used to see if a route is in CW routing tables.
This router sets local-preference, MED, etc. for all routes equally.
This router should also be used to verify reachability from CW to other networks.
This router should _not_ be used to verify CW backbone routing policy.
The best path shown is the current best path _from this router_.
For questions about this route server, send email to hno@cw.net
####################### route-server-eu.cw.net #######################
route-server-eu> show ip bgp summary
BGP router identifier 212.62.0.13, local AS number 3561
BGP table version is 54207646, main routing table version 54207646
132963 network entries and 265900 paths using 21937959 bytes of memory
48287 BGP path attribute entries using 2317776 bytes of memory
759 BGP rrinfo entries using 28664 bytes of memory
20064 BGP AS-PATH entries using 488332 bytes of memory
663 BGP community entries using 25760 bytes of memory
Dampening enabled. 0 history paths, 0 dampened paths
BGP activity 987579/854616 prefixes, 3181892/2915992 paths
Neighbor        V    AS  MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
166.63.210.40   4  3561 12854997  121162 54207588    0    0 12w0d    132937
166.63.210.41   4  3561 12851384  121158 54207595    0    0 12w0d    132963
route-server-eu> show ip bgp
BGP table version is 54206927, local router ID is 212.62.0.13
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop        Metric LocPrf Weight Path
*>i3.0.0.0          166.63.210.40             100      0 7018 80 i
* i                 166.63.210.41             100      0 7018 80 i
*>i4.0.0.0          166.63.210.40             100      0 3356 i
* i                 166.63.210.41             100      0 3356 i
* i6.1.0.0/16       166.63.210.41             100      0 701 668 7170 1455 i
*>i                 166.63.210.40             100      0 701 668 7170 1455 i
* i6.2.0.0/22       166.63.210.41             100      0 701 668 7170 1455 i
*>i                 166.63.210.40             100      0 701 668 7170 1455 i
* i6.3.0.0/18       166.63.210.41             100      0 701 668 7170 1455 i
*>i                 166.63.210.40             100      0 701 668 7170 1455 i
* i6.4.0.0/16       166.63.210.41             100      0 701 668 7170 1455 i
*>i                 166.63.210.40             100      0 701 668 7170 1455 i
* i6.5.0.0/19       166.63.210.41             100      0 701 668 7170 1455 i
*>i                 166.63.210.40             100      0 701 668 7170 1455 i
* i6.8.0.0/20       166.63.210.41             100      0 701 668 7170 1455 i
*>i                 166.63.210.40             100      0 701 668 7170 1455 i
* i6.9.0.0/20       166.63.210.41             100      0 701 668 7170 1455 i
*>i                 166.63.210.40             100      0 701 668 7170 1455 i
* i6.10.0.0/15      166.63.210.41             100      0 701 668 7170 1455 i
*>i                 166.63.210.40             100      0 701 668 7170 1455 i
* i6.14.0.0/15      166.63.210.41             100      0 701 668 7170 1455 i
*>i                 166.63.210.40             100      0 701 668 7170 1455 i
*>i12.0.0.0         166.63.210.40             100      0 7018 i
* i                 166.63.210.41             100      0 7018 i
* i12.0.19.0/24     166.63.210.41             100      0 27487 i
*>i                 166.63.210.40             100      0 27487 i
*>i12.0.48.0/20     166.63.210.40             100      0 209 1742 1742 i
* i                 166.63.210.41             100      0 209 1742 1742 i
...
For further information on registries, look at the following websites:
EBGP and EBGP Multihop
EBGP exchanges routing information between adjacent autonomous systems, whether they are peers (equal standing), upstreams (providers/carriers), or downstreams (customers/subscribers). This exchange occurs via network announcements, and the corresponding routes are referred to as prefixes or aggregates.
The routing software compares the ASN following the remote-as statement with the local ASN to decide whether the session is an external (EBGP) or an internal (IBGP) connection. EBGP neighbors need to be adjacent (directly connected) by default; for IBGP, reachability is left to the underlying IGP. If the EBGP neighbor is several hops away, the ebgp-multihop neighbor command lifts this restriction. This is rather common because EBGP peering sessions are often configured loopback to loopback (recommended), which often results in at least a three-hop distance and improved availability. The ebgp-multihop statement is required on both neighbors.
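The decision described above can be sketched in two small functions. The names are hypothetical, and the TTL handling is a simplifying assumption: a directly connected EBGP session is modeled as sending packets with a TTL of 1, and an ebgp-multihop value is modeled as the TTL to use instead.

```python
def session_type(local_as, remote_as):
    """Same ASN on both sides means IBGP; different ASNs mean EBGP."""
    return "IBGP" if local_as == remote_as else "EBGP"


def ebgp_ttl(multihop=None):
    """TTL for an EBGP session: 1 unless an ebgp-multihop value is configured."""
    return 1 if multihop is None else multihop


if __name__ == "__main__":
    # Values borrowed from the loopback-to-loopback setup between AS 300 and AS 400.
    print(session_type(300, 400), ebgp_ttl(3))
    print(session_type(300, 300))
```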
As you will see, this setup is well suited for load balancing over two EBGP links; in fact, the underlying IGP or static routes perform the load balancing as long as Equal-Cost Multi-Path (ECMP) is supported by the network operating system. (See Figure 10-5 and Example 10-3 highlighted text.) Keep in mind that BGP "pseudo" load balancing at geographically distant egress/ingress points is a completely different and tricky matter (even an "art"). Example 10-3 also demonstrates the use of address aggregation for the example aggregate 192.168.0.0/22.
Example 10-3. EBGP Load-Sharing and EBGP-Multihop Setup
stanley-bgpd# show running-config
...
!
router bgp 300
 bgp router-id 192.168.40.245
 bgp dampening
 no synchronization
 network 192.168.40.0/24
 neighbor 192.168.3.245 remote-as 400
 neighbor 192.168.3.245 description AS-400
 neighbor 192.168.3.245 ebgp-multihop 3
 neighbor 192.168.3.245 update-source lo1
 neighbor 192.168.3.245 soft-reconfiguration inbound
 neighbor 192.168.3.245 next-hop-self
!
ip route 192.168.0.0/22 192.168.40.250
ip route 192.168.0.0/22 192.168.40.254
...
oliver-bgpd# show running-config
...
!
router bgp 400
 bgp router-id 192.168.3.245
 bgp dampening
 no synchronization
 network 192.168.0.0/22
 neighbor 192.168.40.245 remote-as 300
 neighbor 192.168.40.245 description AS-300
 neighbor 192.168.40.245 ebgp-multihop 3
 neighbor 192.168.40.245 update-source lo1
 neighbor 192.168.40.245 soft-reconfiguration inbound
 neighbor 192.168.40.245 next-hop-self
!
ip route 192.168.40.0/24 192.168.40.249
ip route 192.168.40.0/24 192.168.40.253
...
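The aggregate 192.168.0.0/22 announced by AS 400 covers four contiguous /24 networks. As a quick check, Python's standard ipaddress module can collapse the specifics into the aggregate; the list of /24 networks is an assumption about what sits behind the aggregate, since the example does not enumerate them.

```python
import ipaddress

# Assumed more-specific networks behind the aggregate (not listed in the example).
specifics = [ipaddress.ip_network(f"192.168.{i}.0/24") for i in range(4)]

# collapse_addresses merges adjacent networks into the shortest covering set.
aggregate = list(ipaddress.collapse_addresses(specifics))

# 192.168.40.0/24 (announced by AS 300) lies outside the aggregate.
other = ipaddress.ip_network("192.168.40.0/24")

if __name__ == "__main__":
    print(aggregate)
    print(other.subnet_of(aggregate[0]))
```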
IBGP Full Mesh, Route Reflectors, and Confederation
In contrast to most fellow authors, I would like to start the BGP discussion with Internal BGP (IBGP). This and all subsequent labs use OSPF as the underlying IGP to provide the necessary connectivity that BGP requires for both session establishment and proper signaling operation. We will start with a manually configured full-mesh IBGP setup and gradually move to a more elegant configuration with the use of peer groups and route reflectors. Confederation as an alternative approach to overcome the scalability limits of full-meshed IBGP is demonstrated as well. In the course of the discussion, we will look at the finite state machine (FSM) and examples of the OPEN and UPDATE messages passed between BGP speakers.
The BGP specifications dictate (for loop-prevention purposes) that a BGP speaker must not advertise prefixes heard from another IBGP speaker to a third IBGP speaker. Because of this convention, you must configure a full mesh between all IBGP speakers within an AS. Obviously, this approach does not scale. Fortunately, BGPv4 provides two ways to approach this problem, route reflectors and confederation, in various possible setups and combinations:
- Single route reflector (not recommended, single point of failure)
- Clustered route reflectors
- Redundant route reflectors
- Confederation ("EIBGP")
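The IBGP advertisement rule, and the way a route reflector relaxes it, can be captured in a short predicate. This is a simplified sketch with hypothetical peer labels ("ebgp", "ibgp" for a non-client internal peer, "ibgp_client" for a reflector client); it ignores details such as not sending a route back to the peer it came from.

```python
def should_advertise(learned_from, send_to, is_route_reflector=False):
    """Decide whether a best path may be passed on to a given kind of peer.

    Peer kinds are the illustrative labels "ebgp", "ibgp", and "ibgp_client".
    """
    # Anything learned from or sent to an external peer is allowed.
    if learned_from == "ebgp" or send_to == "ebgp":
        return True
    # Plain IBGP speaker: never relay IBGP-learned routes to IBGP peers.
    if not is_route_reflector:
        return False
    # Route reflector: reflect whenever a client is on either side.
    return learned_from == "ibgp_client" or send_to == "ibgp_client"


if __name__ == "__main__":
    print(should_advertise("ibgp", "ibgp"))
    print(should_advertise("ibgp_client", "ibgp", is_route_reflector=True))
    print(should_advertise("ibgp", "ibgp", is_route_reflector=True))
```

The first case is the reason for the full mesh; the second shows how a reflector lets clients peer with the reflector alone; the third shows that non-client peers of a reflector still need to be meshed among themselves.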