Mitigating the Detrimental Effects of BGP with MPLS Restart
The BGP with MPLS graceful restart mechanism[5] extends the BGP graceful restart procedure to MPLS. In essence, during the establishment of the pre-restart BGP session, an LSR informs its peers about its capability to preserve the MPLS forwarding state across the BGP restart. When the LSR restarts, its peers retain the labels learned from the restarting LSR and mark them stale, but continue to use them for forwarding. After the restart, the restarting LSR exchanges label information with its peers, which allows the MPLS forwarding state to be refreshed. As a result, the forwarding-plane disruption is avoided, as shown in Figure 7-8.
Figure 7-8. BGP with MPLS Graceful Restart Behavior

BGP with MPLS Graceful Restart Mechanism
BGP with MPLS graceful restart depends on the use of the graceful restart capability.[6] In terms of protocol behavior, the main difference between BGP graceful restart and BGP with MPLS graceful restart lies in the processing of the SAFI field. By examining the SAFI field, a receiver determines whether the sender is capable of preserving an IP or MPLS forwarding state.
For each address family, a separate <AFI, SAFI> pair is required to advertise IP and MPLS capability. For example, if an LSR is capable of preserving both a BGP-derived IPv4 and an MPLS forwarding state, it needs to include two <AFI, SAFI> fields. In short, an LSR can express its capability to preserve IP and MPLS forwarding states across the BGP restart by including the appropriate <AFI, SAFI> fields.
For now, the discussion is restricted to MPLS. By including the graceful restart capability in the Open message and setting the <AFI, SAFI> fields, the LSR indicates the following: "For each address family listed in the graceful restart capability, I am capable of preserving the MPLS forwarding state across the BGP restart. When my BGP control plane restarts, please do not withdraw labels for these address families, but continue to use those labels for forwarding data packets as normal. After restarting, if I am not able to reestablish the session within the allowed time (restart time), or if I explicitly indicate my inability to preserve the MPLS forwarding state for the previously advertised address families across the restart, please immediately delete the concerned MPLS forwarding state that you have retained across the restart."
As a result of this message exchange, when BGP in the LSR restarts, its peers retain labels exchanged with the restarting LSR. This behavior allows the LSR to restart and reestablish sessions without any disruption. After a session has been reestablished, the LSRs exchange label information, which enables both the restarting LSR and its peers to refresh their stale MPLS forwarding state. From the interactions between a restarting LSR and its peers, it follows that the BGP with MPLS graceful restart mechanism requires support from peers. The details of protocol mechanics are described in the paragraphs that follow.
A BGP speaker advertises its capability to preserve an IP and MPLS forwarding state for a given network layer protocol through the AFI and SAFI fields in the graceful restart capability. For example, a value of 1 in the AFI field indicates IPv4, a value of 2 indicates IPv6, and so forth.[7]
Similarly, a value of 1 in the SAFI field indicates a unicast NLRI, and a value of 4 indicates an NLRI with MPLS labels. (See Table 7-1.) For example, to advertise the capability to preserve the forwarding state for unicast IPv4 prefixes, a BGP speaker sets the AFI to 1 and the SAFI to 1. Similarly, to express the capability to preserve the MPLS forwarding state for IPv4 prefixes, the AFI field is set to 1 and the SAFI is set to 4. After a restart, the restarting speaker sets the Forwarding State (F) bit in an <AFI, SAFI> pair to indicate whether the MPLS forwarding state has been preserved for that address family across the BGP restart. The BGP graceful restart capability also contains other fields and flags, such as the restart flag (which indicates whether BGP has restarted) and the restart time (which indicates the estimated time the restarting speaker needs to reestablish the BGP session).
Table 7-1. Subsequent Address Family Identifier Field Values[8]
SAFI Value    Description
0             Reserved
1             NLRI used for unicast forwarding
2             NLRI used for multicast forwarding
3             NLRI used for both unicast and multicast forwarding
4             NLRI with MPLS labels
5-127         Unassigned
128-255       Private use
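As a rough illustration of how the <AFI, SAFI> pairs combine with the restart flag, the restart time, and the F bit, the following Python sketch encodes a graceful restart capability. The wire layout used here (capability code 64, a 4-bit flags field followed by a 12-bit restart time, then one 4-byte entry per address family) is taken from the BGP graceful restart specification (RFC 4724) rather than from this chapter, and the function and constant names are hypothetical.

```python
import struct

GR_CAPABILITY_CODE = 64  # capability code for graceful restart (RFC 4724)
R_BIT = 0x8000           # restart flag: top bit of the 16-bit flags/time word
F_BIT = 0x80             # forwarding state flag: top bit of the per-family flags

AFI_IPV4 = 1             # AFI value for IPv4
SAFI_UNICAST = 1         # Table 7-1: NLRI used for unicast forwarding
SAFI_MPLS_LABEL = 4      # Table 7-1: NLRI with MPLS labels


def encode_gr_capability(restart_time, restarted, families):
    """Encode the capability as bytes (illustrative helper, not a real API).

    families is a list of (afi, safi, forwarding_preserved) tuples. An empty
    list means "I support the procedure but preserve no forwarding state".
    """
    if not 0 <= restart_time < 4096:
        raise ValueError("restart time must fit in 12 bits")
    word = restart_time | (R_BIT if restarted else 0)
    body = struct.pack("!H", word)
    for afi, safi, preserved in families:
        body += struct.pack("!HBB", afi, safi, F_BIT if preserved else 0)
    return struct.pack("!BB", GR_CAPABILITY_CODE, len(body)) + body
```

For example, a restarted LSR that preserved both its IPv4 unicast and its IPv4 MPLS forwarding state would pass `[(AFI_IPV4, SAFI_UNICAST, True), (AFI_IPV4, SAFI_MPLS_LABEL, True)]`, whereas an LSR that cannot preserve any forwarding state would pass an empty list.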
When the graceful restart capability is included but no <AFI, SAFI> fields are present, the LSR supports the graceful restart procedure but is incapable of preserving the IP and MPLS forwarding state across the restart. An LSR that supports the BGP with MPLS graceful restart procedure but is NSF-incapable can still act as a helper and thereby reduce the negative impact of a neighbor's BGP restart. It does not, however, diminish the harmful effects on the MPLS forwarding plane caused by its own BGP restart. This is discussed later in the chapter. The following sections briefly describe BGP with MPLS restart protocol behavior for the restarting LSR (the LSR whose BGP has restarted) and the helper LSR (a peer of the restarting LSR).
Behavior of a Restarting LSR
During the initial BGP session establishment, the LSR advertises its capability to preserve the MPLS forwarding state by setting the <AFI, SAFI> fields in the graceful restart capability. In addition, the LSR advertises the estimated time it will take to reestablish the session after a BGP restart. The following discussion assumes that the LSR not only supports the BGP with MPLS graceful restart extensions but is also capable of preserving IP and MPLS forwarding states across the restart.
After having informed its neighbors about graceful restart capabilities, when the BGP control plane in an LSR restarts, the LSR continues to forward data traffic using the preserved MPLS forwarding state (LFIB). After restarting, for each address family for which the restarting LSR had expressed its capability to preserve the MPLS forwarding state before the restart, the LSR checks whether it was actually able to preserve the concerned forwarding state across the restart.
An LSR needs to preserve either MPLS-related forwarding state or both IP- and MPLS-related forwarding state, depending on whether it is an edge or a transit LSR. For the sake of the following discussion, assume that the restarting LSR is an edge LSR. Suppose that the restarting LSR has preserved its MPLS forwarding state across the restart. After restarting, it marks the BGP-derived MPLS forwarding state as stale. It continues, however, to use the stale information for forwarding data traffic. To avoid retaining the stale forwarding information indefinitely, the LSR starts the stale timer. The restarting LSR attempts to reestablish BGP sessions with its peers and exchange the BGP graceful restart capability. The restarting LSR sets the R bit (value 1) to inform peers about its BGP control-plane restart. In addition, for each <AFI, SAFI> pair (with SAFI = 4 indicating NLRI with label information) for which the LSR was able to preserve the MPLS forwarding state across the restart, it sets the F bit (value 1) to indicate that it has preserved the concerned MPLS forwarding state.
After the BGP session has been reestablished, the restarting LSR receives Update messages from peers and rebuilds its Adj-RIBs-In. Because route selection must wait until the RIB is rebuilt completely, for each address family the restarting LSR defers its route-selection process until it has received the End-Of-RIB marker from all peers. The exception is peers that have themselves restarted or that are graceful restart-incapable: the restarting LSR does not need to wait for End-Of-RIB markers from them, because such peers are either not expected to send or are incapable of sending the marker. To avoid waiting for the arrival of End-Of-RIB markers and deferring the route-selection process indefinitely, the restarting router uses a (configurable) delay timer. When the delay timer expires, the route-selection process is run even if not all the End-Of-RIB markers have been received. After the restarting LSR has selected the best routes, it updates its Loc-RIB, FIB, and Adj-RIBs-Out and advertises routes and the locally assigned (incoming) labels to its peers.
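The deferral rule just described can be sketched as a single predicate. This is a minimal illustration only: the peer representation (a dictionary with `gr_capable` and `restarting` flags) and the function name are assumptions made for the example, not part of any BGP implementation.

```python
def ready_for_route_selection(peers, eor_received, now, deadline):
    """Decide whether the restarting LSR may run route selection.

    peers        -- dict: peer name -> {"gr_capable": bool, "restarting": bool}
    eor_received -- set of peer names whose End-Of-RIB marker has arrived
    now/deadline -- clock reading and expiry time of the delay timer
    """
    # The delay timer bounds the wait, whatever markers are still missing.
    if now >= deadline:
        return True
    # Only peers that are graceful restart-capable and have not themselves
    # restarted are expected to send an End-Of-RIB marker.
    expected = {
        name
        for name, peer in peers.items()
        if peer["gr_capable"] and not peer["restarting"]
    }
    return expected <= eor_received
```

The predicate would be evaluated per address family, each time a marker arrives and when the delay timer fires.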
As discussed previously, depending on the role of the restarting LSR as an edge or transit router, the MPLS forwarding state may consist of <FEC, out label, next hop> for an ingress LSR, <in label, out label, next hop> for a transit LSR, or <in label, FEC, next hop> for an egress LSR. Edge LSRs perform the role of both ingress and egress LSRs and thus contain forwarding state entries in both the first and last format.
After restart, the restarting LSR relearns the outgoing label-to-FEC (NLRI in the Update message) mapping from the nonrestarting LSR and then updates or replaces the corresponding stale entry in the LFIB. Regarding the FEC-to-incoming label mapping, the restarting LSR has two options: re-advertise the same label that was preserved across the restart, or allocate a new local label and then advertise the new local label-to-FEC mapping. Each approach has advantages and disadvantages. The first approach has the advantage that FEC-to-incoming label mappings are not changed after the restart, but it consumes more memory. In contrast, the second approach obviates the need for preserving and managing the FEC-to-incoming label mappings across the BGP restart, but it consumes more labels. In particular, if the restarting LSR does not have at least as many unallocated as allocated labels during the restart, the second approach could lead to label depletion. If the restarting LSR chooses the second option, for a short period it may have two incoming labels for an FEC in the LFIB, namely, the stale one and the new one. Even though there are two incoming label mappings for an FEC, the FEC still has a single outgoing label and next hop. Hence, there is no danger of incorrect forwarding. On completion of the restart procedure, the pre-restart stale FEC-to-incoming label mappings are deleted (see Figure 7-9).
Figure 7-9. MPLS Forwarding State Across BGP Graceful Restart

In summary, the restarting LSR relearns the FEC-to-outgoing label mappings from downstream peers. For the incoming label mappings, the restarting LSR either reclaims the pre-restart incoming label mappings (marked stale) from the preserved forwarding state or allocates new incoming label mappings. Suppose a unidirectional LSP passes through the restarting LSR B, with LSR A and LSR C acting as the upstream and the downstream LSRs for the LSP. Suppose the concerned LSP has an incoming and outgoing label pair of (L1, L2) in LSR A, (L2, L3) in LSR B, and (L3, L4) in LSR C (see Figure 7-10). In this example scenario, LSR B is the restarting LSR, whereas LSR A and LSR C are helpers. After restart, LSR B relearns L3 from LSR C and re-advertises either L2 or a new label to LSR A. LSR A receives either L2 or a new label and updates the out label for the LSP. Similar concepts apply for LSPs in the other direction.
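To make the two incoming-label options concrete, here is a small Python model of LSR B's LFIB during recovery. It is a sketch under stated assumptions: the numeric labels (20 for L2, 30 for L3, 21 for a newly allocated label), the FEC string, and all class and function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LfibEntry:
    fec: str
    out_label: int
    next_hop: str
    stale: bool = False


def mark_all_stale(lfib):
    """After the restart, every preserved entry is marked stale."""
    for entry in lfib.values():
        entry.stale = True


def refresh(lfib, fec, out_label, next_hop, new_in_label=None):
    """Apply a mapping relearned from downstream; return the label to
    advertise upstream. lfib maps incoming label -> LfibEntry."""
    stale_in = [lbl for lbl, e in lfib.items() if e.fec == fec and e.stale]
    for lbl in stale_in:
        # Stale entries keep forwarding, now toward the relearned out label.
        lfib[lbl].out_label = out_label
        lfib[lbl].next_hop = next_hop
    if new_in_label is None:
        if not stale_in:
            raise ValueError("no preserved label to reuse for this FEC")
        lfib[stale_in[0]].stale = False      # option 1: reclaim the old label
        return stale_in[0]
    lfib[new_in_label] = LfibEntry(fec, out_label, next_hop)  # option 2
    return new_in_label


def delete_stale(lfib):
    """On completion of the restart procedure, drop what is still stale."""
    for lbl in [lbl for lbl, e in lfib.items() if e.stale]:
        del lfib[lbl]
```

With option 2 the LFIB briefly holds both incoming labels (20 and 21) for the FEC, each pointing at the same outgoing label and next hop, until `delete_stale` removes the old one.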
Figure 7-10. Example of Label-Recovery Procedure

On completion of the initial update for an address family, the restarting LSR sends an End-Of-RIB marker to all its peers. As mentioned earlier, upon restarting the LSR marks all preserved MPLS forwarding state as stale and starts a stale timer. Because the RIB/LIB has been rebuilt and the decision process has run, all valid forwarding states should by now have been updated in the FIB and the LFIB. If some LFIB entries are still marked as stale, they correspond to invalid forwarding states that need to be removed. (A similar procedure applies to the IP forwarding state, but is not included here.) Therefore, when the stale timer expires, any remaining stale LFIB entries are deleted.
Behavior of Helper LSRs
As soon as an LSR detects the failure of a BGP session to a graceful restart neighbor, it starts the restart timer. For each address family for which the restarting peer had expressed its capability to preserve the MPLS forwarding state across the BGP restart, the helper LSR retains all FEC-to-label mappings learned from the restarting LSR. The helper LSR marks the concerned MPLS forwarding state as stale and starts a stale timer. While the restart timer is running, the helper LSR waits for the session to reestablish and continues to use the stale forwarding information. If the restarting peer does not reestablish the BGP session before the expiration of the restart timer, the helper LSR immediately deletes the stale MPLS forwarding state. However, if the restarting LSR manages to reestablish a BGP session on time, the helper LSR cancels the restart timer and processes the newly received graceful restart capability.
For each address family with the F bit set, the helper LSR continues to use the stale MPLS forwarding state. However, if any of these conditions exist, the helper LSR immediately removes the stale MPLS forwarding state:
• The F bit for an address family is not set.
• The address family is absent in the newly received graceful restart capability.
• The graceful restart capability is not present in the newly received Open message.
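The three removal conditions above reduce to a short check that a helper could run when the session comes back up. The following Python sketch assumes the received capability has already been parsed into a dictionary keyed by (AFI, SAFI); that representation and the function name are illustrative assumptions.

```python
def families_to_flush(retained, gr_capability):
    """Return the address families whose stale state must be removed now.

    retained      -- set of (afi, safi) pairs for which stale MPLS
                     forwarding state is being held
    gr_capability -- None if the new Open message carried no graceful
                     restart capability; otherwise a dict mapping
                     (afi, safi) -> True/False for the F bit
    """
    if gr_capability is None:
        # Capability absent: nothing was preserved, flush everything.
        return set(retained)
    # Flush a family if it is missing from the capability or its F bit is 0.
    return {fam for fam in retained if not gr_capability.get(fam, False)}
```

Families that are not returned keep their stale entries in use until they are refreshed by the updates that follow.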
To allow the restarting LSR to rebuild the RIB and recover FEC-to-out label mappings, the helper LSR advertises BGP routes and the locally assigned labels to the restarting LSR. When the update for an address family is complete, the helper LSR sends an End-Of-RIB marker. After having received the updates and run the decision process, the restarting LSR in turn advertises new routes and the associated label information back to the helper LSR. After receiving updates from the restarting LSR, the helper LSR replaces or updates the stale routes and labels (FEC-to-out label mappings). On receipt of the End-Of-RIB marker from the restarting LSR for an address family, the helper LSR runs the decision process and updates the routes for that address family. At this stage, all valid routes and the associated labels for the concerned address family should have been updated. Therefore, when the End-Of-RIB marker is received for an address family, labels that are still marked as stale are immediately removed. To avoid waiting indefinitely for the arrival of all the End-Of-RIB markers while retaining stale MPLS forwarding state, the LSR deletes all stale MPLS forwarding entries when the stale timer expires.
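The helper-side lifecycle of a retained label (marked stale when the session fails, refreshed by a new update, swept on End-Of-RIB or on stale timer expiry) can be modeled in a few lines of Python. The class below is a hypothetical sketch for illustration, with timers represented only as method calls that a real implementation would trigger from its timer machinery.

```python
class HelperLabels:
    """Labels a helper LSR learned from one peer, keyed by (family, fec)."""

    def __init__(self):
        self.labels = {}  # (family, fec) -> [out_label, stale]

    def learn(self, family, fec, label):
        # A fresh advertisement installs or replaces the mapping and
        # clears any stale mark.
        self.labels[(family, fec)] = [label, False]

    def session_down(self, families):
        # Retain mappings for families the peer said it can preserve,
        # but mark them stale.
        for (family, _), record in self.labels.items():
            if family in families:
                record[1] = True

    def end_of_rib(self, family):
        # Whatever is still stale in this family was not re-advertised.
        self._drop_stale(lambda fam: fam == family)

    def stale_timer_expired(self):
        # Safety net: stop waiting for markers that never arrive.
        self._drop_stale(lambda fam: True)

    def _drop_stale(self, matches):
        self.labels = {
            key: record
            for key, record in self.labels.items()
            if not (record[1] and matches(key[0]))
        }
```

Expiry of the restart timer before the session is reestablished would be handled in the same way as `stale_timer_expired`, because in both cases all remaining stale entries are deleted.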



