Fault Management and Recovery for Carrier Ethernet
An important element of service lifecycle management is the ability of network elements, most importantly the demarcation devices at the service hand-off points, to identify, locate and report service-affecting problems. Fault detection and isolation, together with remote test and repair functions, are crucial for containing and correcting problems before they escalate. When a service outage is reported, a suite of tests is run to localize the fault remotely before a technician is dispatched. This reduces MTTR (mean time to repair) and minimizes the impact on users, while lowering operating expenses by eliminating unnecessary (and expensive) truck rolls and ensuring that technicians are sent to the right location.
Fault Detection, Isolation and Notification
Periodic IEEE 802.1ag/ITU-T Y.1731 continuity check (CC) messages are sent at pre-set intervals to monitor link and service connectivity as part of preventive maintenance. When a MEP (maintenance end point) stops receiving CC messages from its peer and declares loss of continuity, it sets the RDI (remote defect indication) flag in the CC messages it transmits, so that the far end also learns of the connectivity problem, and it raises an alarm toward the user and the NMS. The same event can also be used to trigger an uplink connection switchover.

The specific failure point can be located by sending a linktrace request, which tracks the path hop by hop and exposes non-responsive MIPs (maintenance intermediate points). Because the linktrace results list the nodes that did respond, operators can map the service path and dispatch a technician to the right location for a quick repair. Alternatively, the operator can isolate the problem with the 802.1ag loopback test, sending loopback messages to successive intermediate points along the path until the fault is found.

In addition to the RDI and AIS (alarm indication signal) fault notification tools, the demarcation device can automatically shut down user ports when it detects error conditions on the network side. This alerts the customer equipment at both ends of the link that an alternative route is required.
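The detection and localization steps above can be sketched in a few lines. This is a simplified model, not a CFM implementation: the `Mep` class and `locate_fault` helper are hypothetical names, the 3.5-interval loss-of-continuity threshold follows the usual 802.1ag/Y.1731 rule, and `locate_fault` assumes the expected hop sequence is already known (for example, from an earlier linktrace taken while the service was healthy).

```python
from dataclasses import dataclass
from typing import Collection, Optional, Sequence, Tuple

# Loss of continuity is declared when no CCM arrives within 3.5 CCM intervals.
LOC_MULTIPLIER = 3.5


@dataclass
class Mep:
    """Minimal model of one MEP's receive side (hypothetical helper)."""
    mep_id: int
    ccm_interval: float        # seconds, e.g. 0.00333, 0.01, 0.1, 1.0
    last_ccm_rx: float = 0.0   # time the last CCM from the peer arrived
    rdi: bool = False          # value of the RDI flag in the CCMs we transmit

    def receive_ccm(self, now: float) -> None:
        self.last_ccm_rx = now

    def check(self, now: float) -> bool:
        """Declare (or clear) loss of continuity and set RDI accordingly."""
        loc = (now - self.last_ccm_rx) > LOC_MULTIPLIER * self.ccm_interval
        self.rdi = loc  # RDI tells the remote peer that this MEP sees a defect
        return loc


def locate_fault(
    expected_path: Sequence[str], responders: Collection[str]
) -> Optional[Tuple[Optional[str], str]]:
    """Return (last responsive node, first silent node), or None if all replied.

    A first element of None means the fault lies between the originating MEP
    and the first hop on the path.
    """
    last_ok: Optional[str] = None
    for node in expected_path:
        if node not in responders:
            return last_ok, node
        last_ok = node
    return None
```

For example, if MIP-1 and MIP-2 answer a linktrace but MIP-3 does not, `locate_fault(["MIP-1", "MIP-2", "MIP-3", "MEP-B"], {"MIP-1", "MIP-2"})` points the technician at the segment between MIP-2 and MIP-3.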
Diagnostic Loopbacks on the Carrier Ethernet Network
Remote diagnostic loopback tests can be performed in service to analyze connectivity across EVCs without taking the customer link down or affecting traffic that is not under test. For example, a customer service representative (CSR) working a trouble ticket for a specific EVC can run an end-to-end flow loopback test between the demarcation devices at headquarters and branch B. The loopback is selective, classifying traffic by criteria such as VLAN ID, class of service (P-bit), and source or destination MAC or IP address. This allows the loopback messages to traverse multiple hops, including intermediate switches or bridges, without disrupting the flows that are not being tested.
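A selective loopback classifier of this kind can be sketched as a match on header fields, where any field left unset acts as a wildcard. The `LoopbackKey` and `matches` names are illustrative assumptions, and the sketch covers only the Layer 2 criteria (VLAN ID, P-bit, MAC addresses) for a frame carrying at most one 802.1Q tag; IP-based keys would extend it in the same way.

```python
from dataclasses import dataclass
from typing import Optional

TPID_8021Q = 0x8100


@dataclass(frozen=True)
class LoopbackKey:
    """Classification key for one loopback flow; None means 'any value'."""
    vid: Optional[int] = None
    pcp: Optional[int] = None        # P-bit (class of service), 0..7
    src_mac: Optional[bytes] = None
    dst_mac: Optional[bytes] = None


def parse_header(frame: bytes) -> dict:
    """Extract MACs and, if present, the 802.1Q VID and PCP of a frame."""
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    hdr = {"dst_mac": frame[0:6], "src_mac": frame[6:12], "vid": None, "pcp": None}
    tpid = int.from_bytes(frame[12:14], "big")
    if tpid == TPID_8021Q and len(frame) >= 18:
        tci = int.from_bytes(frame[14:16], "big")
        hdr["pcp"] = tci >> 13        # top 3 bits of the tag control field
        hdr["vid"] = tci & 0x0FFF     # low 12 bits
    return hdr


def matches(key: LoopbackKey, frame: bytes) -> bool:
    """True if every field set in the key equals the frame's field."""
    hdr = parse_header(frame)
    for field in ("vid", "pcp", "src_mac", "dst_mac"):
        wanted = getattr(key, field)
        if wanted is not None and hdr[field] != wanted:
            return False
    return True
```

Frames that do not match the key are simply forwarded as usual, which is what keeps untested traffic undisturbed.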
The receiving device swaps the source and destination MAC or IP addresses of incoming packets before looping them back, so that the looped frames are forwarded back toward the originator and do not corrupt the MAC learning tables of the switches or bridges along the path.
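The address swap can be sketched as a byte-level operation on the frame. This is an illustrative assumption of how a loopback point might rewrite frames, not a description of a specific product: `loop_back` is a hypothetical helper that always swaps the MAC addresses and, optionally, the IPv4 addresses, skipping a single 802.1Q tag if one is present.

```python
from typing import Tuple

TPID_8021Q = 0x8100
ETHERTYPE_IPV4 = 0x0800


def l3_offset(frame: bytes) -> Tuple[int, int]:
    """Return (EtherType, offset of the L3 header), skipping one 802.1Q tag."""
    ethertype = int.from_bytes(frame[12:14], "big")
    offset = 14
    if ethertype == TPID_8021Q:
        ethertype = int.from_bytes(frame[16:18], "big")
        offset = 18
    return ethertype, offset


def loop_back(frame: bytes, swap_ip: bool = False) -> bytes:
    """Return a copy of the frame with source and destination addresses swapped."""
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    out = bytearray(frame)
    # Swap destination (bytes 0-5) and source (bytes 6-11) MAC addresses.
    out[0:6], out[6:12] = frame[6:12], frame[0:6]
    if swap_ip:
        ethertype, off = l3_offset(frame)
        if ethertype == ETHERTYPE_IPV4 and len(frame) >= off + 20:
            src, dst = off + 12, off + 16   # IPv4 source/destination fields
            out[src:src + 4], out[dst:dst + 4] = frame[dst:dst + 4], frame[src:src + 4]
    return bytes(out)
```

Swapping the two IPv4 address fields only reorders 16-bit words in the header, so the one's-complement header checksum stays valid and does not need to be recomputed.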
Because diagnostic loopbacks are applied selectively, network operators can allocate a small portion of EVC bandwidth to constant, “Always On” loopbacks, so that Ethernet circuit validation runs without any need to signal the remote device. Besides simplifying operations, this also avoids interoperability issues between the demarcation device and Ethernet test sets.
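The text does not say how the “small portion of EVC bandwidth” is enforced. One common way to cap a traffic class is a token-bucket policer, sketched below as an assumption; the `TokenBucket` class and its parameters are hypothetical and chosen only to illustrate reserving, say, 1 Mb/s of a 100 Mb/s EVC for Always On loopback frames.

```python
class TokenBucket:
    """Token-bucket policer: rate in bits/s, burst size in bytes (hypothetical)."""

    def __init__(self, rate_bps: float, burst_bytes: int) -> None:
        if rate_bps <= 0 or burst_bytes <= 0:
            raise ValueError("rate and burst must be positive")
        self.rate = rate_bps / 8.0          # refill rate in bytes per second
        self.capacity = float(burst_bytes)  # maximum tokens the bucket holds
        self.tokens = float(burst_bytes)    # start with a full bucket
        self.last = 0.0                     # time of the previous update

    def allow(self, frame_len: int, now: float) -> bool:
        """Refill for the elapsed time, then admit the frame if tokens suffice."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if frame_len <= self.tokens:
            self.tokens -= frame_len
            return True
        return False  # frame exceeds the loopback allocation and is dropped
```

With `TokenBucket(1_000_000, 3000)`, two back-to-back 1500-byte loopback frames are admitted, a third is dropped, and another is admitted once enough time has passed to refill 1500 bytes, so test traffic cannot crowd out customer traffic on the EVC.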
Furthermore, because a single demarcation device supports multiple simultaneous loopback flows, both towards the user and towards the network, it can host multiple independent SLA monitoring instances, significantly reducing the number of test sets required. For example, the same demarcation device can be used by a wholesale provider, by a retail provider or mobile operator, and by the customer, each using a different loopback classification key.
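Keeping several parties' loopbacks independent amounts to a lookup table in which no two classification keys overlap. The sketch below is a hypothetical model: `LoopbackTable`, `LoopbackInstance` and the direction/VID/P-bit key are illustrative assumptions, with "direction" meaning the port side on which test frames arrive and to which they are returned.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Direction(Enum):
    TO_USER = "user"
    TO_NETWORK = "network"


@dataclass(frozen=True)
class LoopbackInstance:
    owner: str                  # e.g. wholesale provider, retail provider, customer
    direction: Direction
    vid: int
    pcp: Optional[int] = None   # None means any P-bit


def _overlaps(a: LoopbackInstance, b: LoopbackInstance) -> bool:
    """Two keys overlap if some frame could match both."""
    return (a.direction == b.direction and a.vid == b.vid
            and (a.pcp is None or b.pcp is None or a.pcp == b.pcp))


class LoopbackTable:
    """Per-device table of loopback instances with non-overlapping keys."""

    def __init__(self) -> None:
        self._instances: List[LoopbackInstance] = []

    def add(self, inst: LoopbackInstance) -> None:
        for other in self._instances:
            if _overlaps(inst, other):
                raise ValueError(f"key overlaps instance owned by {other.owner}")
        self._instances.append(inst)

    def lookup(self, direction: Direction, vid: int, pcp: int) -> Optional[LoopbackInstance]:
        """Return the instance that should loop this frame, or None to forward it."""
        for inst in self._instances:
            if inst.direction == direction and inst.vid == vid and (
                inst.pcp is None or inst.pcp == pcp
            ):
                return inst
        return None
```

Rejecting overlapping keys at configuration time guarantees that each frame is looped by at most one instance, so one party's test can never consume another party's traffic.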
Resiliency and Repair of the Carrier Ethernet Network
Service resiliency and protection are critical for ensuring high availability and speedy restoration in the event of network outages. Without proper redundancy for link and path protection, even brief failures can degrade the user's QoE (quality of experience) through retransmissions, or even cause a complete loss of service.
Demarcation devices address these issues with various tools that ensure resiliency of the access link as well as of the end-to-end connection. Examples include link aggregation groups (LAG) using IEEE 802.3-2005 LACP (link aggregation control protocol), in which parallel links between a demarcation device and the neighboring PE (provider edge) are bundled into a single logical link; and Ethernet linear protection (G.8031), also called “EVC protection”, which uses OAM continuity check messages to detect an EVC failure and switch traffic to a backup path. In addition, Ethernet Ring Protection Switching (G.8032 ERPS) uses a ring topology to ensure traffic resiliency with a fast switchover mechanism.
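The switching behavior of EVC protection can be illustrated with a simplified 1:1 revertive selector. This is a sketch under stated assumptions, not G.8031 itself: it omits the APS protocol, hold-off timer and operator commands, and `ProtectionSelector` is a hypothetical name. The failure inputs would come from CCM loss-of-continuity detection, and the wait-to-restore (WTR) period shown (300 s) is only an example value.

```python
from enum import Enum
from typing import Optional


class Path(Enum):
    WORKING = "working"
    PROTECTION = "protection"


class ProtectionSelector:
    """Simplified 1:1 revertive selector with a wait-to-restore timer."""

    def __init__(self, wtr_seconds: float = 300.0) -> None:
        self.wtr = wtr_seconds
        self.active = Path.WORKING
        self._wtr_expires: Optional[float] = None

    def update(self, now: float, working_fail: bool, protection_fail: bool) -> Path:
        if working_fail:
            # Working path down: cancel any pending revert, use protection if healthy.
            self._wtr_expires = None
            if not protection_fail:
                self.active = Path.PROTECTION
        elif self.active is Path.PROTECTION:
            if protection_fail:
                # Protection failed while working is healthy: revert immediately.
                self.active = Path.WORKING
                self._wtr_expires = None
            elif self._wtr_expires is None:
                # Working has recovered: start WTR so a flapping link is not reused.
                self._wtr_expires = now + self.wtr
            elif now >= self._wtr_expires:
                self.active = Path.WORKING
                self._wtr_expires = None
        return self.active
```

The WTR timer is the key design choice here: traffic returns to the working path only after it has stayed healthy for the whole period, which avoids repeated switchovers on an intermittent fault.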