Optical Layer Protection and Restoration

The optical layer offers two distinct survivability primitives: protection, which pre-allocates a backup path and switches in tens of milliseconds without computation, and restoration, which computes a replacement path on demand using control-plane signalling. Choosing between them — and coordinating with higher layers (OTN, MPLS-TE FRR, IP) — determines whether a fibre cut produces a 50 ms blip, a 200 ms reroute, or a multi-second outage that triggers BGP flaps. This chapter covers ITU-T G.873.1 linear protection (1+1, 1:1, 1:N), ring-based OUPSR/OSNCP, mesh restoration via the optical control plane, and the inter-layer hold-off-timer logic that prevents simultaneous reactions across IP, OTN, and optical.

Concept	What it says
1+1 OMS protection	Dedicated bridged transmit: every signal is duplicated onto two diverse paths and the tail-end selector picks the better one. Sub-50 ms switch time, 100 % capacity overhead. The simplest and fastest scheme, and the one chosen for the most critical lambdas.
OUPSR / OSNCP	Optical Unidirectional Path-Switched Ring and Optical Sub-Network Connection Protection. Ring-based 1+1 variants in which the source bridges the signal onto both ring directions and the receiver selects the better copy. Familiar from SONET UPSR and SDH SNCP, but applied at the optical (lambda) layer.
Shared mesh restoration	The optical control plane (GMPLS or SDN) computes a replacement wavelength on demand after a failure. Capacity-efficient — backup wavelengths are shared across many primary paths — but switching time is seconds, not milliseconds.
Hold-off timers	The mechanism that prevents IP, OTN, and optical layers from all reacting simultaneously to the same fibre cut. Each layer waits a configured interval before declaring its action; lower layers act first, higher layers stay quiet.

Linear Protection — ITU-T G.873.1

Linear protection assigns one or more backup (“protection”) channels to a working channel along a point-to-point optical path. ITU-T G.873.1 defines three architectures, distinguished by how protection capacity is shared:

Scheme	Bridge	Selector	Working : Protection	Switch time	Capacity overhead	Use case
1+1	Permanent (signal duplicated onto both paths)	Tail-end	1 : 1	<50 ms	100 %	Mission-critical OMS
1:1	Switched — protection idle until failure	Both ends	1 : 1	<50 ms	100 %, but protection capacity reusable for low-priority extra traffic	Critical, with extra-traffic optimisation
1:N	Switched	Both ends	N working share 1 protection	<50 ms (single failure)	1/N	Cost-optimised, single-failure tolerant only

In 1+1, the transmitter unconditionally splits the signal onto two diverse paths; the receiver continuously monitors both and selects whichever has acceptable performance. No signalling is needed on failure: the tail-end selector simply switches. Switch time is dominated by failure-detection time (typically 10 ms, triggered by loss-of-light or a signal-fail defect) plus selector hardware response.
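The receiver-side selection described above can be sketched as follows. This is a minimal illustration, not an implementation of any standard: the `PathStatus` record, its field names, the non-revertive behaviour, and the 5 ms selector response used in the usage note are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PathStatus:
    """Receiver-side view of one path (hypothetical fields for illustration)."""
    name: str
    signal_fail: bool      # e.g. loss-of-light detected
    signal_degrade: bool   # e.g. error rate above a configured threshold


def severity(path: PathStatus) -> int:
    """2 = failed, 1 = degraded, 0 = healthy."""
    if path.signal_fail:
        return 2
    return 1 if path.signal_degrade else 0


def select_path(active: str, working: PathStatus, protection: PathStatus) -> str:
    """Return the name of the path the tail-end selector should use.

    Assumed non-revertive: stay on the active path unless the other
    path is strictly healthier.
    """
    paths = {working.name: working, protection.name: protection}
    current = paths[active]
    other = protection if current is working else working
    return other.name if severity(other) < severity(current) else current.name


def switch_time_ms(detection_ms: float, selector_ms: float) -> float:
    """Switch time is detection time plus selector hardware response."""
    return detection_ms + selector_ms
```

With a failed working path and a healthy protection path, `select_path("working", ...)` returns `"protection"`. With 10 ms detection and an assumed 5 ms selector response, `switch_time_ms(10, 5)` gives 15 ms, comfortably inside the 50 ms target.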

In 1:1, the protection path is idle by default and may carry “extra traffic” (low-priority lambdas that get pre-empted on failure). On failure, both endpoints signal via APS bytes to switch over.

In 1:N, a single protection channel covers N working channels. Cheap, but a second simultaneous failure within the same protection group is unprotected.

Warning

1:N is single-failure tolerant. In an SRLG (Shared Risk Link Group) scenario where two working paths share a duct or conduit, a single backhoe can take down two working paths simultaneously — the protection channel covers only one. SRLG-aware planning is mandatory before quoting “1:N protected” to a customer.
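An SRLG audit of a 1:N group can be sketched in a few lines. The example assumes each working path is described by the set of SRLG identifiers it traverses; the function name and the duct labels are hypothetical.

```python
from itertools import combinations


def shared_risk_pairs(group: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of working paths in a 1:N group that share an SRLG.

    `group` maps a working-path name to the set of SRLG identifiers
    (ducts, conduits, bridges) that the path traverses.
    """
    conflicts = []
    for a, b in combinations(sorted(group), 2):
        common = group[a] & group[b]
        if common:
            conflicts.append((a, b, common))
    return conflicts


# Hypothetical 1:3 group: W1 and W2 share duct-12, so one cut hits both.
example_group = {
    "W1": {"duct-12", "duct-40"},
    "W2": {"duct-12", "duct-77"},
    "W3": {"duct-90"},
}
```

Here `shared_risk_pairs(example_group)` reports the pair W1 and W2 with `duct-12` in common. Any non-empty result means a single physical event can exceed what the one protection channel covers.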

Ring-Based Optical Protection — OUPSR / OSNCP

Ring topologies dominate metro and regional networks because they offer two physically diverse paths between any pair of nodes via the two rotation directions. OUPSR (Optical Unidirectional Path-Switched Ring) and OSNCP (Optical Sub-Network Connection Protection) are the optical-layer equivalents of SONET UPSR and SDH SNCP: the same architecture applied at the wavelength layer rather than the VC-4 / STS-3c layer.

flowchart LR
    subgraph RING ["OUPSR / OSNCP — both directions used"]
        N1["Node A<br/>TX broadcasts<br/>both directions"]
        N2["Node B"]
        N3["Node C"]
        N4["Node D<br/>RX selects<br/>better path"]
        N1 -->|"clockwise<br/>working"| N2
        N2 -->|"clockwise"| N3
        N3 -->|"clockwise"| N4
        N1 -->|"anticlockwise<br/>protection"| N4
    end
    subgraph MESH ["Shared mesh restoration"]
        M1["Node A"] -->|"primary"| M2["Node B"]
        M2 --> M3["Node C"]
        M3 --> M4["Node D"]
        M1 -.->|"computed on failure<br/>via PCE/GMPLS"| M5["Node E"]
        M5 -.-> M4
    end
    style N1 fill:#378ADD,stroke:#185FA5,color:#fff
    style N4 fill:#378ADD,stroke:#185FA5,color:#fff
    style N2 fill:#7F77DD,stroke:#534AB7,color:#fff
    style N3 fill:#7F77DD,stroke:#534AB7,color:#fff
    style M1 fill:#1D9E75,stroke:#0F6E56,color:#fff
    style M4 fill:#1D9E75,stroke:#0F6E56,color:#fff
    style M2 fill:#BA7517,stroke:#854F0B,color:#fff
    style M3 fill:#BA7517,stroke:#854F0B,color:#fff
    style M5 fill:#D85A30,stroke:#993C1D,color:#fff

Left: ring-based 1+1 — both directions carry the signal, receiver selects. Right: mesh restoration — backup path computed on demand by the control plane.

OUPSR/OSNCP are mechanically simple: each working channel is bridged at the source onto both the clockwise and anticlockwise ring directions; the receiving node has a 2:1 selector that picks the better direction. A fibre cut on the ring takes down only one direction; the receiver switches to the other within 50 ms. The trade-off is identical to 1+1: half the ring capacity is consumed by protection.
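The effect of a fibre cut on the two ring directions can be checked with a small sketch. It assumes a ring of at least three nodes listed in clockwise order and treats each span as an unordered node pair; the function names are illustrative.

```python
def ring_paths(nodes: list[str], src: str, dst: str):
    """Return (clockwise_spans, anticlockwise_spans) from src to dst.

    `nodes` lists the ring in clockwise order. Each span is a frozenset
    of its two end nodes, so the direction of travel does not matter.
    """
    n = len(nodes)
    start, end = nodes.index(src), nodes.index(dst)
    result = []
    for step in (1, -1):           # +1 = clockwise, -1 = anticlockwise
        spans, k = [], start
        while k != end:
            nxt = (k + step) % n
            spans.append(frozenset((nodes[k], nodes[nxt])))
            k = nxt
        result.append(spans)
    return result[0], result[1]


def surviving_directions(nodes, src, dst, cut_spans) -> list[str]:
    """List the ring directions that still reach dst after the given cuts."""
    clockwise, anticlockwise = ring_paths(nodes, src, dst)
    cuts = {frozenset(span) for span in cut_spans}
    alive = []
    if not cuts & set(clockwise):
        alive.append("clockwise")
    if not cuts & set(anticlockwise):
        alive.append("anticlockwise")
    return alive
```

On the four-node ring A, B, C, D with traffic from A to D, a cut on span B to C leaves only the anticlockwise direction, and the receiver selects it. A simultaneous cut on spans B to C and A to D leaves nothing, which is the ring equivalent of losing both 1+1 paths.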

Key Insight

OUPSR is a ring-shaped 1+1, not a fundamentally different protection class. The architectural advantage is that ring fibre routing usually delivers genuine SRLG diversity for free — the two ring directions follow physically different ducts. On a linear point-to-point with two parallel fibres, achieving the same diversity requires manual SRLG-aware route engineering.

Shared-Mesh Restoration

Mesh restoration trades switching speed for capacity efficiency. Instead of dedicating a backup wavelength to each primary, a smaller pool of restoration capacity is shared across many primary paths, on the assumption that simultaneous failures requiring restoration of every primary at once are statistically unlikely.

When a failure is detected, the optical control plane (see 12-optical-control-plane-and-automation) runs path computation, allocates a new wavelength along a diverse route, configures intermediate ROADMs, and signals the new path. The process takes seconds (best case) to tens of seconds (large topologies, congested PCE).
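The path-computation step alone can be sketched as a breadth-first search over the surviving topology. This is a deliberately simplified assumption: it minimises hop count only and ignores wavelength continuity, optical impairments, and spare-capacity checks that a real PCE would apply. The function name and the five-node topology are hypothetical.

```python
from collections import deque


def restore_path(links, src, dst, failed):
    """Return a fewest-hop path from src to dst that avoids failed links.

    `links` and `failed` are iterables of (node, node) pairs, treated as
    bidirectional. Returns a list of nodes, or None if no path survives.
    """
    failed_set = {frozenset(link) for link in failed}
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in failed_set:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical mesh: primary A-B-C-D, alternative route via E.
mesh_links = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
```

With span B to C failed, `restore_path(mesh_links, "A", "D", [("B", "C")])` returns the route through E. If span A to E is also down, the function returns `None`, which corresponds to a restoration attempt that cannot succeed.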

Mesh restoration step	Typical duration	Notes
Failure detection (LoL, BDI, OPM alarm)	10–50 ms	Fastest is direct loss-of-light
Alarm propagation to controller	100–500 ms	Telemetry pipeline latency
Path computation by PCE	100 ms – 5 s	Depends on impairment-aware vs simple shortest-path
Wavelength assignment + intermediate WSS reconfiguration	1–10 s	WSS settling time dominates
End-to-end restoration	2–20 s	Order of magnitude slower than protection
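Summing the per-step ranges gives a quick sanity check on the end-to-end figure. The sketch below copies the step durations from the table above into a dictionary (in milliseconds) and adds the best and worst cases; the names are illustrative.

```python
# Per-step (best, worst) durations in milliseconds, taken from the table above.
RESTORATION_STEPS_MS = {
    "failure_detection": (10, 50),
    "alarm_propagation": (100, 500),
    "path_computation": (100, 5_000),
    "wss_reconfiguration": (1_000, 10_000),
}


def restoration_budget_ms(steps=RESTORATION_STEPS_MS):
    """Return (best_case_ms, worst_case_ms) as the sum of the sequential steps."""
    best = sum(low for low, _ in steps.values())
    worst = sum(high for _, high in steps.values())
    return best, worst
```

The sums come out at 1,210 ms and 15,550 ms, so the 2–20 s end-to-end row is probably best read as a rounded planning envelope with some margin rather than a strict total. In both cases the WSS reconfiguration step accounts for most of the budget.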

The capacity advantage is real: in a typical mesh, shared restoration achieves equivalent single-failure survivability with roughly 30–50 % protection capacity, versus 100 % for 1+1. For non-real-time-critical traffic (Internet backbone, content distribution), the multi-second restoration window is acceptable; for trading systems and 5G fronthaul, 1+1 remains mandatory.
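The overhead figures translate into wavelength counts as in the sketch below. The function name, the default group size of N = 4, and the 40 % mesh ratio (a midpoint of the 30–50 % range quoted above) are assumptions for illustration; real mesh overhead depends on topology and traffic.

```python
import math


def protection_wavelengths(working: int, scheme: str, n: int = 4,
                           mesh_percent: int = 40) -> int:
    """Wavelengths reserved for protection, given `working` wavelengths."""
    if working < 0 or n < 1:
        raise ValueError("working must be >= 0 and n must be >= 1")
    if scheme in ("1+1", "1:1", "OUPSR"):
        return working                       # 100 % overhead
    if scheme == "1:N":
        return math.ceil(working / n)        # one protection channel per N working
    if scheme == "mesh":
        return math.ceil(working * mesh_percent / 100)
    if scheme == "unprotected":
        return 0
    raise ValueError(f"unknown scheme: {scheme}")
```

For 40 working wavelengths this gives 40 protection wavelengths under 1+1, 10 under 1:4, and 16 under shared mesh at the assumed 40 % ratio.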

Protection-Class Comparison

The end-to-end choice is rarely “one scheme” — operators typically deploy a mix per service tier.

Class	Architecture	Switch time	Capacity overhead	Complexity	Best fit
1+1 (linear)	Dedicated dual transmit	<50 ms	100 %	Low	Premium leased lambdas, financial connectivity
1:1 (linear)	Switched, with extra traffic	<50 ms	100 % (with offset)	Medium	Operator core with low-priority overflow
1:N (linear)	Switched, shared protection	<50 ms (single failure)	1/N	Medium	Cost-optimised aggregation, non-critical
OUPSR / OSNCP	Ring 1+1	<50 ms	100 %	Low	Metro / regional rings
Shared mesh	Control-plane restoration	2–20 s	30–50 %	High	Backbone, Internet, CDN backhaul
Unprotected	None	n/a	0 %	None	Best-effort, redundant via IP layer

Rule of Thumb

If the customer SLA cites a specific switch-time number (e.g. “less than 50 ms”), the answer is 1+1. If the SLA cites only annual availability targets (five-nines, four-nines), shared-mesh restoration is usually cheaper for the same headline number — see 14-optical-slas-availability-and-ops-reality.

Multi-Layer Protection Coordination

A modern carrier network has at least three layers capable of reacting to failure: IP / MPLS-TE FRR, OTN protection (G.873.1 SNCP at the OTN layer), and optical. If all three react simultaneously to the same fibre cut, the result is chaos: IP reroutes onto a path that OTN is also trying to switch onto, while the optical layer is restoring the same capacity underneath. The standard mechanism for ordering reactions is the hold-off timer.

The principle: lower layers act first. The optical layer detects loss-of-light within 10 ms and switches if it has protection capacity. OTN waits 100 ms (its hold-off timer) before declaring its own failure; by then, if optical protection succeeded, OTN sees the recovered signal and stays quiet. MPLS-TE FRR holds off longer still (typically 100–500 ms, set above the OTN recovery time) before switching to its backup tunnel, and IP reconvergence follows only if no lower layer has recovered.

Layer	Detection time	Hold-off (default)	Total reaction
Optical (1+1, OUPSR)	10 ms	0 (no hold-off, lowest layer)	<50 ms
Optical (mesh restoration)	50 ms	0	2–20 s
OTN G.873.1 SNCP	10–50 ms	100 ms	<200 ms
MPLS-TE FRR	50 ms	100–500 ms	<1 s
IP IGP convergence	100–500 ms	200 ms – several s	seconds
BGP convergence	seconds	n/a (highest layer)	tens of seconds

flowchart TD
    EVENT["Fibre cut at T=0"]
    EVENT -->|"+10 ms"| OPT_DETECT["Optical detects LoL<br/>1+1 selector switches?"]
    OPT_DETECT -->|"yes — recovered"| DONE_OPT["Recovered <50 ms<br/>upper layers see no event"]
    OPT_DETECT -->|"no — no protection"| OTN_HOLD["OTN holdoff 100 ms<br/>then SNCP switch"]
    OTN_HOLD -->|"recovered"| DONE_OTN["Recovered <200 ms"]
    OTN_HOLD -->|"no protection"| MPLS_HOLD["MPLS-TE FRR<br/>100-500 ms holdoff"]
    MPLS_HOLD -->|"recovered"| DONE_MPLS["Recovered <1 s"]
    MPLS_HOLD -->|"no FRR"| IGP["IP IGP reconverge<br/>seconds"]
    IGP --> BGP["BGP reconverge<br/>tens of seconds"]
    style EVENT fill:#E24B4A,stroke:#A32D2D,color:#fff
    style OPT_DETECT fill:#1D9E75,stroke:#0F6E56,color:#fff
    style DONE_OPT fill:#378ADD,stroke:#185FA5,color:#fff
    style OTN_HOLD fill:#BA7517,stroke:#854F0B,color:#fff
    style DONE_OTN fill:#378ADD,stroke:#185FA5,color:#fff
    style MPLS_HOLD fill:#7F77DD,stroke:#534AB7,color:#fff
    style DONE_MPLS fill:#378ADD,stroke:#185FA5,color:#fff
    style IGP fill:#D85A30,stroke:#993C1D,color:#fff
    style BGP fill:#D85A30,stroke:#993C1D,color:#fff

Hold-off timer cascade — each layer gives the layer below it time to recover before declaring its own action. Tuning these timers is operator-specific and requires care.
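The cascade can be sketched as a first-match walk through the layers, lowest first. The layer names and worst-case bounds below are copied from the reaction-time table in this section; the function itself and its return convention are illustrative assumptions.

```python
# (layer, worst-case recovery bound in ms), ordered lowest layer first.
CASCADE = [
    ("optical 1+1", 50),
    ("OTN SNCP", 200),
    ("MPLS-TE FRR", 1000),
]


def recovering_layer(protected_layers):
    """Return (layer, bound_ms) for the first layer able to recover the failure.

    `protected_layers` is the set of layer names that have usable protection
    for the failed resource. If none does, recovery falls through to IGP
    reconvergence, for which the table gives no fixed bound ("seconds").
    """
    for layer, bound_ms in CASCADE:
        if layer in protected_layers:
            return layer, bound_ms
    return "IP IGP reconvergence", None
```

With optical 1+1 in place the failure is absorbed within 50 ms and the upper layers see no event. With only an FRR backup tunnel available, `recovering_layer({"MPLS-TE FRR"})` returns the MPLS layer with its bound of under one second.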

Inter-Layer Protection Escalation Matrix

Failure scope	Optical layer	OTN layer	MPLS-TE layer	IP layer
Single-span fibre cut, optical 1+1 protected	Switch (<50 ms)	Hold off — see recovery, no action	Hold off — see recovery, no action	Hold off — no action
Single-span fibre cut, no optical protection	No action	SNCP switch (<200 ms)	Hold off	Hold off
Multi-span SRLG cut affecting both 1+1 paths	Loss of working AND protection	Hold off, then SNCP if path computable	FRR backup tunnel (if pre-computed)	Reconverge (last resort)
Transponder failure (single-card)	No action (lambda gone)	SNCP switch	Hold off	Hold off
ROADM node-wide failure	Mesh restoration if configured	SNCP if alternate OTN path	Bypass via FRR	IGP reconverge
Whole landing station failure (subsea)	Mesh restoration impossible — site-down	n/a	Bypass via diverse cable	IGP reconverge

Warning

Multi-layer hold-off timers are operator-tuned and easy to mis-configure. The classic failure mode is “no layer switches”: every layer waits for the layer below it to act, while every lower layer was designed on the assumption that the layer above would handle the failure. Hold-off coordination must be reviewed end-to-end every time a new protection scheme is deployed at any layer.
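A basic end-to-end review can be automated along the following lines. The sketch applies two simplified rules that are assumptions of this example rather than a standard: at least one layer must have protection enabled, and each layer's hold-off must be at least as long as the worst-case recovery of the protected layer directly below it. The dictionary keys are hypothetical.

```python
def check_holdoffs(layers):
    """Return a list of problems found in a multi-layer hold-off plan.

    `layers` is ordered lowest layer first. Each entry is a dict with keys
    'name', 'holdoff_ms', 'worst_recovery_ms' and 'protection_enabled'.
    """
    problems = []
    if not any(layer["protection_enabled"] for layer in layers):
        problems.append("no layer has protection enabled: nothing will switch")
    for lower, upper in zip(layers, layers[1:]):
        if (lower["protection_enabled"]
                and upper["holdoff_ms"] < lower["worst_recovery_ms"]):
            problems.append(
                f"{upper['name']} hold-off {upper['holdoff_ms']} ms is shorter "
                f"than {lower['name']} worst-case recovery "
                f"{lower['worst_recovery_ms']} ms"
            )
    return problems


# Hypothetical plan using the default values quoted in this section.
plan = [
    {"name": "optical", "holdoff_ms": 0, "worst_recovery_ms": 50,
     "protection_enabled": True},
    {"name": "OTN", "holdoff_ms": 100, "worst_recovery_ms": 200,
     "protection_enabled": True},
    {"name": "MPLS-TE FRR", "holdoff_ms": 100, "worst_recovery_ms": 1000,
     "protection_enabled": True},
]
```

Under these rules `check_holdoffs(plan)` flags the MPLS-TE FRR hold-off of 100 ms as too short for an OTN layer that may take up to 200 ms to recover. Raising it to 300 ms clears the finding.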

Hard-Wired vs SDN-Driven Restoration

Protection (1+1, OUPSR) is hard-wired in hardware: the selector is a physical switch driven by alarm logic. It is deterministic — the same input always produces the same output, switching time is bounded by hardware response, and there is nothing to debug at the protocol level. The downside is rigidity: a change in topology, capacity, or risk profile requires re-engineering the protection design.

SDN-driven restoration (a controller listens for alarms, computes a new path, programs WSS configurations, and verifies the result) is flexible. The same control plane that builds a new circuit also restores a failed one — operators can change strategy in software, prioritise differently per customer, and account for impairments and current channel loading.

Dimension	Hard-wired protection	SDN-driven restoration
Switching time	Deterministic, <50 ms	Variable, 2–20 s
Capacity overhead	High (1+1 = 100 %)	Low (mesh ~30–50 %)
Topology flexibility	Re-engineering required	Changes are configuration
Failure modes	Hardware only	Hardware + software + controller availability
Debug surface	Alarm bytes, hardware logs	Multi-system trace across controller, NEs, telemetry pipeline
Best fit	Premium, low-volume, real-time	Commodity bulk capacity, Internet backbone

Rule of Thumb

Use hard-wired 1+1 / OUPSR for premium services and SDN restoration for bulk capacity. The economic boundary is roughly: services billed per-end-point at five-nines availability and sub-100 ms switch SLA → 1+1; services billed per-Gbps with four-nines availability → mesh restoration.

References

Standards (ITU-T)

ITU-T G.873.1 — Optical Transport Network — Linear protection (03/2022). https://www.itu.int/rec/T-REC-G.873.1
ITU-T G.873.2 — ODUk shared ring protection. https://www.itu.int/rec/T-REC-G.873.2
ITU-T G.808.1 — Generic protection switching — Linear trail and subnetwork protection. https://www.itu.int/rec/T-REC-G.808.1
ITU-T G.872 — Architecture of optical transport networks (10/2017). https://www.itu.int/rec/T-REC-G.872
ITU-T G.798 — Characteristics of optical transport network hierarchy equipment functional blocks (12/2017). https://www.itu.int/rec/T-REC-G.798

Standards (IETF)

RFC 4090 — Fast Reroute Extensions to RSVP-TE for LSP Tunnels (05/2005). https://www.rfc-editor.org/rfc/rfc4090
RFC 4427 — Recovery (Protection and Restoration) Terminology for Generalized Multi-Protocol Label Switching (GMPLS) (03/2006). https://www.rfc-editor.org/rfc/rfc4427
RFC 4428 — Analysis of Generalized Multi-Protocol Label Switching (GMPLS)-based Recovery Mechanisms (03/2006). https://www.rfc-editor.org/rfc/rfc4428
RFC 6163 — Framework for GMPLS and Path Computation Element (PCE) Control of Wavelength Switched Optical Networks (WSONs) (04/2011). https://www.rfc-editor.org/rfc/rfc6163

Books

R. Ramaswami, K. N. Sivarajan, G. H. Sasaki, Optical Networks: A Practical Perspective, 3rd ed., Morgan Kaufmann, 2009.
V. Alwayn, Optical Network Design and Implementation, Cisco Press, 2004.
