Optical Layer Protection and Restoration
The optical layer offers two distinct survivability primitives: protection, which pre-allocates a backup path and switches in tens of milliseconds without computation, and restoration, which computes a replacement path on demand using control-plane signalling. Choosing between them — and coordinating with higher layers (OTN, MPLS-TE FRR, IP) — determines whether a fibre cut produces a 50 ms blip, a 200 ms reroute, or a multi-second outage that triggers BGP flaps. This chapter covers ITU-T G.873.1 linear protection (1+1, 1:1, 1:N), ring-based OUPSR/OSNCP, mesh restoration via the optical control plane, and the inter-layer hold-off-timer logic that prevents simultaneous reactions across IP, OTN, and optical.
| Concept | What it says |
|---|---|
| 1+1 OMS protection | Dedicated, permanently bridged transmit: every signal is duplicated onto two diverse paths and the tail-end (receive-side) selector picks the better copy. Sub-50 ms switch time, 100 % capacity overhead. The simplest and fastest scheme, and the one chosen for the most critical lambdas. |
| OUPSR / OSNCP | Optical Unidirectional Path-Switched Ring and Optical Sub-Network Connection Protection. Ring-based 1+1 variants where both directions of the ring carry the same signal and the receiver picks. Familiar from SONET UPSR and SDH SNCP, but applied at the optical (lambda) layer. |
| Shared mesh restoration | The optical control plane (GMPLS or SDN) computes a replacement wavelength on demand after a failure. Capacity-efficient — backup wavelengths are shared across many primary paths — but switching time is seconds, not milliseconds. |
| Hold-off timers | The mechanism that prevents IP, OTN, and optical layers from all reacting simultaneously to the same fibre cut. Each layer waits a configured interval before declaring its action; lower layers act first, higher layers stay quiet. |
Linear Protection — ITU-T G.873.1
Linear protection assigns one or more backup (“protection”) channels to a working channel along a point-to-point optical path. ITU-T G.873.1 defines three architectures, distinguished by how protection capacity is shared:
| Scheme | Bridge | Selector | Working : Protection | Switch time | Capacity overhead | Use case |
|---|---|---|---|---|---|---|
| 1+1 | Permanent: signal duplicated onto both paths | Tail-end (receiver) | 1 : 1 | <50 ms | 100 % | Mission-critical OMS |
| 1:1 | Switched — protection idle until failure | Both ends | 1 : 1 | <50 ms | 100 %, but protection capacity reusable for low-priority extra traffic | Critical, with extra-traffic optimisation |
| 1:N | Switched | Both ends | N working share 1 protection | <50 ms (single failure) | 1/N | Cost-optimised, single-failure tolerant only |
In 1+1, the transmitter unconditionally splits the signal onto two diverse paths; the receiver continuously monitors both and selects whichever has acceptable performance. There is no signalling on failure: the switch decision is local to the receiver, so the selector simply switches. Switch time is dominated by failure-detection time (typically around 10 ms, from loss-of-light or a signal-fail defect) plus selector hardware response.
In 1:1, the protection path is idle by default and may carry “extra traffic” (low-priority lambdas that get pre-empted on failure). On failure, both endpoints signal via APS bytes to switch over.
In 1:N, a single protection channel covers N working channels. Cheap, but a second simultaneous failure within the same protection group is unprotected.
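As a minimal sketch of that single-failure limit, the toy model below tracks one 1:N group. The class name `OneForNGroup` and the channel names are hypothetical, and real equipment also handles priorities, revertive switching, and APS signalling, all of which are omitted here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OneForNGroup:
    """Toy 1:N group: N working channels share a single protection channel."""
    working: list[str]
    on_protection: Optional[str] = None            # channel currently using protection
    unprotected: set[str] = field(default_factory=set)

    @property
    def capacity_overhead(self) -> float:
        # One protection channel for N working channels.
        return 1 / len(self.working)

    def fail(self, channel: str) -> str:
        if channel not in self.working:
            raise ValueError(f"unknown channel: {channel}")
        if self.on_protection is None:
            self.on_protection = channel
            return "switched to protection"
        # Protection is already occupied, so this failure is not covered.
        self.unprotected.add(channel)
        return "unprotected"


group = OneForNGroup(working=["ch1", "ch2", "ch3", "ch4"])
first = group.fail("ch2")    # first failure takes the protection channel
second = group.fail("ch4")   # second failure in the same group finds it busy
print(group.capacity_overhead, first, second)
```

With four working channels the overhead is 1/4, and the second failure is reported as unprotected because the only protection channel is already carrying `ch2`.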
Warning
1:N is single-failure tolerant. In an SRLG (Shared Risk Link Group) scenario where two working paths share a duct or conduit, a single backhoe can take down two working paths simultaneously — the protection channel covers only one. SRLG-aware planning is mandatory before quoting “1:N protected” to a customer.
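One way to sketch the SRLG check described in this warning is a pairwise intersection over the working paths of a group. The function name `shared_srlgs`, the path names, and the duct identifiers are illustrative assumptions; a real planning tool would pull SRLG membership from the fibre inventory.

```python
from itertools import combinations


def shared_srlgs(group: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of working paths to the SRLG identifiers they share."""
    conflicts = {}
    for a, b in combinations(sorted(group), 2):
        common = group[a] & group[b]
        if common:
            conflicts[(a, b)] = common
    return conflicts


# Hypothetical 1:3 group: each working path lists the SRLGs it traverses.
group = {
    "w1": {"duct-12", "duct-40"},
    "w2": {"duct-40", "duct-77"},
    "w3": {"duct-90"},
}
conflicts = shared_srlgs(group)
print(conflicts)
```

Any non-empty result means a single physical event could fail two working paths in the group at once, which one protection channel cannot cover.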
Ring-Based Optical Protection — OUPSR / OSNCP
Ring topologies dominate metro and regional networks because they offer two physically diverse paths between any pair of nodes via the two rotation directions. OUPSR (Optical Unidirectional Path-Switched Ring) and OSNCP (Optical Sub-Network Connection Protection) are the optical-layer equivalents of SONET UPSR and SDH SNCP: the same architecture applied at the wavelength layer rather than the VC-4 / STS-3c layer.
flowchart LR subgraph RING ["OUPSR / OSNCP — both directions used"] N1["Node A<br/>TX broadcasts<br/>both directions"] N2["Node B"] N3["Node C"] N4["Node D<br/>RX selects<br/>better path"] N1 -->|"clockwise<br/>working"| N2 N2 -->|"clockwise"| N3 N3 -->|"clockwise"| N4 N1 -->|"anticlockwise<br/>protection"| N4 end subgraph MESH ["Shared mesh restoration"] M1["Node A"] -->|"primary"| M2["Node B"] M2 --> M3["Node C"] M3 --> M4["Node D"] M1 -.->|"computed on failure<br/>via PCE/GMPLS"| M5["Node E"] M5 -.-> M4 end style N1 fill:#378ADD,stroke:#185FA5,color:#fff style N4 fill:#378ADD,stroke:#185FA5,color:#fff style N2 fill:#7F77DD,stroke:#534AB7,color:#fff style N3 fill:#7F77DD,stroke:#534AB7,color:#fff style M1 fill:#1D9E75,stroke:#0F6E56,color:#fff style M4 fill:#1D9E75,stroke:#0F6E56,color:#fff style M2 fill:#BA7517,stroke:#854F0B,color:#fff style M3 fill:#BA7517,stroke:#854F0B,color:#fff style M5 fill:#D85A30,stroke:#993C1D,color:#fff
Left: ring-based 1+1 — both directions carry the signal, receiver selects. Right: mesh restoration — backup path computed on demand by the control plane.
OUPSR/OSNCP are mechanically simple: each working channel is bridged at the source onto both the clockwise and anticlockwise ring directions; the receiving node has a 2:1 selector that picks the better direction. A fibre cut on the ring takes down only one direction; the receiver switches to the other within 50 ms. The trade-off is identical to 1+1: half the ring capacity is consumed by protection.
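The bridge-and-select behaviour can be sketched as follows, assuming a four-node ring like the one in the diagram. The function names and the "prefer clockwise" policy are assumptions for illustration; a real selector chooses on measured signal quality rather than on topology knowledge.

```python
def ring_paths(ring: list[str], src: str, dst: str) -> tuple[list[str], list[str]]:
    """Return the clockwise and anticlockwise node sequences from src to dst."""
    n = len(ring)
    i, j = ring.index(src), ring.index(dst)
    clockwise = [ring[(i + k) % n] for k in range((j - i) % n + 1)]
    anticlockwise = [ring[(i - k) % n] for k in range((i - j) % n + 1)]
    return clockwise, anticlockwise


def spans(path: list[str]) -> set[frozenset[str]]:
    """Undirected spans (adjacent node pairs) traversed by a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def select(ring: list[str], src: str, dst: str, cut: tuple[str, str]) -> str:
    """Tail-end 2:1 selector: prefer clockwise, fall back to anticlockwise."""
    cw, acw = ring_paths(ring, src, dst)
    failed = frozenset(cut)
    if failed not in spans(cw):
        return "clockwise"
    if failed not in spans(acw):
        return "anticlockwise"
    return "down"


ring = ["A", "B", "C", "D"]
print(ring_paths(ring, "A", "D"))
print(select(ring, "A", "D", cut=("B", "C")))
print(select(ring, "A", "D", cut=("A", "D")))
```

Because the two arcs between any node pair share no span, a single cut always leaves one direction intact, which is the property the receiver's selector relies on.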
Key Insight
OUPSR is a ring-shaped 1+1, not a fundamentally different protection class. The architectural advantage is that ring fibre routing usually delivers genuine SRLG diversity for free — the two ring directions follow physically different ducts. On a linear point-to-point with two parallel fibres, achieving the same diversity requires manual SRLG-aware route engineering.
Shared-Mesh Restoration
Mesh restoration trades switching speed for capacity efficiency. Instead of dedicating a backup wavelength to each primary, a smaller pool of restoration capacity is shared across many primary paths, on the assumption that simultaneous failures requiring restoration of all primaries at once are statistically unlikely.
When a failure is detected, the optical control plane (see 12-optical-control-plane-and-automation) runs path computation, allocates a new wavelength along a diverse route, configures intermediate ROADMs, and signals the new path. The process takes seconds (best case) to tens of seconds (large topologies, congested PCE).
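The path-computation and wavelength-allocation steps can be illustrated with a deliberately simple sketch: first-fit wavelength assignment with a fewest-hop search per wavelength, under a wavelength-continuity constraint. This is an assumed simplification, not how any particular PCE works; it ignores optical impairments, regeneration, and ROADM programming, and the function names and five-node topology are hypothetical.

```python
from collections import deque
from typing import Optional


def shortest_path(adj: dict[str, set[str]], src: str, dst: str) -> Optional[list[str]]:
    """Fewest-hop path by breadth-first search, or None if dst is unreachable."""
    parent: dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def restore(in_use: dict[frozenset, set[int]], failed: frozenset,
            src: str, dst: str, num_wavelengths: int):
    """First-fit: try each wavelength in order, using only spans where it is free."""
    for wl in range(num_wavelengths):
        adj: dict[str, set[str]] = {}
        for span, used in in_use.items():
            if span == failed or wl in used:
                continue
            a, b = tuple(span)
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
        path = shortest_path(adj, src, dst)
        if path is not None:
            return path, wl
    return None


# Wavelengths already occupied on each span (wavelength 0 carries existing traffic).
in_use = {
    frozenset({"A", "B"}): {0}, frozenset({"B", "C"}): {0},
    frozenset({"C", "D"}): {0}, frozenset({"A", "E"}): {0},
    frozenset({"E", "D"}): set(),
}
result = restore(in_use, failed=frozenset({"B", "C"}), src="A", dst="D", num_wavelengths=4)
print(result)
```

In this example wavelength 0 is blocked on the span A to E, so the sketch settles on wavelength 1 over A, E, D. The sketch only computes the answer; it does not mark the wavelength as occupied or configure any device.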
| Mesh restoration step | Typical duration | Notes |
|---|---|---|
| Failure detection (LoL, BDI, OPM alarm) | 10–50 ms | Fastest is direct loss-of-light |
| Alarm propagation to controller | 100–500 ms | Telemetry pipeline latency |
| Path computation by PCE | 100 ms – 5 s | Depends on impairment-aware vs simple shortest-path |
| Wavelength assignment + intermediate WSS reconfiguration | 1–10 s | WSS settling time dominates |
| End-to-end restoration | 2–20 s | Order of magnitude slower than protection |
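As a quick arithmetic check, the itemised ranges in the table can be summed. The dictionary below simply restates the table rows in milliseconds; the variable names are illustrative.

```python
# (best case, worst case) per step, in milliseconds, copied from the table.
STEPS_MS = {
    "failure detection": (10, 50),
    "alarm propagation": (100, 500),
    "path computation": (100, 5_000),
    "wavelength assignment + WSS reconfiguration": (1_000, 10_000),
}

best_ms = sum(low for low, _ in STEPS_MS.values())
worst_ms = sum(high for _, high in STEPS_MS.values())
slowest_step = max(STEPS_MS, key=lambda step: STEPS_MS[step][1])

print(best_ms / 1000, worst_ms / 1000, slowest_step)
```

The itemised steps sum to roughly 1.2 s in the best case and 15.6 s in the worst, the same order of magnitude as the 2–20 s end-to-end figure; the quoted envelope presumably leaves margin for signalling and verification that the table does not itemise. In both cases WSS reconfiguration is the largest single contributor.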
The capacity advantage is real: in a typical mesh, shared restoration achieves equivalent single-failure survivability with roughly 30–50 % protection capacity, versus 100 % for 1+1. For non-real-time-critical traffic (Internet backbone, content distribution), the multi-second restoration window is acceptable; for trading systems and 5G fronthaul, 1+1 remains mandatory.
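Where the saving comes from can be shown with a small sketch that sizes backup capacity per span under a single-span-failure assumption: dedicated protection reserves one wavelength per service on every backup span, while shared restoration only needs the worst case over individual failures. The function name, span labels, and three-service example are hypothetical, and the numbers are not meant to reproduce the 30–50 % figure.

```python
from collections import Counter, defaultdict


def backup_capacity(services: list[tuple[set[str], set[str]]]) -> tuple[int, int]:
    """services: (primary spans, backup spans) per service.

    Returns total backup wavelengths for (dedicated, shared) designs,
    assuming at most one span fails at a time.
    """
    dedicated = Counter()
    demand = defaultdict(Counter)        # backup span -> failed span -> services hit
    for primary, backup in services:
        for b in backup:
            dedicated[b] += 1
            for f in primary:
                demand[b][f] += 1
    shared = {b: max(per_failure.values()) for b, per_failure in demand.items()}
    return sum(dedicated.values()), sum(shared.values())


services = [
    ({"A-B", "B-C"}, {"A-E", "E-C"}),
    ({"A-D", "D-C"}, {"A-E", "E-C"}),
    ({"A-B"}, {"A-E", "E-B"}),
]
dedicated_total, shared_total = backup_capacity(services)
print(dedicated_total, shared_total)
```

Here the dedicated design reserves 6 backup wavelengths and the shared design 4. Span A-E still needs two, because the first and third services both ride A-B and would fail together, which is the same SRLG effect that limits 1:N.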
Protection-Class Comparison
The end-to-end choice is rarely “one scheme” — operators typically deploy a mix per service tier.
| Class | Architecture | Switch time | Capacity overhead | Complexity | Best fit |
|---|---|---|---|---|---|
| 1+1 (linear) | Dedicated dual transmit | <50 ms | 100 % | Low | Premium leased lambdas, financial connectivity |
| 1:1 (linear) | Switched, with extra traffic | <50 ms | 100 % (partly offset by extra traffic) | Medium | Operator core with low-priority overflow |
| 1:N (linear) | Switched, shared protection | <50 ms (single failure) | 1/N | Medium | Cost-optimised aggregation, non-critical |
| OUPSR / OSNCP | Ring 1+1 | <50 ms | 100 % | Low | Metro / regional rings |
| Shared mesh | Control-plane restoration | 2–20 s | 30–50 % | High | Backbone, Internet, CDN backhaul |
| Unprotected | None | n/a | 0 % | None | Best-effort, redundant via IP layer |
Rule of Thumb
If the customer SLA cites a specific switch-time number (e.g. “less than 50 ms”), the answer is 1+1. If the SLA cites only annual availability targets (five-nines, four-nines), shared-mesh restoration is usually cheaper for the same headline number — see 14-optical-slas-availability-and-ops-reality.
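The reason an availability-only SLA can tolerate multi-second restoration becomes clearer with a little arithmetic. The sketch below converts a "nines" target into an annual downtime budget and counts how many worst-case 20 s restoration events fit inside it. It assumes a 365-day year and that every outage is fully explained by restoration time, which ignores repair windows and failed restorations.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_budget_s(nines: int) -> float:
    """Allowed downtime per year for an availability of 1 - 10**-nines."""
    return SECONDS_PER_YEAR / 10 ** nines


def restorations_within_budget(nines: int, restoration_s: float) -> int:
    """How many restoration events of the given length fit in the yearly budget."""
    return int(downtime_budget_s(nines) // restoration_s)


five_nines = restorations_within_budget(5, restoration_s=20)
four_nines = restorations_within_budget(4, restoration_s=20)
print(downtime_budget_s(5), five_nines, four_nines)
```

Five-nines allows about 315 s of downtime per year, enough for 15 restoration events of 20 s each; four-nines allows 157. A switch-time SLA, by contrast, is violated by the first such event.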
Multi-Layer Protection Coordination
A modern carrier network has at least three layers capable of reacting to failure: IP / MPLS-TE FRR, OTN protection (G.873.1 SNCP at the OTN layer), and optical. If all three react simultaneously to the same fibre cut, the result is chaos: IP reroutes onto a path that OTN is switching onto at the same moment, while the optical layer is still restoring it. The standard mechanism for ordering reactions is the hold-off timer.
The principle: lower layers act first. The optical layer detects loss-of-light within 10 ms and switches if it has protection capacity. OTN waits 100 ms (its hold-off timer) before declaring its own failure; by then, if optical protection succeeded, OTN sees the recovered signal and stays quiet. MPLS-TE FRR waits at least as long (typically 100–500 ms) before switching to its backup tunnel, and IGP reconvergence follows only if FRR cannot recover the traffic.
| Layer | Detection time | Hold-off (default) | Total reaction |
|---|---|---|---|
| Optical (1+1, OUPSR) | 10 ms | 0 (no hold-off, lowest layer) | <50 ms |
| Optical (mesh restoration) | 50 ms | 0 | 2–20 s |
| OTN G.873.1 SNCP | 10–50 ms | 100 ms | <200 ms |
| MPLS-TE FRR | 50 ms | 100–500 ms | <1 s |
| IP IGP convergence | 100–500 ms | 200 ms – several s | seconds |
| BGP convergence | seconds | n/a (highest layer) | tens of seconds |
flowchart TD EVENT["Fibre cut at T=0"] EVENT -->|"+10 ms"| OPT_DETECT["Optical detects LoL<br/>1+1 selector switches?"] OPT_DETECT -->|"yes — recovered"| DONE_OPT["Recovered <50 ms<br/>upper layers see no event"] OPT_DETECT -->|"no — no protection"| OTN_HOLD["OTN holdoff 100 ms<br/>then SNCP switch"] OTN_HOLD -->|"recovered"| DONE_OTN["Recovered <200 ms"] OTN_HOLD -->|"no protection"| MPLS_HOLD["MPLS-TE FRR<br/>100-500 ms holdoff"] MPLS_HOLD -->|"recovered"| DONE_MPLS["Recovered <1 s"] MPLS_HOLD -->|"no FRR"| IGP["IP IGP reconverge<br/>seconds"] IGP --> BGP["BGP reconverge<br/>tens of seconds"] style EVENT fill:#E24B4A,stroke:#A32D2D,color:#fff style OPT_DETECT fill:#1D9E75,stroke:#0F6E56,color:#fff style DONE_OPT fill:#378ADD,stroke:#185FA5,color:#fff style OTN_HOLD fill:#BA7517,stroke:#854F0B,color:#fff style DONE_OTN fill:#378ADD,stroke:#185FA5,color:#fff style MPLS_HOLD fill:#7F77DD,stroke:#534AB7,color:#fff style DONE_MPLS fill:#378ADD,stroke:#185FA5,color:#fff style IGP fill:#D85A30,stroke:#993C1D,color:#fff style BGP fill:#D85A30,stroke:#993C1D,color:#fff
Hold-off timer cascade — each layer gives the layer below it time to recover before declaring its own action. Tuning these timers is operator-specific and requires care.
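The cascade can be sketched as a small simulation. Each layer triggers at detection time plus hold-off, and acts only if the signal has not already recovered by then. The `Layer` fields and the switch durations (30, 40 and 50 ms) are assumptions chosen to sit inside the table's totals, not values from any standard.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Layer:
    name: str
    detect_ms: int
    holdoff_ms: int
    switch_ms: int       # assumed time to complete the switch once triggered
    can_recover: bool    # whether this layer has protection for the failure


def simulate(layers: list[Layer]) -> tuple[list[str], Optional[int]]:
    """Return (layers that acted, recovery time in ms) for a cut at T=0."""
    acted: list[str] = []
    recovered_at: Optional[int] = None
    for layer in sorted(layers, key=lambda x: x.detect_ms + x.holdoff_ms):
        trigger = layer.detect_ms + layer.holdoff_ms
        if recovered_at is not None and recovered_at <= trigger:
            break  # signal is back before this layer's hold-off expires
        if layer.can_recover:
            acted.append(layer.name)
            done = trigger + layer.switch_ms
            recovered_at = done if recovered_at is None else min(recovered_at, done)
    return acted, recovered_at


optical = Layer("optical 1+1", 10, 0, 30, True)
otn = Layer("OTN SNCP", 10, 100, 40, True)
frr = Layer("MPLS-TE FRR", 50, 300, 50, True)

protected = simulate([optical, otn, frr])
unprotected = simulate([replace(optical, can_recover=False), otn, frr])
mistuned = simulate([optical, replace(otn, holdoff_ms=0), frr])
print(protected, unprotected, mistuned)
```

With sensible timers only one layer acts in each scenario. In the `mistuned` case, where the OTN hold-off is set to zero, both the optical and OTN layers switch for the same cut, which is the simultaneous reaction that hold-off timers exist to prevent.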
Inter-Layer Protection Escalation Matrix
| Failure scope | Optical layer | OTN layer | MPLS-TE layer | IP layer |
|---|---|---|---|---|
| Single-span fibre cut, optical 1+1 protected | Switch (<50 ms) | Hold off — see recovery, no action | Hold off — see recovery, no action | Hold off — no action |
| Single-span fibre cut, no optical protection | No action | SNCP switch (<200 ms) | Hold off | Hold off |
| Multi-span SRLG cut affecting both 1+1 paths | Loss of working AND protection | Hold off, then SNCP if path computable | FRR backup tunnel (if pre-computed) | Reconverge (last resort) |
| Transponder failure (single-card) | No action (lambda gone) | SNCP switch | Hold off | Hold off |
| ROADM node-wide failure | Mesh restoration if configured | SNCP if alternate OTN path | Bypass via FRR | IGP reconverge |
| Whole landing station failure (subsea) | Mesh restoration impossible — site-down | n/a | Bypass via diverse cable | IGP reconverge |
Warning
Multi-layer hold-off timers are operator-tuned and easy to mis-configure. The classic failure mode is “no layer switches” — every layer is waiting for the layer below it, and every layer below was hoping the layer above would handle it. Hold-off coordination must be reviewed end-to-end every time a new protection scheme is deployed at any layer.
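A review of this kind can be partly automated. The sketch below is one possible audit, assuming each layer is described by its detection time, hold-off, worst-case total reaction time, and whether it is actually configured to switch; the field names are hypothetical, and only adjacent layers are compared.

```python
def audit_holdoffs(layers: list[dict]) -> list[str]:
    """Check a stack of layers (lowest first, times in ms) for timer problems."""
    findings = []
    if not any(layer["can_switch"] for layer in layers):
        findings.append("no layer is configured to switch")
    for lower, upper in zip(layers, layers[1:]):
        trigger = upper["detect_ms"] + upper["holdoff_ms"]
        # The upper layer should not trigger until the lower one has had time to finish.
        if lower["can_switch"] and trigger <= lower["reaction_ms"]:
            findings.append(
                f"{upper['name']} may act at {trigger} ms, before "
                f"{lower['name']} finishes ({lower['reaction_ms']} ms)"
            )
    return findings


layers = [
    {"name": "optical 1+1", "detect_ms": 10, "holdoff_ms": 0, "reaction_ms": 50, "can_switch": True},
    {"name": "OTN SNCP", "detect_ms": 10, "holdoff_ms": 100, "reaction_ms": 200, "can_switch": True},
    {"name": "MPLS-TE FRR", "detect_ms": 50, "holdoff_ms": 100, "reaction_ms": 1000, "can_switch": True},
]
findings = audit_holdoffs(layers)
print(findings)
```

Using the lower end of the FRR hold-off range (100 ms), the audit flags that FRR could trigger at 150 ms while OTN SNCP may still need up to 200 ms. Raising the FRR hold-off to 200 ms clears the finding in this model.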
Hard-Wired vs SDN-Driven Restoration
Protection (1+1, OUPSR) is hard-wired in hardware: the selector is a physical switch driven by alarm logic. It is deterministic — the same input always produces the same output, switching time is bounded by hardware response, and there is nothing to debug at the protocol level. The downside is rigidity: a change in topology, capacity, or risk profile requires re-engineering the protection design.
SDN-driven restoration (a controller listens for alarms, computes a new path, programs WSS configurations, and verifies the result) is flexible. The same control plane that builds a new circuit also restores a failed one — operators can change strategy in software, prioritise differently per customer, and account for impairments and current channel loading.
| Dimension | Hard-wired protection | SDN-driven restoration |
|---|---|---|
| Switching time | Deterministic, <50 ms | Variable, 2–20 s |
| Capacity overhead | High (1+1 = 100 %) | Low (mesh 30–50 %) |
| Topology flexibility | Re-engineering required | Changes are configuration |
| Failure modes | Hardware only | Hardware + software + controller availability |
| Debug surface | Alarm bytes, hardware logs | Multi-system trace across controller, NEs, telemetry pipeline |
| Best fit | Premium, low-volume, real-time | Commodity bulk capacity, Internet backbone |
Rule of Thumb
Use hard-wired 1+1 / OUPSR for premium services and SDN restoration for bulk capacity. The economic boundary is roughly: services billed per-end-point at five-nines availability and sub-100 ms switch SLA → 1+1; services billed per-Gbps with four-nines availability → mesh restoration.
See Also
- 04-otn-sdh-and-network-design
- 07-roadm-architectures-and-photonic-switching
- 12-optical-control-plane-and-automation
- 13-optical-test-measurement-and-commissioning
References
Standards (ITU-T)
- ITU-T G.873.1 — Optical Transport Network — Linear protection (03/2022). https://www.itu.int/rec/T-REC-G.873.1
- ITU-T G.873.2 — ODUk shared ring protection. https://www.itu.int/rec/T-REC-G.873.2
- ITU-T G.808.1 — Generic protection switching — Linear trail and subnetwork protection. https://www.itu.int/rec/T-REC-G.808.1
- ITU-T G.872 — Architecture of optical transport networks (10/2017). https://www.itu.int/rec/T-REC-G.872
- ITU-T G.798 — Characteristics of optical transport network hierarchy equipment functional blocks (12/2017). https://www.itu.int/rec/T-REC-G.798
Standards (IETF)
- RFC 4090 — Fast Reroute Extensions to RSVP-TE for LSP Tunnels (05/2005). https://www.rfc-editor.org/rfc/rfc4090
- RFC 4427 — Recovery (Protection and Restoration) Terminology for Generalized Multi-Protocol Label Switching (GMPLS) (03/2006). https://www.rfc-editor.org/rfc/rfc4427
- RFC 4428 — Analysis of Generalized Multi-Protocol Label Switching (GMPLS)-based Recovery Mechanisms (03/2006). https://www.rfc-editor.org/rfc/rfc4428
- RFC 6163 — Framework for GMPLS and Path Computation Element (PCE) Control of Wavelength Switched Optical Networks (WSONs) (04/2011). https://www.rfc-editor.org/rfc/rfc6163