Optical Layer Protection and Restoration

The optical layer offers two distinct survivability primitives: protection, which pre-allocates a backup path and switches in tens of milliseconds without computation, and restoration, which computes a replacement path on demand using control-plane signalling. Choosing between them — and coordinating with higher layers (OTN, MPLS-TE FRR, IP) — determines whether a fibre cut produces a 50 ms blip, a 200 ms reroute, or a multi-second outage that triggers BGP flaps. This chapter covers ITU-T G.873.1 linear protection (1+1, 1:1, 1:N), ring-based OUPSR/OSNCP, mesh restoration via the optical control plane, and the inter-layer hold-off-timer logic that prevents simultaneous reactions across IP, OTN, and optical.

| Concept | What it says |
| --- | --- |
| 1+1 OMS protection | Dedicated bridged transmit: every signal is duplicated onto two diverse paths, and the tail-end (receive-side) selector picks the better one. Sub-50 ms switch time, 100 % capacity overhead. The simplest and fastest scheme, and the one chosen for the most critical lambdas. |
| OUPSR / OSNCP | Optical Unidirectional Path-Switched Ring and Optical Sub-Network Connection Protection. Ring-based 1+1 variants where both directions of the ring carry the same signal and the receiver picks. Familiar from SONET UPSR and SDH SNCP, but at the optical (lambda) layer. |
| Shared mesh restoration | The optical control plane (GMPLS or SDN) computes a replacement wavelength on demand after a failure. Capacity-efficient, because backup wavelengths are shared across many primary paths, but switching time is seconds, not milliseconds. |
| Hold-off timers | The mechanism that prevents IP, OTN, and optical layers from all reacting simultaneously to the same fibre cut. Each layer waits a configured interval before declaring its action; lower layers act first, higher layers stay quiet. |

Linear Protection — ITU-T G.873.1

Linear protection assigns one or more backup (“protection”) channels to a working channel along a point-to-point optical path. ITU-T G.873.1 defines three architectures, distinguished by how protection capacity is shared:

| Scheme | Bridge | Selector | Working : Protection | Switch time | Capacity overhead | Use case |
| --- | --- | --- | --- | --- | --- | --- |
| 1+1 | Permanent: signal duplicated onto both paths | Tail-end (receiver) | 1 : 1 | <50 ms | 100 % | Mission-critical OMS |
| 1:1 | Switched: protection idle until failure | Both ends | 1 : 1 | <50 ms | 100 %, but protection capacity reusable for low-priority extra traffic | Critical, with extra-traffic optimisation |
| 1:N | Switched | Both ends | N : 1 (N working share 1 protection) | <50 ms (single failure) | 1/N | Cost-optimised, single-failure tolerant only |

In 1+1, the transmitter permanently bridges the signal onto two diverse paths; the receiver continuously monitors both and selects whichever has acceptable performance. Unidirectional 1+1 needs no signalling on failure: the selector simply switches. Switch time is dominated by failure-detection time (typically about 10 ms, triggered by loss-of-light or another signal-fail defect) plus selector hardware response.
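The receive-side selection can be sketched as a small pure function. This is a minimal illustration only: the `PathStatus` record, the severity ranking, and the non-revertive policy are assumptions made for the example and are not taken from G.873.1.

```python
from dataclasses import dataclass


@dataclass
class PathStatus:
    """Hypothetical per-path monitor output (field names are illustrative)."""
    signal_fail: bool     # e.g. loss-of-light detected
    signal_degrade: bool  # e.g. error rate above a threshold


def severity(status: PathStatus) -> int:
    """Rank a path: 2 = failed, 1 = degraded, 0 = clean."""
    if status.signal_fail:
        return 2
    return 1 if status.signal_degrade else 0


def select_path(active: str, working: PathStatus, protection: PathStatus) -> str:
    """Return 'working' or 'protection' for a non-revertive 1+1 selector.

    The selector leaves the active path only when the standby path is
    strictly better. Nothing is signalled to the far end.
    """
    status = {"working": working, "protection": protection}
    standby = "protection" if active == "working" else "working"
    if severity(status[standby]) < severity(status[active]):
        return standby
    return active


ok = PathStatus(signal_fail=False, signal_degrade=False)
cut = PathStatus(signal_fail=True, signal_degrade=False)

after_cut = select_path("working", working=cut, protection=ok)
after_repair = select_path(after_cut, working=ok, protection=ok)
```

With these inputs the selector moves to the protection path on the cut and stays there after the repair, because a non-revertive selector has no reason to move between two equally clean paths.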

In 1:1, the protection path is idle by default and may carry “extra traffic” (low-priority lambdas that get pre-empted on failure). On failure, both endpoints signal via APS bytes to switch over.

In 1:N, a single protection channel covers N working channels. Cheap, but a second simultaneous failure within the same protection group is unprotected.
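The single-failure limit of 1:N can be shown with a toy model. The class and method names below are hypothetical, and the model ignores priorities, APS signalling, and reversion; it only tracks who currently holds the one protection channel.

```python
class OneForNGroup:
    """Toy 1:N group: N working channels share one protection channel."""

    def __init__(self, working_channels):
        self.working = list(working_channels)
        self.on_protection = None  # channel currently using the protection path
        self.unprotected = []      # failed channels left without a backup

    def fail(self, channel):
        """Record a failure and report whether the channel is protected."""
        if channel not in self.working:
            raise ValueError(f"unknown channel: {channel}")
        if self.on_protection is None:
            self.on_protection = channel
        if channel == self.on_protection:
            return "protected"
        if channel not in self.unprotected:
            self.unprotected.append(channel)
        return "unprotected"

    def capacity_overhead(self):
        """Protection capacity as a fraction of working capacity (1/N)."""
        return 1 / len(self.working)


group = OneForNGroup(["ch1", "ch2", "ch3", "ch4"])
first = group.fail("ch2")   # takes the protection channel
second = group.fail("ch3")  # protection already in use
```

The first failure is covered and the second is not, while the overhead for four working channels is 0.25 of the working capacity.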

Warning

1:N is single-failure tolerant. In an SRLG (Shared Risk Link Group) scenario where two working paths share a duct or conduit, a single backhoe can take down two working paths simultaneously — the protection channel covers only one. SRLG-aware planning is mandatory before quoting “1:N protected” to a customer.
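A planning check for this condition can be very small. The sketch below assumes each working path in a protection group is described by a set of SRLG identifiers; the identifiers and the `shared_risk_pairs` helper are made up for illustration.

```python
from itertools import combinations


def shared_risk_pairs(group):
    """Return (path_a, path_b, shared_srlgs) for working paths sharing an SRLG.

    `group` maps a working-path name to the set of SRLG identifiers
    (ducts, conduits, bridges) that the path traverses.
    """
    conflicts = []
    for (a, srlgs_a), (b, srlgs_b) in combinations(sorted(group.items()), 2):
        shared = srlgs_a & srlgs_b
        if shared:
            conflicts.append((a, b, sorted(shared)))
    return conflicts


# Hypothetical 1:3 protection group.
protection_group = {
    "W1": {"duct-12", "duct-40"},
    "W2": {"duct-12", "duct-77"},
    "W3": {"duct-55"},
}
conflicts = shared_risk_pairs(protection_group)
```

Here `W1` and `W2` both run through `duct-12`, so one cut in that duct would fail two working paths while the group can protect only one.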

Ring-Based Optical Protection — OUPSR / OSNCP

Ring topologies dominate metro and regional networks because they offer two physically diverse paths between any pair of nodes via the two rotation directions. OUPSR (Optical Unidirectional Path-Switched Ring) and OSNCP (Optical Sub-Network Connection Protection) are the optical-layer equivalents of SONET UPSR and SDH SNCP: the same architecture applied at the wavelength layer rather than the VC-4 / STS-3c layer.

flowchart LR
    subgraph RING ["OUPSR / OSNCP — both directions used"]
        N1["Node A<br/>TX broadcasts<br/>both directions"]
        N2["Node B"]
        N3["Node C"]
        N4["Node D<br/>RX selects<br/>better path"]
        N1 -->|"clockwise<br/>working"| N2
        N2 -->|"clockwise"| N3
        N3 -->|"clockwise"| N4
        N1 -->|"anticlockwise<br/>protection"| N4
    end
    subgraph MESH ["Shared mesh restoration"]
        M1["Node A"] -->|"primary"| M2["Node B"]
        M2 --> M3["Node C"]
        M3 --> M4["Node D"]
        M1 -.->|"computed on failure<br/>via PCE/GMPLS"| M5["Node E"]
        M5 -.-> M4
    end
    style N1 fill:#378ADD,stroke:#185FA5,color:#fff
    style N4 fill:#378ADD,stroke:#185FA5,color:#fff
    style N2 fill:#7F77DD,stroke:#534AB7,color:#fff
    style N3 fill:#7F77DD,stroke:#534AB7,color:#fff
    style M1 fill:#1D9E75,stroke:#0F6E56,color:#fff
    style M4 fill:#1D9E75,stroke:#0F6E56,color:#fff
    style M2 fill:#BA7517,stroke:#854F0B,color:#fff
    style M3 fill:#BA7517,stroke:#854F0B,color:#fff
    style M5 fill:#D85A30,stroke:#993C1D,color:#fff

Left: ring-based 1+1 — both directions carry the signal, receiver selects. Right: mesh restoration — backup path computed on demand by the control plane.

OUPSR/OSNCP are mechanically simple: each working channel is bridged at the source onto both the clockwise and anticlockwise ring directions; the receiving node has a 2:1 selector that picks the better direction. A fibre cut on the ring takes down only one direction; the receiver switches to the other within 50 ms. The trade-off is identical to 1+1: half the ring capacity is consumed by protection.
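The ring behaviour can be checked with a few lines of code. This sketch assumes a ring given as an ordered node list and distinct source and destination nodes; the function names are illustrative.

```python
def to_spans(path_nodes):
    """Convert a node sequence into a list of undirected spans."""
    return [frozenset(pair) for pair in zip(path_nodes, path_nodes[1:])]


def ring_paths(nodes, src, dst):
    """Return (clockwise_spans, anticlockwise_spans) from src to dst.

    `nodes` lists the ring in clockwise order; src and dst must differ.
    """
    n = len(nodes)
    i, j = nodes.index(src), nodes.index(dst)
    cw_hops = (j - i) % n
    clockwise = [nodes[(i + k) % n] for k in range(cw_hops + 1)]
    anticlockwise = [nodes[(i - k) % n] for k in range(n - cw_hops + 1)]
    return to_spans(clockwise), to_spans(anticlockwise)


def surviving_directions(nodes, src, dst, cut_span):
    """List the ring directions that still deliver the signal after one cut."""
    clockwise, anticlockwise = ring_paths(nodes, src, dst)
    cut = frozenset(cut_span)
    alive = []
    if cut not in clockwise:
        alive.append("clockwise")
    if cut not in anticlockwise:
        alive.append("anticlockwise")
    return alive


ring = ["A", "B", "C", "D"]
after_bc_cut = surviving_directions(ring, "A", "D", ("B", "C"))
after_ad_cut = surviving_directions(ring, "A", "D", ("A", "D"))
```

For the four-node ring in the figure, a cut on span B-C leaves the anticlockwise direction alive and a cut on span A-D leaves the clockwise direction alive, so any single span cut still leaves the selector one good input.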

Key Insight

OUPSR is a ring-shaped 1+1, not a fundamentally different protection class. The architectural advantage is that ring fibre routing usually delivers genuine SRLG diversity for free, because the two ring directions follow physically different ducts. On a linear point-to-point link with two parallel fibres, achieving the same diversity requires manual SRLG-aware route engineering.

Shared-Mesh Restoration

Mesh restoration trades switching speed for capacity efficiency. Instead of dedicating a backup wavelength to each primary, a smaller pool of restoration capacity is shared across many primary paths, on the assumption that simultaneous failures requiring restoration of all primaries at once are statistically unlikely.

When a failure is detected, the optical control plane (see 12-optical-control-plane-and-automation) runs path computation, allocates a new wavelength along a diverse route, configures intermediate ROADMs, and signals the new path. The process takes seconds (best case) to tens of seconds (large topologies, congested PCE).
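The path-computation step alone can be sketched as a breadth-first search that excludes the failed links. This is a deliberately simplified stand-in for a real PCE: it assumes an undirected topology, minimises hop count only, and ignores wavelength continuity and optical impairments. The topology mirrors the mesh half of the figure above.

```python
from collections import deque


def restoration_path(links, src, dst, failed):
    """Return a fewest-hop node list from src to dst avoiding failed links.

    `links` and `failed` are iterables of (node, node) pairs treated as
    undirected. Returns None when no surviving route exists.
    """
    failed_spans = {frozenset(link) for link in failed}
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in failed_spans:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


topology = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
restored = restoration_path(topology, "A", "D", failed=[("B", "C")])
isolated = restoration_path(topology, "A", "D", failed=[("B", "C"), ("E", "D")])
```

A production controller would follow this with wavelength assignment, ROADM configuration, and verification, which is where most of the seconds are spent.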

| Mesh restoration step | Typical duration | Notes |
| --- | --- | --- |
| Failure detection (LoL, BDI, OPM alarm) | 10–50 ms | Fastest is direct loss-of-light |
| Alarm propagation to controller | 100–500 ms | Telemetry pipeline latency |
| Path computation by PCE | 100 ms – 5 s | Depends on impairment-aware vs simple shortest-path |
| Wavelength assignment + intermediate WSS reconfiguration | 1–10 s | WSS settling time dominates |
| End-to-end restoration | 2–20 s | Roughly 40 to 400 times slower than sub-50 ms protection |
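
Adding up the itemised steps gives a quick sanity check on the end-to-end figure. The sketch below uses only the ranges from the table and the 50 ms protection bound quoted earlier in the chapter.

```python
# (best case, worst case) in milliseconds, copied from the table above.
steps_ms = {
    "failure detection": (10, 50),
    "alarm propagation": (100, 500),
    "path computation": (100, 5_000),
    "wavelength assignment + WSS reconfiguration": (1_000, 10_000),
}

best_case_ms = sum(low for low, _ in steps_ms.values())
worst_case_ms = sum(high for _, high in steps_ms.values())

PROTECTION_SWITCH_MS = 50  # upper bound quoted for 1+1 / OUPSR
slowdown_best = best_case_ms / PROTECTION_SWITCH_MS
slowdown_worst = worst_case_ms / PROTECTION_SWITCH_MS

print(f"itemised total: {best_case_ms / 1000:.2f} s to {worst_case_ms / 1000:.2f} s")
```

The itemised steps sum to about 1.2 s in the best case and about 15.6 s in the worst case, so the 2–20 s end-to-end row reads as a rounded envelope rather than an exact total.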

The capacity advantage is real: in a typical mesh, shared restoration achieves equivalent single-failure survivability with roughly 30–50 % protection capacity, versus 100 % for 1+1. For non-real-time-critical traffic (Internet backbone, content distribution), the multi-second restoration window is acceptable; for trading systems and 5G fronthaul, 1+1 remains mandatory.
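The sharing effect can be illustrated by counting backup wavelengths per link under a single-link-failure assumption. This is a toy calculation with hypothetical link names, not a dimensioning method; it does not reproduce the 30–50 % figure, which depends on the real topology and traffic matrix.

```python
from collections import Counter


def dedicated_backup_demand(services):
    """Backup wavelengths per link when every service owns its backup."""
    demand = Counter()
    for _primary, backup in services:
        demand.update(backup)
    return dict(demand)


def shared_backup_demand(services):
    """Backup wavelengths per link needed to survive any single link failure.

    For each possible failed link, count the backups that would be
    activated on every backup link, then keep the worst case per link.
    """
    primary_links = set().union(*(primary for primary, _ in services))
    demand = Counter()
    for failed_link in primary_links:
        load = Counter()
        for primary, backup in services:
            if failed_link in primary:
                load.update(backup)
        for link, wavelengths in load.items():
            demand[link] = max(demand[link], wavelengths)
    return dict(demand)


# Three services with disjoint primaries whose backups all cross link X-Y.
services = [
    ({"A-B"}, {"X-Y"}),
    ({"C-D"}, {"X-Y"}),
    ({"E-F"}, {"X-Y"}),
]
dedicated = dedicated_backup_demand(services)
shared = shared_backup_demand(services)
```

Because no single link failure hits more than one primary, link X-Y needs one shared backup wavelength instead of three dedicated ones. If two of the primaries shared a link, the shared figure for X-Y would rise to two.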

Protection-Class Comparison

The end-to-end choice is rarely “one scheme” — operators typically deploy a mix per service tier.

| Class | Architecture | Switch time | Capacity overhead | Complexity | Best fit |
| --- | --- | --- | --- | --- | --- |
| 1+1 (linear) | Dedicated dual transmit | <50 ms | 100 % | Low | Premium leased lambdas, financial connectivity |
| 1:1 (linear) | Switched, with extra traffic | <50 ms | 100 % (partly offset by extra traffic) | Medium | Operator core with low-priority overflow |
| 1:N (linear) | Switched, shared protection | <50 ms (single failure) | 1/N | Medium | Cost-optimised aggregation, non-critical |
| OUPSR / OSNCP | Ring 1+1 | <50 ms | 100 % | Low | Metro / regional rings |
| Shared mesh | Control-plane restoration | 2–20 s | 30–50 % | High | Backbone, Internet, CDN backhaul |
| Unprotected | None | n/a | 0 % | None | Best-effort, redundant via IP layer |

Rule of Thumb

If the customer SLA cites a specific switch-time number (e.g. “less than 50 ms”), the answer is 1+1. If the SLA cites only annual availability targets (five-nines, four-nines), shared-mesh restoration is usually cheaper for the same headline number — see 14-optical-slas-availability-and-ops-reality.
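The arithmetic behind the availability half of this rule is short. The sketch below converts an availability target into an annual downtime budget and counts how many restoration events of a given length fit inside it. It assumes, unrealistically, that restoration events are the only source of downtime.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(availability):
    """Annual downtime budget in minutes for an availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR


def outages_within_budget(availability, outage_seconds):
    """Whole number of outages of the given length that fit in a year's budget."""
    budget_seconds = allowed_downtime_minutes(availability) * 60
    return int(budget_seconds // outage_seconds)


five_nines_minutes = allowed_downtime_minutes(0.99999)
four_nines_minutes = allowed_downtime_minutes(0.9999)
restorations_five_nines = outages_within_budget(0.99999, outage_seconds=20)
```

Five-nines allows about 5.26 minutes of downtime per year and four-nines about 52.6 minutes, so even a 20 s worst-case restoration fits many times over. A switch-time clause, by contrast, is violated by the first multi-second event.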

Multi-Layer Protection Coordination

A modern carrier network has at least three layers capable of reacting to failure: IP / MPLS-TE FRR, OTN protection (G.873.1 SNCP at the OTN layer), and optical. If all three react simultaneously to the same fibre cut, the result is chaos: IP reroutes onto a path that OTN is switching onto at the same moment, while the optical layer is still restoring the wavelength underneath both. The standard mechanism for ordering reactions is the hold-off timer.

The principle: lower layers act first. The optical layer detects loss-of-light within 10 ms and switches if it has protection capacity. OTN waits 100 ms (its hold-off timer) before declaring its own failure; by then, if optical protection succeeded, OTN sees the recovered signal and stays quiet. MPLS-TE FRR waits longer still (typically 100–500 ms) before switching to its backup tunnel, and IGP reconvergence follows only if no lower layer has recovered the path.

| Layer | Detection time | Hold-off (default) | Total reaction |
| --- | --- | --- | --- |
| Optical (1+1, OUPSR) | 10 ms | 0 (no hold-off, lowest layer) | <50 ms |
| Optical (mesh restoration) | 50 ms | 0 | 2–20 s |
| OTN G.873.1 SNCP | 10–50 ms | 100 ms | <200 ms |
| MPLS-TE FRR | 50 ms | 100–500 ms | <1 s |
| IP IGP convergence | 100–500 ms | 200 ms – several s | seconds |
| BGP convergence | seconds | n/a (highest layer) | tens of seconds |

flowchart TD
    EVENT["Fibre cut at T=0"]
    EVENT -->|"+10 ms"| OPT_DETECT["Optical detects LoL<br/>1+1 selector switches?"]
    OPT_DETECT -->|"yes — recovered"| DONE_OPT["Recovered <50 ms<br/>upper layers see no event"]
    OPT_DETECT -->|"no — no protection"| OTN_HOLD["OTN holdoff 100 ms<br/>then SNCP switch"]
    OTN_HOLD -->|"recovered"| DONE_OTN["Recovered <200 ms"]
    OTN_HOLD -->|"no protection"| MPLS_HOLD["MPLS-TE FRR<br/>100-500 ms holdoff"]
    MPLS_HOLD -->|"recovered"| DONE_MPLS["Recovered <1 s"]
    MPLS_HOLD -->|"no FRR"| IGP["IP IGP reconverge<br/>seconds"]
    IGP --> BGP["BGP reconverge<br/>tens of seconds"]
    style EVENT fill:#E24B4A,stroke:#A32D2D,color:#fff
    style OPT_DETECT fill:#1D9E75,stroke:#0F6E56,color:#fff
    style DONE_OPT fill:#378ADD,stroke:#185FA5,color:#fff
    style OTN_HOLD fill:#BA7517,stroke:#854F0B,color:#fff
    style DONE_OTN fill:#378ADD,stroke:#185FA5,color:#fff
    style MPLS_HOLD fill:#7F77DD,stroke:#534AB7,color:#fff
    style DONE_MPLS fill:#378ADD,stroke:#185FA5,color:#fff
    style IGP fill:#D85A30,stroke:#993C1D,color:#fff
    style BGP fill:#D85A30,stroke:#993C1D,color:#fff

Hold-off timer cascade — each layer gives the layer below it time to recover before declaring its own action. Tuning these timers is operator-specific and requires care.
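
The cascade can be simulated with a few lines of code to see which layers actually start a recovery action. The model is a simplification under stated assumptions: each layer triggers at detection time plus hold-off unless the signal has already recovered, and every timing value below is illustrative rather than a vendor default.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    detect_ms: int
    hold_off_ms: int
    switch_ms: int
    can_recover: bool


def layers_that_act(layers):
    """Return [(name, trigger_ms)] for every layer that starts recovery."""
    recovered_at = None
    acted = []
    for layer in sorted(layers, key=lambda item: item.detect_ms + item.hold_off_ms):
        trigger = layer.detect_ms + layer.hold_off_ms
        if recovered_at is not None and recovered_at <= trigger:
            continue  # signal was already back when the hold-off expired
        if layer.can_recover:
            acted.append((layer.name, trigger))
            done = trigger + layer.switch_ms
            recovered_at = done if recovered_at is None else min(recovered_at, done)
    return acted


tuned = [
    Layer("optical 1+1", detect_ms=10, hold_off_ms=0, switch_ms=30, can_recover=True),
    Layer("OTN SNCP", detect_ms=10, hold_off_ms=100, switch_ms=50, can_recover=True),
    Layer("MPLS-TE FRR", detect_ms=50, hold_off_ms=300, switch_ms=50, can_recover=True),
]
mistuned = [
    Layer("optical 1+1", detect_ms=10, hold_off_ms=0, switch_ms=30, can_recover=True),
    Layer("OTN SNCP", detect_ms=10, hold_off_ms=0, switch_ms=50, can_recover=True),
]

tuned_result = layers_that_act(tuned)
mistuned_result = layers_that_act(mistuned)
```

With the tuned timers only the optical layer acts. With the OTN hold-off set to zero, both the optical and OTN layers switch for the same cut, which is the double reaction the timers exist to prevent.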

Inter-Layer Protection Escalation Matrix

| Failure scope | Optical layer | OTN layer | MPLS-TE layer | IP layer |
| --- | --- | --- | --- | --- |
| Single-span fibre cut, optical 1+1 protected | Switch (<50 ms) | Hold off; sees recovery, no action | Hold off; sees recovery, no action | Hold off; no action |
| Single-span fibre cut, no optical protection | No action | SNCP switch (<200 ms) | Hold off | Hold off |
| Multi-span SRLG cut affecting both 1+1 paths | Loss of working AND protection | Hold off, then SNCP if path computable | FRR backup tunnel (if pre-computed) | Reconverge (last resort) |
| Transponder failure (single-card) | No action (lambda gone) | SNCP switch | Hold off | Hold off |
| ROADM node-wide failure | Mesh restoration if configured | SNCP if alternate OTN path | Bypass via FRR | IGP reconverge |
| Whole landing station failure (subsea) | Mesh restoration impossible (site down) | n/a | Bypass via diverse cable | IGP reconverge |

Warning

Multi-layer hold-off timers are operator-tuned and easy to mis-configure. The classic failure mode is “no layer switches” — every layer is waiting for the layer below it, and every layer below was hoping the layer above would handle it. Hold-off coordination must be reviewed end-to-end every time a new protection scheme is deployed at any layer.
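An end-to-end review of this kind can be partly automated. The sketch below is a hypothetical audit, not an operator tool: it assumes each layer is described by its hold-off, its worst-case recovery time measured from the failure, and whether protection is configured, and it flags two conditions: no layer able to switch, and an upper-layer hold-off that expires before the protected layer below it can finish.

```python
def audit_hold_offs(layers):
    """Return a list of findings for layers ordered lowest layer first.

    Each layer is a dict with keys 'name', 'hold_off_ms',
    'worst_case_recovery_ms' and 'has_protection'.
    """
    findings = []
    if not any(layer["has_protection"] for layer in layers):
        findings.append("no layer has protection configured: nothing will switch")
    for lower, upper in zip(layers, layers[1:]):
        if not lower["has_protection"]:
            continue
        if upper["hold_off_ms"] < lower["worst_case_recovery_ms"]:
            findings.append(
                f"{upper['name']} hold-off ({upper['hold_off_ms']} ms) is shorter than "
                f"{lower['name']} worst-case recovery ({lower['worst_case_recovery_ms']} ms)"
            )
    return findings


def stack(mpls_hold_off_ms):
    """Build an example stack; values follow the layer timing table above."""
    return [
        {"name": "optical 1+1", "hold_off_ms": 0,
         "worst_case_recovery_ms": 50, "has_protection": True},
        {"name": "OTN SNCP", "hold_off_ms": 100,
         "worst_case_recovery_ms": 200, "has_protection": True},
        {"name": "MPLS-TE FRR", "hold_off_ms": mpls_hold_off_ms,
         "worst_case_recovery_ms": 1_000, "has_protection": True},
    ]


findings_low = audit_hold_offs(stack(mpls_hold_off_ms=100))
findings_high = audit_hold_offs(stack(mpls_hold_off_ms=500))
```

At the low end of the 100–500 ms range, the MPLS-TE FRR hold-off expires before OTN's worst-case 200 ms recovery and the audit flags it. At 500 ms the stack passes.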

Hard-Wired vs SDN-Driven Restoration

Protection (1+1, OUPSR) is hard-wired in hardware: the selector is a physical switch driven by alarm logic. It is deterministic — the same input always produces the same output, switching time is bounded by hardware response, and there is nothing to debug at the protocol level. The downside is rigidity: a change in topology, capacity, or risk profile requires re-engineering the protection design.

SDN-driven restoration (a controller listens for alarms, computes a new path, programs WSS configurations, and verifies the result) is flexible. The same control plane that builds a new circuit also restores a failed one — operators can change strategy in software, prioritise differently per customer, and account for impairments and current channel loading.

| Dimension | Hard-wired protection | SDN-driven restoration |
| --- | --- | --- |
| Switching time | Deterministic, <50 ms | Variable, 2–20 s |
| Capacity overhead | High (1+1 = 100 %) | Low (mesh 30–50 %) |
| Topology flexibility | Re-engineering required | Changes are configuration |
| Failure modes | Hardware only | Hardware + software + controller availability |
| Debug surface | Alarm bytes, hardware logs | Multi-system trace across controller, NEs, telemetry pipeline |
| Best fit | Premium, low-volume, real-time | Commodity bulk capacity, Internet backbone |

Rule of Thumb

Use hard-wired 1+1 / OUPSR for premium services and SDN restoration for bulk capacity. The economic boundary is roughly: services billed per-end-point at five-nines availability and sub-100 ms switch SLA → 1+1; services billed per-Gbps with four-nines availability → mesh restoration.

References

Standards (ITU-T)

  1. ITU-T G.873.1, Optical Transport Network: Linear protection (03/2022). https://www.itu.int/rec/T-REC-G.873.1
  2. ITU-T G.873.2, ODUk shared ring protection. https://www.itu.int/rec/T-REC-G.873.2
  3. ITU-T G.808.1, Generic protection switching: Linear trail and subnetwork protection. https://www.itu.int/rec/T-REC-G.808.1
  4. ITU-T G.872, Architecture of optical transport networks (10/2017). https://www.itu.int/rec/T-REC-G.872
  5. ITU-T G.798, Characteristics of optical transport network hierarchy equipment functional blocks (12/2017). https://www.itu.int/rec/T-REC-G.798

Standards (IETF)

  1. RFC 4090, Fast Reroute Extensions to RSVP-TE for LSP Tunnels (05/2005). https://www.rfc-editor.org/rfc/rfc4090
  2. RFC 4427, Recovery (Protection and Restoration) Terminology for Generalized Multi-Protocol Label Switching (GMPLS) (03/2006). https://www.rfc-editor.org/rfc/rfc4427
  3. RFC 4428, Analysis of Generalized Multi-Protocol Label Switching (GMPLS)-based Recovery Mechanisms (03/2006). https://www.rfc-editor.org/rfc/rfc4428
  4. RFC 6163, Framework for GMPLS and Path Computation Element (PCE) Control of Wavelength Switched Optical Networks (WSONs) (04/2011). https://www.rfc-editor.org/rfc/rfc6163

Books

  1. R. Ramaswami, K. N. Sivarajan, G. H. Sasaki, Optical Networks: A Practical Perspective, 3rd ed., Morgan Kaufmann, 2009.
  2. V. Alwayn, Optical Network Design and Implementation, Cisco Press, 2004.