Cyber resilience in manufacturing is an uptime KPI


Cyber resilience in manufacturing has become an uptime KPI. When systems fail, the question is not "what policy do we have?" It is "how fast can we recover, and can we restart safely on a live line?" Many firms invest heavily in prevention, yet they still struggle when reality hits. The gap is not technical. It is operational. Operators need manual fallback steps they can run. Shift supervisors need restart routines that stop a messy recovery from becoming a safety or quality issue. This article makes cyber resilience practical. It focuses on safe restart, clear ownership, and rehearsals that cut recovery time.

Definition

What is cyber resilience in manufacturing?

Cyber resilience in manufacturing is the ability to restart production safely and quickly after a system failure. It means having manual fallback steps, clear ownership, and a rehearsed recovery routine that operators can run without IT support. The measure is not whether an attack was stopped, but how fast and safely the factory recovered.

What is the difference between OT cybersecurity and cyber resilience?

OT (Operational Technology) cybersecurity protects systems from attack. Cyber resilience covers what happens when an attack gets through. It means fallback steps, restart plans, first-off quality checks, and a shared IT and ops recovery sheet. Most manufacturers have invested in one and not the other.

Infographic: two manufacturing postures on cyber resilience. Most manufacturers invest in prevention only; cyber-resilient operations add rehearsed restart routines, manual fallback procedures, and recovery time measurement. Includes the statistic that manufacturing accounts for 27.7% of all cyberattacks. Caption: The test is not your policy document. It is what your shift supervisor does at 2am. (C) Nick Leeder & Co Limited. 2026.

Why cyber resilience is now an uptime KPI, not just an IT concern

Manufacturing is the most attacked industry on the planet, for the fifth year running. IBM X-Force found the sector took 27.7% of all cyberattacks in 2025 [IBM X-Force Threat Intelligence Index 2025]. When a SCADA [Supervisory Control and Data Acquisition] system goes down, or an MES [Manufacturing Execution System] goes dark, the floor does not stop by itself. People make calls. Those calls decide whether you recover in two hours or two days.

The cost of failing at recovery is real. In August 2025, Jaguar Land Rover suffered a ransomware attack. It halted output across three countries for five weeks and hit more than 5,000 suppliers. The wider UK economy absorbed an estimated £1.9 billion in losses [WEF Global Cybersecurity Outlook 2026]. The failure was not one of prevention alone. It exposed the gap between IT system recovery and ops restart. That is the gap this article is about.

In the UK, 55% of makers plan to raise cybersecurity spend in 2026, up from 45% the year before [Make UK Executive Survey 2026]. Most of that goes on prevention and detection. Very little lands in ops as rehearsed fallback steps. That is the gap.

Quotable: Cyber resilience fails in recovery, not prevention, when restart and fallback are not standard work.

The COO or VP of Operations cannot wait for IT to call all clear before thinking about restart. By then, the floor is already improvising. Treat restart as a production skill. Own it. Test it before it is needed.

What safe restart means in factory terms

Safe restart, in factory terms, must be specific and runnable. By safe restart, we mean a set of steps that lets a line resume after a system outage. In practice, that means three things: no safety risk, no faulty product released, and no requirement for IT to be on-site.

A safe restart plan should cover four things:

Isolation check: who confirms the system is isolated or back, and what signal clears the way to proceed.
Manual fallback: which steps replace the lost system, who runs them, and what records are kept.
First-off quality check: what checks run on the first units after restart, and what triggers a hold.
Handover and escalation: how the restart is logged, passed to the next shift, and escalated if the line cannot restart in time.

This is standard work. It belongs on the shift handover board, not in a server room file.
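The four steps can be held as a simple checklist that gates the restart. The sketch below is illustrative only: the `RestartStep` class, the owner roles, and the sign-off initials are assumptions made for the example, not part of any real MES or standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestartStep:
    """One line on the safe restart checklist (hypothetical structure)."""
    name: str
    owner_role: str                      # a named role, not a group
    signed_off_by: Optional[str] = None  # initials written on the paper sheet

    @property
    def done(self) -> bool:
        return self.signed_off_by is not None


def blocking_steps(steps: list[RestartStep]) -> list[str]:
    """Return the names of steps that are not yet signed off, in order."""
    return [s.name for s in steps if not s.done]


# Assumed example: the four steps from the plan, with illustrative owners.
checklist = [
    RestartStep("Isolation check", "Shift supervisor", signed_off_by="JS"),
    RestartStep("Manual fallback", "Line operator", signed_off_by="AK"),
    RestartStep("First-off quality check", "Quality lead"),
    RestartStep("Handover and escalation", "Shift supervisor"),
]

open_steps = blocking_steps(checklist)
restart_complete = not open_steps
print("Restart complete:", restart_complete, "| still open:", open_steps)
```

The same logic works on paper: a step without initials next to a named role is still open, and the restart is not complete until every row is signed.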

Infographic: safe restart blueprint showing the four operational steps after a cyber-related system outage in manufacturing: isolation check, manual fallback, first-off quality check, and handover and escalation. Caption: A safe restart is a production procedure, not a technology procedure. Four steps, clear owners, standard work. (C) Nick Leeder & Co Limited. 2026.

The minimum manual fallback operators need

Most plans miss this. OT cybersecurity asks which systems to protect. Manual fallback asks which system jobs a human must be able to do without tech support, and how.

A practical fallback minimum includes:

A paper log that tracks output per shift without MES access.
A printed quality check sheet with control limits for key in-process checks.
A laminated settings sheet for each work cell, kept current whenever the standard changes.
A clear note on who to call and what to log when a system goes down.
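The printed quality check sheet comes down to a comparison against control limits. Here is a minimal sketch, assuming hypothetical check names and limits (fill weight, seal temperature) that do not come from any real spec. It shows how readings copied from the paper sheet could later be checked and a hold flagged.

```python
# Hypothetical control limits as they might appear on a printed check sheet.
# Names and values are illustrative assumptions only.
CONTROL_LIMITS = {
    "fill_weight_g": (498.0, 502.0),
    "seal_temp_c": (175.0, 185.0),
}


def out_of_limits(readings: dict[str, float]) -> list[str]:
    """Return the checks that breach their limits or were not recorded."""
    failures = []
    for check, (low, high) in CONTROL_LIMITS.items():
        value = readings.get(check)
        if value is None or not (low <= value <= high):
            failures.append(check)
    return failures


# Readings transcribed from the paper sheet for one unit.
unit = {"fill_weight_g": 503.1, "seal_temp_c": 180.0}
failed = out_of_limits(unit)
print("HOLD" if failed else "RELEASE", failed)
```

A missing reading is treated as a failure on purpose. An unlogged check cannot be tied to the batch, so it should not count as a pass.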

Leading makers connect an average of 85% of their production and logistics endpoints to their IT and OT systems [WEF Global Lighthouse Network 2026]. That brings real value. It also creates more single points of failure. Manual fallback is the offset.

What this looks like in practice

A mid-market food maker runs three lines on nights. At 01:15, the site MES stops responding. IT is not on-site. The root cause turns out to be a ransomware attempt via a supplier link. The shift supervisor has no printed fallback and no clear steer on whether to keep running.

The supervisor keeps one line going with a handwritten tally. Quality checks happen but are not logged against the batch. When IT restores the system two hours later, there is a gap in the trace data. The batch is held. The line loses four hours, not two, because the restart itself creates a quality hold.

A 45-minute line walk and a one-page fallback plan, agreed in advance, would have kept it to two hours. The problem was not the cyber event. It was the absence of a restart plan with manual logging, a first-off check, and batch record steps.

A practical rehearsal method: tabletop and line walk

A plan that has never been tested is not a plan. It is a guess. You do not need to shut down a line to rehearse. You need two things: a tabletop session and a line walk.

The tabletop takes 90 minutes. Gather the shift supervisor, a maintenance tech, a quality lead, and one IT or OT rep. Walk through one scenario: the MES goes down at shift start. Ask each person what they would do. Do not give the answer. Listen for the gaps. Who does not know what to do? Which step has no owner? Log the gaps. Fix the highest-risk ones first.

The line walk follows. Take the fallback plan onto the floor and try to run it. Can the operator find the paper log form? Does the settings sheet match the current spec? Is the call-out number still live? The line walk shows what works at 2am and what does not. Run this cycle once a quarter on your highest-risk lines.
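Gaps from the tabletop and the line walk can go into one log, ranked by risk. The sketch below assumes a simple likelihood times impact score on a 1 to 5 scale. The scoring scheme, field names, and example gaps are illustrative assumptions, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    description: str
    source: str        # "tabletop" or "line walk"
    likelihood: int    # 1 (rare) to 5 (expected), assumed scale
    impact: int        # 1 (minor) to 5 (safety or batch loss), assumed scale
    owner: str = ""    # empty means nobody owns the fix yet

    @property
    def risk(self) -> int:
        return self.likelihood * self.impact


gaps = [
    Gap("Paper log form not at the line", "line walk", 4, 3, "Shift supervisor"),
    Gap("No named restart owner on nights", "tabletop", 3, 5),
    Gap("Settings sheet one revision behind", "line walk", 2, 4, "Quality lead"),
]

# Highest risk first; sorted() is stable, so ties keep their logged order.
ranked = sorted(gaps, key=lambda g: g.risk, reverse=True)
unowned = [g.description for g in ranked if not g.owner]

for g in ranked:
    print(f"{g.risk:>2}  {g.description}  (owner: {g.owner or 'NONE'})")
```

Reviewing the `unowned` list at the end of each session is a quick way to see which steps still have no owner.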

Common mistakes that extend recovery time

Treating cyber resilience as an IT job only. If ops does not own restart, IT will build a plan that works in a server room but not on a production floor.
Keeping fallback steps in shared drives that need the same network that is down.
Setting restart targets based on IT recovery time rather than ops downtime tolerance. These are often not the same figure.
Running a tabletop once and not repeating it. Staff change. Processes change. A plan that was right 18 months ago may have gaps today.
Not agreeing quality checks before a restarted line can run to rate. Without this, the restart creates a hold by default.

Three measures that tell you if you are ready

Most firms track cybersecurity spend and attack counts. Neither tells you whether you can restart safely. Three measures that do:

Recovery time: from outage to safe restart. Set a target and track it. Your first tabletop gives you the baseline.
Safe restart rate: the share of restart events where the defined steps were followed, tracked per line. Gaps point to training or clarity problems.
Restart defects: quality holds or rework that started during or just after a system restart. This is the clearest sign of a gap in your plan.
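The three measures can be worked out from a short log of restart events. The sketch below assumes a hypothetical `RestartEvent` record and invented timestamps for a single line. It shows the arithmetic and is not a reporting tool.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class RestartEvent:
    line: str
    outage_start: datetime
    safe_restart: datetime
    steps_followed: bool   # were all defined restart steps followed?
    quality_holds: int     # holds or rework traced to the restart

    @property
    def recovery_minutes(self) -> float:
        return (self.safe_restart - self.outage_start).total_seconds() / 60


# Invented events for one line, for illustration only.
events = [
    RestartEvent("Line 1", datetime(2026, 1, 12, 1, 15),
                 datetime(2026, 1, 12, 5, 15), False, 1),
    RestartEvent("Line 1", datetime(2026, 3, 3, 14, 0),
                 datetime(2026, 3, 3, 16, 0), True, 0),
]

mean_recovery = mean(e.recovery_minutes for e in events)
safe_restart_rate = sum(e.steps_followed for e in events) / len(events)
restart_defects = sum(e.quality_holds for e in events)

print(f"Recovery time: {mean_recovery:.0f} min (mean)")
print(f"Safe restart rate: {safe_restart_rate:.0%}")
print(f"Restart defects: {restart_defects}")
```

To track per line, filter `events` by `line` before computing the same three figures.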

Quotable: Uptime improves when safe restart is rehearsed, owned, and built for shifts. Measure recovery time and restart quality, not paperwork.

How to align IT and ops without finger-pointing

IT and ops often have different ideas of what recovery means. IT focuses on system restore. Ops focuses on production restart. Both are right, but neither side builds the other’s needs into its plan by default. Fortinet found that half of OT firms suffered a breach in 2024 to 2025, and noted a clear culture split between IT and OT teams [Fortinet, 2025 State of Operational Technology and Cybersecurity Report]. That gap does not close on its own.

The fix is a joint recovery plan: a single A3 sheet that maps the IT restore steps against the ops restart steps, with clear owners and time-boxes for each handover. When IT says the system is back, ops needs to know what it can and cannot do at that moment. Work it out before the incident, not during it.
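The joint sheet can be sketched as rows that pair an IT restore step with the ops step it unlocks. The step names, owners, time-boxes, and the 90-minute downtime tolerance below are all assumptions for illustration. The aim is to show two checks worth running on any such sheet: does every handover have owners on both sides, and does the plan fit inside what ops can tolerate?

```python
from dataclasses import dataclass


@dataclass
class Handover:
    it_step: str        # what IT restores
    ops_step: str       # what ops can then do
    it_owner: str
    ops_owner: str      # empty string means no owner agreed yet
    time_box_min: int   # agreed time-box for this handover


# Illustrative rows only; a real sheet would come from a joint session.
sheet = [
    Handover("Network segment isolated", "Switch to paper log",
             "OT engineer", "Shift supervisor", 15),
    Handover("MES restored from backup", "Re-enter paper records",
             "IT lead", "", 60),
    Handover("MES validated", "First-off check, run to rate",
             "IT lead", "Quality lead", 30),
]

planned_minutes = sum(h.time_box_min for h in sheet)
missing_owner = [h.ops_step for h in sheet if not (h.it_owner and h.ops_owner)]

OPS_DOWNTIME_TOLERANCE_MIN = 90  # assumed figure, set by ops rather than IT
fits = planned_minutes <= OPS_DOWNTIME_TOLERANCE_MIN

print(f"Planned: {planned_minutes} min, fits tolerance: {fits}")
print("Handovers missing an owner:", missing_owner)
```

In this invented case the plan overruns the tolerance by 15 minutes and one handover has no ops owner. Both findings are better made on paper than at 2am.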

Key takeaways

Treat safe restart and manual fallback as standard work. Own it in ops. Keep it on the line.
Build your restart plan around four steps: isolation check, manual fallback, first-off quality check, and handover.
Rehearse quarterly: 90-minute tabletop, 30-minute line walk, gaps fixed.
Measure three things: recovery time, safe restart rate, and restart defects.
Align IT and ops on one A3 recovery sheet. Work out who owns each step before the next incident.

Frequently asked questions

What is cyber resilience in manufacturing?

Cyber resilience in manufacturing is the ability to restart production safely and quickly after a system failure. It means having fallback steps, clear ownership, and a rehearsed routine that operators can run without IT support. The measure is not whether an attack was stopped, but how fast and safely the factory recovered.

What is the difference between OT cybersecurity and cyber resilience?

OT [Operational Technology] cybersecurity protects systems from attack. Cyber resilience covers what happens when an attack gets through. It means fallback steps, restart plans, first-off checks, and a shared IT and ops recovery sheet. Most firms have invested in one and not the other.

How long should a safe restart take after a cyber incident?

There is no single answer, but the right question is: what downtime can your line afford, and does your current restart plan fit inside it? Most sites find actual restart takes two to three times longer than expected, often because the restart creates a quality hold. Set a target and measure it.

Who should own the safe restart decision?

One named role per shift, not a group and not IT. In most mid-market firms, this is the shift supervisor or duty ops manager. Name the role in the plan. Test it in every tabletop. If no one is named for nights and weekends, that is the first gap to close.

Over to you

When your most system-dependent line last went down, how long did it take to restart safely, and what did that expose about your fallback capability?
