By Ashley Rose, Co-founder & CEO of Living Security
Most disaster recovery plans are built on an assumption we rarely write down: when things break, the workforce will hold.
Systems may fail, networks may go down, a building may be inaccessible. But the plan typically expects people to show up, understand what’s happening, make sound decisions, and carry the organization through disruption.
That assumption is now a liability. “The workforce” no longer means only employees and contractors. It also includes automated workflows, bots, and, increasingly, AI-driven agents that can initiate actions, approve steps, change configurations, and move data at speed.
Disaster recovery and business continuity programs are at an inflection point. If your strategy still treats humans as steady operators and automation as a passive tool, it plans for a world that no longer exists in 2026. The next era of resilience will be defined by decision clarity across people and intelligent systems.
‘Human Error’ Is an Incomplete Explanation
Post-incident reports often use the phrase “human error.” But in reality, most failures are caused by predictable conditions such as unclear authority, conflicting priorities, confusing signals, missing context, poor handoffs, tool overload, and processes that break down the moment stress spikes. In disruption, mistakes are not anomalies. They are system-induced outcomes.
What’s changing now is visibility. Organizations can finally see how decision patterns repeat across incidents: where people hesitate, where they guess, where they override controls, where they follow the wrong playbook, and where the system nudges them into the wrong action.
For continuity leaders, that’s the real opportunity. Not to “train harder,” but to redesign the environment so the default choices during a crisis are the right ones.
Training for Compliance Is Not Training for Crisis
Most organizations train people to meet expectations: complete a module, acknowledge a policy, pass a quiz. That may satisfy auditors, but it rarely prepares anyone for real-world disruption. Crisis conditions don’t reward memorization. They reward judgment.
When alarms go off and dashboards contradict each other, people fall back on habit and tribal knowledge. If your organization has never practiced decisions under pressure (who decides, who escalates, what gets prioritized), then your plan is theoretical.
This is why the most resilient organizations have shifted toward experiential learning, where the goal is fewer preventable errors when minutes matter:
- simulations that force cross-functional decisions,
- tabletop exercises that test authority and escalation,
- scenario drills that surface ambiguity before an incident does.
Identity and Access Are Continuity Concerns
Identity and access management is usually treated as a security topic. It’s also a continuity topic, and it’s about to become one of the most important ones.
During disruption, responders need access fast. But “fast” is where organizations get burned:
- accounts that can’t be used because MFA or SSO dependencies are down,
- privileges that are too broad because nobody wants to slow down recovery,
- service accounts and API keys nobody owns, nobody rotates, and nobody can confidently disable,
- emergency access paths that exist on paper but fail in practice.
Continuity planning must include identity planning:
- Break-glass access that actually works (and gets tested).
- Least privilege with speed: rights that are pre-staged for incident roles.
- Clear ownership of human and non-human identities.
- Auditability that survives the incident, not just the steady state.
If you can’t answer “who can do what, right now, and why,” you don’t have resilience; you only have hope.
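As a concrete illustration of what “break-glass access that actually works (and gets tested)” might look like, here is a minimal Python sketch of a periodic audit. The `BreakGlassAccount` fields, the 90-day test window, and the specific checks are illustrative assumptions, not a reference to any particular IAM product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical sketch: fields and thresholds are illustrative assumptions.
@dataclass
class BreakGlassAccount:
    name: str
    owner: str           # a named human, not a team alias
    depends_on: list     # e.g. ["SSO", "MFA"]; should be empty for true break-glass
    last_tested: date    # date of the last successful drill using this account

def audit_break_glass(accounts, max_age_days=90):
    """Return human-readable findings for accounts that would fail in a real incident."""
    findings = []
    today = date.today()
    for acct in accounts:
        if acct.depends_on:
            findings.append(f"{acct.name}: depends on {acct.depends_on}, which may be down")
        if today - acct.last_tested > timedelta(days=max_age_days):
            findings.append(f"{acct.name}: not tested in the last {max_age_days} days")
        if not acct.owner:
            findings.append(f"{acct.name}: no named owner")
    return findings
```

Running a check like this on a schedule turns “we have emergency access” from a document claim into a verifiable fact.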
AI Agents Change the Shape of an Incident
AI agents introduce a new category of operational actors: systems that don’t just recommend, but execute. When they act during an incident, three questions become non-negotiable:
- What authority do they have? If an agent can initiate remediation, quarantine systems, update configurations, or trigger communications, it holds effectively delegated decision rights. Those rights need governance just like a human role’s.
- How do they escalate? When uncertainty is high (and it always is in a crisis), systems need to know when to stop and ask for a human decision. “Autonomous” without escalation is pure risk.
- Can you audit and shut them down? Every automated action should be attributable, reviewable, and reversible, with confidence that turning it off won’t create a second failure.
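The three questions above can be encoded as a pre-execution gate that sits between an agent’s proposal and the production system. This is a hedged sketch only: the action names, the 0.85 confidence floor, and the `KILL_SWITCH` flag are illustrative assumptions, not a standard or a real product’s API.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-gate")

# Illustrative guardrails -- in practice these come from reviewed policy, not code.
KILL_SWITCH = False          # set True to halt all autonomous actions
CONFIDENCE_FLOOR = 0.85      # below this, stop and ask a human
AUTONOMOUS_ACTIONS = {"restart_service", "rotate_credentials"}   # pre-approved
ESCALATE_ONLY = {"quarantine_host", "notify_customers"}          # always need a human

@dataclass
class Proposal:
    action: str
    target: str
    confidence: float   # the agent's own certainty estimate

def gate(p):
    """Decide 'execute', 'escalate', or 'halt' for one proposed agent action.

    Every decision is logged so the incident timeline stays auditable."""
    if KILL_SWITCH:
        log.info("halt %s on %s: kill switch engaged", p.action, p.target)
        return "halt"
    if p.action in ESCALATE_ONLY or p.confidence < CONFIDENCE_FLOOR:
        log.info("escalate %s on %s (confidence=%.2f)", p.action, p.target, p.confidence)
        return "escalate"
    if p.action in AUTONOMOUS_ACTIONS:
        log.info("execute %s on %s (confidence=%.2f)", p.action, p.target, p.confidence)
        return "execute"
    return "escalate"   # default-deny: unknown actions go to a human
```

The default-deny fallthrough is the design point: an action nobody has classified is exactly the kind of action that should not run unattended during an incident.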
AI agents also expose an uncomfortable truth: humans and machines don’t always optimize for the same thing. A system may pursue speed while leaders prioritize safety. A workflow may enforce policy while the business needs flexibility. If those priorities aren’t resolved before an incident, they will collide during one.
What to Do on Monday: A Practical Framework
If you want your disaster recovery program to reflect reality, not theory, start here.
1) Map your decision choke points
Pick your top three incident scenarios. For each, identify the decisions that determine outcomes:
- containment vs. continuity tradeoffs,
- customer communication triggers,
- access overrides,
- system shutdown thresholds.
Then, name the decision owners and escalation paths clearly.
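One lightweight way to capture the result of this mapping is a decision registry per scenario. The sketch below is purely illustrative; the scenario name, decision keys, and role titles are assumptions standing in for your own.

```python
# Hypothetical sketch: a minimal decision registry for one incident scenario.
DECISION_REGISTRY = {
    "ransomware": {
        "containment_vs_continuity": {"owner": "CISO", "escalation": ["CIO", "CEO"]},
        "customer_comms_trigger":    {"owner": "Comms Lead", "escalation": ["General Counsel"]},
        "access_override":           {"owner": "IAM Lead", "escalation": ["CISO"]},
        "system_shutdown":           {"owner": "Ops Lead", "escalation": ["CISO", "CIO"]},
    },
}

def who_decides(scenario, decision):
    """Answer 'who can act, right now, and who is next' for one choke point."""
    entry = DECISION_REGISTRY[scenario][decision]
    return " -> ".join([entry["owner"]] + entry["escalation"])
```

Even this much structure forces the useful argument: if two teams both believe they own `system_shutdown`, the registry can hold only one name.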
2) Inventory identities that matter in a crisis (including non-human)
List the accounts, service identities, bots, keys, and agents that can:
- move data,
- change access,
- alter configurations,
- trigger external communications.
Assign ownership. Define rotation. Validate you can disable them quickly without breaking recovery.
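To make that inventory checkable rather than aspirational, it can be flagged automatically. The records, field names, and risk rules in this sketch are illustrative assumptions, not an export from any real IAM system.

```python
from datetime import date

# Hypothetical inventory records -- values are illustrative only.
IDENTITIES = [
    {"name": "backup-svc", "kind": "service", "owner": "dba-team",
     "can": ["move_data"], "last_rotated": date(2025, 1, 10)},
    {"name": "deploy-bot", "kind": "bot", "owner": None,
     "can": ["alter_config", "change_access"], "last_rotated": None},
]

HIGH_RISK = {"move_data", "change_access", "alter_config", "external_comms"}

def orphaned_high_risk(identities, today, max_rotation_days=90):
    """Flag high-risk identities that lack a named owner or a recent credential rotation."""
    flagged = []
    for ident in identities:
        if not (HIGH_RISK & set(ident["can"])):
            continue  # low-risk identity: skip
        stale = (ident["last_rotated"] is None or
                 (today - ident["last_rotated"]).days > max_rotation_days)
        if ident["owner"] is None or stale:
            flagged.append(ident["name"])
    return flagged
```

Anything this function returns is exactly the identity you will not be able to confidently disable mid-incident.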
3) Test break-glass and incident-role access quarterly
If you only validate emergency access during an incident, it’s not emergency access. It’s a gamble.
Run a drill where key dependencies are down (SSO, MFA, network segmentation). Prove responders can still act, securely.
4) Define agent guardrails like you define human roles
For each AI-driven system, document:
- permitted actions,
- prohibited actions,
- escalation thresholds,
- audit requirements,
- kill-switch procedures,
- and what “safe failure” looks like.
Then test it in a tabletop exercise where priorities conflict.
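Those six items can live as a structured record per system, which also makes gaps mechanically checkable before the tabletop. Field names and values below are illustrative assumptions, not a schema from any real agent platform.

```python
# Hypothetical sketch: one guardrail record per AI-driven system.
REQUIRED_FIELDS = {"permitted", "prohibited", "escalation_threshold",
                   "audit_log", "kill_switch", "safe_failure"}

GUARDRAILS = {
    "remediation-agent": {
        "permitted": ["restart_service", "open_ticket"],
        "prohibited": ["delete_data", "notify_customers"],
        "escalation_threshold": 0.85,          # confidence below this -> human decision
        "audit_log": "siem://agent-actions",   # illustrative destination
        "kill_switch": "feature-flag:remediation-agent-enabled",
        "safe_failure": "halt and page on-call; take no further action",
    },
}

def incomplete_guardrails(guardrails):
    """List systems whose guardrail record is missing any required field."""
    return [name for name, g in guardrails.items()
            if REQUIRED_FIELDS - set(g)]
```

A non-empty result is a finding in itself: a deployed agent with an undefined kill switch is a continuity gap, not a paperwork gap.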
5) Measure outcomes, not activity
Don’t measure “training completion.” Measure:
- time-to-decision,
- escalation accuracy,
- number of rework cycles,
- avoidable errors,
- and recovery quality (not just speed).
If your metrics don’t reflect decision performance, they won’t predict resilience.
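To make those metrics concrete, here is a sketch that computes three of them from per-decision incident records. The record fields (`detected`, `decided`, `escalated_to`, and so on) and the sample values are illustrative assumptions about what your incident tooling could export.

```python
from datetime import datetime

# Hypothetical per-decision records from two past incidents.
INCIDENT_DECISIONS = [
    {"detected": datetime(2025, 6, 1, 2, 0), "decided": datetime(2025, 6, 1, 2, 18),
     "escalated_to": "CISO", "correct_escalation": "CISO", "reworked": False},
    {"detected": datetime(2025, 6, 1, 3, 5), "decided": datetime(2025, 6, 1, 3, 50),
     "escalated_to": "Ops Lead", "correct_escalation": "CISO", "reworked": True},
]

def decision_metrics(records):
    """Compute mean time-to-decision (minutes), escalation accuracy, and rework rate."""
    n = len(records)
    ttd = sum((r["decided"] - r["detected"]).total_seconds() / 60 for r in records) / n
    accuracy = sum(r["escalated_to"] == r["correct_escalation"] for r in records) / n
    rework = sum(r["reworked"] for r in records) / n
    return {"mean_ttd_min": ttd, "escalation_accuracy": accuracy, "rework_rate": rework}
```

Trendlines on numbers like these, drill over drill, are what "decision performance" looks like when it is actually measured.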
Looking Ahead
Disaster recovery has always evolved with technology: from physical infrastructure to virtualization, from data centers to cloud, from perimeter defenses to zero trust. The evolution of the workforce is the next major shift.
Organizations that recognize this early will recover faster, not because they bought better tools, but because they engineered clarity: who can act, what they can do, and how humans and intelligent systems coordinate when pressure is highest.
The future of disaster recovery won’t be won on tooling alone. It will be won on decision design: putting people and automated systems on the same operating model before the next incident forces the lesson.

