Disaster Recovery Journal
New Study Reveals 75% of Enterprises Report Double-Digit AI Failure Rates as Fragmented Observability Hits Its Breaking Point

by Jon Seals | March 10, 2026

59% of Executives Say Organizations are AI-ready, While 62% of Practitioners Report Fragmented Systems Unfit For Machine-Scale Operations

PALO ALTO, Calif. – Enterprise IT is approaching a structural breaking point. New research released today by Virtana reveals a widening leadership disconnect as AI workloads scale beyond the limits of legacy observability. The “AI Is Breaking Human-Managed Operations” report found that while 59% of executives believe their organizations are prepared for AI-scale operations, 62% of practitioners report fragmented systems and persistent visibility gaps. Three in four enterprises report that AI job failure rates have already reached double digits, signaling measurable instability as adoption accelerates.

“The data is unambiguous. While executive confidence is rising, operational fragility is rising faster,” said Paul Appleby, CEO of Virtana. “When three-quarters of enterprises report double-digit AI job failure rates and one-third exceed 25%, the operating model is clearly outdated. At enterprise scale, these rates translate into thousands of failed executions per day, driving retries, wasted compute capacity, cascading delays, and escalating operational risk. As AI workloads expand and agentic systems begin operating autonomously, modest failure percentages compound into systemic volatility.”
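To put the quoted rates in concrete terms, here is a minimal Python sketch. The daily job volume (20,000) and the five-step pipeline length are hypothetical figures chosen for illustration, not numbers from the Virtana report, and the compounding calculation assumes each step fails independently at the same rate.

```python
def failed_per_day(jobs_per_day: int, failure_rate: float) -> float:
    """Expected number of failed executions per day."""
    return jobs_per_day * failure_rate


def chain_success(failure_rate: float, steps: int) -> float:
    """Probability that all `steps` independent steps of a pipeline succeed."""
    return (1.0 - failure_rate) ** steps


if __name__ == "__main__":
    JOBS_PER_DAY = 20_000  # hypothetical volume, not from the report
    STEPS = 5              # hypothetical pipeline length, not from the report
    for rate in (0.10, 0.25):
        failed = failed_per_day(JOBS_PER_DAY, rate)
        ok = chain_success(rate, STEPS)
        print(f"rate={rate:.0%}: ~{failed:,.0f} failed jobs/day, "
              f"{ok:.1%} of {STEPS}-step chains finish cleanly")
```

Under these assumptions, a 10% per-step failure rate leaves roughly 59% of five-step chains completing without a failure, and a 25% rate leaves roughly 24%, which is one way a modest per-job percentage can compound once work is chained together.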

Executive Confidence Masks Operational Reality for AI-Scale Operations

The research reveals a consistent and widening disconnect between executive confidence and practitioner reality. While 59% of executives believe their platforms are AI-ready, 62% of practitioners report fragmented systems and persistent visibility issues. Fewer than half of practitioners (48%) are confident that their current observability tools can handle AI-scale workloads.

The divide is sharpest around cost governance, where a 20-point confidence split separates executives (67%) from the practitioners (47%) who experience the operational reality day to day.

These forces converge to make even modest leadership-practitioner misalignment compound rapidly into systemic instability.

One in Four AI Jobs Fail, Exposing the Breaking Point of Human-Managed Operations

Organizations are experiencing AI failure rates that would be unacceptable in any mission-critical system: 75% report AI job failure rates exceeding 10%, and 33% report failure rates above 25%, meaning that at a third of organizations more than one in four AI jobs fail.
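One way to read the 10% and 25% thresholds is as retry overhead. The sketch below assumes a simple model that is not taken from the report: every failed job is retried until it succeeds, and each attempt fails independently with the same probability. Under that assumption the number of attempts follows a geometric distribution.

```python
def expected_attempts(failure_rate: float) -> float:
    """Mean attempts per completed job when retrying until success.

    Assumes independent attempts with a constant failure probability
    (an illustrative model, not the report's methodology).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


def extra_compute(failure_rate: float) -> float:
    """Additional attempts per completed job, relative to a failure-free run."""
    return expected_attempts(failure_rate) - 1.0


if __name__ == "__main__":
    for rate in (0.10, 0.25):
        print(f"failure rate {rate:.0%}: "
              f"{expected_attempts(rate):.2f} attempts per completed job, "
              f"{extra_compute(rate):.0%} extra attempts")
```

In this model a 10% failure rate costs about 11% in extra attempts and a 25% rate about 33%, before counting any downstream delays caused by the retries.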

These failure rates confirm that human-scale operations cannot sustain machine-scale systems. Practitioners recognize this reality: 45% fear they cannot meet AI workload demands with current systems, and 56% cite storage and networking bottlenecks as their top AI constraint.

The challenge extends to containerized environments, where 76% of practitioners experience multiple container-related failures, with more than half encountering three or more simultaneous failures. As applications decompose into distributed systems spanning services, clusters, nodes, storage systems, network paths, and AI services across hybrid and multi-cloud environments, operational models and observability architectures have not kept pace.

“Practitioners are confronting unprecedented complexity as the definition of ‘application’ has fundamentally shifted from discrete code to distributed delivery systems,” said Amitkumar Rathi, Chief Product Officer at Virtana. “Modern applications now span infrastructure, cloud platforms, Kubernetes, storage, networks, data pipelines, and AI workloads operating simultaneously. As organizations race toward AI adoption, these systems are scaling faster than operational models can support, exposing the limits of fragmented observability. At machine scale, teams cannot manage what they cannot see end-to-end, making continuous, real-time system context essential for reliable AI operations.”

This complexity manifests in GPU infrastructure challenges, with 41% of practitioners reporting GPU inefficiency and contention as AI workloads introduce nonlinear scaling, extreme burstiness, and deep cross-domain dependencies that exceed human cognitive capacity. 

The Observability Investment Paradox and the Path Forward with Executive Alignment and Autonomous Operations

Gartner estimates the observability products market will reach approximately $14.2 billion by 2028. Yet despite this substantial investment wave, only 48% of practitioners, the engineers and operators working inside these platforms daily, are confident their current observability tools can handle AI-scale workloads. Enterprises are pouring billions into observability, while the people who must use those tools are sounding alarms. Investment decisions are being made at the executive level without adequate input from the practitioners who experience the operational reality those investments are meant to address.

“With budgets flat and teams not growing, organizations must scale IT through agentic AI, autonomous operations, and unified observability across the full system,” continued Appleby. “You cannot layer AI agents onto fragmented observability and expect reliability. These agents can only reason, decide, and act safely when they operate with full-stack operational context and continuous, real-time correlation across enterprise systems. Without unified observability functioning as an operational control plane, autonomous agents will inherit the same blind spots that plague legacy monitoring, then amplify those failures at machine speed and hyperscale.”

In direct response to these findings, Virtana today announced a new Application Observability offering, purpose-built to address the visibility crisis documented in the research. The new capability addresses the challenges of traditional application performance monitoring by automatically correlating application performance issues across the entire enterprise tech stack, from code and services to infrastructure, networks, and AI platforms, delivering the unified, full-stack context that practitioners report as missing. 

Resources

  • Download the “AI Is Breaking Human-Managed Operations” research report 
  • Learn more at virtana.com
  • Follow Virtana on LinkedIn and X

Research Methodology 

This report is based on an independent global survey of 351 senior IT and technology leaders responsible for enterprise infrastructure, operations, cloud platforms, Kubernetes environments, and AI workloads. Respondents came from organizations with 100 to 10,000+ employees, operating large-scale, business-critical digital environments spanning hybrid, multi-cloud, and on-prem infrastructure, Kubernetes platforms, GPU-accelerated AI workloads, and 24×7 production systems.

About Virtana

Virtana delivers the deepest and broadest observability platform for hybrid and multi-cloud, with full-stack AI observability spanning applications, services, data pipelines, GPUs, CPUs, networks, and storage. Powered by high-fidelity data and agentic AI, Virtana provides unmatched visibility across end-to-end IT services and AI workloads, correlating health, performance, cost, and user impact in real time. With advanced event intelligence and autonomous insight generation, Virtana delivers clarity no other provider can match. Trusted by Global 2000 enterprises and public sector organizations, Virtana helps IT operations and DevOps teams reduce risk, strengthen resilience, improve efficiency, and modernize with confidence across multi-cloud, on-premises, and edge environments. Learn more at virtana.com 
