Disaster Recovery Journal
Virtana Unveils the First Full-Stack AI Factory Observability Platform

by Jon Seals | May 20, 2025

Unique new capabilities help enterprises tame AI infrastructure complexity, boost resource efficiency, and bring predictability to industrial-scale AI operations

PALO ALTO, Calif. – Virtana, the leader in hybrid infrastructure observability, today announced the launch of Virtana AI Factory Observability (AIFO), a powerful new capability that extends Virtana’s full-stack observability platform to the unique demands of AI infrastructure. With deep, real-time insights into everything from GPU utilization and training bottlenecks to power consumption and cost drivers, AIFO enables enterprises to turn complex, compute-intensive AI environments into scalable, efficient, and accountable operations. This launch strengthens Virtana’s position as the industry’s broadest and deepest observability platform, spanning AI, infrastructure, and applications across hybrid and multi-cloud environments.

“AI has the potential to be as transformative as the steam engine or the printing press—but only if enterprises can operationalize it at scale,” said Paul Appleby, CEO of Virtana. “Right now, too many teams are flying blind when it comes to AI infrastructure. Virtana AIFO gives them the visibility and control they need to treat AI not as an experiment, but as a core, strategic part of the business.”

Following a surge in enterprise investment and industry focus on scalable AI Factory infrastructure from ecosystem leaders like NVIDIA, Virtana is the first to deliver a full-stack observability solution purpose-built for AI Factory operations. As organizations move from AI pilots to production, demand is growing rapidly for platforms that go beyond surface-level monitoring to deliver deep, correlated insights across infrastructure, models, and cost drivers.

Industry analysts have identified this shift as a key trend. AI is no longer a research initiative; it is becoming an operational foundation for business. Virtana’s AI Factory Observability (AIFO) directly addresses this evolution, helping enterprises treat AI infrastructure with the same level of visibility, discipline, and accountability as traditional IT.

As an official NVIDIA partner, Virtana integrates natively with NVIDIA GPU platforms to deliver in-depth telemetry, including memory utilization, thermal behavior, and power metrics, providing precise, vendor-validated insight into the most performance-critical components of the AI Factory. This deep integration delivers accurate, actionable intelligence at enterprise scale.

“AI workloads introduce an entirely different set of infrastructure challenges—from GPU saturation and training bottlenecks to unpredictable cost spikes,” said Amitkumar Rathi, Senior Vice President of Engineering, Product, and Support at Virtana. “We designed AIFO to address these realities head-on. It gives teams deep, correlated visibility across the full AI stack, enabling them to optimize performance, reduce waste, and scale AI with confidence.”

With this launch, Virtana directly addresses the growing infrastructure challenges that stand in the way of scalable AI success. As enterprises accelerate investments in AI, many are encountering hidden inefficiencies: idle GPUs that inflate costs, training jobs that fail without explanation, and inference pipelines that stall due to underlying storage or network issues. AIFO is purpose-built to solve these problems, delivering real-time visibility and correlated insights across every layer of the AI infrastructure stack. The result is greater control over performance, spend, and scale—turning AI from a high-risk initiative into a high-impact capability.

Purpose-Built Observability for AI Infrastructure

Unlike traditional monitoring tools built for general IT workloads, Virtana AI Factory Observability (AIFO) is purpose-built to meet the demands of AI operations. It continuously collects telemetry across GPUs, CPUs, memory, network, and storage and then correlates that data with training and inference pipelines to provide clear and actionable insights.

Core capabilities include:

  • GPU Performance Monitoring – Tracks per-GPU metrics such as memory, utilization, thermal load, and power draw across multiple vendors.
  • Distributed Training Visibility – Identifies bottlenecks, synchronization issues, and stragglers across multi-node jobs.
  • Infrastructure-to-AI Mapping – Correlates model-level performance directly to hardware-level behavior, including network and storage dependencies.
  • Power and Cost Analytics – Exposes inefficiencies such as thermal throttling, idle GPU time, and overprovisioned resources.
  • Root Cause Analysis – Diagnoses training failures and inference slowdowns faster by pinpointing the most likely infrastructure causes.

All capabilities are accessible via Virtana’s Global View dashboard, which unifies telemetry across hybrid and containerized AI environments—on-premises, cloud, or both.
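
As a rough illustration of the kind of per-GPU signal described above, the sketch below computes an idle-time fraction from sampled utilization readings. It is not Virtana's implementation or API: the `GpuSample` type, the `idle_fraction_by_gpu` helper, the 5% idle threshold, and the sample readings are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSample:
    """One telemetry reading for a single GPU (hypothetical schema)."""
    gpu_id: str
    utilization_pct: float  # compute utilization, 0 to 100
    power_watts: float      # instantaneous power draw


# Assumed cutoff: a reading below this utilization counts as idle.
IDLE_THRESHOLD_PCT = 5.0


def idle_fraction_by_gpu(samples, idle_threshold_pct=IDLE_THRESHOLD_PCT):
    """Return {gpu_id: fraction of samples in which the GPU was idle}."""
    total = defaultdict(int)
    idle = defaultdict(int)
    for s in samples:
        total[s.gpu_id] += 1
        if s.utilization_pct < idle_threshold_pct:
            idle[s.gpu_id] += 1
    return {gpu: idle[gpu] / total[gpu] for gpu in total}


# Made-up readings, for illustration only.
samples = [
    GpuSample("gpu-0", 92.0, 310.0),
    GpuSample("gpu-0", 88.0, 305.0),
    GpuSample("gpu-1", 1.0, 60.0),
    GpuSample("gpu-1", 0.0, 58.0),
    GpuSample("gpu-1", 75.0, 280.0),
    GpuSample("gpu-1", 2.0, 59.0),
]

report = idle_fraction_by_gpu(samples)
for gpu, fraction in sorted(report.items()):
    print(f"{gpu}: idle in {fraction:.0%} of samples")
```

In practice, readings like these would come from a telemetry source such as NVIDIA's NVML library rather than a hard-coded list, and an idle threshold would need tuning per workload.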

Proven Results from Enterprise Deployments

AIFO is already delivering measurable results in production AI environments across multiple industries. Operational outcomes include:

  • 40% reduction in idle GPU time, improving resource utilization and reducing infrastructure costs.
  • 60% faster mean time to resolution (MTTR) for AI-related incidents.
  • 50% decrease in false alerts, reducing operational noise and accelerating response.
  • 15% improvement in power efficiency, supporting sustainability goals.

Available Now, Built for What’s Next

Virtana AI Factory Observability (AIFO) is now generally available as a fully integrated capability within the Virtana Platform. Purpose-built for the demands of modern AI infrastructure, AIFO scales effortlessly from early-stage test environments to enterprise-grade AI factories. This launch, together with Virtana’s recent acquisition of Zenoss, further extends the company’s leadership in delivering the deepest and broadest observability platform across applications, infrastructure, and AI workloads in hybrid and multi-cloud environments.

Additionally, Virtana’s recent acquisition of Zenoss expands the platform’s event intelligence and service-centric observability capabilities, allowing customers to correlate AI model performance with broader application behavior and infrastructure health. Together, these advancements deepen Virtana’s ability to help enterprises manage the full complexity of AI operations in the most demanding environments.

This launch coincides with Virtana’s presence at Dell Technologies World 2025, where the company is showcasing AIFO in booth #262 and offering live demonstrations of its observability capabilities for GPU-intensive environments.

To read the blog post, visit https://www.virtana.com/blog/ai-factories-are-breaking-traditional-infrastructure-heres-how-were-fixing-it

To learn more or request a personalized demo, visit virtana.com.

About Virtana

Virtana is the leader in observability for hybrid infrastructure. The AI-powered Virtana Platform delivers a unified view across applications, services, and underlying infrastructure, correlating user impact, service dependencies, performance bottlenecks, and cost drivers in real time. Trusted by Global 2000 enterprises, Virtana helps IT, operations, and platform teams improve efficiency, reduce risk, and make faster, AI-driven decisions across complex, dynamic environments. Learn more at virtana.com.
