Disaster Recovery Journal
Virtana Unveils the First Full-Stack AI Factory Observability Platform

by Jon Seals | May 20, 2025

Unique new capabilities help enterprises tame AI infrastructure complexity, boost resource efficiency, and bring predictability to industrial-scale AI operations

PALO ALTO, Calif. – Virtana, the leader in hybrid infrastructure observability, today announced the launch of Virtana AI Factory Observability (AIFO), a powerful new capability that extends Virtana’s full-stack observability platform to the unique demands of AI infrastructure. With deep, real-time insights into everything from GPU utilization and training bottlenecks to power consumption and cost drivers, AIFO enables enterprises to turn complex, compute-intensive AI environments into scalable, efficient, and accountable operations. This launch strengthens Virtana’s position as the industry’s broadest and deepest observability platform, spanning AI, infrastructure, and applications across hybrid and multi-cloud environments.

“AI has the potential to be as transformative as the steam engine or the printing press—but only if enterprises can operationalize it at scale,” said Paul Appleby, CEO of Virtana. “Right now, too many teams are flying blind when it comes to AI infrastructure. Virtana AIFO gives them the visibility and control they need to treat AI not as an experiment, but as a core, strategic part of the business.”

Following a surge in enterprise investment and industry focus on scalable AI Factory infrastructure from ecosystem leaders like NVIDIA, Virtana is the first to deliver a full-stack observability solution purpose-built for AI Factory operations. As organizations move from AI pilots to production, demand is growing rapidly for platforms that go beyond surface-level monitoring to deliver deep, correlated insights across infrastructure, models, and cost drivers.

Industry analysts have identified this shift as a key trend. AI is no longer a research initiative; it is becoming an operational foundation for business. Virtana’s AI Factory Observability (AIFO) directly addresses this evolution, helping enterprises treat AI infrastructure with the same level of visibility, discipline, and accountability as traditional IT.

As an official NVIDIA partner, Virtana integrates natively with NVIDIA GPU platforms to deliver in-depth telemetry, including memory utilization, thermal behavior, and power metrics, providing precise, vendor-validated insight into the most performance-critical components of the AI Factory. This deep integration delivers accurate, actionable intelligence at enterprise scale.

“AI workloads introduce an entirely different set of infrastructure challenges—from GPU saturation and training bottlenecks to unpredictable cost spikes,” said Amitkumar Rathi, Senior Vice President of Engineering, Product, and Support at Virtana. “We designed AIFO to address these realities head-on. It gives teams deep, correlated visibility across the full AI stack, enabling them to optimize performance, reduce waste, and scale AI with confidence.”

With this launch, Virtana directly addresses the growing infrastructure challenges that stand in the way of scalable AI success. As enterprises accelerate investments in AI, many are encountering hidden inefficiencies: idle GPUs that inflate costs, training jobs that fail without explanation, and inference pipelines that stall due to underlying storage or network issues. AIFO is purpose-built to solve these problems, delivering real-time visibility and correlated insights across every layer of the AI infrastructure stack. The result is greater control over performance, spend, and scale—turning AI from a high-risk initiative into a high-impact capability.

Purpose-Built Observability for AI Infrastructure

Unlike traditional monitoring tools built for general IT workloads, Virtana AI Factory Observability (AIFO) is purpose-built to meet the demands of AI operations. It continuously collects telemetry across GPUs, CPUs, memory, network, and storage and then correlates that data with training and inference pipelines to provide clear and actionable insights.

Core capabilities include:

  • GPU Performance Monitoring – Tracks per-GPU metrics such as memory, utilization, thermal load, and power draw across multiple vendors.
  • Distributed Training Visibility – Identifies bottlenecks, synchronization issues, and stragglers across multi-node jobs.
  • Infrastructure-to-AI Mapping – Correlates model-level performance directly to hardware-level behavior, including network and storage dependencies.
  • Power and Cost Analytics – Exposes inefficiencies such as thermal throttling, idle GPU time, and overprovisioned resources.
  • Root Cause Analysis – Diagnoses training failures and inference slowdowns faster by pinpointing the most likely infrastructure causes.

All capabilities are accessible via Virtana’s Global View dashboard, which unifies telemetry across hybrid and containerized AI environments—on-premises, cloud, or both.
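
As a rough illustration of two of the signals named in the list above (idle GPU time and stragglers in multi-node jobs), the following minimal Python sketch computes both from made-up telemetry. It is not Virtana's API or method: the sample format, the function names `idle_fraction` and `find_stragglers`, and the thresholds are all assumptions chosen for the example.

```python
from statistics import median

# Hypothetical telemetry: (gpu_id, utilization_percent) sampled at a fixed interval.
# The tuple layout and the 5% idle threshold are assumptions for illustration only.
def idle_fraction(samples, idle_threshold=5.0):
    """Return {gpu_id: fraction of samples at or below idle_threshold}."""
    totals, idle = {}, {}
    for gpu_id, util in samples:
        totals[gpu_id] = totals.get(gpu_id, 0) + 1
        if util <= idle_threshold:
            idle[gpu_id] = idle.get(gpu_id, 0) + 1
    return {g: idle.get(g, 0) / n for g, n in totals.items()}

# Hypothetical per-worker step times (seconds) for one distributed training step.
# A worker is flagged when it is more than `slowdown` times slower than the median.
def find_stragglers(step_times, slowdown=1.5):
    """Return the sorted names of workers whose step time exceeds slowdown x median."""
    typical = median(step_times.values())
    return sorted(w for w, t in step_times.items() if t > slowdown * typical)

samples = [
    ("gpu0", 92.0), ("gpu0", 88.0), ("gpu0", 2.0),
    ("gpu1", 0.0), ("gpu1", 3.0), ("gpu1", 75.0),
]
step_times = {"node-a": 1.0, "node-b": 1.1, "node-c": 2.4, "node-d": 0.9}

print(idle_fraction(samples))       # per-GPU share of idle samples
print(find_stragglers(step_times))  # workers well above the median step time
```

With this sample data, gpu1 is idle in two of its three samples and node-c is the only worker flagged. A production system would presumably work from much richer telemetry and correlate it with storage and network behavior, which this sketch does not attempt.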

Proven Results from Enterprise Deployments

AIFO is already delivering measurable results in production AI environments across multiple industries. Operational outcomes include:

  • 40% reduction in idle GPU time, improving resource utilization and reducing infrastructure costs.
  • 60% faster mean time to resolution (MTTR) for AI-related incidents.
  • 50% decrease in false alerts, reducing operational noise and accelerating response.
  • 15% improvement in power efficiency, supporting sustainability goals.

Available Now, Built for What’s Next

Virtana AI Factory Observability (AIFO) is now generally available as a fully integrated capability within the Virtana Platform. Purpose-built for the demands of modern AI infrastructure, AIFO scales effortlessly from early-stage test environments to enterprise-grade AI factories. This launch, together with Virtana’s recent acquisition of Zenoss, further extends the company’s leadership in delivering the deepest and broadest observability platform across applications, infrastructure, and AI workloads in hybrid and multi-cloud environments.

Additionally, Virtana’s recent acquisition of Zenoss expands the platform’s event intelligence and service-centric observability capabilities, allowing customers to correlate AI model performance with broader application behavior and infrastructure health. Together, these advancements deepen Virtana’s ability to help enterprises manage the full complexity of AI operations in the most demanding environments.

This launch coincides with Virtana’s presence at Dell Technologies World 2025, where the company is showcasing AIFO in booth #262 and offering live demonstrations of its observability capabilities for GPU-intensive environments.

To read the blog post, visit https://www.virtana.com/blog/ai-factories-are-breaking-traditional-infrastructure-heres-how-were-fixing-it

To learn more or request a personalized demo, visit virtana.com.

About Virtana

Virtana is the leader in observability for hybrid infrastructure. The AI-powered Virtana Platform delivers a unified view across applications, services, and underlying infrastructure, correlating user impact, service dependencies, performance bottlenecks, and cost drivers in real time. Trusted by Global 2000 enterprises, Virtana helps IT, operations, and platform teams improve efficiency, reduce risk, and make faster, AI-driven decisions across complex, dynamic environments. Learn more at virtana.com.

    Copyright 2025 Disaster Recovery Journal