Disaster Recovery Journal

AI Coding Agents Are Blind. New Research from Causal Dynamics Lab Gives Them Sight, Outperforming Claude Code and Codex in Key Benchmarks

by Jon Seals | May 5, 2026

AI coding agents are operating blind in production. Causal Dynamics Lab’s new research explains why, and their flagship product Cielara Code beat both Claude Code (Opus-4.6) and OpenAI Codex (GPT-5.4) across three independent benchmarks at the hardest part of agent work: finding the right place to make a change.

SAN FRANCISCO – AI coding agents are shipping code faster than teams can verify what that code will do in production. These code changes look correct in review and pass checks, but trigger unpredictable failures once they interact with real dependencies, policy constraints, runtime state, and infrastructure topology. Causal Dynamics Lab believes the root cause is not the models. The problem is that agents can’t see the systems they are changing.

The 2025 DORA report tied AI coding tool adoption to a 7.2 percent decline in deployment stability. AWS CTO Werner Vogels calls the dynamic Verification Debt. Today, Causal Dynamics Lab released new research introducing a 6-layer causal ontology and a code causality graph designed to give coding agents “sight” into how production systems actually behave. Their flagship product, Cielara Code, uses this approach to validate changes before deployment, replacing brute-force search with structural navigation and pre-deployment simulation.

Causal Dynamics Lab instrumented native coding agents across thousands of sessions and logged every tool call. The distribution was lopsided. 56.8 percent of all agent actions were file reads. 24.2 percent were grep. Less than 1 percent were actual edits. The agents weren’t struggling to generate patches. They were struggling to find the right files. The pattern sharpened with complexity. When the ground truth fix spanned more than six files, agent recall dropped from 0.579 to 0.143, and failed trajectories consumed four times the compute of successful ones.
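To show how a tool-call distribution like the one above can be derived from session logs, here is a minimal sketch. The log format, the tool names, and the counts are hypothetical and chosen only for illustration; they are not Causal Dynamics Lab's data or instrumentation.

```python
from collections import Counter


def tool_call_shares(calls):
    """Return each tool's share of all logged calls as a fraction of the total."""
    counts = Counter(calls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tool: n / total for tool, n in counts.items()}


# Hypothetical log: one entry per tool call made by an agent (made-up counts).
log = ["read"] * 57 + ["grep"] * 24 + ["other"] * 18 + ["edit"] * 1
shares = tool_call_shares(log)

for tool, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{tool:>5}: {share:.1%}")
```

A tally of this kind makes the imbalance visible at a glance: in the made-up log, reads and greps dominate while edits are a small remainder.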

“Every coding agent today navigates by grep. That is the equivalent of a surgeon operating without imaging. We built Cielara Code to give agents sight: a causal model of the production environment that makes the reasoning behind every change explicit and verifiable,” said Hasibul Haque, CEO of Causal Dynamics Lab.

As session length and codebase size grow, general-purpose agents lose structural context and degrade into brute-force search. A publicly documented Claude Code regression (GitHub issue #42796) is a visible example of the same dynamic at scale. The underlying issue is architectural: current agents ingest code as flat text and have no representation of how files depend on each other, how functions call each other, or how changes propagate through the system.

Cielara Code is designed to fill that gap before failures reach production. At the core is a Production World Model that maps a company’s production environment into a 6-layer causal graph: what the code does, why it was built, who owns it, how it is constrained, where it runs, and what actually happened at runtime. A runtime failure can be traced back to the commit that introduced the change, the developer who approved it, and the intent behind the change. Before an agent begins exploration, Cielara constructs a Code Dependency Causal Graph indexing four relationship types so the agent navigates structure instead of scanning files sequentially.
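The idea of navigating structure instead of scanning files can be illustrated with a small typed dependency graph. This is a sketch under stated assumptions, not Cielara's implementation: the release does not name its four relationship types, so the `imports` and `calls` labels, the `DependencyGraph` class, and the file paths below are all hypothetical.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Toy dependency graph with labeled edges (relation names are assumptions)."""

    def __init__(self):
        # Maps a node to the (relation, dependent) pairs that rely on it.
        self._dependents = defaultdict(list)

    def add_edge(self, src, relation, dst):
        """Record that `src` depends on `dst` through `relation`."""
        self._dependents[dst].append((relation, src))

    def impacted_by(self, changed, max_hops=2, relations=None):
        """Breadth-first walk over dependents of `changed`, nearest first."""
        seen = {changed}
        order = []
        queue = deque([(changed, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue
            for relation, dependent in self._dependents[node]:
                if relations is not None and relation not in relations:
                    continue
                if dependent not in seen:
                    seen.add(dependent)
                    order.append(dependent)
                    queue.append((dependent, hops + 1))
        return order


# Hypothetical repository layout, for illustration only.
graph = DependencyGraph()
graph.add_edge("api/handlers.py", "imports", "billing/invoice.py")
graph.add_edge("jobs/nightly.py", "imports", "billing/invoice.py")
graph.add_edge("billing/invoice.py", "calls", "billing/tax.py")
graph.add_edge("billing/tax.py", "imports", "config/rates.py")

print(graph.impacted_by("billing/tax.py"))
```

With the graph built once up front, the question "what could break if `billing/tax.py` changes?" becomes a bounded traversal rather than a text search over every file.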

Across three independent benchmarks, Cielara Code beat both Claude Code (Opus-4.6) and OpenAI Codex (GPT-5.4) at the hardest part of agent work: finding the right place to make a change. Overall localization accuracy hit 0.774, versus 0.738 for Claude Code and 0.707 for Codex. On MULocBench (1,033 issues across 46 repositories), Cielara reached 0.752 recall@5 versus 0.727 for Claude Code, and cut mean task time from 141.84 to 128.62 seconds. The result: fewer wrong-file edits, fewer failed runs, and 30 to 40 percent lower compute cost per task.
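For readers unfamiliar with the metric, recall@5 is commonly computed as the fraction of files that truly need changing that appear among a tool's top five ranked candidates, averaged over all issues. The sketch below uses that common definition; the exact scoring rules of MULocBench are an assumption here, and the file names are hypothetical.

```python
def recall_at_k(ranked, relevant, k=5):
    """Fraction of the relevant files found in the top-k ranked predictions."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("each case needs at least one relevant file")
    hits = relevant.intersection(ranked[:k])
    return len(hits) / len(relevant)


def mean_recall_at_k(cases, k=5):
    """Average recall@k over (ranked_predictions, relevant_files) pairs."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in cases) / len(cases)


# Hypothetical cases: ranked predictions paired with the ground-truth files.
cases = [
    (["a.py", "b.py", "c.py", "d.py", "e.py", "f.py"], {"b.py", "f.py"}),  # f.py is ranked 6th
    (["x.py", "y.py"], {"x.py"}),
]

print(mean_recall_at_k(cases, k=5))
```

In the first case only one of the two required files lands in the top five, so a fix that spans many files is harder to localize fully, which is consistent with the recall drop on multi-file fixes described earlier.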

Cielara’s REASONARA is the causal memory layer that makes this practical at enterprise scale. Rather than stuffing an entire codebase into a prompt every time, REASONARA stores the production world as a graph-structured causal memory holding 125M+ tokens of effective context, retrieving only what matters for the question at hand. A single lookup typically uses 1,000 to 2,500 tokens versus 23,000 to 115,000 for full-context approaches. It can save up to 98 percent of token consumption compared to full-context reasoning. On independent benchmarks, REASONARA scores 94 percent on UltraDomain, 92 percent on LoCoMo, 73 percent on LoCoMo-plus, and 87.4 percent on LongMemEval, running five to eight times faster than Codex high reasoning mode. The roadmap targets a one-billion-token context window.
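The per-lookup token figures quoted above imply a range of savings that can be checked with simple arithmetic. The release does not say which lookup size is paired with which full-context size, so the sketch below computes the two extreme pairings as an assumption; the `savings` helper is illustrative only.

```python
def savings(lookup_tokens, full_context_tokens):
    """Fraction of tokens saved by a targeted lookup versus loading full context."""
    return 1 - lookup_tokens / full_context_tokens


# Figures quoted in the release: tokens per lookup vs. full-context tokens.
lookup_low, lookup_high = 1_000, 2_500
full_low, full_high = 23_000, 115_000

# Least favorable pairing: largest lookup against smallest full context.
smallest_saving = savings(lookup_high, full_low)
# Most favorable pairing: smallest lookup against largest full context.
largest_saving = savings(lookup_low, full_high)

print(f"{smallest_saving:.1%} to {largest_saving:.1%}")  # prints "89.1% to 99.1%"
```

Under these pairings the saving falls between roughly 89 and 99 percent, so the quoted "up to 98 percent" sits inside the range the raw figures allow.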

Causal Dynamics Lab positions Cielara Code as a verification layer for existing AI coding agents, making their output production-safe without replacing them. Currently, 11 Fortune 100 and over 40 Fortune 500 companies use Cielara Code on their codebases.

“Board and auditor expectations for proactive risk management have risen sharply. Leaders now demand evidence that security can anticipate risks from rapid AI and automation, rather than depending on post-incident response,” says the CISO of one of the largest law firms in the USA and a current customer of Cielara Code. 

Phillip Miller, Vice President, Global Chief Information Security Officer, H&R Block, added: “Enterprises need solutions to problems they cannot solve with people alone. Cielara’s technology is a generational leap towards the original promise of AI: tackling complexity 7×24 with acquired knowledge, deep reasoning, and unbeatable accuracy. For engineering teams, this means a single engine to discover faults in real-world deployments (including legacy, cloud) and provide clear resolution steps. When I wrote Hacking Success, I described a world where AI needs strong, directive policy (not rules / guardrails) to be safe and effective. Information Security lags behind the innovation curve, as most options rely on legacy thinking including posture, gateways, and logging. Enterprises now have an option to leverage Cielara’s models to oversee deployments of AI agents, models, and their supporting infrastructure.”

The team’s expertise is deeply rooted in the very challenges they aim to solve. CEO Hasibul Haque led platform engineering at Uber during its hyper-growth phase, and CTO Ryan Turner is a former Uber Staff Engineer and a CNCF SPIRE maintainer. Their research is guided by Dr. Xuchao Zhang (ex-Microsoft Research) and Dr. Liang Zhao (Emory, with over 200 publications), a collaboration supported by a formal R&D partnership with Emory’s AI Lab.

Matt Fisher, Former Co-Founder and CTO of Daydream and Adjunct Professor at Brown University, added: “AI has already changed how people access information. The next step is changing how people make decisions. Instead of only asking what is true right now, teams should be able to explore what could happen next, compare possible paths, and understand the consequences of action before committing. That move from answers to simulation is a powerful shift, and it is where Causal Dynamics Lab is focused.”

Looking ahead, the Production World Model is designed as a foundation, not a feature. Cielara Code and REASONARA are the first products to be deployed on top of it; over time, Causal Dynamics Lab plans to extend into full causal simulation of proposed changes across code, infrastructure, policy, and runtime. The company expects the decision record to become a permanent reasoning layer of the enterprise stack: one any AI agent can query before changing the systems that keep production running.
