The Day a Database Permission Change Broke the Internet. A Cloudflare Story.

by Jon Seals | December 1, 2025

This post first appeared on the Liquibase blog.

A single database change briefly broke part of the Internet.

On November 18, the Internet failed in a way few people expected. Traffic through Cloudflare dropped sharply. Websites stalled. Authentication flows froze. Workers KV struggled under the weight of timeouts cascading across the globe. For nearly three hours, a company known for its resilience watched its backbone misfire in slow, rhythmic waves.

The cause was not malicious. It was not a DDoS attack or a routing catastrophe. It was the quiet consequence of a single change to a database’s permissions. A minor adjustment, routine in most organizations, touched a hidden part of Cloudflare’s architecture and awakened a dependency no one had considered dangerous. That subtle shift doubled the size of a configuration file that feeds a machine learning model inside Cloudflare’s Bot Management system. The file exceeded an internal limit deep within Cloudflare’s core proxy. The proxy reacted in the worst possible way and collapsed.
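
The proxy treated the oversized file as fatal. The safer pattern is to treat every regenerated configuration as untrusted input and fall back to the last known-good copy when validation fails. A minimal Python sketch of that pattern (the file format, field names, and limit here are illustrative assumptions, not Cloudflare’s code):

    import json
    import logging

    log = logging.getLogger(__name__)

    def load_feature_config(path, last_known_good, max_features=200):
        """Validate a regenerated config before use; on any failure,
        keep serving with the last known-good copy."""
        try:
            with open(path) as f:
                config = json.load(f)
            if len(config["features"]) > max_features:
                raise ValueError(
                    f"{len(config['features'])} features exceeds "
                    f"the engine limit of {max_features}"
                )
            return config
        except (OSError, ValueError, KeyError) as exc:
            # A malformed file should degrade one subsystem,
            # not crash the proxy that loads it.
            log.warning("rejecting feature config %s: %s", path, exc)
            return last_known_good

The point of the fallback is containment: a bad file degrades one subsystem instead of taking down the process that loads it.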

What happened next revealed how modern systems fail today. Nodes that loaded the expanded file went dark. Nodes that loaded the old file continued to serve traffic. The network oscillated, recovering for minutes at a time before failing again, as if trapped between two different realities. Cloudflare’s engineers initially suspected an external attack because the symptoms felt coordinated and hostile. Only later did the true cause emerge. A metadata query began returning more rows than before because new permissions exposed an additional schema. The downstream system that consumed those rows had never been designed to handle that variation.
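
Cloudflare’s postmortem traces the row growth to a metadata query that never pinned the database name, so the new grant made a second schema’s copy of each column visible and the result set doubled. A hedged sketch of the contrast, in ClickHouse-flavored SQL with an illustrative consumer (table names and the limit are assumptions drawn from the public writeup):

    # The fragile shape: no database filter, so every schema the
    # querying user can see contributes rows once permissions widen.
    UNSAFE_QUERY = """
        SELECT name, type
        FROM system.columns
        WHERE table = 'http_requests_features'
    """

    # The defensive shape: pin the database so the row count stays
    # stable no matter what new grants make visible.
    SAFE_QUERY = """
        SELECT name, type
        FROM system.columns
        WHERE database = 'default'
          AND table = 'http_requests_features'
    """

    def build_feature_list(rows, max_features=200):
        """Consumers should dedupe and bound what they accept, too."""
        features = sorted(set(rows))  # collapse duplicate (name, type) rows
        if len(features) > max_features:
            raise ValueError(f"{len(features)} features exceeds {max_features}")
        return features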

Once every shard of the ClickHouse cluster adopted the new permissions, every file produced was oversized and every proxy that touched it entered the same panic. Cloudflare froze the file generator, inserted a known good file, restarted core services, and worked through the tail of cascading failures until the last system recovered. By the end of the day, the company published a transparent postmortem that told a story far larger than the outage itself.

Cloudflare is one of the most capable engineering organizations in the world. Their systems are built to survive pressure that would overwhelm most companies. Their teams live in incident response. Their infrastructure is distributed, hardened, and instrumented with extraordinary detail. Yet the event that brought them down started with a quiet change in who could read what inside a database.

This should unsettle everyone who builds or operates modern systems. Today’s architectures are faster, more distributed, and more interdependent than at any other point in digital history. Everything regenerates. Everything adapts. Everything assumes the data beneath it will remain stable. When that foundation shifts, even slightly, the blast radius can reach across continents.

Outages like this are now board-level events, not operations incidents. Executives understand that failure at the data layer no longer results in a brief technical interruption. It creates exposure. It undermines trust. It invites questions from regulators and customers who expect reliability even in the face of rapid innovation.

Cloudflare’s outage was not a story about a proxy limit. It was a story about the unseen assumptions that hold modern systems together. A metadata query expected a single view of the world. A downstream component expected a fixed number of features. A global propagation system expected uniformity in the file it distributed. Each of those expectations was reasonable in isolation. Together, they created a perfect storm.

This is the fragility most enterprises underestimate. Many organizations still treat database change as something quieter and less consequential than application code. They wrap it in manual scripts, tribal reviews, and processes held together by institutional memory. They assume that the database is slow and stable. In reality, it has become one of the most dynamic components of modern infrastructure. It shapes ML features, runtime logic, access control, personalization, routing decisions, scoring models, and analytics flows. When a schema, permission rule, or metadata contract shifts unexpectedly, it does not stay contained. It ripples outward into every system that depends on it.

The arrival of AI heightens this risk. Models depend on structured signals. Pipelines depend on predictable metadata. Agents generate SQL that reaches directly into production systems. Automated build systems treat data as a living input. A harmless variation in a table’s shape can distort predictions, corrupt features, and undermine trust in automated reasoning. Modern companies are building AI on top of a data layer that often lacks the same controls, lineage, and governance applied to code.

Cloudflare’s incident showed how dangerous that assumption has become. In most enterprises, the level of visibility Cloudflare has would be considered exceptional. The speed with which they diagnosed and recovered would be nearly impossible to match. If a routine metadata change can break one of the most sophisticated networks on earth, what does that mean for the organizations that lack Cloudflare’s discipline and tooling?

The lesson from November 18 is not that Cloudflare stumbled. It is that the Internet runs on an increasingly delicate mesh of interconnected systems that depend on the stability of the data beneath them. When the data layer shifts without guardrails, everything above it inherits the risk. Application code will not save you. Infrastructure automation will not save you. Even best-in-class observability may only help you understand the blast after it has already begun.

The only real path forward is a new level of discipline at the data layer. Databases must be governed with the same rigor applied to application pipelines. Schema and metadata changes must be versioned, validated, and controlled. Drift across environments must be observable. The systems that depend on structured data must be able to trust that the shape of that data will not change without warning. Organizations that fail to adopt this posture will continue to experience failures that appear sudden, unpredictable, and inexplicable, even though the root cause is often simple and internal.
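
In practice the guardrails can start small. As a hedged illustration, a deployment gate might snapshot each environment’s schema from information_schema and refuse to promote when the environments disagree (assumes a psycopg2-style DB-API connection; all names are illustrative):

    def schema_snapshot(conn, schema="public"):
        """Capture (table, column, type) tuples for one environment."""
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT table_name, column_name, data_type
                FROM information_schema.columns
                WHERE table_schema = %s
                """,
                (schema,),
            )
            return set(cur.fetchall())

    def assert_no_drift(staging_conn, prod_conn):
        """Fail the deployment gate when environments disagree."""
        drift = schema_snapshot(staging_conn) ^ schema_snapshot(prod_conn)
        if drift:
            raise RuntimeError(f"schema drift detected: {sorted(drift)}")

A check like this does not replace versioned migrations; it catches the changes that slipped around them.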

If your database changes are still moving through email threads and ticket queues, you are not governing a critical control point. You are hoping it holds.

Incidents like this will not stop. They will only get stranger and harder to diagnose as AI, automation, and distributed systems stack more logic on top of fragile data contracts. The one thing that can change is whether those contracts are governed or left to chance. On November 18, a database permission change broke the Internet. It is tempting to see this as a one-off incident. It is wiser to see it as a preview. This is how modern systems fail now. Not through a single dramatic blow, but through a tiny shift in the layer that everything else assumes is immovable. The next major outage will follow the same pattern. The question is whether the next organization is prepared.

The future of resilience begins with how you govern database change.

About the Author

Ryan McCurdy

VP of Marketing

Ryan brings more than 14 years of experience leading marketing at hyper-growth technology companies. He has built and scaled high-performing marketing organizations across cybersecurity, SaaS, and developer tools, driving revenue growth through a combination of brand storytelling, product marketing, and data-driven demand generation. Prior to joining Liquibase, Ryan held marketing leadership roles at companies including Astronomer, Bolster, Lacework, and Druva. Ryan holds a BA in Film Production from Brooks Institute and an MBA from Walden University.
