Outages in the wild, Cloudflare, Amazon and beyond

rosie · 19 November 2025 13:05

Cloudflare had an outage yesterday, and they've since published a post-mortem.

2025 feels like it’s been a year of outages.

What have you learned about them?

Have you read any useful articles or blog posts on the topic?

I’ve started creating a collection of outages in the wild to help us learn, and I’d love to add more to it.

larsthomsen · 20 November 2025 00:01

Thanks for the link to the postmortem, @rosie! It looks like testing the permissions tool that was implemented, which is what caused all the trouble, would have been a good idea!

juanalvarezarquillos · 20 November 2025 09:03

We’re not learning from recent outages. The internet now depends on just a handful of providers (AWS, Cloudflare, etc.), yet many companies overlook the associated risk.
Investing in a multi-cloud strategy (deploying workloads across AWS, Google Cloud, and Azure) reduces reliance on any single vendor, lowers exposure to service disruptions, and strengthens our negotiating position in contracts. By diversifying our infrastructure, we protect our services and improve resilience. Multi-vendor should be the way to go.

Rod · 20 November 2025 11:15

If an incident doesn’t lead to new probes, we’ve paid the tuition and learnt nothing. @martin.hynie

mirza · 20 November 2025 13:53

I think we’ll be seeing more of these, especially with companies that decide to let AI write their code.

mikeharris · 21 November 2025 21:11

I found that Genichi Taguchi’s book “Introduction to Quality Engineering” helped me understand an earlier outage, and I think it is relevant to other outages too: Learning from CrowdStrike with Taguchi – TestAndAnalysis
