Thanks for already sharing the following, @ipstefan:
What else have you read about the outage? What information can we share to learn from? What information can we share about business continuity testing, disaster recovery testing, resilience etc?
Share links and your thoughts on this thread as the story unfolds.
From what I have read, CrowdStrike automatically sent out a faulty channel file. This caused PCs running Windows (mainly Win 10, I think) to get stuck in a boot loop / BSoD (Blue Screen of Death).
There is a workaround to remove the faulty file (but if you have BitLocker you might not be able to get in via Safe Mode to apply the workaround).
CrowdStrike has halted all updates until this is resolved, so if you haven’t been hit by it already, you should be OK…
So we know the BSOD occurred after a Windows reboot that was done because the CrowdStrike update required it. That means that at boot, Windows hit a conflict with a corrupted CrowdStrike file or its contents.
So to test it they just had to do the same. Or am I missing something?
What CrowdStrike did was stop the updates as soon as they found out what they had caused.
I heard from people with CrowdStrike installed that their company’s IT department emailed them telling them not to accept the update if they were prompted to install it.
What I am wondering next: as a company, is it wise to keep auto-update ON for all employees’ computers for every tool? Or should IT first check each update of any essential tool?
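To illustrate that second option, here is a rough, purely hypothetical sketch of a staged rollout: updates flow through rings (an IT test lab canary, then a pilot group, then everyone) and only promote after running cleanly for a soak period. All names and thresholds below are made up for the example; this is not how CrowdStrike or any specific vendor actually stages updates.

```python
# Purely illustrative: a ring-based rollout gate for third-party agent updates.
# Every name and threshold here is invented for the example.
from dataclasses import dataclass

@dataclass
class Ring:
    name: str
    soak_hours: int         # how long the update must run cleanly in this ring
    max_crash_reports: int  # promotion to the next ring is blocked above this

ROLLOUT_RINGS = [
    Ring("canary (IT test lab)", soak_hours=24, max_crash_reports=0),
    Ring("pilot (one department)", soak_hours=48, max_crash_reports=2),
    Ring("broad (whole company)", soak_hours=0, max_crash_reports=0),
]

def can_promote(ring: Ring, hours_deployed: int, crash_reports: int) -> bool:
    """An update only moves past this ring after soaking there without incident."""
    return hours_deployed >= ring.soak_hours and crash_reports <= ring.max_crash_reports

if __name__ == "__main__":
    # The canary ring has run the update for 24h with zero crash reports,
    # so it may be promoted to the pilot ring.
    print(can_promote(ROLLOUT_RINGS[0], hours_deployed=24, crash_reports=0))  # True
```

The trade-off, of course, is that a security product is exactly the kind of tool where you don’t want to delay updates for days, which is what makes this question hard.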
As far as I could tell from the jobs they publish online, they do their testing in India: QA Analysts (who write and execute test plans and test cases) for the marketing websites, and SDETs (who automate things) for the apps. An example that might reveal their focus:
They are currently looking for broader solutions as far as I saw.
Renaming the corrupted/faulty CrowdStrike file might not work for everyone. On top of that, it generally has to be done hands-on by IT for each employee’s machine, together with a reboot. What that hands-on step boils down to is sketched below.
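A rough illustration only, based on the publicly circulated guidance (boot into Safe Mode, then remove or rename the channel file matching C-00000291*.sys under C:\Windows\System32\drivers\CrowdStrike). The path and filename pattern come from public remediation notes, not something I have verified, and with BitLocker you still need the recovery key before you can even reach this step.

```python
# Illustrative only: mirrors the widely circulated manual workaround of
# finding and quarantining the faulty channel file from Safe Mode.
# The path and filename pattern are taken from public remediation notes.
from pathlib import Path

CS_DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
FAULTY_PATTERN = "C-00000291*.sys"  # channel file implicated in the boot loop

def find_faulty_channel_files(driver_dir: Path = CS_DRIVER_DIR) -> list[Path]:
    """Return matching channel files without touching them (a dry run)."""
    if not driver_dir.exists():
        return []
    return sorted(driver_dir.glob(FAULTY_PATTERN))

def quarantine(files: list[Path]) -> None:
    """Rename rather than delete, so the file can be restored if needed."""
    for f in files:
        f.rename(f.with_name(f.name + ".bak"))

if __name__ == "__main__":
    matches = find_faulty_channel_files()
    print(f"Found {len(matches)} matching channel file(s)")
    for m in matches:
        print(f"  {m}")
    # quarantine(matches)  # left commented out on purpose; dry run by default
```

Even scripted, something like this doesn’t help much when the machine can’t boot far enough to run anything, which is exactly why the fix ended up being hands-on, per machine.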
The conclusion in the NY Times article was clear:
"The outages underscored an uncomfortable reality that software companies face few liabilities for major disruptions and cybersecurity incidents. The economic and legal penalties for such significant outages can be so minimal that companies are not motivated to make more fundamental changes. While a car manufacturer would face stiff penalties for faulty breaks, a software provider can often issue another update and move on.
“Until software companies have to pay a price for faulty products, we will be no safer tomorrow than we are today,” Mr. Parenty said."
About the CS fix:
“The issue has been identified, isolated and a fix has been deployed.” - written by lawyers who don’t understand the issue. The missing part is that the fix has to be applied manually to every impacted system.
Did CS just cause the biggest outage in the history of software?
Apparently “Crowdstrike customers account for 298 of the Fortune 500…”
And then, to understand some context, I’ve gone through several people’s comments:
“Thanks to the news - C Suite is down my throat… First thing I got to wake up in the morning was the COO sending me these news articles about the outage and how we should get away from Microsoft.”
“I am sure even the most knowledgeable and resourceful hacking groups couldn’t cause a disruption and damage of this magnitude. And CrowdStrike supposed to save us from the bad guys!
We have hundreds of Windows servers and thousands of Windows workstations affected by this.”
“Wow, I’m a system admin whose vacation started 6 hours ago… My junior admin was not prepared for this”
“Malaysia here, 70% of our laptops are down and stuck in boot, HQ from Japan ordered a company wide shutdown, someone’s getting fireblasted for this shit lmao”
“Here in the Philippines, specifically in my employer, it is like Thanos snapped his fingers. Half of the entire organization are down due to BSOD loop. Started at 2pm and is still ongoing. What a Friday.”
"I was here. Work for local government. 2 of our 4 DC’s in a boot loop, multiple critical servers, workstations etc. a little win was our helpdesk ticketing server went down… Might leave that one on a BSOD "
“I work at a big tech company. We had the president of crowdstrike in our triage call. Shit was going down.”
“Joining the outage party, CS took down 20% of hospital servers. Gonna be a long night”
“Working for a major Fortune 500 global brand. We’re decapitated. Nice going CrowdStrike.”
“This is unprecedented. I manage a large city, all of our computers, police and public safety and bsod. Calltaker and Dispatch computers. People’s lives have been put at risk.”
“We’ll be filing a lawsuit in Ohio at 9AM ET this morning. All systems down.”
“Work in aviation, everything is down :/”
“What a shit show! Entire org and trading entities down here. Half of IT are locked out.”
“I was here. Took down 80% of hospital infra”
“Woke up at 1am randomly and saw messages from the third shift showing me pics of bsods…it’s now 3am and finding out it was crowd strike who we just switched to after a ransomware incident makes me just want to jump off a cliff.”
“Besides banks, this Crowdstrike failure has crippled the U.S. healthcare system. Most hospitals are having at least some system issues. We currently have no access to the drug machines, charting systems, patient info, security systems, telemetry systems, radiology systems, the lab network, and the alarm system that keep folks from stealing babies from the nursery.”
“I just woke up to this and tried to remote into my machine and can’t. It’s going to be a bad day for my 5 man team with 1200 machines across three counties.”
“We have thousands of servers/workstations affected BSOD. Not going to be a fun day. One of the worst fuckups by a company, ever.”
“Cleaning up the shitshow this is causing is going to be a blast. Work in operations for a company of 100k+ and can’t even get on my PC. Hope they roll back quickly or this is going to be a catastrophe”
“I’m just a dumb Truck driver who can’t get loaded because the shipper systems are all down.”
“Man, I didn’t even know what crowdstrike was until tonight when every. single. computer. In my factory went blue.
I spent the last 4 hours deleting this update out of nearly 40 computers and I wasn’t even done, the IT guy finally came in and took over.”
“In Denmark banks, hospitals, as well core personal ID certification which is needed for all purposes is also down”
“20k corporate laptops with bitlocker to be fixed on my company, what a nice weekend for some ppl… Deploy in Friday.”