Any testers doing chaos?

I’m doing research for a blog post / talk, and wondered WHO is responsible for initiating chaos experiments in an organization. I see a bunch of Site Reliability Engineering and Operations teams doing it, since they feel the pain of an outage. But then I wondered: why not move chaos earlier in the development lifecycle, where the cost of bugs is lowest? I haven’t yet come across any testers doing chaos tests, though.

Are any testers involved with chaos testing at their companies willing to share their perspectives with me?


I have definitely been advocating for this actively, but I feel like it’s maybe a couple of years from becoming mainstream. I wrote a blog post with links and images showing how complex testing can get once you’re doing it actively in production. I think there is going to be a convergence of roles, with testers doing more in production and some companies leveraging cloud/on-premises “service orchestrators” that provide better fault tolerance. I’m a product owner on an SRE-like team and hope that in a couple of years we can start buying tools like Gremlin to enable us to do gameday experiments.


Testers don’t do chaos. Testers try to expose the hidden chaos with the aim of helping others restore order. :slightly_smiling_face:

This is definitely an open question for me too. If we let chaos engineering be defined closer to exploratory testing, then it’s actually easy to see that we already do it as software testers. If you’ve ever identified a resiliency property you think your application should have and then checked it, that’s chaos engineering. The most common example, I’d guess, would be performance testing.

I am not exactly who you’re asking about, as I am a tester but on the platform engineering team, which means my title/role is closer to SRE (I genuinely find name evolution so entertaining!).


I think the word “chaos” here is used in the context of Chaos Engineering.

I realised that perhaps an hour after I’d posted - too late to prevent others from having seen it. That’s what happens if you get ten minutes behind the leading edge (I’ve recently been taken off testing to fill in as an interim Technical Author…)