The Netflix Chaos Monkey has always intrigued me. Imagine having bots kill elements in your production environment on purpose, to test resilience "in the wild". Related story:
Usually testing in production is labelled "shift right".
At https://www.linkedin.com/pulse/8-reasons-shift-testing-right-lanette-creamer/ Lanette writes about Uptime Testing, Release Dress Rehearsal, End to End Workflow Testing, Collecting, Translating, and Isolating Customer Issues, Actionable Analytics, Security Bug Bounty Program, More Automation, and Customer Visits.
Katrina Clokie's book "A Practical Guide to Testing in DevOps" contains great insights about this subject: see https://leanpub.com/testingindevops
A big +1 for Chaos Monkey. I discovered a bug in the load balancers at Rackspace (our hosting provider at the time) using it, and we ended up switching to another provider because it was so serious.
You already linked to a blog post by Charity Majors, but she also gave a great presentation at Strange Loop this past year:
She's a pretty entertaining speaker, and has an interesting focus on observability over metrics and dashboards. She also stresses that she's not saying to replace testing in lower environments, just that at large scale, you can't test for the situations you're likely to encounter in prod.
I remember when I worked for a credit card company, and the sheer joy of "testing in production", because it meant going around shops with several of our cards, looking at their payment terminal, and going "what's the cheapest thing in the shop ... I WILL BUY IT".
For me, a huge part of it has always been monitoring - looking at what our users are doing, and understanding the scope of what they do. We use Piwik to help with this. We'll perform biscuit factory testing (random sampling) to make sure things are as we expect, as well as build up general models of what people do.
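As a rough illustration of the random sampling idea, here is a minimal sketch. The record layout (`status`, `amount`) and the `looks_ok` rule are hypothetical and only stand in for whatever "as we expect" means for a given system.

```python
import random

def sample_records(records, k, seed=None):
    """Pick up to k records at random, without replacement."""
    rng = random.Random(seed)
    records = list(records)
    return rng.sample(records, min(k, len(records)))

def looks_ok(record):
    # Hypothetical expectation: a completed transaction has a positive amount.
    return record["status"] != "completed" or record["amount"] > 0

def spot_check(records, k, seed=None):
    """Return the sampled records that violate the expectation."""
    return [r for r in sample_records(records, k, seed) if not looks_ok(r)]

if __name__ == "__main__":
    transactions = [
        {"id": 1, "status": "completed", "amount": 12.5},
        {"id": 2, "status": "completed", "amount": 0},
        {"id": 3, "status": "declined", "amount": 0},
    ]
    # Sample everything here so the example is deterministic.
    for bad in spot_check(transactions, k=3):
        print("unexpected record:", bad)
```

In practice `k` would be much smaller than the number of records; the point is that a small random sample, checked regularly, tends to surface systematic problems without inspecting everything.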
But we'd also develop ways and rules about doing our own thing in production. As you can imagine, when spending with credit cards, I'd come back with a lot of receipts which needed to be filed, and we also had a way of monitoring "cards for testing" for misuse.
BTW - best damn test I've ever done in production. I created credit cards for an 18-year-old and a 17-year-old. One would be able to purchase alcohol, the other would not. Sadly I was not allowed to keep the purchased alcohol.
For a client we built a "beta testers program" where users could sign up to access new features before they became generally available. They were also offered the option to be tracked during their visit to the web application, as a way to improve the service and discover flaws.
With only 8% of all users signing up for the program, we had detailed end-to-end access flows through the whole web application. That gave us repeatable usage data that we could use in our automated test tools (like Selenium). Issues that these users generated (404 and 500 errors) were added to the priority issue tracker, with full detail about which pages they visited, which button or image they clicked on and which components were affected.
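To make the idea concrete, here is a minimal sketch of turning tracked events into issue reports. The event fields (`session`, `page`, `clicked`, `status`) are assumptions for illustration; the actual tracking format used in the program is not described here.

```python
from collections import defaultdict

def error_reports(events):
    """For each 404/500 response, report the pages and clicks
    in the same session that led up to it.

    Events are assumed to arrive in time order.
    """
    trail = defaultdict(list)
    reports = []
    for event in events:
        steps = trail[event["session"]]
        steps.append(event)
        if event.get("status") in (404, 500):
            reports.append({
                "session": event["session"],
                "status": event["status"],
                "failed_page": event["page"],
                "path": [s["page"] for s in steps],
                "clicks": [s["clicked"] for s in steps if s.get("clicked")],
            })
    return reports

if __name__ == "__main__":
    events = [
        {"session": "s1", "page": "/home", "status": 200},
        {"session": "s1", "page": "/cart", "status": 200, "clicked": "checkout-button"},
        {"session": "s1", "page": "/checkout", "status": 500},
    ]
    for report in error_reports(events):
        print(report)
```

The `path` of each report is also the kind of flow that could be replayed as an automated browser test.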
We discovered many small but high-impact issues that we would normally not find in our pre-production tests. We also learned that users don't use an application the way developers, testers, business owners or project managers think they will. This type of insight into user behaviour on the production web application taught us many things and made the overall application better for the end user.
Netflix Chaos Monkey was also mentioned in this thread. We consider that to be part of our resilience tests, which find flaws in our design and dependencies when parts of our application architecture are unavailable or only partially available. It's a necessary step to ensure that we fail safely, or that we have mechanisms in place for when a dependency is not available. This has taught us to implement heartbeats and deferred execution solutions in our application architecture.
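One way heartbeats and deferred execution can fit together is sketched below. This is an assumption about the general pattern, not the architecture described above: the `Dependency` class, its timeout and the in-memory queue are all hypothetical.

```python
import time
from collections import deque

class Dependency:
    """Tracks heartbeats from a dependency and defers work while it is down."""

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout      # seconds without a heartbeat before "down"
        self.clock = clock          # injectable so the logic can be tested
        self.last_beat = None
        self.deferred = deque()

    def heartbeat(self):
        """Record that the dependency reported in just now."""
        self.last_beat = self.clock()

    def is_alive(self):
        if self.last_beat is None:
            return False
        return self.clock() - self.last_beat <= self.timeout

    def submit(self, task):
        """Run the task now if the dependency is alive, otherwise queue it.

        Returns True if the task ran immediately.
        """
        if self.is_alive():
            task()
            return True
        self.deferred.append(task)
        return False

    def flush(self):
        """Run queued tasks while the dependency is alive; return how many ran."""
        ran = 0
        while self.deferred and self.is_alive():
            self.deferred.popleft()()
            ran += 1
        return ran
```

A real system would more likely persist the deferred work in a durable queue than in memory, so that it survives a restart of the caller.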
One important piece of advice: when you want to implement such a "beta program", especially with the upcoming GDPR and other privacy regulations, make sure you have explicit approval from the user, as you will gather a lot of information, often sensitive PII. And have people re-confirm their interest every 6 months or every year, as most people forget they signed up for the program.
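A consent check along these lines could look like the sketch below. The `consented_on` field and the 182-day window are hypothetical choices for illustration, and this is not legal guidance on what GDPR requires.

```python
from datetime import date, timedelta

# Hypothetical policy value: roughly six months.
RECONFIRM_AFTER = timedelta(days=182)

def needs_reconfirmation(consented_on, today=None):
    """True when the last explicit consent is older than the policy window."""
    today = today or date.today()
    return today - consented_on > RECONFIRM_AFTER

def may_track(user, today=None):
    """Only track users with explicit consent that is still fresh."""
    consented_on = user.get("consented_on")
    if consented_on is None:
        return False
    return not needs_reconfirmation(consented_on, today)

if __name__ == "__main__":
    user = {"name": "beta-tester", "consented_on": date(2024, 1, 1)}
    print(may_track(user, today=date(2024, 3, 1)))
    print(may_track(user, today=date(2024, 9, 1)))
```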
Whether it is performance testing or web app testing, most of the time we get a chance to test before release.
There are, however, certain advantages to testing in production.
You get live feedback from customer reviews.
It helps you see the real-time user experience.
The performance of the app can be monitored while the application is live.
Load testing can be executed more realistically in production.
Beta releases are the best versions for getting feedback on newly added features.
Live traffic data helps to analyze the app and shows which services are used most.
I once worked at a company that made printed products. The complete workflow, from the order through to final printing, was only executed in production. The same went for true payment processing, as opposed to the test mode of payment processing invoked in test environments and/or the test credit card numbers that only worked on fake orders in test environments.
As a result, some tests were run in production to regression-test the payment processing and the product printing, with the product being shipped back to the company or to an employee's home address. The test orders typically had random, meaningless text and images printed on the product; they were not customized orders for the employee to make use of while testing the system. I think any actual payments were refunded by the billing team for these orders, so the employer incurred the cost of testing the system. Other times, special test promo codes were used to discount the product to zero so that the order did not need to be billed.