Hey all - We build a mobile app for both Android and iOS and are trying to improve its stability (fewer users experiencing crashes in production). What tips, tools, or best practices do you recommend for surfacing and catching new mobile crashes before your customers do? Unfortunately we can’t reproduce most of these crashes in dev and QA, but they affect enough of our user base that we are missing our (ambitious) stability goals.
What we’ve tried:
- Evaluated our crash trends and turned them into testing best practices that catch the more common crash scenarios
- Tested on a variety of devices and operating system versions, using both physical devices and emulators
- Became obsessed with feature flags, so we can at least mitigate stability issues in prod when new crashes are observed
- We are introducing automated monkey testing tools for functional QA
- Our automation team is incorporating fuzz testing and suboptimal internet connections into our regular test runs. We’ve also talked about monitoring memory leaks but haven’t moved forward on that yet because of the effort required.
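To make the monkey testing and bad-network items concrete, here is a minimal sketch of a seeded run, where any crash found by random input can be replayed exactly from its seed. Everything in it is hypothetical and for illustration only: `FakeApp`, the event names, and the planted bug stand in for a real UI driver and a real app, and this is not anyone's actual tooling.

```python
import random

# Hypothetical event vocabulary; a real run would drive the UI instead.
EVENTS = ["tap", "swipe", "back", "rotate", "network_drop", "network_restore"]


def generate_events(seed, count):
    """Return a deterministic pseudo-random event sequence for this seed."""
    rng = random.Random(seed)
    return [rng.choice(EVENTS) for _ in range(count)]


class FakeApp:
    """Stand-in for the app under test, with one planted bug:
    rotating the screen while the network is down raises an error."""

    def __init__(self):
        self.online = True

    def handle(self, event):
        if event == "network_drop":
            self.online = False
        elif event == "network_restore":
            self.online = True
        elif event == "rotate" and not self.online:
            raise RuntimeError("crash: rotate while offline")


def run_monkey(seed, count=200):
    """Run one session; return None, or a report with the seed and last events."""
    app = FakeApp()
    events = generate_events(seed, count)
    for step, event in enumerate(events):
        try:
            app.handle(event)
        except RuntimeError as exc:
            return {
                "seed": seed,
                "step": step,
                "error": str(exc),
                # Keep the last few events so the report is readable on its own.
                "trail": events[max(0, step - 5): step + 1],
            }
    return None


def find_crashes(seeds):
    """Run one session per seed and collect the crash reports."""
    return [r for r in (run_monkey(s) for s in seeds) if r is not None]


if __name__ == "__main__":
    for report in find_crashes(range(20)):
        print(report)
```

Re-running `run_monkey` with a seed taken from a report reproduces the same event sequence, which is the part that helps with crashes that are otherwise hard to repro.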
I’m curious if anyone else has tips for how functional QA or automation can help catch hard-to-repro crashes when they’re introduced instead of waiting for everything to blow up in prod, or if you have general stories on how your companies have improved their stability/uptime for your products.
Have you considered more specific logging, to at least get an idea of what might be causing the crashes?
Beyond the typical application log, you might also capture the state of the hardware (memory, internet connection, GPS, general device state).
And is it easy for users to send you the logs?
Maybe with more logs you can get the overall picture and focus your testing better.
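As a rough illustration of what I mean by logging hardware state, here is a minimal sketch of a breadcrumb buffer that keeps the last few events together with a device snapshot and serializes them when a crash is caught. The class name `Breadcrumbs`, the field names, and the sample values are all made up for the example; on a real device the snapshot would come from the platform APIs, and the values passed in need to be JSON-serializable.

```python
import json
import time
from collections import deque


class Breadcrumbs:
    """Keep only the most recent events, each with a device-state snapshot."""

    def __init__(self, capacity=50, clock=time.time):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._items = deque(maxlen=capacity)
        self._clock = clock

    def record(self, message, **device_state):
        """Store a message plus whatever device state the caller provides."""
        self._items.append(
            {"t": self._clock(), "msg": message, "state": device_state}
        )

    def crash_report(self, error):
        """Serialize the error and the retained breadcrumbs as JSON."""
        return json.dumps(
            {"error": repr(error), "breadcrumbs": list(self._items)},
            sort_keys=True,
        )


if __name__ == "__main__":
    crumbs = Breadcrumbs(capacity=3)
    # Sample values below are invented for illustration.
    crumbs.record("app_start", free_memory_mb=812, network="wifi", gps="on")
    crumbs.record("open_map", free_memory_mb=430, network="cellular", gps="on")
    crumbs.record("load_tiles", free_memory_mb=95, network="none", gps="on")
    try:
        raise MemoryError("tile cache allocation failed")
    except MemoryError as exc:
        print(crumbs.crash_report(exc))
```

A fixed-size buffer keeps the overhead small, and the report is a single string, so it is easy to attach to whatever channel users already have for sending logs.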
To me it looks like you are doing good work overall.
Maybe you are dealing with a specific failure mode that is hard to catch?
Other than that, my only idea is to multiply the test budget. If you need to test more devices and situations, you need more testers to do so. Those testers should not only execute scripts but also be a source of new ideas, which means well-trained and well-paid people.
May I ask where your management stands on this? Does it put unhelpful pressure on the testers and/or devs, or does it support you?
Looks like, as Sabastian pointed out, a lot of work is being done, and that creates pressure for the team. There may be ways to improve stability through architecture changes and layering, but ultimately it can become an uphill battle. Even if you monitor crash statistics, you will always have users running your app on devices and in situations where it will either crash or not be serviceable, which is ultimately the same outcome for the user. Neither of these is easy to measure. The only advice I can give is to try to achieve a regular release cadence, because it looks like you are already doing all of the other things.