How do you corner the tricky bug?

Greetings everyone,

First, a confession. I’m not a tester; I’m a developer … (Impostor!). :wink:

Imagine a piece of creative software, something like Photoshop for the web browser, where end users have a lot of freedom. This allows for highly variable workflows that, on the surface, seem nearly impossible to account for with pre-planned manual regression testing. The cost of sitting down and mapping every permutation is too great. So when something comes up and we’ve got concrete repro steps, we add a test case and make sure it doesn’t break in future releases. But what do you do when you just can’t make it happen?

Sometimes, despite our best efforts, we can’t get to the root of an issue. Users are in different time zones and geographical locations, so we can’t easily look over their shoulders. Often they don’t know how to make a bug happen, but they know what it looks like when it does. In tough cases we’ve stayed up late recording screen-sharing sessions, essentially watching someone work for hours and hoping the thing will break. In rare cases we’ve flown testers or developers out to do this in person. Using these techniques we end up figuring it out, but I’d like to know how the experts (you!) handle situations like this. I imagine there are cheaper and faster approaches that can be used.

I’ve done research online of course. Here is a sample of some of the advice I found:

My takeaways are:

  • Record video
  • Add logging
  • Use fuzzing

We’ve tackled this from the dev end by adding client-side and server-side logging, and even a global client-side exception handler that makes it easy to copy/paste unhandled exceptions. A lot of the time, though, the best we have is an artefact in a broken state, and we can’t figure out how it got there. We’ve asked our QA team to jump in with exploratory testing, and they are a great help in most cases. Sometimes, though, they can’t figure it out either. I don’t like the idea of an end user recording their own video for hours, then editing it and shipping it to us for analysis; that seems like too much of a burden on their end. I’m unfamiliar with fuzzing, but it seems to deal more with form field inputs, if I understand correctly. In our case users are manipulating layers and graphical objects with the mouse a lot, so it doesn’t seem like a suitable approach. What would you do in this situation?


That’s a bugger of a problem. You are being screwed over by something you may already know as Ashby’s Law of Requisite Variety, or “in a stable system you need as many (or more) control states as system states”. You usually can’t cover these control states exhaustively - or nearly at all. That’s before we get into factors like platform, chipset, room temperature or time of day. It’s infinite. Don’t feel too bad, that’s just the way the world is. This is partly why good testing is hard.

The cheapest thing is probably to find these problems before they get out into the wild. Use good testers who can focus/defocus well and understand the value of galumphing (very basically these mean “find new problems by doing new things in new ways” and “doing something crazy but cheap while testing that shouldn’t matter but oh look sometimes it does”). Encourage wild and crazy play with the system to find new problems.

You should also be looking at collecting data about reported bugs. Where they are, what their root cause is when you can find them… this will give you access to a better “risk catalogue” of things to test for because you’ll know where your weaknesses are. If you know what tends to go wrong you can look for it better. Do 5-whys (or whatever) on important problems found in the wild to adapt your processes, including how your testing gets done.

Reduce the “pre-planned”-ness of your regression testing and replace or accentuate it (where appropriate) with testing that is informed by change risk. Identify what’s changed, what that can affect, and put a decent tester to the job of playing with it. This is obviously somewhat dependent on your development methodology.

As well as logging exceptions, could you log state information? Actions that happened before the crash? The resources the program was taking up? Observability is huge in testing, and access to that data could be valuable. What you collect could be informed by the sort of problems you’re seeing and what’s similar about them. If many problems, after analysis, turn out to be RAM-related then you can collect memory data, etc. If your users are manipulating graphics then you could store the mathematical representations of those graphics in a crash file for submission (or just collect them, if you’re the spying type). Look for patterns. As a tester I’d be at peak happiness with a file that described everything a user did and when they did it, and what the system state was at the time - so mentally start from this impossibility and work towards something pragmatic. There are things people leave out of a bug report because they don’t think they’re important or they forgot them (“I alt-tabbed out and back in but that shouldn’t matter”) - and that’s why video is so cool.
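To make the “file that described everything a user did and when” idea a bit more concrete, here is a minimal sketch of a bounded action log that could be attached to a crash report. Everything in it is an assumption for illustration: the `ActionRecorder` class, the `ActionEntry` fields, and action names like `layer.move` are hypothetical and not taken from your product.

```typescript
// Hypothetical shape of one recorded user action (all field names are assumptions).
interface ActionEntry {
  seq: number; // increasing sequence number, so gaps are visible
  at: number; // timestamp in milliseconds
  type: string; // e.g. "layer.move"
  detail: Record<string, unknown>; // small, serialisable payload
}

// Keeps only the most recent `capacity` actions so memory use stays bounded.
class ActionRecorder {
  private readonly entries: ActionEntry[] = [];
  private nextSeq = 0;
  private readonly capacity: number;
  private readonly now: () => number;

  // `now` is injectable so the recorder can be tested with a fixed clock.
  constructor(capacity: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.now = now;
  }

  record(type: string, detail: Record<string, unknown> = {}): void {
    this.entries.push({ seq: this.nextSeq++, at: this.now(), type, detail });
    if (this.entries.length > this.capacity) {
      this.entries.shift(); // drop the oldest entry
    }
  }

  // Returns copies so callers cannot mutate the log.
  snapshot(): ActionEntry[] {
    return this.entries.map((entry) => ({ ...entry }));
  }

  // Bundles the error, a state summary, and the recent actions into one JSON string.
  toReport(error: unknown, state: unknown): string {
    return JSON.stringify({
      error: error instanceof Error ? error.message : String(error),
      state,
      actions: this.snapshot(),
    });
  }
}
```

The idea would be to call `record(...)` from wherever user actions are already dispatched, and to call `toReport(...)` from the existing global exception handler so the recent history travels with the exception. The bounded buffer is a design choice to keep the cost predictable; how many actions to keep is something you’d tune against the bugs you actually see.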

Fuzzing isn’t all about input fields, it’s just about creating randomness. When a tester tries to do the same thing every time they are deliberately avoiding finding anything new - it’s the freedom to play, and act on what we learn, that makes testing exploratory in nature. Repeating oneself should have a good reason to go with it. The factors of a product’s vulnerabilities (the things that, under certain conditions, permit bugs to happen) are varied, and only variety can destroy variety (thanks Ashby). You could fuzz the order that layers are created in an automated test suite or testing tool, for example, or randomise timings between events, or randomise events happening entirely.
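As a rough sketch of what fuzzing layer operations might look like, the snippet below generates a random sequence of create/move/delete actions with random delays from a seed. The action shapes (`createLayer`, `moveLayer`, `deleteLayer`), the probabilities, and the 500 ms delay cap are all made up for illustration; the small seeded generator is there so that any sequence that triggers a failure can be regenerated from its seed.

```typescript
// Small seeded pseudo-random generator returning values in [0, 1).
// Same seed, same sequence, which is what makes a failing run replayable.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical user actions; a real editor would have many more kinds.
type FuzzAction =
  | { kind: "createLayer"; id: string; delayMs: number }
  | { kind: "moveLayer"; id: string; dx: number; dy: number; delayMs: number }
  | { kind: "deleteLayer"; id: string; delayMs: number };

function generateActions(seed: number, count: number): FuzzAction[] {
  const rand = seededRandom(seed);
  const int = (max: number): number => Math.floor(rand() * max);
  const live: string[] = []; // layers that currently exist
  const actions: FuzzAction[] = [];
  let nextId = 0;

  for (let i = 0; i < count; i++) {
    const delayMs = int(500); // random pause before the action
    const roll = rand();
    if (live.length === 0 || roll < 0.4) {
      const id = `layer-${nextId++}`;
      live.push(id);
      actions.push({ kind: "createLayer", id, delayMs });
    } else if (roll < 0.8) {
      const id = live[int(live.length)] as string;
      actions.push({ kind: "moveLayer", id, dx: int(200) - 100, dy: int(200) - 100, delayMs });
    } else {
      const [id] = live.splice(int(live.length), 1) as [string];
      actions.push({ kind: "deleteLayer", id, delayMs });
    }
  }
  return actions;
}
```

A test harness would feed each generated action into whatever drives the editor (a UI automation tool or the app’s own command layer, whichever you have) and log the seed alongside any failure. This only ever targets layers that exist at that point; deliberately allowing invalid targets would be another variation worth trying.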

Also ask if a non-reproducible bug is worth the effort you’re putting in. Bug investigation is a cost, so fixing bugs is an investment.

Focusing is useful when investigating problems. Simplify the tests, changing one factor at a time, making precise observations in a deterministic way. Return to known states to preserve the integrity of your testing. The idea is to be able to observe and understand the implications of your actions. Tools can help here, as they’re pretty good at repeating themselves in a predictable way. If nothing is found then gradually defocus (change states, change timings, make vague high-level observations - make it difficult for the system to get through your test without falling over in some way).
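One way a tool can help with this kind of focusing, assuming you already have a recorded or generated sequence of steps that fails: drop one step at a time and replay from a known state, keeping the shorter sequence whenever it still fails. The sketch below is a simple greedy reduction (a cut-down cousin of delta debugging, not the full algorithm). The `minimize` function and the `stillFails` callback are hypothetical names; `stillFails` is assumed to reset to a known state, replay the candidate steps, and report whether the bug still shows up.

```typescript
// Greedily removes steps one at a time while the failure still reproduces.
// `stillFails` must start from a known clean state on every call.
function minimize<T>(
  steps: readonly T[],
  stillFails: (candidate: readonly T[]) => boolean,
): T[] {
  let current = [...steps];
  let changed = true;

  // Repeat full passes until a pass removes nothing.
  while (changed) {
    changed = false;
    for (let i = 0; i < current.length; i++) {
      const candidate = [...current.slice(0, i), ...current.slice(i + 1)];
      if (stillFails(candidate)) {
        current = candidate; // step i was not needed to trigger the bug
        changed = true;
        i--; // the next step has shifted into position i, so test it too
      }
    }
  }
  return current;
}
```

This only works when replaying the same steps gives the same result, which is exactly the deterministic, return-to-known-state discipline described above. For timing-sensitive bugs the callback may need to replay each candidate several times before deciding.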

Hope that helps some, and gives you some jumping off points, but you’re asking a pretty deep question with a ton of answers. Again, don’t worry, it’s just how it is. I’ll follow up on replies where I can.


This response is so well thought out and insightful. I’ve read it several times and learned something with each pass. I was unfamiliar with Requisite Variety. It is applicable to our situation. Thanks for that. I love the idea of a “risk catalogue”. There are definitely areas that are more prone to issues than others. We have added a field in our bug tracker called Area of Effect that we use to give our testers a hint for what could break. We also have a Modules multi-select list that could help here if it were more granular. Lots to chew on here. Thank you so much!
