You’re testing login functionality, and occasionally users get an error after submitting their credentials. However, the bug appears only intermittently, and you’re unable to reproduce it consistently.
Question:
What are such bugs called, and how would you approach identifying and resolving them?
What strategies can you use to ensure these types of issues don’t get overlooked during testing?
I would record my screen. At some point the error will occur, and then you can trace back what happened.
You can also sit down with the developers and debug it together; maybe they can see something specific in the logs. Maybe a service or job is running that interacts with the login system.
I think it’s not an actual problem he has, he’s doing the 30 days of exploratory testing (I could be wrong)
Intermittent bugs, just as you called them in your post.
They are also called “heisenbugs” (also spelled “HeisenBugs” or “Heisen Bugs”), after the Heisenberg Uncertainty Principle.
You can also call them “hard to reproduce bugs” (although some bugs are hard to reproduce only because it is so much work to set up the environment correctly) or “flaky bugs” (but “flaky” usually refers to tests, and often there’s an implicit assumption that the bug is in the automation and not in the product).
Just be alert and look out for them. If you think something is wrong and you have time, try to reproduce it. If you don’t have time right now, write down a note and get back to reproducing it when you have more time. Add more information to your note whenever you encounter similar issues while doing other things.
Usually when working with issues like that I have two goals: finding reliable ways of reproducing them, and getting an estimate of how common they actually are.
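To put a number on "how common", one option is to repeat the same login attempt many times and count the failures. This is only a minimal sketch: `estimate_failure_rate` and the `attempt` callable are hypothetical names, and `attempt` stands in for whatever drives a single login in your setup (an API call, a browser script) and returns True on success.

```python
from typing import Callable


def estimate_failure_rate(attempt: Callable[[], bool], runs: int = 200) -> float:
    """Call attempt() `runs` times and return the fraction of calls that failed.

    `attempt` is a hypothetical callable that performs one login
    and returns True when the login succeeded.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = sum(1 for _ in range(runs) if not attempt())
    return failures / runs


if __name__ == "__main__":
    import random

    rng = random.Random(42)  # fixed seed so the simulation is repeatable

    def simulated_login() -> bool:
        # Stand-in for a real login attempt: fails roughly 5% of the time.
        return rng.random() >= 0.05

    print(f"observed failure rate: {estimate_failure_rate(simulated_login):.1%}")
```

With a rare failure you need a lot of runs before the estimate means much, so treat a result of 0% from a small sample as "not seen yet" rather than "fixed".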
In my experience, intermittent issues in software are relatively rare. But they are more memorable, partly because they are more challenging than many other issues.
how would you approach identifying and resolving them?
First of all I need to confirm my observation. It may not be that the problem is intermittent, but that my observations are incorrect or inconsistent. I’m looking to see that certain behaviour can trigger an intermittent effect and that there’s at least some integrity in that inference. I’d begin by examining my basic assumptions and seeing if I’ve made some obvious mistake. Observed problems are via fallible oracles, which are necessarily incomplete and may be partly or completely wrong.
From there, in this case, I’d probably start with live oracles. I’d ask the people around me if they have any ideas. Developers have a wealth of knowledge about the things they build, and may have a simple answer for me. It may be due to incorrect setups. Replicating the issue from different software or hardware may help establish this as a local or global problem. Consider the systems at play, and their interactions.
Then I suppose it’s on to tracing the event, seeing where it occurs and what the error actually is. What is throwing the error, and why? Good logging is invaluable here so we can see the timings, look for patterns, and compare those patterns to future logs. Good controllability is invaluable for adjusting timings and controlling state more accurately. We might be able to use progressive mismatch, removing or simulating elements one by one to see if the problem resolves, as one way to find which elements or combination of elements are required to cause the problem. If there are many examples we can use statistical analysis and look for variance and correlations for clues.
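As a rough illustration of looking for correlations in the logs, the sketch below groups login attempts by one attribute and compares failure rates across its values. The record shape is an assumption made for the example: each record is a dict with an `"outcome"` key set to `"ok"` or `"error"`, plus whatever attributes your logs actually carry (`"browser"` here is hypothetical).

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping


def failure_rate_by(records: Iterable[Mapping], field: str) -> Dict[Hashable, float]:
    """Return the failure rate for each distinct value of `field`.

    Assumes every record has the key `field` and an "outcome" key
    whose value is "error" for a failed login (hypothetical log format).
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for record in records:
        key = record[field]
        totals[key] += 1
        if record["outcome"] == "error":
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


if __name__ == "__main__":
    sample = [
        {"browser": "firefox", "outcome": "ok"},
        {"browser": "firefox", "outcome": "error"},
        {"browser": "chrome", "outcome": "ok"},
        {"browser": "chrome", "outcome": "ok"},
    ]
    print(failure_rate_by(sample, "browser"))
```

A value with a much higher rate than the others is a clue to follow up, not proof of a cause, especially when there are only a few records behind it.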
What strategies can you use to ensure these types of issues don’t get overlooked during testing?
Multiple, varied heuristic approaches and techniques; volume, load and stress testing; complex and randomised inputs to reach unusual states; properly observing the product as users actually use it; and making errors and warnings accessible. Considering the internals or interfaces of the elements in a system can also prevent problems: exploring the risks in them means they don’t surprise us later when functionality that relies on those elements fails in some way.
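For the randomised inputs, it helps if every random case can be replayed, otherwise you just create more intermittent failures. A minimal sketch of that idea, assuming a hypothetical `login(username, password)` callable that returns normally when credentials are cleanly accepted or rejected and raises only on an unexpected error:

```python
import random
import string
from typing import Callable, Iterable, List, Tuple

# Mix of ordinary and awkward characters; adjust to what your system should accept.
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " äöüß"


def random_credentials(rng: random.Random) -> Tuple[str, str]:
    """Build a random username and password from the given generator."""
    username = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(1, 64)))
    password = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(1, 64)))
    return username, password


def find_failing_seeds(
    login: Callable[[str, str], object], seeds: Iterable[int]
) -> List[Tuple[int, str]]:
    """Try one random credential pair per seed; collect seeds where login raised."""
    failing = []
    for seed in seeds:
        rng = random.Random(seed)  # one generator per seed makes each case replayable
        username, password = random_credentials(rng)
        try:
            login(username, password)
        except Exception as exc:
            failing.append((seed, repr(exc)))
    return failing
```

Any seed that ends up in the result can be passed to `random.Random` again to regenerate exactly the same credentials for debugging.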