What is an appropriate response to "why have we only found this bug now?"

It’s something I’ve heard time and time again, and it was also asked recently on our Slack.

How would you respond?

If I had a penny for every time I heard this phrase… Or said it. It’s one of the first questions we ask as part of a post-mortem if there’s a bug in production that caused an outage. The question gets asked with the understanding that we want to prevent the bug in the future, not to find where to place blame. The conversation gets a lot easier with that understanding - it may have been a gap in testing, code review, or process - it doesn’t matter, let’s add some more testing/safeguards to prevent it.

Common responses that I’ve said/heard:

  • It wasn’t a scenario/use-case we tested
  • You’re not supposed to use it that way
  • There are no unit/automated tests around it
  • There are tests, but they didn’t catch this one edge case
  • There are tons of tests, but the tests were wrong
  • No customer ever reported it

Something along the lines of “We prioritized finding other bugs instead of this.”

For the extended answer:

We have a limited amount of time available, and we want this business to be as successful as possible. So we first try to find bugs in the most business-critical and system-critical areas. Then we work our way towards less and less critical areas until we reach the point where we judge that spending more time on testing would cost the company more than the potential bugs left in those areas. All this is so we can ship the product as soon as possible and start to get revenue from the value we have added to it. This process has the drawback that bugs will occasionally slip through, and that is by design: we want to be smart about how we spend our money, and that is a risk we are willing to take.
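To make that trade-off concrete, here is a rough sketch of the prioritisation in Python. Everything in it is an assumption for illustration: the `Area` type, the `plan_testing` function, the area names, and the cost figures are all made up, and real estimates of "cost of a shipped bug" are far fuzzier than single numbers.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    expected_bug_cost: float  # hypothetical estimate: cost to the business if a bug here ships
    testing_cost: float       # hypothetical estimate: cost of testing this area


def plan_testing(areas, budget):
    """Walk areas from most to least critical; test an area only if
    testing is cheaper than the expected bug cost and the budget allows."""
    tested, skipped = [], []
    remaining = budget
    for area in sorted(areas, key=lambda a: a.expected_bug_cost, reverse=True):
        worth_it = area.expected_bug_cost > area.testing_cost
        if worth_it and area.testing_cost <= remaining:
            tested.append(area.name)
            remaining -= area.testing_cost
        else:
            # Accepted risk: a bug in this area may slip through.
            skipped.append(area.name)
    return tested, skipped


# Made-up numbers, purely illustrative.
areas = [
    Area("checkout", 100.0, 20.0),
    Area("login", 80.0, 15.0),
    Area("profile page", 10.0, 12.0),
    Area("admin export", 30.0, 25.0),
]
tested, skipped = plan_testing(areas, budget=40.0)
print("tested:", tested)
print("accepted risk:", skipped)
```

The `skipped` list is the useful part when someone asks the question: it is the set of areas where we knowingly accepted the risk, for one of two reasons (testing cost more than the expected bug, or the budget ran out).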

Obviously this is only applicable if this was the case. If something slipped through because of a mistake in testing, then the answer should be “That is our mistake, this is what we have changed so that it does not happen again, and this is what we have learned.”


In some of my previous testing roles, I have been lucky enough not to hear this question.

Instead, they would ask, “What can we do to prevent this type of issue in the future?”

That, to me, is a much more constructive question.

It implies that somewhere, a mistake was made, but doesn’t assign or ask for blame. The word “we” implies that the whole team should be involved with the solution, and not just a member of the team.

So when the first question (why) is asked, I often respond with the answer to the second (how to prevent).


The real answer for me is “it depends”.

I’ve been asked this a few times, and my answers have included “it’s a really rare scenario and we’re just unlucky to have hit it now”, “sorry, I should have thought of that, let me update my regression notes”, “it’s actually not possible to reproduce without a large load, and this application can’t be load tested without a lot of programming work”, and even, “this is how the application was designed to work. Maybe we need to rethink that.”

Usually, like @brian_seg says, the next thing is “What can we all do to prevent this kind of problem slipping through in the future?” In my case it has to be a group effort because I’m the only tester in the building and it’s not possible for any of us to completely test the whole application.


I can think of 2 main reasons why a bug might not have been found until now.

  1. The steps to reproduce are complicated or specific.
  2. We weren’t given enough time and resources to test the application enough.

Ultimately, we should not be blaming each other for not finding a bug. Instead, we should be looking at whether there is anything we can do to improve so that fewer bugs are released to live.

For complicated/specific steps to reproduce, the experience of the tester may be lacking. The only thing we can do here is give the tester the opportunity to improve and gain more experience. We may also look at whether there is an over-reliance on scripted testing. We could set aside more time for exploratory testing. The diversity of the testers may also be an issue: different testers have different mindsets and different ways of using the application.

When there is not enough time or resources to run the tests, we shouldn’t use this as an excuse to blame upper management. There is only so much help they can give us. We should make it clear from the beginning what hasn’t been tested so they can make an informed decision about whether to release or not. If they haven’t been given all the information, they might not make the right decision. We should also look at ways to optimise our tests. If we know there is not going to be enough time, we should adapt the test plan accordingly so we still cover the most important areas of the application. We could also look at shift-left/shift-right methods.


Possible causes:

“Well, this came up after the most recent fix/change in this area, which was deployed just in the last hour…”

Or

“There was some hard-coded logic that only results in a bug after yesterday.” :scream:

You can read my article from Software Test & Quality Assurance “Tips on Improving Test Coverage”: http://xndev.com/articles/stqa-2010-08.pdf

Appropriate and polite?