Too Many Bugs in Production - What Are We Going to Do? with Melissa Fisher

For our third talk of TestBash Home 2021, @melissafisher takes to the stage to explore how we can better understand our production environments and learn from the bugs we find there.

We’ll use this Club thread to share resources mentioned during the talk and answer any questions we don’t get to during the live session.


Questions we didn’t get to

  1. Matthew Churcher > Glastonbury sells out every year and makes huge profits despite all their IT issues. How do we know whether they, or we, would succeed more with higher investment in quality?

  2. Roman Segador > What is your definition of a bug? When does something stop being work in progress and start being a bug?

  3. Kashyap Mehta > As a manual tester, how do you do root cause analysis of a bug if you don’t have enough coding knowledge?

  4. Atiqah Azlan > How often, or when, do you usually review the regression test suite?

  5. Risko Ruus > When you started testing on feature branches with the devs, how much did that testing differ from the testing you did once the branch got merged into master?

  6. Steven Devonport > Great talk Melissa, how do you balance the activity of preventing bugs versus delivering into production as soon as possible?

  7. Jonathan Hope > Would you create a branch for each defect reported in production?

  8. Conrad Braam > Is “Priority” or “Severity/Impact” better for managing bugs? Or if you have to spend time prioritising, have you maybe just got too many open bugs?

  9. Paul Naranja > Can you give more details on the workshop “Where your bugs come from”? How did you show that many root causes of bugs are before the actual “testing stage”?

  10. Ellie Lock > At what stage do you start considering whether there is a problem in the development (coding) process as opposed to just QA process?

  11. Matthew Parker > Can a bug become waste?

  12. Olly Fairhall > Did you have to change the culture single-handedly when getting bugs in production fixed or was the team happy to change?

  13. Ben Dowen > What is the best bug that was found, but had insufficient information so was left until it was way too late?

  14. Becca Batchelor > How do you convince your business to pay attention to bugs when they don’t bring $$ in, unlike new work for new customers?

  15. Kashyap Mehta > I’ve noticed that when bugs are reported in production, they are rarely added to the test suite afterwards. How do we encourage the QA team to add such edge cases?

  16. Louise Barnes > Which testing improvements that you implemented didn’t work?

  17. Penny Howard > Follow up: What bug metrics are you using?

  18. Simon Rigler > Do you have a process for managing “urgent” production bugs that come up when sprints have already started?

  19. Julia Shonka > Considering “how many are too many bugs”, don’t you have to also take into account what kind of application it is? Will the bug endanger human life?

Resources mentioned

Flashback Express Recorder FlashBack Express - the best free screen recorder

Club thread How do you prioritise bugs?


I’m going to get ON and tackle these questions for the next 30 minutes! We all have our own experiences, perspectives and knowledge to share, so come join in on this thread. I invite you to answer the questions or bring your own perspective to this problem.

A great question! Yes, unlikely to affect Glastonbury. That festival has a reputation that goes beyond everything. EPIC festival. Highly recommend going if you ever get a chance. Who it might affect is the live streaming company that provided the service. It was a company called drift live. That live streaming company is affected in terms of reputation. It was ALL over the news. I’m sure they will be able to bounce back, however, if I was looking for a live streaming service, I’d probably look for other options and likely go elsewhere. Glastonbury live stream technical problems overshadow star-packed show | Ents & Arts News | Sky News

I see a bug as basically the result of a problem that wasn’t thought about. So, for example, you have a form, you submit it, and it goes through. However, there was a problem that wasn’t discovered here: there were mandatory fields the user had to fill out, and the user didn’t fill them out because the form didn’t force them to. What would be your definition of a bug @romansegador ? For the second question, probably when you start implementing. Often we don’t think enough about what we’re doing before we start building. However, I’d like to know your perspective, and others’, on this.

Great question, Kashyap. With the cause and effect technique. What you need to do is understand the consequences of what you are building. So, if you touch this area of the code, the result could be that it affects another area of the product. I would ask the question “what introduced this?” to your team. Was it a result of new feature work? Did they fix a bug, that then caused a knock on effect? I’d also suggest that you ask developers to talk you through the code and what’s happening. They can really help you. Be that Question Asker and dig a bit deeper. Team members are in it together!

All the time! I often ask the questions: are they adding value? What have I recently learnt? Do I need to adjust the regression suite? Do you have any view on this @atiqah ?

Great question @risko Testing on branches was a deep dive to prevent as many issues as possible. Testing on master was more of a light touch, but I’d explore any integration issues with other work that was going on.

Hey @steve_devonport Glad you could make the talk! This is a great question and I’m not sure I have the complete answer. I know it’s all a balance. However, I do fundamentally believe that this whole ‘deliver fast’ approach causes us problems. We are always rushing. Not slowing down. As a result, we are introducing too many bugs. Perhaps the answer is to move away from fast delivery and towards thoughtful-delivery. More deep thinking is required. What do you think? What’s your experience?

Hey @jonathan.hope1 Great question! No, I wouldn’t. It depends what it is, though. Perhaps you have a number of defects in the same area, so you could create a branch and fix them all there. I am curious why you have asked the question. Is branching strategy something you are exploring at work?

@conrad.braam !!! SO glad you could make the talk. Do you think the severity/impact could feed into the priority? Yes, I’d totally agree on your second point. You probably do have too many open bugs with prioritisation. However, what I’d argue is that what you touched on is important. The impact to the customer. If we fix this, does it add value to the customer? If it doesn’t add value to the customer, I’d question spending time over it. Do you have any further thoughts on this?


Hey @melissafisher we recently introduced feature branches to our process. So for example we would create a branch to develop and test a “file upload” feature. And once dev and testing and automation (unit+e2e) tasks are complete, and all issues are resolved we would merge to develop. Basically just allowing us to keep develop branch clean so we can release often. However we have discussed whether we should create branches for customer reported defects. I like your suggestion of grouping defects that occur in the same area and fixing them on the one branch, perhaps grouping them via triage. That is worth exploring, so I will suggest this to the team, thanks for the tip!

  1. We can try the Pareto 80-20 rule: find the 20% of root causes that create 80% of the disruption and concentrate on mitigating that 20%
  2. We can try to group the bugs. Again, it points to root cause
  3. Root cause analysis should be done jointly with the dev, test, data (if any) and environment teams
  4. Understand the priority/criticality from the business and focus based on priority
  5. Run a triage call
  6. Run a war room if things look bad
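The Pareto idea in point 1 can be sketched in a few lines of Python. The bug list and root-cause category names below are made up for illustration; in practice you’d pull these tags from your bug tracker.

```python
from collections import Counter

# Hypothetical production bugs, each tagged with a root-cause category.
bugs = [
    "requirements", "requirements", "requirements", "requirements",
    "design", "design", "build", "build", "build",
    "environment", "data", "requirements", "build", "requirements",
]

counts = Counter(bugs)
total = len(bugs)

# Walk categories from most to least frequent until ~80% of bugs are covered.
covered, focus = 0, []
for cause, n in counts.most_common():
    focus.append(cause)
    covered += n
    if covered / total >= 0.8:
        break

print(focus)  # ['requirements', 'build', 'design']
```

Here three of the five categories account for over 80% of the bugs, so those are the root causes worth concentrating on first.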

Nothing to add. I think you did go where I was keen to go. Only fix bugs that impact the most customers: know your customer, gauge the impact and fix those first, then look at the remaining bugs.
P2 = a workaround is possible, so we can just document this for now and do the P1s in this hotfix
P3 = stuff we should fix
P4 = stuff we are unlikely to fix, but which impacts the quality perception of the product
P5 = stuff we won’t fix in the next few sprints - so make sure people agree that this will probably NOT get fixed in reality.

It’s well known that Jira steers you towards using priority rather than impact; be aware of the bias that will bring when triaging work.
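A minimal sketch of the P1–P5 scheme above, encoded as data so a triage script can act on it. The labels paraphrase the post, and the `backlog` entries and `hotfix_scope` helper are illustrative, not part of any real tool.

```python
# Illustrative encoding of the P1-P5 scheme described above.
PRIORITY_MEANING = {
    "P1": "fix in this hotfix",
    "P2": "workaround exists - document it for now",
    "P3": "stuff we should fix",
    "P4": "unlikely to fix, but hurts quality perception",
    "P5": "won't fix in the next few sprints - agree it likely never gets fixed",
}

def hotfix_scope(bugs):
    """Return only the bugs that belong in the current hotfix (P1)."""
    return [b for b in bugs if b["priority"] == "P1"]

# A hypothetical backlog after triage.
backlog = [
    {"id": 101, "priority": "P1"},
    {"id": 102, "priority": "P2"},
    {"id": 103, "priority": "P4"},
]

print(hotfix_scope(backlog))  # [{'id': 101, 'priority': 'P1'}]
```

The point of writing the meanings down explicitly is the one made above: everyone has to agree up front that P5 (and realistically P4) will probably never get fixed.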

Hey @melissafisher! I can’t see this talk on the list of TestBash 2021 recordings… will the recording be available soon?


Definitely missing. I’m sure one of the minions will sort it shortly. @friendlytester may have run out of hard drive space?


Hi Simon, I’m sure it will be available soon. Great that you want to watch it! Let me know what you think.

Hi Paul, it was a number of years ago that I ran the workshop. I remember asking the team where in the cycle the root cause lay, then we added sticky notes and grouped the bugs under stages like requirements, design, build and so on. Lisa and Janet’s Agile Testing Condensed book has a number of good diagrams that can help with this. You could, for example, draw out the SDLC and get the team to place the bugs against each area.

Hey @elliel Do you have an example of a problem you have come across that has been hard to distinguish? If so, what was it?

Hey Matthew! I can’t find your name when I try to @ you. Do you mean the bug has no purpose? I’d be interested to hear your thoughts further.