What's the trickiest bug you have caught using logs?

ansha_batra · 25 March 2025 08:39

Logs never lie, but the UI might.

https://www.linkedin.com/posts/ansha-batra_day1-21days21tips-softwaretesting-activity[…]edium=member_ios&rcm=ACoAADTtaRIButisjZEpS34b7BS3cIP6a8hho1Q

A flaky UI can deceive even the best testers, but logs, network calls, and database queries tell the real story.

When in doubt:
Check the logs – Errors often hide where the UI pretends all is well.
Monitor network requests – Spot failed API calls before they become UI glitches.
Inspect the database – Data mismatches can reveal hidden issues.

komalgc · 25 March 2025 11:45

It's usually a time-zone-related data mismatch, eventually leading to missing data records in reports.

conrad.braam · 25 March 2025 11:48

Logs are just one kind of evidence, really. Text logs alone are actually quite primitive; you really want events and performance metrics, things you can filter on. Then you start being able to zero in on weird behaviours. Events and performance data can, however, lead you to find a bug you were not looking for, so they are not a cheap debugging tool, but they do work in cases where developers just never added any logging.

Hardest bug using logs alone? Probably a logging bug that only showed up when I turned off logging. I tracked it down to a thread synchronisation problem that the logger was hiding.

steve.green · 25 March 2025 14:47

I can’t take the credit for this one. It was my project, but I had brought in an extremely technical tester, Neil Hudson, to investigate an intermittent fault in Transport for London’s Cycle Hire system (aka Boris Bikes). It happened about 15 years ago, so I don’t think I’m revealing anything sensitive.

Every day, several users would complain that their account history included journeys they had not made. After dozens of such complaints and after the developers failed to even replicate the fault, let alone find the cause and fix it, TfL came to us. And despite all our exploratory testing expertise and experience, we couldn’t replicate it either. So I brought in Neil.

He spent three days wading through the logs and eventually worked out that if two users logged in within one second of each other, their sessions got combined, in which case a transaction on one account could appear in the history of the other account. Importantly, we could show that no one ever got charged for journeys they had not made.
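
As a rough illustration of that kind of log trawl, here is a minimal Python sketch that scans login events and reports pairs of different users who logged in within one second of each other. The log format, user names, and timestamps are made up for the example; nothing here reflects the actual Cycle Hire logs.

```python
from datetime import datetime, timedelta

# Hypothetical log format: "<ISO timestamp> LOGIN user=<id>" (invented sample data)
SAMPLE_LOG = """\
2025-01-01T09:15:01.120 LOGIN user=alice
2025-01-01T09:15:01.870 LOGIN user=bob
2025-01-01T09:15:07.300 LOGIN user=carol
2025-01-01T09:15:08.100 LOGIN user=dave
2025-01-01T09:20:00.000 LOGIN user=erin
"""

def parse_logins(text):
    """Return (timestamp, user) tuples for every LOGIN line, sorted by time."""
    events = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] == "LOGIN":
            ts = datetime.fromisoformat(parts[0])
            user = parts[2].split("=", 1)[1]
            events.append((ts, user))
    return sorted(events)

def close_login_pairs(events, window=timedelta(seconds=1)):
    """Return pairs of different users whose logins fall within `window`."""
    pairs = []
    for i, (ts_a, user_a) in enumerate(events):
        for ts_b, user_b in events[i + 1:]:
            if ts_b - ts_a > window:
                break  # events are sorted, so later ones are even further apart
            if user_a != user_b:
                pairs.append((user_a, user_b))
    return pairs

pairs = close_login_pairs(parse_logins(SAMPLE_LOG))
print(pairs)
```

With the sample data this flags alice/bob and carol/dave. Each flagged pair is a candidate to cross-check against the complaints about unexpected journeys.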

Having reached that hypothesis, it was relatively easy to create an automation script that could replicate the fault reliably. However, it had to be run on the server environment because the “window of opportunity” was so small that differences in propagation time meant that the fault did not always occur when testing through the user interface.

ansha_batra · 26 March 2025 05:15

@komalgc Time zone related data mismatches can be tricky, especially when dealing with global users and reporting systems.
Have you come across any interesting ways to prevent or debug these issues?

ansha_batra · 26 March 2025 05:18

Great point! Logs alone can be limiting and events/perf metrics provide a much clearer picture. That logging bug you mentioned sounds like a nightmare!
Turning off logging to uncover a hidden issue is next-level debugging.

ansha_batra · 26 March 2025 05:19

Wow Steve, that’s an incredible story. It’s a great reminder of how some of the trickiest bugs hide in timing and concurrency.

komalgc · 26 March 2025 08:24

Implemented solution: always store and process timestamps in UTC. And when debugging something that is scheduled or queried, log both the input time and the converted UTC or DB time. This helps.
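
A minimal Python sketch of that idea, assuming the input arrives as a naive local timestamp plus a known offset. The `to_utc` helper, the logger name, and the fixed IST offset are illustrative choices only; a real system would more likely use named zones (for example via `zoneinfo`) so that daylight saving rules are handled.

```python
import logging
from datetime import datetime, timedelta, timezone

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("report")  # hypothetical logger name

# Fixed offset used for illustration; it does not model daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30), "IST")

def to_utc(local_text, tz):
    """Attach tz to a naive local timestamp, convert to UTC, and log both values."""
    local = datetime.fromisoformat(local_text).replace(tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    # Logging input and converted time side by side makes mismatches visible.
    logger.info("input=%s utc=%s", local.isoformat(), utc.isoformat())
    return utc

stored = to_utc("2025-03-26T01:00:00", IST)
print(stored.date())  # 2025-03-25: the UTC date is a day earlier than the local date
```

The example shows how a record entered at 01:00 local time on 26 March lands on 25 March in UTC. A report that filters on the local date without converting would miss it.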

han_toan_lim · 27 March 2025 08:57

The strangest bug I found using a log was that the audit log could be manipulated.

The audit log was created by generating XML output, which was then translated into plain language. The audit log contained the XML output at the end of the file.

Using XML injection, I could add non-existent transactions to the log.
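
To illustrate the general technique, here is a small Python sketch in which unescaped user input forges an extra entry in an XML log, and escaping the input prevents it. The element names (`log`, `entry`, `description`) and the payload are hypothetical; the format of the system in question is not described above.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def audit_entry_unsafe(description):
    """Build an entry by plain string interpolation (vulnerable to injection)."""
    return f"<entry><description>{description}</description></entry>"

def audit_entry_safe(description):
    """Build an entry with &, < and > escaped so input stays inside its element."""
    return f"<entry><description>{escape(description)}</description></entry>"

# Hypothetical malicious input that closes the current entry and opens a fake one.
payload = "coffee</description></entry><entry><description>refund 500"

unsafe = ET.fromstring(f"<log>{audit_entry_unsafe(payload)}</log>")
safe = ET.fromstring(f"<log>{audit_entry_safe(payload)}</log>")

print(len(unsafe.findall("entry")))  # 2: the forged entry parses as a real one
print(len(safe.findall("entry")))    # 1: the payload is kept as plain text
```

Building the document with an XML library instead of string formatting would avoid the problem in the same way, since such libraries escape text content when serialising.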

For more information

steve.green · 27 March 2025 20:59

If you find yourself working on a system that involves time zones, the only sensible solution is to get a new job. Time zones are absolutely mental - there are some really good YouTube videos on the topic, by developers who tried to address all the anomalies that occur with time zone projects.
