What's the trickiest bug you have caught using logs?

Logs never lie, but the UI might.

https://www.linkedin.com/posts/ansha-batra_day1-21days21tips-softwaretesting-activity[…]edium=member_ios&rcm=ACoAADTtaRIButisjZEpS34b7BS3cIP6a8hho1Q

A flaky UI can deceive even the best testers, but logs, network calls, and database queries tell the real story.

When in doubt:
Check the logs – Errors often hide where the UI pretends all is well.
Monitor network requests – Spot failed API calls before they become UI glitches.
Inspect the database – Data mismatches can reveal hidden issues.


It's usually a time-zone-related data mismatch, eventually leading to missing records in reports.
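A minimal sketch of how this kind of mismatch can drop records from a report. It assumes a hypothetical report that filters UTC-stored timestamps by calendar date; the records, the fixed +05:30 offset, and the function names are all illustrative and not taken from any real system.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical user zone, modelled as a fixed +05:30 offset to avoid
# depending on a time zone database.
USER_TZ = timezone(timedelta(hours=5, minutes=30))

# Hypothetical records, stored in UTC.
RECORDS = [
    datetime(2024, 3, 1, 18, 0, tzinfo=timezone.utc),  # 23:30 on 1 March for the user
    datetime(2024, 3, 1, 19, 0, tzinfo=timezone.utc),  # 00:30 on 2 March for the user
]


def naive_report(day):
    """Buggy: compares the UTC calendar date with the user's local report date."""
    return [r for r in RECORDS if r.date() == day]


def local_report(day, tz):
    """Converts each timestamp to the user's zone before comparing dates."""
    return [r for r in RECORDS if r.astimezone(tz).date() == day]


# The user asks for the report for 2 March (their local date).
missing = naive_report(date(2024, 3, 2))          # the 00:30 record is lost
found = local_report(date(2024, 3, 2), USER_TZ)   # the 00:30 record is present
```

The record created just after local midnight still has the previous day's date in UTC, so the naive filter silently leaves it out of the report.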


Logs are really just one kind of evidence. Text logs alone are quite primitive; what you really want are events and performance metrics, things you can filter on. Then you can start to zero in on weird behaviours. Events and performance data can, however, lead you to a bug you were not looking for, so they are not a cheap debugging tool, but they do work in cases where developers never added any logging.
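As a rough illustration of "things you can filter on", here is a small sketch that assumes events are emitted as one JSON object per line. The field names (`event`, `user`, `duration_ms`, `status`) and the sample data are made up for the example.

```python
import json

# Hypothetical structured events, one JSON object per line.
RAW_EVENTS = """\
{"event": "checkout", "user": "u1", "duration_ms": 120, "status": "ok"}
{"event": "checkout", "user": "u2", "duration_ms": 4800, "status": "ok"}
{"event": "login", "user": "u3", "duration_ms": 90, "status": "error"}
"""


def load_events(text):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def slow_events(events, name, threshold_ms):
    """Return events of the given type that took longer than threshold_ms."""
    return [
        e for e in events
        if e["event"] == name and e["duration_ms"] > threshold_ms
    ]


events = load_events(RAW_EVENTS)
suspicious = slow_events(events, "checkout", 1000)
```

Note that the slow checkout reports `status: ok`, so a text search for "error" would not surface it; filtering on the duration field does.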

Hardest bug using logs alone? Probably a logging bug which only showed up when I turned off logging. I tracked it down to a thread synchronisation problem that the logger had been hiding.


I can’t take the credit for this one. It was my project, but I had brought in an extremely technical tester, Neil Hudson, to investigate an intermittent fault in Transport for London’s Cycle Hire system (aka Boris Bikes). It happened about 15 years ago, so I don’t think I’m revealing anything sensitive.

Every day, several users would complain that their account history included journeys they had not made. After dozens of such complaints and after the developers failed to even replicate the fault, let alone find the cause and fix it, TfL came to us. And despite all our exploratory testing expertise and experience, we couldn’t replicate it either. So I brought in Neil.

He spent three days wading through the logs and eventually worked out that if two users logged in within one second of each other, their sessions got combined, in which case a transaction on one account could appear in the history of the other account. Importantly, we could show that no one ever got charged for journeys they had not made.

Having reached that hypothesis, it was relatively easy to create an automation script that could replicate the fault reliably. However, it had to be run on the server environment because the “window of opportunity” was so small that differences in propagation time meant that the fault did not always occur when testing through the user interface.
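The actual root cause is not described above, so the following is only a hypothetical illustration of one way two logins within the same second could end up sharing a session: a toy session store keyed on the login time truncated to whole seconds. Every name and value here is invented for the sketch.

```python
class SessionStore:
    """Toy session store with a deliberately flawed key: the login second."""

    def __init__(self):
        self.sessions = {}

    def login(self, user, now):
        key = int(now)  # assumed bug: truncating the clock to whole seconds
        # setdefault returns the existing session if one was created this second
        session = self.sessions.setdefault(key, {"users": [], "journeys": []})
        session["users"].append(user)
        return key

    def record_journey(self, key, journey):
        self.sessions[key]["journeys"].append(journey)

    def history(self, key):
        return list(self.sessions[key]["journeys"])


store = SessionStore()
alice = store.login("alice", now=1000.20)
bob = store.login("bob", now=1000.85)      # same second: shares alice's session
carol = store.login("carol", now=1001.10)  # next second: separate session

store.record_journey(bob, "Waterloo -> Bank")
alice_history = store.history(alice)       # contains bob's journey
```

Passing the clock value in explicitly makes the narrow timing window deterministic, which is the same reason a script running close to the server can hit such a window more reliably than clicks through the user interface.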


@komalgc Time-zone-related data mismatches can be tricky, especially when dealing with global users and reporting systems.
Have you come across any interesting ways to prevent or debug these issues?

Great point! Logs alone can be limiting, and events/perf metrics provide a much clearer picture. That logging bug you mentioned sounds like a nightmare!
Turning off logging to uncover a hidden issue is next-level debugging.

Wow Steve, that’s an incredible story. It’s a great reminder of how some of the trickiest bugs hide in timing and concurrency.

Implemented solution: always store and process timestamps in UTC. And when debugging something that is scheduled or queried, log both the input time and the converted UTC or DB time; this helps.
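A minimal sketch of that approach using only the Python standard library. The logger name, the `to_utc` helper, and the fixed -05:00 offset are assumptions for illustration, not part of any particular system.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("scheduler")  # hypothetical logger name


def to_utc(local_dt):
    """Convert a timezone-aware datetime to UTC, logging both input and result."""
    if local_dt.tzinfo is None:
        # A naive datetime carries no zone, so any conversion would be a guess.
        raise ValueError("naive datetime: the source time zone is unknown")
    utc_dt = local_dt.astimezone(timezone.utc)
    logger.info("input=%s utc=%s", local_dt.isoformat(), utc_dt.isoformat())
    return utc_dt


# Hypothetical user at a fixed -05:00 offset scheduling something for 22:00.
USER_TZ = timezone(timedelta(hours=-5))
stored = to_utc(datetime(2024, 3, 1, 22, 0, tzinfo=USER_TZ))
```

Having both values on one log line makes it much quicker to see whether a wrong date came from the user's input or from the conversion. Rejecting naive datetimes up front is a design choice in this sketch; other systems may prefer to assume a default zone and log that assumption.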


The strangest bug I found using a log was that the audit log could be manipulated.

The audit log was created by making an XML output, which was translated to normal words. The audit log contained the XML output at the end of the file.

Using XML injection, I could add nonexistent transactions to the log.
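The real audit log format is not described here, so this is only a sketch of the general technique, assuming a hypothetical `<log>`/`<transaction>` layout. It shows how unescaped user text can forge an extra entry and how escaping prevents it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def unsafe_entry(description):
    # Vulnerable: user-controlled text is pasted straight into the markup.
    return f"<log><transaction>{description}</transaction></log>"


def safe_entry(description):
    # escape() replaces &, < and > so the text cannot introduce new elements.
    return f"<log><transaction>{escape(description)}</transaction></log>"


# Hypothetical attacker input that closes the element and opens a new one.
payload = "coffee</transaction><transaction>refund 500.00"

forged = ET.fromstring(unsafe_entry(payload)).findall("transaction")
honest = ET.fromstring(safe_entry(payload)).findall("transaction")
```

With the unsafe version the parser sees two transactions, one of which never happened; with escaping it sees a single transaction whose text is the odd-looking payload.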

For more information


If you find yourself working on a system that involves time zones, the only sensible solution is to get a new job. Time zones are absolutely mental - there are some really good YouTube videos on the topic, by developers who tried to address all the anomalies that occur with time zone projects.
