Log-Digging - Featured TestSphere Card

(Beren) #1

One hundred cards. One hundred Test-related concepts.
Here on the club, we’ll feature a card from the TestSphere deck every month for people to write their stories about.

I challenge you:
Take a few minutes to think about your experiences with the featured card.

What bugs have you found that are related? Which ones have you missed?
How have you tackled testing for this concept?
What made it difficult or easier?
What have you learned? What can others learn from your experience?

Take one of those experiences and put it into prose.
Telling your stories is as valuable to you as it is to others.

I’m not known for having much patience with logs. Even though I can see a lot of value in everything they record, I’d rather look for patterns than try to understand exactly what they’re saying.
Thankfully, for people like me, there are wonderful tools to help. The tool I speak of was my colleague Geert (@_TestHeader).
He was the Dynatrace guy (among other roles) on my previous project.
That tool records, in depth, what goes on in and around your product.

Whenever I saw something really strange, I’d ask him to investigate and ‘drill down’. He didn’t necessarily have to scroll through logs; instead, he’d look for obvious problems around the time I had seen the strange behaviour.
More often than not, he’d supplement my bug reports with important information from the lowest level possible, so the developers could better understand what exactly had gone wrong.

Separately, our pieces of information might not have meant much. But together, they gave our developers a good case for why to fix the issue, how it was triggered, and a starting point for where to look for a solution.

What’s your story?

(Brian) #2

Some time ago, I had just started playing with our system logs. At the time, there were two levels of logging, “info” and “debug”. During one of those early tests, my system had a catastrophic failure. (As it was hardware and software, think smoke and extensive repairs.)

Being a glutton for punishment, I pulled an “identical” system from one of the other testers to repeat the failure. I have it in my head that if a problem can be repeated, it will be easier to fix. So I ran what I thought was the exact same test, braced myself for the boom, and… nothing odd happened. The system appeared to be as happy as a clam.

Now I was confused. Two systems that were supposedly identical in hardware and software had reacted to a supposedly identical situation in two different ways. How could that happen? Since I had started playing with the logs, I set out to repeat the test again, this time checking the logs (and keeping a hand on the power-off button so as not to cause more smoke and repairs; it’s not fun having everyone looking at you and asking about the foul stench!). What I found was that our two systems were, in fact, different. Mine had logging set to debug and his had it set to info. So my system was too slow to react to a necessary change, which snowballed into a huge failure.

The lesson here is that, while logs are awesome tools, sometimes the existence of a log can change the result of your test.
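
If you want to see that observer effect in miniature, here is a minimal, hypothetical Python sketch (not the actual system from the story) of a time-critical loop whose only difference between runs is the logging level. The deadline and iteration count are made up, and whether the debug run actually misses the deadline depends on your machine; the point is that formatting and writing debug records is real work happening in the same code path as the behaviour under test.

```python
import logging
import time

def react_within(deadline_s, iterations, level):
    # Hypothetical stand-in for a time-critical control loop; the real system
    # in the story was hardware plus software, this is software only.
    logger = logging.getLogger("controller")
    logger.setLevel(level)
    handler = logging.FileHandler("controller.log")
    logger.addHandler(handler)

    start = time.monotonic()
    for i in range(iterations):
        # At DEBUG level every call is formatted and written to disk;
        # at INFO level these calls are filtered out almost for free.
        logger.debug("iteration %d, state: %s", i, {"step": i})
    elapsed = time.monotonic() - start

    logger.removeHandler(handler)
    handler.close()
    return elapsed <= deadline_s

# Same code, same machine, different logging configuration (made-up numbers):
print("info  meets deadline:", react_within(0.05, 10_000, logging.INFO))
print("debug meets deadline:", react_within(0.05, 10_000, logging.DEBUG))
```

On the real system, that same kind of overhead showed up as a reaction that came too late, which is exactly why the two “identical” machines behaved so differently.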