How do you measure the benefit and success of investing in tests and test tooling?

(Kim) #1


I hope this is going into the right category.
I am a bit stuck at work. We work on projects, and each has a different tech stack, developers with different levels of experience, and different risks.

Some projects are very short and simple, others short and complex, and others long-term and complex.

I have now been asked to think about “…doing everything we can to ensure client expectations are met especially in terms of test and quality, but at the same time how we measure the success/benefit of our test investment.”

I have seen companies try to do this with bug counts, but the projects are so different (some have unit tests, others do not, and the kinds of testing discussions vary from project to project) that I am finding it hard to come up with something measurable across all of them.

Part of me wanted to throw my toys out of the pram and take the testers off on holiday, so their absence would demonstrate why we need testing - but that would not be beneficial, and that side of me never lasts long anyway. Does anyone have experience with this sort of thing in an agency setting? How have you measured the success of your investment in testing?

Thank you!

(Tony) #2

I would start with the support desk… I would want to see what customers are finding that we missed. I would then undertake root cause analysis to see where each defect should have been found. In fact, that is one metric I use a lot: the stage at which a defect was introduced vs. the stage at which it was found.
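As a rough illustration (the phases and defect records are invented, not from any real project), that metric could be tabulated something like this:

```python
# Sketch of the "introduced vs. found" metric: count defects by the
# phase they were introduced in and the phase they were found in.
# The phases and records below are invented for illustration.

from collections import Counter

defects = [
    {"introduced": "requirements", "found": "system test"},
    {"introduced": "coding", "found": "code review"},
    {"introduced": "coding", "found": "production"},
    {"introduced": "design", "found": "system test"},
]

matrix = Counter((d["introduced"], d["found"]) for d in defects)

for (introduced, found), count in sorted(matrix.items()):
    print(f"introduced in {introduced:<12} found in {found:<12} x{count}")

# Defects that escaped to production are the ones the process missed.
escapes = sum(c for (_, found), c in matrix.items() if found == "production")
print(f"escaped to production: {escapes} of {len(defects)}")
```

The further a defect travels from where it was introduced before being found, the more it tends to cost to fix.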

(Kim) #3

So unless we sell a support contract we never see these issues, unless it is one of the very long-term projects.
All the short-term ones are a delivery, and then there is a bit of a delay before the client finds issues; if we have closed the project, I think we re-plan work to pick up any newly found issues.

So this approach of seeing what the customers find will be hard, as we do not have visibility. :frowning:

(Jesper) #4

Hi Kim,

A tricky one. I recently came across a list of metrics.

Perhaps some of them can be useful, and not easily gamed in your context.

One interesting factor mentioned is how much time is spent testing, as that impacts the number of defects found. A project with a defect density of 0 is not better than one with a defect density of 5 if the former had less testing time than the latter.
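As a rough illustration (all numbers made up), you can normalise defect counts by testing effort to make that comparison visible:

```python
# Illustrative only: normalize defect counts by testing effort so that
# "zero defects found" is not mistaken for "high quality" when almost
# no testing happened. All numbers below are invented.

projects = {
    "Project A": {"defects_found": 0, "testing_hours": 2},
    "Project B": {"defects_found": 5, "testing_hours": 40},
}

for name, p in projects.items():
    rate = p["defects_found"] / p["testing_hours"]  # defects per testing hour
    print(f"{name}: {p['defects_found']} defects in {p['testing_hours']}h "
          f"({rate:.2f}/h)")
```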

Perhaps you could develop an internal score for each project based on the testing activities used: +1 for unit tests, +3 for three amigos sessions, etc. The scores could be smileys for all it matters, and could even be weighted to value communication over documentation and so on.
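A rough sketch of what that scoring could look like (the activities and weights here are invented; pick ones that match your own practices):

```python
# Sketch of a practice-based project score. Activities and weights are
# invented for illustration; tune them to your own context.

ACTIVITY_WEIGHTS = {
    "unit_tests": 1,
    "code_review": 2,
    "three_amigos": 3,
    "exploratory_testing": 3,
}

def project_score(activities):
    """Sum the weights of the testing activities a project actually used."""
    return sum(ACTIVITY_WEIGHTS.get(a, 0) for a in activities)

print(project_score(["unit_tests", "three_amigos"]))  # 4
print(project_score(["unit_tests", "code_review"]))   # 3
```

The absolute number matters less than being able to compare it with how the project actually went.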

If we could just measure the feeling of safety and confidence in the project owner, that would be grand.

(Kim) #5

Thank you, Jesper! Lots of things to try and think about here! I now understand it as a way for our business to think about testing and how we define “just enough” testing using a risk-based approach.

Exactly as you say: if no time is spent exploring the trickier or more intricate parts of the system and you find no bugs, that does not mean the product is of high quality.

I like the idea of the scoring system. It could be a way to measure which testing activities we used and whether we used them effectively, or whether the time would have been better spent elsewhere.

I have lots to read over the weekend! :smiley:

(Gabe Newcomb) #6

I’ve got a few questions for you.

  1. What’s the intended purpose here of measuring the success / benefit of the testing investment?
  2. Is the intention to measure specific testing efforts, approaches, techniques or the overall time spent on testing activities?

For #1, I might push for subjective measurements as much as possible. When so many different contexts are involved (as well as different humans :wink: ), I’m really skeptical that objective measurements would actually be useful. Plus, the effort to find objective measurements makes it very easy to pick things that are easy to measure but not necessarily useful at all (consider all of the damn “SMART” goals that many of us need to come up with for personal development plans / reviews).

While I love the idea of having some sort of objective measurement for things like this, I’ve yet to see any that really make sense. For small and specific efforts (#2 above), maybe it’s more possible. Maybe you could put together a mini report of “x person-hours spent on this effort; here are the specific bugs or concerns found that wouldn’t otherwise have been found”. As for trying to determine whether the investment was worth it, I still think it’s going to be subjective. I have yet to work anywhere where the financial impact of bugs was actually calculated and shown to us, though I’d love to see that information if it were reasonably easy to get.

So I guess my answer is: first, find out why these measurements are being requested and whether that concern can be addressed in another way. Second, push for subjective metrics (human responses).

Third, maybe make shit up? :wink:

(Chris) #7

What do you expect a test investment to achieve?

You’d need to know both what you’re trying to achieve specifically and that it was your investment in testing that caused it, rather than some other factor. Sometimes changing a system can yield results without the change itself being the cause. In one famous case (the Hawthorne studies), testing the effects of lighting on the efficacy of factory workers, the workers were split into two groups. One was given increased lighting and the other decreased lighting. The increased-lighting group showed a statistically significant increase in performance… but that increase was almost matched by the other group. It wasn’t the lighting alone causing a difference but the fact that attention was being paid to their efforts. If you’re going through with it, then you need to know what you’re measuring, how you’re measuring, when you’re measuring (before and after? when will changes take effect?), but most of all… why.

Proper, rigorous, scientific research that produces values within a reasonable error margin is very, very hard to do and costs thousands of pounds for even the smallest studies. A phase 2 clinical trial takes several years and around $20m, and that’s just to check whether one drug has any effect at all at doing one thing - not whether it’s better than anything else. So you also need to set expectations about what your investigation is capable of achieving. Maybe it’s not pragmatic to expect that you can measure anything that will tell you what you need to know with any degree of accuracy.

If your job is to make people feel better about that investment, or to give reasons to the management level above you so they can answer difficult questions one level above that, then you can do that without too much cost or effort - if you look for a result you will find it. Collect a bunch of results, cut the negative-looking ones, and put it on a nice template. Knowing the result before you test is a scientific crime, but maybe you don’t actually need solid, reliable results. Ask yourself how accurate and true you want to be versus your desire to get a good outcome. You have a personal interest in a positive result, and there’s no need to “prove” when all one needs to do is convince. Come to me with a measurement and I can find 10 ways to make you feel certain of your result and subsequent plan of action, and 10 ways to make you doubt that you ever found anything at all.

If you still want something to examine, you must consider what is measurable at all. If the goal is to ensure that client expectations are met, then you should be sure you actually have those expectations. If you don’t fully understand and appreciate the expectations of every one of your clients, then your goal is technically impossible to measure… or it’s poorly worded. You can’t measure a goal that’s written to be unmeasurable (unless the aim is to fake the results, in which case this is a classic way to pull that off - shoot for vague and you can achieve anything!).

You can, of course, ask the clients before and after the investment, on a 1-5 scale using the exact same question wording and layout, to see if their opinions change. You’ll need to consider that introducing testers and test process changes will make a system worse before it gets better, given the disruption to teams and working patterns. You’ll need to consider that it will take time to see any effect as the teams re-form. You’ll need to consider how the people interested in these measurements will react - and they’ll react to support their own worldview. And after all that cost and effort you may wish you’d known why you bothered…

Consider whether this is a political request. You’re given an impossible task - is that so you can lie to them so they can lie to someone else? Maybe they’re going to fire all the testers and want some “evidence” that testing is not a valid practice. I’m not saying these things are the case, but they are vital contexts that completely change what one might do. If you bring this work to someone, they can easily dismiss it as not accurate (for whatever reason), do whatever they want, AND convince themselves they gave it a fair shot because it’s someone else’s fault. The will of people will triumph over data in most situations.

Remember that double-blind randomised controlled trials were invented to take personal influence, even subconscious influence, out of the equation as much as possible - the number of “critical defects” in a system can go down by me saying “well it’s technically got a workaround so is it reaaaaaally critical…?” over and over again. Don’t fall into the trap of thinking anyone’s exempt. If you find yourself tempted, ask yourself what you’d genuinely do if you found that investing in testing showed no improvement whatsoever in quality, velocity, customer retention, employee retention, support call frequency, team happiness and bug report severity - would you accept the result and become a cabin-dwelling spoon whittler, or question the efficacy of the protocol? Do we decide to downsize our test team, or is it just the way the testers are working that’s causing these numbers? Can we fire a crowd of human beings based on our measurements? If not, what can we do based on our measurements? Whatever we wanted to do anyway?

Because when faith in numbers from an investigation done by someone who knows they are not a professional scientific researcher, and who knows just enough about practical metrology and applied inductive methodology to be dangerous (i.e. me), meets real-world decisions, it’ll be human conscience and consciousness that changes our reality - not some numbers I pulled out of patterns and complexity I don’t understand, with meaning I applied to them out of faith and arrogance rather than the understanding of a whole team of people who do this for a living. I keep in mind the humility required to test a piece of software that won’t hurt anyone, so I should apply that to numbers that fuel business decisions that might.

So, with that in mind, what do you expect a test investment to achieve?

P.S. There are things that can be measured. I measured our cycle time and tester time when I was reducing a regression suite - but the goal there was specifically to reduce cycle time and tester time by shrinking the suite, which is quite different from the question “is testing worth it?”. I’m trying to offer actionable advice these days, so… maybe ask why they suddenly want measurements of the benefit of the test investment. After all, they’re not asking you to measure the benefit of the company’s investment in them. Either they’re making a business decision based on the cost-benefit of testers (e.g. someone important wants to restructure the teams and fire the testers), or who cares, and why should you waste your time checking?
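For that narrower kind of goal, a back-of-the-envelope calculation is all you need. A minimal sketch, with every number invented:

```python
# Illustrative sketch of the narrow measurement described above:
# compare regression cycle time and tester hours before and after
# trimming the suite. All figures are invented.

from statistics import mean

# Wall-clock days per regression cycle, sampled over several releases.
cycle_days_before = [5.0, 6.5, 5.5]
cycle_days_after = [2.0, 2.5, 2.0]

# Tester hours spent per cycle over the same releases.
tester_hours_before = [60, 70, 65]
tester_hours_after = [25, 30, 28]

def pct_reduction(before, after):
    """Percentage reduction of the mean, before vs. after."""
    return 100 * (mean(before) - mean(after)) / mean(before)

print(f"Cycle time reduced by {pct_reduction(cycle_days_before, cycle_days_after):.0f}%")
print(f"Tester time reduced by {pct_reduction(tester_hours_before, tester_hours_after):.0f}%")
```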