A quick poll on regression testing practice

We recently started a four-question survey on regression testing practice. The results are in the infographic below.

What’s your take on it? The survey is still open, and we would be happy if you took it (I will update the results): http://bit.ly/2k1q3HD

Thanks!

Going for full scientific-peer-review-style honesty in the pursuit of being helpful, hope you’re strapped in!

I can’t answer it.

  1. I don’t know what “testing effort” is, or how I might find out. Therefore I can’t measure a percentage of it. I looked up “effort” and the one that seemed to fit was “strenuous physical or mental exertion”. Between cycling to an office and being involved with some of the most difficult thought work in the industry most of my days in testing were made of effort.
  2. I don’t know what differentiates regression test effort from what I will call “non-regression test effort”. If I perform testing under the presumption that I am not looking for a backslide in perceived quality, and I find a backslide in perceived quality, is that non-regression test effort with a regression or regression test effort by accident? Is a test that is more likely to find a problem, or more likely to find problems that are important, or less likely to yield false beliefs considered as much effort as a bad test? If so, then shouldn’t we consider the quality of the tests and the testers to make the effort numbers valid? Are “regression tests” only considered regression test effort if executed at a specific time or at a specific event where we are specifically looking for a backslide in quality, and if so do we consider the discovery, investigation and reporting of new problems as regression test effort? If not, how do we calculate how to exclude it?
  3. I don’t know how many test cases I use. One test case can easily be totally different from another in terms of topic, size, who runs it, quality, importance, etc. The number seems worthless to measure, so I don’t have the fraction of that number.
  4. An explicit, automatic check is not directly comparable to a human-executed explicit test case.
  5. If a human-executed explicit test case is written down and run by a human so that it can be run again then it’s by definition a regression test case. If an explicit automated check (“automated test case”) is coded so that it can be run more than once then it is also by definition a regression test case. So the percentage of regression test cases that are automated is the same as the percentage of any test cases that are automated. We could ask the question: what could a non-regression explicit test case artifact possibly look like?
  6. If regression test yield is based on the number of problems I find and on the number of cases I execute, that assumes a relationship that may not exist. If I find a problem while not executing a test case (most problems I find) then I have a finding based on no test cases - and I can’t divide by zero. If I find a problem while executing a test case that is not associated with that test case (e.g. I run a login test and find that the program crashes even though the case doesn’t look for it) then I have a problem but it’s not because I have test cases. So it seems to me that regression test yield isn’t a useful or reliable measurement.
  7. I don’t know why you’re asking or what you’ll use the answers for. I might be involved in something I don’t agree with, like certification sales or misleading tool vendor marketing.
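
To make point 6 concrete, here is a minimal Python sketch. It assumes "yield" means problems found divided by test cases executed; the survey does not define the formula, so that definition and the name `regression_yield` are hypothetical.

```python
from typing import Optional


def regression_yield(problems_found: int, cases_executed: int) -> Optional[float]:
    """Naive yield: problems found divided by test cases executed.

    Hypothetical definition for illustration only; the survey does not
    say how yield is calculated.
    """
    if cases_executed == 0:
        # Problems found with no test case running: the ratio is undefined.
        return None
    return problems_found / cases_executed


# Three problems found while exploring, with no test case being executed.
print(regression_yield(problems_found=3, cases_executed=0))  # None

# A crash noticed during a login test that never checks for crashes.
# The ratio credits the test case even though the case did not look for it.
print(regression_yield(problems_found=1, cases_executed=1))  # 1.0
```

In the first call the number cannot be computed at all; in the second it can, but it attributes the finding to a test case that was not designed to find it.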

I have some feedback for your infographic:

  1. It’s crazy small and hard to read. I also have reading difficulties and I’m struggling with the dark grey font on the darker blue background. If I can’t read the numbers then I can’t evaluate the validity of the data.
  2. The phrase “possibly close to half the test effort”, even if we ignore my previous statements, doesn’t mean a lot. It refers to anything that could (and therefore might not) be approximately 50% of test effort.
  3. “Only a quarter of folks” means that you are confident that your sampling method and sample size are sufficient to draw a statistically significant conclusion from your findings. A good report generally will include a p-value calculation and methodology.
  4. It’s not clear how you’ve drawn the conclusions in your titles from the data. Low-yielding human effort is expensive, but I don’t know how you’ve come to that idea. You’ve also stated that only a fifth of regression test cases find issues, but that would involve insane extrapolation of your data. Also, it’s not clear to me that a case that finds no issues has no value. Discovering that a problem doesn’t exist is as informative as finding that a problem does exist. If finding problems was the only goal then we should write more bugs!
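
On the sample-size point (item 3 above), a rough sketch of why the number of respondents matters. It computes a Wilson score interval for a proportion, which is a different tool from the p-value mentioned above, and the respondent counts are hypothetical because the survey's sample size is not given in this thread.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical respondent counts: the real sample size is not stated.
for n in (20, 200, 2000):
    low, high = wilson_interval(successes=n // 4, n=n)
    print(f"n={n}: 25% observed, plausible range {low:.1%} to {high:.1%}")
```

With only a handful of respondents the range around "a quarter" is wide, and it narrows as the sample grows. None of this addresses whether the respondents are representative, which is a separate question about the sampling method.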

I hope this critical appraisal is helpful!

Hello Chris,

My responses are inline. I deeply appreciate your critical comments. Thank you very much, Sir!

Going for full scientific-peer-review-style honesty in the pursuit of being helpful, hope you’re strapped in! [This simple and quick survey was a precursor to a webinar we organized on ‘smart regression’.]
I can’t answer it.

  1. I don’t know what “testing effort” is, or how I might find out. Therefore I can’t measure a percentage of it. I looked up “effort” and the one that seemed to fit was “strenuous physical or mental exertion”. Between cycling to an office and being involved with some of the most difficult thought work in the industry most of my days in testing were made of effort. [That’s a good one :smile: My boss is an avid cyclist. I’m sure he would appreciate your effort more than I do.]

  2. I don’t know what differentiates regression test effort from what I will call “non-regression test effort”. If I perform testing under the presumption that I am not looking for a backslide in perceived quality, and I find a backslide in perceived quality, is that non-regression test effort with a regression or regression test effort by accident? Is a test that is more likely to find a problem, or more likely to find problems that are important, or less likely to yield false beliefs considered as much effort as a bad test? If so, then shouldn’t we consider the quality of the tests and the testers to make the effort numbers valid? Are “regression tests” only considered regression test effort if executed at a specific time or at a specific event where we are specifically looking for a backslide in quality, and if so do we consider the discovery, investigation and reporting of new problems as regression test effort? If not, how do we calculate how to exclude it? [This resonates with us. In many of the product organisations we work with, there is a nagging question of how much effort goes into validating against ‘backslide’ versus validating the new things (say, features) being added. So, from a business perspective, could we spend less on the past and more on the new work for the future?]

  3. I don’t know how many test cases I use. One test case can easily be totally different from another in terms of topic, size, who runs it, quality, importance, etc. The number seems worthless to measure, so I don’t have the fraction of that number. [I understand your concern about the number of test cases. Answers about test cases can also be ‘it depends’ because of the many complexities, but a metric must be tracked to give decision-making some sane logic. What we see in practice with our customers is that they have a ‘count’ of test cases, and they are keen to reduce execution effort by automating, or by executing as few as possible when the cases are not automated.]

  4. An explicit, automatic check is not directly comparable to a human-executed explicit test case. [Absolutely agree. An automated test is treated more as a health check. I agree that the typical automated tests we often see in current practice are no match for smart humans.]

  5. If a human-executed explicit test case is written down and run by a human so that it can be run again then it’s by definition a regression test case. If an explicit automated check (“automated test case”) is coded so that it can be run more than once then it is also by definition a regression test case. So the percentage of regression test cases that are automated is the same as the percentage of any test cases that are automated. We could ask the question: what could a non-regression explicit test case artifact possibly look like? [What we have observed with our customers is that they have identified test cases that they run as a “standard regression pack” to give them a sense of “health-ok”. This is what we meant by the number of regression test cases as a fraction of the total.]

  6. If regression test yield is based on the number of problems I find and on the number of cases I execute, that assumes a relationship that may not exist. If I find a problem while not executing a test case (most problems I find) then I have a finding based on no test cases - and I can’t divide by zero. If I find a problem while executing a test case that is not associated with that test case (e.g. I run a login test and find that the program crashes even though the case doesn’t look for it) then I have a problem but it’s not because I have test cases. So it seems to me that regression test yield isn’t a useful or reliable measurement. [The line of thinking here is that “as systems harden”, bugs may no longer be present in these hardened areas. How can I use this information to optimise the tests?]

  7. I don’t know why you’re asking or what you’ll use the answers for. I might be involved in something I don’t agree with, like certification sales or misleading tool vendor marketing. [Well, no devious intentions, I promise! This was just a quick, high-level survey to get a feel for the practice at large. In engagements with our customers, we notice the challenge of regression from a business viewpoint, and we were keen to understand the state of this at large. BTW, this was a precursor to a free webinar titled “Is regression hindering your progression”. Apart from being shared on social media, like this forum, the information was not shared anywhere else. Deeply appreciate your detailed comments.]

I have some feedback for your infographic:

  1. It’s crazy small and hard to read. I also have reading difficulties and I’m struggling with the dark grey font on the darker blue background. If I can’t read the numbers then I can’t evaluate the validity of the data. [Sorry, we used a standard template from the infographic software.]

  2. The phrase “possibly close to half the test effort”, even if we ignore my previous statements, doesn’t mean a lot. It refers to anything that could (and therefore might not) be approximately 50% of test effort.

  3. “Only a quarter of folks” means that you are confident that your sampling method and sample size are sufficient to draw a statistically significant conclusion from your findings. A good report generally will include a p-value calculation and methodology. [Appreciate your comment; this was a quick and simple survey.]

  4. It’s not clear how you’ve drawn the conclusions in your titles from the data. Low-yielding human effort is expensive, but I don’t know how you’ve come to that idea. You’ve also stated that only a fifth of regression test cases find issues, but that would involve insane extrapolation of your data. Also, it’s not clear to me that a case that finds no issues has no value. Discovering that a problem doesn’t exist is as informative as finding that a problem does exist. If finding problems was the only goal then we should write more bugs! [Engineering management folks view human effort as a premium and are always looking to optimise it. Hence, could we optimise by not running tests (especially human-executed ones) that can be avoided because those parts have hardened?]

I hope this critical appraisal is helpful!

Totally agree with you

Hi Ravi,

I understand the business context from which these survey questions are posed. Your intention is to increase regression effectiveness by finding ‘more bugs’ per suite, and to reduce regression ‘waste’ by avoiding tests that won’t ‘yield’ ‘bugs’.

As you might have already observed in your experience, regression is not a ‘set’ set of ‘tests’ that you can execute and compare for efficiency. Software under development keeps evolving, and hence your regression ‘tests’ should keep pace with it. More so with continuous integration and deployment. If you take that into consideration, every regression run could involve a different set of test cases, which need to be intelligently selected and overseen by humans (I have to explicitly say that!).

I would say it makes sense to pick your regression ‘areas’ or ‘themes’ for a regression run (you can think of them as parallel to ‘modules’ in software code), and focus on those as you deem fit. Of course, how large or how small your ‘area’ is depends on the context.

Finally, the survey sounds too ‘mechanical’, more inclined to a rigid environment of execution of regression ‘tests’, while software regression is far from it. It involves a lot of human ingenuity in thinking, observing, adapting, and orienting the ‘tests’.

Hope that helps. Feel free to message me to discuss your company’s regression situation if you would like help.
