Unit tests vs generative testing: thoughts on a thread

I was served this tweet thread the other day and found it interesting, but I'm not sure I know exactly what the poster means by some of it:
https://twitter.com/bobpoekert/status/1495464664561451009?s=21
Questions:

  1. Have you ever seen the term 'generative testing' before? If so, what do you understand it to mean?
  2. 'really what you want is to sample from a statistical estimator fit to real world log data but afaik nobody does this.' I take this to mean that you should track the real use of your app and build your tests from that info, but 'statistical estimator'? No clue on that piece; do you have any thoughts?

I was struck by the thread because I have also perceived, on some of my teams, a gap between 'the tests that are created' and 'the tests that I want to maintain because I think they are highly valuable.'

6 Likes

Generative testing is also known as property-based testing. I haven't used it myself, but I'm interested in trying it at some point. More info: "Property-based or generative testing - insights from European Testing Conference" on Agile Testing with Lisa Crispin. If you prefer a video, "Code Checking Automation - Computerphile" on YouTube explains a particular tool for this kind of testing called QuickCheck.

TL;DR: instead of specifying all the details of each test case, you specify one or more properties that must always hold at the end of a test case, plus maybe some constraints on the inputs (e.g. input string X must have at least 1 character). The test framework then repeatedly generates its own inputs, runs the test, and checks that the properties hold.
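
To make that concrete, here's a minimal sketch using Python's Hypothesis library (my tool choice for illustration; the QuickCheck tool mentioned above works the same way). The properties here are about sorting: the output is ordered, and it's a permutation of the input.

```python
# Minimal property-based test sketch using Hypothesis (pip install hypothesis).
from collections import Counter

from hypothesis import given, strategies as st

# Constraint on inputs: any list of integers.
@given(st.lists(st.integers()))
def test_sorted_is_ordered_permutation(xs):
    result = sorted(xs)
    # Property 1: each element is <= its successor.
    assert all(a <= b for a, b in zip(result, result[1:]))
    # Property 2: sorting reorders but never adds or drops elements.
    assert Counter(result) == Counter(xs)

# An input constraint like the one mentioned above: at least 1 character.
@given(st.text(min_size=1))
def test_strip_never_grows(s):
    assert len(s.strip()) <= len(s)
```

The framework runs each test against many generated inputs (100 per test by default) and, when a property fails, shrinks the failing input down to a minimal reproduction.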

As to the statistical estimator bit: I think it means making the generated test data have a distribution of values similar to what is seen in production. For example, if the input has a field X which can be an integer, and in production a quarter of inputs have X > 3000, then the generated test inputs should also have X > 3000 a quarter of the time.
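
For what it's worth, here is one reading of that idea in code: treat the empirical distribution of logged values as the "estimator" and resample from it. The log data below is made up for illustration.

```python
import random

# Hypothetical: values of field X pulled from production logs.
observed_x = [12, 45, 3100, 7, 4200, 88, 350, 19]

def sample_x(n):
    # The simplest "statistical estimator fit to log data" is the empirical
    # distribution itself: resample observed values with replacement.
    return random.choices(observed_x, k=n)

# 2 of the 8 logged values are > 3000, so roughly a quarter of generated
# inputs will be too, mirroring the production distribution.
print(sample_x(10))
```

A fancier estimator (a fitted parametric distribution, or a kernel density estimate) would additionally let you generate values between and beyond the ones actually observed.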

I'm not sure I agree with this (but then I'm no expert in this kind of testing). Knowing that your test data matches historical production usage is good, but it won't help you identify bugs that users haven't stumbled on yet. So I'd suggest there's benefit in making all possibilities equally likely, at least for some test runs, so that you can generate values that haven't yet been encountered in production; otherwise those values could have 0% probability and never get generated.
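
In Hypothesis terms, one way to get both behaviors is to mix a production-shaped strategy with a broad one, so values never seen in production still have nonzero probability. This is a sketch only, reusing the hypothetical observed_x from above:

```python
from hypothesis import strategies as st

observed_x = [12, 45, 3100, 7, 4200, 88, 350, 19]  # hypothetical log values

# one_of draws from either branch, so both production-like values and
# arbitrary in-range values get generated over a test run.
x_strategy = st.one_of(
    st.sampled_from(observed_x),                  # matches historical usage
    st.integers(min_value=0, max_value=10_000),   # covers the whole domain
)
```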

4 Likes

Thanks, Bob! That is very helpful.

2 Likes

I think generative/property-based testing can be beneficial, but the original Twitter poster seems to treat it as an either/or. Traditional unit tests are the first line of defense, and you add other layers on top of them, which can include generative/property-based testing.

As for pulling data from prod to make the generative testing more reflective of the real world, that doesn't seem like a great idea. I'd rather define my input/output criteria based on what I expect from the method, not from what I see in prod. There's also the issue that you're several degrees removed from unit tests once you've gotten to prod/functional test data, and mapping that back to unit tests seems cumbersome at best.

The big argument for property-based testing is that it can find new bugs, whereas unit tests verify existing behavior. I think, in general, if you find a new bug via property-based testing, you're likely going to add a unit test for it.
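
Hypothesis has direct support for exactly that workflow: once a property run finds a failing input, you can pin it with @example so it reruns deterministically on every build, which makes it behave like a plain unit test. safe_mean here is a hypothetical function under test.

```python
from hypothesis import given, example, strategies as st

def safe_mean(xs):
    # Returns None for empty input -- the fix for a bug that a property
    # run originally surfaced as a ZeroDivisionError.
    return sum(xs) / len(xs) if xs else None

@given(st.lists(st.integers(min_value=-10**6, max_value=10**6)))
@example([])  # the failing input the property test found, kept as a regression case
def test_mean_stays_in_bounds(xs):
    m = safe_mean(xs)
    if xs:
        assert min(xs) <= m <= max(xs)
    else:
        assert m is None
```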

3 Likes

We are talking about data-driven testing, but at a complexity level where simple domain analysis is not going to find the defect, because the code itself is genuinely complex. That is probably an argument for using production data as an input, but I agree with @bobs: production data will probably not find the really costly or fatal bugs that only 2% of your customers may ever encounter. What if that 2% who hit it happen to be the ones who send you the most cash for their single-instance license? Most of us hard-code some basic regression test input "sets", and to me that is always a red flag. We often test only a small subset of the real boundaries as they change over time, and because the code you are testing is waaaay "smarter" than you are (it's hiding this bug from you, so it has to be), dumb hard-coded arrays of data end up costing time.

Have only ever seen this done in a security testing suite, where the adversarial data sets were generated using a small "set of rules" and then thrown randomly at a component. Would love to see a domain-specific language, perhaps, that helps the tester find good (evil) input candidates.
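
A toy version of that rule-based adversarial generator might look like the sketch below (entirely hypothetical; real security suites are far richer). Each "rule" mutates a benign seed value, and the harness throws random mutations at the component under test.

```python
import random

# Hypothetical mutation rules; each takes a benign string and corrupts it.
RULES = [
    lambda s: s + "'" * 10,                      # quote flooding
    lambda s: s.replace("a", "%00"),             # null-byte injection
    lambda s: s * 100,                           # length blow-up
    lambda s: s + "<script>alert(1)</script>",   # markup injection
]

def adversarial_inputs(seed, n):
    # Pick a rule at random n times and yield the mutated payloads.
    for _ in range(n):
        yield random.choice(RULES)(seed)

for payload in adversarial_inputs("user@example.com", 5):
    pass  # feed each payload to the component under test
```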

1 Like

Oh, your data-driven/sampling comments reminded me that I meant to say: depending on your tolerance for "bugs" in production, there's probably more bang for the buck in increasing the observability and alerting of your system than in trying to wire up property-based testing and use stats/prod data to find good ways to bound the properties...

2 Likes