Acceptance criteria without specified time

Suppose you have the following acceptance criterion:
“API call shall take maximum 300 ms (response time) per call while having 10 parallel users”

Would it be enough for the test to be “passed” if I run a test for 2 minutes with 10 users and the average response time is 243 ms? Should I run the test for a longer period of time? Thanks.

It depends on what kind of testing you are targeting. For plain performance testing (performance evaluation) I think this is enough, though you could also raise the duration to 5 or 10 minutes. But if you want to execute load, stress or endurance testing, the duration should be different.

Hope my answer was clear :slight_smile:



Clear and helpful, thanks!

  1. Hedge my bets: because there is no spec for the first call, which could conceptually take 30 seconds while subsequent calls take 30 ms, a test for this contract needs to go into a class or suite of tests that also runs completely separately.
    This gives you control over the language/communication with teams, and lets you add new scenarios so that a performance dashboard can be built with known baselines to compare against, showing improvement or “resource costs” over time more clearly.

  2. Time is not the only useful metric. A fast response is useless if the crypto pegs 10 CPU cores at 80% per transaction, or if you forgot to enable TCP checksum hardware offloading, for example.


I sense that the main concern here is monitoring rather than simple verification.

“10 parallel users”

  • When? As soon as the service has booted up, or after 3 years of usage on the same server?
  • Where? Is hardware status relevant to either situation? Will your app process share resources with other resource-demanding processes?
  • Who? Are third party services involved? Will this investigation stub them out or consider the whole chain of dependencies?

After an initial positive assessment, I would suggest having some monitoring / alerting hooks to keep an eye on how things are - especially if you don’t control or don’t know these variables.


When considering the duration there are two factors that I like to take into account.

One is sample size. If you are running 10 parallel users at an average of 243 ms, each user completes about 4 requests / s, so 4 * 10 ≈ 40 requests / s.
40 rps * 120 s ≈ 4,800 samples. That should be sufficient to be fairly confident in the result.
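
The sample-size estimate can be sketched in a few lines of Python. This is only a rough sketch under an assumption that is not stated in the question: a closed-loop load model where each user fires the next request as soon as the previous one returns, with no think time. The function name `estimated_samples` is made up for illustration.

```python
def estimated_samples(users: int, avg_response_s: float, duration_s: float) -> int:
    """Rough sample count for a closed-loop test with no think time (assumption).

    Each user completes about 1 / avg_response_s requests per second, so the
    total throughput is users / avg_response_s requests per second.
    """
    if users <= 0 or avg_response_s <= 0 or duration_s < 0:
        raise ValueError("users and avg_response_s must be positive, duration_s non-negative")
    throughput_rps = users / avg_response_s
    return int(throughput_rps * duration_s)


# 10 users, 243 ms average response time, 2 minute run
samples = estimated_samples(users=10, avg_response_s=0.243, duration_s=120)
print(f"approximately {samples} samples")
```

If your load tool adds think time or pacing between requests, the real number will be lower, so treat this as an upper bound.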

The second factor is trickier: internal job frequency. This requires a lot more information about your system and its dimensions. Things that I would normally like to cover are any standing jobs, for example the database being replicated every 10 minutes, or all orders being batch archived every hour. Another one is cleanup, such as garbage collection. This is not a timing-based but a resource-based event, but I would like to make sure to measure the average response time with at least one cleanup in it, since some systems take much longer to respond during these.
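
For the timing-based jobs, one way to turn this into a number is to take the longest job interval as the floor for the test duration. The sketch below assumes the jobs are strictly periodic and that you know their intervals; the helper `minimum_duration_s` and its parameters are hypothetical. It says nothing about resource-based events like garbage collection, which you would have to confirm from logs or metrics.

```python
def minimum_duration_s(job_intervals_s, cycles=1, warmup_s=0.0):
    """Shortest run (in seconds) that is long enough to include at least
    `cycles` starts of every strictly periodic job, after an optional warm-up.

    Assumption: each job starts every `interval` seconds without drift.
    """
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    if not job_intervals_s:
        return warmup_s
    return warmup_s + cycles * max(job_intervals_s)


# Example intervals from the post: replication every 10 minutes,
# batch archiving every hour.
duration = minimum_duration_s([10 * 60, 60 * 60])
print(f"run for at least {duration / 60:.0f} minutes")
```

Add some margin on top if the jobs themselves take a while to finish, since this only guarantees that each job starts during the run.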

As a bonus: if your application / service also has many different use cases, such as fetching data and adding new data, and if this is a critical metric that the business bases decisions on, I would suggest adding a mixed traffic model and having an agent measure response times during “normal load” of the system, since that may be more in line with the actual user experience.


Thanks @ola.sundin your answer really helped me, useful information! :slight_smile:

You didn’t cover the acceptance criterion here. When a maximum response time of 300 msec is requested, the average response time over any period of time is not, on its own, the answer. There is a decent chance that, with a measured average of 243 msec, at least some requests took longer than 300 msec… which would be unacceptable. The average means nothing here if you don’t have the spread of the observations.
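
To make that concrete, here is a small Python sketch with made-up numbers (not measurements from any real system) in which the average is exactly 243 ms but five requests are far above the 300 ms limit. The function `check_max_criterion` is a hypothetical helper that reads the criterion literally as "no single call may exceed the limit".

```python
import statistics


def check_max_criterion(samples_ms, limit_ms=300.0):
    """Evaluate response times against a hard per-call maximum."""
    if not samples_ms:
        raise ValueError("no samples to evaluate")
    violations = [s for s in samples_ms if s > limit_ms]
    return {
        "mean_ms": statistics.mean(samples_ms),
        "max_ms": max(samples_ms),
        "violations": len(violations),
        "passed": not violations,
    }


# Fabricated illustration: 95 fast calls and 5 slow ones.
samples_ms = [200.0] * 95 + [1060.0] * 5
report = check_max_criterion(samples_ms)
print(report)
```

The mean looks fine against 300 ms, yet the check fails because of the five slow calls. Whether the criterion really means "every call" or something like a 95th or 99th percentile is worth confirming with whoever wrote it.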


@hermannijland has reminded me of the time I worked on an app that wrote records: whenever it wrote the 10,000th record, something would glitch. It’s for this reason that stress and performance test cases must always look at every transaction minutely. If we had not tested with sizes exactly on, and on either side of, that amount, we would never have found the bug or been able to compare results in a way that flushed it out. Imagine counting grains of sand from a child’s bucket: if one grain was actually salt, is the test still passing?


What about memory leaks? They don’t manifest in shorter performance runs. I tend to split performance tests into 3 categories:

  • What load can my application / API take under stress (peak moments)?
  • How does my application / API behave over longer periods of time (to identify memory leaks)?
  • How does my application / API behave when I simulate real user behavior? Real users don’t put your application under constant stress; they do other things in the meantime, like answering a post on The Club, for example.

The point is that, from my point of view, it would not be enough to run this for such a short period of time without knowing more about the architecture and purpose of your API. I would need a bit more info to answer more accurately, but as I said, gut feeling … no, it’s not enough :slight_smile:


This seems like a question for your product owner, or whoever set the acceptance criteria in the first place. IMO, the most important thing we do is force clarification.