Performance Testing Environment: How Are You Managing It?

I’ve started thinking more and more about test environments lately.

Something that has always intrigued me in testing is performance testing, although I didn’t get the chance to dive too deeply into it. Relating the two, I started to wonder how people manage test environments for performance testing.

As I understand it, a performance environment should trigger a run when code is committed. Then I started to think about how that gets managed in a place where code is committed and deployed pretty frequently.

This, of course, spiralled into a lot of other questions for me, but I’ll start with the first ones:

How is your performance environment set up? How do you manage it? Who has access to it?

In my experience, doing continuous performance testing in fully integrated environments is very difficult, at least in orgs larger than an early startup.

There are a few reasons for this, but the biggest is that you generally need large and diverse data sets to be able to do good performance testing. Maintaining this (and keeping it in sync across the whole system) on a continuous basis takes a lot of engineering and process discipline.

The last consulting gig I did was primarily as a performance engineer, and the stack was a mixture of legacy eCommerce products and modern microservices. Configuration management of the legacy parts was one of our biggest headaches - we’d quite often run two tests believing the configuration was identical (or had only varied in the way we intended), only to find an uncontrolled change had altered the performance profile.

I also find with full system perf tests that there are so many confounding factors that you can’t just automate a red/green result at the end - you need a human in the loop and time spent in analysis.

Having said all this, I’ve seen a few teams successfully doing continuous perf testing at the component level, mocking all of an app/service’s dependencies and running short tests on commit. Quite often they record and plot key metrics, such as latency quantiles, over time in order to establish when a new build has degraded performance relative to the previous one.
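The quantile-tracking approach above can be sketched in a few lines. This is a minimal illustration, not any particular team's tooling; the metric names, the p95 choice, and the 10% tolerance are all assumptions for the example.

```python
"""Sketch: flag a build whose latency quantile regresses past the previous build's.

The quantile level (p95) and tolerance (10%) are illustrative assumptions.
"""

def quantile(samples_ms, q):
    # Nearest-rank quantile over a sorted copy of the samples.
    s = sorted(samples_ms)
    idx = min(len(s) - 1, int(q * len(s)))
    return s[idx]

def degraded(previous_ms, current_ms, q=0.95, tolerance=1.10):
    """True if the current build's latency quantile exceeds the
    previous build's by more than the allowed tolerance."""
    return quantile(current_ms, q) > quantile(previous_ms, q) * tolerance
```

Plotting these per-build quantiles over time is what lets a team see gradual drift as well as sharp regressions.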


Whew. Lots to say here, and probably the first time I’ve tried to explain what I’ve been doing comprehensively for the past six months.

The load test / performance environment itself points our tests at our preproduction system. The “load test box” runs JMeter, and the application is managed by Chef (a hard requirement in our infrastructure). The advantage of having it managed by Chef is that I can make application changes very easily: upgrading JMeter recently was just a matter of changing the version number in an attributes file. It also means that standing up additional or replacement boxes is a snap.
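For readers unfamiliar with Chef, the attributes-file pattern described above looks roughly like this. It's an illustrative fragment, not the poster's actual cookbook; the attribute names and version are made up, though the download URL follows Apache's real archive layout.

```ruby
# attributes/default.rb — hypothetical cookbook attributes.
# Bumping the version and converging Chef upgrades JMeter on every box.
default['jmeter']['version'] = '5.6.3'
default['jmeter']['url'] =
  "https://archive.apache.org/dist/jmeter/binaries/" \
  "apache-jmeter-#{node['jmeter']['version']}.tgz"
```

Because the version lives in one attribute, every load test box converges to the same JMeter build, which is exactly what makes adding new boxes cheap.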

We put all the load tests into their own repository, which gets pulled on the load test box every 15 minutes. There’s a “regression run” of tests that starts every night at 7pm, defined through a config file in the repo (so we can update what runs each night via pull request). Manual test runs can be triggered from the command line.
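A config-driven nightly run like this might be glued together with a small script that turns the config file into JMeter non-GUI invocations. This is a sketch under assumptions: the config format (one `.jmx` path per line, `#` comments) and results directory are invented, though `-n`/`-t`/`-l` are JMeter's standard non-GUI, test-plan, and results-log flags.

```python
"""Illustrative sketch: build non-GUI JMeter commands from a nightly-run config.

Config format and paths are assumptions; the flags are standard JMeter options:
  -n  non-GUI mode   -t  test plan (.jmx)   -l  results log (.jtl)
"""

def build_commands(config_text, results_dir="/var/perf/results"):
    commands = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name = line.rsplit("/", 1)[-1].removesuffix(".jmx")
        commands.append(
            ["jmeter", "-n", "-t", line, "-l", f"{results_dir}/{name}.jtl"]
        )
    return commands
```

A scheduler entry (cron, or whatever the infrastructure provides) at 7pm would then just execute whatever `build_commands` produces from the freshly pulled repo.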

Because the configuration and tests are in repositories, anyone can submit pull requests to add or modify the tests or testing system.

My honey-do list for the next while includes building a better command line system (I don’t want to have to remember a bunch of flags and the location of everything) and setting up anomaly detection so that I don’t have to manually compare the test results every morning.
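The anomaly-detection idea could start as simply as a deviation check against a trailing baseline of nightly results. A minimal sketch, assuming you track one summary metric (say, p95 latency) per night; the three-sigma threshold is an arbitrary starting point, not a recommendation:

```python
"""Minimal anomaly check for nightly perf results (threshold is an assumption).

Flags a run whose metric deviates from the trailing baseline by more than
k standard deviations, so a human only has to look at flagged mornings.
"""
from statistics import mean, stdev

def is_anomalous(history, latest, k=3.0):
    """history: summary metric (e.g. p95 latency in ms) from previous nightly runs."""
    if len(history) < 2:
        return False  # not enough data to form a baseline yet
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # perfectly flat baseline: any change is notable
    return abs(latest - mu) > k * sigma
```

Even something this crude turns "compare everything every morning" into "look only when the script pings you," and it can be replaced with fancier detection later without changing the workflow.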