Test data management: I assume not every type of software needs it?

So lately I’ve been researching test data management, and while the research is ongoing, I’m realizing it’s one of those topics that are heavily context dependent. (I guess all of them are.)
The way I see it, test data management involves taking large-scale data sets from production databases, masking or “de-identifying” them for confidentiality, and then doing subsequent processing. That last bit could be either to run tests or to debug a user-reported issue.
In software where engineers require large data sets that take a considerable amount of time to produce, such strategies are necessary. But for a tech-startup-style product where the scale isn’t even 100 active users? Would it be worth the time, effort and money?

The only test data management there could be is the creation and deletion of data when an automation suite runs…
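To make that concrete, here is a minimal sketch of what I mean by “creation and deletion around a run” (Python, with an in-memory sqlite3 database standing in for whatever the product actually uses; all table and field names are invented):

```python
# Per-run test data lifecycle: the suite creates the records it needs
# before running and tears them down afterwards, so no standing,
# separately "managed" data set exists at all.
import sqlite3
from contextlib import contextmanager

@contextmanager
def seeded_db(rows):
    """Create a throwaway database, seed it, and dispose of it afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO users (email) VALUES (?)", rows)
    try:
        yield conn
    finally:
        conn.close()  # the data disappears with the connection

with seeded_db([("alice@example.com",), ("bob@example.com",)]) as db:
    count = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    print(count)  # the seeded users exist only for the duration of the run
```

At this scale that might genuinely be the whole “TDM strategy”.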


I don’t know. Do you think you can generalize?

What’s the business model? What’s the target of the company? Who owns/manages the data? Which part is in the company’s control? Who are the users and what’s important to them? Which regulations have to be respected? Where does the data come from (built by the system, or imported)?


I’m just trying to figure out whether the effort spent on test data management depends on the number of users or the scale of the product.

What does the run of an (E2E? API? something else?) automation suite change about the test data management? How does the source of the data (its creation) change the management? I think you are mixing these up (it also might not always be easy to tell them apart).

You can store (and manage) data from a previous version, insert it into the database and then run tests on it. Of course this data has to be migrated whenever the data model changes.
Maybe there is public data available alongside the eventual production data, but in a different format which you “just” have to transform.
I’m not sure what you include in “automation suite”, but creating artificial test data programmatically might also be a way to go. While production data might be comparatively easy to get in large volumes, it is seldom guaranteed to cover specific cases. When you need to cover specific cases, at the very least you have to create the data yourself.
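As a sketch of what I mean by creating data programmatically to pin down specific cases (all field names here are invented for illustration, not from any real system):

```python
# A factory gives you plausible bulk data by default, while keyword
# overrides let a test construct exactly the edge case it needs --
# something a production dump can never guarantee to contain.
import random

def make_order(order_id, *, amount=None, currency="EUR", status="PAID"):
    """Defaults produce a plausible record; overrides pin down a case."""
    return {
        "id": order_id,
        "amount": amount if amount is not None else round(random.uniform(1, 500), 2),
        "currency": currency,
        "status": status,
    }

# Bulk plausible data plus deliberately constructed edge cases:
orders = [make_order(i) for i in range(100)]
orders.append(make_order(100, amount=0.0, status="REFUNDED"))     # boundary case
orders.append(make_order(101, amount=-5.0, status="CHARGEBACK"))  # invalid-input case
```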

My point: any data can somehow be managed.
To me it sounds like you are more interested in the sources of the test data. Is that true?

Like most things: consider what problem you are trying to solve, whether test data management will help solve that problem, and if so whether it is worth the effort, and find a balance.


I used to work with fund performance calculations, often needing multiple years of data in order to do real-time calculations. This was a data-heavy product with high data-based risk, so we did a lot of test data management. In some cases this merited data as close to real as possible, so masking real data was at times leveraged. At times creating test data was a matter of selecting a database dump and loading it into the test environment; we had loads of dumps depending on the tests.

Similarly, replicating customer issues can benefit from cloning and masking real data; how easy this will be will vary, as will its potential value.
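As a hedged illustration of the masking step (not our actual tooling; the field names are invented), the idea is to keep the shape of a cloned record while replacing anything identifying:

```python
# Mask direct identifiers with deterministic stand-ins: the same real
# value always maps to the same masked value, so joins across cloned
# tables still line up, while non-identifying fields pass through.
import hashlib

def mask_record(record):
    """Return a copy of the record with identifying fields replaced."""
    masked = dict(record)
    token = hashlib.sha256(record["email"].encode()).hexdigest()[:8]
    masked["email"] = f"user_{token}@example.invalid"
    masked["name"] = f"Customer {token}"
    return masked

prod_row = {"id": 42, "name": "Jane Doe", "email": "jane@realcustomer.test", "balance": 1234.56}
safe_row = mask_record(prod_row)  # balance and id survive unchanged
```

Real masking tools handle this across whole schemas, but the principle is the same.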

Now, other products have very little data: maybe a user profile and some history, and that’s it. It may be that the data is low risk or can be created by a few lines of script, so perhaps no obvious test data management is required at all.

Like almost everything in testing there is no one best practice and you pick and choose what’s of value to your own context every single time.

I think I was looking into large-scale test data management. The source of the data depends, as @andrewkelly2555 mentioned, on how data-heavy the product is.
I can now conclude that for the scale of my product, a full-blown TDM solution is not needed.