What are the best approaches to handling data in test automation?

For handling data in test automation scripts, I’m aware of two approaches:

  • Creating data generation scripts

  • Predefining the data and using it directly in the scripts

In the first approach, we develop scripts that run before the tests and generate the data.

In the second approach, we manually create the data, such as a user or some configuration for a user, and then use that data directly in the scripts.

So, over the long run of a project, which approach to handling the data requires the least maintenance?

  1. Create the test data manually
  2. Back up the DB using a script
  3. Execute the automated tests
  4. Roll back the test data from the backup
    (and repeat steps 3 and 4…)

This approach is like your first approach.
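
The four steps above can be sketched in Python with the standard-library sqlite3 module. This is only a minimal sketch that assumes an SQLite database. The users table and the snapshot and restore helpers are hypothetical names for illustration, and a real project would use the backup tooling of its own database.

```python
import sqlite3


def snapshot(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the current database into an in-memory backup (hypothetical helper)."""
    backup = sqlite3.connect(":memory:")
    conn.backup(backup)
    return backup


def restore(backup: sqlite3.Connection, conn: sqlite3.Connection) -> None:
    """Overwrite the working database with the contents of the backup."""
    backup.backup(conn)


# Step 1: create the test data (done by hand in the workflow above).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT)")
db.execute("INSERT INTO users VALUES ('alice')")
db.commit()

# Step 2: back up the database.
saved = snapshot(db)

# Step 3: run a test that modifies the data.
db.execute("DELETE FROM users")
db.commit()

# Step 4: roll back to the backup so the next test starts from the same state.
restore(saved, db)
rows = db.execute("SELECT name FROM users").fetchall()
```

Steps 3 and 4 would then repeat for each test or test suite.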


When I think of test data and automation, I think of a test that must evaluate a product when the data varies. That is, the more diverse the test data, the more behaviors that can be evaluated.

One of the best sources of diverse data is production. I’ve been fortunate to be able to evaluate production data in a couple of ways. In one project, we received a set of production data and scrubbed the sensitive parts. We presented this data to the product in a non-production environment. This helped us tremendously to identify errors before allowing the data to enter our production system and possibly cause downstream errors.
In another project, a team routinely collects production requests to their system, scrubs the sensitive data, and places the requests into a non-production database. These requests are available to use in testing. I found them very valuable because they were very diverse.
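
As a rough illustration of the scrubbing step, here is a minimal sketch. The field names in SENSITIVE_FIELDS, the scrub function, and the sample request are all hypothetical, not what either project actually used. Hashing like this only pseudonymizes values, so real scrubbing rules would need to follow your own privacy requirements.

```python
import hashlib

# Hypothetical list of fields that must not leave production.
SENSITIVE_FIELDS = {"name", "email", "ssn"}


def scrub(record: dict) -> dict:
    """Return a copy of the record with sensitive values replaced by a short hash."""
    cleaned = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            # The same input always maps to the same token, so relationships
            # between records survive while the original value does not.
            cleaned[key] = f"{key}_{digest[:8]}"
        else:
            cleaned[key] = value
    return cleaned


request = {"name": "Jane Doe", "email": "jane@example.com", "order_total": 42.5}
scrubbed = scrub(request)
```

The non-sensitive fields pass through unchanged, which keeps the diversity of the production data.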


Your question isn’t clear here. Is this about data that already exists in the system and that you test against, for example by reading and validating that data or by modifying it? In other words, seeded test data for the tests to work from?

Or is it about data that you write into the database/system at test time as part of the tests (not during test setup or precondition execution), and that does not already exist in the database/system?

Or a combination of both scenarios mentioned above?

If you really mean how to take care of static data, such as filling out addresses and names, then use Faker. I use it in my Selenium Python tests. There is probably something similar available in other languages.


I personally like managing my test data by keeping it separate from the automation code. I usually store it in a separate location, such as Excel sheets, resource files, or config files. I then pass it to the tests at run time.

Also, I try to have a diverse data set for testing in QA, Staging, and Production. In fact, I usually have 3 config files for the 3 environments. Based on which environment I am running the tests on, I pass the appropriate config file.
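
One way to wire that up is sketched below. It assumes one JSON file per environment (qa.json, staging.json, production.json) in a config directory and a TEST_ENV environment variable. Those names and the load_test_data helper are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Optional


def load_test_data(config_dir: Path, env: Optional[str] = None) -> dict:
    """Load the test data file that matches the target environment."""
    # Fall back to the TEST_ENV variable, then to "qa", if no name is given.
    env = env or os.environ.get("TEST_ENV", "qa")
    path = config_dir / f"{env}.json"
    if not path.is_file():
        raise FileNotFoundError(f"No test data file for environment '{env}': {path}")
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

A test run against staging would then call load_test_data(Path("config"), "staging"), or set TEST_ENV=staging before starting the run.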

Generating test data dynamically may work in some cases, but personally, I like to have control over what test data I am using. One suggestion: if the test data is dynamically generated, it is a good idea to record it as well. That way you can investigate passed or failed tests in the future.
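
A minimal sketch of recording generated data, using only the standard library. The generate_user and generate_and_record helpers, the record layout, and the log file format (one JSON object per line) are assumptions for illustration.

```python
import json
import random
import time
from typing import Optional


def generate_user(rng: random.Random) -> dict:
    """Generate one random user record (hypothetical data shape)."""
    return {"username": f"user{rng.randint(1000, 9999)}", "age": rng.randint(18, 90)}


def generate_and_record(log_path: str, seed: Optional[int] = None) -> dict:
    """Generate test data and append it, with its seed, to a log file."""
    # Use the given seed to replay an earlier run, otherwise pick a new one.
    if seed is None:
        seed = int(time.time())
    rng = random.Random(seed)
    data = generate_user(rng)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"seed": seed, "data": data}) + "\n")
    return data
```

If a test fails, the logged seed can be passed back in to regenerate exactly the same data.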