Automation Test Data - Advice

Hi All,

I work for a company that has some existing automation in place, a mixture of pure Selenium and page-model automation using Gherkin/Cucumber. The test team is now taking ownership of the automation, as it has been abandoned for some time and we want to bring it back to life.

The main sticking point for us before we set off on the journey is this: what’s the best way to get the required data into the application? The application relies heavily on setup before you get to the real usage points, and we’re unsure of the best way to bridge that gap.

  • Do we create the data manually and point the automation to it? That gives us a reliable base, in that the data has been entered in a legitimate fashion.
  • Do we create the data on the fly in the DB as needed? That gives us flexibility, but probably stretches the team’s ability to make the link from the setup to the usage. The link is heavy, so it would require more expert help.

We’re trying to avoid setting off down a road that isn’t ideal. The main thing I’ve picked up so far is that you should try to keep the data as legitimate as possible, which would lean towards entering it manually, but some team members are worried about the overhead of managing that and about losing the flexibility option 2 would provide…

Would anyone have any thoughts on the best way forward, or experience of automating tests where configuration in one part of the application links all the way through to a later part of it?

Hello @noelf21 and Welcome!

In my opinion, the test data used may depend on the information objectives of the tests. The tests may be evaluating behaviors and scenarios (such as a regression) in which case the data presented to the system may not need a lot of diversity. If that is the case, I recommend getting production data and scrubbing it for confidential and sensitive information.
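As a rough illustration of the scrubbing step, here is a minimal Python sketch. The field names (`name`, `email`, `phone`) and the salt are assumptions made up for the example, not anything from a real schema. It swaps sensitive values for stable stand-ins so the same input always maps to the same replacement, which keeps links between records intact. Note that salted hashing like this is pseudonymisation rather than full anonymisation, so it may not be enough on its own for strict privacy requirements.

```python
import hashlib

# Hypothetical column names; replace with the sensitive fields in your own schema.
SENSITIVE_FIELDS = ("name", "email", "phone")


def pseudonym(field, value, salt="test-salt"):
    """Return a stable stand-in for a sensitive value (same input, same output)."""
    digest = hashlib.sha256(f"{salt}:{field}:{value}".encode("utf-8")).hexdigest()[:10]
    if field == "email":
        # Keep the value shaped like an email so format validation still passes.
        return f"user_{digest}@example.com"
    return f"{field}_{digest}"


def scrub(record, sensitive_fields=SENSITIVE_FIELDS):
    """Copy a record, replacing sensitive values and leaving everything else alone."""
    cleaned = dict(record)
    for field in sensitive_fields:
        if cleaned.get(field) is not None:
            cleaned[field] = pseudonym(field, cleaned[field])
    return cleaned


if __name__ == "__main__":
    row = {"id": 42, "name": "Jane Doe", "email": "jane@corp.example", "plan": "gold"}
    print(scrub(row))
```

Non-sensitive columns such as `id` and `plan` pass through unchanged, so the scrubbed data should still exercise the same behaviours as the original.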

The on-going maintenance of the automation would require a review of both the code and the test data to verify both are providing valid results.

I rarely have concerns around “legitimate” test data. In my opinion, all data presented to an application is legitimate. If the application can determine the difference between legitimate and some other kind of data, there is likely a different problem.



Thanks for the feedback. I’d missed an earlier post asking a very similar question, so combining that with your feedback has helped!

Much appreciated.


I just have to echo Joe’s experience here. In fact, how you get the system into a state where the data is in place is completely up for grabs. It does not have to be populated “in-band”: if you can import a DB table or use some other underhanded technique, that is fine so long as it fits the test case and finds defects. Using scrubbed real-life data is worth some pain.

Find a way to grab a copy of a customer config and, with permission, anonymize it if at all possible.

Being able to get the system to a desired state (prerequisites) by loading records into a database, for example, is going to allow you to focus on what you actually want to test. Jumping through extra hoops in the UI to achieve the same result may be more ‘authentic’, but it will cost time and may make the tests more fragile. E.g. to test a checkout on a web store, you could ensure that you already have a basket of goods prepopulated.
It goes without saying that you also want to check, as a separate test case, that you can perform those same steps yourself, e.g. picking a basket of goods.
It is debatable whether you want dependencies between those two test cases later on; you can avoid them by adding the data straight to the database.
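To make the basket example a bit more concrete, here is a minimal Python sketch using an in-memory SQLite database. The table names, columns and the `basket_total` function are all hypothetical stand-ins for whatever your application really has. The point is only the shape: seed the prerequisite rows directly, then test the part you care about.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical, simplified schema; a real application's tables will differ.
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price_pence INTEGER);
        CREATE TABLE basket_items (basket_id INTEGER, product_id INTEGER, quantity INTEGER);
    """)


def seed_basket(conn, basket_id, items):
    """Insert products and basket rows directly, bypassing the UI."""
    with conn:  # commits on success, rolls back if an insert fails
        for product_id, name, price_pence, quantity in items:
            conn.execute("INSERT INTO products VALUES (?, ?, ?)",
                         (product_id, name, price_pence))
            conn.execute("INSERT INTO basket_items VALUES (?, ?, ?)",
                         (basket_id, product_id, quantity))


def basket_total(conn, basket_id):
    """Stand-in for the checkout behaviour under test."""
    row = conn.execute(
        "SELECT COALESCE(SUM(p.price_pence * b.quantity), 0) "
        "FROM basket_items b JOIN products p ON p.id = b.product_id "
        "WHERE b.basket_id = ?",
        (basket_id,),
    ).fetchone()
    return row[0]


def test_checkout_total():
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Prerequisite state goes straight into the database, not through the UI.
    seed_basket(conn, 1, [(10, "Mug", 500, 2), (11, "Pen", 150, 1)])
    assert basket_total(conn, 1) == 1150
    conn.close()


if __name__ == "__main__":
    test_checkout_total()
    print("checkout test passed")
```

Because the checkout test seeds its own basket, it does not depend on the separate "pick a basket of goods" test having run first.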


Thanks for all the feedback. It sounds like some combination of the two would be an OK route to follow initially. It’s intimidating for a fresh set of eyes, as there is considerable setup required to get to some of the higher-value areas of the system.