In some projects, privacy, compliance, or security regulations mean that no production data whatsoever may be used for testing. That sets a real limit on how realistic testing can be.
Do you generate synthetic datasets, mask sensitive fields, use small subsets of anonymized exports, or mock everything? And, more importantly, how do you validate that your tests are still representative of real-world usage when none of the data you’re dealing with are real?
It's a very valid question, concern & struggle. How do we make test data realistic, so that it actually resembles the production dataset, when compliance means we can't use that dataset?
So how do we test? I believe this is the point where real collaboration & cross-functional teams really kick in. You have to bring on board people who have access to production and to clients, and get their views on the test data samples you have curated.
To keep it from becoming time consuming or repetitive, try to time box it in an Exploratory Test session, Ensemble Test or Mob Test. Create clear charters, goals and expected outcomes for the meeting to make the best of it and of people's time and insights!
I’d consider this hurdle as an opportunity to meet, explore, learn & collaborate!
As you mentioned … Why would you have access to production data? GDPR.
To test, we create our data synthetically, mask our production data, or take a copy from production and scramble/anonymize it.
So a user's first name, for example, becomes something like FN_Testuser1.
We create test cases accordingly. If you've got a customer with ABC, you create a test case with ABC.
If a customer can have ABCDEYZKJNFDJ, then we create a test case for that too (depending on the limitation).
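To make the masking step concrete, here is a minimal sketch in Python. It assumes the copied records are plain dictionaries; the field names (`first_name`, `last_name`), the `LN` prefix and the sample rows are made up for illustration, and only the `FN_Testuser1` pattern comes from the description above.

```python
def mask_customers(rows, fields=("first_name", "last_name")):
    """Return copies of the rows with the listed fields replaced by placeholders."""
    # Hypothetical prefixes: FN for first name, LN for last name.
    prefixes = {"first_name": "FN", "last_name": "LN"}
    masked = []
    for index, row in enumerate(rows, start=1):
        clean = dict(row)  # copy, so the input rows are left untouched
        for field in fields:
            if field in clean:
                prefix = prefixes.get(field, field.upper())
                clean[field] = f"{prefix}_Testuser{index}"
        masked.append(clean)
    return masked


# Made-up sample rows standing in for a copied export.
rows = [
    {"id": 1, "first_name": "Alice", "last_name": "Smith", "plan": "basic"},
    {"id": 2, "first_name": "Bob", "last_name": "Jones", "plan": "pro"},
]
masked_rows = mask_customers(rows)
print(masked_rows[0]["first_name"])  # FN_Testuser1
```

Non-sensitive fields such as `plan` pass through unchanged, so test cases that depend on them still behave like the original records. Whether replacing names alone is enough to count as anonymized depends on your compliance rules, so treat this as a starting point only.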
My teams used to be able to use scrubbed production data for testing, but that's gotten harder as privacy laws have become stricter. Generating realistic data was one route my teams took. We used our monitoring, observability and analytics tools that report on production data to help us understand which realistic cases and scenarios to capture. I've found that marketing people can be super helpful with this, since they are always watching what users are doing. Customer support folks are another great resource. These days, with the right prompting, an LLM could help generate test data.
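As a rough sketch of that idea in Python: if your analytics tools can report aggregate shares (no individual records), you can sample synthetic users so that the mix roughly follows them. The plan names and percentages below are invented for illustration and not taken from any real system.

```python
import random

# Hypothetical aggregate shares, as an analytics dashboard might report them.
PLAN_SHARE = {"free": 0.70, "pro": 0.25, "enterprise": 0.05}


def generate_users(count, seed=42):
    """Generate synthetic users whose plan mix roughly follows PLAN_SHARE."""
    rng = random.Random(seed)  # fixed seed keeps test runs reproducible
    plans = list(PLAN_SHARE)
    weights = list(PLAN_SHARE.values())
    users = []
    for i in range(1, count + 1):
        users.append({
            "first_name": f"FN_Testuser{i}",
            "plan": rng.choices(plans, weights=weights)[0],
        })
    return users


users = generate_users(1000)
```

Because only aggregate numbers go in, no production record is ever copied. The people who watch real usage (marketing, customer support) can still review the shares and tell you which scenarios are missing.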