@han_toan_lin raised the issue of randomly generated test data coincidentally matching real-world data (a valid social security number, say), and he drew my attention to the GDPR. I’ve looked at it, and I think that no liability attaches in this case. I speak here as someone who (in a previous existence) spent 30 years applying administrative law in real-world situations in some (admittedly out-of-the-way) corners of the UK government.
Looking at the document Han Lin pointed us to, on page 8 the guidance says:
The GDPR applies to ‘personal data’ meaning any information relating to an identifiable person who can be directly or indirectly identified in particular by reference to an identifier.
So first, we are looking at data relating to an “identifiable” person. If you have generated your test data randomly, the fact that you happen to hit a valid social security number could be considered coincidental. The person it relates to is not “identifiable” without further action: for the purposes of the application under test, the person the number applies to cannot be identified. It is only if you go looking with other tools that identifying the person becomes possible. In that case, I would say it is the person who goes looking who commits the breach, not the people using a randomly generated number, not associated with a real person, to test a completely different application.
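In practice you can sidestep the coincidence problem entirely by generating numbers that are well-formed but can never belong to a real person. Here is a minimal Python sketch of that idea, assuming the commonly cited rule that the US Social Security Administration has never issued SSNs whose area number (the first three digits) falls in the 900–999 range (that prefix is reserved for ITINs, which are not SSNs), and that group `00` and serial `0000` are never issued:

```python
import random


def fake_ssn(rng: random.Random) -> str:
    """Generate a well-formed but never-issued US SSN for test data.

    Assumption: area numbers 900-999 have never been issued as SSNs
    (the 9xx prefix is reserved for ITINs), so a number in this range
    cannot identify a real SSN holder.
    """
    area = rng.randint(900, 999)
    group = rng.randint(1, 99)     # group 00 is never issued
    serial = rng.randint(1, 9999)  # serial 0000 is never issued
    return f"{area:03d}-{group:02d}-{serial:04d}"


# Seeding makes the test dataset reproducible across runs.
rng = random.Random(42)
test_ssns = [fake_ssn(rng) for _ in range(5)]
print(test_ssns)
```

Whether your application under test accepts 9xx numbers depends on how strictly it validates SSNs, so this is a sketch of the approach rather than a drop-in solution; the point is that deliberately choosing a never-issued range removes even the coincidental link to a real individual.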
Later in the same paragraph, the guidance says:
Personal data that has been pseudonymised – eg key-coded – can fall within the scope of the GDPR depending on how difficult it is to attribute the pseudonym to a particular individual.
If your application under test is not intended to identify individuals from that one data item (the social security number), and the data allowing identification (a real name that can be associated with a real social security number) does not exist within the application under test, I would suggest that it passes the ‘difficulty’ test.
On page 10, the guidance says:
Article 5 of the GDPR requires that personal data shall be:
“a) processed lawfully, fairly and in a transparent manner in relation to individuals;
b) collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; (and)
c) adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed…”
The key word here is “processed”. Holding a randomly generated social security number, not associated with a real person’s name, and using it to test an application meets the requirement to “process … lawfully (and) fairly”. The purpose of holding the information also matters here: testing an application is a perfectly legitimate purpose. And simply using that randomly generated number to populate a test dataset meets the requirement to be “limited to what is necessary in relation to the purposes for which they are processed”.
In my opinion, it would be the act of setting out to associate that number with any individual’s name which would go beyond these provisions and cause a GDPR breach.
At least, that’s the explanation I would send to our legal team if this challenge arose in practice. Different organisations will obviously react differently. When I was working in the Government sector, I would be expected to be familiar with the law and its application in practice, and our legal team would give their opinion based on the position of expecting to start out on firm legal ground. A private company might want to err on the side of caution, depending on how confident their legal team are in their own opinion.
And now you see why lawyers charge such high fees!