Large Language Models: Ethic and society - How can we help as testers?

We have an incredible article on testing large language models such as ChatGPT that spans testing, linguistics and philosophy from @elojd.

As someone who has been diving deep into the world of LLMs, I found this to be a very interesting read, but I struggled to think of a great question to follow up the article on.

So I asked ChatGPT :smiley: It seemed apt given the topic and it came up with this question:

What are some potential ethical concerns or societal implications that arise when using large language models like GPT, and how can we address them during the testing and development process?

Obviously there is concern about how it will impact our industry and the testing role. But what about other industries and society as a whole? What responsibility as testers do we when it comes to LLMs? What can we do?


I would like to know whose language is being modeled and from what context, and does this fit with the intended use. The trouble is the context isn’t always clear - something that humans struggle with sometimes.

For instance, satire, sarcasm, hyperbole, humour etc. mean that e.g. a politician’s name might be associated with words relating to the opposite of their views or other attributes. I’m a Brit, and when a British company I worked for was bought by an American one, it took a while for some of my new American colleagues to get to grips with quite how sarcastic Brits can be.

Excluding that, as the article indicates, language can be dependent on both the speaker and the context. Most people can do style switching (Style (sociolinguistics) - Wikipedia), and different people will have different preferred styles - my teenage children use different words for some things to me, or the same words have different meanings or strengths of meaning (e.g. rude words often get less rude over time, but people sometimes keep hold of the stronger version they first learned).

For instance: what does the word “jumper” mean? If you’re a Brit, it’s clothing. If you’re doing electronics it’s a way of connecting things or not. It means something else related to horses.

Did the LLM understand enough about the context to know how to interpret the words it’s reading in the training data? Does it know enough about the current user and their context to know what language is appropriate when it generates output?


For those interested in LLM Testing, OWASP just came out with a Top 10 Security vulnerability list for LLM:

Very interesting read, and enjoyment for testing! :slight_smile:


The LLM is in my grasp of it, merely Stochastic Parroting, after this paper
The ethical problem is real because robots are not going to like Stephen Hawkings predicts mutate ever more rapidly. They will merely allow us to abdicate our responsibilities as humans in businesses up and down the spectrum and be guided by biased data. So as testers, data scrubbing becomes our special skill to add? surely? I’m with Bob, AI is not the enemy, the data gatherer is.

1 Like

Also for people who want to try out some Prompt Injection:

Enjoy! :slight_smile:

Winning Gif
WoW! This was fun!

Would recommend everyone to try this out too.

1 Like