How would you use GenAI Pair Testing as part of your testing toolkit?

Inspired by a LinkedIn post from @maaret, where she raises the positives of GenAI Pair Testing:

'I'm manual tester, genAI tools aren't at all helpful to me' speaks of not understanding the core role of reflection/introspection to testing. We can both be surprised of things it 'knows' and hate-inspired to do better ourselves. Pair test with it. Works on your schedule.

  • What ideas do you have for GenAI Pair Testing?
  • What are the positive and negative aspects of this?
  • What kind of prompts would you write?
  • How would you use it to support gaps in your knowledge, skills or team?

LLM systems have become my pair testing buddy.
It's like they do the work and I review it and add some more... :slight_smile:

I can't imagine myself not using it these days.

Clear case of not knowing how to use LLM systems.
Compare it to a blender:

  • If you put ice, strawberries and water into a blender
  • and mix it,

you'll get a lovely smoothie.

  • If you put sh*t into the blender
  • and mix it,

it's still sh*t...


Using LLM systems without detailed prompts and proper training of the model is just going to give you bad results.

I used AI on an internal project to do my work.
These are the things I let it handle for me:

  • Create ToDo tasks
    • Workflows
    • RFC
    • Performance testing
    • ā€¦
  • The membrane
    • Ask it to think for you; example:
      • Ask what to keep in mind when performance testing
      • What did I forget?
  • Analyse the analysis
  • Write acceptance criteria & review acceptance criteria
  • Review requirements and adjust acceptance criteria
  • Create test cases & scenarios
    • Think about edge cases
  • Predict where developers' bugs will occur, based on:
    • Previous data
    • Often forgotten things
  • Create Dev Code
    • UI => IDs and locators
    • API => Specs & code
  • Create POM files (see the sketch after this list)
  • Create test scripts
    • UI & API & Unit
  • Create performance tests based on Flows from API tests
  • Create CI/CD
    • Create containers
    • Creating pipeline files
  • Reporting
    • Test case reviews
    • Pass/Fail
    • Performance Testing review
    • Make powerpoints
    • Make Graphs
    • Review multiple iterations
    • Write documentation
  • Monitoring & Alerting
    • Tell where it's going to be needed
    • Reviewing of Logs
    • Anomaly detection
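
To make the "Create POM files" item concrete, here is a minimal sketch of what such a generation prompt could look like. It assumes the OpenAI Python client; the page name, locators and model are invented examples, not anything from the original project.

```python
# Minimal sketch: asking an LLM to draft a Page Object Model class
# from a list of UI locators. The page, selectors and model name
# below are invented examples.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

locators = {
    "username_input": "#login-username",
    "password_input": "#login-password",
    "submit_button": "button[data-test='login-submit']",
}

prompt = (
    "Write a Python Selenium Page Object Model class named LoginPage.\n"
    "Expose one method per user action: enter username, enter password, submit.\n"
    f"Use these CSS selectors: {locators}\n"
    "Return only the code, no explanation."
)

response = client.chat.completions.create(
    model="gpt-4o",  # example model name; use whatever your team has access to
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```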

I wanted to do more but I left the company and now I have to restart XD

Was it perfect? Absolutely not in the beginning. I made a separate AI Agent for EACH role and each job (e.g. performance testing, automation, analysis, etc.).

At the beginning the output was bad, but that was because my prompting and training were bad. I trained each agent more and more with online content and internal content (careful that you don't share secrets).

After ~2 months the results were amazing. I felt like a dev lead who does 80% code review and writes 20% himself: I was the "AI Reviewer" for 80% and added the other 20% myself.

The main problems are often:

    1. You need mature content for your project, and almost nobody has that. You'll need to change a lot of structure internally around your user stories and way of working.

See it as the recipe for the smoothie.

    2. You need a very good, specific AI Agent that is trained on the things you specifically want for your project.

You want it to be fed with online YouTube videos, blogs, posts and internal content.

    3. You need to describe your Agent, and I mean really describe it.

When I made the AI Agents, I started off with maybe 3 sentences describing what each agent should be and how it should act. Eventually the descriptions were almost a full A4 page long, which made them so much more specific and accurate.

The nice part is that it's basically the description of your AI Agent, not really the "prompt" itself. Editing the description is a continuous journey.
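
To illustrate the difference, here is a hedged, abbreviated example of how such a description might grow from a few sentences into something much more specific; both versions are invented, not the original agents.

```python
# Abbreviated illustration of an agent description growing over time.
# Both versions are invented examples.

FIRST_ATTEMPT = (
    "You are a performance tester. You review test plans and write load test scripts."
)

LATER_VERSION = """\
You are a senior performance test engineer on a web platform team.
- You design load, stress and soak tests based on the API flows given to you.
- Before writing scripts, you always ask for expected throughput, SLAs and the target environment.
- You produce scripts with clear stages, thresholds and comments.
- You report results as observed vs. target latency, error rate, and a go/no-go recommendation.
- You flag every assumption you make and never invent production numbers.
"""
```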

    4. You need to prompt well

Asking "can you write me a test case for X or Y " is not going to cut it. You need to add a lot of detail to it. (You can make custom commands for this, so you donā€™t always have to re-write it :smiley: helped me a lot!)

    5. It's not going to work right from the start

You need to train, train, train the model. It took me a long time to get decent outputs, but over time you can see it improve as you get better at prompting and describing what it needs to do.

Fun Part:

You can integrate AI Agents with webhooks towards JIRA or Azure, and then you can teach them that the requirements are inside your user story, so you can create workflows between your AI Agents.

Workflows?

When one agent reviews the user story, it can potentially add new acceptance criteria and, once it's done, trigger another AI Agent to write test cases, which could then trigger another Agent to write automated test scripts, and so on... so many options.
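
A minimal sketch of what such a webhook-driven chain could look like, assuming Flask and a standard Jira issue-updated payload; the agent functions are hypothetical stubs standing in for your own AI agents.

```python
# Minimal sketch of chaining agents from a Jira webhook. Assumes Flask;
# review_story, write_test_cases and write_automation are hypothetical
# wrappers around your own AI agents, stubbed out here.
from flask import Flask, request, jsonify

app = Flask(__name__)

def review_story(story: str) -> str:         # agent 1 (stub): review/extend acceptance criteria
    raise NotImplementedError

def write_test_cases(criteria: str) -> str:  # agent 2 (stub): turn criteria into test cases
    raise NotImplementedError

def write_automation(cases: str) -> str:     # agent 3 (stub): draft automated test scripts
    raise NotImplementedError

@app.route("/jira-webhook", methods=["POST"])
def on_story_updated():
    payload = request.get_json()
    # The requirements live inside the user story description.
    story = payload["issue"]["fields"]["description"]

    criteria = review_story(story)
    test_cases = write_test_cases(criteria)
    scripts = write_automation(test_cases)

    return jsonify({
        "acceptance_criteria": criteria,
        "test_cases": test_cases,
        "scripts": scripts,
    })
```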


In general, LLMs can help your testing by providing fresh insights on test scenarios and on how to perform the testing. If not that, it might just be a simple check that you did not miss any obvious scenarios.

A negative aspect is relying on them too much; use it as a junior whose work you want to check, at least until you understand its strengths and weaknesses.

Regarding prompts, it's most important that you give it enough context and clear instructions. Being clear about its role and what you expect really works: "You are an expert QA Tester that writes test cases with clear steps and an expected result for each step", for example. Or explain that you want BDD scenarios and the rules that apply to those. Then give it enough context.
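
For instance, a role-plus-context prompt for BDD scenarios could be structured as a system message and a user message, roughly like this sketch; the feature and its rules are invented examples.

```python
# Sketch of a role-plus-context prompt for BDD scenarios.
# The feature and its rules are invented examples.
messages = [
    {
        "role": "system",
        "content": (
            "You are an expert QA tester. You write Gherkin scenarios using "
            "Given/When/Then, one behaviour per scenario, no UI details in the steps."
        ),
    },
    {
        "role": "user",
        "content": (
            "Feature: discount codes in the shopping cart.\n"
            "Rules: one code per order; expired codes are rejected; "
            "codes do not apply to gift cards.\n"
            "Write the BDD scenarios, including negative cases."
        ),
    },
]
```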

Honestly, there are plenty of tools out there that are optimized for test case writing, which I would use rather than copy-pasting these things into ChatGPT or whatever. For example, this Jira add-on creates the test case for you, and then you can adjust it to your needs. No need to write instructions or copy-paste the context into ChatGPT.

How to support gaps in your knowledge? Now that you have options to connect LLMs to the internet, it's easy to gather more info with simple questions. Product-specific knowledge is a bit harder, since it needs to be a very well documented product with sufficient coverage on the internet to be trained on. For testing best practices etc., it works quite well to ask questions and let it browse the internet for the latest best practices.


GenAI Pair Testing opens up exciting possibilities for testers by introducing a non-judgmental and ever-available collaborator. Here are some ideas and reflections inspired by Maaret's post:

Ideas for GenAI Pair Testing:

  1. Expanding Testing Perspectives: Use GenAI to challenge assumptions by asking it to provide alternative viewpoints or edge cases you may have overlooked.
  2. Validating Scenarios: Prompt GenAI to simulate user behaviors or generate test cases for unique scenarios, especially when exploring boundary conditions (see the example prompt after this list).
  3. Learning New Skills: Leverage its vast knowledge base to learn about unfamiliar testing tools, frameworks, or strategies in real-time.
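
For example, point 2 can start from a boundary-focused prompt as small as the sketch below; the field and its limits are invented examples.

```python
# Sketch of a boundary-focused pair-testing prompt for idea 2 above.
# The field and its limits are invented examples.
prompt = (
    "I'm testing a 'quantity' field that accepts integers from 1 to 99.\n"
    "List the boundary values and edge cases I should try, including any "
    "I may have overlooked (empty input, whitespace, 0, 100, negative "
    "numbers, non-numeric input), and explain the risk behind each."
)
```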

Positives of GenAI Pair Testing:

  • Creativity Boost: It provides fresh ideas or questions, acting as a catalyst for deeper exploration.
  • Non-Judgmental Feedback: Encourages freedom to ask questions that might feel "too basic" to ask a colleague.
  • Efficiency: It can handle repetitive tasks, allowing testers to focus on creative and critical thinking.

Challenges to Address:

  • Context Awareness: GenAI can lack the nuanced understanding of your specific product or domain.
  • Over-Reliance: The tool is supplementary and should not replace human intuition or expertise.
  • Bias Risks: Like any AI, it may reinforce biases inherent in its training data.

Coming to How It Can Fill Gaps:
When used wisely, GenAI enhances the manual testing process by encouraging feedback and debugging, while also helping to reach a higher percentage of code coverage. But the key lies in maintaining a balance between leveraging its strengths and applying critical human oversight.

At my current company, LLMs have become our testing buddies, although we use them after tuning them via prompt engineering and then just do a final review. Initially it wasn't perfect and the results were somewhat vague, but after a lot of prompt engineering, setting boundaries and filters seems to work for us.
Recently, we ended up integrating our AI tester into our main product Keploy as well, to maintain coverage and keep the tests relevant.
