How would you use GenAI Pair Testing as part of your testing toolkit?

Inspired by a LinkedIn post from @maaret where she raises the positives of GenAI Pair Testing:

’I’m manual tester, genAI tools aren’t at all helpful to me’ speaks of not understanding the core role of reflection/introspection to testing. We can both be surprised of things it ‘knows’ and hate-inspired to do better ourselves. Pair test with it. Works on your schedule.

  • What ideas do you have for GenAI Pair Testing?
  • What are the positive and negative aspects of this?
  • What kind of prompts would you write?
  • How would you use it to support gaps in your knowledge, skills or team?

LLM systems have become my pair-testing buddies.
It’s like they do the work and I review it and add some more… :slight_smile:

I can’t imagine myself not using it these days.

Clear case of not knowing how to use LLM systems.
Compare it to a blender:

  • If you put ice, strawberries and water into a blender
  • and mix it,

you’ll get a lovely smoothie.

  • If you put sh*t into the blender
  • and mix it,

it’s still sh*t…


Using LLM systems without detailed prompts and proper training of the model is just going to give you bad results.

I used AI on an internal project to do my work.
These are the things I let it handle for me:

  • Create ToDo tasks
    • Workflows
    • RFC
    • Performance testing
  • Act as an extra brain (“the membrane”)
    • Ask it to think for you; for example:
      • Ask what to keep in mind when performance testing
      • Ask “what did I forget?”
  • Analyse the analysis
  • Write acceptance criteria & review acceptance criteria
  • Review requirements and adjust acceptance criteria
  • Create test cases & scenarios
    • Think about edge cases
  • Predict where developers will introduce bugs, based on:
    • Previous data
    • Often forgotten things
  • Create Dev Code
    • UI => IDs and locators
    • API => Specs & code
  • Create POM (Page Object Model) files
  • Create test scripts (see the sketch after this list)
    • UI & API & Unit
  • Create performance tests based on Flows from API tests
  • Create CI/CD
    • Create containers
    • Create pipeline files
  • Reporting
    • Test case reviews
    • Pass/Fail
    • Performance Testing review
    • Make PowerPoint decks
    • Make Graphs
    • Review multiple iterations
    • Write documentation
  • Monitoring & Alerting
    • Tell where it’s going to be needed
    • Reviewing of Logs
    • Anomaly detection
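
To make items like “Create POM files” and “Create test scripts” concrete, here is a minimal sketch of the kind of output I would review as the “AI Reviewer”. It assumes Python with Selenium and pytest, and the page, locators and URL are invented for illustration; your stack and selectors will differ.

```python
# Hypothetical agent-generated output: a Page Object plus one test.
# LoginPage, its locators and the URL are invented for illustration.
from selenium import webdriver
from selenium.webdriver.common.by import By


class LoginPage:
    """Page Object for a (hypothetical) login page."""

    URL = "https://example.test/login"

    def __init__(self, driver):
        self.driver = driver

    def open(self):
        self.driver.get(self.URL)

    def login(self, username, password):
        self.driver.find_element(By.ID, "username").send_keys(username)
        self.driver.find_element(By.ID, "password").send_keys(password)
        self.driver.find_element(By.ID, "login-button").click()

    def error_message(self):
        return self.driver.find_element(By.CSS_SELECTOR, ".error").text


def test_invalid_login_shows_error():
    driver = webdriver.Chrome()
    try:
        page = LoginPage(driver)
        page.open()
        page.login("unknown-user", "wrong-password")
        assert "Invalid credentials" in page.error_message()
    finally:
        driver.quit()
```

The point is not the specific code but the review loop: the agent drafts something like this, and you tighten the locators, assertions and naming.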

I wanted to do more but I left the company and now I have to restart XD

Was it perfect? Absolutely not in the beginning. I made a separate AI agent for EACH role and each job (e.g. performance testing, automation, analysis, etc…).

At the beginning the output was bad, but that was because my prompting and training were bad. I trained each agent more and more with online content and internal content (being careful not to share secrets).

After ~2 months the results were amazing. I felt like a dev lead who does 80% code review and writes 20% of the code himself: I was the “AI Reviewer” for 80% and added 20% myself.

The main problems are often:

    1. You need mature content for your project, and almost nobody has that. You’ll need to change a lot internally in how your user stories are structured and in your way of working.

See it as the recipe for the smoothie.

    2. You need a very good, specific AI agent that is trained on the things you specifically want for your project.

You want it to be fed with online YouTube videos, blogs, posts and internal content.
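
There is no single right way to feed it; as a minimal stand-in for proper retrieval tooling, the sketch below simply scans internal docs for keywords and pastes matching snippets into the prompt context. The folder name and keywords are made up.

```python
# Naive, illustrative way to pull relevant internal content into an agent's context.
# "internal-docs" and the keyword matching are placeholders for real retrieval tooling.
from pathlib import Path


def collect_context(docs_dir: str, keywords: list[str], max_chars: int = 4000) -> str:
    """Return snippets from internal docs that mention any of the keywords."""
    snippets = []
    for path in Path(docs_dir).glob("**/*.md"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if any(kw.lower() in text.lower() for kw in keywords):
            snippets.append(f"--- {path.name} ---\n{text[:1000]}")
    return "\n\n".join(snippets)[:max_chars]


context = collect_context("internal-docs", ["performance", "user story"])
prompt = f"Use the following internal context when answering:\n{context}\n\nQuestion: ..."
```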

    3. You need to describe your agent, but you really need to describe your agent.

When I made the AI agents, I started off with maybe 3 sentences describing what my AI agent should be and how it should act. Eventually they were almost a full A4 page long, which made them so much more specific and precise.

The nice part is that this is basically the description of your AI agent and not really the “prompt” itself. It’s a continuous journey of editing the description.
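
As an illustration of how the description grows, here is a rough before/after sketch; both versions are invented, and the “full” one is heavily abbreviated compared to the A4 page it ends up as.

```python
# First attempt: roughly the three-sentence description I started with (invented example).
SHORT_DESCRIPTION = (
    "You are a performance testing agent. "
    "You help design and review performance tests. "
    "Answer briefly."
)

# Later attempt: far more specific (abbreviated; the real thing approaches a full A4 page).
FULL_DESCRIPTION = """
You are a senior performance test engineer on an internal web project.
Context: the system under test is a REST API behind a load balancer, and peak load
happens during business hours.
When asked for a test design, always cover: workload model, ramp-up, think time,
test data preparation, differences between the test environment and production,
and pass/fail thresholds.
When reviewing results, flag anomalies, compare against the previous run, and list
follow-up questions for the developers instead of guessing at root causes.
Output format: a short summary first, then a numbered list of findings or steps.
Never invent numbers; if data is missing, say which data you need.
"""
```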

    4. You need to prompt

Asking "can you write me a test case for X or Y " is not going to cut it. You need to add a lot of detail to it. (You can make custom commands for this, so you don’t always have to re-write it :smiley: helped me a lot!)

    5. It’s not going to work right from the start

You need to train, train, train the model. It took me a long time to get decent outputs, but over time you can see it improve as you get better at prompting and describing what it needs to do.

Fun Part:

You can integrate AI agents with webhooks towards JIRA or Azure, and then you can teach them that the requirements are inside your user story, so you can create workflows between your AI agents.

Workflows?

When one agent reviews the user story, it can potentially add new acceptance criteria and, once that’s done, trigger another AI agent to write test cases, which could then trigger another agent to write automated test scripts, and so on. So many options.
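
A minimal sketch of such a hand-off, assuming a Jira webhook pointed at a small Flask service; the two agent functions are placeholders for whatever your agents actually do, and the payload fields follow the usual Jira issue-event shape.

```python
# Minimal webhook receiver that chains two hypothetical agents:
# review the user story, then draft test cases from the reviewed version.
from flask import Flask, request

app = Flask(__name__)


def review_story(summary: str, description: str) -> str:
    # Placeholder: call your "analysis" agent here and return the reviewed story.
    return f"{summary}\n{description}"


def draft_test_cases(reviewed_story: str) -> str:
    # Placeholder: call your "test case" agent here.
    return f"TODO: test cases for:\n{reviewed_story}"


@app.route("/jira-webhook", methods=["POST"])
def jira_webhook():
    payload = request.get_json(force=True)
    if payload.get("webhookEvent") != "jira:issue_updated":
        return "", 204
    fields = payload.get("issue", {}).get("fields", {})
    reviewed = review_story(fields.get("summary", ""), fields.get("description") or "")
    test_cases = draft_test_cases(reviewed)
    # A next step could post test_cases back to the issue, or trigger the automation agent.
    return {"status": "ok", "test_cases": test_cases}, 200
```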


In general, LLMs can support your testing by providing fresh insights on test scenarios and on how to perform the testing. If not that, it might simply be a check that you did not miss any obvious scenarios.

The negative aspect is relying on them too much; use it as a junior whose work you want to check, at least until you understand its strengths and weaknesses.

Regarding prompts, it’s most important that you give it enough context and clear instructions. Being clear about its role and what you expect really works. ‘You are an expert QA tester that writes test cases with clear steps and an expected result for each step’, for example. Or explain that you want BDD scenarios and the rules applying to those. Then give it enough context.
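
For example, a role-plus-context prompt along these lines; the feature and rules below are placeholders you would replace with your own context.

```python
# Sketch of a role + context prompt for BDD scenarios; the feature text is a placeholder.
ROLE = (
    "You are an expert QA tester. You write BDD scenarios in Gherkin "
    "(Given/When/Then), one behaviour per scenario, with concrete example data."
)

CONTEXT = """Feature under test: password reset via an e-mailed link.
Rules: the link expires after 24 hours; a used link cannot be reused;
the new password must meet the existing password policy."""

prompt = f"{ROLE}\n\n{CONTEXT}\n\nWrite the scenarios, and list any rules that look ambiguous."
```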

Honestly, there are enough tools out there that are optimized for test case writing, which I would rather use than copy-pasting these things into ChatGPT or whatever. For example, this Jira add-on creates the test case for you, and then you can adjust it to your needs. No need to write instructions or copy-paste the context into ChatGPT.

How to support gaps in your knowledge? Now that you have options to connect LLMs to the internet, it’s easy to gather more info with simple questions. Product-specific knowledge is a bit harder, since it requires a very well-documented product with sufficient coverage on the internet to be trained on. For testing best practices and the like, it works quite well to ask questions and let it browse the internet for the latest best practices.
