How would you use GenAI Pair Testing as part of your testing toolkit?

Inspired by a LinkedIn post from @maaret, where she raises the positives of GenAI Pair Testing:

'I'm manual tester, genAI tools aren't at all helpful to me' speaks of not understanding the core role of reflection/introspection to testing. We can both be surprised of things it 'knows' and hate-inspired to do better ourselves. Pair test with it. Works on your schedule.

  • What ideas do you have for GenAI Pair Testing?
  • What are the positive and negative aspects of this?
  • What kind of prompts would you write?
  • How would you use it to support gaps in your knowledge, skills or team?

LLM systems have become my pair testing buddy.
It's like they do the work and I review it and add some more... :slight_smile:

I can't imagine myself not using it these days.

Clear case of not knowing how to use LLM systems.
Compare it to a blender:

  • If you put ice, strawberries and water into a blender
  • and mix it,

you'll get a lovely smoothie.

  • If you put sh*t into the blender
  • and mix it,

it's still sh*t...


Using LLM systems without detailed prompts and proper training of the model is just going to give you bad results.

I used AI on an internal project to do my work.
These are the things I let it handle for me:

  • Create ToDo tasks
    • Workflows
    • RFC
    • Performance testing
    • ā€¦
  • The membrane
    • Ask it to think for you; example:
      • Ask what to keep in mind when performance testing
      • What did I forget?
  • Analyse the analysis
  • Write acceptance criteria & review acceptance criteria
  • Review requirements and adjust acceptance criteria
  • Create test cases & scenarios
    • Think about edge cases
  • Predict where developers' bugs will occur, based on:
    • Previous data
    • Often forgotten things
  • Create Dev Code
    • UI => IDs and locators
    • API => Specs & code
  • Create POM files (see the sketch after this list)
  • Create test scripts
    • UI & API & Unit
  • Create performance tests based on Flows from API tests
  • Create CI/CD
    • Create containers
    • Creating pipeline files
  • Reporting
    • Test case reviews
    • Pass/Fail
    • Performance Testing review
    • Make powerpoints
    • Make Graphs
    • Review multiple iterations
    • Write documentation
  • Monitoring & Alerting
    • Tell where it's going to be needed
    • Reviewing of Logs
    • Anomaly detection
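
To make the "Create POM files" item concrete, here is a minimal sketch of what such a generation prompt could look like. It assumes the OpenAI Python client; the page name, locators and model are invented examples, not anything from the original project.

```python
# Minimal sketch: asking an LLM to draft a Page Object Model class
# from a list of UI locators. The page, selectors and model name
# below are invented examples.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

locators = {
    "username_input": "#login-username",
    "password_input": "#login-password",
    "submit_button": "button[data-test='login-submit']",
}

prompt = (
    "Write a Python Selenium Page Object Model class named LoginPage.\n"
    "Expose one method per user action: enter username, enter password, submit.\n"
    f"Use these CSS selectors: {locators}\n"
    "Return only the code, no explanation."
)

response = client.chat.completions.create(
    model="gpt-4o",  # example model name; use whatever your team has access to
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```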

I wanted to do more but I left the company and now I have to restart XD

Was it perfect? Absolutely not in the beginning. I made a separate AI Agent for EACH role and each job (e.g. performance testing, automation, analysis, etc.).

At the beginning the output was bad, but that was because my prompting and training were bad. I trained each agent more and more with online content and internal content (careful that you don't share secrets).

After ~2 months the results were amazing. I felt like a dev lead who does 80% code review and writes 20% himself: I was the "AI Reviewer" for 80% and added the other 20% myself.

The main problems are often:

    1. You need mature content for your project, and almost nobody has that. You'll need to change a lot of structure internally around your user stories and way of working.

See it as the recipe for the smoothie.

    2. You need a very good, specific AI Agent that is trained on the things you specifically want for your project.

You want it to be fed with online YouTube videos, blogs, posts and internal content.

    3. You need to describe your Agent, and I mean really describe it.

When I made the AI Agents, I started off with maybe 3 sentences describing what each agent should be and how it should act. Eventually the descriptions were almost a full A4 page long, which made them so much more specific and accurate.

The nice part is that it's basically the description of your AI Agent, not really the "prompt" itself. Editing the description is a continuous journey.
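
To illustrate the difference, here is a hedged, abbreviated example of how such a description might grow from a few sentences into something much more specific; both versions are invented, not the original agents.

```python
# Abbreviated illustration of an agent description growing over time.
# Both versions are invented examples.

FIRST_ATTEMPT = (
    "You are a performance tester. You review test plans and write load test scripts."
)

LATER_VERSION = """\
You are a senior performance test engineer on a web platform team.
- You design load, stress and soak tests based on the API flows given to you.
- Before writing scripts, you always ask for expected throughput, SLAs and the target environment.
- You produce scripts with clear stages, thresholds and comments.
- You report results as observed vs. target latency, error rate, and a go/no-go recommendation.
- You flag every assumption you make and never invent production numbers.
"""
```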

    4. You need to prompt well

Asking "can you write me a test case for X or Y " is not going to cut it. You need to add a lot of detail to it. (You can make custom commands for this, so you donā€™t always have to re-write it :smiley: helped me a lot!)

    5. It's not going to work right from the start

You need to train, train, train the model. It took me a long time to get decent outputs, but over time you can see it improve as you get better at prompting and describing what it needs to do.

Fun Part:

You can integrate AI Agents with webhooks towards JIRA or Azure, and then you can teach them that the requirements are inside your user story, so you can create workflows between your AI Agents.

Workflows?

When one agent reviews the user story, it can potentially add new acceptance criteria and, once it's done, trigger another AI Agent to write test cases, which could then trigger another Agent to write automated test scripts, and so on... so many options.
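
A minimal sketch of what such a webhook-driven chain could look like, assuming Flask and a standard Jira issue-updated payload; the agent functions are hypothetical stubs standing in for your own AI agents.

```python
# Minimal sketch of chaining agents from a Jira webhook. Assumes Flask;
# review_story, write_test_cases and write_automation are hypothetical
# wrappers around your own AI agents, stubbed out here.
from flask import Flask, request, jsonify

app = Flask(__name__)

def review_story(story: str) -> str:         # agent 1 (stub): review/extend acceptance criteria
    raise NotImplementedError

def write_test_cases(criteria: str) -> str:  # agent 2 (stub): turn criteria into test cases
    raise NotImplementedError

def write_automation(cases: str) -> str:     # agent 3 (stub): draft automated test scripts
    raise NotImplementedError

@app.route("/jira-webhook", methods=["POST"])
def on_story_updated():
    payload = request.get_json()
    # The requirements live inside the user story description.
    story = payload["issue"]["fields"]["description"]

    criteria = review_story(story)
    test_cases = write_test_cases(criteria)
    scripts = write_automation(test_cases)

    return jsonify({
        "acceptance_criteria": criteria,
        "test_cases": test_cases,
        "scripts": scripts,
    })
```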


In general, LLMs can help your testing by providing fresh insights on test scenarios and on how to perform the testing. If not that, it might just be a simple check that you did not miss any obvious scenarios.

A negative aspect is relying on them too much; use it as a junior whose work you want to check, at least until you understand its strengths and weaknesses.

Regarding prompts, it's most important that you give it enough context and clear instructions. Being clear about its role and what you expect really works: "You are an expert QA Tester that writes test cases with clear steps and an expected result for each step", for example. Or explain that you want BDD scenarios and the rules that apply to those. Then give it enough context.
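
For instance, a role-plus-context prompt for BDD scenarios could be structured as a system message and a user message, roughly like this sketch; the feature and its rules are invented examples.

```python
# Sketch of a role-plus-context prompt for BDD scenarios.
# The feature and its rules are invented examples.
messages = [
    {
        "role": "system",
        "content": (
            "You are an expert QA tester. You write Gherkin scenarios using "
            "Given/When/Then, one behaviour per scenario, no UI details in the steps."
        ),
    },
    {
        "role": "user",
        "content": (
            "Feature: discount codes in the shopping cart.\n"
            "Rules: one code per order; expired codes are rejected; "
            "codes do not apply to gift cards.\n"
            "Write the BDD scenarios, including negative cases."
        ),
    },
]
```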

Honestly, there are plenty of tools out there that are optimized for test case writing, which I would use rather than copy-pasting these things into ChatGPT or whatever. For example, this Jira add-on creates the test case for you, and then you can adjust it to your needs. No need to write instructions or copy-paste the context into ChatGPT.

How to support gaps in your knowledge? Now that you have options to connect LLMs to the internet, it's easy to gather more info with simple questions. Product-specific knowledge is a bit harder, since it needs to be a very well documented product with sufficient coverage on the internet to be trained on. For testing best practices etc., it works quite well to ask questions and let it browse the internet for the latest best practices.


GenAI Pair Testing opens up exciting possibilities for testers by introducing a non-judgmental and ever-available collaborator. Here are some ideas and reflections inspired by Maaret's post:

Ideas for GenAI Pair Testing:

  1. Expanding Testing Perspectives: Use GenAI to challenge assumptions by asking it to provide alternative viewpoints or edge cases you may have overlooked.
  2. Validating Scenarios: Prompt GenAI to simulate user behaviors or generate test cases for unique scenarios, especially when exploring boundary conditions (see the example prompt after this list).
  3. Learning New Skills: Leverage its vast knowledge base to learn about unfamiliar testing tools, frameworks, or strategies in real-time.
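
For example, point 2 can start from a boundary-focused prompt as small as the sketch below; the field and its limits are invented examples.

```python
# Sketch of a boundary-focused pair-testing prompt for idea 2 above.
# The field and its limits are invented examples.
prompt = (
    "I'm testing a 'quantity' field that accepts integers from 1 to 99.\n"
    "List the boundary values and edge cases I should try, including any "
    "I may have overlooked (empty input, whitespace, 0, 100, negative "
    "numbers, non-numeric input), and explain the risk behind each."
)
```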

Positives of GenAI Pair Testing:

  • Creativity Boost: It provides fresh ideas or questions, acting as a catalyst for deeper exploration.
  • Non-Judgmental Feedback: Encourages freedom to ask questions that might feel "too basic" to ask a colleague.
  • Efficiency: It can handle repetitive tasks, allowing testers to focus on creative and critical thinking.

Challenges to Address:

  • Context Awareness: GenAI can lack the nuanced understanding of your specific product or domain.
  • Over-Reliance: The tool is supplementary and should not replace human intuition or expertise.
  • Bias Risks: Like any AI, it may reinforce biases inherent in its training data.

Coming to How It Can Fill Gaps:
When used wisely, GenAI enhances the manual testing process by encouraging feedback and debugging, while also helping to reach a higher percentage of code coverage. But the key lies in maintaining a balance between leveraging its strengths and applying critical human oversight.

At my current company, LLMs have become our testing buddies, although we use them after tuning them via prompt engineering and then just do a final review. Initially it wasn't perfect and the results were somewhat vague, but after a lot of prompt engineering, setting boundaries and filters seems to work for us.
Recently, we ended up integrating our AI tester into our main product Keploy as well, to maintain coverage and keep the tests relevant.
