Inspired by a post from Maaret on LinkedIn where she raises the positives of GenAI Pair Testing:
"I'm manual tester, genAI tools aren't at all helpful to me" speaks of not understanding the core role of reflection/introspection to testing. We can both be surprised of things it "knows" and hate-inspired to do better ourselves. Pair test with it. Works on your schedule.
- Review requirements and adjust acceptance criteria
- Create test cases & scenarios
- Think about edge cases
- Predict where developers' bugs will occur, based on:
  - Previous data
  - Often-forgotten things
- Create Dev Code
  - UI => IDs and locators
  - API => Specs & code
- Create POM Files
- Create test scripts (see the sketch after this list)
  - UI & API & Unit
- Create performance tests based on flows from API tests
- Create CI/CD
  - Create containers
  - Create pipeline files
- Reporting
  - Test case reviews
  - Pass/Fail
  - Performance Testing review
  - Make PowerPoints
  - Make Graphs
  - Review multiple iterations
- Write documentation
- Monitoring & Alerting
  - Tell where it's going to be needed
  - Review logs
  - Anomaly detection
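To make items like "Create POM Files" and "Create test scripts" concrete, here is a minimal sketch of the kind of output you might ask an agent to produce: a Page Object Model class plus a pytest UI test using Selenium. The page name, locators, and URL are hypothetical examples, not taken from any real project.

```python
# Minimal sketch of an agent-generated POM file and UI test script.
# LoginPage, the locators, and the URL are illustrative assumptions.
import pytest
from selenium import webdriver
from selenium.webdriver.common.by import By


class LoginPage:
    """Page Object Model for a hypothetical login page."""

    USERNAME = (By.ID, "username")
    PASSWORD = (By.ID, "password")
    SUBMIT = (By.ID, "login-submit")

    def __init__(self, driver):
        self.driver = driver

    def open(self, base_url):
        self.driver.get(f"{base_url}/login")

    def login(self, username, password):
        self.driver.find_element(*self.USERNAME).send_keys(username)
        self.driver.find_element(*self.PASSWORD).send_keys(password)
        self.driver.find_element(*self.SUBMIT).click()


@pytest.fixture
def driver():
    drv = webdriver.Chrome()
    yield drv
    drv.quit()


def test_valid_login_redirects_to_dashboard(driver):
    page = LoginPage(driver)
    page.open("https://example.test")
    page.login("demo-user", "demo-pass")
    assert "/dashboard" in driver.current_url
```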
I wanted to do more but I left the company and now I have to restart XD
Was it perfect? Absolutely not in the beginning. I made a separate AI agent for EACH role and each job (e.g. performance testing, automation, analysis, etc.).
At the beginning the output was bad, but that was because my prompting and training were bad. I trained each agent more and more with online content and internal content (be careful that you don't share secrets).
After ~2 months the results were amazing. I felt like a DevLead who did 80% code review and wrote 20% himself. I was the "AI Reviewer" for 80% and I added 20% myself.
The main problems are often:
You need mature content for your project, and almost nobody has that. You'll need to change a lot of your internal structure around your user stories and way of working.
See it as the recipe for the smoothie.
You need a very good, specific AI agent that is trained on the things you specifically want for your project.
You want it to be fed with online YouTube videos, blogs, posts, and internal content.
You need to describe your agent, and I mean really describe your agent.
When I made the AI agents, I started off with maybe 3 sentences on what my AI agent should be and how it should act. Eventually the descriptions were almost a full A4 page long, which made the output so much more specific and precise.
The pro part is that this is basically the description of your AI agent and not really the "prompt" itself. Editing the description is a continuous journey.
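For illustration, here is an abbreviated, invented example of what such an agent description might look like when kept as a reusable system prompt. The product, rules, and output format are assumptions for the sketch; real descriptions of this kind can run to almost a full A4 page.

```python
# Abbreviated, invented example of an agent description kept as a system prompt.
# In practice a description like this would be far longer and project-specific.
TEST_CASE_AGENT_DESCRIPTION = """
You are a senior QA engineer for a web-based ordering platform.

Rules:
- Write test cases with numbered steps and exactly one expected result per step.
- Always cover the happy path, negative input, and boundary values.
- Reference the acceptance criteria of the user story by ID.
- Never invent requirements; if information is missing, ask for it.
- Output a markdown table with columns: ID, Title, Steps, Expected Result, Priority.
"""
```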
You need to prompt
Asking "can you write me a test case for X or Y " is not going to cut it. You need to add a lot of detail to it. (You can make custom commands for this, so you donāt always have to re-write it helped me a lot!)
It's not going to work from the initial start
You need to train, train, train the model. It took me a long time to get decent outputs, but over time you can see it improve as you get better at prompting and describing what it needs to do.
Fun Part:
You can integrate AI agents via webhooks with JIRA or Azure, and then you can teach them that the requirements are inside your user story, so you can create workflows between your AI agents.
Workflows?
When one agent reviews the user story, it can potentially add new acceptance criteria, and once it's done, it can trigger another AI agent to write test cases, which could then trigger yet another agent to write automated test scripts, and so on. So many options.
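As a rough sketch of such a webhook-driven chain: the snippet below assumes a Jira-style "issue updated" webhook and uses placeholder functions for the individual agents. It only illustrates the wiring; the agent implementations and the payload fields are assumptions.

```python
# Rough sketch of chaining agents from a Jira-style webhook.
# The three agent functions are placeholders for whatever agent framework,
# API, or internal service you actually use.
from flask import Flask, jsonify, request

app = Flask(__name__)


def review_story(description: str) -> str:
    # Placeholder: agent 1 reviews the story and may add acceptance criteria.
    return description + "\n(acceptance criteria reviewed)"


def write_test_cases(reviewed_story: str) -> str:
    # Placeholder: agent 2 turns acceptance criteria into test cases.
    return "Test cases for:\n" + reviewed_story


def write_test_scripts(test_cases: str) -> str:
    # Placeholder: agent 3 turns test cases into automated test scripts.
    return "# pytest scripts generated from:\n# " + test_cases.replace("\n", "\n# ")


@app.route("/webhooks/issue-updated", methods=["POST"])
def on_issue_updated():
    payload = request.get_json(force=True)
    # Assumed payload shape: the user story text sits in issue.fields.description.
    story = payload.get("issue", {}).get("fields", {}).get("description", "")

    reviewed = review_story(story)        # agent 1
    cases = write_test_cases(reviewed)    # agent 2
    scripts = write_test_scripts(cases)   # agent 3

    return jsonify({"test_cases": cases, "scripts": scripts})


if __name__ == "__main__":
    app.run(port=5000)
```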
In general, LLMs can help you with testing by providing fresh insights into test scenarios and how to perform the testing. If not that, it might simply be a quick check that you did not miss any obvious scenarios.
The negative aspect is relying on them too much; treat one like a junior whose work you want to check, at least until you understand its strengths and weaknesses.
Regarding prompts, it's most important that you give it enough context and clear instructions. Being clear about its role and what you expect really works: "You are an expert QA tester that writes test cases with clear steps and, for each step, an expected result", for example. Or explain that you want BDD scenarios and the rules that apply to those. Then give it enough context.
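As a minimal sketch of that role-plus-context pattern, the snippet below sends a system prompt like the one quoted above through the OpenAI Python client; the model name and the user request are placeholder examples.

```python
# Minimal sketch: role instruction as a system prompt, task plus context as the
# user message. Model name and feature details are placeholder examples.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model; use whatever your team has access to
    messages=[
        {
            "role": "system",
            "content": (
                "You are an expert QA tester that writes test cases with clear "
                "steps and, for each step, an expected result."
            ),
        },
        {
            "role": "user",
            "content": (
                "Write BDD scenarios for the password-reset flow. "
                "Context: reset links expire after 1 hour and can be used only once."
            ),
        },
    ],
)

print(response.choices[0].message.content)
```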
Honestly, there are plenty of tools out there optimized for test case writing, which I would use rather than copy-pasting these things into ChatGPT or whatever. For example, this Jira add-on creates the test case for you, and then you can adjust it to your needs. No need to write instructions or copy-paste the context into ChatGPT.
How do you support gaps in your knowledge? Now that you have options to connect LLMs to the internet, it's easy to gather more info with simple questions. Product-specific knowledge is a bit harder, since it needs to be a very well documented product with sufficient coverage on the internet to have been trained on. For testing best practices and the like, it works quite well to ask questions and let it browse the internet for the latest best practices.
GenAI Pair Testing opens up exciting possibilities for testers by introducing a non-judgmental and ever-available collaborator. Here are some ideas and reflections inspired by Maaret's post:
Ideas for GenAI Pair Testing:
Expanding Testing Perspectives: Use GenAI to challenge assumptions by asking it to provide alternative viewpoints or edge cases you may have overlooked.
Validating Scenarios: Prompt GenAI to simulate user behaviors or generate test cases for unique scenarios, especially when exploring boundary conditions.
Learning New Skills: Leverage its vast knowledge base to learn about unfamiliar testing tools, frameworks, or strategies in real-time.
Positives of GenAI Pair Testing:
Creativity Boost: It provides fresh ideas or questions, acting as a catalyst for deeper exploration.
Non-Judgmental Feedback: Encourages freedom to ask questions that might feel "too basic" to ask a colleague.
Efficiency: It can handle repetitive tasks, allowing testers to focus on creative and critical thinking.
Challenges to Address:
Context Awareness: GenAI can lack the nuanced understanding of your specific product or domain.
Over-Reliance: The tool is supplementary and should not replace human intuition or expertise.
Bias Risks: Like any AI, it may reinforce biases inherent in its training data.
How It Can Fill Gaps:
When used wisely, GenAI enhances the manual testing process by encouraging feedback and debugging while helping you reach a higher percentage of code coverage. But the key lies in maintaining a balance between leveraging its strengths and applying critical human oversight.
At my current company, LLMs have become our testing buddy, although we use them after fine-tuning them via prompt engineering and then just do a final review. Initially it wasn't perfect and the results were somewhat vague, but after a lot of prompt engineering, setting boundaries and filters seems to work for us.
Recently, we also ended up integrating our AI tester into our main product, Keploy, to keep the coverage up and the tests relevant.