AI in QA - New ideas for my Project

Alright, this is going to be a long post, so get ready! A while back I ran a project to do everything QA-related with AI for 6 months. The first few months were absolute sh*t, but once my descriptions of the AI agents became PAGES instead of sentences, it was magnificent.

I’m not saying it’s the holy grail or that AI is the best, but I felt like a Dev Tech Lead, whose job consists of 80% code reviews and 20% writing stuff themselves. That was me… AI produced 80% of the output and I added the other 20%.

Remember, this project ran for about 6 months. At first it was probably the other way around: I added 80% and AI 20%. But over time, as the models learned more about my project and way of working, and my descriptions of the AI agents became longer and more specific, the output became more and more accurate.

It’s like a blender, put nice ingredients in it and you’ll get a smoothie.
Put poop in it, well, it’s still poop after mixing, just liquid :rofl:

I’m hoping to find new ideas and suggestions from you guys that I can try out or maybe even give you guys some ideas of what AI could be used for.

I have a new internal project at my current employer and I’m going to recreate the same process only bigger and better! I’ll explain what I did/have in mind.

So, everything in the schemas below:

  • Green is manual work
  • Blue is AI work; each blue block has its own AI agent, specifically trained for that purpose and that purpose ONLY.

Reminder: EVERYTHING is AUTOMATED and looped until the AI agent says “That’s good, I cannot think of anything new”. On top of that I still did my 20% job of adding things myself on each BLUE block (which then triggered the loop through the process again).
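Roughly, each of those loops boiled down to something like this (a simplified sketch; the `Agent` wrapper, the stop-phrase check and the iteration cap are just for illustration here, not our actual code):

```typescript
// Minimal sketch of one generate-and-review loop. An Agent here is just a
// function wrapping whatever LLM API you use, loaded with that agent's huge,
// purpose-specific description.
type Agent = (input: string) => Promise<string>;

const STOP_PHRASE = "That's good, I cannot think of anything new";

async function reviewLoop(
  producer: Agent,     // e.g. the "REQ & AC writer" agent
  reviewer: Agent,     // e.g. the reviewer / edge-case agent
  initialInput: string,
  maxRounds = 10       // safety cap so the loop always terminates
): Promise<string> {
  let artifact = await producer(initialInput);

  for (let round = 0; round < maxRounds; round++) {
    const feedback = await reviewer(artifact);
    if (feedback.includes(STOP_PHRASE)) break;   // reviewer is satisfied
    // Otherwise feed the critique back in and regenerate.
    artifact = await producer(`${artifact}\n\nReviewer feedback:\n${feedback}`);
  }
  return artifact;     // a human still adds the remaining ~20% on top
}
```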

Step 1:

We have a manual input of business & functional analysis.
After it’s submitted, the first AI agent reviews the analysis and produces questions for the analyst, to remove assumptions. At the same time, two other agents start producing:

  1. requirements and acceptance criteria
  2. After the REQ & AC are written, another agent reviews this output, adds new REQ & AC if required, and pushes them back towards the functional analysis, which is then looped into review again until all AI agents say “That’s good, I cannot think of anything new”.

Extra explanation: you can see the greyish background here, which basically means that all the AI agents inside it are linked to each other: they loop and refactor automatically based on an update in the previous state (manual or by AI). This goes for all the following schemas. We created scripts that took the output from the AI and added it to the analysis in this picture.
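To give an idea of what “an individual AI agent per blue block” means in practice, each agent basically had a definition along these lines (the field names are made up for this post; the real descriptions were literally pages of project-specific context):

```typescript
// Hypothetical shape of one single-purpose agent definition. The field names
// are invented for this example; the key point is the systemPrompt growing
// into pages of project-specific context over time.
interface AgentDefinition {
  name: string;          // e.g. "Requirements & AC writer"
  purpose: string;       // the ONE thing this agent is allowed to do
  systemPrompt: string;  // pages of conventions, examples, domain context
  inputs: string[];      // artifacts it reads
  outputs: string[];     // artifacts it may produce or update
  stopPhrase: string;    // "That's good, I cannot think of anything new"
}

const reqAndAcWriter: AgentDefinition = {
  name: "Requirements & AC writer",
  purpose: "Derive requirements and acceptance criteria from the functional analysis",
  systemPrompt: "<pages of project-specific instructions and examples>",
  inputs: ["business analysis", "functional analysis", "analyst answers"],
  outputs: ["requirements", "acceptance criteria"],
  stopPhrase: "That's good, I cannot think of anything new",
};
```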

Step 2:

Based on those requirements, the analysis and the acceptance criteria, my agents create test scenarios/cases/steps. I had issues with coverage here, so I had to create another AI agent which specifically thought of edge cases.
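For reference, the agents emitted something roughly shaped like this, with each case tagged against the REQ/AC it covers so gaps and missing edge cases could be spotted automatically (the field names are illustrative, not our real schema):

```typescript
// Illustrative structure for generated scenarios/cases/steps. Tagging each
// case with the REQ/AC ids it covers is what makes the coverage check and
// the edge-case review possible later on.
interface TestStep {
  action: string;
  expectedResult: string;
}

interface TestCase {
  id: string;
  title: string;
  covers: string[];                   // REQ / AC identifiers it traces back to
  kind: "happy-path" | "edge-case";   // the Edge Case Agent fills the gaps here
  steps: TestStep[];
}

interface TestScenario {
  id: string;
  title: string;
  cases: TestCase[];
}
```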

Step 3:

This was only added later in the project, once we had enough data from JIRA/developers. We also made mistakes on purpose: say developer X often forgot to add ‘required fields’, it would flag this as a “risk” in its output and advise on it.

The advice would be: give this user story to developer Y to produce the fewest bugs.
Or: if developer X picks up this ticket, don’t forget to check this, or don’t forget to add required fields. (If I remember correctly, it was about 82% accurate.)
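Conceptually the risk part boils down to counting patterns in resolved bugs per developer and turning the frequent ones into advice, something like this sketch (the labels, threshold and wording are made up; the real agent worked off much richer JIRA data):

```typescript
// Rough sketch of the "risk advisory" idea, assuming you can export resolved
// bugs from JIRA with an assignee and a root-cause label. Illustrative only.
interface BugRecord {
  assignee: string;
  rootCause: string;   // e.g. "missing-required-fields"
}

function riskAdvisories(bugs: BugRecord[], threshold = 3): string[] {
  // Count how often each developer / root-cause pair shows up.
  const counts = new Map<string, number>();
  for (const bug of bugs) {
    const key = `${bug.assignee}|${bug.rootCause}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  const advisories: string[] = [];
  for (const [key, count] of counts) {
    if (count < threshold) continue;
    const [assignee, rootCause] = key.split("|");
    advisories.push(
      `Risk: ${assignee} has ${count} bugs tagged "${rootCause}". ` +
      `If ${assignee} picks up this ticket, explicitly check for "${rootCause}".`
    );
  }
  return advisories;
}
```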

Step 4:

We made it generate code for the devs according to guidelines & standards, and already name locators, add IDs for automation, API specs, etc. Which was really great! It was reviewed in a loop as well.
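The “name locators / add IDs” part meant the generated code shipped with stable test IDs plus a matching locator map that the automation in step 5a could import directly. A tiny, made-up example of what that output could look like:

```typescript
// Hedged example of generated locator output: data-testid selectors collected
// in one map so the page objects never have to guess. Names are hypothetical.
export const checkoutLocators = {
  emailInput: '[data-testid="checkout-email"]',
  requiredFieldError: '[data-testid="checkout-required-error"]',
  submitButton: '[data-testid="checkout-submit"]',
} as const;

export type CheckoutLocator = keyof typeof checkoutLocators;
```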

Step 5a:

Based on step 4 we were able to create POM files for automation, create test scripts with Cypress (back in the day), and review test coverage (compared to the REQ & AC).

There was also a trigger: if the code changed, the POM file agent validated it and possibly created more content, triggering the rest of the agents again, etc.
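As a rough idea of what came out of this step (the page, fields and assertions below are invented, only the Cypress calls are real), a generated page object plus a spec looked along these lines, with the spec title traceable back to a specific AC for the coverage comparison:

```typescript
/// <reference types="cypress" />
// Sketch of a generated page object plus a spec that uses it, reusing the
// locator map from step 4. Everything domain-specific here is hypothetical.
import { checkoutLocators } from "./checkout.locators";

export class CheckoutPage {
  visit() {
    cy.visit("/checkout");
  }

  fillEmail(email: string) {
    cy.get(checkoutLocators.emailInput).clear().type(email);
  }

  submit() {
    cy.get(checkoutLocators.submitButton).click();
  }

  assertRequiredFieldError() {
    cy.get(checkoutLocators.requiredFieldError).should("be.visible");
  }
}

// checkout.cy.ts -- the describe title maps back to an AC for the coverage check
describe("Checkout (AC-12: email is required)", () => {
  const page = new CheckoutPage();

  it("shows a required-field error when email is left empty", () => {
    page.visit();
    page.submit();
    page.assertRequiredFieldError();
  });
});
```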

Step 5b:

In parallel we had an AI agent creating performance testing scripts in JMeter. This was “hard” to get right; its description was probably the longest, because our analysis needed a dedicated performance testing section with all the proper requirements.

Step 6:

I don’t have to explain this much, I suppose: we created files/scripts for Azure DevOps pipelines.
This was very accurate, almost 100%, since over time our description of the agent grew to pages long :stuck_out_tongue:

Step 7:

The reporting section! There is still a lot to improve here imho. It ran whenever the automated OR manual tests were executed. (Automated results were fetched from Azure DevOps; manual ones still had to be uploaded by hand, I hadn’t finished that part yet.)

It analysed things like:

  • Repeating failing tests and advised on it
  • Noted down pass/fail %'s
  • Spotted flaky tests (that we made on purpose)

For performance testing it did the same: it found bottlenecks etc. and advised on how to approach debugging them. Same for anomaly detection.
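The analysis itself is not rocket science; stripped down, the pass/fail and flaky-test part is basically this kind of aggregation over the run history (field names and the naive flakiness rule are simplifications for this post):

```typescript
// Toy version of the reporting analysis: pass rate per test, a flag for tests
// that keep failing, and a naive "flaky" flag for mixed outcomes across runs.
interface TestResult {
  testName: string;
  outcome: "passed" | "failed";
}

function analyseRuns(runs: TestResult[][]) {
  const stats = new Map<string, { passed: number; failed: number }>();

  for (const run of runs) {
    for (const { testName, outcome } of run) {
      const s = stats.get(testName) ?? { passed: 0, failed: 0 };
      if (outcome === "passed") s.passed++;
      else s.failed++;
      stats.set(testName, s);
    }
  }

  return [...stats.entries()].map(([testName, { passed, failed }]) => ({
    testName,
    passRate: passed / (passed + failed),
    repeatedlyFailing: failed >= 3 && passed === 0,
    flaky: passed > 0 && failed > 0,   // naive: both passed and failed recently
  }));
}
```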

We don’t like reporting, so we thought: let’s make a PowerPoint :rofl:
So we had an agent to create graphs specifically for this,
which were fed to another agent that created the PowerPoints (but they were ugly af).

Step 8:

This is a “side project” where we had agents for our “shift right”.
They scanned logs, activity, etc. and reported on it.

For example, when we launched a vulnerability scan with loads of attacks, it would trigger alerts that a hacker was present and required attention.

Anomaly detection was, for example: when people (me, duh) would bypass a step in a wizard/flow via the API, it would detect that this wasn’t the correct flow and report that people were “trying to cheat”.
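Stripped of all the log plumbing, that check is essentially “did this session reach a step before its prerequisites?”, something like the toy version below (the step names and log shape are invented):

```typescript
// Toy flow-bypass check: given the expected wizard order and the steps a
// session actually hit, flag any step reached before its prerequisites.
const expectedOrder = ["start", "details", "payment", "confirmation"];

function detectBypassedSteps(sessionSteps: string[]): string[] {
  const seen = new Set<string>();
  const anomalies: string[] = [];

  for (const step of sessionSteps) {
    const index = expectedOrder.indexOf(step);
    if (index === -1) continue;   // unknown step, ignored in this toy version

    const missingPrerequisites = expectedOrder
      .slice(0, index)
      .filter((previous) => !seen.has(previous));

    if (missingPrerequisites.length > 0) {
      anomalies.push(
        `Step "${step}" reached without: ${missingPrerequisites.join(", ")}`
      );
    }
    seen.add(step);
  }
  return anomalies;
}

// e.g. someone hitting the payment API straight after "start":
// detectBypassedSteps(["start", "payment"])
// -> ['Step "payment" reached without: details']
```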

This wasn’t too mature either; I wanted to add way more to the log reviews etc.
BUT the nicest part of this was that if bugs were detected in production, it would automatically create test cases for them if they had been missed! (That was the initial goal.)


So this is basically what I did with AI at a client. I don’t have the IP or access anymore :confused: but I’m going to recreate it for an internal project at my current employer.

PS: I didn’t set this up all alone, duh! I had help from a data guy who created all these agents for me; together we created the linking process and I provided the training data (internal documents, videos, blogs, etc.).

It was all run on a server, not in the cloud.

What I’m looking for is new ideas to add onto it and try out for “fun”, since it’s an internal project and we are trying to discover what AI can do. I’m now aware that it will probably require at least 6 months to get some decent results, so I have patience :slight_smile:

Things I want to add already:

  • Threat Modeling
  • Security - SAST
  • Add BDD
  • Capacity planning for testers

Thanks for reading and hope you got some ideas or might give me some ideas!


You’ve taken AI in testing to a whole new level :sweat_smile:


Haha of course :stuck_out_tongue:
I never stop at the basics. I wanted to see what it was capable of doing.

Because most people who use LLM systems like ChatGPT put in one prompt and go “meh, I don’t like the answer”, but when you develop your own agent and create a HUGE prompt, it becomes really accurate. Still not 100%, but that’s why I felt like a dev lead doing 80% reviews & 20% writing myself.

Who would I be if I didn’t try to reach the limits! :smiley:
I really like doing innovation ^^


What you built back then covers, in my case, every single thing we do in our sprints.
I’ve seen teams struggle with use cases and do-ables though. So your next agent could factor in business requirement changes based on some workload threshold. We often drop a use case or two if it doesn’t work out during development and becomes too costly in time.


Speaking of training your own agent, how would you train one to learn the scope and context of a website when there’s no documentation for it?
For now I’ve found that giving it massive prompts and screenshots helps generate training files for the other agent that is being trained.

Well, we had documentation XD but without documentation and analysis to feed to the agents, it’s not that easy.

What we did was give it user stories, and per user story it went through this flow.
Each user story had a description, requirements, acceptance criteria, performance requirements, etc., and then we gave it the written documentation (Confluence) and the Figma screens too.

I can say it was an already “mature” user story.

I don’t know how you would do it without documentation; probably the same way, only you’ll have a much harder time getting the right output.

and which AI agent did you use for this? :sweat_smile:

The one we built ourselves :stuck_out_tongue:
It’s the “analyse analysis” agent; it reviewed the US + docs.

@kristof This is awesome. I am facing 2 issues when I try to use AI to generate test cases:

  1. How to say what % of test coverage the generated test cases give; and once test cases are generated, manual intervention is still required to review them
  2. When prompting the AI to generate test cases multiple times, the number varied: in some instances it produced 10 cases, while in others only 8

Aaah yes, exactly! This is one of my struggle points too, and not just with test case generation but also when creating POM files, performance scripts, etc.

That’s why I had to create the “Reviewer Agent” for each component.

  1. So it first creates scenarios with test cases/steps.

  2. Then a new AI, the “Edge Case Agent”, basically asks about those results: “Did you forget anything? Did you think about edge cases?” The first few times it will say “yes” quite often.

  3. Then it loops back to step 1 to create extra scenarios/cases/steps.

  4. Then it goes back to the “Edge Case Agent” and repeats the question.
    This happens for as long as there is a change in “create test scenarios” OR you manually update the test scenarios.

Again AI is not the holy grail and I still had to add 20% myself.

Test coverage can not be 100%, that’s just testing. What we did do is calculate the “test coverage based on requirements & acceptance criteria”: we compared the REQ & AC to all the test cases, and if one was missing you wouldn’t have “100% ‘test’ coverage”. We also did this for automation.

So after creating POM files & test scripts, we compared the test cases with the “automated test cases” to see if we had covered them all. If not… it would loop back.
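The calculation itself is dead simple once every test case is tagged with the REQ/AC ids it covers; a minimal version looks like this (the shapes are assumptions for illustration, not our actual tooling):

```typescript
// Minimal "coverage based on REQ & AC" check: every requirement/AC id should
// be referenced by at least one (manual or automated) test case.
interface Requirement { id: string }
interface TestCaseRef { covers: string[] }   // REQ / AC ids the case traces to

function requirementCoverage(reqs: Requirement[], cases: TestCaseRef[]) {
  const covered = new Set(cases.flatMap((c) => c.covers));
  const missing = reqs.filter((r) => !covered.has(r.id)).map((r) => r.id);

  return {
    coveragePercent: reqs.length === 0
      ? 100
      : ((reqs.length - missing.length) / reqs.length) * 100,
    missing,   // anything here sends the loop back to test case generation
  };
}
```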
