Alright, this is going to be a long post, so get ready! In the past I ran a project to do everything QA-related with AI for 6 months. The first few months were absolute sh*t, but once my descriptions of the AI agents became PAGES instead of sentences, it was magnificent.
I’m not saying it’s the holy grail or that AI is the best, but I felt like a Dev Tech Lead, whose job consists of 80% code reviews and 20% writing stuff themselves. That was me… AI produced 80% of the output and I added the other 20%.
Remember, this project ran for ~6 months. At first it was probably the other way around (I added 80% and AI 20%), but over time, as the models learned more about my project and my way of working and my agent descriptions became larger and more specific, the output became more and more accurate.
It’s like a blender: put nice ingredients in and you’ll get a smoothie.
Put poop in, and it’s still poop after mixing, just liquid.
I’m hoping to get new ideas and suggestions from you guys that I can try out, and maybe give some of you ideas of what AI could be used for in return.
I have a new internal project at my current employer and I’m going to recreate the same process only bigger and better! I’ll explain what I did/have in mind.
So, for everything on the schemas below:
- Green is manual work
- Blue is AI work; each blue block has its own AI agent that is trained for that purpose and that purpose ONLY.
Reminder: EVERYTHING is AUTOMATED and looped until the AI agent says “That’s good, I cannot think of anything new”. On top of that I still did my 20% job of adding things myself on each BLUE block (which would then loop through the process again).
Step 1:
We have a manual input of business & functional analysis.
After it’s submitted, the first AI agent reviews the analysis and produces questions for the analyst to remove assumptions. At the same time, two other agents start producing:
- requirements and acceptance criteria
- After the REQ & AC are written, another agent reviews that output, adds new REQ & AC if needed, and pushes them back into the functional analysis, which is looped into a review again until all AI agents say “That’s good, I cannot think of anything new”.
Extra explanation: you can see the greyish background here. It basically means that all the AI agents within that background are linked to each other, so they loop and refactor automatically based on an update in the previous state (manual or by AI). This applies to all the following schemas. We created scripts that took the output from the AI and added it to the analysis in this picture.
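To give you an idea of the loop those scripts drove, here is a minimal sketch (not my actual code; `callAgent` and the agent name are stand-ins for whatever wraps your LLM backend):

```typescript
// Minimal sketch of the “loop until nothing new” pattern. callAgent is a
// placeholder for whatever wraps the LLM backend running on your server.
type CallAgent = (agent: string, input: string) => Promise<string>;

const DONE = "That's good, I cannot think of anything new";

async function reviewLoop(
  callAgent: CallAgent,
  analysis: string,
  maxRounds = 10,
): Promise<string> {
  for (let round = 0; round < maxRounds; round++) {
    // The reviewer agent always sees the current state of the analysis + REQ & AC.
    const reviewerOutput = await callAgent("reqAcReviewer", analysis);
    if (reviewerOutput.trim() === DONE) {
      return analysis; // converged: nothing new to add, stop looping
    }
    // Otherwise append the new output and send it through the review again.
    analysis += "\n" + reviewerOutput;
  }
  return analysis; // safety cap so a chatty agent can't loop forever
}
```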
Step 2:
Based on those requirements, the analysis and the acceptance criteria, my agents create test scenarios/cases/steps. I had coverage issues here, so I had to create another AI agent that specifically thought of edge cases.
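To show the “one purpose ONLY” idea: the edge-case agent only got the REQ & AC plus the scenarios that already existed, and its instructions were limited to edge cases. Something in this spirit (heavily simplified; the real description was pages long):

```typescript
// Hypothetical, stripped-down prompt for the edge-case-only agent.
function buildEdgeCasePrompt(requirements: string, existingScenarios: string): string {
  return [
    "You are a test analyst. Your ONLY job is to find edge cases.",
    "Do NOT repeat scenarios that are already listed below.",
    "Think about boundaries, empty/max inputs, concurrency and invalid state transitions.",
    `Requirements and acceptance criteria:\n${requirements}`,
    `Existing test scenarios:\n${existingScenarios}`,
    "If you cannot think of anything new, answer exactly:",
    "\"That's good, I cannot think of anything new\"",
  ].join("\n\n");
}
```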
Step 3:
This was only added later in the project, once we had enough data from JIRA/developers. We also made mistakes on purpose to test it: say developer X often forgot to add ‘required fields’, the agent would flag this as a “risk” in its output and advise on it.
The advice would be: give this user story to developer Y to produce the fewest bugs.
Or: if developer X picks up this ticket, don’t forget to check this, or don’t forget to add required fields. (If I remember correctly, it was about 82% accurate.)
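The reasoning happened inside the agent, but the core heuristic behind it is simple enough to sketch (made-up field names, assuming the JIRA data has already been flattened into developer + bug-category records):

```typescript
// Made-up shape for bug data flattened from a JIRA export.
interface BugRecord {
  developer: string;
  category: string; // e.g. "missing required fields"
}

// Count how often each developer introduced bugs of a given category and
// turn that into the kind of advice the agent gave.
function riskAdvice(bugs: BugRecord[], category: string): string[] {
  const counts = new Map<string, number>();
  for (const bug of bugs) {
    if (bug.category !== category) continue;
    counts.set(bug.developer, (counts.get(bug.developer) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most offences first
    .map(
      ([dev, n]) =>
        `${dev}: ${n} past "${category}" bugs - if they pick this ticket up, add it to the review checklist`,
    );
}
```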
Step 4:
We made it generate code for the devs according to our guidelines & standards, and it already named locators, added IDs for automation, API specs, etc. Which was really great! It was reviewed in a loop.
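For example (made-up IDs, just to illustrate the convention): every screen got a stable locator map that the devs wired into the UI and the automation agents reused later on.

```typescript
// Hypothetical locator map for a login screen; the IDs are invented, the
// point is that devs and test automation share one source of truth.
export const LoginLocators = {
  form: "[data-cy=login-form]",
  username: "[data-cy=login-username]",
  password: "[data-cy=login-password]",
  submit: "[data-cy=login-submit]",
} as const;
```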
Step 5a:
Based on step 4 we were able to create POM files for automation, create test scripts (with Cypress back in those days) and review test coverage against the REQ & AC.
There was also a trigger: if the code changed, the POM-files agent validated it and possibly created more content, which triggered the rest of the agents again, etc. etc. …
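A simplified example of what came out of that, reusing the hypothetical locator map from step 4 (the import path and REQ-042 are made up; the requirement ID in the spec title is what the coverage agent compared against the REQ & AC):

```typescript
// Sketch of a generated POM file plus a Cypress spec that uses it.
import { LoginLocators } from "./login.locators";

export class LoginPage {
  visit() {
    cy.visit("/login");
  }
  loginAs(user: string, pass: string) {
    cy.get(LoginLocators.username).type(user);
    cy.get(LoginLocators.password).type(pass);
    cy.get(LoginLocators.submit).click();
  }
}

// The coverage agent compared spec titles like this one against the REQ & AC.
describe("REQ-042: registered user can log in", () => {
  it("logs in with valid credentials and lands on the dashboard", () => {
    const page = new LoginPage();
    page.visit();
    page.loginAs("qa-user", "s3cret");
    cy.url().should("include", "/dashboard");
  });
});
```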
Step 5b:
In parallel we had an AI agent creating performance testing scripts in JMeter. That one was “hard” to get right; its description was probably the longest, because our analysis needed a dedicated performance testing section with all the proper requirements.
Step 6:
I don’t have to explain this one much, I suppose: we created files/scripts for Azure DevOps pipelines.
This was very accurate, almost 100%, since over time our description of the agent had grown to pages long.
Step 7:
The reporting section! There is still a lot to improve here imho. It kicked in whenever the automated tests were executed OR the manual tests. (Automated results were fetched from Azure DevOps; manual ones still had to be uploaded by hand, I didn’t finish that part yet.)
It analysed things like:
- Repeatedly failing tests, and advised on them
- Pass/fail %'s
- Flaky tests (which we made on purpose)
For performance testing it did the same: it found bottlenecks etc. and advised on how to approach debugging them. Same for anomaly detection.
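The flaky-test part boils down to a simple check on the run history pulled from Azure DevOps: a test that both passed and failed across the recent runs gets flagged. Roughly like this (simplified, field names made up):

```typescript
// Rough version of the flaky-test check; each inner array is one test run.
interface RunResult {
  test: string;
  outcome: "Passed" | "Failed";
}

function findFlakyTests(history: RunResult[][]): string[] {
  const outcomes = new Map<string, Set<string>>();
  for (const run of history) {
    for (const result of run) {
      if (!outcomes.has(result.test)) outcomes.set(result.test, new Set());
      outcomes.get(result.test)!.add(result.outcome);
    }
  }
  // “Flaky” = observed both passing and failing across the recent runs.
  return [...outcomes.entries()]
    .filter(([, seen]) => seen.has("Passed") && seen.has("Failed"))
    .map(([test]) => test);
}
```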
We don’t like reporting, so we thought: let’s make a PowerPoint.
So we had an agent that created graphs specifically for this.
Those were fed to the agent that created the PowerPoints (but they were ugly af).
Step 8:
This is a “side project” where we set up agents for our “shift right” activities.
They scanned logs, activity etc. and reported on it.
For example, when we launched a vulnerability scan with loads of attacks, it would trigger alerts that a hacker was present and required attention.
Anomaly detection was, for example: when people (me, duh) bypassed a step in a wizard/flow via the API, it would detect that this wasn’t the correct flow and report that people were “trying to cheat”.
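The rule behind that one is easy to sketch (endpoint names and log shape are made up; the real thing looked at a lot more context): the wizard’s steps have to be hit in order per session, and hitting a later step without its predecessor gets reported.

```typescript
// Simplified flow-bypass rule; endpoints and log shape are invented.
interface LogEntry {
  sessionId: string;
  endpoint: string;
}

const WIZARD_ORDER = ["/wizard/step1", "/wizard/step2", "/wizard/step3"];

function findFlowBypasses(logs: LogEntry[]): string[] {
  const seenPerSession = new Map<string, Set<string>>();
  const alerts: string[] = [];
  for (const entry of logs) {
    const visited = seenPerSession.get(entry.sessionId) ?? new Set<string>();
    seenPerSession.set(entry.sessionId, visited);
    const step = WIZARD_ORDER.indexOf(entry.endpoint);
    // A later step reached without its predecessor means the flow was bypassed.
    if (step > 0 && !visited.has(WIZARD_ORDER[step - 1])) {
      alerts.push(
        `Session ${entry.sessionId} hit ${entry.endpoint} without completing ${WIZARD_ORDER[step - 1]}`,
      );
    }
    visited.add(entry.endpoint);
  }
  return alerts;
}
```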
This part wasn’t too mature either; I wanted to add way more to the log reviews etc.
BUT the nicest part was that if bugs were detected in production, it would automatically create test cases for them if they had been forgotten! (That was the initial goal.)
So this is basically what I did with AI at a client. I don’t own the IP and don’t have access anymore,
but I’m going to recreate it for an internal project at my current employer.
PS: I didn’t set this up all alone, duh! I had help from a data guy who built all these agents for me; together we created the linking process, and I provided the training data (internal documents, videos, blogs, etc.).
It was all run on a server, not in the cloud.
What I’m looking for is new ideas to add to it and try out for “fun”, since it’s an internal project and we’re trying to discover what AI can do. I’m now aware that it will probably take at least 6 months to get decent results, so I have patience.
Things I want to add already:
- Threat Modeling
- Security - SAST
- Add BDD
- I also want to add some capacity planning for testers
Thanks for reading and hope you got some ideas or might give me some ideas!









