Has anyone used an AI agent as a tool to help guide them to areas where they may want to do deeper exploratory testing?

Asking on behalf of @techgirl1908 and @aclairefication :smiley:

2 Likes

We’ve actually been exploring this idea in our tool — using AI not to replace exploratory testing, but to supercharge it, especially during things like bug bashes.
The way we see it, AI can throw out a bunch of interesting scenarios, “what if?” questions, or even oddball challenges that testers might not think of right away. It’s not about the AI doing the exploration for you — it’s about sparking your thinking, breaking you out of obvious paths, and giving you new angles to investigate.
Kind of like having a really curious teammate who’s constantly asking, “But what if the user does this weird thing?” or “What happens if the server hiccups here?”emphasised text

From one of our prototypes:

3 Likes

I struggle to see how this would be useful. In 30 years of exploratory testing, I have literally never run out of ideas for tests on any of the hundreds of projects I have worked on. I always run out of time first.

If testers run out of ideas, it’s because of the almost total absence of good exploratory testing training. Apart from the AST, BBST and RST courses, there’s pretty much nothing. I developed a pretty good 4-day course but I don’t deliver it anymore.

The result is that the quality of exploratory testing is pretty dismal. We trained a handful of seriously good testers, but I was always astoundingly unimpressed with the quality of contractors and interview candidates.

But the solution isn’t to use AI as a crutch because it can only provide some ideas, most of which won’t be useful. The tester still needs to do the iterative cycle of test design, test execution and learning, and AI won’t help them get better at that. The solution is for testers to take their personal development more seriously. Go on the courses, read the books and learn to read code.

5 Likes

I think I agree. A responsible tester can use those tools, ethical concerns about data sourcing and energy usage aside, if they are indeed informed and responsible. Your point about the quality of testing being dismal... well, yes, and I suppose that directs me to a similar point.

My worry about any tool is that it serves as a replacement for capability and thought. Testing software is centred on human beings, because software, users and the business exist in a series of complex and complicated social systems. We evaluate based on information specific to a company, user base, tech stack, team, global economic situation, competitor behaviour, whatever we care to know about. I can ask AI for ideas on how to test something, but it would be devoid of the context of my software and users, and I can only spend so long explaining what those things are and why they might matter to a tool. That accumulated knowledge, tacit or otherwise, is what makes a tester so powerful. Combined with practised skill, this is what makes humans such powerful explorers, and the use of tools is guided by that same knowledge and skill.

It’s quite insidious, because AI can generate ideas. And testers can generate ideas. So they look the same. But testing happens within testers and their internal structures and models, and should leverage a lot of understanding about the needs of test clients and a myriad of other contextual factors that an AI simply does not have access to. The goal would be to find factors it has access to that we do not, and use it that way. Perhaps simply as a safety net, in the same way a checklist might be used.

Testing is a social science, but a lot of the material written about testing suggests that it's mechanical, procedural and formalised. That means AI systems trained on the vast fields of nonsense written about what testing is and what it can do seem likely to repeat that nonsense. If an AI suggests that I use faulty metrics (and there's a lot written about how to employ metrics that don't work) then it's giving me terrible advice. So it requires a certain amount of understanding to be able to reject the tool when it gives us faulty or damaging ideas - making it more of a tool for experts, perhaps.

I’d look to helpful lists if I wanted to generate ideas. The HTSM (Heuristic Test Strategy Model) is a good way to add to ideas that the software itself brings up as we explore it, for one example. I’m not sure how AI fits in here - without knowing the software it can’t define ideas by product area, so they’d have to be more generic. It could help to create connections between facts about the software and testing concepts - so if I told it that I was creating something for a specific domain, it could give me facts about that domain related to testing it. As we adjust the testability of the software by accumulating knowledge, an AI could be part of finding and collating those facts - although any specific, important facts would have to be checked.

I think AI is likely best used as a slightly uninformed brainstorming partner for specific cases. By feeding in context about who is testing, what is being tested, what risks you’re testing for, how you want to go about it, and/or the oracles against which you are evaluating, you may be able to find connections to other concerns, or scenarios that make sense. It could flesh out techniques, help find holes in coverage, and so on. Once we have those, a tester can decide how to apply resources and whether they’re worthy of concern. Whether or not that’s worth the cost is another question.
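
To make that concrete, here’s a rough sketch of the kind of thing I have in mind - feeding a charter’s worth of context to a general-purpose LLM and asking it for idea prompts. This is only an illustration, not anything from the tools mentioned above; the model name, the product and the context values are all placeholders:

```python
# Rough sketch only: ask an LLM for exploratory testing prompts, given some context.
# Assumes the official `openai` Python package (v1+) and an OPENAI_API_KEY in the
# environment; the model name, product and context values below are placeholders.
from openai import OpenAI

client = OpenAI()

context = {
    "product": "B2B invoicing web app (React front end, REST API, Postgres)",
    "tester": "experienced exploratory tester, new to this domain",
    "risks": ["currency rounding", "concurrent edits", "permission boundaries"],
    "oracles": ["local tax rules", "behaviour of the previous release"],
}

prompt = (
    "Act as a brainstorming partner for exploratory testing.\n"
    f"Product: {context['product']}\n"
    f"Tester: {context['tester']}\n"
    f"Known risks: {', '.join(context['risks'])}\n"
    f"Oracles: {', '.join(context['oracles'])}\n"
    "Suggest ten 'what if' questions or charters I might not have considered, "
    "and note which risk or oracle each one relates to."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)

# The output is only a list of candidate ideas; judging their relevance, cost and
# coverage against the real product remains the tester's job.
print(response.choices[0].message.content)
```

Everything it returns is raw material at best - the tester still decides which ideas connect to real risk and which are noise.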

3 Likes

That’s an astoundingly good analysis.

3 Likes

I’m trying really hard not to be a hater here, but why would someone trust that your tool is worthwhile when you couldn’t even be bothered to write your comment yourself? It is painfully obvious you used an LLM for your entire comment…

AI can be a practical tool, but it becomes extremely worrying when people outsource even the most basic writing tasks to it.

I just checked the comment on GPTZero and it came back as human. It felt human to me too; I wonder what I’m missing.

What parts make it feel LLM created?

Two markers stood out to me - first, the use of em dashes. Yes, people do use them in their writing, but most people don’t know how to type them from the keyboard. LLMs, however, use them very liberally. This is a known marker of LLM output at the moment, though I can see the various providers putting in patches to remove this “tell” that content is from AI, so it may disappear soon enough.

Second was a later paragraph that fell into a very common LLM pattern: “It’s not about X, it’s about Y”. Again, of course this is something a human might write, but the combination of the two flags stood out to me.

Oh, and there’s the “emphasised text” left at the end of the final paragraph - I’m not sure whether that was an artifact of the MOT formatting options in the posting field breaking by accident, or a leftover instruction that wasn’t processed properly.

Other posts from the user also follow the current LLM flavour - lots of lists with emojis for dot points. Once you’ve seen it a couple of times, it stands out like a beacon.

Look, I could be completely wrong, but I don’t think I am. And I think the ongoing outsourcing of our creativity is worth calling out from a quality perspective - as a potential customer, I am instantly turned away when a product looks like it’s being presented (and thus, potentially put partially together) via AI. It doesn’t matter how good the end product is if the marketing steers people away from even trying it in the first place.

Anywho, just my 2c… :slight_smile: