Anyone exploring agentic pentesting for web apps and APIs yet?

I’ve been spending some time recently testing the alpha version of an agentic pentesting setup we’ve been developing internally, and it’s been an interesting shift from the usual automated scanning approach.

One thing that stood out early is how much effort typically goes into validating false positives from traditional scanners. With an agent-driven model, the system attempts to verify findings before surfacing them, which has noticeably reduced that noise in my testing flow so far.
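
To make the "verify before surfacing" idea concrete, here is a minimal sketch, not how any particular product does it. It assumes a hypothetical `Finding` shape and a `probe` callable that replays a payload against a test environment and returns the response body. The check itself (does the payload come back unescaped?) is a deliberately simple stand-in for whatever verification a real agent performs.

```python
import html
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Finding:
    """A candidate issue reported by a scanner or agent (hypothetical shape)."""
    url: str
    kind: str      # e.g. "reflected-xss"
    payload: str   # input that supposedly triggers the issue


# A probe takes a URL and a payload and returns the response body as text.
# In real use it would wrap an HTTP client pointed at a safe test environment.
Probe = Callable[[str, str], str]


def is_confirmed(finding: Finding, probe: Probe) -> bool:
    """Replay the payload and check that it appears unescaped in the response."""
    body = probe(finding.url, finding.payload)
    return finding.payload in body


def surface_confirmed(findings: Iterable[Finding], probe: Probe) -> List[Finding]:
    """Keep only the findings that the replay actually reproduces."""
    return [f for f in findings if is_confirmed(f, probe)]


def fake_probe(url: str, payload: str) -> str:
    """Stand-in target: /search echoes input raw, every other path escapes it."""
    if url.endswith("/search"):
        return f"<p>{payload}</p>"
    return f"<p>{html.escape(payload)}</p>"


candidates = [
    Finding("https://test.local/search", "reflected-xss", "<script>1</script>"),
    Finding("https://test.local/profile", "reflected-xss", "<script>1</script>"),
]
confirmed = surface_confirmed(candidates, fake_probe)
```

With the fake target above, only the `/search` candidate survives; the `/profile` one is dropped because the payload comes back escaped. That is the kind of triage I used to do by hand.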

It’s still early, and I don’t see it replacing manual testing anytime soon, especially for business logic gaps, which current AI tooling can’t analyze reliably. But it does feel like a practical step toward making automated testing more reliable and helpful.

I’m curious if anyone else here has started experimenting with agentic workflows or similar approaches. Are you seeing real value with the current tools in the market?

We have been building an autonomous testing service called AutoExplore for 2 years now.

We have had the same experience with false positives, and we also realized it is actually impossible to get rid of them fully. Our latest idea is to enrich the execution context and observed facts with source code information, which should reduce false positives dramatically.
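
As a rough illustration of what "enriching observed facts with source code information" could look like (a sketch under assumptions, not AutoExplore's actual implementation or export format): the `ObservedFinding` shape, the route-to-location mapping, and the in-memory `sources` dictionary below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObservedFinding:
    """Something observed at runtime (hypothetical shape)."""
    route: str
    message: str
    source_context: List[str] = field(default_factory=list)


def enrich(
    finding: ObservedFinding,
    route_to_location: Dict[str, Tuple[str, int]],  # route -> (file path, 1-based line)
    sources: Dict[str, str],                        # file path -> file contents
    radius: int = 2,
) -> ObservedFinding:
    """Attach the source lines around the handler that serves the route."""
    location = route_to_location.get(finding.route)
    if location is None:
        return finding  # no known mapping: leave the finding unenriched
    path, line_no = location
    lines = sources[path].splitlines()
    start = max(0, line_no - 1 - radius)
    end = min(len(lines), line_no + radius)
    finding.source_context = [
        f"{path}:{i + 1}: {lines[i]}" for i in range(start, end)
    ]
    return finding


sources = {"app/views.py": "a\nb\nc\nd\ne"}
routes = {"/login": ("app/views.py", 3)}
enriched = enrich(ObservedFinding("/login", "unexpected 500"), routes, sources, radius=1)
```

Whoever triages the finding, human or model, then sees the handler code next to the runtime observation instead of the observation alone.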

Ref: AutoExplore AI Integration: Exporting Findings for Better Root Cause Analysis

Testing applications autonomously in AI-driven ways most likely won’t replace manual testing or automated test scripts. It’s something else entirely: a way to find new information.

One of my colleagues has been looking at Shannon (KeygraphHQ/shannon on GitHub), which describes itself as an autonomous, white-box AI pentester for web applications and APIs: it analyzes your source code, identifies attack vectors, and executes real exploits to prove vulnerabilities before they reach production. His view was that it’s pretty good and, once configured, almost one-click level.

It needs source code access, and you need to make sure you run it in an environment where it cannot do harm.

He is a pen tester, and he said it was catching things he might otherwise have missed.

The interesting part for me is that he gave the impression there is a level of critical analysis between loops: run a risk experiment, take the results, and use them to design the next experiment. I have not been able to confirm that level of loop directly, but if this is your thing I’d like feedback specifically on that. In general so-called exploratory tools, I have not seen it yet.
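
The loop I have in mind can be sketched roughly like this. It is only my reading of the concept, not Shannon's implementation, and every name here (`run_loop`, `execute`, `plan_next`, the toy experiments) is made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

Experiment = str
Result = str
History = List[Tuple[Experiment, Result]]


def run_loop(
    first: Experiment,
    execute: Callable[[Experiment], Result],
    plan_next: Callable[[History], Optional[Experiment]],
    max_rounds: int = 5,
) -> History:
    """Run an experiment, analyze the history so far, then design the next one."""
    history: History = []
    current: Optional[Experiment] = first
    while current is not None and len(history) < max_rounds:
        result = execute(current)
        history.append((current, result))
        # The "critical analysis between loops" lives in plan_next:
        # it sees every earlier result and returns None when there is
        # nothing more worth trying.
        current = plan_next(history)
    return history


def toy_execute(experiment: Experiment) -> Result:
    """Toy target: the default path is blocked, anything else is allowed."""
    return "blocked" if experiment == "try default path" else "allowed"


def toy_plan(history: History) -> Optional[Experiment]:
    """Toy planner: after a blocked default path, try an alternate path once."""
    last_experiment, last_result = history[-1]
    if last_experiment == "try default path" and last_result == "blocked":
        return "try alternate path"
    return None


trace = run_loop("try default path", toy_execute, toy_plan)
```

What I would want to see from a real tool is evidence that the equivalent of `plan_next` genuinely uses earlier results, rather than walking a fixed checklist.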

On false positives, I am not sure about this tool, but with more general tools, such as accessibility checkers, I was getting a lot. Often the report was "this could be an issue", but in a different context from the one we were working in, and there were so many that they became a distraction for the developer, with me filtering them.

At this point I am very much of the view that humans should be at the helm and not just in the loop, but let’s see if there are tools that can prove me wrong consistently.

I agree with the false positive point. A lot of my time used to go into validating scanner output before I could even start digging into the interesting parts of a test.

I’ve been trying a few agentic tools lately, including XBOW, ZeroThreat AI, and HexStrike. What I’ve noticed is that they’re generally better at filtering noise and giving me a cleaner starting point than traditional scanners.

I still wouldn’t trust them with business logic testing, but they’ve definitely reduced the amount of repetitive work in my workflow. That’s been the biggest win for me so far.