AutoExplore is now live!

Hi Ministry of Testing Club,

I’m part of a small team building AutoExplore, an “always-on” autonomous exploratory testing tool for web apps. We’re sharing here because we’d genuinely like feedback from people who do testing for real applications.

The problem we’re trying to solve

Our hypothesis: a bot that continuously explores your app like a curious user can surface regressions and “weird edges” earlier, and reduce the gap between changes shipping and issues being noticed.

What AutoExplore does today (and what it doesn’t)

What it does:

  • Runs in a real browser against your staging/production environment
  • Explores the UI (clicks, navigates, fills forms) without predefined test cases (a rough sketch of this kind of loop follows these lists)
  • Produces findings with reproduction steps, screenshots, and a timeline of what happened
  • Includes built-in accessibility checks while exploring
  • Includes safe, non-destructive security scanning as part of the run
  • Shows “visual coverage” (what the agent interacted with) to help you understand what it actually touched

What it doesn’t do (yet):

  • Replace a human tester’s intent, judgment, or domain knowledge
  • Guarantee meaningful coverage of business-critical paths without guidance
  • Magically eliminate noise; we’re actively learning what signals matter and how to present them
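
To make the list above a bit more concrete for readers who haven’t seen this category of tool, here is a deliberately naive sketch of the kind of “explore, record, screenshot” loop such an agent runs. This is not our actual engine, just an illustration of the shape of the loop; it assumes Playwright as the browser driver, and the staging URL is a placeholder.

```ts
// Illustrative only: a naive random-walk "explorer", not AutoExplore's real engine.
// Assumes Playwright is installed (npm i playwright); the URL is a placeholder.
import { chromium } from 'playwright';

async function explore(startUrl: string, steps = 50) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const timeline: string[] = [];

  // Treat uncaught page errors as findings alongside the action timeline.
  page.on('pageerror', (err) => timeline.push(`FINDING pageerror: ${err.message}`));

  await page.goto(startUrl);
  for (let i = 0; i < steps; i++) {
    // Pick a random visible link or button to interact with on the current page.
    const candidates = page.locator('a:visible, button:visible');
    const count = await candidates.count();
    if (count === 0) break;
    const target = candidates.nth(Math.floor(Math.random() * count));
    const label = (await target.innerText().catch(() => '')).trim().slice(0, 40);

    timeline.push(`step ${i}: click "${label}" on ${page.url()}`);
    await target.click({ timeout: 5000 }).catch(() => timeline.push(`step ${i}: click failed`));
    await page.waitForLoadState('networkidle').catch(() => {});
    await page.screenshot({ path: `step-${i}.png` }); // evidence for reproduction steps
  }

  console.log(timeline.join('\n'));
  await browser.close();
}

explore('https://staging.example.test'); // placeholder URL
```

The real agent has to be much smarter about choosing actions, filling forms, and deciding what actually counts as a finding, which is exactly where the questions below come in.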

Where we need your help

If you’ve used (or evaluated) autonomous testing / crawling / synthetic monitoring tools — or if you’ve avoided them on purpose — I’d love your perspective on any of these:

  1. Where would this fit in your workflow?

  2. What makes a finding actionable vs. annoying?

  3. How do you measure value for something exploratory?

  4. What would you want to control?

  5. Where do you expect this to fail?

If you’re willing, sharing your context helps a lot:

  • app type (B2B/B2C), complexity, release cadence
  • current mix of test automation/manual testing/monitoring
  • the biggest category of bugs you wish you caught earlier

If you want to try it (optional)

If you’d like to point it at a staging/test environment and tell us what’s missing, there’s a 7-day free trial:

And if you’d rather just poke at the idea and critique it (no signup), that’s equally valuable — please reply here with thoughts, concerns, or “this will never work because…”.

Thanks for reading — we’ll take all feedback seriously and we’re happy to share what we learn as we iterate.


I think it’s a really interesting area.

I’ve been experimenting with Playwright’s agents, which also dynamically navigate the browser, so I can share some early thoughts that may be worth a comparison.

You can give these agents oracles and heuristics to specifically look for. For example, you could ask one to explore for accessibility issues, where it would use both scanners and navigation: keyboard navigation flows, screen reader usability, zoom levels, contrast, and so on.

So it becomes guided exploration of a website, to an extent. Once this is set up and you find a value point, you could likely turn that into a dedicated accessibility agent and just change the target URL. I have not done that step yet; I’d like to get it to the level of a starting point for an AA compliance check, for example. This approach can be extended to other risks in the same way.
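
For anyone who wants to try a scripted (non-agent) version of that “scanner plus keyboard navigation” pass as a starting point, here is a minimal sketch assuming Playwright Test with @axe-core/playwright; the target URL is a placeholder and the keyboard-focus oracle is deliberately crude. The agent/oracle setup described above is not shown here.

```ts
// Minimal scripted sketch of a "scanner + keyboard navigation" accessibility pass.
// Assumes @playwright/test and @axe-core/playwright; the URL is a placeholder.
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('axe scan plus basic keyboard navigation', async ({ page }) => {
  await page.goto('https://staging.example.test'); // placeholder target

  // Scanner part: run axe-core restricted to WCAG 2.x A/AA rule tags.
  const results = await new AxeBuilder({ page })
    .withTags(['wcag2a', 'wcag2aa'])
    .analyze();
  expect(results.violations).toEqual([]);

  // Navigation part: tab through the page and record where focus lands.
  const focused: string[] = [];
  for (let i = 0; i < 20; i++) {
    await page.keyboard.press('Tab');
    focused.push(await page.evaluate(() => document.activeElement?.tagName ?? 'NONE'));
  }
  // Crude oracle: tabbing should reach interactive elements, i.e. focus should not
  // stay stuck on <body> for the whole run.
  expect(focused.some((tag) => tag !== 'BODY' && tag !== 'NONE')).toBe(true);
});
```

Once checks like these prove their value, promoting them into the dedicated accessibility agent (with the target URL as the only variable) is the step described above.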

It does find problems, though they tend to be fairly generic; the things a scan would pick up are basic-level testing. The navigation approach can pick up more, for example no confirmation message being displayed to the user on an action that, by the standards, should have one.

Are the issues it finds fairly shallow, at the level of an automated e2e test, or can it, once it finds an issue, run its own experiments and do deeper testing? This I have not seen yet, so it would be interesting if this tool did.

Then you have context awareness: how are you feeding in context and requirements so it can explore with those as guides? Out of the box, a lot of tools will not have that context awareness, so they tend to get stuck at the generic, well-known risk level. That is not entirely a bad thing, as there are often a lot of generic issues in apps.

You also have the consideration of what you lose out on. Loads of human learning happens when you explore an app, and bio/wet brains have thousands of times more receptors running than the machines currently do. Is losing that learning worth the risk of letting a tool explore?

What is not there? This can be addressed by feeding in requirements, but it is not usually covered out of the box.

Unknowns and WTFs: how good is it at picking these up, the dancing gorilla in the background that should not be there, for example? Real-world issues in real-world context.

Cost and time is another angle. In many cases I found I could find things quicker than the agents running; the cost side is not something I have evaluated.

How good is it at testing, and are the tests any good? The usual measure is: if a risk of a specific type exists, does this give the best opportunity to find that risk, one that would not be found efficiently otherwise? This is likely where you would look at measuring value.

So a few points.

Shallow or deep testing? Generic issues only, or context-aware issues? Cost and value versus other approaches? How well does it use the oracles and heuristics it’s given?

The big one for me remains: what do we lose by having machines try to simulate a professional human investigative tester? Note that I also see waste in humans trying to simulate mechanical testing models, like following test cases and scripted testing, but this is potentially the opposite: using machines to inefficiently simulate the strengths of a human wet biological brain on products that are very specifically designed for a bio-brain environment.

Even if we can do this, should we?


Thank you for your honest feedback and thoughts!

Are the issues it finds fairly shallow, at the level of an automated e2e test, or can it, once it finds an issue, run its own experiments and do deeper testing? This I have not seen yet, so it would be interesting if this tool did.

Yeah, this is something we are currently investing our engineering effort in. We are trying to push the agent to reason about the service under test by exploring and trying things out.

Then you have context awareness: how are you feeding in context and requirements so it can explore with those as guides?

We have discussed this with our customers, and there have been requests to tell the agent how the application works. However, we have not implemented this, at least not yet, as we are trying to make the agent behave like a human: rather than being told how the application works, it should be able to figure it out by itself.

You also have the consideration of what you lose out on. Loads of human learning happens when you explore an app, and bio/wet brains have thousands of times more receptors running than the machines currently do. Is losing that learning worth the risk of letting a tool explore?

Yep exactly this.

The big one for me remains: what do we lose by having machines try to simulate a professional human investigative tester?

From my own personal experience, probably not much, as people are so busy they don’t have time to do it anyway. Should they do it? Maybe… A human still needs to stay in the loop for decision-making on findings.