I ran through the experiments.
It has the mcp sort of built in however it does not always trigger, I needed to change lm to get it create the test plan based on actual browsing.
Nothing really overcomplicated with them, you can do all three just using the lm ai but nice to have them already configured with templates etc. I did not check if it does POM but I believe it should if its told to use that. Perhaps before running agents get the AI to add ID’s to all elements and create the POM structure for it to use.
The following carries my bias, time on automation takes me away from testing, keep UI automation very light and my context is small to medium sized web apps so I’m not looking for it to scale massively or do anything overly complex.
You can probably wrap the agents to talk to each other.
Combining code based test script generation with mcp browser based may allow increased edge case and efficiency plus real behaviour coverage.
50 basic tests running successfully in an hour or so could accelerate a starting point into automation for a new product. Even if its value lifetime is only two weeks you can throw the lot away and regenerate every couple of weeks.
On the face of it getting basic automation coverage for a few hours a week seems reasonable to me.
Questions outstanding for me.
Are the test cases good and provide value, tbd. Note it did have accessibility and some performance checks in there so potentially for the basics it could be good enough.
Can it do deep tests or just fairly shallow coverage?
Self healing - use with care. Example an accessibility test fails as two errors are found, self healing could set error count to two and test will pass, an easy always green model. If the agents are running self healing in your CI you will still need a review of what it found and what it healed.
Does it scale, does it do complex features - tbd
Do you learn, not so much in my view.
Should automators be worried? I suspect this is related to the scale and complexity level question but guided with critical thinking and reviews are still required.
If you currently have no automation on your web app, this could be an easy way to get started.
I’ve no idea how effective it would be on a complex app with years of automation already there, my bias would likely be throwing that away anyway unless it was catching useful things.
It’s likely a stepping stone to something new, these experiment steps are good though.
keep Humans At The Helm