Oh this is such an interesting thing to think about!
My first reaction is that I would treat it just like any other browser and begin officially supporting it once it reaches some baseline usage %. But… I think there’s more to it in these cases because of the way these browsers will interact with websites.
It seems similar to testing a feature that shares something to Facebook et al.: testing how the thing appears once it is shared on those platforms is part of the work (even though you don’t own those surfaces).
My initial reaction was that this is a nothingburger, but on reflection I concluded that some testing would indeed be valid. However, my conclusion further down may not be what you expect. ChatGPT itself says:
“ChatGPT Agent’s built-in web browsers use a custom, sandboxed rendering engine developed by OpenAI rather than standard browsers like Chromium, WebKit, or Gecko. The Visual Browser (which can click, scroll, and interact with pages) runs inside a secure virtual machine and is designed to simulate a real browser’s behavior …”
Diagnosis and fixing
The agent’s web browser is going to be a black box, like embedded Java applets and Flash movies (showing my age - you youngsters should ask your grandparents what they are). It’s not going to have a DOM Inspector, so even if you see an issue you are not going to be able to investigate and diagnose it.
It’s very unlikely you will be able to fix any issues unless OpenAI publishes a list of the HTML, CSS or JavaScript features or coding patterns the agent doesn’t handle correctly. I think there is zero chance of them doing that.
Good accessibility will be important
The agent needs to be able to identify and interact with interactive controls in websites. The best way to ensure this happens is to make the website accessible. If you are already doing this properly (which extremely few companies do) everything should be fine. If you are not, you will be reliant on the agent’s visual heuristics, which are a lottery.
In my view, it would be far better to do high-quality accessibility testing in a normal browser rather than trying to test your website in the agent, where you can’t see any code and you can’t use any tools. The problem is that there are no more than a couple of dozen people in the UK capable of doing high-quality accessibility testing. ChatGPT said the following:
"ChatGPT Agent’s web browser identifies interactive components using a combination of DOM analysis, accessibility cues, and visual heuristics within its controlled environment. Here’s how it generally identifies interactive components:
1. DOM Element Types
   - It scans the page’s HTML structure to find standard interactive elements like <button>, <a> (links), <input>, <select>, <textarea>, and other form controls.
   - These elements are inherently interactive by default.
2. ARIA Attributes and Accessibility Roles
   - The agent looks for ARIA roles and properties such as role="button", role="link", aria-haspopup, aria-expanded, and aria-pressed.
   - These help identify interactive widgets implemented with custom markup or JavaScript.
3. Event Listeners
   - It detects elements with attached event listeners (like onclick, onkeydown) which indicate interactivity even if the element isn’t a traditional interactive tag.
   - This helps detect custom buttons or clickable divs/spans.
4. CSS Styles and Visual Cues
   - The browser analyzes styles like cursor: pointer, tabindex attributes, or focusable elements to infer interactiveness.
   - Elements visibly styled as buttons or links can be flagged as interactive.
5. Tab Order and Keyboard Navigability
   - It uses the tab sequence (tabindex attribute and natural tab order) to identify elements that users can interact with via keyboard, reinforcing their interactivity.
6. Heuristics and AI-based Inference
   - Since it’s part of an AI agent, the system can apply heuristics or learned models to identify interactive patterns beyond static rules, helping with complex or dynamically generated interfaces."
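If you want a rough idea of where your own markup stands against the static signals in that quoted list, something like the sketch below could be a starting point. To be clear, this is not how the agent works (the quoted description is unverified, and nobody outside OpenAI can check it). It is only a small Python script, using the standard library's html.parser, that sorts elements into "native" controls, elements carrying an interactive ARIA role, and elements that only have an inline handler such as onclick. The names InteractiveAudit and audit, the bucket labels, and the short role list are all my own assumptions for illustration. It also cannot see listeners attached from JavaScript with addEventListener, so treat it as a first pass and not a substitute for proper accessibility testing.

```python
from html.parser import HTMLParser

# Elements that are interactive without any extra markup.
NATIVE_INTERACTIVE = {"button", "a", "input", "select", "textarea"}
# A few ARIA roles that mark custom widgets as interactive (not exhaustive).
INTERACTIVE_ROLES = {"button", "link", "checkbox", "menuitem", "tab", "switch"}
# Inline handler attributes that suggest the element is a click or key target.
HANDLER_ATTRS = ("onclick", "onkeydown", "onkeyup", "onkeypress")


class InteractiveAudit(HTMLParser):
    """Sort start tags into 'native', 'aria' or 'handler_only' buckets."""

    def __init__(self):
        super().__init__()
        self.findings = []  # (tag, bucket) tuples in document order

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # attribute names arrive lowercased
        role = (attr_map.get("role") or "").strip().lower()

        # An <a> with no href is not a link, so it does not count as native.
        is_native = tag in NATIVE_INTERACTIVE and not (
            tag == "a" and "href" not in attr_map
        )

        if is_native:
            bucket = "native"
        elif role in INTERACTIVE_ROLES:
            bucket = "aria"
        elif any(name in attr_map for name in HANDLER_ATTRS):
            # Clickable, but nothing tells assistive tech (or an agent) so.
            bucket = "handler_only"
        else:
            return  # no static sign of interactivity
        self.findings.append((tag, bucket))


def audit(html: str):
    """Return the (tag, bucket) findings for an HTML string."""
    parser = InteractiveAudit()
    parser.feed(html)
    parser.close()
    return parser.findings


sample = """
<button type="submit">Pay</button>
<a href="/help">Help</a>
<div role="button" tabindex="0" onclick="save()">Save</div>
<span onclick="openMenu()">Menu</span>
<p>Plain text</p>
"""

for tag, bucket in audit(sample):
    print(tag, bucket)
```

Anything reported as handler_only (the span in the sample) is the kind of control that would leave an agent, or a screen reader, relying on visual guesswork. Those are the ones worth converting to a real button or link first.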