Klaster_1
3 days ago
This is the direction I'd love UI automation to move towards; congrats on bringing it closer. Gonna try it and see how it works for me. I have a couple of questions:
1. How's the reproducibility of actions? Is it flaky?
2. How does it perform under adversarial conditions, such as a slow network or high CPU load? With the current crop of frameworks, you have to write tests defensively.
3. Any plans for visual regression integration? I'd love a smart VR tool that doesn't fail just because a Linux CI machine renders fonts differently than Windows does. None of the existing image comparison libraries are robust enough; the sketch below shows the kind of diff I mean.
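For concreteness, here's the kind of fixed-threshold pixel diff I'm talking about (a sketch using pixelmatch and pngjs; the file names and threshold are made up). Cross-platform font antialiasing shifts enough pixels that no single threshold separates rendering noise from real regressions:

```ts
import fs from "node:fs";
import { PNG } from "pngjs";
import pixelmatch from "pixelmatch";

// Hypothetical screenshots; assumes both PNGs have identical dimensions.
const linux = PNG.sync.read(fs.readFileSync("home-linux.png"));
const windows = PNG.sync.read(fs.readFileSync("home-windows.png"));
const diff = new PNG({ width: linux.width, height: linux.height });

// threshold is a per-pixel color tolerance (0..1). Raising it to absorb
// antialiasing differences also starts hiding real one-pixel regressions.
const mismatched = pixelmatch(
  linux.data,
  windows.data,
  diff.data,
  linux.width,
  linux.height,
  { threshold: 0.1 },
);

fs.writeFileSync("diff.png", PNG.sync.write(diff));
console.log(`${mismatched} pixels differ`);
```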
gooru
3 days ago
Thanks for reading and sharing your input.
1. To answer this, I'll try to record a video this week. It's not a short answer: if you've heard of Anthropic's computer use, browser-use, or OpenAI's Operator, this takes a slightly improved approach, the one demonstrated by Playwright MCP, which leverages the accessibility tree (see the first sketch after this list). In my testing, it worked well even on a very clunky web app with many Shadow DOMs and iframes.
2. There is a built-in mechanism to deal with this; once I hear more feedback on how it performs, I can improve it. The second sketch after this list shows the general shape of such a mechanism.
3. Right now, it does not use vision. I left vision out of v1 because I wanted to run tests at lower token cost and prove the approach works equally well without it. I plan to add vision as an option for those who don't mind the extra tokens.
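To make answer 1 concrete, here is roughly what "leveraging the accessibility tree" looks like in plain Playwright. This is an illustration of the general approach, not this library's internals; the URL and link name are just for the demo:

```ts
import { chromium } from "playwright";

async function demo() {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");

  // A text snapshot of the page's accessibility tree: roles and accessible
  // names instead of pixels. This compact representation is what an agent
  // reasons over, and it stays stable across rendering differences.
  console.log(await page.locator("body").ariaSnapshot());

  // Acting through the same tree: target by role and name, not coordinates.
  // Playwright's role-based locators also pierce open Shadow DOM boundaries.
  await page.getByRole("link", { name: "More information" }).click();

  await browser.close();
}

demo();
```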
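On answer 2: I won't claim this is exactly what's built in, but the usual shape of such a mechanism is to retry a check against a deadline instead of sleeping for a fixed time, so a slow network or a loaded CPU stretches the wait rather than breaking the test. A minimal sketch (waitUntil is a hypothetical name, not this library's API):

```ts
// Hypothetical helper: retry an async probe until a predicate passes or a
// deadline expires. A fixed sleep fails under load; a deadline-based poll
// just takes longer.
async function waitUntil<T>(
  probe: () => Promise<T>,
  ok: (value: T) => boolean,
  { timeoutMs = 15_000, intervalMs = 250 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (ok(value)) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage sketch: poll a hypothetical status endpoint until it reports ready.
// const status = await waitUntil(
//   () => fetch("https://app.example/api/status").then((r) => r.json()),
//   (s) => s.ready === true,
// );
```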
*Summary*: This is a lightweight library, making it easy to adjust and improve.