wariatus
a month ago
Have you tried to equip those agents with an access to grounded vision model to analyse that image?
In my experience most models can’t understand such imput properly
I am now experimenting with Molmo2 and it looks promising
a month ago
Have you tried to equip those agents with an access to grounded vision model to analyse that image?
In my experience most models can’t understand such imput properly
I am now experimenting with Molmo2 and it looks promising