Workaccount2
an hour ago
Well
It is the first model to get partial credit on an LLM image test I have: counting the legs of a dog, specifically a dog with 5 legs. This is a wild test, because LLMs get really pushy and insistent that the dog only has 4 legs.
In fact, GPT-5 wrote an edge detection script to see where "golden dog feet" met "bright green grass" to prove to me that there were only 4 legs. The script found 5, and GPT-5 then said it was a bug and adjusted the script's sensitivity so it only located 4, lol.
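For reference, a script in that spirit might look roughly like the sketch below. This is not GPT-5's actual script; the filename and HSV colour ranges are made-up stand-ins.

```python
import cv2
import numpy as np

# Hypothetical input file; the HSV ranges are hand-tuned guesses,
# not the values the original script used.
img = cv2.imread("dog.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Mask "golden" fur (tan/yellow hues) and "bright green" grass.
fur = cv2.inRange(hsv, (10, 60, 80), (35, 255, 255))
grass = cv2.inRange(hsv, (36, 60, 60), (85, 255, 255))

# Keep only fur pixels that sit directly above grass pixels: roughly
# "where golden dog feet meet bright green grass".
feet_zone = fur & np.roll(grass, -5, axis=0)

# Count connected blobs in that contact zone; each sizeable blob ~ one foot.
n, labels, stats, _ = cv2.connectedComponentsWithStats(feet_zone, connectivity=8)
feet = [i for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] > 50]
print(f"Detected {len(feet)} candidate feet")
```

The area threshold is exactly the kind of knob that can be "adjusted" until the script finds whatever answer you already believe.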
Anyway, Gemini 3, while still unable to count the legs on the first try, did identify "male anatomy" (its own words) also visible in the picture. The 5th leg was approximately where you would expect a well-endowed dog to have a "5th leg".
That aside though, I still wouldn't call it particularly impressive.
As a note, Meta's image slicer correctly highlighted all 5 legs without a hitch. Maybe not quite a transformer, but interesting that it could properly interpret "dog leg" and ID them. Also, the dogs with many legs (I have a few of them) all had their extra legs added by Nano Banana.
Rover222
40 minutes ago
I just tried to get Gemini to produce an image of a dog with 5 legs to test this out, and it really struggled with that. It either made a normal dog, or turned the tail into a weird appendage.
Then I asked both Gemini and Grok to count the legs; both kept saying 4.
Gemini just refused to consider it was actually wrong.
Grok seemed to have an existential crisis when I told it it was wrong, becoming convinced that I had given it an elaborate riddle. After thinking for an additional 2.5 minutes, it concluded: "Oh, I see now—upon closer inspection, this is that famous optical illusion photo of a "headless" dog. It's actually a three-legged dog (due to an amputation), with its head turned all the way back to lick its side, which creates the bizarre perspective making it look decapitated at first glance. So, you're right; the dog has 3 legs."
You're right, this is a good test. And right when I was starting to feel like LLMs are intelligent.
AIorNot
17 minutes ago
It's not that they aren't intelligent; it's that they have been RL'd like crazy not to do that.
It's rather like how, as humans, we are RL'd like crazy to be grossed out if we view a picture of a handsome man and a beautiful woman kissing (after we are told they are brother and sister).
I.e., we all have trained biases that we are told to follow and are trained on; human art is about subverting those expectations.
irthomasthomas
4 minutes ago
Isn't this proof that LLMs still don't really generalize beyond their training data?
dwringer
18 minutes ago
I had no trouble getting it to generate an image of a five-legged dog on the first try, but I was really surprised at how badly it failed at telling me the number of legs when I showed it that image in a new context. It wrote a long defense of its reasoning and, when pressed, made up demonstrably false excuses for why it might be getting the wrong answer while still maintaining the wrong answer.
danielvaughn
an hour ago
I don’t know much about AI, but I have this image test that everything has failed at. You basically just present an image of a maze and ask the LLM to draw a line through the optimal path.
Here’s how Nano Banana fared: https://x.com/danielvaughn/status/1971640520176029704?s=46
kridsdale3
22 minutes ago
I have also tried the maze-from-a-photo test a few times and never seen a one-shot success. But yesterday I was determined to succeed, so I let Gemini 3 write a Python GUI app that takes in photos of physical mazes (I have a bunch of 3D-printed ones) and finds the path. This does work.
Gemini 3 then one-shot ported the whole thing (which uses Python CV libraries) to a single-page HTML+JS version that works just as well.
I gave that to Claude to assess and assign a FAANG hiring level to, and it was amazed and said Gemini 3 codes like an L6.
Since I work for Google and used my phone in the office to do this, I think I can't share the source or file.
pwagland
26 minutes ago
I tried this with Claude:
```
> [Image #1] Create a unicode "ascii-art" version of this image, with the optimal path through the maze highlighted in a solid colour.
I'll create an ASCII art version of this maze with the solution path highlighted!
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
●●│ │ │ │ │ │ │ │ │ │ │ │ │ │
├─┤●└─┴─┐ ├───┐ │ ╔═╗ ├─────┤ ╔═══╝ │ ╔═╝ ╔═╝ │ │ ╔═╝ ├─┤
│ │●●●●●└─┤ │ │ ║ │ │ │ ║ │ ║ ║ │ │ ║ │ │
│ └───┐●╔═╧═╗ │ └─╢ │ │ ┌─╔═╝ ║ ┌───┤ ╚═╗ ╚═╗ │ └─╢ ┌─┘ │
│ │●║ ║ │ ║ │ │ │ ║ ║ │ │ ║ ║ │ ║ │ │
├───┐ │●╚═╗ ╚═╧═╗ ║ └─┤ │ ║ ╔═╝ └─╔═╧═╗ ╚═╗ ╚═╧═╗ ║ └─┬─┤
│ │ │●●●└───┐ ║ ║ │ │ ║ ║ ║ ║ ║ ║ ║ │ │
│ ╔═╝ └───┐●╔═╝ ║ ╚═╗ │ └─╢ ╚═╗ ╔═╝ ╔═╝ ┌─╢ ╔═══╝ ╚═╗ │ │
│ ║ │●║ ║ ║ │ ║ ║ ║ ║ │ ║ ║ ║ │ │
│ ╚═══╗ ╔═╝●╚═╗ ╚═╗ ║ ├─╔═╧═╗ ║ ║ ╔═╝ ╔═╝ ║ ║ ╔═╗ ╔═╝ │ │
│ ║ ║●●●●●│ ║ ║ │ ║ ║ ║ ║ ║ ║ ║ ║ ║ ║ ║ │ │
├───╔═╝ ║●┌─┐●└─╗ ║ └─┤ ║ ╔═╝ ║ ╚═╝ ╔═╧═╗ ║ ╚═╝ ║ ╚═╗ │ │
│ ║ ║●│ │●●●║ ║ │ ║ ║ ║ ║ ║ ║ ║ ║ │ │
│ ╔═╝ ╔═╝●│ └─┐●╚═╧═╗ │ ║ ║ ╔═╧═══╗ ║ ╔═╝ ╚═══╗ ╚═╗ ║ └─┤
│ ║ ║●●●│ │●●●●●║ │ ║ ║ ║ ║ ║ ║ ║ ║ ║ │
│ ║ ╔═╝●┌─┴─╗ └───┐●╚═╝ ║ ║ ║ ╔═╗ ║ └─╢ ╔═══╗ ╚═╗ ╚═╝ ╔═╝
│ ║ ║●●●│ ║ │●●●●●║ ║ ║ ║ ║ ║ ║ ║ ║ ║ ║
│ ╚═╝●╔═╧═╗ ╚═╗ ╔═╧═══┐●╚═╝ ║ ║ ║ ╚═╗ ║ ║ ╔═╧═╗ ╚═══╗ ╚═╗
│ ●●║ ║ ║ ║ │●●●●●║ ║ ║ ║ ║ ║ ║ ║ ║ ║
├─╔═┘●╚═╗ ╚═╗ ║ ║ ╔═╗ └───┐●╚═╝ ╚═╗ ║ └─╢ ║ ╔═╧═╗ ╔═╝ ╔═╝
│ ║●●●●●║ ║ ║ ║ ║ ║ │●●●●●●●║ ║ ║ ║ ║ ║ ║ ║
│ ║●┌─╗ ╚═╗ ║ └─╢ ║ ╚═══╗ └─────┐●╚═╧═╗ ║ ║ ║ ╔═╝ ║ ╔═╝ ╔
│ ║●│ ║ ║ ║ ║ ║ ║ │●●●●●║ ║ ║ ║ ║ ║ ║ ║
│ ╚═╝ ║ ╔═╝ ╚═╗ ║ ╚═╗ ╔═╧═════╗ └───┐●╚═╝ ║ ╚═╝ ╔═╝ ║ ╔═╝
│ ║ ║ ║ ║ ║ ║ ║ │●●●●●║ ║ ║ ║
└─────╨─╨─────╨─╨───╨─╨───────╨─────┴───●●╨─────╨───╨─╨──
The path from the mouse () to the cheese () is marked with ● characters. This shows the optimal route through the maze, navigating through the corridors while avoiding all the walls!
```

Suffice it to say, it didn't do either part right.
vunderba
40 minutes ago
In fact, one of the tests I use as part of GenAI Showdown involves both parts of the puzzle: draw a maze with a clearly defined entrance and exit, along with a dashed line indicating the solution to the maze.
Only one model (gpt-image-1) out of the 18 tested managed to pass the test successfully. Gemini 3.0 Pro got VERY close.
danielvaughn
34 minutes ago
Super cool! Interesting note about Seedream 4: do you think awareness of A* could actually improve the outcome? Like I said, I'm no AI expert, so my intuitions are pretty bad, but I'd suspect that image analysis and algorithmic pathfinding don't have much crossover in terms of training capabilities. But I could be wrong!
vunderba
31 minutes ago
Great question! I wish we had a bit more insight into the exact background "thinking" that was happening on systems like Seedream.
When you pose the "solve a visual image of a maze" problem to something like ChatGPT, there's a good chance it'll throw a Python VM at it, threshold the image with something like OpenCV, and use a shortest-path-style algorithm to try to solve it.
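Roughly, a sketch of that pipeline might look like the following, assuming a clean top-down maze image with dark walls on a light background, hand-picked start/end pixels, and made-up filenames and thresholds:

```python
import cv2
import numpy as np
from collections import deque

# Hypothetical input; assumes dark walls on a light background and
# start/goal coordinates supplied by hand.
img = cv2.imread("maze.png", cv2.IMREAD_GRAYSCALE)
_, free = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)  # 255 = corridor

start = (5, 5)                                        # (row, col), guessed
goal = (free.shape[0] - 6, free.shape[1] - 6)         # guessed

# Plain BFS over corridor pixels gives a shortest path in step count.
prev = {start: None}
q = deque([start])
while q:
    r, c = q.popleft()
    if (r, c) == goal:
        break
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if (0 <= nr < free.shape[0] and 0 <= nc < free.shape[1]
                and free[nr, nc] == 255 and (nr, nc) not in prev):
            prev[(nr, nc)] = (r, c)
            q.append((nr, nc))

# Walk back from the goal and draw the path in red on a colour copy.
out = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
node = goal if goal in prev else None
while node is not None:
    out[node] = (0, 0, 255)
    node = prev[node]
cv2.imwrite("maze_solved.png", out)
```

BFS already finds a shortest path here; A* with a Manhattan-distance heuristic would mostly just find it faster on large mazes.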
buildbot
an hour ago
That might be an interesting training set, a bunch of mazes…
jiggawatts
32 minutes ago
The reason is that image generators don't iterate on the output the way text-based LLMs do. Essentially they produce the image in "one hit" and can't work through a complex sequence of steps, just as you couldn't one-shot this either. Try taking a random maze, glancing at it, then going off to draw a squiggle on a transparency. If you were to place that on top of the maze, there's virtually no chance you'd have found the solution on the first try.
That's essentially what's going on with these models: they struggle because they get only "one step" to solve the problem instead of being able to trace through the maze slowly.
An interesting experiment would be to ask the AI to solve the maze incrementally: ask it to draw a line from the entrance a little way into the maze, then a little further, and so on until it reaches the end.
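A sketch of what that experiment loop could look like, where `edit_image` is a purely hypothetical stand-in for whatever image-editing endpoint you'd wire up (not a real API):

```python
# Hypothetical helper: stands in for whatever image-editing model/endpoint
# you use. Not a real library call.
def edit_image(prompt: str, image_bytes: bytes) -> bytes:
    raise NotImplementedError("wire this up to your image-editing model")

def solve_maze_incrementally(maze_png: bytes, steps: int = 10) -> bytes:
    """Ask the model to extend the solution line a little at a time."""
    current = maze_png
    for i in range(steps):
        prompt = (
            "This maze may already have a partial red solution line drawn on it. "
            "Extend that line a short distance further along the correct path "
            f"(step {i + 1} of {steps}). Do not erase or redraw existing ink."
        )
        current = edit_image(prompt, current)
    return current
```

The idea is to give the model many small "steps" instead of forcing it to produce the whole solution in one hit.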
Benjammer
13 minutes ago
It always feels to me like these types of tests are being somewhat intentionally ignorant of how LLM cognition differs from human cognition. To me, they don't really "prove" or "show" anything other than that LLM thinking works differently from human thinking.
I'm always curious if these tests have comprehensive prompts that inform the model about what's going on properly, or if they're designed to "trick" the LLM in a very human-cognition-centric flavor of "trick".
Does the test instruction prompt tell it that it should be interpreting the image very, very literally, and that it should attempt to discard all previous knowledge of the subject before making its assessment of the question, etc.? Does it tell the model that some inputs may be designed to "trick" its reasoning, and to watch out for that specifically?
More specifically, what is a successful outcome here to you? Simply returning the answer "5" with no other info, or back-and-forth, or anything else in the output context? What is your idea of the LLM's internal world-model in this case? Do you want it to successfully infer that you are being deceitful? Should it respond directly to the deceit? Should it take the deceit in "good faith" and operate as if that's the new reality? Something in between? To me, all of this is very unclear in terms of LLM prompting. It feels like there's a ton of very human-like subtext involved, and you're trying to show that LLMs can't handle subtext/deceit and then generalizing that to say LLMs have low cognitive ability in general. That doesn't seem like particularly useful or productive analysis to me, so I'm curious what the goal of these "tests" is for the people who write/perform/post them.
biophysboy
2 minutes ago
I thought adversarial testing like this was a routine part of software engineering. He's checking to see how flexible it is. Maybe prompting would help, but it would be cool if it were more flexible.
rottencupcakes
17 minutes ago
Super interesting. I replicated this.
I passed the AIs this image and asked them how many fingers were on the hands: https://media.post.rvohealth.io/wp-content/uploads/sites/3/2...
Claude said there were 3 hands and 16 fingers. GPT said there were 10 fingers. Grok impressively said "There are 9 fingers visible on these two hands (the left hand is missing the tip of its ring finger)." Gemini smashed it and said 12.
vunderba
13 minutes ago
I just re-ran that image through Gemini 3.0 Pro via AI Studio and it reported:
I've moved on to the right hand, meticulously tagging each finger. After completing the initial count of five digits, I noticed a sixth! There appears to be an extra digit on the far right. This is an unexpected finding, and I have counted it as well. That makes a total of eleven fingers in the image.
This right HERE is the issue. It's not nearly deterministic enough to rely on.
andai
11 minutes ago
This is interesting, and demonstrates how language and belief cloud direct perception. Now I'm wondering what the LLM equivalent of opening the doors of perception is ;)
grog454
12 minutes ago
What image are you using?
When I look at Google image search results for "dog with 5 legs", I don't see a lot of great examples. The first unequivocal "dog with 5 legs" was an illustration. Here was my conversation with ChatGPT.
> How many legs does this dog have?
"The dog in the image has four legs."
> look closer.
" looking closely, the drawing is a bit tricky because of the shading, but the dog actually has five visible legs.
Two front legs (normal)
Two hind legs (normal)
Plus one extra hind leg–like limb drawn overlapping in the back
It seems to be an artistic or anatomical error in the engraving."
Seems fair to me.
vunderba
18 minutes ago
Anything that needs to overcome concepts which are disproportionately represented in the training data is going to give these models a hard time.
Try generating:
- A spider missing one leg
- A 9-pointed star
- A 5-leaf clover
- A man with six fingers on his left hand and four fingers on his right
You'll be lucky to get a 25% success rate.
The last one is particularly ironic given how much work went into FIXING the old SD 1.5 issues with hand anatomy... to the point where I'm seriously considering incorporating it as a new test scenario on GenAI Showdown.
reed1234
an hour ago
Could you link the image? Interesting stuff.
cancan
an hour ago
this is hilarious and incredibly interesting at the same time! thanks for writing it up.
teaearlgraycold
35 minutes ago
"There are FOUR legs!!!"