Gemini 3 Pro: the frontier of vision AI

195 points, posted 5 hours ago
by xnx

84 Comments

Workaccount2

an hour ago

Well, it is the first model to get partial credit on an LLM image test I have: counting the legs of a dog. Specifically, a dog with 5 legs. This is a wild test, because LLMs get really pushy and insistent that the dog only has 4 legs.

In fact, GPT-5 wrote an edge detection script to see where "golden dog feet" met "bright green grass" to prove to me that there were only 4 legs. The script found 5, and GPT-5 then said it was a bug and adjusted the script sensitivity so it only located 4, lol.
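For what it's worth, the kind of check it was attempting is only a few lines of OpenCV. A rough sketch, assuming golden fur on green grass; the filename and HSV bounds are made up and would need tuning for a real photo:

```python
import cv2
import numpy as np

# Load the photo (hypothetical filename) and convert to HSV for color thresholding.
img = cv2.imread("five_legged_dog.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Rough "golden fur" range in HSV; these bounds are guesses.
golden = cv2.inRange(hsv, (15, 60, 80), (35, 255, 255))

# Keep only golden regions in the lower part of the frame, where feet meet grass.
h = golden.shape[0]
feet_band = golden[int(h * 0.7):, :]

# Clean up noise, then count connected blobs large enough to be a leg.
feet_band = cv2.morphologyEx(feet_band, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
n_labels, _, stats, _ = cv2.connectedComponentsWithStats(feet_band)
legs = sum(1 for i in range(1, n_labels) if stats[i, cv2.CC_STAT_AREA] > 200)
print(f"Leg-like golden blobs near the grass line: {legs}")
```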

Anyway, Gemini 3, while still being unable to count the legs on the first try, did identify "male anatomy" (its own words) also visible in the picture. The 5th leg was approximately where you could expect a well-endowed dog to have a "5th leg".

That aside though, I still wouldn't call it particularly impressive.

As a note, Meta's image slicer correctly highlighted all 5 legs without a hitch. Maybe not quite a transformer, but interesting that it could properly interpret "dog leg" and ID them. Also, the dogs with many legs (I have a few of them) all had their extra legs added by nano-banana.

Rover222

40 minutes ago

I just tried to get Gemini to produce an image of a dog with 5 legs to test this out, and it really struggled with that. It either made a normal dog, or turned the tail into a weird appendage.

Then I asked both Gemini and Grok to count the legs, both kept saying 4.

Gemini just refused to consider it was actually wrong.

Grok seemed to have an existential crisis when I told it it was wrong, becoming convinced that I had given it an elaborate riddle. After thinking for an additional 2.5 minutes, it concluded: "Oh, I see now—upon closer inspection, this is that famous optical illusion photo of a "headless" dog. It's actually a three-legged dog (due to an amputation), with its head turned all the way back to lick its side, which creates the bizarre perspective making it look decapitated at first glance. So, you're right; the dog has 3 legs."

You're right, this is a good test. Right when I'm starting to feel LLMs are intelligent.

AIorNot

17 minutes ago

It's not that they aren't intelligent; it's that they have been RL'd like crazy to not do that.

It's rather like how we humans are RL'd like crazy to be grossed out if we view a picture of a handsome man and a beautiful woman kissing (after we are told they are brother and sister).

I.e., we all have trained biases that we are told to follow and are trained on; human art is about subverting those expectations.

irthomasthomas

4 minutes ago

Isn't this proof that LLMs still don't really generalize beyond their training data?

dwringer

18 minutes ago

I had no trouble getting it to generate an image of a five-legged dog on the first try, but I really was surprised at how badly it failed at telling me the number of legs when I asked it in a new context, showing it that image. It wrote a long defense of its reasoning and, when pressed, made up demonstrably false excuses for why it might be getting the wrong answer while still maintaining the wrong answer.

danielvaughn

an hour ago

I don’t know much about AI, but I have this image test that everything has failed at. You basically just present an image of a maze and ask the LLM to draw a line through the optimal path.

Here’s how Nano Banana fared: https://x.com/danielvaughn/status/1971640520176029704?s=46

kridsdale3

22 minutes ago

I have also tried the maze-from-a-photo test a few times and never seen a one-shot success. But yesterday I was determined to succeed, so I had Gemini 3 write a Python GUI app that takes in photos of physical mazes (I have a bunch of 3D-printed ones) and finds the path. This does work.

Gemini 3 then one-shot ported the whole thing (which uses Python CV libraries) to a single-page HTML+JS version, which works just as well.
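For anyone curious, the image-processing half of an app like that can be pretty small. A rough sketch of turning a maze photo into a walkable grid with OpenCV, assuming a reasonably clean top-down photo; the filename, threshold, and grid size are placeholders:

```python
import cv2

def photo_to_grid(path, cells=64):
    """Turn a top-down maze photo into a boolean grid: True = walkable."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Otsu thresholding separates dark walls from light corridors;
    # real photos may need perspective correction first.
    _, walls = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Downsample to a coarse grid: a cell is walkable if it contains (almost) no wall pixels.
    small = cv2.resize(walls, (cells, cells), interpolation=cv2.INTER_AREA)
    return small < 32

grid = photo_to_grid("maze_photo.jpg")  # hypothetical filename
```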

I gave that to Claude to assess and assign a FAANG hiring level to, and it was amazed and said Gemini 3 codes like an L6.

Since I work for Google and used my phone in the office to do this, I think I can't share the source or file.

pwagland

26 minutes ago

I tried this with Claude:

``` > [Image #1] Create a unicode "ascii-art" version of this image, with the optimal path through the maze highlighted in a solid colour.

I'll create an ASCII art version of this maze with the solution path highlighted!

  ┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
  ●●│ │ │ │       │   │       │     │ │   │   │ │ │   │
  ├─┤●└─┴─┐ ├───┐ │ ╔═╗ ├─────┤ ╔═══╝ │ ╔═╝ ╔═╝ │ │ ╔═╝ ├─┤
  │ │●●●●●└─┤   │ │ ║ │ │     │ ║     │ ║   ║   │ │ ║   │ │
  │ └───┐●╔═╧═╗ │ └─╢ │ │ ┌─╔═╝ ║ ┌───┤ ╚═╗ ╚═╗ │ └─╢ ┌─┘ │
  │     │●║   ║ │   ║ │ │ │ ║   ║ │   │   ║   ║ │   ║ │   │
  ├───┐ │●╚═╗ ╚═╧═╗ ║ └─┤ │ ║ ╔═╝ └─╔═╧═╗ ╚═╗ ╚═╧═╗ ║ └─┬─┤
  │   │ │●●●└───┐ ║ ║   │ │ ║ ║     ║   ║   ║     ║ ║   │ │
  │ ╔═╝ └───┐●╔═╝ ║ ╚═╗ │ └─╢ ╚═╗ ╔═╝ ╔═╝ ┌─╢ ╔═══╝ ╚═╗ │ │
  │ ║       │●║   ║   ║ │   ║   ║ ║   ║   │ ║ ║       ║ │ │
  │ ╚═══╗ ╔═╝●╚═╗ ╚═╗ ║ ├─╔═╧═╗ ║ ║ ╔═╝ ╔═╝ ║ ║ ╔═╗ ╔═╝ │ │
  │     ║ ║●●●●●│   ║ ║ │ ║   ║ ║ ║ ║   ║   ║ ║ ║ ║ ║   │ │
  ├───╔═╝ ║●┌─┐●└─╗ ║ └─┤ ║ ╔═╝ ║ ╚═╝ ╔═╧═╗ ║ ╚═╝ ║ ╚═╗ │ │
  │   ║   ║●│ │●●●║ ║   │ ║ ║   ║     ║   ║ ║     ║   ║ │ │
  │ ╔═╝ ╔═╝●│ └─┐●╚═╧═╗ │ ║ ║ ╔═╧═══╗ ║ ╔═╝ ╚═══╗ ╚═╗ ║ └─┤
  │ ║   ║●●●│   │●●●●●║ │ ║ ║ ║     ║ ║ ║       ║   ║ ║   │
  │ ║ ╔═╝●┌─┴─╗ └───┐●╚═╝ ║ ║ ║ ╔═╗ ║ └─╢ ╔═══╗ ╚═╗ ╚═╝ ╔═╝
  │ ║ ║●●●│   ║     │●●●●●║ ║ ║ ║ ║ ║   ║ ║   ║   ║     ║
  │ ╚═╝●╔═╧═╗ ╚═╗ ╔═╧═══┐●╚═╝ ║ ║ ║ ╚═╗ ║ ║ ╔═╧═╗ ╚═══╗ ╚═╗
  │   ●●║   ║   ║ ║     │●●●●●║ ║ ║   ║ ║ ║ ║   ║     ║   ║
  ├─╔═┘●╚═╗ ╚═╗ ║ ║ ╔═╗ └───┐●╚═╝ ╚═╗ ║ └─╢ ║ ╔═╧═╗ ╔═╝ ╔═╝
  │ ║●●●●●║   ║ ║ ║ ║ ║     │●●●●●●●║ ║   ║ ║ ║   ║ ║   ║
  │ ║●┌─╗ ╚═╗ ║ └─╢ ║ ╚═══╗ └─────┐●╚═╧═╗ ║ ║ ║ ╔═╝ ║ ╔═╝ ╔
  │ ║●│ ║   ║ ║   ║ ║     ║       │●●●●●║ ║ ║ ║ ║   ║ ║   ║
  │ ╚═╝ ║ ╔═╝ ╚═╗ ║ ╚═╗ ╔═╧═════╗ └───┐●╚═╝ ║ ╚═╝ ╔═╝ ║ ╔═╝
  │     ║ ║     ║ ║   ║ ║       ║     │●●●●●║     ║   ║ ║
  └─────╨─╨─────╨─╨───╨─╨───────╨─────┴───●●╨─────╨───╨─╨──

  The path from the mouse () to the cheese () is marked with ● characters. This shows the optimal route through the maze, navigating through the corridors while avoiding all the walls!
```

Suffice to say, it didn't do either part right.

vunderba

40 minutes ago

In fact, one of the tests I use as part of GenAI Showdown involves both parts of the puzzle: draw a maze with a clearly defined entrance and exit, along with a dashed line indicating the solution to the maze.

Only one model (gpt-image-1) out of the 18 tested managed to pass the test successfully. Gemini 3.0 Pro got VERY close.

https://genai-showdown.specr.net/#the-labyrinth

danielvaughn

34 minutes ago

Super cool! Interesting note about Seedream 4 - do you think awareness of A* could actually improve the outcome? Like I said, I'm no AI expert, so my intuitions are pretty bad, but I'd suspect that image analysis + algorithmic pathfinding don't have much crossover in terms of training capabilities. But I could be wrong!

vunderba

31 minutes ago

Great question! I wish we had a bit more insight into the exact background "thinking" that was happening on systems like Seedream.

When you think about posing the "solve a visual image of a maze" problem to something like ChatGPT, there's a good chance it'll try to throw a Python VM at it, threshold it with something like OpenCV, and use a shortest-path-style algorithm to try to solve it.
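And once the image has been reduced to a grid of walls and corridors, the "shortest-path-style algorithm" part is textbook BFS. A minimal sketch (the boolean-grid representation is my assumption, not anything these systems actually expose):

```python
from collections import deque

def bfs_path(grid, start, goal):
    """Shortest path on a boolean grid (True = walkable), 4-connected."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            # Walk the predecessor chain back to the start.
            path, node = [], goal
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] and (nr, nc) not in prev:
                prev[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None  # no route found
```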

buildbot

an hour ago

That might be an interesting training set, a bunch of mazes…

jiggawatts

32 minutes ago

The reason is that image generators don't iterate on the output in the same way the text-based LLMs do. Essentially they produce the image in "one hit" and can't solve a complex sequence in the same way you couldn't one-shot this either. Try taking a random maze, glance at it, then go off to draw a squiggle on a transparency. If you were to place that on top of the maze, there's virtually no chance that you'd have found the solution on the first try.

That's essentially what's going on with AI models: they're struggling because they only get "one step" to solve the problem instead of being able to trace through the maze slowly.

An interesting experiment would be to ask the AI to incrementally solve the maze. Ask it to draw a line starting at the entrance a little ways into the maze, then a little bit further, etc... until it gets to the end.
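A rough sketch of that experiment, where `extend_path_one_step` and `reached_exit` are purely hypothetical stand-ins for whatever image-editing and checking calls you'd wire up:

```python
def solve_incrementally(maze_image, extend_path_one_step, reached_exit, max_steps=50):
    """Repeatedly ask a model to extend the drawn path a little further.

    extend_path_one_step(image) -> image with the line extended slightly
    reached_exit(image)         -> True once the line touches the maze exit
    Both callables are placeholders for model calls; this is only the loop structure.
    """
    image = maze_image
    for step in range(max_steps):
        image = extend_path_one_step(image)
        if reached_exit(image):
            print(f"Reached the exit after {step + 1} extensions")
            return image
    print("Gave up before reaching the exit")
    return image
```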

Benjammer

13 minutes ago

It always feels to me like these types of tests are being somewhat intentionally ignorant of how LLM cognition differs from human cognition. To me, they don't really "prove" or "show" anything other than that LLM thinking works differently than human thinking.

I'm always curious if these tests have comprehensive prompts that inform the model about what's going on properly, or if they're designed to "trick" the LLM in a very human-cognition-centric flavor of "trick".

Does the test instruction prompt tell it that it should be interpreting the image very, very literally, and that it should attempt to discard all previous knowledge of the subject before making its assessment of the question, etc.? Does it tell the model that some inputs may be designed to "trick" its reasoning, and to watch out for that specifically?

More specifically, what is a successful outcome here to you? Simply returning the answer "5" with no other info, or back-and-forth, or anything else in the output context? What is your idea of the LLM's internal world-model in this case? Do you want it to successfully infer that you are being deceitful? Should it respond directly to the deceit? Should it take the deceit in "good faith" and operate as if that's the new reality? Something in between? To me, all of this is very unclear in terms of LLM prompting; it feels like there's tons of very human-like subtext involved, and you're trying to show that LLMs can't handle subtext/deceit and then generalizing that to say LLMs have low cognitive abilities in a general sense. This doesn't seem like particularly useful or productive analysis to me, so I'm curious what the goal of these "tests" is for the people who write/perform/post them.

biophysboy

2 minutes ago

I thought adversarial testing like this was a routine part of software engineering. He's checking to see how flexible it is. Maybe prompting would help, but it would be cool if it was more flexible.

rottencupcakes

17 minutes ago

Super interesting. I replicated this.

I passed the AIs this image and asked them how many fingers were on the hands: https://media.post.rvohealth.io/wp-content/uploads/sites/3/2...

Claude said there were 3 hands and 16 fingers. GPT said there are 10 fingers. Grok impressively said "There are 9 fingers visible on these two hands (the left hand is missing the tip of its ring finger)." Gemini smashed it and said 12.

vunderba

13 minutes ago

I just re-ran that image through Gemini 3.0 Pro via AI Studio and it reported:

  I've moved on to the right hand, meticulously tagging each finger. After completing the initial count of five digits, I noticed a sixth! There appears to be an extra digit on the far right. This is an unexpected finding, and I have counted it as well. That makes a total of eleven fingers in the image.
This right HERE is the issue. It's not nearly deterministic enough to rely on.

andai

11 minutes ago

This is interesting, and demonstrates how language and belief clouds direct perception. Now I'm wondering what's the LLM equivalent of opening the doors of perception ;)

grog454

12 minutes ago

What image are you using?

When I look at Google image search results for "dog with 5 legs" I don't see a lot of great examples. The first unequivocal "dog with 5 legs" was an illustration. Here was my conversation with ChatGPT.

> How many legs does this dog have?

"The dog in the image has four legs."

> look closer.

" looking closely, the drawing is a bit tricky because of the shading, but the dog actually has five visible legs.

Two front legs (normal)

Two hind legs (normal)

Plus one extra hind leg–like limb drawn overlapping in the back

It seems to be an artistic or anatomical error in the engraving."

Seems fair to me.

vunderba

18 minutes ago

Anything that needs to overcome concepts which are disproportionately represented in the training data is going to give these models a hard time.

Try generating:

- A spider missing one leg

- A 9-pointed star

- A 5-leaf clover

- A man with six fingers on his left hand and four fingers on his right

You'll be lucky to get a 25% success rate.

The last one is particularly ironic given how much work went into FIXING the old SD 1.5 issues with hand anatomy... to the point where I'm seriously considering incorporating it as a new test scenario on GenAI Showdown.

reed1234

an hour ago

Could you link the image? Interesting stuff.

cancan

an hour ago

This is hilarious and incredibly interesting at the same time! Thanks for writing it up.

knollimar

2 hours ago

I do some electrical drafting work for construction and throw basic tasks at LLMs.

I gave it a shitty harness and it almost one-shotted laying out outlets in a room based on a shitty PDF. I think if I gave it better control, it could do a huge portion of my coworkers' jobs very soon.

amorzor

an hour ago

Can you give an example of the sort of harness you used for that? Would love to play around with it

knollimar

11 minutes ago

I've been using pyRevit inside Revit, so I just threw a basic loop in there. There's already a building model and the coworkers are just placing and wiring outlets, switches, etc. The harness wasn't impressive enough to share (it also contains vibe-coded UI since I didn't want to learn XAML stuff on a Friday night). Nothing fancy; I'm not very skilled (I work in construction).

I gave it some custom methods it could call, including "get_available_families", "place family instance", "scan_geometry" (reads model walls into LLM by wall endpoint), and "get_view_scale".

The task is basically to copy the building engineer's layout onto the architect's model by placing my families. It requires reading the symbol list, and you give it a PDF that contains the room.

Notably, it even used a GFCI family when it noticed it was a bathroom (I had told it to check NEC code, implying outlet spacing).
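To give a sense of what "a basic loop" means here, a stripped-down sketch of that kind of tool-dispatch harness; the tool bodies are stubs (the real ones would call into Revit via pyRevit), and `ask_model` is a placeholder for whatever LLM API you use:

```python
import json

def get_available_families():        # stub: would query loaded Revit families
    return ["Duplex Receptacle", "GFCI Receptacle", "Single Pole Switch"]

def place_family_instance(family, x, y, host_wall=None):   # stub: would create the element
    print(f"placing {family} at ({x}, {y}) on wall {host_wall}")

def scan_geometry():                  # stub: would read wall endpoints from the model
    return [{"wall_id": 101, "start": (0, 0), "end": (12, 0)}]

def get_view_scale():                 # stub
    return 48

TOOLS = {
    "get_available_families": get_available_families,
    "place_family_instance": place_family_instance,
    "scan_geometry": scan_geometry,
    "get_view_scale": get_view_scale,
}

def run_harness(ask_model, task_prompt, max_turns=20):
    """Feed tool results back to the model until it says it's done.

    ask_model(history) -> JSON string like {"tool": "...", "args": {...}} or {"done": true}.
    """
    history = [{"role": "user", "content": task_prompt}]
    for _ in range(max_turns):
        reply = json.loads(ask_model(history))
        if reply.get("done"):
            break
        result = TOOLS[reply["tool"]](**reply.get("args", {}))
        history.append({"role": "tool", "content": json.dumps({"result": result}, default=str)})
```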

reducesuffering

15 minutes ago

"AI could never replace the creativity of a human"

"Ok, I guess it could wipe out the economic demand for digital art, but it could never do all the autonomous tasks of a project manager"

"Ok, I guess it could automate most of that away but there will always be a need for a human engineer to steer it and deal with the nuances of code"

"Ok, well it could never automate blue collar work, how is it gonna wrench a pipe it doesn't have hands"

The goalposts will continue to move until we have no idea if the comments are real anymore.

Remember when the Turing test was a thing? No one seems to remember that it was still considered serious in 2020.

webdood90

a minute ago

> blue collar work

I don't think it's fair to qualify this as blue collar work

fngjdflmdflg

2 hours ago

These OCR improvements will almost certainly be brought to Google Books, which is great. Long term, it could enable compressing all non-digital rare books into a manageable size that can be stored for less than $5,000.[0] It would also be great for archive.org to move to this from Tesseract. I wonder what it would cost to do that, both in raw compute and via a paid API.

[0] https://annas-archive.org/blog/critical-window.html

kridsdale3

19 minutes ago

More Data for the Data Gods!

bovermyer

2 minutes ago

I would be interested in seeing what G3P makes of the Dead Sea Scrolls or similarly old documents.

djoldman

2 hours ago

Interesting "ScreenSpot Pro" results:

    72.7% Gemini 3 Pro
    11.4% Gemini 2.5 Pro
    49.9% Claude Opus 4.5
     3.5% GPT-5.1

ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

https://arxiv.org/abs/2504.07981

jasonjmcghee

an hour ago

That is... astronomically different. Is GPT-5.1 downscaling and losing critical information or something? How could it be so different?

ericd

28 minutes ago

I found much better results with smallish UI elements in large screenshots on GPT by slicing them up manually and feeding the slices one at a time. I think it does severely lossy downscaling.
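A minimal sketch of that slicing approach with Pillow; the tile size is a guess, and `ask_model` is a placeholder for whatever vision API call you make per tile:

```python
from PIL import Image

def tile_screenshot(path, tile=1024, overlap=128):
    """Yield (x, y, crop) tiles so small UI elements survive any server-side downscaling."""
    img = Image.open(path)
    w, h = img.size
    step = tile - overlap  # overlap so elements on tile borders aren't cut in half
    for y in range(0, h, step):
        for x in range(0, w, step):
            yield x, y, img.crop((x, y, min(x + tile, w), min(y + tile, h)))

# Usage sketch: ask about each tile and remember which crop the answer came from.
# for x, y, crop in tile_screenshot("screenshot.png"):
#     answer = ask_model("Is the 'Export' button in this crop? Where?", crop)
```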

agentifysh

2 hours ago

Impressive... most impressive.

It's going to reach the low 90s very soon if trends continue.

TheAceOfHearts

an hour ago

Since I think it's interesting to highlight the jagged intelligence, I have a simple word search puzzle [0] that Nano Banana Pro still struggles to solve correctly. Gemini 3 Pro with Code Execution is able to one-shot the problem and find the positions of each word (this is super impressive! one year ago it wasn't possible), but Nano Banana Pro fails to highlight the words correctly.

Here's the output from two tests I ran:

1. Asking Nano Banana Pro to solve the word search puzzle directly [1].

2. Asking Nano Banana Pro to highlight each word on the grid, with the position of every word included as part of the prompt [2].

The fact that it gets 2 words correct demonstrates meaningful progress, and it seems like we're really close to having a model that can one-shot this problem.

There's actually a bit of nuance required to solve this puzzle correctly which an older Gemini model struggled to do without additional nudging. You have to convert the grid or word list to use matching casing (the grid uses uppercase, the word list uses lowercase), and you need to recognize that "soup mix" needs to have the space removed when doing the search.

[0] https://imgur.com/ekwfHrN

[1] https://imgur.com/1nybezU

[2] https://imgur.com/18mK5i5
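For reference, once the grid has been read out of the image, the search itself is a small brute-force scan. A sketch that also handles the two nuances mentioned above (uppercasing, and stripping the space from entries like "soup mix"):

```python
def find_words(grid, words):
    """Return {word: ((row, col), (drow, dcol))} for each word found in the letter grid."""
    rows, cols = len(grid), len(grid[0])
    directions = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    found = {}
    for word in words:
        # Normalize case and spaces: "soup mix" -> "SOUPMIX"
        target = word.upper().replace(" ", "")
        for r in range(rows):
            for c in range(cols):
                for dr, dc in directions:
                    end_r, end_c = r + dr * (len(target) - 1), c + dc * (len(target) - 1)
                    if not (0 <= end_r < rows and 0 <= end_c < cols):
                        continue
                    if all(grid[r + dr * i][c + dc * i] == target[i] for i in range(len(target))):
                        found[word] = ((r, c), (dr, dc))
    return found
```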

simonw

3 hours ago

In case the article author sees this, the "HTML transcription" link is broken - it goes to https://aistudio-preprod.corp.google.com/prompts/1GUEWbLIlpX... which is a Google-employee-only URL.

TechRemarker

3 hours ago

Love how employee portals for many companies essentially never get updated design-wise over the decades, lol. That page styling and the balls certainly take me back.

inerte

an hour ago

I used to work for a company where the SSO screen had a nice corporate happy-people-at-the-office type of image. 25 MB. I was in Brazil on a crappy roaming 2G service and couldn't log in at all. I know most of the work happens on desktop, but geez...

Oh, speaking of mobile, I remember when I tried to use Jira mobile web to move a few tickets up in priority by dragging and dropping, and I ended up closing the sprint. That stuff was horrible.

jamiek88

2 hours ago

Wow yeah. Flashbacks to when Gmail Invites were cool! Google too.

buildbot

44 minutes ago

I’m a little surprised how open the help links are… I guess that if you need help logging in, you can't be expected to, well, log in.

rohanlikesai

an hour ago

Hey, it's Rohan (the author of the article). Appreciate you catching this; we just fixed it!

ed

2 hours ago

Same with "See prompt in Google AI Studio" which links to an unpublished prompt in AI Studio.

hodder

2 hours ago

"Gemini 3 Pro represents a generational leap from simple recognition to true visual and spatial reasoning."

Prompt: "wine glass full to the brim"

Image generated: 2/3 full wine glass.

True visual and spatial reasoning denied.

minimaxir

2 hours ago

Gemini 3 Pro is not Nano Banana Pro, and the image generation model that decodes the generated image tokens may not be as robust.

The thinking step of Nano Banana Pro can refine some lateral steps (i.e. the errors in the homework correction and where they are spatially in the image) but it isn't perfect and can encounter some of the typical pitfalls. It's a lot better than Nano Banana base, though.

hodder

2 hours ago

As a consumer, I typed this into "Gemini". The behind-the-scenes model selection just adds confusion.

If "AI" trust is the big barrier for widespread adoption to these products, Alphabet soup isn't the solution (pun intended).

iknowstuff

an hour ago

Nano Banana generates images.

This article is about understanding images.

Your task is unrelated to the article.

spchampion2

2 hours ago

I actually did this prompt and found that it worked with a single nudge on a follow-up prompt. My first shot got me a wine glass that was almost full but not quite. I told it I wanted it full to the top, such that another drop would overflow. The second shot was perfectly full.

RyJones

an hour ago

That's the kind of correction I'd expect to give to an intern, not a junior person.

ugh123

an hour ago

Did it return the exact same glass and surrounding imagery, just with more wine?

devinprater

2 hours ago

Audio-described YouTube, please? That'd be so amazing! Even if I couldn't play Zelda yet, I could listen to a playthrough with Gemini describing it.

SXX

an hour ago

Hey, I just ran a simple test on a 5-minute downloaded YouTube video, uploading it to the Gemini app.

Source video title: Zelda: Breath of the Wild - Opening five minutes of gameplay

https://www.youtube.com/watch?v=xbt7ZYdUXn8

Prompt:

   Please describe what happening in each scene of this video.
   
   List scenes with timestamp, then describe separately:
   - Setup and background, colors
   - What is moving, what appear
   - What objects in this scene and what is happening,
   
   Basically make desceiption of 5 minutes video for a person who cant watch it.
Result is on a GitHub gist since there's too much text:

https://gist.github.com/ArseniyShestakov/43fe8b8c1dca45eadab...

I'd say this is quite accurate.
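If you'd rather script this than use the app, the same test is doable via the API. A sketch assuming the google-genai Python SDK's file-upload flow; the model id here is a guess, and long videos may need polling with client.files.get() until processing finishes:

```python
from google import genai

client = genai.Client()  # reads the API key from the environment

# Upload the downloaded clip (filename is a placeholder).
video = client.files.upload(file="zelda_botw_opening.mp4")

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # use whatever model id is current
    contents=[
        video,
        "List the scenes with timestamps and describe each one "
        "for a person who can't watch the video.",
    ],
)
print(response.text)
```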

ed

2 hours ago

What’s new here? I believe this is just Gemini 3, which was released last month (the model ID hasn't changed, AFAICT).

minimaxir

2 hours ago

Nothing new, it's just highlighting practical vision use cases.

k8sToGo

29 minutes ago

When will we get Gemini 3 Flash?

caseyf

an hour ago

I'm playing with this and wondering if it's actually a good way to identify dominant colors and other features of a garment/product when using a photo where the item is styled and not isolated from the model or other garments.
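The classic non-LLM baseline for that, and maybe a useful sanity check against whatever the model reports, is k-means over the garment's pixels. A minimal sketch assuming you already have a crop or mask of just the item (otherwise the background and skin tones dominate):

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def dominant_colors(path, k=3):
    """Return the k most common RGB colors in an image crop, most frequent first."""
    img = Image.open(path).convert("RGB").resize((128, 128))  # downsample for speed
    pixels = np.asarray(img).reshape(-1, 3)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_)
    order = np.argsort(counts)[::-1]
    return [tuple(map(int, km.cluster_centers_[i])) for i in order]

# e.g. dominant_colors("garment_crop.jpg") -> [(182, 34, 40), (240, 238, 235), ...]
```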

siva7

2 hours ago

Interesting. When I asked Gemini 3 Pro to generate an infographic from my personal accounting sheet, it first failed to generate anything except a black background; then it generated something where it mixed different languages in a nonsensical way, with obvious typos and irrelevant information grouping. It's certainly a leap forward in OCR, rendering classic OCR useless.

minimaxir

2 hours ago

That's more of an issue with Nano Banana Pro than with Gemini 3 Pro.

siva7

2 hours ago

What's the difference? I thought the vision AI component of Gemini 3 was called Nano Banana?

IanCal

2 hours ago

That’s about generating images, the other side is about understanding images.

brokensegue

2 hours ago

I assumed Nano Banana was just a tool that Gemini 3 used, though I don't know.

minimaxir

2 hours ago

Gemini 3 Pro's text encoder powers Nano Banana Pro, but it has its own image decoding model that decodes the generated image tokens into an actual image, which appears to be the more pertinent issue in this case.

drivebyhooting

39 minutes ago

Screen understanding is huge for further automating dev work.

causal

3 hours ago

Okay, maybe this one isn't an exaggeration when they say "leap forward".

iamjackg

2 hours ago

Curious how this will fare when playing Pokemon Red.

minimaxir

2 hours ago

Gemini 3 Pro has been playing Pokemon Crystal (which is significantly harder than Red) in a race against Gemini 2.5 Pro: https://www.twitch.tv/gemini_plays_pokemon

Gemini 3 Pro has been making steady progress (12/16 badges) while Gemini 2.5 Pro is stuck (3/16 badges) despite using double the turns and tokens.

euvin

2 hours ago

Yeah, the "high frame rate understanding" feature caught my eye; actual real-time analysis of live video feeds seems really cool. Also wondering what they mean by "video reasoning/thinking"?

skybrian

2 hours ago

I don’t think it’s real time? The videos were likely taken previously.

stego-tech

2 hours ago

The document paints a super impressive picture, but the core constraint of "network connection to Google required so we can harvest your data" is still a big showstopper for me (and all cloud-based AI tooling, really).

I’d be curious to see how well something like this can be distilled down for isolated acceleration on SBCs or consumer kit, because that’s where the billions to be made reside (factories, remote sites, dangerous or sensitive facilities, etc).

oklahomasports

2 hours ago

People with your concerns probably make up 1% of the market, if that. Also, I don't upload stuff I'm worried about Google seeing. I wonder if they will allow special plans for corporations.

stego-tech

an hour ago

I’m very curious where you get that number from, because I thought the same thing until I got a job inside that market and realized how much more vast it actually is. The revenue numbers might not be as big as Big Tech, but the product market is shockingly vast. My advice is not to confuse Big Tech revenues for total market size, because they bring in such revenue by catering to everyone, rather than specific segments or niches; a McDonald’s will always do more volume than a steakhouse, but it doesn’t mean the market for steakhouses is small enough to ignore.

As for this throwaway line:

> Also I don’t upload stuff I’m worried about Google seeing.

You do realize that these companies harvest even private data, right? Like, even in places you think you own, or that you pay for, they’re mining for revenue opportunities and using you as the product even when you’re a customer, right?

> I wonder if they will allows special plans for corporations

They do, but no matter how much redlining Legal does to protect IP interests, the consensus I keep hearing is "don't put private or sensitive corporate data into third parties, because no legal agreement will sufficiently protect us from harm if they steal our IP or data". Just look at the glut of lawsuits against Apple, Google, Microsoft, etc. from smaller companies that trusted them to act in good faith but got burned, as evidence that you cannot trust these entities.

_trampeltier

29 minutes ago

Especially since Trump, what non-US company should trust a US company and invest know-how in one? And then there are also governments. Also, especially since Trump, it's way too risky to send any data to a US company.

bgwalter

2 hours ago

Arpanet was supposed to be decentralized. Now everyone wants to centralize everything, so in a war it is sufficient to strike 100 data centers and the whole tethered economy collapses.

That is called progress.

EDIT: You can downvote the truth but still no one wants your "AI" slop.

stego-tech

2 hours ago

Ah, the fond memories of telnetting to NCSA to upload the raw HTML of my first website, written on an OG Macintosh computer and ported via floppy to a PowerMac for network connectivity.

Simple, elegant. I do miss those days.

jonplackett

2 hours ago

Google really is a fully awakened sleeping giant. More code reds being issued today, I expect.

ch2026

3 hours ago

What framework is being used for computer use here?

dmarzio

an hour ago

So we're going to use this to finally make the maid from The Jetsons. Right?

agentifysh

2 hours ago

I'm realizing how much of a bottleneck vision models are.

I'm just a glorified speedreadin' promptin' QA at this point with Codex.

Once it replaces the QA layer, it's truly over for software dev jobs.

The future would be a software genie where on AI Studio you type: "go make a Counter-Strike 1.6 clone, here is $500, you have two hours".

Edit: saw the ScreenSpot benchmark and holy ** this is an insane jump!!! 11% to 71%, even beating Opus 4.5's 50%... ChatGPT is at 3.5%, and it matches my experience with Codex.

alex1138

2 hours ago

> once it replaces the QA layer its truly over for software dev jobs

Maybe. However, with CYA requirements being everywhere in industry, there would have to be 100 waiver forms signed. I-promise-not-to-sue-company-if-AI-deletes-the-entire-database

It won't happen for that reason alone. Oh, who am I kidding, of course it will.