Ask HN: Do coding assistants "see" attached images?

3 points, posted a month ago
by cryptography

Item id: 45488666

12 Comments

yawpitch

a month ago

They’re closed-source black boxes; not even the people who built them really know what’s happening under the hood.

That said, one can reasonably infer that an LLM-based system isn’t doing any form of visual processing at all… it’s just looking at your HTML and CSS and flagging where it diverges from the statistical mean of all such structures in the training data (modulo some stochastic wandering, and the fact that it may, somehow, have mixed some measure of Rick Astley or Goatse into its multidimensional lookup table).
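To make the point concrete: a text-only model never receives pixels at all; whatever you attach, what actually reaches it is a token stream. A toy sketch of the idea in Python (real tokenizers use learned subword vocabularies like BPE, so this regex is only a stand-in):

    import re

    html = '<div class="hero"><img src="cat.png" alt="a cat"></div>'

    # Crude stand-in tokenizer: split the markup into a flat stream of
    # symbols and words, which is roughly the shape of what a model sees.
    tokens = re.findall(r'<|>|/|=|"[^"]*"|[\w.-]+', html)
    print(tokens)
    # ['<', 'div', 'class', '=', '"hero"', '>', '<', 'img', 'src', ...]

Everything the model ever "sees" of that cat is the string "cat.png"; statistics over token sequences do the rest.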

chistev

a month ago

> They’re closed-source black boxes; not even the people who built them really know what’s happening under the hood.

Please explain

iswapna_

a month ago

LLMs are trained to predict the next token from all the previously seen tokens, based on the data they were trained on; that is all the text generation amounts to. They understand nothing about the spatial relationships in an image (lines, objects, etc.). As for "not even the people who built them": we have no real understanding of how LLMs work yet. Traditional ML theory (classification, regression, clustering) largely does not apply to LLMs' emergent capabilities like coding, arithmetic, and reasoning. No such theory exists today. People are trying.
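For a runnable version of "predict the next token from all the previously seen tokens", here is a deliberately tiny Python sketch: a bigram model over characters. Real LLMs are transformers over subword tokens with billions of learned parameters, but the training objective has the same shape.

    from collections import Counter, defaultdict

    training_text = "the cat sat on the mat. the cat ate."

    # Count how often each character follows each other character.
    follows = defaultdict(Counter)
    for prev, nxt in zip(training_text, training_text[1:]):
        follows[prev][nxt] += 1

    def predict_next(prev_char):
        # Greedy decoding: emit the statistically most likely successor.
        return follows[prev_char].most_common(1)[0][0]

    # Generate by feeding each prediction back in as context.
    out = "t"
    for _ in range(20):
        out += predict_next(out[-1])
    print(out)  # a plausible-looking but garbled remix of the training text

Nothing in there knows what a cat or a mat is; it only knows what tends to come next. Scale that idea up enormously and you get the systems we're discussing.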

chistev

a month ago

If I understand you, you're saying the people who built them have no idea why they work?

muzani

a month ago

Yup, it's emergent behavior. This has been going on for a while in ML, I believe. To be fair, we know how brains work, but we still don't understand consciousness.

yawpitch

a month ago

To be truly fair, we barely know how flatworm and fruit fly brains work… we haven’t the slightest clue how human brains work. Understanding consciousness is a long way off.

iswapna_

a month ago

:) yeah, and here we are talking about UPI

yawpitch

a month ago

Imagine an equation several miles long, comprising billions of variables, billions of constants, and billions of exponents.

You know none of the variables. None of the constants. None of the exponents. No one does, really, but even if you did, it wouldn’t help, because no one bothered to write down the operators, and the parentheses are randomly shifted around every time the equation is resolved.

All you know is that if you ask it for tea, it will always, invariably, and forever give you back something that is almost, but not quite, entirely unlike tea. Sometimes it might be more unlike coffee, sometimes more unlike vodka and cow urine.

What you’ll never, ever, ever reliably know is what’s in the cup.

That’s about the best way I know to explain black box abstractions. In a few decades we might have a workable theory as to why these things function, to the degree that they do, though I’ll bet a rather large amount of money that we won’t.
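The metaphor is easy to make literal, for what it’s worth. A model really is just one enormous parameterized function; here is a miniature in Python, with the caveat that real models have billions of learned (not random) parameters and far more structure:

    import random

    random.seed(0)
    params = [random.uniform(-1, 1) for _ in range(1_000_000)]

    def black_box(x):
        # Every parameter sits right there in `params`, fully inspectable,
        # yet reading them tells you nothing about why a given input
        # yields a given output.
        acc = x
        for p in params:
            acc = acc * p + p
        return acc

    print(black_box(1.0))  # some number; explaining it is the hard part

You can print every one of the million numbers and still be no closer to knowing what’s in the cup.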