abeppu
4 days ago
Skimming the actual paper ... it seems pretty bad?
The thing about Beethoven's 9th and biological materials which is mentioned in the OP is just that, out of a very large knowledge graph, they found a small subgraph isomorphic to a graph created from a text about the symphony. But they never address the fact that a sufficiently large graph with certain high-level statistical properties will have small subgraphs isomorphic to almost any 'query' graph. Is this particular match good or meaningful in some way, or is it just an inevitable outcome of having produced such a large knowledge graph in the first place? The reader can't really tell, because figure 8, which presents the two graphs, is at such poor resolution that one can't read any of the labels. We're just expected to see "oh, the nodes and their degrees match, so it has the right shape", but that doesn't really tell us that their system gained any insight through this isomorphism-based mining process.
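To make the objection concrete, here's a minimal stdlib-only sketch (mine, not from the paper): a brute-force subgraph search showing that on any reasonably dense graph, a small "query" shape tends to turn up by chance, so finding one in a huge knowledge graph is weak evidence on its own.

```python
# Sketch: does a small "query" graph appear somewhere inside a bigger graph?
# Brute force over node assignments; fine for tiny queries, stdlib only.
import itertools
import random

def random_edges(n, p, seed=0):
    """Erdos-Renyi-style random graph on n nodes, edge probability p."""
    rng = random.Random(seed)
    return {frozenset((u, v))
            for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p}

def contains_query(n, edges, query_edges, k):
    """True if some k nodes of the big graph host the query edges
    as a (non-induced) subgraph."""
    for perm in itertools.permutations(range(n), k):
        if all(frozenset((perm[a], perm[b])) in edges
               for a, b in query_edges):
            return True
    return False

# Query: a triangle with one pendant node (4 nodes, 4 edges).
query = [(0, 1), (1, 2), (0, 2), (2, 3)]
g = random_edges(30, 0.2, seed=1)
print(contains_query(30, g, query, 4))
```

The point isn't the code; it's that with ~30 nodes at 20% edge density, this shape is already expected dozens of times over, so an isomorphism hit in a graph thousands of times larger needs a null model before it means anything.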
For the stuff about linking art (e.g. a Kandinsky painting) with material design: they used an LLM to generate a description of a material for DALL-E, where the prompt includes information about the painting, and then they show the resulting image next to the painting. But there's no measure of what a "good" material description is, and there's certainly no evaluation of the contribution of the graph-based "reasoning". In particular, an obvious comparison would be: "Describe this painting." -> "Construct a prompt for DALL-E to portray a material whose structure has properties informed by this description of a painting ..." -> render.
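That baseline is easy to state concretely. A sketch of the two-step, no-graph pipeline (prompt wording is mine and hypothetical, not the paper's):

```python
def baseline_prompts(painting_description):
    """Two-step baseline with no knowledge graph in the loop:
    describe the painting, then turn that description into an
    image-model prompt for a material. (Hypothetical wording.)"""
    describe = "Describe this painting."
    render = ("Construct a prompt for DALL-E to portray a material whose "
              "structure has properties informed by this description of "
              "a painting: " + painting_description)
    return describe, render

describe, render = baseline_prompts("geometric forms in primary colors")
print(render)
```

If the graph-based pipeline can't beat this under some stated metric, the "reasoning" step contributed nothing measurable.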
It really seems like the author threw a bunch of stuff against the wall and didn't even look particularly closely to see if it stuck.
Also, the only equation in the paper is the definition of cosine similarity, followed by two paragraphs justifying its use in constructing their graph. Like, who is the intended audience?
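For reference, the equation in question is nothing exotic; it's a one-liner:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = dot(a, b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0, 1], [1, 1, 0]))  # 0.5
```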
https://iopscience.iop.org/article/10.1088/2632-2153/ad7228#...
bbor
4 days ago
Great writeup, thanks! That Kandinsky quote is what set off alarm bells for me, as it seems like a quintessential failure case for laypeople trying to understand LLMs: they take some basic, vague insights produced by a chatbot as profound discoveries. It seems the reviewers may have agreed, to some extent; note that it was received by Machine Learning on 2024-03-26, but only accepted (after revisions) on 2024-08-21.
I wrote more below with a quote, but re: "who's the intended audience?" I think the answer is the same kind of people Gary Marcus writes for: other academic leaders, private investors, and general technologists. Definitely not engineers looking to apply their work immediately, nor the vast majority of scientists that are doing the long, boring legwork of establishing facts.
In that context, I would defend the paper as evocative and creative, even though your criticisms all ring true. Like, take a look at their (his?) HuggingFace repo: https://huggingface.co/lamm-mit It seems clear that they're doing serious work with real LLMs, even if it's scattershot.
Honestly, if I was a prestigious department head with millions at my disposal in an engineering field, I'm not sure I would act any differently!
ETA: Plus, I'll defend him purely on the basis of having a gorgeous, well-documented Git repo for the project: https://github.com/lamm-mit/GraphReasoning?tab=readme-ov-fil... Does this constitute scientific value on its own? Not really. Does it immediately bias me in his favor? Absolutely!
refulgentis
4 days ago
Thank you for taking the time to read and write this up; something was "off" in the quotes describing the materials that had 4 of 5 alarm bells ringing for me. Now I can super-skim confidently and giggle.
- the real output here is text, from a finetuned Mixtral fed leading questions
- the initial "graph" with the silly Beethoven-inspired material is probably hand-constructed; they don't describe its creation process at all
- later, they're constructing graphs with GPT-3.5 (!?) (they cite rate limits, but something's weird with the whole setup; they're also talking about GPT-4 vision preview etc., which was roughly a year before the paper was released)
- the whole thing reads like someone had a long leash to spend a year or two exploring basic consumer LLMs, finetuned one, and sorta just published whatever they got 6 months to a year later.
DaiPlusPlus
4 days ago
> and sorta just published whatever they got 6 months to a year later.
Publish and perish...
kevindamm
4 days ago
I thought it was "publish xor perish" but, huh, it really is 'or'.
caetris2
4 days ago
The paper is a tremendous effort of passion and love for the art of science and the science of deriving discovery from art. I assure you, this person is someone to pay attention to and I hope they never give up on loving the work they do.
idiotsecant
4 days ago
Found the author.
griomnib
4 days ago
I’ve actually been thinking a lot about how LLMs need to bridge the gap to symbolic reasoning, and was very much waiting for something like this in theory… but this ain’t it.
Looking forward to a more serious effort.
gremgoth
4 days ago
We're adding symbolic verification to LLM-generated SQL code at http://sql.ai
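One cheap form of verification along these lines (a sketch of my own, not necessarily what sql.ai does) is compiling the generated SQL against the target schema before ever executing it, e.g. with the stdlib sqlite3 module:

```python
# Sketch: catch LLM-hallucinated tables/columns by planning the query
# (EXPLAIN compiles it without running it) against the real schema.
import sqlite3

def check_sql(schema_ddl, query):
    """Return None if the query compiles against the schema,
    else the compiler's error message."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        conn.execute("EXPLAIN " + query)
        return None
    except sqlite3.Error as e:
        return str(e)
    finally:
        conn.close()

schema = "CREATE TABLE users (id INTEGER, name TEXT);"
print(check_sql(schema, "SELECT name FROM users WHERE id = 1"))  # None
print(check_sql(schema, "SELECT nme FROM users"))  # names the bad column
```

This only checks well-formedness against the schema, of course; semantic verification (does the query mean what the user asked?) is the harder part.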