Have LLMs Learned to Reason? A Characterization via 3-SAT Phase Transition

3 points, posted 12 hours ago
by jacklondon

3 Comments

jacklondon

12 hours ago

I found this paper quite interesting.

Instead of discussing “reasoning” in vague terms, it studies LLM behavior on random 3-SAT, especially near the phase transition, where instances become much harder on average. This grounds the discussion in computational complexity rather than in opaque benchmark scores.
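For anyone unfamiliar with the setup: random 3-SAT hardness is controlled by the clause-to-variable ratio, with the hardest instances clustering around the empirically observed critical ratio of roughly 4.27. A minimal sketch of generating such instances and checking satisfiability by brute force (function names here are illustrative, not from the paper):

```python
# Sketch: random 3-SAT instances near the phase transition.
# Empirically, the hardest random instances occur around a
# clause-to-variable ratio of ~4.27.
import itertools
import random

def gen_3sat(n_vars, ratio=4.27, seed=0):
    """Draw m = round(ratio * n_vars) random 3-clauses over n_vars variables.

    A clause is a tuple of 3 nonzero ints; negative means negated literal.
    """
    rng = random.Random(seed)
    m = round(ratio * n_vars)
    clauses = []
    for _ in range(m):
        vs = rng.sample(range(1, n_vars + 1), 3)  # 3 distinct variables
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in vs))
    return clauses

def brute_force_sat(clauses, n_vars):
    """Exhaustively test all 2^n assignments -- only viable for tiny n."""
    for bits in itertools.product([False, True], repeat=n_vars):
        if all(any((lit > 0) == bits[abs(lit) - 1] for lit in c)
               for c in clauses):
            return True
    return False
```

At ratios well below 4.27 almost every instance is satisfiable; well above, almost none are; right at the transition the two cases are hardest to tell apart, which is presumably why the paper probes models there.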

It seems to suggest that many models fail badly in the hard region, while some newer ones may capture a bit more genuine reasoning structure.

I wonder if this is a meaningful bridge between LLM evaluation and complexity theory, or if it is still mostly a stress test and not much more.

derrak

10 hours ago

My cynical take on this sort of research is that we will never use raw LLMs to solve these kinds of reasoning problems and it’s therefore unclear why we bother to test them on these kinds of benchmarks.

Modern SAT solvers are completely cracked. I think there are a lot of potential synergies between such symbolic solvers and machine learning (and maybe even LLMs). But it doesn’t seem like an LLM’s ability to directly solve these tasks, with no symbolic tool use, is going to predict the quality of those synergies.