Deductive Verification for Chain-of-Thought Reasoning in LLMs

80 points, posted 5 days ago
by smooke

21 Comments

YeGoblynQueenne

5 days ago

Chain of Thought prompting reminds me of Facilitated Communication:

https://en.wikipedia.org/wiki/Facilitated_communication

A long-discredited intervention in which a "facilitator" guides the hand of a non-verbal human to help them write down their thoughts and experiences. Experiments that blinded the facilitator to what the subject observed found that the written message matched the facilitator's observations rather than the subject's, convincingly proving it was so much bunkum. It's the Clever Hans Effect by another name, with non-verbal humans rather than horses.

Chain of Thought works like that: without hand-holding by a human who understands how to answer the question, the LLM's performance drops, or even falls off a cliff. Of course, this is much harder to prove for LLMs than it was for facilitated communication, because LLMs don't really do anything without a prompt in the first place. Which should be a very big hint about what's really going on with CoT.

TZubiri

4 days ago

Much of the benefit of an LLM is that it's a great interlocutor, yes. The ChatGPT app is an interactive notepad.

ithkuil

4 days ago

Yes! I'd like to find some time to explore and fine-tune a model to act as an effective rubber duck.

I benefit greatly from talking to colleagues who help me understand things not by giving much of the answer away but by asking the right questions and "thinking together".

The current chat interface, where each sentence gets an immediate response and the model always has to say something, feels unnatural. But OTOH it's not easy to find the right balance, so I haven't made much progress. Probably I need to talk with somebody :-)

Anyway. Being a senior naming engineer myself, I already have a name for the product:

Robert Duck

TZubiri

4 days ago

No ML needed. Just hardcode a couple of

"That's interesting, tell me more"

"And why do you think that is?"

"And how does that make you feel?"

ithkuil

3 days ago

Well, technically you can also use a literal rubber duck, which doesn't say anything at all.

psd1

3 days ago

Yes, but you still have to train it.

throwthrowuknow

4 days ago

You can do this now by simply telling it to use the Socratic method, or your own variation of it.
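
For example, something along these lines with a system prompt (a sketch only; the client usage, model name, and prompt wording are assumptions, use whatever model you have access to):

    # socratic_duck.py - steer a chat model toward Socratic questioning
    # via a system prompt; no fine-tuning required.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    SYSTEM_PROMPT = (
        "You are a rubber-duck debugging partner. Never give the answer "
        "directly. Reply with one short clarifying question at a time that "
        "helps me examine my own assumptions. If I seem stuck, ask what I "
        "have already ruled out and why."
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model will do
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "My cache hit rate dropped after the last deploy."},
        ],
    )
    print(response.choices[0].message.content)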

jawon

5 days ago

People out there are trying to build some semblance of AI out of an LLM using larger and larger networks of "agents" that generate, classify, revise and verify data, using the same LLM they're building those larger and larger networks of agents upon to try and build some semblance of AI.

The end game is a brain-sized network where each neuron is an agent sending a 1M-token prompt to a 10T-parameter model to update its "weights".

Aeium

3 days ago

You're making this sound absurd, but what if you actually could build some kind of AGI like that?

arthurcolle

4 days ago

Yep, this sounds pretty much accurate

Lerc

5 days ago

Sometimes it looks like the computationalists are trying to sneak back into the room while no-one is looking.

There do seem to be quite a lot of independent ad-hoc efforts making custom notations for CoT. I feel like we're in a period similar to just after the first programming languages and compilers were invented but regular expressions were yet to come. In a way that's quite exciting; it's another little Cambrian explosion.

I don't think it will be a panacea though. In my observations of reasoning failures in LLMs, a lot of the problem isn't that they fail to follow logical steps but that they fail to notice the presence of implied premises completely. Chain of Thought is good for spotting wrong reasoning, but not for spotting that the problem is not the one it appears to be at first glance.

selfhoster11

a day ago

I genuinely think that, to some extent, it's an engineering problem. If you collect enough sufficiently high-quality chains of thought with a structured notation that avoids the reversal curse, you can make a dent in the problem. Or it might just be tilting at windmills, I guess, but I would want to try it at least.

mdp2021

5 days ago

> the problem isn't that they fail to follow logical steps but that they fail to notice the presence of implied premises completely

Could you please explain more clearly what you noticed? Can you give an example?

svnt

5 days ago

An example is where the link between the ideas is not present in the prompt.

E.g. an oversimplified example:

It’s going to rain tomorrow so you should take an umbrella.

If someone doesn't understand the relationship between an umbrella and keeping dry — the implied premise — they won't understand why they should bring one, and the statement will be puzzling.

We often talk about this as someone or something "not knowing what an x is", where in this case "knowing" means knowing in a spatial or logical sense, not just a linguistic one.

LLMs do not have this layer of understanding but they can fake it in short interactions.

Sakos

3 days ago

Otherwise known as the idea of qualia (https://en.wikipedia.org/wiki/Qualia).

There's no indication that LLMs possess anything like it, so there is no "knowing". I'm also not sure we can ever imbue an LLM with qualia without rethinking how we build LLMs and how they interact and learn from their environment.

svnt

3 days ago

I think it is like qualia in the sense that it can be something known experientially, but spatial and logical reasoning are things that transcend subjectivity.

No doubt LLMs do not have qualia as humans do, but I also think that the diversity of qualia present in humans is generally unqualified and underestimated at present.

Sakos

2 days ago

You can't have spatial and logical reasoning without qualia.

svnt

2 days ago

I don't know if that is true, unless your definition of qualia is so loose that it includes (or emerges from) basic data representations such as vectors composed of weights and vertices, in which case I think LLMs could perhaps be argued to have basic qualia.

There is often this attempt to draw a bright line on one feature or characteristic between humans and animals, or humans and AI, but I think that is a mistake based on an attempt to simplify the argument through reduction.

Lerc

5 days ago

One example is the three killers problem:

There are three killers in a room, someone enters the room and kills one of them, how many killers are now in the room?

Apart from the basic reasoning errors of small models (where they come up with all sorts of numbers), larger models that fail do so because they fail to take into account that killing makes a killer. Some models succeed at this, but it is unclear whether that is because they are familiar with this specific problem.

The model has to make the distinction between that and

There are three fish in a room, someone enters the room and kills one of them, how many fish are now in the room?

or

There are three fish in a room, someone enters the room and kills one of them, how many killers are now in the room?

In these examples the word "kills" is pushed in one direction or the other, but when it serves double duty, as in the original problem, the model can notice one premise and miss the other.

It's an interesting exercise in frustration to try to get a model that gets it wrong to CoT its way out of that particular error.
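
A rough way to probe this yourself is to run the three variants side by side and compare the answers. This is a sketch only; the client usage, model name, and the added "think step by step" nudge are assumptions for illustration, not an established benchmark:

    # three_killers_probe.py - compare the three prompt variants discussed above
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    VARIANTS = {
        "killers -> killers": "There are three killers in a room, someone enters "
                              "the room and kills one of them, how many killers "
                              "are now in the room?",
        "fish -> fish":       "There are three fish in a room, someone enters "
                              "the room and kills one of them, how many fish "
                              "are now in the room?",
        "fish -> killers":    "There are three fish in a room, someone enters "
                              "the room and kills one of them, how many killers "
                              "are now in the room?",
    }

    for name, prompt in VARIANTS.items():
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: substitute the model under test
            messages=[{"role": "user", "content": prompt + " Think step by step."}],
        )
        print(f"{name}: {reply.choices[0].message.content.strip()}\n")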

throwthrowuknow

4 days ago

Yes, the problem would be due to attention being limited to the context. Reasoning requires constructing a model that can be referred to and updated. This is what CoT attempts to do, but it is limited by being expressed in natural language and by being append-only.