Why aren't LLMs trained on their own Chain of Thought?

3 points, posted 11 hours ago
by simianwords

Item id: 47652591

2 Comments

i7l

11 hours ago

For the same reason that nobody's reasoning process and answers to random exam questions get used as textbooks: if the reasoning isn't guaranteed to be correct, why would you want to use it as training material?

simianwords

11 hours ago

We can empirically measure how often the reasoning model is correct. At, say, 95% empirical accuracy, the traces should still help the model directionally. No training dataset needs to be 100% accurate, no?
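
That filtering loop is essentially what STaR-style self-training does: sample several chains of thought per problem, keep only the traces whose final answer checks out against a known label, and fine-tune on the survivors. A minimal sketch in Python, where generate_cot is a hypothetical stand-in for whatever sampling call a real pipeline would use:

    import random

    def generate_cot(model, question):
        # Hypothetical stub for a real sampling call (e.g. an API that
        # returns a (reasoning_text, final_answer) pair for one sample).
        reasoning = f"step-by-step reasoning for {question!r}"
        answer = random.choice(["42", "41"])  # stub model is right ~50% of the time
        return reasoning, answer

    def collect_training_traces(model, dataset, samples_per_q=8):
        # Keep only self-generated traces whose final answer matches
        # the known-correct label; discard the rest.
        kept = []
        for question, gold_answer in dataset:
            for _ in range(samples_per_q):
                reasoning, answer = generate_cot(model, question)
                if answer == gold_answer:  # verifiable correctness filter
                    kept.append({"prompt": question,
                                 "completion": reasoning + "\n" + answer})
        return kept

    dataset = [("What is 6 * 7?", "42")]
    traces = collect_training_traces(model=None, dataset=dataset)
    print(f"{len(traces)} correct traces kept for fine-tuning")

The catch, and the point of i7l's objection, is that the filter only verifies the final answer: a kept trace can still reach the right answer through flawed reasoning.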