i7l
11 hours ago
For the same reason that nobody's reasoning and answers to random exam questions are ever used as textbooks: if the reasoning is not guaranteed to be right, why would you want to use it as training material?
simianwords
11 hours ago
We can measure empirically how often the reasoning model is correct. At 95% empirical accuracy, training on its output should still push the model in the right direction. No training data set needs to be 100% accurate. No?
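
A toy sketch of the "95% accurate labels still help" intuition, not a claim about real reasoning traces. Everything here is an assumption made for illustration: the 1-D data, the threshold classifier, the `simulate` function, and especially the noise model, which flips labels independently at random. Errors in a reasoning model's output may be systematic rather than random, and this sketch does not capture that case.

```python
import random


def simulate(noise_rate=0.05, n_train=2000, n_test=2000, seed=0):
    """Fit a threshold on noisy labels, then score it against clean labels."""
    rng = random.Random(seed)

    def true_label(x):
        # Hypothetical ground truth: positive iff x > 0.
        return x > 0.0

    # Training set where each label is flipped with probability noise_rate.
    train = []
    for _ in range(n_train):
        x = rng.uniform(-1.0, 1.0)
        y = true_label(x)
        if rng.random() < noise_rate:
            y = not y
        train.append((x, y))

    # "Training": pick the threshold with the fewest disagreements
    # against the noisy labels, searching a grid from -1.00 to 1.00.
    candidates = [i / 100 for i in range(-100, 101)]

    def noisy_errors(t):
        return sum((x > t) != y for x, y in train)

    best_t = min(candidates, key=noisy_errors)

    # Evaluation uses clean labels only.
    test_xs = [rng.uniform(-1.0, 1.0) for _ in range(n_test)]
    correct = sum((x > best_t) == true_label(x) for x in test_xs)
    return best_t, correct / n_test


best_t, clean_acc = simulate()
print(f"learned threshold: {best_t:.2f}, clean accuracy: {clean_acc:.3f}")
```

In this setup the flipped labels raise the measured training error at every threshold by roughly the same amount, so the best threshold stays near the true boundary. If the flips were concentrated on one side of the boundary instead, the learned threshold would shift, which is closer to the concern raised in the parent comment.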