Such a service is the opposite of a flywheel (a brake?) in practice. Those tokens are extremely low-quality data.
1) I strongly doubt that even one in a thousand people actually bothers to report whether the answer worked. I and every programmer I know certainly don't (whether we give up or it works, we just stop asking). And I'd guess it's one in a hundred thousand who gives detailed feedback that is useful for training, and one in ten million who is an expert that sits down and shares their deep knowledge with it (and then there's the problem of deducing from the chat alone that they aren't some crazy person).
2) AIs are extremely confident in their answers, which fools many people, especially those in a void of knowledge. Even if people did tell OpenAI whether every solution worked or not, I would heavily discount the accuracy of such data.
3) AI output autocannibalism does not lead to better outcomes; AI companies avoid using AI data for training like the plague. I doubt mixed tokens would be much better.
The situation in reality is something like one in some huge number - maybe hundreds of thousands - of mixed tokens being useful. Of those, the ones that are repeats of high-quality sources like textbooks, dictionaries, and man pages have no value. Of the remainder, there is the huge problem of how you extract these needles from the haystack with high confidence. Given the incredibly lopsided confusion matrix (actual negatives massively outnumber actual positives) and this incredibly unstructured data set, I doubt it's even remotely possible to avoid a totally unacceptable ratio of true to false positives. Letting this kind of garbage data in is how you get Gemini's gasoline spaghetti.
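To put rough numbers on that base-rate problem (every figure here is an assumption for illustration, not a measurement): even a filter that rejects 99% of the useless tokens gets buried when only one token in a hundred thousand is genuinely useful. A minimal Python sketch:

    # Back-of-the-envelope precision under a lopsided base rate.
    # Every number here is an assumption for illustration.
    prevalence = 1 / 100_000   # assumed: fraction of mixed tokens that are truly useful
    sensitivity = 0.90         # assumed: filter keeps 90% of the useful tokens
    specificity = 0.99         # assumed: filter rejects 99% of the useless tokens

    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    precision = true_pos / (true_pos + false_pos)

    print(f"precision: {precision:.4%}")                      # ~0.09% of kept tokens are useful
    print(f"garbage per keeper: {false_pos / true_pos:.0f}")  # ~1111 false positives per true one

Under these assumptions, even bumping specificity to an implausibly good 99.9% still leaves on the order of a hundred garbage tokens for every useful one.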
“AI companies avoid using AI data for training like the plague”
That’s not accurate. All of the big LLM training labs are increasingly leaning into deliberately AI-created training data these days. I’m confident that’s part of the story behind the big improvements for tasks like coding in models such as Claude 3.5 Sonnet.
The idea of “model collapse” from recursively training on AI-created data only seems to occur in lab experiments that very deliberately set up those conditions, from what I’ve seen. It doesn’t seem to be a major concern in real-world model training.