> Such a service is the opposite of a flywheel (a brake?) in practice. Those tokens are extremely low quality data.
I think not. Say you ask the model to help solve a coding problem. It gives you an idea, you try it, it fails, and you come back and iterate. The company can save a note for later finetuning about what worked and what didn't, using you, the user, as a validation system for the LLM.
But you might also bring your own experience, help the model where it struggles, and finally complete the task. That is how the model can borrow both your experience and your manual validation work to improve itself.
Some tasks are spread over multiple sessions, or multiple days. The company can cluster those sessions and look at your progress over time. The later steps provide rich feedback on the quality of the earlier steps. Hindsight is 20/20.
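A toy sketch of what that hindsight labelling could look like (all the names and fields here are hypothetical, just to make the idea concrete, not any lab's actual pipeline): log each suggestion together with the user's report, then score the earlier turns once the session's final outcome is known.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    suggestion: str    # what the model proposed
    user_report: str   # what the user said happened ("that failed", "worked", a fix, ...)

@dataclass
class Session:
    turns: list[Turn]
    task_solved: bool  # known only at the end, possibly days later

def to_training_examples(session: Session) -> list[dict]:
    """Hindsight labelling: score earlier suggestions using feedback
    that only arrived later in the session."""
    examples = []
    for i, turn in enumerate(session.turns):
        # The user's next message is the cheapest validation signal available.
        immediate_ok = "fail" not in turn.user_report.lower()
        examples.append({
            "turn": i,
            "suggestion": turn.suggestion,
            "immediate_feedback": immediate_ok,
            # A rejected idea inside an eventually-solved session is still
            # useful as a negative example for finetuning.
            "session_solved": session.task_solved,
        })
    return examples

demo = Session(
    turns=[
        Turn("try raising the timeout", "still failed"),
        Turn("the bug is in the retry loop; patch it like this", "that worked"),
    ],
    task_solved=True,
)
for example in to_training_examples(demo):
    print(example)
```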
Even in chats where the user doesn't perform validation there is rich feedback: people share some of their tacit experience. It's a form of delayed feedback; humans act as caches of unique experience.
The way I conceptualize this is as a search process, a search over problem space. LLMs search better with assistance, and humans also search better with assistance. LLMs collect experience from millions of people and funnel it into their logs.
“AI companies avoid using AI data for training like the plague”
That’s not accurate. All of the big LLM training labs are leaning increasingly into deliberately AI-created training data these days. I’m confident that’s part of the story behind the big improvements for tasks like coding in models such as Claude 3.5 Sonnet.
The idea of “model collapse” from recursively training on AI-created data only shows up in experiments that very deliberately set up those conditions, from what I’ve seen. It doesn’t seem to be a major concern in real-world model training.
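For what it's worth, the collapse result really does depend on training on nothing but your own outputs, generation after generation. Here is a tiny toy simulation (a fitted Gaussian as a crude stand-in for the model, which is an assumption purely for illustration) showing the difference between that setup and one where some real data stays in the mix:

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 10_000)   # a pool of "real" human-written data
N = 50                                # training set size per generation

def next_generation(mu, sigma, real_fraction=0.0):
    """Fit a Gaussian to data sampled from the previous generation's model,
    optionally replacing a fraction of that data with fresh real samples."""
    data = rng.normal(mu, sigma, N)
    k = int(real_fraction * N)
    if k:
        data = np.concatenate([data[k:], rng.choice(real, k)])
    return data.mean(), data.std()

for frac in (0.0, 0.2):
    mu, sigma = real.mean(), real.std()
    for _ in range(1000):
        mu, sigma = next_generation(mu, sigma, real_fraction=frac)
    print(f"real_fraction={frac}: fitted std after 1000 generations ~ {sigma:.4g}")

# With real_fraction=0.0 the fitted standard deviation typically shrinks toward
# zero over the generations: the tails get lost first, then the variance itself.
# That is the contrived pure-recursion setup. Keeping even 20% real data in
# every generation holds the distribution close to the original.
```

And in practice the AI-generated data is typically filtered or verified before it goes back into training, which moves real pipelines even further away from the pure-recursion setting the collapse experiments study.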