If you include the previous and following sentences, it's clear, at least to me, what they mean:
However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence.
To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch.
Starting from a single base LLM, R-Zero initializes two independent models with distinct roles, a Challenger and a Solver.
Training an LLM is a multi-stage process[1], and they're tackling the final stage, where you do fine-tuning or reinforcement learning. They're not training an LLM from scratch: they explicitly state that they start from a base LLM, i.e. a pretrained model that hasn't been fine-tuned yet.
As I understand it, and as they mention, training data for the later stages has typically required large numbers of high-quality human-curated samples, even when those are augmented using LLMs, say by generating multiple variations of each human-curated sample.
Their proposal is to have the two models, in a setup loosely resembling a generative adversarial network, generate that data without any initial human input, i.e. from scratch.
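To make the idea concrete, here's a toy sketch of that kind of loop: one role proposes tasks, the other learns from the ones it can't quite solve yet, and no external dataset is involved. This is only my illustration and not the paper's algorithm. The function name `co_evolve`, the integer "skill" and "level" values, and the update rules are all made up for the example.

```python
# Toy illustration only: NOT the R-Zero algorithm. All names are hypothetical.
# "Difficulty" and "skill" are plain integers standing in for real models.

def co_evolve(rounds: int) -> tuple[int, int]:
    """Run a toy Challenger/Solver loop and return (solver_skill, challenger_level)."""
    solver_skill = 0       # hardest difficulty the Solver can currently handle
    challenger_level = 0   # difficulty of the task the Challenger proposes next

    for _ in range(rounds):
        task_difficulty = challenger_level  # Challenger generates a task

        if task_difficulty <= solver_skill:
            # Task is already solvable: the Challenger moves on to harder tasks.
            challenger_level += 1
        else:
            # Task is just beyond the Solver: the Solver trains on it and improves.
            solver_skill += 1

    return solver_skill, challenger_level


if __name__ == "__main__":
    print(co_evolve(4))  # (2, 2)
    print(co_evolve(5))  # (2, 3)
```

The point is just that both numbers climb together with no human-written tasks fed in; everything else about how the real system rewards and trains each role is left out here.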
[1]: https://snorkel.ai/blog/large-language-model-training-three-...