G-Zero: Self-Play for Open-Ended Generation from Zero Data
…based guidance. AI-generated summary Self-evolving LLMs excel in verifiable domains but struggle in open-ended tasks, where reliance on proxy LLM judges introduces capability bottlenecks and reward hacking. To overcome…