RewardHarness: Self-Evolving Agentic Post-Training
…We present RewardHarness, a self-evolving agentic reward framework that reframes reward modeling as context evolution rather than weight optimization. Instead of learning from large-scale annotations, RewardHarness aligns with human preferences…