On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment
… Compared with strong baselines, FATE reduces attack success rate by 33.5%, harmful compliance by 82.6%, and improves external trajectory-safety diagnosis by 6.5%. These results suggest that failed trajectories can provide structured repair supervision for safer self-evolving agents. …