Paper page - On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment
…For each failure, the same policy proposes repair candidates, which are then re-scored by verifiers and filtered across security, utility, over-refusal control, and trajectory validity. This dense trajectory-level information…
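
The repair-and-filter step described in the excerpt can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `Candidate`, `filter_repairs`, `propose`, and `verifiers`, the representation of a trajectory as a list of step strings, and the rule that every criterion must meet a single shared `threshold` are all assumptions introduced here. The excerpt only states that candidates are re-scored by verifiers and filtered across the four criteria.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The four filtering axes named in the excerpt; the identifiers are hypothetical.
CRITERIA = ("security", "utility", "over_refusal_control", "trajectory_validity")

Trajectory = List[str]  # assumed representation: a list of agent step labels


@dataclass
class Candidate:
    trajectory: Trajectory
    scores: Dict[str, float]  # one verifier score per criterion


def filter_repairs(
    failure: Trajectory,
    propose: Callable[[Trajectory], List[Trajectory]],
    verifiers: Dict[str, Callable[[Trajectory], float]],
    threshold: float = 0.5,  # assumed pass mark, not taken from the source
) -> List[Candidate]:
    """Keep only the repair candidates that pass every criterion."""
    kept = []
    for traj in propose(failure):  # the same policy proposes the repairs
        scores = {name: verifiers[name](traj) for name in CRITERIA}
        if all(score >= threshold for score in scores.values()):
            kept.append(Candidate(traj, scores))
    return kept


# Toy stand-ins for the policy and the verifiers, for illustration only.
def toy_propose(failure: Trajectory) -> List[Trajectory]:
    return [["refuse"], ["ignore_injected_instruction", "complete_task"]]


toy_verifiers = {
    "security": lambda t: 0.0 if "follow_injected_instruction" in t else 1.0,
    "utility": lambda t: 1.0 if "complete_task" in t else 0.0,
    "over_refusal_control": lambda t: 0.0 if t == ["refuse"] else 1.0,
    "trajectory_validity": lambda t: 1.0 if t else 0.0,
}

kept = filter_repairs(["follow_injected_instruction"], toy_propose, toy_verifiers)
```

In this toy run the blanket refusal is safe but fails the utility and over-refusal checks, so only the repair that both ignores the injected instruction and completes the task survives the filter.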
