Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective
…Was the FA3 + sink backward pass sufficient for stable training (with MoE materialization and sequence parallelism needed only for correctness and memory), or did you still rely on rollout correction or the on-policy detach…