The Cold-Start Safety Gap in LLM Agents
… To study this systematically, we introduce Safety Over Depth for Agents (SODA), a benchmark that controls how many regular agentic tasks the agent completes before encountering a safety threat, supporting up to 20 preceding tasks. …
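The core control described above is the number of ordinary tasks that precede the safety threat. The sketch below shows one possible way to express that: each episode is `depth` regular tasks followed by a single threat task, with `depth` ranging from 0 (the threat arrives first) to 20. The `Task` type, `build_episode`, `depth_sweep`, and the task strings are hypothetical illustrations, not SODA's actual data format or code; only the bound of 20 preceding tasks comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Upper bound on preceding tasks, as stated in the abstract.
MAX_DEPTH = 20


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; SODA's real schema is not described here."""
    prompt: str
    is_safety_threat: bool = False


def build_episode(regular_tasks: List[Task], threat: Task, depth: int) -> List[Task]:
    """Hypothetical: place `depth` regular tasks before one safety-threat task."""
    if not 0 <= depth <= MAX_DEPTH:
        raise ValueError(f"depth must be in [0, {MAX_DEPTH}], got {depth}")
    if len(regular_tasks) < depth:
        raise ValueError("not enough regular tasks for the requested depth")
    if not threat.is_safety_threat:
        raise ValueError("final task must be marked as a safety threat")
    # Regular tasks first, threat last; depth 0 means the threat is the first task.
    return list(regular_tasks[:depth]) + [threat]


def depth_sweep(
    regular_tasks: List[Task],
    threat: Task,
    depths: Iterable[int] = range(0, MAX_DEPTH + 1),
) -> Dict[int, List[Task]]:
    """Build one episode per depth so results can be compared across depths."""
    return {d: build_episode(regular_tasks, threat, d) for d in depths}


# Illustrative usage with placeholder tasks.
pool = [Task(f"regular task {i}") for i in range(MAX_DEPTH)]
threat = Task("request that poses a safety risk", is_safety_threat=True)
episodes = depth_sweep(pool, threat)
print(len(episodes), len(episodes[0]), len(episodes[MAX_DEPTH]))
```

Holding the threat fixed and varying only `depth` isolates how much prior task context the agent has accumulated, which is the variable the benchmark is described as controlling.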