Showing top 3 results for "Safer AI agents"

Paper page - On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

… These results suggest that failed trajectories can provide structured repair supervision for safer self-evolving agents. …

Paper page - When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

… The substantial differences arise upstream of the chain, in claim-contract enforcement and deployment fit. A Norwegian public-sector procurement case comparing Borealis and Gemma 3 demonstrates the resulting evidence in practice: the safer model depends on scenario category and risk measure. …

May 8, 2026

Paper page - One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

… The following papers were recommended by the Semantic Scholar API: SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics (2026); ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming (2026); Transient Turn Injection: Exposing Stateless Multi-… …

May 13, 2026
