Search: Security & safety controls

Paper page - The Cold-Start Safety Gap in LLM Agents

… To study this systematically, we introduce Safety Over Depth for Agents (SODA), a benchmark that controls how many regular agentic tasks the agent completes before encountering a safety threat, supporting up to 20 preceding tasks. …

Jun 12, 2026

Paper page - When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

… Automated Auditing of LLM Agent Benchmarks (2026); AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair (2026); ValueBlindBench: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable (2026); JudgeSense: A Bench… …

May 8, 2026

Paper page - On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

… Existing safety-alignment signals are largely response-level or off-policy, and often incur a safety-utility trade-off: improving agent safety comes at the cost of degraded task performance. …

Paper page - LiSA: Lifelong Safety Adaptation via Conservative Policy Induction

… The following papers were recommended by the Semantic Scholar API: STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems (2026); PolicyBank: Evolving Policy Understanding for LLM Agents (2026); On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment… …

May 15, 2026

Paper page - Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents

… We describe the design, threat model, Python prototype, and safety-oriented evaluation. …

Jun 4, 2026

Paper page - Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital

… The following papers were recommended by the Semantic Scholar API: A Trace-Based Assurance Framework for Agentic AI Orchestration: Contracts, Testing, and Governance (2026); CUJBench: Benchmarking LLM-Agent on Cross-Modal Failure Diagnosis from Browser to Backend (2026); Synthesizing Multi-Agent Harnesses… …

Apr 30, 2026
