Paper page - The Cold-Start Safety Gap in LLM Agents
Papers arxiv:2606.07867 The Cold-Start Safety Gap in LLM Agents Published on Jun 5 Submitted by Chung-En, Sun on Jun 12 Authors: Chung-En Sun , , Abstract Tool-calling language…
…We provide a unified analysis of these two paradigms in consolidating multiple expert capabilities into a single model, identifying capability loss in different ways: mixed RLVR suffers from inter-capability divergence cost…
…AI-generated summary LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a…
…Capabilities via Grounded Interaction Synthesis (2026) EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations (2026) Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains (2026) ZeroCoder: Can LLMs…
…Sparse Decoupled Attention for Efficient Long-Context LLM Inference Published on Jun 3 Submitted by Yaosheng Fu on Jun 11 NVIDIA Authors: Yaosheng Fu , , , , Abstract SparDA is a decoupled sparse attention architecture…
…AI-generated summary Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However…
…Disentangling Evolution Capabilities in Self-Evolving LLM Agents (2026) Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses (2026) SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories (2026)…
…Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding Published on May 29 Submitted by XiaofengShi on Jun 5 Beijing Academy of Artificial Intelligence Authors: , Xiaofeng Shi , , , , , Abstract Mechanical engineering drawing…
…Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE…
…Maintaining such a library requires three coupled capabilities. The agent selects a relevant skill, utilizes it during execution, and distills new skills from experience. Existing methods optimize these capabilities in isolation or…