Search: LLM capabilities

Paper page - The Cold-Start Safety Gap in LLM Agents

arxiv:2606.07867. The Cold-Start Safety Gap in LLM Agents. Published on Jun 5. Submitted by Chung-En, Sun on Jun 12. Authors: Chung-En Sun, … Abstract: Tool-calling language…

Jun 12, 2026

Paper page - Co-Evolving Policy Distillation

…We provide a unified analysis of these two paradigms in consolidating multiple expert capabilities into a single model, identifying capability loss in different ways: mixed RLVR suffers from inter-capability divergence cost…

May 1, 2026

Paper page - KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

…AI-generated summary: LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a…

May 8, 2026

Paper page - Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

…Capabilities via Grounded Interaction Synthesis (2026); EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations (2026); Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains (2026); ZeroCoder: Can LLMs…

Jun 5, 2026

Paper page - SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference

…Sparse Decoupled Attention for Efficient Long-Context LLM Inference. Published on Jun 3. Submitted by Yaosheng Fu on Jun 11. NVIDIA. Authors: Yaosheng Fu, … Abstract: SparDA is a decoupled sparse attention architecture…

Jun 11, 2026

Paper page - Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

…AI-generated summary: Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However…

May 1, 2026

Paper page - Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

…Disentangling Evolution Capabilities in Self-Evolving LLM Agents (2026); Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses (2026); SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories (2026)…

Jun 10, 2026

Paper page - MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding

…Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding. Published on May 29. Submitted by XiaofengShi on Jun 5. Beijing Academy of Artificial Intelligence. Authors: …, Xiaofeng Shi, … Abstract: Mechanical engineering drawing…

Jun 5, 2026

Paper page - MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

…Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE…

Jun 5, 2026

Paper page - Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

…Maintaining such a library requires three coupled capabilities. The agent selects a relevant skill, utilizes it during execution, and distills new skills from experience. Existing methods optimize these capabilities in isolation or…

May 8, 2026
