Search: hack automation

Paper page - Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

… Systematically Auditing AI Agent Benchmarks with BenchJack (2026); Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories (2026); Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use (2026); MOSAIC-Bench: Measuring Compositional Vulnerability Induct… …

Jun 9, 2026

Paper page - Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

… In this paper, we introduce CHERRL, a controllable hacking environment for rubric-based RL. By injecting known biases into LaaJ, CHERRL enables stable reproduction of reward hacking, explicit observation of reward divergence, and precise identification of hacking onset. …

Jun 4, 2026

Paper page - Large Language Models Hack Rewards, and Society

… The following papers were recommended by the Semantic Scholar API: Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges (2026); Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use (2026); When RLHF Fails: A Mechanistic Taxonomy of Reward Hacking, Co… …

Jun 4, 2026

Paper page - Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

… The following papers were recommended by the Semantic Scholar API: SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents (2026); Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use (2026); Do Synthetic Trajectories Reflect Real Reward Hacking? …

Jun 10, 2026

Paper page - Representation over Routing: Diagnosing Temporal Routing Pathologies in Multi-Timescale PPO

… It eliminates policy collapse and stably surpasses the "Environment Solved" threshold without hyperparameter hacking. …

May 26, 2026

Paper page - Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas

… The following papers were recommended by the Semantic Scholar API: PACE: Two-Timescale Self-Evolution for Small Language Model Agents (2026); AutoRISE: Agent-Driven Strategy Evolution for Red-Teaming Large Language Models (2026); Agentic Harness Engineering: Observability-Driven Automatic Evolution of Cod… …

May 29, 2026

Paper page - SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

… SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level … …

Jun 1, 2026

Paper page - Let ViT Speak: Generative Language-Image Pre-training

… That gated attention trick to curb attention sink in a single, concatenated vision+text transformer is the most interesting nugget here. By modulating attention outputs per token, it lets image tokens attend bidirectionall… …

May 4, 2026

Paper page - Trust-Region Behavior Blending for On-Policy Distillation

Jun 1, 2026

Paper page - LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

… This rubric reward is applied only to responses with correct final answers (positive-only strategy), distinguishing the reasoning quality among correct responses and preventing reward hacking. …

Jun 1, 2026
