Search: hack add ons

Paper page - Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

… Automatically hardening benchmarks and training environments with the hacker–fixer loop. …

Jun 9, 2026

Paper page - Large Language Models Hack Rewards, and Society

… We hypothesise that the RL training process may exploit these gaps and therefore ask whether models' well-known tendency to hack reward functions during RL can scale into a more consequential failure mode named societal hacking: discovering loopholes in the rules society runs on. …

Jun 4, 2026

Paper page - Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

… By injecting known biases into the LaaJ, CHERRL enables: stable reproduction of reward hacking from a clean starting point; explicit observation of reward divergence between the biased and unbiased judges; and precise identification of the hacking onset step. To demonstrate its utility, we analyze judge biase… …

Jun 4, 2026

Paper page - Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

… The following papers were recommended by the Semantic Scholar API: SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents (2026); Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use (2026); Do Synthetic Trajectories Reflect Real Reward Hacking? …

Jun 10, 2026

Paper page - G-Zero: Self-Play for Open-Ended Generation from Zero Data

… Self-evolving LLMs excel in verifiable domains but struggle in open-ended tasks, where reliance on proxy LLM judges introduces capability bottlenecks and reward hacking. …

May 12, 2026

VibeGame: Exploring Vibe Coding Games

… If you have feedback or feature requests for the Roblox MCP server, feel free to submit issues on the project page: https://github.com/Roblox/studio-rust-mcp-server Also, it is open source, so you can hack on it and add more context management tools here: https://github.com/Roblox/studio-rust-mcp-server… …

Sep 17, 2025 · Dylan Ebert

Paper page - Representation over Routing: Diagnosing Temporal Routing Pathologies in Multi-Timescale PPO

… Code and reproducible scripts are open-sourced in the repo. The core idea that really sticks is target decoupling: keep multi-timescale predictions on the critic for auxiliary representation learning, while the actor updates are driven only by long-horizon advantages. This separation seems to block… …

May 26, 2026

Paper page - The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

… To ensure evaluation integrity, this framework is secured by multi-layer defenses against reward hacking. Leveraging this framework, we demonstrate that meta-agents rarely match human-engineered baseline policies, and the few that do are dominated by proprietary frontier models. …

Jun 4, 2026

Paper page - SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

… Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. …

Jun 1, 2026

Paper page - Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas

… A researcher agent R run as a coding agent reads the inner-loop source code, edits system prompts, feedback functions, helper libraries, and iteration logic, runs evaluations, and decides what to keep, following the autoresearch paradigm. …

May 29, 2026
