Paper page - AcademiClaw: When Students Set Challenges for AI Agents
… The following papers were recommended by the Semantic Scholar API:

Beyond Binary Correctness: Scaling Evaluation of Long-Horizon Agents on Subjective Enterprise Tasks (2026)
Claw-Eval: Towards Trustworthy Evaluation of Autonomous Agents (2026)
AlphaEval: Evaluating Agents in Production (2026)
ClawBench: C… …