Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies
Published on May 5. Submitted by taesiri on May 6.

Abstract

Workspace-Bench is a benchmark for evaluating AI agents…