A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks
TASTE is a new way to automatically create diverse, harder, and verified benchmarks for tool-using AI agents. Instead of writing…