Showing top 2 results for "agentic coding"

Paper page - WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

…Each task averages roughly 8 minutes of wall-clock time and over 20 tool calls, and runs inside a reproducible Docker container hosting an actual CLI agent harness (OpenClaw, Claude Code, Codex…

May 15, 2026

We Got Claude to Fine-Tune an Open Source LLM

… Another agentic way of wasting tokens. Is it possible to use this inside VS Code's Copilot extension? …

Oct 14, 2025 · ben burtenshaw
