Paper page - SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies
… We release SWE-WebDev Bench as a community benchmark to enable such replication and help platform builders identify and address these gaps. …
… I think the agent trace link is broken. It points to trace.md, but the actual file is agent-trace.txt: https://huggingface.co/hf-skills/h100-diffusers-kernel-builder/blob/main/agent-trace.txt · Thank you @ClementeH! …