Harness design for long-running application development
… On 4.5, that boundary was close: our builds were at the edge of what the generator could do well solo, and the evaluator caught meaningful issues across the build. …
… We've additionally added guidance to our CLAUDE.md to ensure model-specific changes are gated to the specific model they're targeting. For any change that could trade off against intelligence, we'll add soak periods, a broader eval suite, and gradual rollouts so we catch issues earlier. …
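One way to picture gating a change to a single model and rolling it out gradually is sketched below. This is only an illustration under stated assumptions: the registry, the change name, the model identifiers ("model-a", "model-b"), and the session-hashing scheme are all hypothetical and are not taken from the source.

```python
import hashlib

# Hypothetical registry: each change names the one model it targets and
# the percentage of sessions that should currently receive it.
CHANGES = {
    "shorter-preamble": {"target_model": "model-a", "rollout_percent": 10},
}


def rollout_bucket(session_id: str) -> int:
    """Map a session id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def change_applies(change_name: str, model: str, session_id: str,
                   changes: dict = CHANGES) -> bool:
    """Return True only for the targeted model and for sessions inside the rollout."""
    change = changes.get(change_name)
    if change is None:
        return False
    # Gate: a model-specific change never leaks to other models.
    if change["target_model"] != model:
        return False
    # Gradual rollout: only buckets below the current percentage get the change.
    return rollout_bucket(session_id) < change["rollout_percent"]
```

Because the bucket is derived from a hash rather than a random draw, a given session gets the same answer on every call, which keeps behavior stable while the percentage is raised.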
… Use it when: tool definitions consuming 10K tokens; experiencing tool selection accuracy issues; building MCP-powered systems with multiple servers; 10+ tools available. Less beneficial when: small tool library … budget "travel limit" : exceeded.append { "name": member "name" , "spent": total, "limit": b… …
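The "use it when" criteria above can be read as a checklist. A minimal sketch follows, assuming the thresholds quoted in the text (10K tokens of tool definitions, 10 or more tools, more than one MCP server). The `ToolSetup` type, its field names, and the function name are hypothetical, and treating 10K as an inclusive lower bound is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ToolSetup:
    """Hypothetical summary of an agent's tool configuration."""
    definition_tokens: int           # tokens spent on tool definitions
    tool_count: int                  # number of tools available
    mcp_server_count: int            # number of MCP servers connected
    has_selection_accuracy_issues: bool


def tool_search_signals(setup: ToolSetup) -> list[str]:
    """List which of the quoted criteria this setup meets."""
    signals = []
    if setup.definition_tokens >= 10_000:  # assumed inclusive threshold
        signals.append("tool definitions consuming 10K tokens")
    if setup.has_selection_accuracy_issues:
        signals.append("tool selection accuracy issues")
    if setup.mcp_server_count > 1:
        signals.append("multiple MCP servers")
    if setup.tool_count >= 10:
        signals.append("10+ tools available")
    return signals
```

An empty list corresponds to the "less beneficial" case of a small tool library; how many signals justify the switch is left to the reader.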
… This creates a filtering mechanism where Claude handles routine inquiries, leaving colleagues to address more complex, strategic, or context-heavy issues that exceed AI capabilities: “It has reduced my dependence on my team by 80%, but the last 20% is crucial and I go and talk to them.” …
… In headless mode (claude -p), there is no UI to ask the human, so we instead terminate the process. …
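The fallback described above, asking when a human is present and terminating when one is not, can be sketched generically. This is an illustrative pattern only, not the tool's actual implementation: the function name, the `headless` flag, and the exit code are assumptions.

```python
import sys
from typing import Callable


def ask_or_terminate(question: str, headless: bool,
                     ask: Callable[[str], str] = input) -> str:
    """Return the human's answer, or end the process when nobody can be asked."""
    if headless:
        # No UI is available, so stop rather than guess an answer.
        print(f"Cannot ask in headless mode: {question}", file=sys.stderr)
        sys.exit(1)  # assumed exit code; raises SystemExit
    return ask(question)
```

Terminating rather than choosing a default keeps a non-interactive run from silently proceeding with a decision no human made.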