Two Agents, More Tokens, Better Code

How HAVEN Intelligence uses dual-agent orchestration to scale code quality with test-time compute. A walkthrough of the Claudex architecture.

The problem with a single agent

Most developers working with AI-assisted code generation know the pattern: an agent receives a task, writes code, and the developer reviews the result manually. This works for simple tasks but scales poorly. As tasks grow more complex, the review burden grows with them, and so does the risk that errors slip past the human reviewer.

The question is not whether LLMs can write code. They can. The question is how to systematically increase the quality of the code they produce without requiring more human review.

Test-time compute scaling

The answer lies in test-time compute scaling: instead of switching to a more capable model, spend more computation at inference time. In practice, this means two agents working in an iterative loop where one implements and the other challenges.

HAVEN Intelligence has built this principle into Claudex, an open-source tool that orchestrates two AI agents for automated code quality control. The planner agent (Claude Code or Codex) plans and implements changes. The reviewer agent reviews the result, identifies issues, and suggests improvements. The loop repeats until quality converges.
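The implement-and-review loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from Claudex: the `Review` type, the `run_loop` function, the 0-10 score scale, and the default threshold and iteration budget are all hypothetical, and the two agents are replaced by scripted stand-ins so the example runs without any LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    score: float        # overall score on an assumed 0-10 scale
    issues: List[str]   # open issues raised by the reviewer


def run_loop(
    task: str,
    implement: Callable[[str, List[str]], str],
    review: Callable[[str], Review],
    min_score: float = 8.0,     # hypothetical threshold
    max_iterations: int = 5,    # hypothetical compute budget
) -> Tuple[str, List[Review]]:
    """Alternate implement and review until the gate passes or the budget runs out."""
    history: List[Review] = []
    feedback: List[str] = []
    code = ""
    for _ in range(max_iterations):
        code = implement(task, feedback)   # planner agent writes or revises code
        result = review(code)              # reviewer agent critiques it
        history.append(result)
        # The stop decision is plain Python; neither agent makes it.
        if result.score >= min_score and not result.issues:
            break
        feedback = result.issues
    return code, history


# Scripted stand-ins for the two agents (assumptions for the demo only).
def fake_implement(task: str, feedback: List[str]) -> str:
    return f"# {task} (addressed {len(feedback)} issue(s))"


_scripted = iter([
    Review(5.0, ["no input validation", "missing test"]),
    Review(7.5, ["missing test"]),
    Review(9.0, []),
])


def fake_review(code: str) -> Review:
    return next(_scripted)


final_code, history = run_loop("add retry logic", fake_implement, fake_review)
print([r.score for r in history])  # prints [5.0, 7.5, 9.0]
```

Each extra iteration spends more tokens on the same task, which is where the additional test-time compute goes. The iteration cap keeps that spend bounded if the reviewer is never satisfied.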

Deterministic orchestration

A central design choice in Claudex is that Python owns all control logic. LLMs are good at reading and writing code, but they are unreliable at applying fixed pass/fail boundaries consistently. Therefore, Python handles all thresholds, convergence checks, severity counting, and regression detection.
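To make severity counting and regression detection concrete, here is a small sketch. The severity labels, the finding format (a dict with a `severity` key), and the rule that "any severity count going up is a regression" are assumptions chosen for illustration, not Claudex's actual definitions.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical severity levels, most serious first.
SEVERITIES = ("critical", "major", "minor")


def count_severities(findings: List[Dict[str, str]]) -> Counter:
    """Tally reviewer findings per severity, rejecting unknown labels."""
    counts = Counter({level: 0 for level in SEVERITIES})
    for finding in findings:
        level = finding.get("severity", "").lower()
        if level not in SEVERITIES:
            # Fail loudly instead of trusting free-form agent output.
            raise ValueError(f"unknown severity: {level!r}")
        counts[level] += 1
    return counts


def is_regression(previous: Counter, current: Counter) -> bool:
    """True if any severity level has more findings than the iteration before."""
    return any(current[level] > previous[level] for level in SEVERITIES)


before = count_severities([{"severity": "major"}, {"severity": "minor"}])
after = count_severities([{"severity": "critical"}, {"severity": "minor"}])
print(is_regression(before, after))  # prints True: a critical finding appeared
```

The agent only supplies the list of findings. Counting them and comparing iterations is ordinary code, so the same input always yields the same verdict.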

This separation is critical. When a reviewer agent scores code, Python validates whether the score meets the predefined requirements. The agent has no influence on what “good enough” means. That is a deterministic decision.
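A deterministic gate of this kind might look like the following sketch. The `Gate` fields, their default values, and the plateau-based convergence rule are assumptions for illustration; the source does not specify which thresholds Claudex uses.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Gate:
    """Hypothetical, fixed definition of "good enough"."""
    min_score: float = 8.0
    max_critical: int = 0
    max_major: int = 2


def passes(gate: Gate, score: float, critical: int, major: int) -> bool:
    """Pure comparison against fixed limits; the agent cannot move them."""
    return (
        score >= gate.min_score
        and critical <= gate.max_critical
        and major <= gate.max_major
    )


def has_converged(scores: Sequence[float], window: int = 3, epsilon: float = 0.1) -> bool:
    """True when the last `window` scores differ by at most `epsilon`."""
    if len(scores) < window:
        return False
    recent = scores[-window:]
    return max(recent) - min(recent) <= epsilon


gate = Gate()
print(passes(gate, score=8.5, critical=0, major=1))   # prints True
print(passes(gate, score=9.5, critical=1, major=0))   # prints False
print(has_converged([6.0, 8.5, 8.5, 8.5]))            # prints True
```

Because `Gate` is frozen and `passes` is a pure function, a high score cannot compensate for a critical finding unless a human changes the configuration.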

Three run modes

Claudex supports three modes covering different workflows:

tmux mode for interactive use, where the developer can follow both agents in real time
Dashboard TUI for a visual overview of iterations, scores, and convergence status
Headless mode for CI/CD pipelines, where Claudex runs without user interaction

This flexibility makes it possible to use the same tool for both local development and automated quality checks in production pipelines.
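One way a tool could choose between three such modes is sketched below. This is not how Claudex selects its mode; the `RunMode` enum, the `pick_mode` function, and the selection rules are hypothetical. The sketch relies only on two widely used conventions: many CI providers set a `CI` environment variable, and tmux sets `TMUX` inside a session.

```python
from enum import Enum
from typing import Mapping


class RunMode(Enum):
    TMUX = "tmux"
    DASHBOARD = "dashboard"
    HEADLESS = "headless"


def pick_mode(env: Mapping[str, str], interactive: bool) -> RunMode:
    """Choose a run mode from the environment (illustrative rules only)."""
    # In CI, or without a terminal, nobody is watching: run unattended.
    if env.get("CI") or not interactive:
        return RunMode.HEADLESS
    # Inside a tmux session, use panes to follow both agents.
    if env.get("TMUX"):
        return RunMode.TMUX
    # Otherwise fall back to the dashboard view.
    return RunMode.DASHBOARD


print(pick_mode({"CI": "true"}, interactive=True).value)   # prints headless
print(pick_mode({}, interactive=True).value)               # prints dashboard
```

In real use the arguments would come from `os.environ` and `sys.stdout.isatty()`. Passing them in explicitly keeps the function easy to test.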

Results and lessons learned

Claudex has a test suite with over 433 tests and is actively used in development projects at HAVEN Intelligence. In the team's experience, the dual-agent approach consistently produces code with fewer defects than a single agent, especially during complex refactoring.

The key lesson is that the value does not come from using a “smarter” model. It comes from structuring the interaction between agents with deterministic control logic. The quality improvement is a result of the architecture, not the model’s capabilities alone.

Claudex is open source and available on GitHub.