Agent Utilization Is the New Performance Ceiling

Ryan Lopopolo

For a long time, the upper bound on software output was human attention. Even when tools improved, work still bottlenecked on how many things a person could hold in their head at once.

That bound no longer exists.

Today, it is entirely achievable for a single engineer to productively deploy hundreds of millions of tokens per day. My explicit goal is to get each engineer on my team to a billion tokens per day. Not by “prompting harder,” but by removing humans from the loop wherever possible and delegating more of the job to agents.
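To make the scale concrete, here is a back-of-the-envelope calculation. The 100 tokens-per-second decode rate per agent stream is an assumed figure for illustration, not a measurement from the text:

```python
# Back-of-the-envelope: what a billion tokens per day implies in parallelism.
# ASSUMPTION: ~100 tokens/sec sustained per agent stream (illustrative, not measured).
TOKENS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

tokens_per_sec = TOKENS_PER_DAY / SECONDS_PER_DAY  # sustained aggregate rate
streams = tokens_per_sec / 100                     # concurrent agent streams needed

print(round(tokens_per_sec))  # ~11574 tokens/sec, around the clock
print(round(streams))         # ~116 agent streams running in parallel
```

Roughly a hundred concurrent agent streams per engineer: far beyond what any synchronous human loop can supervise action by action.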

This sounds extreme until you look at where the tokens actually go.

Figure: multiple autonomous agents operating in parallel across a shared system (code panels, logs, charts, running services), coordinated by structured harnesses rather than human oversight.

Most software work is not greenfield feature development. It is QA. Performance tuning. Crash analysis. Log inspection. Reading production traces. Diffing behavior across releases. Noticing when something drifted. These tasks are tedious for humans precisely because they require sustained attention across large surfaces of code and data.

Agents are good at this, but only if you let them do the work.

The limiting factor is not inference cost. Tokens are cheap. The limiting factor is whether the system is structured in a way that allows agents to operate at all. That means:

  • access to production logs and traces
  • access to crash dumps and core files
  • access to analytics and metrics
  • the ability to propose and land changes without human ceremony
  • harnesses that turn observations into patches
  • guardrails that prevent unsafe changes by construction
  • tools to launch and drive apps and services in local dev
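The harness and guardrail points above can be sketched as a minimal loop. Everything here is an invented shape for illustration (the `Observation`/`Patch` types, `propose_patch`, the path allowlist); it is not a real API, and the model call is stubbed out:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    source: str   # e.g. "crash-dump", "prod-logs" (illustrative)
    detail: str

@dataclass
class Patch:
    files: list[str]
    diff: str

# Guardrail by construction: agents may only touch these paths.
ALLOWED_PREFIXES = ("src/", "config/")

def propose_patch(obs: Observation) -> Patch:
    # Stand-in for a model call that turns an observation into a diff.
    return Patch(files=["src/parser.py"], diff=f"# fix for: {obs.detail}")

def guardrails_pass(patch: Patch) -> bool:
    # Unsafe changes are rejected before any human ever sees them.
    return all(f.startswith(ALLOWED_PREFIXES) for f in patch.files)

def harness(observations: list[Observation]) -> list[Patch]:
    landed = []
    for obs in observations:
        patch = propose_patch(obs)
        if guardrails_pass(patch):
            landed.append(patch)  # in a real system: open and land a PR
    return landed

patches = harness([Observation("crash-dump", "null deref in parser")])
print(len(patches))  # 1: the proposed patch touches only allowed paths
```

The point of the structure is that the human never sits inside the loop; the guardrail function does the gating, and the loop runs continuously.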

Until agents can see the system, they cannot improve it. Until they can act, utilization remains artificially capped.

This is why “human in the loop” is often a performance bug, not a safety feature. If every meaningful action requires synchronous human approval, you will never reach high utilization. If you are 10x’ing or 100x’ing the number of generated PRs, for example, you cannot realistically require synchronous human review on every code change merged to main. The system will idle while waiting for attention, even though capacity is available. Existing company processes are not designed for this level of parallelism and throughput.
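What replacing synchronous approval might look like, as a hedged sketch: automated checks gate the merge, and humans review asynchronously by sampling. All field names (`tests_passed`, `paths`) and the allowlist are illustrative assumptions, not a real CI API:

```python
# Hypothetical async merge policy: checks gate the merge by construction;
# humans audit a sample after the fact instead of approving every change.
SAFE_PREFIXES = ("src/", "docs/")  # changes outside these always need a human

def can_automerge(pr: dict, max_files: int = 20) -> bool:
    return (
        pr["tests_passed"]
        and len(pr["paths"]) <= max_files
        and all(p.startswith(SAFE_PREFIXES) for p in pr["paths"])
    )

pr = {"tests_passed": True, "paths": ["src/cache.py", "docs/cache.md"]}
print(can_automerge(pr))  # True: merges without waiting for a reviewer
```

The policy, not a person, is what makes the change safe to land; that is what lets throughput scale past the attention of any reviewer pool.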

Once you remove those bottlenecks, utilization compounds quickly. Agents start doing work continuously: scanning, proposing, fixing, tightening. Humans stop being implementers and become supervisors of direction, constraints, and taste.

At that point, throughput stops being interesting. Utilization is the metric that matters. A billion tokens per engineer per day is not a vanity metric; it is a reframing of how much of the software lifecycle can run continuously once you stop treating implementation as the scarce resource.

The organizations that struggle will not be the ones without access to models. They will be the ones that never restructured their systems to let the models work.