January 2025 · 5 min read

Latency in Agent Swarms

Optimizing hand-offs between Manager and Worker agents. How we reduced execution time by rethinking the communication protocol.

Multi-agent systems are powerful. They're also slow. When you chain agents together, latency compounds. A 3-agent workflow can easily take 15+ seconds.

We spent two months optimizing hand-offs between our Manager and Worker agents. Here's what we learned.

The Problem

Our original architecture was simple: the Manager agent receives a task, decides which Worker to delegate to, and sends it the full context; the Worker executes and returns the result.

Each hand-off took ~800ms. With a typical 4-agent workflow, that's 3.2 seconds of pure overhead before any actual work happens.
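That sequential flow can be sketched in a few lines. This is a minimal illustration, not our production code; the names (`handoff`, `naive_workflow`) and the fixed 800ms sleep standing in for serialization and routing are assumptions for the example.

```python
import asyncio

HANDOFF_MS = 800  # measured per-hand-off overhead from the original architecture

async def handoff(worker: str, context: dict) -> dict:
    # Stand-in for serializing the full context, routing to the Worker,
    # and waiting for it to pick up the task.
    await asyncio.sleep(HANDOFF_MS / 1000)
    return {"worker": worker, "result": f"done:{worker}"}

async def naive_workflow(workers: list[str], context: dict) -> list[dict]:
    # Sequential delegation: hand-off overhead compounds linearly.
    results = []
    for w in workers:
        results.append(await handoff(w, context))
    return results

# Four delegations at 800ms each -> 3.2s of overhead before any real work.
```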

Optimization 1: Context Compression

Workers don't need the full conversation history. They need the task and relevant context. We built a context compression layer that extracts only what's needed.

Before: ~12K tokens per hand-off
After: ~2K tokens per hand-off

Result: -180ms per hand-off from reduced token processing.
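The idea behind the compression layer can be sketched as a relevance filter over the conversation history. This is a deliberately naive keyword-overlap version; the actual layer presumably uses something stronger (embeddings or a learned scorer), and the function name and message shape here are assumptions.

```python
def compress_context(history: list[dict], task: str, max_messages: int = 5) -> list[dict]:
    """Keep only the messages most relevant to the delegated task.
    Naive keyword-overlap scoring; a real system would use embeddings."""
    keywords = set(task.lower().split())
    scored = []
    for msg in history:
        overlap = len(keywords & set(msg["content"].lower().split()))
        scored.append((overlap, msg))
    # Keep the highest-overlap messages, then restore original order.
    top = sorted(scored, key=lambda s: -s[0])[:max_messages]
    keep = {id(m) for _, m in top}
    return [m for m in history if id(m) in keep]
```

The Worker then receives only the filtered slice plus the task itself, which is where the token reduction comes from.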

Optimization 2: Parallel Execution

Not all Worker tasks depend on each other. If the Manager needs data from both a CRM lookup and an email search, run them in parallel.

We built a dependency graph analyzer that identifies parallelizable tasks automatically. If two Workers don't share dependencies, they run simultaneously.

Result: -400ms average on multi-Worker workflows.
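A dependency-graph executor of this kind can be sketched with asyncio: run every Worker whose dependencies are satisfied in the same wave, concurrently. The graph shape, worker names, and `run_worker` stub are illustrative assumptions, not our actual implementation.

```python
import asyncio

async def run_worker(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for the Worker's real work
    return f"{name}:done"

async def run_graph(deps: dict[str, set[str]]) -> dict[str, str]:
    """Execute workers in waves: any worker whose dependencies have all
    finished runs concurrently with its peers in that wave."""
    done: dict[str, str] = {}
    pending = dict(deps)
    while pending:
        ready = [w for w, d in pending.items() if d <= done.keys()]
        if not ready:
            raise ValueError("dependency cycle in worker graph")
        results = await asyncio.gather(*(run_worker(w) for w in ready))
        for w, r in zip(ready, results):
            done[w] = r
            del pending[w]
    return done

# crm_lookup and email_search share no dependencies, so they run in the
# same wave; summarize waits for both.
```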

Optimization 3: Speculative Execution

This one's counterintuitive. We start Workers before the Manager finishes deciding.

Based on task patterns, we predict which Worker is likely to be called and pre-warm it with partial context. If the prediction is wrong (happens ~15% of the time), we discard the result.

Result: -120ms average (accounting for wasted compute on mispredictions).
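The speculative hand-off can be sketched as racing a cheap prediction against the Manager's real routing decision. Everything here (the `predict_worker` heuristic, the worker names, the sleep durations) is a hypothetical stand-in; the point is the shape: start early, keep the result on a hit, cancel and retry on a miss.

```python
import asyncio

async def manager_decide(task: str) -> str:
    await asyncio.sleep(0.3)  # stand-in for the Manager's routing LLM call
    return "crm_worker"

def predict_worker(task: str) -> str:
    # Cheap heuristic from historical task patterns.
    return "crm_worker" if "customer" in task else "email_worker"

async def prewarm(worker: str, task: str) -> str:
    await asyncio.sleep(0.5)  # Worker starts on partial context
    return f"{worker}:result"

async def speculative_handoff(task: str) -> str:
    guess = predict_worker(task)
    spec = asyncio.create_task(prewarm(guess, task))
    actual = await manager_decide(task)
    if actual == guess:
        return await spec      # hit: the Worker's output is already in flight
    spec.cancel()              # miss: discard the speculative work
    return await prewarm(actual, task)
```

On a hit the Worker overlaps with the Manager's decision time; on a miss you pay the full cost plus the wasted compute, which is why the net saving is smaller than the other optimizations.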

The Combined Result

-300ms
per agent hand-off

On a 4-agent workflow, that's 900ms saved. Doesn't sound like much until you realize users notice anything over 2 seconds. We took our average workflow from 4.8s to 3.9s. That's the difference between "fast" and "acceptable."

What We're Working On Next

Streaming inter-agent communication. Instead of waiting for a Worker to complete, start streaming partial results back to the Manager. Early experiments show another 200ms+ potential reduction.
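The streaming idea maps naturally onto async generators: the Worker yields chunks as it produces them, and the Manager consumes each one as it arrives instead of blocking on completion. This is a sketch of the concept under assumed names; the real protocol is still experimental.

```python
import asyncio
from typing import AsyncIterator

async def worker_stream(task: str) -> AsyncIterator[str]:
    # Hypothetical Worker that yields partial results as it works.
    for chunk in ("partial-1", "partial-2", "final"):
        await asyncio.sleep(0.05)
        yield chunk

async def manager_consume(task: str) -> list[str]:
    """Manager acts on each partial chunk as it arrives instead of
    waiting for the Worker to finish."""
    seen = []
    async for chunk in worker_stream(task):
        seen.append(chunk)  # e.g. update the plan or pre-warm the next Worker
    return seen
```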
