Latency in Agent Swarms
Optimizing hand-offs between Manager and Worker agents. How we reduced execution time by rethinking the communication protocol.
Multi-agent systems are powerful. They're also slow. When you chain agents together, latency compounds. A 3-agent workflow can easily take 15+ seconds.
We spent two months optimizing hand-offs between our Manager and Worker agents. Here's what we learned.
The Problem
Our original architecture was simple: the Manager agent receives a task, decides which Worker to delegate to, and sends over the full conversation context; the Worker executes the task and returns the result.
Each hand-off took ~800ms. With a typical 4-agent workflow, that's 3.2 seconds of pure overhead before any actual work happens.
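A minimal sketch of where that overhead comes from: in the sequential design, every step pays the full hand-off cost before any real work begins. Step names and work durations are illustrative, not from our production traces.

```python
from dataclasses import dataclass

HANDOFF_OVERHEAD_MS = 800  # measured per-hand-off cost


@dataclass
class Step:
    worker: str   # which Worker handles this step (names are illustrative)
    work_ms: int  # time the Worker spends on the actual task


def workflow_latency_ms(steps: list[Step]) -> int:
    """Sequential design: every step waits for the previous one and pays
    the full hand-off overhead before any real work starts."""
    return sum(HANDOFF_OVERHEAD_MS + s.work_ms for s in steps)


# A typical 4-agent workflow: 4 x 800ms = 3.2s of pure hand-off overhead.
steps = [Step("crm", 500), Step("email", 700), Step("summarize", 400), Step("draft", 600)]
overhead_ms = len(steps) * HANDOFF_OVERHEAD_MS  # 3200
```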
Optimization 1: Context Compression
Workers don't need the full conversation history. They need the task and relevant context. We built a context compression layer that extracts only what's needed.
Result: -180ms per hand-off from reduced token processing.
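As a sketch of the idea, here is a deliberately naive compression layer: a keyword filter standing in for whatever relevance scoring the real layer uses, plus a "keep the last few turns" rule so the Worker retains immediate context. Function and parameter names are hypothetical.

```python
def compress_context(history: list[dict], task: str, keep_last: int = 2) -> list[dict]:
    """Keep only turns that share keywords with the task, plus the most
    recent turns. A naive stand-in for a real relevance model."""
    task_words = set(task.lower().split())
    # Always keep the tail so the Worker has immediate conversational context.
    keep = set(range(max(0, len(history) - keep_last), len(history)))
    for i, turn in enumerate(history):
        if task_words & set(turn["content"].lower().split()):
            keep.add(i)
    return [history[i] for i in sorted(keep)]


history = [
    {"role": "user", "content": "what deals closed last quarter"},
    {"role": "assistant", "content": "three deals closed, totaling 40k"},
    {"role": "user", "content": "now email the crm owner a summary"},
]
# Only the task-relevant turn survives; the first two are dropped.
compressed = compress_context(history, "email the crm owner", keep_last=1)
```

Fewer tokens in means less processing on the Worker side, which is where the per-hand-off saving comes from.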
Optimization 2: Parallel Execution
Not all Worker tasks depend on each other. If the Manager needs data from both a CRM lookup and an email search, run them in parallel.
We built a dependency graph analyzer that identifies parallelizable tasks automatically. If two Workers don't share dependencies, they run simultaneously.
Result: -400ms average on multi-Worker workflows.
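A minimal sketch of the dependency-graph idea, using a thread pool and level-by-level scheduling: any task whose dependencies are all satisfied runs concurrently with its peers. The task names and lambda bodies are placeholders for real Worker calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_dag(tasks: dict[str, Callable], deps: dict[str, set[str]]) -> dict:
    """Run every task whose dependencies are satisfied in parallel, level by
    level. Each task receives a dict of its dependencies' results."""
    results: dict = {}
    done: set[str] = set()
    with ThreadPoolExecutor() as pool:
        while len(done) < len(tasks):
            ready = [t for t in tasks if t not in done and deps.get(t, set()) <= done]
            if not ready:
                raise ValueError("cycle in dependency graph")
            futures = {
                t: pool.submit(tasks[t], {d: results[d] for d in deps.get(t, set())})
                for t in ready
            }
            for t, fut in futures.items():
                results[t] = fut.result()
                done.add(t)
    return results


# CRM lookup and email search share no dependencies, so they run simultaneously;
# the report waits for both.
results = run_dag(
    tasks={
        "crm": lambda dep: "crm-data",
        "email": lambda dep: "emails",
        "report": lambda dep: dep["crm"] + "+" + dep["email"],
    },
    deps={"report": {"crm", "email"}},
)
```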
Optimization 3: Speculative Execution
This one's counterintuitive. We start Workers before the Manager finishes deciding.
Based on task patterns, we predict which Worker is likely to be called and pre-warm it with partial context. If the prediction is wrong (happens ~15% of the time), we discard the result.
Result: -120ms average (accounting for wasted compute on mispredictions).
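The speculation can be sketched like this: a fast predictor kicks off the likely Worker while the slow, authoritative routing decision completes in the foreground. Both `predict` and `decide` are hypothetical stand-ins for the routing model; the Worker bodies are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def speculative_dispatch(task, predict, decide, workers):
    """Start the predicted Worker before the Manager's routing decision
    finishes. On a misprediction, the speculative work is discarded."""
    guess = predict(task)  # fast heuristic prediction
    with ThreadPoolExecutor(max_workers=1) as pool:
        speculative = pool.submit(workers[guess], task)  # pre-warm the likely Worker
        actual = decide(task)  # meanwhile, the authoritative decision completes
        if actual == guess:
            return speculative.result()  # hit: the Worker had a head start
        speculative.cancel()  # miss (~15%): the compute is simply wasted
        return workers[actual](task)


workers = {"crm": lambda t: f"crm:{t}", "email": lambda t: f"email:{t}"}
hit = speculative_dispatch("look up the contact", lambda t: "crm", lambda t: "crm", workers)
miss = speculative_dispatch("search my inbox", lambda t: "crm", lambda t: "email", workers)
```

Note that `Future.cancel` can't stop a speculation that has already started running; the wasted compute on mispredictions is exactly why the net saving is smaller than the other two optimizations.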
The Combined Result
Combined, the three optimizations save about 900ms on our average 4-agent workflow. Doesn't sound like much until you realize users notice anything over 2 seconds. We took our average workflow from 4.8s to 3.9s. That's the difference between "fast" and "acceptable."
What We're Working On Next
Streaming inter-agent communication. Instead of waiting for a Worker to complete, start streaming partial results back to the Manager. Early experiments show another 200ms+ potential reduction.
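The shape of the experiment, as a sketch: a Worker that yields partial results as a generator, and a Manager that acts on each chunk as it arrives instead of blocking on the final payload. Chunk contents and names are illustrative.

```python
def streaming_worker(task):
    """Sketch of a Worker that yields partial results as it produces them,
    rather than returning one final payload."""
    for chunk in (f"partial:{task}:1", f"partial:{task}:2", f"final:{task}"):
        yield chunk


def manager_consume(stream):
    """The Manager processes each chunk on arrival, so downstream work
    (or the next hand-off) can begin before the Worker fully finishes."""
    received = []
    for chunk in stream:
        received.append(chunk)  # in practice: forward or act on it immediately
    return received
```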