The Economics of Token Usage
Cost modeling for enterprise agent deployments. When to use expensive models, when to use cheap ones, and how to blend them.
"Just use GPT-4 for everything" is expensive advice. "Just use the cheapest model" is bad advice. The right answer depends on what each task actually requires.
We've processed over 50K agent tasks across our client base. Here's what the cost data actually shows.
The Model Hierarchy
Not all tasks need the same intelligence level. We categorize agent tasks into three tiers:
Tier 1, simple routing and classification: "Is this email a support request or a sales inquiry?" Use the cheapest model, such as GPT-4o-mini or Claude Haiku. The accuracy difference from larger models is negligible.
Tier 2, structured extraction: "Extract the key terms from this contract." Use mid-tier models, such as GPT-4o or Claude Sonnet. The task requires reasoning but has clear success criteria.
Tier 3, genuine reasoning: "Draft a response to this nuanced customer complaint." Top-tier only: Claude Opus or GPT-4. The quality difference is measurable and matters.
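The three tiers can be wired up as a simple lookup. The following is a minimal sketch, not our production router: the task-type labels, the `pick_model` helper, and the model identifier strings are all hypothetical placeholders (the strings echo the model names above but are not guaranteed to be exact API model IDs).

```python
from enum import Enum


class Tier(Enum):
    CLASSIFICATION = 1  # simple routing/classification
    EXTRACTION = 2      # structured extraction
    REASONING = 3       # genuine reasoning


# Hypothetical tier-to-model mapping; replace with your provider's real model IDs.
MODEL_FOR_TIER = {
    Tier.CLASSIFICATION: "gpt-4o-mini",
    Tier.EXTRACTION: "gpt-4o",
    Tier.REASONING: "claude-opus",
}

# Hypothetical task-type labels, one per example in the text.
TIER_FOR_TASK = {
    "classify_email": Tier.CLASSIFICATION,
    "extract_contract_terms": Tier.EXTRACTION,
    "draft_complaint_response": Tier.REASONING,
}


def pick_model(task_type: str) -> str:
    """Return the model for a task type; unknown types fall back to the top tier."""
    tier = TIER_FOR_TASK.get(task_type, Tier.REASONING)
    return MODEL_FOR_TIER[tier]
```

Falling back to the top tier for unrecognized task types is a conservative design choice: an unknown task costs more, but it never gets a model that is too weak for it.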
The 3.2x Efficiency Gain
Most enterprises start by routing everything to GPT-4. When we implement tiered routing, costs drop dramatically.
Same outcomes. 3.2x lower cost. The secret: 70% of tasks are Tier 1 or Tier 2, so only about 30% need a top-tier model.
The Caching Layer
Here's what most teams miss: many agent tasks are repeated. The same classification question on similar inputs. The same extraction on templated documents.
We built a semantic cache. Before calling the LLM, we check if a sufficiently similar query was already processed. Cache hit rate across our deployments: 23%.
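To make the check-before-calling idea concrete, here is a minimal sketch of a semantic cache. It is an illustration under stated assumptions, not the cache described above: the `embed` function is a toy word-count stand-in for a real embedding model, the `SemanticCache` class and its 0.9 similarity threshold are hypothetical, and lookup is a linear scan rather than a vector index.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # hypothetical cutoff for "sufficiently similar"
        self.entries = []           # list of (embedding, answer) pairs
        self.hits = 0
        self.misses = 0

    def lookup(self, query: str) -> Optional[str]:
        """Return the answer of the most similar stored query, if it clears the threshold."""
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def get_or_call(self, query: str, call_llm: Callable[[str], str]) -> str:
        """Serve from cache when possible; otherwise call the model and store the result."""
        cached = self.lookup(query)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        answer = call_llm(query)
        self.entries.append((embed(query), answer))
        return answer
```

The threshold is the main tuning knob: set it too low and dissimilar queries share answers, set it too high and the hit rate collapses. The hit rate for a run is `hits / (hits + misses)`.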
That's 23% of API calls eliminated entirely. Combined with tiered routing, total cost reduction: 4.1x from baseline.
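As a sanity check on how the two savings combine, the arithmetic can be sketched as follows. This assumes that cache hits are spread evenly across tiers and that a hit costs nothing, neither of which is stated above; the `combined_reduction` helper is hypothetical.

```python
def combined_reduction(routing_factor: float, cache_hit_rate: float) -> float:
    """Overall cost reduction when caching is applied on top of tiered routing."""
    # After routing, cost is 1/routing_factor of baseline.
    # Caching then removes cache_hit_rate of the remaining calls.
    remaining = (1.0 / routing_factor) * (1.0 - cache_hit_rate)
    return 1.0 / remaining


print(round(combined_reduction(3.2, 0.23), 2))  # 4.16
```

Under those assumptions the result is 3.2 / 0.77, or about 4.16x, which is in line with the roughly 4.1x figure quoted above.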
The Framework
When planning an agent deployment, model your task distribution:
- What percentage are simple routing/classification?
- What percentage require structured extraction?
- What percentage need genuine reasoning?
- What percentage are likely cache hits?
Then do the math. Most enterprises overestimate their Tier 3 percentage by 2-3x. They're paying for intelligence they don't need.
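"Doing the math" can be as simple as a weighted average. The sketch below is illustrative only: the `relative_cost` helper, the per-tier unit costs, and both task distributions are hypothetical numbers chosen to show the effect of overestimating Tier 3, not figures from our data.

```python
def relative_cost(shares: dict, unit_costs: dict, cache_hit_rate: float = 0.0) -> float:
    """Expected cost per task as a fraction of an all-Tier-3 baseline."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    # Blend per-tier costs by how often each tier occurs.
    blended = sum(shares[tier] * unit_costs[tier] for tier in shares)
    # Cache hits are assumed free and evenly spread across tiers.
    return blended * (1.0 - cache_hit_rate) / unit_costs["tier3"]


# Hypothetical per-task costs, relative to a Tier 3 call.
UNIT_COSTS = {"tier1": 0.05, "tier2": 0.25, "tier3": 1.0}

# Hypothetical distributions: what a team assumes vs. what it measures.
assumed = {"tier1": 0.2, "tier2": 0.2, "tier3": 0.6}
measured = {"tier1": 0.4, "tier2": 0.3, "tier3": 0.3}

for name, shares in (("assumed", assumed), ("measured", measured)):
    cost = relative_cost(shares, UNIT_COSTS, cache_hit_rate=0.23)
    print(f"{name}: {cost:.3f} of baseline, {1 / cost:.2f}x reduction")
```

Running it with your own measured distribution and your provider's actual prices shows how much of the budget is driven by the Tier 3 share alone.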