RAG vs. Long-Context Windows
We pitted long-context prompting against vector retrieval for legal document summarization. The winner was not what we expected.
When OpenAI released GPT-4 Turbo with a 128K context window, and Google followed with Gemini's 1M token capacity, a question emerged: do we still need RAG?
We ran the experiment. The results surprised us.
The Setup
We tested both approaches on a real client task: summarizing legal contracts for a mid-size law firm. The dataset included 200 contracts ranging from 5,000 to 80,000 tokens each.
Approach A: Traditional RAG with chunking (512 tokens), embeddings via text-embedding-3-large, and top-k retrieval (k=10) before synthesis.
Approach B: Full document loading into Claude's 200K context window. No chunking, no retrieval. Just the raw document and a summarization prompt.
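The retrieval stage of Approach A can be sketched as follows. This is an illustrative stand-in, not our production pipeline: whitespace splitting approximates real tokenization, and the term-overlap scoring stands in for text-embedding-3-large cosine similarity.

```python
def chunk_document(text: str, chunk_size: int = 512) -> list[str]:
    """Split a document into ~chunk_size-token pieces.

    Whitespace splitting is a rough proxy for tokenization here;
    a real pipeline would use the embedding model's tokenizer.
    """
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def top_k_chunks(query: str, chunks: list[str], k: int = 10) -> list[str]:
    """Return the k chunks most relevant to the query.

    Term overlap stands in for embedding similarity (assumption);
    the real system scores cosine distance over dense vectors.
    """
    q_terms = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_terms & set(c.lower().split())),
                    reverse=True)
    return scored[:k]
```

With k=10 and 512-token chunks, the synthesis prompt sees at most ~5K tokens of retrieved context regardless of how long the source contract is, which is the property the cost analysis below depends on.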
The Results
Long-context won on accuracy, by roughly five points. But here's where it gets interesting: RAG was 47% cheaper per document.
The Tradeoff Nobody Talks About
Long-context models charge by the token. When you load an 80K token document, you pay for 80K input tokens every single time. RAG only retrieves the relevant chunks, typically 5-10K tokens.
For documents under 20K tokens, long-context wins on both accuracy and cost. Above that threshold, the math flips: RAG's input cost stays roughly fixed at k times the chunk size, while long-context cost grows linearly with document length. RAG becomes the economically rational choice, even with slightly lower accuracy.
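The arithmetic for a worst-case 80K-token contract looks like this. The per-token price is a placeholder assumption, and embedding and retrieval overhead are ignored:

```python
# Placeholder price: $3 per 1M input tokens (assumption, not a quoted rate).
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000

def prompt_cost(input_tokens: int) -> float:
    """Input-token cost of a single summarization call."""
    return input_tokens * PRICE_PER_INPUT_TOKEN

long_context = prompt_cost(80_000)   # the whole document, every single time
rag = prompt_cost(10 * 512)          # top-k retrieval: k=10 chunks of 512 tokens
savings = 1 - rag / long_context
```

For this 80K-token worst case the raw input-token saving is well above the 47% dataset average, which reflects the 5K-80K document mix and the retrieval overhead the sketch leaves out.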
Our Recommendation
Use a hybrid approach:
- Short documents (<20K tokens): Long-context. The accuracy gain is worth the marginal cost increase.
- Long documents (>20K tokens): RAG with high-quality chunking. Accept the 5% accuracy tradeoff for 40%+ cost savings.
- Mission-critical tasks: Long-context with human review. When accuracy matters more than cost, don't optimize prematurely.
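The three rules above reduce to a simple router. The 20K threshold comes from our results; the function name and return labels are illustrative:

```python
def choose_approach(token_count: int, mission_critical: bool = False) -> str:
    """Pick a summarization strategy per the thresholds above."""
    if mission_critical:
        return "long-context + human review"  # accuracy matters more than cost
    if token_count < 20_000:
        return "long-context"                 # accuracy gain worth marginal cost
    return "rag"                              # ~5% accuracy loss for 40%+ savings
```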
The Bottom Line
"RAG is dead" is wrong. "Long-context solves everything" is also wrong. The right answer depends on your document size distribution and your accuracy-cost tolerance.
We've built our agents to automatically select the approach based on document length. No manual configuration needed.
Want more research like this?
Subscribe to our research notes