Context Compaction
In a multi-agent system, every agent maintains its own context window — and that context grows with each message exchanged between teammates. A team of five agents discussing a problem generates roughly five times the context pressure of a single agent working alone. Without management, conversations quickly exceed the LLM's token limit and agents lose access to earlier information.
Context compaction solves this with a three-layer architecture that progressively compresses conversation history while preserving essential facts, decisions, and discoveries. It runs automatically — agents are not aware that compaction is happening.
Why Per-Agent Context Matters
Each agent in a team sees a different conversation. When Agent A sends a private message to Agent B, only those two agents see it. Agent C sees the team-wide messages but not the private exchange. This means every agent has its own unique view of the conversation history.
Compaction respects this. Each agent's context is managed independently — compacted summaries are tailored to what that specific agent actually saw. This is especially important for side conversations that break out of the main round table: two agents can have a detailed private discussion, and only their context windows absorb the cost.
The Three Layers
Each layer triggers at a different threshold and uses a different compression strategy. They work together — earlier layers reduce pressure so later layers fire less often.
| Layer | What It Does | Method | AI-Powered |
|---|---|---|---|
| 1. Tool Result Truncation | Shrinks stale tool outputs | Age-based token limits | No |
| 2. Conversation Compaction | Compresses older messages into dense summaries | LLM-powered summarization | Yes |
| 3. Task-Transition Compaction | Creates clean boundaries between workflow tasks | LLM-powered task summaries | Yes |
Layer 1: Tool Result Truncation
Tool calls often return large payloads (API responses, search results, file contents) that are critical when fresh but become stale quickly. This layer automatically truncates old tool results based on how many messages ago they occurred:
| Age | Truncation Limit |
|---|---|
| Recent (11–20 messages ago) | 2,000 tokens |
| Medium (21–40 messages ago) | 1,000 tokens |
| Old (41+ messages ago) | 500 tokens |
Tool results in the 10 most recent messages are left intact. Truncated results include a retention notice so the agent knows information was compressed. This layer is not AI-powered — it uses simple token counting and adds no cost.
Layer 2: Conversation Compaction
This is the core layer. When the number of uncompacted messages exceeds the agent's compaction trigger, the engine selects the oldest messages (preserving the most recent ones) and uses an LLM to compress them into a dense summary.
The compression preserves:
- Facts and decisions — what was concluded, not how the discussion went
- Agent attribution — who said or discovered what
- Specific details — numbers, names, references, and tool results
- Narrative continuity — each summary builds on previous ones without repetition
Typical compression achieves a 50–70% token reduction while retaining the actionable information. The agent continues working with full awareness of what happened — it just reads a summary instead of the raw conversation.
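The selection step (which messages get summarized, which stay verbatim) can be sketched as follows. `split_for_compaction` is an illustrative name; the values shown match the Standard preset described later (trigger 17, 15 most recent kept).

```python
def split_for_compaction(messages: list, trigger: int, recent_kept: int):
    """Return (to_summarize, kept_verbatim).

    Compaction fires only once the uncompacted message count exceeds the
    trigger; the `recent_kept` newest messages are always preserved as-is.
    """
    if len(messages) <= trigger:
        return [], messages
    return messages[:-recent_kept], messages[-recent_kept:]

msgs = [f"msg-{i}" for i in range(20)]
old, recent = split_for_compaction(msgs, trigger=17, recent_kept=15)
assert old == [f"msg-{i}" for i in range(5)]  # oldest 5 go to the summarizer
assert len(recent) == 15                      # newest 15 stay verbatim
```

In the real pipeline, the `old` slice would be handed to the LLM summarizer and replaced by the resulting dense summary.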
Layer 3: Task-Transition Compaction
When a workflow task completes and the next one begins, this layer creates a clean boundary. Instead of carrying raw conversation history from the previous task, each agent starts the new task with a focused summary of what was accomplished.
This prevents cross-task context bleeding — a common problem where agents see completion messages from earlier tasks and prematurely conclude the current task is also done.
Two summary modes are available:
| Mode | How It Works | Cost |
|---|---|---|
| Individual (default) | Each agent gets a personalized summary based on their view | One LLM call per agent |
| Shared | One agent generates a summary, shared with all others | One LLM call total |
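The cost difference between the two modes comes down to how many summarization calls are made. A hedged sketch, where `llm_summarize` stands in for whatever summarization call the engine makes:

```python
def summarize_task(agents: list[str], mode: str, llm_summarize) -> dict[str, str]:
    """Produce each agent's task-boundary summary under the two modes."""
    if mode == "shared":
        shared = llm_summarize(agents[0])          # one LLM call total
        return {a: shared for a in agents}
    return {a: llm_summarize(a) for a in agents}   # one LLM call per agent

calls = []
def fake_summarize(agent):
    calls.append(agent)
    return f"summary as seen by {agent}"

summarize_task(["A", "B", "C"], "shared", fake_summarize)
assert len(calls) == 1   # shared mode: one call, reused by all agents

calls.clear()
summarize_task(["A", "B", "C"], "individual", fake_summarize)
assert len(calls) == 3   # individual mode: one personalized call each
```

Individual mode costs more but respects per-agent views: each summary reflects only what that agent actually saw during the task.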
Compaction Presets
Each agent is assigned a compaction preset that controls how aggressively compaction fires. Choose based on how much context the agent needs versus how long its conversations run.
| Preset | Trigger | Recent Kept | Best For |
|---|---|---|---|
| Aggressive | After 7 messages | 5 most recent | Long iterative stages, cost-sensitive runs |
| Strong | After 12 messages | 10 most recent | Multi-cycle stages with moderate context needs |
| Standard | After 17 messages | 15 most recent | General-purpose (default) |
| Careful | After 27 messages | 25 most recent | Detail-sensitive analysis, complex reasoning |
| None | Never | All | Short tasks where full history matters |
- Trigger — How many uncompacted messages accumulate before compaction fires. A lower trigger means more frequent compaction: lower token costs, but more summarization overhead.
- Recent Kept — How many of the most recent messages are always preserved verbatim. These are never summarized.
Choosing a preset
- Aggressive/Strong — Use for agents in long-running iterative loops (e.g., a coding agent that runs 50+ tool calls per stage). Token costs stay low and the agent maintains a working summary of what happened.
- Standard — The safe default. Works well for most conversational agents with moderate interaction counts.
- Careful — Use for agents doing complex analysis where losing earlier details would degrade quality (e.g., a research agent cross-referencing multiple sources).
- None — Use only for short tasks where you know the conversation will fit in the context window. The agent sees full, unmodified history.
Setting compaction to None on an agent assigned to long-running stages risks context window overflow. When the window fills, the engine hard-truncates messages from the beginning — which is worse than summarized compaction because no information is preserved at all.
Context Window Sizing
The engine reads the agent's resolved LLM model to determine the actual context window size. There are no hardcoded limits — if you assign a model with a 128K context window, compaction calculations use that full capacity. If you override an agent with a smaller model, its compaction thresholds adjust accordingly.
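A sketch of how thresholds could follow the resolved model's window. The model registry and the 80% headroom fraction are assumptions for illustration, not documented engine values:

```python
# Hypothetical registry mapping resolved model names to context window sizes.
MODEL_WINDOWS = {
    "large-model-128k": 128_000,
    "small-model-32k": 32_000,
}

def overflow_threshold(model: str, headroom: float = 0.8) -> int:
    """Token count at which compaction should fire, scaled to the model.

    Compacting before `headroom` of the window is used leaves room for the
    agent's next responses and tool results.
    """
    return int(MODEL_WINDOWS[model] * headroom)
```

Overriding an agent to the smaller model automatically tightens its threshold; nothing is hardcoded to one window size.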
This is configured per agent, not per stage. If the same agent is assigned to multiple stages, it uses the same compaction settings everywhere.
Related
- Configure an Agent — includes compaction preset selection
- Communication Flow — how agents talk and why per-agent context matters