
Context Compaction

In a multi-agent system, every agent maintains its own context window — and that context grows with each message exchanged between teammates. A team of five agents discussing a problem generates five times the context pressure of a single agent working alone. Without management, conversations quickly exceed the LLM's token limit and agents lose access to earlier information.

Context compaction solves this with a three-layer architecture that progressively compresses conversation history while preserving essential facts, decisions, and discoveries. It runs automatically — agents are not aware that compaction is happening.

[Figure: Context compaction overview]

Why Per-Agent Context Matters

Each agent in a team sees a different conversation. When Agent A sends a private message to Agent B, only those two agents see it. Agent C sees the team-wide messages but not the private exchange. This means every agent has its own unique view of the conversation history.

Compaction respects this. Each agent's context is managed independently — compacted summaries are tailored to what that specific agent actually saw. This is especially important for side conversations that break out of the main round table: two agents can have a detailed private discussion, and only their context windows absorb the cost.
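This visibility rule can be sketched as follows. The `Message` and `visible_to` names are hypothetical, chosen for illustration; the real engine's data model may differ. The key point: a private message appears only in the context of its sender and recipient.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    sender: str
    text: str
    recipient: Optional[str] = None  # None means team-wide

def visible_to(agent: str, history: list[Message]) -> list[Message]:
    """Return the slice of history this agent actually saw:
    team-wide messages plus private messages it sent or received."""
    return [
        m for m in history
        if m.recipient is None or agent in (m.sender, m.recipient)
    ]

history = [
    Message("A", "Kickoff notes"),                 # team-wide
    Message("A", "Side question", recipient="B"),  # private A -> B
    Message("B", "Side answer", recipient="A"),    # private B -> A
    Message("C", "Status update"),                 # team-wide
]
```

Agent C's view contains only the two team-wide messages, so compacting C's context never has to summarize the private exchange between A and B.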

The Three Layers

Each layer triggers at a different threshold and uses a different compression strategy. They work together — earlier layers reduce pressure so later layers fire less often.

| Layer | What It Does | Method | AI-Powered |
| --- | --- | --- | --- |
| 1. Tool Result Truncation | Shrinks stale tool outputs | Age-based token limits | No |
| 2. Conversation Compaction | Compresses older messages into dense summaries | LLM-powered summarization | Yes |
| 3. Task-Transition Compaction | Creates clean boundaries between workflow tasks | LLM-powered task summaries | Yes |

Layer 1: Tool Result Truncation

Tool calls often return large payloads (API responses, search results, file contents) that are critical when fresh but become stale quickly. This layer automatically truncates old tool results based on how many messages ago they occurred:

| Age | Truncation Limit |
| --- | --- |
| Recent (11–20 messages ago) | 2,000 tokens |
| Medium (21–40 messages ago) | 1,000 tokens |
| Old (41+ messages ago) | 500 tokens |

Truncated results include a retention notice so the agent knows information was compressed. This layer is not AI-powered — it uses simple token counting and adds no cost.
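The age schedule above can be sketched as follows. This is a minimal illustration, not the engine's implementation: token counting is approximated by whitespace splitting where a real engine would use the model's tokenizer, and the retention-notice wording is made up.

```python
# (min_age, max_age, token_limit) per the table above; results newer
# than 11 messages ago are left untouched in this sketch.
TRUNCATION_LIMITS = [
    (11, 20, 2000),
    (21, 40, 1000),
    (41, None, 500),
]

def truncate_tool_result(text: str, age: int) -> str:
    """Truncate a tool result based on how many messages ago it occurred,
    appending a retention notice when anything was removed."""
    for lo, hi, limit in TRUNCATION_LIMITS:
        if age >= lo and (hi is None or age <= hi):
            tokens = text.split()  # crude stand-in for real tokenization
            if len(tokens) > limit:
                kept = " ".join(tokens[:limit])
                removed = len(tokens) - limit
                return kept + f"\n[truncated: {removed} tokens removed due to age]"
            break
    return text
```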

Layer 2: Conversation Compaction

This is the core layer. When the number of uncompacted messages exceeds the agent's compaction trigger, the engine selects the oldest messages (preserving the most recent ones) and uses an LLM to compress them into a dense summary.

The compression preserves:

  • Facts and decisions — what was concluded, not how the discussion went
  • Agent attribution — who said or discovered what
  • Specific details — numbers, names, references, and tool results
  • Narrative continuity — each summary builds on previous ones without repetition

Typical compression achieves 50–70% token reduction while retaining all actionable information. The agent continues working with full awareness of what happened — it just reads a summary instead of the raw conversation.
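A rough sketch of the selection logic described above, assuming a preset with a trigger threshold and a recent-kept count. The `summarize` parameter stands in for the LLM call; the names and signature are assumptions for illustration, not the engine's API.

```python
def compact(messages: list[str], trigger: int, keep_recent: int,
            summarize) -> list[str]:
    """If uncompacted history exceeds `trigger`, compress everything
    except the `keep_recent` newest messages into one summary message."""
    if len(messages) <= trigger:
        return messages  # under the threshold: nothing to do
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)  # LLM call: facts, decisions, attribution
    return [f"[summary] {summary}"] + recent
```

With the Standard preset (trigger 17, keep 15), a 20-message history collapses to one summary message plus the 15 most recent messages kept verbatim.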

Layer 3: Task-Transition Compaction

When a workflow task completes and the next one begins, this layer creates a clean boundary. Instead of carrying raw conversation history from the previous task, each agent starts the new task with a focused summary of what was accomplished.

This prevents cross-task context bleeding — a common problem where agents see completion messages from earlier tasks and prematurely conclude the current task is also done.

Two summary modes are available:

| Mode | How It Works | Cost |
| --- | --- | --- |
| Individual (default) | Each agent gets a personalized summary based on their view | One LLM call per agent |
| Shared | One agent generates a summary, shared with all others | One LLM call total |
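The cost difference between the two modes can be sketched as follows. `llm_summarize` and the view-keyed dictionary are placeholders for illustration; the engine's actual interfaces are not shown here.

```python
def task_boundary_summaries(agents: list[str], views: dict[str, str],
                            mode: str, llm_summarize) -> dict[str, str]:
    """Return {agent: summary} at a task boundary.
    `views[agent]` is the completed task as that agent saw it."""
    if mode == "shared":
        # One LLM call; every agent receives the same summary.
        summary = llm_summarize(views[agents[0]])
        return {a: summary for a in agents}
    # Individual (default): one LLM call per agent, personalized
    # to each agent's own view of the conversation.
    return {a: llm_summarize(views[a]) for a in agents}
```

Shared mode trades personalization for cost: agents whose views diverged (for example, after a private side conversation) receive a summary that may omit what only they saw.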

Compaction Presets

Each agent is assigned a compaction preset that controls how aggressively compaction fires. Choose based on how much context the agent needs versus how long its conversations run.

| Preset | Trigger | Recent Kept | Best For |
| --- | --- | --- | --- |
| Aggressive | After 7 messages | 5 most recent | Long iterative stages, cost-sensitive runs |
| Strong | After 12 messages | 10 most recent | Multi-cycle stages with moderate context needs |
| Standard | After 17 messages | 15 most recent | General-purpose (default) |
| Careful | After 27 messages | 25 most recent | Detail-sensitive analysis, complex reasoning |
| None | Never | All | Short tasks where full history matters |

  • Trigger — How many uncompacted messages accumulate before compaction fires. A lower trigger means more frequent compaction: lower token costs, but more summarization.
  • Recent Kept — How many of the most recent messages are always preserved verbatim. These are never summarized.
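One plausible way to express the preset table as data. This is a hypothetical representation for illustration, not the engine's actual configuration format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CompactionPreset:
    trigger: Optional[int]      # uncompacted messages before compaction fires
    recent_kept: Optional[int]  # newest messages always preserved verbatim

PRESETS = {
    "aggressive": CompactionPreset(trigger=7, recent_kept=5),
    "strong":     CompactionPreset(trigger=12, recent_kept=10),
    "standard":   CompactionPreset(trigger=17, recent_kept=15),  # default
    "careful":    CompactionPreset(trigger=27, recent_kept=25),
    # None disables compaction entirely: no trigger, all messages kept.
    "none":       CompactionPreset(trigger=None, recent_kept=None),
}
```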

Choosing a preset

  • Aggressive/Strong — Use for agents in long-running iterative loops (e.g., a coding agent that runs 50+ tool calls per stage). Token costs stay low and the agent maintains a working summary of what happened.
  • Standard — The safe default. Works well for most conversational agents with moderate interaction counts.
  • Careful — Use for agents doing complex analysis where losing earlier details would degrade quality (e.g., a research agent cross-referencing multiple sources).
  • None — Use only for short tasks where you know the conversation will fit in the context window. The agent sees full, unmodified history.
> **Warning:** Setting compaction to None on an agent assigned to long-running stages risks context window overflow. When the window fills, the engine hard-truncates messages from the beginning, which is worse than summarized compaction because no information is preserved at all.

Context Window Sizing

The engine reads the agent's resolved LLM model to determine the actual context window size. There are no hardcoded limits — if you assign a model with a 128K context window, compaction calculations use that full capacity. If you override an agent with a smaller model, its compaction thresholds adjust accordingly.

This is configured per agent, not per stage. If the same agent is assigned to multiple stages, it uses the same compaction settings everywhere.
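A hedged sketch of how a window-derived budget might be computed. The model names, window sizes, lookup table, and reserve fraction below are all illustrative assumptions, not the engine's real values.

```python
# Hypothetical lookup of each model's context window, in tokens.
MODEL_CONTEXT_WINDOWS = {
    "small-model": 32_000,
    "large-model": 128_000,
}

def compaction_budget(model: str, reserve_fraction: float = 0.2) -> int:
    """Tokens of history allowed before compaction pressure rises,
    leaving a fraction of the window free for the model's response.
    No hardcoded limit: the budget scales with the resolved model."""
    window = MODEL_CONTEXT_WINDOWS[model]
    return int(window * (1 - reserve_fraction))
```

Because the budget is derived from the resolved model, overriding an agent to a smaller model automatically tightens its thresholds with no extra configuration.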
