Context Compaction

In a multi-agent system, every agent maintains its own context window — and that context grows with each message exchanged between teammates. A team of five agents discussing a problem generates five times the context pressure of a single agent working alone. Without management, conversations quickly exceed the LLM's token limit and agents lose access to earlier information.

Context compaction solves this with a four-layer architecture that progressively compresses conversation history while preserving essential facts, decisions, and discoveries. It runs automatically: agents do not need to trigger or manage compaction themselves.

The net result is long-running conversations without catastrophic forgetting. By selectively compressing relevant content and dropping noise, the system roughly triples or quadruples the usable context window: a model with a 1M-token context window can hold the dense, actionable equivalent of 3–4M tokens of uncompressed conversation. With Layer 4 (Knowledge Graph Persistence) enabled, relevant findings also survive across sessions.

Why Per-Agent Context Matters

Each agent in a team sees a different conversation. When Agent A sends a private message to Agent B, only those two agents see it. Agent C sees the team-wide messages but not the private exchange. This means every agent has its own unique view of the conversation history.

Compaction respects this. Each agent's context is managed independently — compacted summaries are tailored to what that specific agent actually saw. This is especially important for side conversations that break out of the main round table: two agents can have a detailed private discussion, and only their context windows absorb the cost.
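
The per-agent view described above can be sketched as a simple visibility filter. This is a minimal illustration, not the platform's implementation: the `Message` type, its `recipients` field, and `visible_to` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Message:
    sender: str
    text: str
    # Assumption: None means team-wide; otherwise only these agents
    # (plus the sender) can see the message.
    recipients: Optional[FrozenSet[str]] = None


def visible_to(agent: str, history: List[Message]) -> List[Message]:
    """Return only the messages this agent actually saw."""
    return [
        m for m in history
        if m.recipients is None or agent == m.sender or agent in m.recipients
    ]


history = [
    Message("A", "Kickoff: review the API design."),
    Message("A", "B, can you double-check the auth flow?", frozenset({"B"})),
    Message("B", "Auth flow looks fine.", frozenset({"A"})),
    Message("C", "I'll draft the schema."),
]

# Each agent's context is built (and later compacted) from its own view.
views = {agent: visible_to(agent, history) for agent in ("A", "B", "C")}
```

In this example Agents A and B each carry four messages, while Agent C carries only the two team-wide ones, so the private exchange costs C nothing.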

The Four Layers

Each layer triggers at a different threshold and uses a different compression strategy. They work together — earlier layers reduce pressure so later layers fire less often.

| Layer | What It Does | Method | AI-Powered |
| --- | --- | --- | --- |
| 1. Tool Result Truncation | Shrinks stale tool outputs | Age-based token limits | No |
| 2. Conversation Compaction | Compresses older messages into dense summaries | LLM-powered summarization | Yes |
| 3. Task-Transition Compaction | Creates clean boundaries between workflow tasks | LLM-powered task summaries | Yes |
| 4. Knowledge Graph Persistence | Routes relevant content to permanent memory | Enrichment queue + classification | Yes |

Layer 1: Tool Result Truncation

Tool calls often return large payloads (API responses, search results, file contents) that are critical when fresh but become stale quickly. This layer automatically truncates old tool results based on how many messages ago they occurred:

| Age | Truncation Limit |
| --- | --- |
| Recent (11–20 messages ago) | 2,000 tokens |
| Medium (21–40 messages ago) | 1,000 tokens |
| Old (41+ messages ago) | 500 tokens |

Truncated results include a retention notice so the agent knows information was compressed. This layer is not AI-powered — it uses simple token counting and adds no cost.
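
The age-based limits above can be sketched as follows. This is an illustration under stated assumptions: whitespace splitting stands in for a real tokenizer, the wording of the retention notice is invented, and results from the 10 most recent messages are assumed to be left untouched because the table does not list a limit for them.

```python
from typing import Optional


def truncation_limit(age: int) -> Optional[int]:
    """Token limit for a tool result that is `age` messages old."""
    if age >= 41:
        return 500
    if age >= 21:
        return 1000
    if age >= 11:
        return 2000
    return None  # assumption: newer results are kept in full


def truncate_tool_result(text: str, age: int) -> str:
    limit = truncation_limit(age)
    tokens = text.split()  # crude stand-in for real token counting
    if limit is None or len(tokens) <= limit:
        return text
    # Hypothetical retention notice so the agent knows content was cut.
    notice = f"[truncated: kept {limit} of {len(tokens)} tokens from an older tool result]"
    return " ".join(tokens[:limit]) + "\n" + notice
```

Because the rule depends only on message age and token counts, it needs no LLM call, which matches the layer's zero added cost.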

Layer 2: Conversation Compaction

This is the core layer. When the number of uncompacted messages exceeds the agent's compaction trigger, the platform decides which messages to compress and which to drop — guided by a relevance score from the message analyzer.

Each message in the conversation carries a relevance assessment. When compaction fires, the platform separates messages into two groups:

  • Relevant messages — compressed into a dense summary by an LLM, preserving their substance
  • Irrelevant messages — dropped entirely, freeing context without the cost of summarizing noise

This means compaction is not just "summarize the oldest messages." It is a selective process that keeps what matters and discards what doesn't — routine acknowledgments, redundant status updates, and low-signal chatter are eliminated rather than compressed.

The compression of relevant messages preserves:

  • Facts and decisions — what was concluded, not how the discussion went
  • Agent attribution — who said or discovered what
  • Specific details — numbers, names, references, and tool results
  • Narrative continuity — each summary builds on previous ones without repetition

Typical compression achieves 50–70% token reduction while retaining all actionable information. The agent continues working with full awareness of what happened — it just reads a summary instead of the raw conversation.
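
The selective process can be sketched as below. This is a rough model, not the platform's code: `Msg`, its boolean `relevant` flag (standing in for the analyzer's relevance score), `placeholder_summarize` (standing in for the LLM call), and the choice to fire once the message count reaches the trigger are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Msg:
    author: str
    text: str
    relevant: bool  # hypothetical stand-in for the analyzer's relevance score


def placeholder_summarize(messages: List[Msg]) -> str:
    # Stand-in for the LLM summarizer; it only concatenates, keeping attribution.
    return " | ".join(f"{m.author}: {m.text}" for m in messages)


def compact(
    messages: List[Msg],
    trigger: int,
    recent_kept: int,
    summarize: Callable[[List[Msg]], str] = placeholder_summarize,
) -> Tuple[Optional[str], List[Msg]]:
    """Return (summary of older relevant messages, messages kept verbatim)."""
    if len(messages) < trigger:
        return None, list(messages)  # nothing to do yet
    split = max(len(messages) - recent_kept, 0)
    older, recent = messages[:split], messages[split:]
    relevant = [m for m in older if m.relevant]  # irrelevant ones are dropped
    summary = summarize(relevant) if relevant else None
    return summary, recent


history = [
    Msg("A", "Decision: use PostgreSQL 16", True),
    Msg("B", "ok, thanks", False),
] + [Msg("C", f"step {i}", True) for i in range(5)]

summary, recent = compact(history, trigger=7, recent_kept=5)
```

Here the acknowledgment from B is discarded without ever reaching the summarizer, while A's decision survives in the summary with its attribution.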

Layer 3: Task-Transition Compaction

When a workflow task completes and the next one begins, this layer creates a clean boundary. Instead of carrying raw conversation history from the previous task, each agent starts the new task with a focused summary of what was accomplished.

This prevents cross-task context bleeding — a common problem where agents see completion messages from earlier tasks and prematurely conclude the current task is also done.

Two summary modes are available:

| Mode | How It Works | Cost |
| --- | --- | --- |
| Individual (default) | Each agent gets a personalized summary based on their view | One LLM call per agent |
| Shared | One agent generates a summary, shared with all others | One LLM call total |
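
The cost difference between the two modes can be sketched as follows. The function name, the `views` mapping, and the rule that the first agent in the mapping writes the shared summary are assumptions made for illustration; `counting_summarize` is a stand-in for the LLM call that records how often it is invoked.

```python
from typing import Callable, Dict, List


def task_transition_summaries(
    views: Dict[str, List[str]],
    summarize: Callable[[List[str]], str],
    mode: str = "individual",
) -> Dict[str, str]:
    """Return the summary each agent starts the next task with."""
    if not views:
        return {}
    if mode == "individual":
        # One summarizer call per agent, each based on that agent's own view.
        return {agent: summarize(seen) for agent, seen in views.items()}
    if mode == "shared":
        # One call in total; assumption: the first agent generates the summary.
        first_agent = next(iter(views))
        shared = summarize(views[first_agent])
        return {agent: shared for agent in views}
    raise ValueError(f"unknown mode: {mode}")


calls: List[int] = []


def counting_summarize(seen: List[str]) -> str:
    calls.append(len(seen))  # record each (simulated) LLM call
    return f"summary of {len(seen)} messages"


views = {"A": ["m1", "m2"], "B": ["m1"], "C": ["m1", "m2", "m3"]}
individual = task_transition_summaries(views, counting_summarize, "individual")
```

With three agents, Individual mode makes three summarizer calls and Shared mode makes one, at the price of every agent receiving a summary written from a single agent's view.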

Layer 4: Knowledge Graph Persistence

The first three layers manage context within a single session — when the session ends, compacted summaries are gone. Layer 4 bridges the gap between session memory and permanent memory.

When a project has Long-Term Memory enabled and Curated Mode off, relevant messages and tool results are automatically routed to the knowledge graph via the enrichment queue. The content is classified, linked to related concepts, and stored permanently. This means knowledge discovered during one workflow run is available to every future run — agents in a session six months later can query the graph and find what an earlier team discovered.

The layers work together as a pipeline:

  1. Tool results are truncated in the current context (Layer 1) but their full content may already be in the KG (Layer 4)
  2. Conversations are compressed into summaries (Layer 2) while the key findings flow into permanent storage (Layer 4)
  3. Task transitions create clean boundaries (Layer 3) while the accumulated knowledge persists beyond the session (Layer 4)

The result: context stays lean for the current session while nothing of value is lost long-term.

Note: Layer 4 only activates when the project's knowledge source is set to Open, that is, with Curated Mode off. In Curated Mode, only user-uploaded documents feed the knowledge graph, and workflow activity is excluded. This is useful for projects like documentation chatbots where the KG should contain only verified content.
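
The gating condition can be expressed as a small predicate. This is a sketch only: the function names and the in-memory `deque` standing in for the enrichment queue are hypothetical, and classification and linking are not modeled.

```python
from collections import deque
from typing import Deque

# Hypothetical in-memory stand-in for the enrichment queue.
enrichment_queue: Deque[str] = deque()


def should_persist(long_term_memory: bool, curated_mode: bool) -> bool:
    """Layer 4 is active only with Long-Term Memory on and Curated Mode off."""
    return long_term_memory and not curated_mode


def route(content: str, relevant: bool,
          long_term_memory: bool, curated_mode: bool) -> bool:
    """Queue relevant content for the knowledge graph; return True if queued."""
    if relevant and should_persist(long_term_memory, curated_mode):
        enrichment_queue.append(content)
        return True
    return False
```

For example, `route("Finding: API rate limit is 100 req/s", True, True, False)` queues the finding, while the same call with `curated_mode=True` leaves the queue unchanged.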

Compaction Presets

Each agent is assigned a compaction preset that controls how aggressively compaction fires. Choose based on how much context the agent needs versus how long its conversations run.

| Preset | Trigger | Recent Kept | Best For |
| --- | --- | --- | --- |
| Aggressive | After 7 messages | 5 most recent | Long iterative stages, cost-sensitive runs |
| Strong | After 12 messages | 10 most recent | Multi-cycle stages with moderate context needs |
| Standard | After 17 messages | 15 most recent | General-purpose (default) |
| Careful | After 27 messages | 25 most recent | Detail-sensitive analysis, complex reasoning |
| None | Never | All | Short tasks where full history matters |

  • Trigger: the number of uncompacted messages that causes compaction to fire. A lower trigger means more frequent compaction, which lowers token costs but requires more summarization.
  • Recent Kept: how many of the most recent messages are always preserved verbatim. These are never summarized.
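
The preset table can be read as a lookup of (trigger, recent kept) pairs. The sketch below is illustrative: the dictionary keys and the function name are hypothetical, and it assumes compaction fires once the uncompacted count reaches the trigger value.

```python
from typing import Dict, Optional, Tuple

# (trigger, recent_kept) per preset, taken from the table; None disables compaction.
PRESETS: Dict[str, Optional[Tuple[int, int]]] = {
    "aggressive": (7, 5),
    "strong": (12, 10),
    "standard": (17, 15),
    "careful": (27, 25),
    "none": None,
}


def messages_to_compact(preset: str, uncompacted: int) -> int:
    """How many of the oldest uncompacted messages would be compacted now."""
    config = PRESETS[preset]
    if config is None:
        return 0  # "None" preset: full history is always kept
    trigger, recent_kept = config
    if uncompacted < trigger:
        return 0  # assumption: fires when the count reaches the trigger
    return uncompacted - recent_kept
```

Under these assumptions, `messages_to_compact("standard", 17)` selects the 2 oldest messages and leaves the 15 most recent verbatim; in every preset the trigger sits two messages above the kept window.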

Choosing a preset

  • Aggressive/Strong — Use for agents in long-running iterative loops (e.g., a coding agent that runs 50+ tool calls per stage). Token costs stay low and the agent maintains a working summary of what happened.
  • Standard — The safe default. Works well for most conversational agents with moderate interaction counts.
  • Careful — Use for agents doing complex analysis where losing earlier details would degrade quality (e.g., a research agent cross-referencing multiple sources).
  • None — Use only for short tasks where you know the conversation will fit in the context window. The agent sees full, unmodified history.

Warning: Setting compaction to None on an agent assigned to long-running stages risks context window overflow. When the window fills, the platform hard-truncates messages from the beginning of the conversation. This is worse than summarized compaction because nothing from the truncated messages is preserved.

Context Window Sizing

The platform reads the agent's resolved LLM model to determine the actual context window size. There are no hardcoded limits — if you assign a model with a 128K context window, compaction calculations use that full capacity. If you override an agent with a smaller model, its compaction thresholds adjust accordingly.

Compaction is configured per agent, not per stage. If the same agent is assigned to multiple stages, it uses the same compaction settings everywhere.
