Context Compaction

In a multi-agent system, every agent maintains its own context window — and that context grows with each message exchanged between teammates. A team of five agents discussing a problem generates five times the context pressure of a single agent working alone. Without management, conversations quickly exceed the LLM's token limit and agents lose access to earlier information.

Context compaction solves this with a four-layer architecture that progressively compresses conversation history while preserving essential facts, decisions, and discoveries. It runs automatically: agents do not have to trigger or manage it.

The net result is long-running conversations without catastrophic forgetting. By compressing relevant content and dropping noise, the system effectively triples or quadruples the usable context window: a model with a 1M-token context window can hold the dense, actionable information equivalent of 3–4M tokens of uncompressed conversation. With Layer 4 (Knowledge Graph persistence) enabled, relevant content is also stored permanently, so it survives across sessions.

Why Per-Agent Context Matters

Each agent in a team sees a different conversation. When Agent A sends a private message to Agent B, only those two agents see it. Agent C sees the team-wide messages but not the private exchange. This means every agent has its own unique view of the conversation history.

Compaction respects this. Each agent's context is managed independently — compacted summaries are tailored to what that specific agent actually saw. This is especially important for side conversations that break out of the main round table: two agents can have a detailed private discussion, and only their context windows absorb the cost.
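The per-agent view described above can be sketched as a simple visibility filter. This is an illustrative model only: `TeamMessage`, `view_for`, and the convention that a missing recipient means "team-wide" are assumptions made for the example, not the platform's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TeamMessage:
    sender: str
    text: str
    recipient: Optional[str] = None  # None = team-wide message (assumed convention)


def view_for(agent: str, messages: List[TeamMessage]) -> List[TeamMessage]:
    """Return only the messages this agent actually saw."""
    return [
        m for m in messages
        if m.recipient is None or agent in (m.sender, m.recipient)
    ]


history = [
    TeamMessage("A", "Kickoff: let's split the work."),
    TeamMessage("A", "Can you check the auth logs?", recipient="B"),
    TeamMessage("B", "Found two failed logins.", recipient="A"),
    TeamMessage("C", "I'll draft the report."),
]

# Agent A sees the private exchange; Agent C sees only team-wide messages.
print(len(view_for("A", history)), len(view_for("C", history)))  # prints: 4 2
```

Because each agent's view is a different list, any summary built from it is naturally specific to that agent.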

The Four Layers

Each layer triggers at a different threshold and uses a different compression strategy. They work together — earlier layers reduce pressure so later layers fire less often.

Layer	What It Does	Method	AI-Powered
1. Tool Result Truncation	Shrinks stale tool outputs	Age-based token limits	No
2. Conversation Compaction	Compresses older messages into dense summaries	LLM-powered summarization	Yes
3. Task-Transition Compaction	Creates clean boundaries between workflow tasks	LLM-powered task summaries	Yes
4. Knowledge Graph Persistence	Routes relevant content to permanent memory	Enrichment queue + classification	Yes

Layer 1: Tool Result Truncation

Tool calls often return large payloads (API responses, search results, file contents) that are critical when fresh but become stale quickly. This layer automatically truncates old tool results based on how many messages ago they occurred:

Age	Truncation Limit
Recent (11–20 messages ago)	2,000 tokens
Medium (21–40 messages ago)	1,000 tokens
Old (41+ messages ago)	500 tokens

Truncated results include a retention notice so the agent knows information was compressed. This layer is not AI-powered — it uses simple token counting and adds no cost.
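The age thresholds in the table can be expressed as a small lookup. The sketch below is hypothetical: the function names are invented, whitespace-separated words stand in for real tokenizer output, the notice text is made up, and it assumes results 10 or fewer messages old are left untouched (the table does not list a limit for them).

```python
from typing import Optional


def truncation_limit(age: int) -> Optional[int]:
    """Token limit for a tool result that is `age` messages old."""
    if age >= 41:
        return 500
    if age >= 21:
        return 1000
    if age >= 11:
        return 2000
    return None  # assumption: fresher results are kept in full


def truncate_tool_result(text: str, age: int) -> str:
    """Trim an old tool result and append a retention notice."""
    limit = truncation_limit(age)
    tokens = text.split()  # stand-in for a real tokenizer
    if limit is None or len(tokens) <= limit:
        return text
    kept = " ".join(tokens[:limit])
    return f"{kept}\n[truncated: kept {limit} of {len(tokens)} tokens]"
```

No model call is involved, which is why this layer adds no cost.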

Layer 2: Conversation Compaction

This is the core layer. When the number of uncompacted messages exceeds the agent's compaction trigger, the platform decides which messages to compress and which to drop — guided by a relevance score from the message analyzer.

Each message in the conversation carries a relevance assessment. When compaction fires, the platform separates messages into two groups:

Relevant messages — compressed into a dense summary by an LLM, preserving their substance
Irrelevant messages — dropped entirely, freeing context without the cost of summarizing noise

This means compaction is not just "summarize the oldest messages." It is a selective process that keeps what matters and discards what doesn't — routine acknowledgments, redundant status updates, and low-signal chatter are eliminated rather than compressed.

The compression of relevant messages preserves:

Facts and decisions — what was concluded, not how the discussion went
Agent attribution — who said or discovered what
Specific details — numbers, names, references, and tool results
Narrative continuity — each summary builds on previous ones without repetition

Typical compression achieves 50–70% token reduction while retaining all actionable information. The agent continues working with full awareness of what happened — it just reads a summary instead of the raw conversation.
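A minimal sketch of the selective process described in this section follows. It assumes a boolean `relevant` flag in place of the message analyzer's relevance score, and takes the summarizer as a plain callable; `fake_summarize` is a stand-in for the LLM call. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    author: str
    text: str
    relevant: bool = True  # stand-in for the analyzer's relevance assessment


def compact(
    messages: List[Message],
    trigger: int,
    keep_recent: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Summarize relevant older messages, drop irrelevant ones, keep recent ones."""
    if len(messages) <= trigger:
        return list(messages)  # trigger not exceeded: nothing to do
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    relevant = [m for m in older if m.relevant]
    if not relevant:
        return list(recent)  # everything older was noise
    summary = Message("summary", summarize(relevant))
    return [summary] + list(recent)


def fake_summarize(msgs: List[Message]) -> str:
    """Placeholder for the LLM call; keeps attribution and details."""
    return "; ".join(f"{m.author}: {m.text}" for m in msgs)
```

Note that irrelevant messages never reach the summarizer, so no tokens are spent compressing them.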

Layer 3: Task-Transition Compaction

When a workflow task completes and the next one begins, this layer creates a clean boundary. Instead of carrying raw conversation history from the previous task, each agent starts the new task with a focused summary of what was accomplished.

This prevents cross-task context bleeding — a common problem where agents see completion messages from earlier tasks and prematurely conclude the current task is also done.

Two summary modes are available:

Mode	How It Works	Cost
Individual (default)	Each agent gets a personalized summary based on their view	One LLM call per agent
Shared	One agent generates a summary, shared with all others	One LLM call total
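The cost difference between the two modes can be illustrated as follows. This is a sketch under assumptions: `task_transition_summaries` is a hypothetical name, the summarizer is passed in as a callable, and in Shared mode the first agent in the mapping is arbitrarily chosen to write the summary (the source does not say which agent does this).

```python
from typing import Callable, Dict, List


def task_transition_summaries(
    views: Dict[str, List[str]],
    mode: str,
    summarize: Callable[[List[str]], str],
) -> Dict[str, str]:
    """Build the summary each agent starts the next task with."""
    if not views:
        return {}
    if mode == "individual":
        # One summarizer call per agent, each based on that agent's own view.
        return {agent: summarize(view) for agent, view in views.items()}
    if mode == "shared":
        # One summarizer call in total; every agent receives the same text.
        author = next(iter(views))  # assumption: first agent writes the summary
        shared = summarize(views[author])
        return {agent: shared for agent in views}
    raise ValueError(f"unknown summary mode: {mode!r}")
```

Shared mode trades personalization for cost: agents that saw private exchanges receive a summary that may not reflect them.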

Layer 4: Knowledge Graph Persistence

The first three layers manage context within a single session — when the session ends, compacted summaries are gone. Layer 4 bridges the gap between session memory and permanent memory.

When a project has Long-Term Memory enabled and Curated Mode off, relevant messages and tool results are automatically routed to the knowledge graph via the enrichment queue. The content is classified, linked to related concepts, and stored permanently. This means knowledge discovered during one workflow run is available to every future run — agents in a session six months later can query the graph and find what an earlier team discovered.
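The routing condition can be read as a simple gate in front of the enrichment queue. The sketch below is hypothetical: `ProjectSettings`, `maybe_enqueue`, and the in-memory `deque` are illustrative stand-ins, and classification and linking are left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class ProjectSettings:
    long_term_memory: bool
    curated_mode: bool


def maybe_enqueue(
    settings: ProjectSettings,
    content: str,
    is_relevant: bool,
    queue: Deque[str],
) -> bool:
    """Queue content for the knowledge graph only when Layer 4 applies."""
    if settings.long_term_memory and not settings.curated_mode and is_relevant:
        queue.append(content)
        return True
    return False
```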

The layers work together as a pipeline:

Tool results are truncated in the current context (Layer 1) but their full content may already be in the KG (Layer 4)
Conversations are compressed into summaries (Layer 2) while the key findings flow into permanent storage (Layer 4)
Task transitions create clean boundaries (Layer 3) while the accumulated knowledge persists beyond the session (Layer 4)

The result: context stays lean for the current session while nothing of value is lost long-term.

note

Layer 4 only activates when the project's knowledge source is set to Open. In Curated Mode, only user-uploaded documents feed the knowledge graph — workflow activity is excluded. This is useful for projects like documentation chatbots where the KG should contain only verified content.

Compaction Presets

Each agent is assigned a compaction preset that controls how aggressively compaction fires. Choose based on how much context the agent needs versus how long its conversations run.

Preset	Trigger	Recent Kept	Best For
Aggressive	After 7 messages	5 most recent	Long iterative stages, cost-sensitive runs
Strong	After 12 messages	10 most recent	Multi-cycle stages with moderate context needs
Standard	After 17 messages	15 most recent	General-purpose (default)
Careful	After 27 messages	25 most recent	Detail-sensitive analysis, complex reasoning
None	Never	All	Short tasks where full history matters

Trigger: how many uncompacted messages accumulate before compaction fires. A lower trigger means more frequent compaction, which lowers token costs but requires more summarization.
Recent Kept: how many of the most recent messages are always preserved verbatim. These are never summarized.
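The preset table maps directly onto a lookup of (trigger, recent kept) pairs. The sketch below is illustrative: the dictionary keys and `messages_to_compact` are invented names, and it assumes "After N messages" means compaction fires once the uncompacted count exceeds N.

```python
from typing import Dict, Optional, Tuple

# (trigger, recent_kept) per preset, taken from the table above.
PRESETS: Dict[str, Optional[Tuple[int, int]]] = {
    "aggressive": (7, 5),
    "strong": (12, 10),
    "standard": (17, 15),
    "careful": (27, 25),
    "none": None,  # never compacts
}


def messages_to_compact(preset: str, uncompacted: int) -> int:
    """How many older messages are eligible for compaction right now."""
    settings = PRESETS[preset]
    if settings is None:
        return 0
    trigger, recent_kept = settings
    if uncompacted <= trigger:
        return 0  # trigger not exceeded yet
    return uncompacted - recent_kept


print(messages_to_compact("standard", 18))  # prints: 3
```

Under this reading, every preset hands at least three messages to compaction each time it fires, because each trigger sits two above its recent-kept count.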

Choosing a preset

Aggressive/Strong — Use for agents in long-running iterative loops (e.g., a coding agent that runs 50+ tool calls per stage). Token costs stay low and the agent maintains a working summary of what happened.
Standard — The safe default. Works well for most conversational agents with moderate interaction counts.
Careful — Use for agents doing complex analysis where losing earlier details would degrade quality (e.g., a research agent cross-referencing multiple sources).
None — Use only for short tasks where you know the conversation will fit in the context window. The agent sees full, unmodified history.

warning

Setting compaction to None on an agent assigned to long-running stages risks context window overflow. When the window fills, the platform hard-truncates messages from the beginning — which is worse than summarized compaction because no information is preserved at all.

Context Window Sizing

The platform reads the agent's resolved LLM model to determine the actual context window size. There are no hardcoded limits — if you assign a model with a 128K context window, compaction calculations use that full capacity. If you override an agent with a smaller model, its compaction thresholds adjust accordingly.

Compaction is configured per agent, not per stage. If the same agent is assigned to multiple stages, it uses the same compaction settings everywhere.

How-to Guides

Configure an Agent — includes compaction preset selection
Communication Flow — how agents talk and why per-agent context matters
