Claude Code Auto-Compact: What Triggers It, What It Loses, and How to Fix It

Claude Code auto-compacts when context reaches ~95% of the 200K token window. The agent summarizes everything, loses file paths, error messages, and debugging state, then re-reads files to recover, filling context again. This page covers the exact trigger mechanism, what the 'context left until auto-compact' warning means, how to customize the compact prompt, and how FlashCompact reduces context waste so compaction fires 3-4x less often.

March 13, 2026 · 2 min read

You're mid-debugging. Claude Code has the stack trace, the relevant files, the narrowed hypothesis. Then the "context left until auto-compact" warning appears. Seconds later, the agent summarizes everything, forgets which files it modified, and starts re-reading code it already analyzed. This is auto-compact: Claude Code's built-in mechanism for keeping sessions alive when the 200K token context window fills up. It prevents crashes but destroys working memory in the process.

200K · Token context window
~95% · Auto-compact trigger threshold
60% · Agent time spent searching (Cognition)
3-4x · Less compaction with FlashCompact

What Is Auto-Compact?

Claude Code operates within a 200,000 token context window. Every message you send, every file Claude reads, every grep result, bash output, tool definition, system prompt, and CLAUDE.md file competes for that space. When the window fills up, Claude Code runs auto-compact: it summarizes the entire conversation history into a compressed form and continues from that summary.

The process has three phases:

  1. Tool outputs cleared. Old file reads, grep results, and command outputs are removed or truncated. These are the largest token consumers in any long session, often accounting for 60-80% of total context.
  2. Conversation summarized. The full chat history gets condensed into a structured summary: what was completed, what's in progress, which files were modified, and what the current task is.
  3. CLAUDE.md re-loaded. Your CLAUDE.md files are re-injected fresh from disk. This is the one thing that survives compaction intact, which is why putting critical instructions in CLAUDE.md matters.

The intent is sound: keep sessions running without crashing. The problem is that a 100K+ token conversation compressed to 5K-10K tokens cannot preserve every file path, error message, line number, or architectural decision the agent encountered. The summary captures the gist. The gist is not enough for precise code editing.

Auto-compact is not the same as /compact

The /compact command lets you trigger compaction manually with custom instructions: /compact preserve all file paths, test results, and the current debugging hypothesis. Auto-compact fires based on token count alone, with no awareness of task state. A separate page covers the /compact command and manual compaction strategies.

When Does Auto-Compact Trigger?

Auto-compact triggers when total context usage reaches approximately 95% of the context window. For the standard 200K window, that's roughly 190K tokens total. But that 200K is not all available for your conversation. It's shared with several fixed-cost components:

| Component | Typical tokens | Survives compaction? |
| --- | --- | --- |
| System prompt + built-in tools | ~20,000 | Yes (always present) |
| MCP tool schemas | 900-51,000 | Yes (biggest variable cost) |
| CLAUDE.md files | 300-2,000 | Yes (re-loaded from disk) |
| Auto memory (MEMORY.md) | ~200 lines | Yes (first 200 lines) |
| Completion buffer (reserved) | ~10,000 | Reserved, not usable |
| Conversation + tool outputs | ~100K-140K usable | No (this is what gets summarized) |

A session with a few MCP servers starts with only 100K-120K tokens available for actual conversation. Load four or five MCP servers and you can lose another 30K-50K to tool definitions alone. At that point, ten file reads and a few debugging loops push you past the threshold within 15-20 minutes.
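The budget arithmetic above can be sketched in a few lines of Python. The overhead figures are the typical values from the table, not exact measurements, and the 95% trigger is approximate:

```python
# Rough context-budget sketch using the typical overhead figures above.
# All numbers are illustrative estimates, not measured values.

WINDOW = 200_000          # total context window (tokens)
TRIGGER = 0.95            # auto-compact fires near 95% usage

def usable_tokens(mcp_schema_tokens: int,
                  system_tokens: int = 20_000,
                  claude_md_tokens: int = 1_500,
                  completion_buffer: int = 10_000) -> int:
    """Tokens left for conversation + tool outputs before auto-compact."""
    threshold = int(WINDOW * TRIGGER)       # ~190K total before compaction
    overhead = (system_tokens + mcp_schema_tokens
                + claude_md_tokens + completion_buffer)
    return threshold - overhead

# No MCP servers: most of the window is yours.
print(usable_tokens(mcp_schema_tokens=0))       # ~158K conversational tokens
# Four or five heavy MCP servers can cost 30K-50K in schemas alone.
print(usable_tokens(mcp_schema_tokens=40_000))  # ~118K conversational tokens
```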

~95% · Trigger threshold (% of window)
100-140K · Usable tokens after overhead
15-30 min · Typical time to first compact

The threshold has shifted over time. Early versions of Claude Code triggered at a lower percentage. The current threshold sits near 95%, leaving a small buffer for the model to finish its current response before compaction kicks in. You can monitor your exact position using /context, which displays a visual grid of what's consuming space, or by configuring a status line that continuously shows context percentage.

Check your context budget with /context

> /context

Context window usage:
████████████████████████████████░░░░░░░░  82%

System prompt + tools:  ████████  20,340 tokens
MCP tool schemas:       ██████    15,200 tokens
CLAUDE.md files:        █         1,420 tokens
Conversation + outputs: █████████████████  109,800 tokens
Free space:             ░░░░░░░░  33,240 tokens

⚠ Context approaching auto-compact threshold

What Gets Lost During Compaction

Auto-compact summarizes the conversation to free space. The summary captures high-level progress. It loses precision. Here is what survives and what does not:

| Information | After compaction | Impact |
| --- | --- | --- |
| CLAUDE.md instructions | Fully preserved (re-loaded from disk) | None. This is why CLAUDE.md matters. |
| File paths modified | Summarized: 'modified auth middleware' | Agent doesn't know which file or where |
| Line numbers | Lost entirely | Agent re-reads entire files to find locations |
| Error messages / stack traces | Summarized or dropped | Agent may re-run commands to get them back |
| Debugging hypotheses | 'Investigated auth bug' (no specifics) | Agent restarts debugging from scratch |
| Task state / TODO list | Partially preserved | Items may be missed or repeated |
| Architecture decisions | 'Decided to use approach A' (reasoning lost) | Agent may choose conflicting approach B |
| Test results | 'Tests passed' (no specifics) | Agent re-runs tests to verify |

The pattern across community reports on GitHub Issues and forums is consistent. Users see four categories of post-compaction degradation:

Forgotten file edits

The agent modified three files to implement a feature. After compaction, it doesn't remember which files it changed or what it changed in them. It re-edits the same file with different logic, creating conflicts.

Repeated completed work

The agent already fixed a bug, ran the tests, confirmed the fix. After compaction, it diagnoses the same bug from scratch and applies a different fix, sometimes reverting the original.

Lost debugging state

Mid-debug, the agent has narrowed the issue to a specific function and input. After compaction, the stack trace, error message, and narrowed hypothesis are gone. It starts over from the top.

Contradictory changes

The agent decided to use approach A based on analysis. After compaction, it picks approach B because the reasoning for A was lost. The codebase now has half of A and half of B.

Compaction compounds on itself

Each auto-compact cycle makes the next one more likely. The agent re-reads files to recover lost context, re-runs commands to verify state, and generates more conversation tokens in the process. This creates a feedback loop where sessions that compact once tend to compact 3-5 more times in rapid succession. Each summary gets further from the original details. By the third or fourth compaction, the agent is working from a summary of a summary of a summary.

The Re-Reading Loop: Why Compaction Gets Worse Over Time

This is the mechanism that makes auto-compact feel so frustrating in practice. It is not a single event but a self-reinforcing cycle:

  1. Context fills up. The agent reads files, runs searches, executes commands. Each operation adds thousands of tokens.
  2. Auto-compact fires. Conversation is summarized. Tool outputs cleared. The agent has a 5K-10K token summary of a 100K+ token session.
  3. Agent needs details the summary lost. The summary says "modified auth.ts" but not which function, which lines, or what the change was.
  4. Agent re-reads the files. It opens auth.ts (2K-5K tokens), re-traces the logic, re-discovers the function it already modified.
  5. Context fills up again. The re-reading dumps the same file contents back into context, consuming the space compaction just freed.
  6. Auto-compact fires again. Another summary, even more compressed. The cycle repeats.
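A toy simulation makes the acceleration concrete. All token rates here are invented for illustration; only the shape of the result matters: the recovery re-reads burn tokens faster than the original work did, so each compaction arrives sooner than the last.

```python
# Toy simulation of the re-reading loop. Token rates are made up for
# illustration; the point is the shrinking interval between compactions.

USABLE = 120_000        # tokens available for conversation (post-overhead)
SUMMARY = 8_000         # size of a post-compaction summary

def minutes_until_compact(tokens_per_minute: int, start: int = 0) -> float:
    """Minutes of work before the usable budget is exhausted."""
    return (USABLE - start) / tokens_per_minute

# First stretch: normal work at, say, ~6K tokens/minute of reads and outputs.
first = minutes_until_compact(6_000)
# After compaction: the agent re-reads files to recover lost details, so it
# burns tokens faster (say 9K/minute) and starts from the summary, not zero.
second = minutes_until_compact(9_000, start=SUMMARY)

print(f"first compaction after  {first:.0f} min")   # 20 min
print(f"second compaction after {second:.0f} min")  # 12 min
assert second < first   # each cycle arrives sooner than the last
```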

Cognition (the team behind Devin) measured this directly: coding agents spend 60% of their time searching for code. Most of that search is reading files to find specific functions or patterns. After compaction, the agent repeats all that searching. The time between compaction events gets shorter with each cycle because the agent consumes tokens faster trying to recover what it lost.

This is the core insight behind context rot: the progressive degradation of useful information in the context window. Auto-compact is the most visible symptom, but the rot starts earlier, as context fills with full-file reads and verbose command outputs that displace earlier, more relevant information.

The "Context Left Until Auto-Compact" Warning

When context usage passes approximately 80%, Claude Code starts showing a warning: "context left until auto-compact." This is your signal that compaction is approaching. What you do in the next few minutes determines whether you lose critical state.

What to do when you see the warning

Option 1: Manual compact now

Run /compact with specific preservation instructions before auto-compact fires. Example: /compact preserve the file paths I modified, current test failures, and the auth refactoring plan. This produces a much better summary than the automatic one.

Option 2: Finish and clear

If you're near a logical stopping point, finish the current subtask, commit your changes, and run /clear to start a fresh session. The new session gets the full ~140K usable tokens instead of a post-compaction summary.

Option 3: Summarize from checkpoint

Press Esc twice to open the rewind menu. Select 'Summarize from here' on an earlier message. This compresses only the later part of the conversation, keeping your initial instructions and context intact.

Option 4: Delegate remaining work

Spawn a subagent for the remaining task. Subagents get their own fresh 200K context window. The verbose work happens in the subagent's context; only a clean summary returns to your main session.

Configure a status line to monitor context continuously

Instead of waiting for the warning, configure a status line to always show context percentage. Run /statusline show context percentage with a progress bar and Claude Code will set up a persistent display. The /context command shows a detailed one-time breakdown. The context_window.used_percentage field is available for custom status line scripts.
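For a fully custom status line, a minimal Python sketch might look like the following. It assumes the JSON payload Claude Code pipes to status line scripts exposes the context_window.used_percentage field mentioned above; the rest of the payload shape is a guess.

```python
# Hypothetical custom status line renderer. Assumes the script receives a
# JSON payload on stdin with a context_window.used_percentage field (per
# the docs above); the exact payload shape is an assumption.
import json
import sys

def render(payload: dict) -> str:
    pct = int(payload.get("context_window", {}).get("used_percentage", 0))
    filled = pct // 10                       # 10-segment progress bar
    bar = "█" * filled + "░" * (10 - filled)
    warn = " ⚠ approaching auto-compact" if pct >= 80 else ""
    return f"ctx {bar} {pct}%{warn}"

def main() -> None:
    # When wired up as a status line command, Claude Code would pipe the
    # JSON payload on stdin.
    print(render(json.load(sys.stdin)))

# Demo with a hand-written payload rather than live stdin:
sample = {"context_window": {"used_percentage": 82}}
print(render(sample))   # ctx ████████░░ 82% ⚠ approaching auto-compact
```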

How to Customize the Compact Prompt

You cannot change when auto-compact fires, but you can influence how it summarizes. Two mechanisms let you guide compaction quality:

1. Add compact instructions to CLAUDE.md

Add a section to your project's CLAUDE.md specifying what should be preserved during any compaction (auto or manual). These instructions survive compaction because CLAUDE.md is re-loaded from disk after every compact cycle.

CLAUDE.md compact instructions

# Compact Instructions

When compacting, always preserve:
- All modified file paths with line numbers
- Current test results (pass/fail with file names)
- The active task plan and remaining TODO items
- Error messages and stack traces from the current debug session
- Architecture decisions with their reasoning

2. Use /compact with custom focus

Running /compact with a description tells Claude what to prioritize in the summary:

Manual compact with preservation instructions

> /compact Focus on code samples and API usage

> /compact Preserve all file paths I modified,
  the current test failures in auth.test.ts,
  and the rate limiting implementation plan

> /compact Keep the debugging session state:
  the error in processPayment(),
  the stack trace, and the hypothesis
  about the null reference on line 247

The difference between auto-compact (no instructions) and manual compact (with instructions) is substantial. Auto-compact produces generic summaries like "modified several auth-related files." Manual compact with good instructions preserves "modified src/middleware/auth.ts lines 45-67 (added rate limiter), src/routes/login.ts lines 12-30 (updated session handling)."

Manual /compact vs Auto-Compact

| Dimension | Manual /compact | Auto-compact |
| --- | --- | --- |
| When it fires | When you run /compact | Automatically at ~95% context usage |
| Custom instructions | Yes: /compact preserve X, Y, Z | No: uses generic summarization |
| Task awareness | You choose the logical breakpoint | Fires based on token count, ignoring task state |
| Summary quality | High: guided by your instructions | Variable: depends on conversation structure |
| Can be disabled | N/A (it's manual) | No, cannot be disabled |
| Summarize from checkpoint | Esc+Esc lets you summarize from any point | Summarizes entire conversation |
| CLAUDE.md compact instructions | Respected | Respected (if present in CLAUDE.md) |

The best strategy is to compact manually before auto-compact fires. Watch the context percentage (via /context or your status line) and run /compact at logical breakpoints: after finishing a feature, after fixing a bug, after completing a research phase. The summary will be better because you can tell Claude what matters, and because the conversation has a clean structure to summarize.

Alternatively, use the rewind menu (Esc+Esc) and select "Summarize from here" on an earlier message. This keeps your initial instructions intact and only compresses the later portion of the conversation, which is useful when the early context contains important setup instructions you want preserved verbatim.

How Other Coding Tools Handle Compaction

Every coding agent that runs long sessions faces the same context pressure. The approaches differ, but the tradeoff is universal: compress to continue, lose detail in the process.

| Tool | Approach | Tradeoff |
| --- | --- | --- |
| Claude Code | LLM summarization at ~95% capacity | Lossy: file paths and debugging state compressed away |
| OpenAI Codex | Server-side compaction after every turn | Aggressive: compresses continuously, not just at threshold |
| Cursor | Truncates old conversation history | Simpler but loses early context completely |
| Devin (Cognition) | Multi-model with external memory | Complex: separate search and planning models manage state |
| Windsurf | Cascade context management with checkpoints | Proprietary: limited user control over compaction |

Claude Code's approach (LLM summarization) is more sophisticated than truncation but introduces its own risks: hallucinated details in summaries, over-compression of important state, and the re-reading loop described above. The compaction vs summarization tradeoff applies to all of these tools. For a deep comparison of every compaction method including LLMLingua token pruning, observation masking, and context distillation, see the FlashCompact technical comparison.

How to Delay Auto-Compact

You cannot disable auto-compact. But you can significantly extend your time before it fires by reducing how many tokens each operation consumes.

1. Put persistent instructions in CLAUDE.md

CLAUDE.md files load at the start of every session and survive every compaction cycle. Your coding conventions, project structure, key file paths, and workflow rules belong here. After compaction, the agent does not need to re-discover these instructions because they're re-loaded automatically from disk. Keep it under 200 lines for best adherence. See our CLAUDE.md guide for structuring effective instruction files.

2. Break work into focused sessions

One feature per session. When you finish implementing a feature, commit your changes and run /clear to start fresh. The context from task A is pure noise for task B. Starting clean gives you the full ~140K usable tokens instead of a post-compaction summary polluted with stale state from a completed task.

3. Compact manually at logical breakpoints

After finishing a feature or fixing a bug, run /compact with specific instructions before the automatic trigger fires. Manual compaction at clean breakpoints produces summaries that are significantly more useful than auto-compact summaries triggered mid-task.

4. Delegate verbose operations to subagents

Each subagent gets its own isolated 200K context window. Delegate tasks that produce large outputs (running full test suites, searching large codebases, processing log files) to subagents. Only the relevant result returns to your main session. Three parallel subagents give you an effective 600K tokens of working memory without polluting the main conversation. See Anthropic's documentation on subagent isolation.

5. Reduce MCP server overhead

Each MCP server adds tool definitions to your context, even when idle. Run /mcp to see per-server token costs and disable servers you're not actively using. When MCP tool descriptions exceed 10% of your context window, Claude Code automatically defers them via tool search. You can lower the threshold with ENABLE_TOOL_SEARCH=auto:5 to trigger deferral at 5%.

6. Use CLI tools instead of MCP when possible

Tools like gh, aws, gcloud, and sentry-cli are more context-efficient than their MCP equivalents because they don't add persistent tool definitions. Claude can run CLI commands directly without the overhead of maintaining tool schemas in context.

7. Write specific prompts

Vague requests like "improve this codebase" trigger broad scanning that dumps dozens of files into context. Specific requests like "add input validation to the login function in src/auth/login.ts" let Claude work with minimal file reads.

FlashCompact: Preventing the Need for Compaction

Every strategy above works around compaction. FlashCompact addresses the root cause: the context waste that makes compaction necessary in the first place.

The insight is simple. If agents spend 60% of their time searching (dumping entire files into context to find 10-line functions), and file rewrites echo the full file back into context even for 3-line changes, then most context tokens are wasted on content the agent does not need. Reduce the waste and compaction fires less often. When it does fire, the summary is working with higher-signal context, so it produces better results.

FlashCompact has three components:

WarpGrep (search)

Semantic code search that returns only relevant snippets with file paths and line numbers. One WarpGrep call replaces 5-10 sequential file reads. 0.73 F1 score. 8 parallel tool calls per turn.

Fast Apply (edit)

Compact diffs instead of full file rewrites. A 3-line edit in a 200-line file produces ~20 tokens of diff instead of ~2,000 tokens of full content. 10,500 tok/s throughput.

Morph Compact (clean)

Verbatim deletion of noise from context. Unlike summarization (lossy, slow, hallucinates), verbatim deletion is exact. 3,300+ tok/s. Removes what WarpGrep and Fast Apply didn't prevent.
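The rewrite-vs-diff gap Fast Apply exploits is easy to reproduce with Python's standard difflib. This is a rough sketch, not Fast Apply's actual diff format, and the ~4 characters/token ratio is only a common approximation:

```python
# Compare the size of a full-file rewrite vs. a unified diff for a
# 3-line change in a 200-line file. Token counts use the rough
# ~4 chars/token heuristic; Fast Apply's real diff format may differ.
import difflib

old = [f"line {i}: original content\n" for i in range(200)]
new = list(old)
for i in (100, 101, 102):                  # a 3-line edit
    new[i] = f"line {i}: patched content\n"

diff = "".join(difflib.unified_diff(old, new, "a/file.py", "b/file.py"))

full_tokens = sum(len(s) for s in new) // 4   # echoing the whole file back
diff_tokens = len(diff) // 4                  # sending only the hunk

print(f"full rewrite: ~{full_tokens} tokens")
print(f"unified diff: ~{diff_tokens} tokens")
```

Even with difflib's three lines of surrounding context per hunk, the diff is a small fraction of the full rewrite; a purpose-built compact diff format shrinks it further.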

| Operation | Default (entire files) | With FlashCompact |
| --- | --- | --- |
| Finding a function across codebase | Read 5-8 files: 10K-40K tokens | WarpGrep semantic search: 500-2K tokens |
| 3-line edit in 200-line file | Full rewrite: ~2,000 tokens | Fast Apply diff: ~20 tokens |
| Tracing a dependency chain | Grep + read each file: 5K-15K tokens | Scoped search: 1K-3K tokens |
| Cleaning verbose command output | Stays in context until compaction | Morph Compact removes noise: exact deletion |
| 30-minute session token budget | Compacts 2-3 times | Compacts 0-1 times |
0.73 · WarpGrep F1 score
10,500 · Fast Apply tok/s
3,300+ · Morph Compact tok/s
3-4x · Fewer compaction events

The effect on the re-reading loop is direct. If searches consume 3K tokens instead of 20K, you can perform 6x more searches before hitting the compaction threshold. If edits consume 20 tokens instead of 2K, you can make 100x more edits. A 30-minute session that would have compacted 3 times with default tooling may not compact at all with FlashCompact. And when it does compact, the context contains higher-signal tokens, so the summary retains more useful detail.

FlashCompact is state-of-the-art on SWE-Bench Pro. The combination of targeted search, compact edits, and verbatim noise removal produces better results than any single compaction method applied after the fact.

Prevention beats compression

Every compaction method (LLM summarization, token pruning, observation masking, context distillation) is reactive: it compresses after context fills. FlashCompact is proactive: it reduces context waste at the source so the compression step is needed less often and works better when it is needed. For a full comparison of all eight compaction methods, see the FlashCompact technical comparison.

Frequently Asked Questions

What does "context left until auto-compact" mean?

It means Claude Code's 200K token context window is filling up. When context usage reaches approximately 95%, Claude Code will automatically summarize the conversation to free tokens. The warning tells you the agent is about to lose detailed memory of file reads, tool outputs, debugging steps, and earlier conversation. Run /compact manually with preservation instructions before it fires, or use /clear if you're at a stopping point.

Can I disable auto-compact in Claude Code?

No. Auto-compact cannot be disabled. It is a safety mechanism that prevents the context window from overflowing and crashing the session. You can delay it by reducing token waste (targeted file reads, compact diffs, subagents for verbose tasks) and by running /compact manually at logical breakpoints. But the automatic trigger cannot be turned off.

What is the auto-compact threshold?

Approximately 95% of the context window. For the standard 200K window, that is roughly 190K total tokens including system prompt, tool definitions, CLAUDE.md, and conversation. The usable conversation space is typically 100K-140K tokens after system overhead, depending on how many MCP servers you have loaded.

Does auto-compact delete my code changes?

No. Auto-compact only compresses the conversation history in memory. Files on disk, git commits, and all code changes are preserved. The risk is that the agent forgets what it changed and may make contradictory edits because the summary lost those details. Your code is safe; the agent's awareness of your code is not.

How do I know when auto-compact is about to fire?

Run /context for a one-time breakdown. Configure a status line with /statusline show context percentage for continuous monitoring. The context_window.used_percentage field is available for custom status line scripts. Context above 80% means compaction is approaching.

Can I customize what auto-compact preserves?

Yes. Add a "Compact Instructions" section to your CLAUDE.md file. These instructions are re-loaded from disk after every compaction, so they apply to both manual and automatic compact. Specify what matters: "preserve all modified file paths with line numbers, test results, and the current task plan."

Why does Claude Code repeat work after compaction?

The summary replaces detailed state with general descriptions. "Modified auth middleware" does not tell the agent which file, which function, or what it changed. The agent re-reads files, re-runs searches, and re-traces logic it already explored, consuming the tokens that compaction just freed. This is the re-reading loop.

What is micro compact?

Micro compact refers to targeted compaction of specific sections of the conversation rather than summarizing everything. The rewind menu (Esc+Esc, then "Summarize from here") is the closest built-in feature: it compresses messages after a selected point while keeping earlier context intact. This gives you more control than full auto-compact, which summarizes the entire conversation.

Related Resources

Stop Wasting Context Tokens

FlashCompact reduces context waste from both reads and writes. WarpGrep replaces 5-10 file reads with one semantic search. Fast Apply uses compact diffs instead of full rewrites. Auto-compact fires 3-4x less often. No configuration changes needed.