Lost in the Middle: Why LLMs Ignore What You Put in the Center of Context

The lost-in-the-middle effect causes LLMs to lose 30%+ accuracy when key information sits in the middle of their context. Liu et al. (2024) documented the U-shaped attention curve behind it. Here is what it means for agents, and how context compression mitigates it.

February 27, 2026 · 2 min read

The lost-in-the-middle effect is a well-documented phenomenon where LLMs perform significantly worse when relevant information sits in the middle of their context rather than at the beginning or end. Liu et al. (2024) measured a 30%+ accuracy drop on multi-document question answering when the answer document moved from position 1 to position 10 in a 20-document context. This is not a minor edge case. It affects every production LLM.

  • 30%+ accuracy drop when information is in the middle
  • 18 frontier models tested by Chroma (all affected)
  • U-shaped attention curve across context positions
  • 21.4% accuracy gain from prompt compression

What Is the Lost-in-the-Middle Effect?

When you give an LLM a long input, it does not attend to all parts equally. Tokens at the beginning and end of the context receive disproportionately strong attention. Tokens in the middle receive less. The result: if the answer to your question happens to be located in the middle third of the context, the model is substantially more likely to miss it, hallucinate, or give a wrong answer.

This is the lost-in-the-middle effect, named after the 2024 TACL paper by Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. The paper tested multiple frontier models on two tasks: multi-document question answering and key-value retrieval. Both tasks showed the same pattern. Performance is highest when relevant information is at the beginning or end. It drops significantly when information is in the middle.

The core finding

LLMs do not use long contexts uniformly. They exhibit a U-shaped performance curve: strong at the edges, weak in the middle. This holds across models, tasks, and context lengths.

The Original Research: Liu et al. (2024)

The study was first released as an arXiv preprint in July 2023 (arXiv:2307.03172) and published in Transactions of the Association for Computational Linguistics (TACL) in 2024. The researchers evaluated models on two controlled tasks:

  • Multi-document QA: The model receives 10, 20, or 30 documents, only one of which contains the answer. The researchers varied the position of the relevant document and measured accuracy at each position.
  • Key-value retrieval: The model receives a JSON object with up to 300 key-value pairs and must return the value for a specific key. This synthetic task isolates positional effects from semantic complexity.
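The key-value retrieval setup can be sketched in a few lines. This is an illustrative reconstruction, not the paper's actual harness: the prompt wording and the `make_kv_prompt` helper are assumptions, though the paper did use random UUID keys and values.

```python
import json
import uuid

def make_kv_prompt(n_pairs=300, target_position=150):
    """Build a synthetic key-value retrieval prompt: n_pairs random
    UUID keys/values, with the queried pair placed at a chosen position.
    dict() preserves insertion order, so position is controlled exactly."""
    pairs = [(str(uuid.uuid4()), str(uuid.uuid4())) for _ in range(n_pairs)]
    target_key, target_value = pairs[target_position]
    kv_json = json.dumps(dict(pairs), indent=1)
    prompt = (
        f"JSON data:\n{kv_json}\n\n"
        f"Key: {target_key}\nCorresponding value:"
    )
    return prompt, target_value

# Sweeping target_position from 0 to n_pairs - 1 moves the answer
# across the context; plotting accuracy vs. position traces the U-curve.
prompt, answer = make_kv_prompt(n_pairs=300, target_position=150)
```

Because the task is purely synthetic, any accuracy difference across `target_position` values isolates positional bias from semantic difficulty.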

On the multi-document QA task with 20 documents, models achieved roughly 75% accuracy when the answer was in document 1 (the start) and roughly 72% when it was in document 20 (the end). When the answer was in document 10 (the middle), accuracy dropped to around 55%. That is a 20+ percentage point drop from position alone.

Position              Accuracy   Observation
Position 1 (start)    ~75%       Primacy effect: strong attention to first tokens
Position 5            ~62%       Already declining from the start
Position 10 (middle)  ~55%       Bottom of the U-curve: the blind spot
Position 15           ~63%       Performance recovering toward the end
Position 20 (end)     ~72%       Recency effect: strong attention to last tokens

The key-value retrieval task showed even sharper degradation. Some models that achieved near-perfect accuracy at the start and end positions fell to below 40% in the middle. The effect was not model-specific. It appeared across GPT-3.5 Turbo, GPT-4, Claude 1.3, MPT-30B-Instruct, and LLaMA-2 variants.

The U-Shaped Attention Curve

The pattern Liu et al. documented mirrors a well-known phenomenon in human cognition: the serial position effect. Humans remember items at the beginning of a list (primacy) and the end (recency) better than items in the middle. LLMs exhibit the same bias, but for architectural rather than cognitive reasons.

When you plot accuracy against document position, the curve forms a U shape. High at the edges, low in the middle. This holds regardless of:

  • The number of documents (10, 20, or 30)
  • The specific model tested
  • Whether documents are shuffled randomly
  • The semantic content of the documents

A follow-up study, "Uncovering the Role of Initial Saliency in U-Shaped Attention Bias" (2024), confirmed that the U-shaped pattern persists even after randomly shuffling document order. The bias does not depend on document content. It is a property of how the model processes sequential positions.

Primacy Bias

Models attend strongly to the first tokens in the sequence. Information placed at the beginning of context gets disproportionate weight in the output.

Middle Blind Spot

Tokens in the center of the context receive the lowest attention weights. This is where information gets lost, even when the model's context window is far from full.

Recency Bias

The most recent tokens also receive elevated attention. Models weight the end of context similarly to the beginning, completing the U shape.

Why It Happens: RoPE and Positional Encoding Bias

The technical root cause lies in Rotary Position Embedding (RoPE), the positional encoding method used in most modern transformer-based LLMs including LLaMA, Mistral, Qwen, and their derivatives. RoPE encodes position by rotating query and key vectors in the attention mechanism. The dot product between queries and keys naturally decays for tokens that are far apart in the sequence.

This decay is by design. It helps models prioritize nearby tokens, which is useful for many language tasks. But it has a side effect: tokens in the middle of a long sequence end up in a low-attention zone. They are far from the beginning (where initial saliency is high) and far from the end (where recency effects dominate).

How RoPE creates the attention blind spot

Sequence: [tok_1, tok_2, ..., tok_500, ..., tok_999, tok_1000]

Attention from tok_1000 (the query position):
  → tok_1    : High (initial saliency / primacy)
  → tok_999  : High (close proximity / recency)
  → tok_500  : Low  (far from both ends / middle decay)

Result: Information at tok_500 gets the least attention weight,
regardless of its relevance to the task.
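The decay in the diagram above can be reproduced with a few lines of numpy. This is a minimal sketch of RoPE's distance-decay component only: it takes a single raw dot product with identical query and key content, whereas real models apply RoPE per attention head to learned projections. Note also that the primacy end of the U is attributed to initial-token saliency, not to RoPE itself, and the decay oscillates rather than falling strictly monotonically.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply Rotary Position Embedding: pair dimension i with i + d/2
    and rotate each pair by pos * theta_i, theta_i = base^(-i / (d/2))."""
    half = x.shape[-1] // 2
    theta = base ** (-np.arange(half) / half)   # per-pair frequency
    ang = pos * theta
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * np.cos(ang) - x2 * np.sin(ang),
                           x1 * np.sin(ang) + x2 * np.cos(ang)])

d = 64
q = np.ones(d)   # identical query and key content, so any difference
k = np.ones(d)   # in the scores below is purely positional

def score(query_pos, key_pos):
    """Raw pre-softmax attention logit between two positions."""
    return rope_rotate(q, query_pos) @ rope_rotate(k, key_pos)

# A query at position 1000 attending to keys at increasing distance:
# the logit shrinks as the key moves away, starving distant tokens
# of attention weight once softmax is applied.
for key_pos in (999, 900, 500):
    print(f"key at position {key_pos:4d}: logit = {score(1000, key_pos):6.2f}")
```

Because RoPE's rotation depends only on relative distance, the same decay applies wherever the query sits, which is why the low-attention zone tracks the middle of whatever context the model is given.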

"Found in the Middle" (UW/MIT/Google, ACL 2024) proposed a calibration method to counteract this bias. By adjusting attention weights to compensate for positional bias, they improved retrieval accuracy on middle-positioned information by up to 15 percentage points. But this requires modifying the model's inference behavior, which is not possible through the API for most production models.

Subsequent Research Confirmations

The lost-in-the-middle effect has been confirmed and extended by multiple independent research groups since the original 2023 paper.

Chroma's Context Rot Study (2025)

Chroma Research tested 18 frontier models on context utilization tasks and found that performance degrades non-uniformly as input length increases. They coined the term "context rot" to describe this degradation. Key findings:

  • Accuracy drops of 20-50% as inputs grow from 10K to 100K tokens
  • Claude models decay the slowest but are not immune
  • Adding full conversation history (~113K tokens) can drop accuracy by 30% compared to a focused 300-token version

Context Length Alone Hurts Performance (EMNLP 2025)

Du et al. (2025) showed something even more concerning: context length alone degrades performance, independent of retrieval quality. Even when irrelevant tokens are replaced with whitespace and models are forced to attend only to the relevant tokens, performance still drops by 13.9% to 85% as input length increases. The problem is not just distraction: the sheer volume of tokens interferes with reasoning.

Study                                 Year   Key Finding
Liu et al. (Stanford/TACL)            2024   30%+ accuracy drop when info is in middle positions
Found in the Middle (UW/MIT/Google)   2024   Calibrating attention bias improves middle retrieval by 15%
Chroma Context Rot                    2025   All 18 frontier models degrade with longer context
Du et al. (EMNLP)                     2025   Context length alone hurts, even with perfect retrieval
LongLLMLingua (Microsoft)             2024   Prompt compression improves accuracy by 21.4% at 4x reduction

Impact on Coding Agents

Coding agents are the worst case for the lost-in-the-middle effect. Unlike single-turn QA, coding agents accumulate context over many steps. Each file read, grep result, error message, and dead-end exploration stays in the context window. The relevant information gets pushed to the middle as new tokens are appended.

How agents push relevant info into the blind spot

Turn 1: Read issue description              →    500 tokens  [START]
Turn 2: Search for relevant files            →  3,000 tokens
Turn 3: Read file A (wrong file)             →  2,000 tokens
Turn 4: Read file B (wrong file)             →  2,500 tokens
Turn 5: Read file C (THE RIGHT FILE)         →  1,800 tokens  [MIDDLE]
Turn 6: Read test file for context           →  3,000 tokens
Turn 7: Read config file                     →  1,500 tokens
Turn 8: Agent tries to edit file C           → needs to recall [MIDDLE]
                                                ↑ file C is now buried
                                                  in the attention blind spot

By turn 8, the model has accumulated over 14,000 tokens. The file it needs to edit (file C from turn 5) now sits in the middle of the context. The model attended to it strongly when it was first read, but that attention has been diluted by subsequent tokens. When the model needs to recall the exact function signatures, line numbers, and surrounding code from file C, it is working from a degraded representation.
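The drift can be made concrete with a little arithmetic over the trace above. The turn labels and token counts mirror the illustration, and `context_position` is a hypothetical helper, not part of any agent framework:

```python
# Turn log mirroring the trace above (labels and counts are illustrative).
turns = [
    ("issue description", 500),
    ("file search", 3000),
    ("read file A", 2000),
    ("read file B", 2500),
    ("read file C (target)", 1800),
    ("read test file", 3000),
    ("read config file", 1500),
]

def context_position(turns, label):
    """Return the fractional position (0 = start, 1 = end) of a turn's
    midpoint within the accumulated context."""
    total = sum(t for _, t in turns)
    offset = 0
    for name, tokens in turns:
        if name == label:
            return (offset + tokens / 2) / total
        offset += tokens
    raise ValueError(f"no turn named {label!r}")

pos = context_position(turns, "read file C (target)")
# file C's midpoint lands at ~62% of the 14,300-token context:
# inside the middle third, at the bottom of the U-curve.
print(f"file C sits at {pos:.0%} of the context")
```

Every subsequent turn pushes that fraction further from the high-attention end of the window, so the longer the session runs without pruning, the worse the recall of earlier findings becomes.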

Search Traces Accumulate

Agents spend over 60% of their first turn retrieving context. Every grep result and file read stays in the window, pushing earlier relevant findings into the middle.

Relevant Info Drifts to the Middle

The file the agent actually needs to edit is rarely the last thing it reads. It is usually discovered partway through exploration, placing it squarely in the U-curve blind spot.

This is why agents hallucinate file paths, misremember function signatures, and produce edits that reference code from the wrong file. The model is not incapable of the task. Its attention is focused on the wrong part of the context.

Practical Mitigations

Several strategies can reduce the impact of the lost-in-the-middle effect. They vary in effectiveness and practicality.

1. Strategic Context Ordering

Place the most important information at the beginning and end of the prompt. For RAG systems, this means putting the highest-ranked documents first and last, with lower-ranked documents in the middle. This works with the model's natural bias rather than against it.
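For RAG pipelines, the reordering can be as simple as interleaving ranked documents toward the two edges. A minimal sketch, assuming `docs_ranked` arrives best-first from your retriever or reranker:

```python
def order_for_attention(docs_ranked):
    """Place the highest-ranked documents at the start and end of the
    prompt, pushing the lowest-ranked ones into the middle.
    docs_ranked must be sorted best-first."""
    front, back = [], []
    for i, doc in enumerate(docs_ranked):
        # Alternate: even ranks fill from the front, odd ranks from the back.
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

ranked = ["doc1", "doc2", "doc3", "doc4", "doc5"]
print(order_for_attention(ranked))
# → ['doc1', 'doc3', 'doc5', 'doc4', 'doc2']
```

The best document lands first, the second-best lands last, and the weakest candidates occupy the blind spot where a miss costs the least.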

2. Attention Calibration

Research methods like Found in the Middle and Ms-PoE (Multi-scale Positional Encoding) can recalibrate attention weights to reduce positional bias. These require model modifications and are not available through standard API access.

3. Aggressive Retrieval Filtering

Retrieve generously during the initial stage, then aggressively filter during reranking. Keep only the 3-5 most relevant documents for generation. Fewer documents means less middle region for information to get lost in.
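The two-stage pattern looks like this in outline. The function names and the toy retriever/reranker below are assumptions for illustration, not a real library API:

```python
def retrieve_then_filter(query, retriever, reranker, n_retrieve=50, n_keep=4):
    """Two-stage retrieval: cast a wide net with a cheap retriever,
    then keep only the few documents the stronger reranker scores highest."""
    candidates = retriever(query, n_retrieve)   # cheap, high recall
    scored = sorted(candidates, key=lambda d: reranker(query, d), reverse=True)
    return scored[:n_keep]                      # short context, small middle

# Toy stand-ins: a retriever returning a fixed list, and a reranker
# that counts query-term occurrences.
docs = ["alpha beta", "beta gamma query", "query query delta", "epsilon"]
retriever = lambda q, n: docs[:n]
reranker = lambda q, d: d.split().count(q)
print(retrieve_then_filter("query", retriever, reranker, n_keep=2))
# → ['query query delta', 'beta gamma query']
```

In production the reranker would be a cross-encoder or LLM judge; the structural point is that only the final 3-5 documents ever reach the generation context.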

4. Context Isolation via Subagents

Delegate search tasks to subagents operating in their own context windows. The parent model never sees the search traces. It receives only a condensed summary of results. This eliminates the accumulation pattern that pushes relevant information into the middle.

5. Context Compression

The most broadly applicable mitigation. Compressing context removes low-signal tokens, shrinking the total length and eliminating the large middle region where information gets lost. This works at the API level. No model modifications required.

Strategy               Effectiveness                      Practical Access
Context ordering       Moderate (helps RAG, not agents)   Any API
Attention calibration  High (up to 15% improvement)       Requires model modification
Retrieval filtering    Moderate (limits recall)           Any RAG pipeline
Context isolation      High (prevents accumulation)       Multi-agent architectures
Context compression    High (eliminates the middle)       Any API, any model

Context Compression: Eliminating the Middle

Context compression addresses the lost-in-the-middle effect at its root. If your context is short enough, there is no middle region large enough for information to get lost in. A 2,000-token context does not have a meaningful U-curve problem. A 50,000-token context does.

Microsoft's LongLLMLingua was the first to demonstrate this connection directly. By compressing prompts at a 4x ratio, they improved accuracy by up to 21.4 percentage points on RAG tasks. The improvement came specifically from reducing the middle region where the original uncompressed prompts were losing information.

  • 50-70% token reduction with Morph Compact
  • 3,300+ tokens per second processing speed
  • 98% verbatim accuracy (no rewriting)
  • 21.4% accuracy gain from compression (LongLLMLingua)

Morph Compact takes a deletion-based approach to compression. Instead of rewriting or summarizing context, it identifies low-signal tokens and removes them. Every surviving sentence is word-for-word identical to the original. This means no hallucination risk from the compression step itself, and the compressed output is short enough that the model's attention stays concentrated across the full context.

Why deletion beats rewriting for this problem

Summarization-based compression rewrites the original context, which can introduce errors. For the lost-in-the-middle problem specifically, you want to preserve exact quotes, code snippets, and file paths while removing filler. Deletion-based compaction achieves this without touching the content that survives.
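A toy sketch of the deletion principle (emphatically not Morph Compact's actual algorithm; the overlap scoring here is a crude stand-in for a learned relevance signal): score each sentence against the task, delete the low scorers, and keep every survivor byte-for-byte.

```python
import re

def sentence_words(s):
    """Lowercase word set with common punctuation stripped."""
    return {w.strip(".,()") for w in s.lower().split()}

def compact(text, task, keep_ratio=0.5):
    """Deletion-based compaction: rank sentences by word overlap with
    the task, keep the top fraction in original order, and leave every
    surviving sentence byte-for-byte identical to the original."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    task_words = sentence_words(task)
    scored = [(len(task_words & sentence_words(s)), i, s)
              for i, s in enumerate(sentences)]
    n_keep = max(1, int(len(sentences) * keep_ratio))
    # Pick the top scorers, then restore original document order.
    kept = sorted(sorted(scored, reverse=True)[:n_keep], key=lambda t: t[1])
    return " ".join(s for _, _, s in kept)

text = ("The parser lives in parser.py. The weather was nice. "
        "It exports parse_config(). Lunch was good. "
        "parse_config raises ValueError on bad input.")
print(compact(text, "fix parse_config in parser.py", keep_ratio=0.6))
# drops the two filler sentences, keeps the parser-related ones verbatim
```

Because surviving sentences are never rewritten, file paths, identifiers, and exact quotes pass through untouched: the property that matters most when the downstream consumer is an agent making edits.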

The connection is direct: the lost-in-the-middle effect is a function of context length. Context compression reduces context length, and a short context leaves no meaningful middle region for information to sink into. For coding agents, this translates to fewer hallucinated file paths, more accurate edits, and faster task completion.

Effective context management combines compression with strategic ordering and isolation. But compression is the one technique that works universally, across any model and any API, without requiring architectural changes or multi-agent orchestration.

Frequently Asked Questions

What is the lost-in-the-middle effect in LLMs?

The lost-in-the-middle effect is a demonstrated phenomenon where LLMs perform significantly worse when relevant information is placed in the middle of the input context rather than at the beginning or end. Liu et al. (2024) showed performance drops of over 30% on multi-document QA when the answer document was at position 10 out of 20, compared to position 1 or 20.

What causes the U-shaped attention curve?

The U-shaped attention curve is caused by positional encoding biases in the transformer architecture. Rotary Position Embedding (RoPE), used in most modern LLMs, introduces a decay effect that makes models attend more strongly to tokens at the beginning and end of sequences. This mirrors the primacy and recency effects observed in human cognition, but the cause is architectural, not cognitive.

Have newer models fixed the lost-in-the-middle problem?

Newer models have reduced the severity but not eliminated it. Chroma's 2025 study tested 18 frontier models including GPT-4.1, Claude Opus 4, and Gemini 2.5. All showed performance degradation as input length increased. Claude models decayed the slowest, but no model was immune.

How does the lost-in-the-middle effect impact coding agents?

Coding agents accumulate context over multi-step tasks. Search results, file reads, and dead-end explorations pile up. Relevant information discovered mid-session gets pushed to the middle of the context as new tokens are appended. The model then struggles to recall exact code, function signatures, and file paths when it needs them for edits.

What is the best mitigation for the lost-in-the-middle effect?

Context compression is the most broadly effective mitigation. Shorter context eliminates the middle region where information gets lost. Microsoft's LongLLMLingua improved accuracy by up to 21.4% at 4x compression. Morph Compact achieves 50-70% token reduction with 98% verbatim accuracy, keeping context short enough to avoid the U-curve blind spot entirely.

Does context ordering help with the lost-in-the-middle problem?

Placing important information at the beginning and end of the prompt helps for RAG systems where you control document order. For coding agents, context ordering is less practical because relevant information is discovered during exploration and its position is determined by when it was found, not by design. Compression is more effective because it works regardless of information position.

Keep Context Short. Kill the Blind Spot.

Morph Compact compresses context by 50-70% through deletion, not rewriting. Shorter context means no middle region for information to get lost in. 3,300+ tokens/sec, 98% verbatim accuracy.