long-context
2 articles
Dan McAteer's verdict: Opus 4.6 has no real competition at 1 million tokens
Dan McAteer shares his long-context observations: Opus 4.6 performs best at 1 million tokens with 78% accuracy, Sonnet 4.6 is the closest competitor, and GPT-5.4 actually regressed compared to GPT-5.2 at long context.
MIT Research: Making LLMs Recursively Call Themselves to Handle 10M+ Tokens
When you stuff too much into a context window, models get dumber — that's context rot. MIT proposes Recursive Language Models (RLMs), letting LLMs recursively call themselves in a Python REPL to handle massive inputs. GPT-5-mini + RLM beats vanilla GPT-5 on hard tasks, and it's cheaper too.
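The recursive decomposition behind RLMs can be sketched in a few lines. This is a hypothetical illustration, not MIT's implementation: `call_model` is a stub standing in for a real LLM API call, and the character-based budget and halving strategy are simplifying assumptions.

```python
# Hypothetical sketch of the RLM idea: if the input exceeds the model's
# context budget, split it, recursively reduce each chunk, and answer
# over the combined partial results. `call_model` is a stand-in for a
# real LLM call (assumption; not the paper's actual REPL-based method).

CONTEXT_BUDGET = 1000  # max characters the stub "model" accepts (illustrative)


def call_model(prompt: str, text: str) -> str:
    """Placeholder for an LLM call; here it just truncates as a fake summary."""
    assert len(text) <= CONTEXT_BUDGET
    return text[:100]  # a real model would return a summary or answer


def rlm(prompt: str, text: str) -> str:
    """Recursively shrink `text` until it fits the budget, then answer."""
    if len(text) <= CONTEXT_BUDGET:
        return call_model(prompt, text)
    mid = len(text) // 2
    # Recurse on each half, then run the model over the combined results.
    left = rlm(prompt, text[:mid])
    right = rlm(prompt, text[mid:])
    return rlm(prompt, left + "\n" + right)


answer = rlm("Summarize:", "x" * 10_000)
```

The real system is richer: the root model inspects the long input inside a Python REPL and decides itself how to slice and re-query, rather than following a fixed halving rule.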