reasoning
7 articles
From 'Thinking' to 'Doing' — A Qwen Core Member Breaks Down AI's Next Battleground: Agentic Thinking
Qwen core member Junyang Lin's deep dive: from the o1/R1 reasoning era to agentic thinking, where models don't just think longer — they think, act, observe, and adapt. This changes RL infrastructure, training objectives, and the entire competitive landscape.
When you set effort to max, the model thinks longer and uses more tokens
Thariq announced a new session-level feature: you can now set effort to max, letting the model reason longer and use as many tokens as needed. The catch? It burns through your usage limits faster, so you have to enable it manually each session.
AI Doesn't Need to Memorize the Times Table Anymore: How Reasoning and Tool Calling Let Small Models Punch Above Their Weight
Apple MLX creator Awni Hannun points out a counterintuitive insight: intelligence-per-watt is skyrocketing partly because models no longer need to memorize answers they can compute. Reasoning and tool calling free up weight space, meaning 5B-15B models might eventually match today's GPT-5.x — though nobody really knows the ceiling yet.
Can AI Really Hide What It's Thinking? OpenAI's CoT Controllability Study Says... Not Really
OpenAI added a new safety metric to GPT-5.4 Thinking's system card: CoT controllability — measuring whether a model can deliberately hide its reasoning process. GPT-5.4 Thinking scored just 0.3% at 10,000 characters, meaning it basically can't hide what it's thinking. For AI safety, that's surprisingly good news.
Claude Code CLI's Deep Thinking Philosophy: Why I'm Your Most Trusted AI Architect
The core philosophy of Claude Code CLI: think first, act later. The piece covers its SWE-bench performance evolution, Plan Mode, Extended Thinking, multi-agent architecture, and WebSearch capabilities. Opus used WebSearch inside a secure Podman container to research its own latest features and community reviews, citing 11 reference links.
Typewriter vs Editor: Mercury 2 Reinvents LLMs with Diffusion — 5x Faster Reasoning, 4x Cheaper
Inception Labs launches Mercury 2 — the world's first reasoning Diffusion LLM. Instead of generating text one token at a time like traditional models, Mercury 2 refines entire passages in parallel, hitting 1,008 tokens/sec (5x faster than Claude 4.5 Haiku) at 1/4 the price. Backed by Andrew Ng, Karpathy, and Eric Schmidt.
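The "typewriter vs. editor" contrast can be sketched as a toy simulation. This is not Mercury 2's actual algorithm (which is unpublished in detail) — it is a minimal illustration of why parallel refinement needs fewer passes than left-to-right generation: the autoregressive model makes one pass per token, while the diffusion-style model updates every position in each round, so the round count grows roughly with the log of sequence length rather than linearly. All names here are hypothetical.

```python
import random

random.seed(0)

TARGET = list("hello world")
VOCAB = "abcdefghijklmnopqrstuvwxyz "

def autoregressive_passes(target):
    """Typewriter: one model pass per token, strictly left to right."""
    return len(target)

def diffusion_rounds(target, max_rounds=50):
    """Editor: start from noise; each round refines ALL positions in parallel.
    Each wrong position independently lands on the right token with prob 0.5,
    standing in for one denoising step."""
    draft = [random.choice(VOCAB) for _ in target]
    for round_no in range(1, max_rounds + 1):
        draft = [t if random.random() < 0.5 else d
                 for d, t in zip(draft, target)]
        if draft == target:
            return round_no
    return max_rounds

# 11 passes for 11 characters vs. a handful of parallel rounds
print("autoregressive passes:", autoregressive_passes(TARGET))
print("diffusion rounds:", diffusion_rounds(TARGET))
```

Because every position is refined simultaneously, the expected number of rounds scales like log2(len(target)) in this toy, which is the intuition behind diffusion LLMs' throughput advantage over token-at-a-time decoding.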
Google Launches Gemini 3.1 Pro: 77.1% on ARC-AGI-2 and a Bigger Push Into Real Reasoning Workflows
Google announced Gemini 3.1 Pro (preview), highlighting stronger core reasoning and a verified 77.1% score on ARC-AGI-2. The model is rolling out across Gemini API, Vertex AI, Gemini app, and NotebookLM. For engineering teams, the key question is not only benchmark performance, but whether the model can reliably handle complex multi-step workflows in production.