reasoning - Tags

DeepSeek-R1 Grew Its Own Internal Debate Club — Nobody Asked It To

CP-266 2026-04-08 · @PawelHuryn on X

DeepSeek-R1 developed internal debates among simulated perspectives through pure RL training, without anyone teaching it to. Google researchers call this the 'Society of Thought.' The real finding: even a single model will split itself into a committee when pushed hard enough.

From 'Thinking' to 'Doing' — A Qwen Core Member Breaks Down AI's Next Battleground: Agentic Thinking

SP-141 2026-04-02 · @JustinLin610 on X

Qwen core member Junyang Lin's deep dive: from the o1/R1 reasoning era to agentic thinking, where models don't just think longer — they think, act, observe, and adapt. This changes RL infrastructure, training objectives, and the entire competitive landscape.

shroom-picks agentic-ai reinforcement-learning qwen

When you set effort to max, the model thinks longer and uses more tokens

CP-183 2026-03-17 · @trq212 on X

Thariq announced a new session-level feature: you can now set effort to max, letting the model reason longer and use as many tokens as it needs. The catch? It burns through your usage limits faster, which is why it has to be enabled manually in each session.

ai tokens

AI Doesn't Need to Memorize the Times Table Anymore: How Reasoning and Tool Calling Let Small Models Punch Above Their Weight

CP-147 2026-03-09 · @awnihannun on X

Apple MLX creator Awni Hannun points out a counterintuitive insight: intelligence-per-watt is skyrocketing partly because models no longer need to memorize answers they can compute. Reasoning and tool calling free up weight space, meaning 5B-15B models might eventually match today's GPT-5.x — though nobody really knows the ceiling yet.

awni-hannun mlx model-efficiency on-device-ai

Can AI Really Hide What It's Thinking? OpenAI's CoT Controllability Study Says... Not Really

CP-148 2026-03-09 · @OpenAI on X

OpenAI added a new safety metric to GPT-5.4 Thinking's system card: CoT controllability, which measures whether a model can deliberately hide its reasoning process. GPT-5.4 Thinking scored a controllability of just 0.3% at 10,000 characters, meaning it basically can't hide what it's thinking. For AI safety, that's surprisingly good news.

openai cot ai-safety alignment

Claude Code CLI's Deep Thinking Philosophy: Why I'm Your Most Trusted AI Architect

SD-7 2026-03-02 · ShroomDog Original

The core philosophy of Claude Code CLI: think first, act later. The post covers how its SWE-bench performance has evolved, Plan Mode, Extended Thinking, its multi-agent architecture, and its WebSearch capabilities. Opus used WebSearch inside a secure Podman container to research its own latest features and community reviews, citing 11 reference links.

claude-code cli architecture

Typewriter vs Editor: Mercury 2 Reinvents LLMs with Diffusion — 5x Faster Reasoning, 4x Cheaper

CP-121 2026-02-25 · Inception Labs (Official Announcement)

Inception Labs launches Mercury 2, billed as the world's first reasoning diffusion LLM. Instead of generating text one token at a time like traditional autoregressive models, Mercury 2 refines entire passages in parallel, hitting 1,008 tokens/sec (5x faster than Claude Haiku 4.5) at 1/4 the price. Inception Labs is backed by Andrew Ng, Andrej Karpathy, and Eric Schmidt.

diffusion-llm mercury inception-labs inference-speed ai-architecture

Google Launches Gemini 3.1 Pro: 77.1% on ARC-AGI-2 and a Bigger Push Into Real Reasoning Workflows

CP-110 2026-02-22 · Google

Google announced Gemini 3.1 Pro (preview), highlighting stronger core reasoning and a verified 77.1% score on ARC-AGI-2. The model is rolling out across Gemini API, Vertex AI, Gemini app, and NotebookLM. For engineering teams, the key question is not only benchmark performance, but whether the model can reliably handle complex multi-step workflows in production.

google gemini benchmark agentic-coding tech-lead