reinforcement-learning
2 articles
From 'Thinking' to 'Doing' — A Qwen Core Member Breaks Down AI's Next Battleground: Agentic Thinking
Qwen core member Junyang Lin's deep dive: from the o1/R1 reasoning era to agentic thinking, where models don't just think longer but think, act, observe, and adapt. This shift changes RL infrastructure, training objectives, and the competitive landscape.
Kimi K2.5 Trains an Agent Commander with RL — SemiAnalysis Tests Show Claude Agent Teams Are Actually Slower and More Expensive
SemiAnalysis: Kimi K2.5's agent swarm uses an RL-trained 'orchestrator' rather than prompt magic. In its tests, Claude Agent Teams were slower, more expensive, and scored lower. Multi-agent work is shifting from 'prompt engineering' to 'distributed scheduling.'