gemini - Tags - gu-log

Google AI Weekly Roundup: Maps, Workspace, Chrome, Gemini API All Moving at Once

CP-184 2026-03-17 · @GoogleAI on X

Google AI posted a weekly roundup tweet covering Google Maps, Workspace, Gemini Embedding 2, Gemini API controls, and the Gemini in Chrome rollout. It also mentioned a breast cancer research collaboration with Imperial College London and the UK NHS, so this one tweet spans products, developer tools, and research.

Gemini API Finally Gets Spend Caps — Now You Can Actually Let CI and Agents Off the Leash

CP-187 2026-03-17 · @simonw on X

Simon Willison shared the Gemini API's new spend caps feature, calling it great news for anyone running Gemini prompts in CI or letting agents experiment with the API, since it reduces the fear of surprise bills.

api

Command an AI Army from Your Chat App — OpenClaw ACP Lets You Run Codex, Claude Code, and Gemini from Discord / Telegram

SP-89 2026-03-09 · OpenClaw Docs

OpenClaw's ACP lets you spawn Codex, Claude Code, and Gemini from Discord/Telegram chat. It now supports Telegram topic binding, persistent bindings that survive restarts, ACP Provenance for audit trails, and more. (Updated 2026-03-09)

openclaw acp agent-client-protocol ai-agents codex claude-code multi-agent agentic-coding

Google Launches Gemini 3.1 Pro: 77.1% on ARC-AGI-2 and a Bigger Push Into Real Reasoning Workflows

CP-110 2026-02-22 · Google

Google announced Gemini 3.1 Pro (preview), highlighting stronger core reasoning and a verified 77.1% score on ARC-AGI-2. The model is rolling out across the Gemini API, Vertex AI, the Gemini app, and NotebookLM. For engineering teams, the key question is not only benchmark performance but whether the model can reliably handle complex multi-step workflows in production.

google reasoning benchmark agentic-coding tech-lead

Picking AI Is No Longer Just About Models — Ethan Mollick's 'Model / App / Harness' Framework Explains the Entire 2026 AI Landscape

CP-99 2026-02-19 · Ethan Mollick (One Useful Thing)

Ethan Mollick's framework splits the choice of AI into three layers: Model, App, and Harness. The same model (e.g., Claude Opus 4.6) performs very differently depending on the app and harness around it. Mollick used Claude Code to turn GPT-1's 117M weights into 80 books in about an hour, and the books sold out immediately.

ethan-mollick ai-guide models harness claude-code chatgpt agentic-coding framework

SWE-bench February Exam Results Are In — Opus 4.5 Beats 4.6, Chinese Models Take Half the Top 10, GPT-5.3 No-Shows

CP-97 2026-02-19 · Simon Willison

On SWE-bench, Claude Opus 4.5 (76.8%) unexpectedly beat Opus 4.6 (75.6%) for the #1 spot. MiniMax M2.5 tied for #2 at 1/20th of Opus's price, and 4 Chinese models made the top 10. GPT-5.3-Codex is missing from the results because it has no API. Bonus: Claude for Chrome was used to add labels to the chart.

swe-bench benchmark claude-code minimax chinese-ai openai simon-willison leaderboard agentic-coding