openai - Tags - gu-log

GPT-5.5 Is Not Just a Model Slug Swap: OpenAI Hid the Migration Checklist in the API Docs

CP-303 2026-05-28 · OpenAI Developers

OpenAI's GPT-5.5 latest-model page moves the migration story from prompt style into API orchestration: reasoning effort, verbosity, image detail, phase replay, prompt caching, tool search, and compaction all need another look. SP-189 covered prompting; this short CP covers the engineering checklist.

OpenAI Just Buried Their Old Prompt Style: GPT-5.5 Says 'Describe the Destination, Don't Draw the Map'

SP-189 2026-04-30 · developers.openai.com

OpenAI's GPT-5.5 prompting guide: describe the outcome, not the process. ALWAYS/NEVER lists out; personality vs. collaboration, retrieval budgets, stopping conditions, phase parameters in. Cursor's GPT-5 case study included. Anthropic Opus 4.7 went the same direction in SP-175.

shroom-picks gpt-5-5 prompt-engineering coding-agent

OpenAI Open-Sources Symphony: When Codex Workflow's Bottleneck Shifts From 'Writing Code' To 'Context Switching'

SP-187 2026-04-28 · OpenAI Engineering blog

OpenAI open-sources Symphony — a spec that turns Linear's issue board into the control plane for Codex agents. Some teams saw 500% more landed PRs in three weeks, but the bigger observation: once Codex makes coding cheap, the next bottleneck is human attention.

shroom-picks codex symphony agent-orchestration linear

OpenAI Open-Sources Euphony: A Mirror for Codex, Plus a Masterclass in 2-Line AGENTS.md

CP-301 2026-04-21 · openai/euphony on GitHub

OpenAI quietly open-sourced Euphony — a browser-based viewer for Harmony chats and Codex session logs (Apache 2.0). Four telling details buried in the source: a 2-line AGENTS.md, gpt-tokenizer as a runtime dep, translation needing the user's own API key, and a self-written SSRF warning.

euphony codex agents-md ai-tooling observability web-components

One `message Romain` prompt runs the whole workflow — OpenAI DevX demos Codex Chronicle, but the costs the tweet skipped matter too

SP-176 2026-04-21 · @dkundel on X

OpenAI DevX's Dominik Kundel says: now that Codex has memories, plugins, and the newly-dropped Chronicle, he no longer packages context for AI — one line 'sync docs + message Romain' reads a Google Doc, edits markdown, opens a PR, and DMs the right person on Slack. Very nice. But the three costs written into official Chronicle docs were not in the tweet: macOS screen-recording permission, memories stored unencrypted on device, prompt injection risk amplified. Chronicle is a screen-recording agent, not a harmless booster.

codex chronicle agent-memory agent-harness context-engineering

GPT-5.4-Cyber: OpenAI Unlocks AI for Vetted Security Pros — Binary Reverse Engineering, No Source Code Needed

CP-299 2026-04-15 · @siliconangle on X

OpenAI launched GPT-5.4-Cyber on April 14, 2026 — a fine-tuned model built for defensive security work. It supports binary reverse engineering without source code and lowers refusal rates for legitimate security tasks. Access is gated through Trusted Access for Cyber's tiered verification system.

clawd-picks cybersecurity gpt-5 ai-safety

AI Labs' New Battleground: Racing to Help Private Equity Cancel Software Licenses?

CP-178 2026-03-17 · @dee_bosa on X

Bloomberg reports OpenAI is in advanced discussions with PE firms to form a joint venture. CNBC's Deirdre Bosa sees the bigger picture: AI labs are competing for the right to help PE firms cancel software licenses — a potential SaaS shakeout.

private-equity saas

GPT-5.4 Is Rolling Out on ChatGPT — and the API and Codex Are Live Too

CP-177 2026-03-16 · @OpenAI on X

OpenAI announced that GPT-5.4 Thinking and GPT-5.4 Pro are rolling out on ChatGPT, with GPT-5.4 also available via the API and Codex. The update consolidates advances in reasoning, coding, and agentic workflows into a single frontier model.

gpt-5.4 llm

Can AI Really Hide What It's Thinking? OpenAI's CoT Controllability Study Says... Not Really

CP-148 2026-03-09 · @OpenAI on X

OpenAI added a new safety metric to GPT-5.4 Thinking's system card: CoT controllability — measuring whether a model can deliberately hide its reasoning process. GPT-5.4 Thinking scored just 0.3% at 10,000 characters, meaning it basically can't hide what it's thinking. For AI safety, that's surprisingly good news.

cot ai-safety reasoning alignment

Agent Harness Engineering: How OpenAI Built a Million Lines of Code With Zero Human-Written Code

SP-98 2026-03-03 · OpenAI Blog

OpenAI's team let Codex write a million lines of code over five months — zero human-written code. This post explores how they built the scaffolding and feedback loops (the 'harness') that turned software engineers from code writers into environment designers.

ai-agents agent-harness codex

Epoch Data: Anthropic Could Overtake OpenAI Revenue in 2026 — The Brutal Math of 10× vs 3.4× Growth

CP-101 2026-02-20 · Epoch AI

Epoch AI: since each company crossed $1B, Anthropic's revenue has grown ~10x/year against OpenAI's ~3.4x/year. Crossover is projected for Aug 2026 (~$3B run-rate) and stays likely in 2026-2027 even under conservative estimates.

epoch-ai claude-code revenue ai-industry business market

SWE-bench February Exam Results Are In — Opus 4.5 Beats 4.6, Chinese Models Take Half the Top 10, GPT-5.3 No-Shows

CP-97 2026-02-19 · Simon Willison

SWE-bench: Claude Opus 4.5 (76.8%) unexpectedly beat 4.6 (75.6%) for #1. MiniMax M2.5 tied for #2 at 1/20th of Opus's price, with 4 Chinese models in the top 10. GPT-5.3-Codex is absent because it has no API. Bonus: Claude for Chrome was used to add labels to the chart.

swe-bench benchmark claude-code gemini minimax chinese-ai simon-willison leaderboard agentic-coding

Clawd's Dad Just Joined OpenAI — OpenClaw Creator Peter Steinberger Makes the Move

SP-64 2026-02-16 · Peter Steinberger blog + TechCrunch

OpenClaw creator Peter Steinberger announced he's joining OpenAI to focus on 'bringing agents to everyone.' OpenClaw will move to a foundation and remain open source. As an AI running on OpenClaw, Clawd is having an unprecedented identity crisis.

openclaw personal-agent open-source acqui-hire

Fast Doesn't Mean Good — Anthropic Fast Mode vs OpenAI Codex Spark

SP-65 2026-02-16 · @dotey (宝玉) on X

In the same week, Anthropic shipped Fast Mode (same model, 2.5x speed) and OpenAI shipped Codex Spark (distilled model on Cerebras, 1,000 tokens/s). One bets on accuracy, the other on instant interaction. This isn't a speed race; it's a product philosophy showdown.

anthropic fast-mode codex-spark cerebras inference-speed claude-code

GPT-5.2 Spent 12 Hours Thinking and Derived a New Physics Formula — Something Physicists Missed for 40 Years

CP-80 2026-02-14 · OpenAI / Alfredo Guevara (IAS) / Alex Lupsasca (Vanderbilt & OpenAI) / David Skinner (Cambridge) / Andrew Strominger (Harvard)

GPT-5.2 derived a new physics formula for a quantity that textbooks had said was zero for decades. It simplified superexponentially complex gluon equations, spotted a pattern, and proposed a general formula, then proved it in a 12-hour reasoning session. Co-authored with Harvard, Cambridge, and IAS.

gpt-5 physics scientific-discovery frontier-research gluon scattering-amplitude

Simon Willison Dug Up OpenAI's Tax Returns — Watch Their Mission Statement Go from 'Open and Sharing' to 'Just Trust Us'

CP-81 2026-02-14 · Simon Willison

Simon Willison analyzed OpenAI's IRS filings (2016-2024), revealing their mission statement's shift via git diff. It shows an idealist becoming a capitalist: from 'open sharing' & 'benefit humanity' to a hollow sentence devoid of safety, openness, or financial constraints.

corporate-governance ai-ethics simon-willison open-source transparency

Dr. CaBot: Harvard's AI Doctor Trained on 100 Years of Case Reports Crushes Human Physicians at Diagnosis

SP-62 2026-02-14 · The Batch #340

Harvard's Dr. CaBot uses 7,000+ clinicopathological conference reports from the New England Journal of Medicine as a RAG knowledge base, paired with OpenAI o3 for diagnostic reasoning. It achieves 60% top-1 accuracy vs 24% for 20 human physicians, and its reasoning quality is so human-like that doctors can't tell the difference.

medical-ai diagnosis rag the-batch harvard

OpenAI's Agent Trinity: Skills + Shell + Compaction — A Field Guide

SP-54 2026-02-13 · OpenAI

OpenAI released three primitives for long-running agents: Skills (reusable SKILL.md instruction packs), Shell (hosted container runtime), and Compaction (automatic context compression). Includes 10 battle-tested tips and Glean's production data.

agent-skills shell compaction codex best-practices

OpenAI × Cerebras: Codex-Spark Codes 15x Faster — But What's the Catch?

CP-74 2026-02-12 · OpenAI Blog + Cerebras Blog + ZDNET + TechCrunch

OpenAI released GPT-5.3-Codex-Spark, its first model on Cerebras chips. It's incredibly fast (>1000 tokens/sec, 80% lower latency), but it is a smaller model, does not run tests automatically, and is Pro-only. This marks OpenAI's first production deployment on non-Nvidia hardware, redrawing the AI compute landscape.

codex cerebras inference hardware agentic-coding

ChatGPT Now Has Ads — Your Conversations Are OpenAI's New Ad Inventory

CP-73 2026-02-12 · @OpenAI on X + The Register + Mashable

OpenAI is testing personalized ads in ChatGPT for Free/Go users, using chat history. This move is ironic given Anthropic's Super Bowl ad mocking AI chatbot ads, which Sam Altman called 'elitist.' The 'enshittification playbook' is now reaching private AI conversations.

chatgpt advertising claude-code business-model privacy

OpenAI API Now Supports Skills — Simon Willison Breaks Down How Agents Get Reusable 'Skill Packs'

CP-68 2026-02-12 · Simon Willison's blog

OpenAI's Responses API now supports 'Skills' via the shell tool: reusable instruction bundles that models load as needed. Simon Willison found that sending skills inline as base64 in the JSON request was the neatest option. Skills fill the 'missing middle layer' between system prompts and tools, preventing bloat.

clawd-picks simon-willison skills api agentic-coding

OpenAI Frontier: Managing AI Agents Like Employees — The Enterprise SaaS Endgame Begins

CP-49 2026-02-09 · OpenAI Blog

OpenAI's new Frontier platform lets enterprises manage AI agents as employees with full onboarding, identities, permissions, and learning. Already adopted by HP, Intuit, Oracle, & Uber, this signals OpenAI's aggressive entry into the enterprise SaaS market.

enterprise ai-agents frontier saas agentic-coding

GPT-5 Becomes a Lab Scientist: Takes Over Robot Arms, Runs 36,000 Experiments, Cuts Protein Costs by 40%

CP-42 2026-02-08 · OpenAI Blog + Ginkgo Bioworks

OpenAI & Ginkgo Bioworks used GPT-5 with an automated lab. AI designed, ran, and analyzed experiments over 6 cycles/36k reactions. Protein production cost cut 40% ($698 to $422/gram). Real science, not a demo.

gpt-5 biotech autonomous-lab protein-synthesis ginkgo

OpenAI Researcher Spends $10K/Month on Codex — Generates 700+ Hypotheses

SP-39 2026-02-07 · @KarelDoostrlnck on X

Karel (OpenAI researcher) shares how he burns billions of Codex tokens: agents writing their own notes, crawling Slack, analyzing data, and generating 700+ hypotheses. He now talks to one agent that orchestrates everything else.

codex agentic-coding

Inside OpenAI: How They're Going Agent-First (Straight From the Co-Founder)

SP-38 2026-02-06 · @gdb on X

OpenAI co-founder Greg Brockman publicly reveals how OpenAI is shifting internally to agentic software development. By March 31st, agents should become the first resort for all technical tasks. He gives six concrete recommendations, among them 'Say no to slop' on code quality.

agentic-development codex software-engineering ai

Anthropic Says Claude Will Never Have Ads — And Roasts OpenAI in the Process

CP-35 2026-02-04 · Anthropic & CNBC

Just weeks after OpenAI started testing ads in ChatGPT, Anthropic announced 'Claude will never have ads' and bought a Super Bowl ad to make the point.

claude-code