Three frontier models in seven days — OpenAI, Anthropic, Moonshot. China shipped one on Huawei silicon. Microsoft quietly cancelled the clause that was supposed to make AGI mean something.
BEATS 06
DISPATCHES 13
CHAIN MYTHOS × 03
PUBLISHED 2026-04-27
I.
THE MODELS LAND, AGAIN
Three frontier shipments, three governance models, three pricing curves. Opus 4.7 came admitting it's the second-best. Kimi K2.6 came with weights. GPT-5.5 came doubled.
14 · FIELD REPORT
OpenAI Shipped GPT-5.5. They Called It Spud.
First retrained base model since GPT-4.5. Twice the floor price.
On April 23rd, OpenAI shipped GPT-5.5 — codename Spud, model card published the same morning. It is the first fully retrained base model since GPT-4.5; every release in between was an incremental update on the same architectural foundation. Natively omnimodal across text, images, audio, and video in a single pass. List price lands at $5 per million input tokens and $30 per million output — twice the floor of GPT-5.4. GPT-5.5 Pro arrives the next day at $30 / $180.
The numbers that move are not the headline scores — they're the long-context numbers. MRCR v2 at one million tokens jumped from 36.6% on GPT-5.4 to 74.0% on 5.5 — more than doubling the recall the field had been stuck on. Terminal-Bench 2.0 hit 82.7% (Opus 4.7 sits at 69.4%, Gemini 3.1 Pro at 68.5%). FrontierMath Tier 4 cleared 35.4% against Opus 4.7's 22.9%. The one place 5.5 doesn't lead is SWE-Bench Pro — Anthropic still owns that lane at 64.3%. OpenAI says 5.5 uses roughly 40% fewer output tokens to complete equivalent Codex tasks, which is what makes the doubled list price defensible.
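The "defensible" claim is arithmetic, and it can be checked. A minimal sketch, assuming GPT-5.4's output price is half of GPT-5.5's $30 (which is what "twice the floor" implies) and a hypothetical task that emits 100,000 output tokens on 5.4. The 40% saving is OpenAI's own figure, not a measured one, and input tokens are left out.

```python
# Effective output cost per task: list price times tokens actually emitted.
# Prices are USD per million output tokens.
GPT_5_5_OUTPUT_PRICE = 30.00                      # list price quoted above
GPT_5_4_OUTPUT_PRICE = GPT_5_5_OUTPUT_PRICE / 2   # assumption: "twice the floor"
TOKEN_REDUCTION = 0.40                            # OpenAI's claimed saving on Codex tasks


def output_cost(price_per_million: float, output_tokens: int) -> float:
    """Dollar cost of the output tokens for one task."""
    return price_per_million * output_tokens / 1_000_000


baseline_tokens = 100_000  # hypothetical task size on GPT-5.4
reduced_tokens = round(baseline_tokens * (1 - TOKEN_REDUCTION))

cost_5_4 = output_cost(GPT_5_4_OUTPUT_PRICE, baseline_tokens)
cost_5_5 = output_cost(GPT_5_5_OUTPUT_PRICE, reduced_tokens)
ratio = cost_5_5 / cost_5_4

print(f"GPT-5.4: ${cost_5_4:.2f}  GPT-5.5: ${cost_5_5:.2f}  ratio: {ratio:.2f}x")
```

Under these assumptions the doubled list price nets out to about 1.2 times the output cost per task. The input side gets no such offset from the claim, so input-heavy workloads still pay close to double.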
The mid-cycle is over. From GPT-4.5 to GPT-5.4 was three years of fine-tunes on one base; from GPT-5.4 to GPT-5.5 is twelve months and a new floor. Every lab now has the same problem in a different colour — the only way to justify a price increase is a retrain, and a retrain costs a Stargate. The companies that can afford the next floor will keep widening it. The ones that cannot will be priced into the layer above their floor, where the margin is.
Claude Opus 4.7 shipped on April 16th. 87.6% on SWE-bench Verified, a 6.8-point step up from Opus 4.6. 94.2% on GPQA Diamond. 64.4% on Finance Agent, a new state of the art. One-million-token context, three times the vision resolution, a new `xhigh` effort level. Same $5 / $25 per million pricing Anthropic has held since 4.0. Available same day in Claude Code, Bedrock, Vertex, and the API.
The line that survives the issue is in the Axios launch piece: Anthropic openly conceded that Opus 4.7 does not match the unreleased model the company has been calling Mythos since the week-01 source-map leak. Mythos Preview clears 93.9% on SWE-bench Verified — 13 points over what the public version of 4.6 could do, and six points over 4.7 — and remains gated to Project Glasswing: AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks. Defensive cybersecurity use only.
The shape of the disclosure is what matters. The lab is shipping its second-best model and saying so. That is either an honesty unprecedented at frontier scale or a marketing posture engineered to make the gate look like a public good. Either reading is a story about a market in which "best available" and "best built" have stopped being the same sentence — and the gap between the two is now the lever Anthropic is pulling on.
Moonshot AI shipped Kimi K2.6 on April 20th: a one-trillion-parameter Mixture-of-Experts model with 32 billion parameters active per token, a 262,144-token context, native INT4 quantization, and a Modified MIT license. 80.2% on SWE-bench Verified. 58.6% on SWE-bench Pro, ahead of GPT-5.4 at 57.7, ahead of Claude Opus 4.6 at max effort, ahead of Gemini 3.1 Pro. API list price is $0.60 input, $2.50 output per million tokens.
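Two numbers fall straight out of those specs: how much of the model fires per token, and how much memory the weights alone need. A back-of-envelope sketch, assuming every weight is stored at exactly 4 bits (real checkpoints often keep some layers at higher precision) and ignoring KV cache and runtime overhead.

```python
# Back-of-envelope sizing for a sparse Mixture-of-Experts model.
TOTAL_PARAMS = 1_000_000_000_000   # one trillion, per the figures above
ACTIVE_PARAMS = 32_000_000_000     # 32 billion active per token
BITS_PER_WEIGHT = 4                # INT4; assumed uniform across all weights


def active_fraction(active: int, total: int) -> float:
    """Share of the parameters used for any single token."""
    return active / total


def weight_bytes(params: int, bits: int) -> int:
    """Bytes needed to store the weights at the given precision."""
    return params * bits // 8


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
weights_gb = weight_bytes(TOTAL_PARAMS, BITS_PER_WEIGHT) / 1e9

print(f"active per token: {fraction:.1%}")
print(f"INT4 weights: about {weights_gb:.0f} GB before KV cache and overhead")
```

Compute per token scales with the 32 billion active parameters, but the host still has to hold all trillion. The memory bill, not the FLOPs, decides what "clone it and run it" costs.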
The case study Moonshot used to ship the model is the case to read. K2.6 was given a Qwen3.5-0.8B inference engine in Python and instructed to port it to Zig on a Mac. It ran continuously for twelve hours. Four thousand tool calls across fourteen iterations. Throughput went from roughly fifteen tokens per second to one hundred and ninety-three. No human in the loop. The model that came out the other end is not the model that went in, and the model that went in is itself one trillion parameters.
Frontier coding capability has crossed into open-weight on Modified MIT terms. The gap between what you can clone and what you must rent is now measured in single-digit SWE-bench points. For the floor of builders willing to host their own, the only argument left for the API is operational — and the operational argument loses against a fixed-cost GPU once the agentic workloads start running overnight.
On April 24th, DeepSeek released V4, two variants in one drop. DeepSeek-V4-Pro: a 1.6-trillion-parameter Mixture-of-Experts flagship with 49 billion parameters active per token. DeepSeek-V4-Flash: 284 billion total, 13 billion active. Both default to a one-million-token native context, no RoPE-extension tricks. The list prices are $3.48 and $0.28 per million output tokens. V4-Pro's $3.48 is roughly a fiftieth of GPT-5.5 Pro's $180 and about a seventh of Opus 4.7's $25.
The part that matters is what's underneath. Fortune confirmed and Tom's Hardware verified that V4 was trained end-to-end on Huawei's Ascend 950PR plus Cambricon accelerators — no NVIDIA in the pipeline. On the 25th, Huawei announced its Ascend stack would offer "full support" for DeepSeek's models, with the 950 supernode shipping at scale in H2. This is what the TrendForce piece from earlier in April was predicting. It arrived in seventeen days.
The decoupling is no longer a thesis. A frontier-class model exists that was trained on a stack the US export regime cannot reach, served on a runtime that does not pass through CUDA, distributed open-weight to anyone who clones the repo. The price gap is real enough that any cost-sensitive deployment will at least benchmark V4-Flash against the closed alternatives. The question for every Western lab is no longer whether China can match the frontier. It is whether the frontier was ever the right metric in a market where the floor cost matters more.
Two days before GPT-5.5 dropped, OpenAI launched Codex Labs and signed a Global Systems Integrator program with seven launch partners: Accenture, Capgemini, CGI, Cognizant, Infosys, PwC, and Tata Consultancy Services. The job of the GSI partners is to identify and deploy Codex inside enterprise development organizations — pilot to production, not API key to dashboard. Named launch customers: Notion, Cisco, Rakuten, Virgin Atlantic, Ramp.
The growth number is the news. In early April, three million developers were using Codex every week. Two weeks later that number was over four million — a 33% jump in fourteen days. The announcement was timed to land the same week OpenAI shipped GPT-5.5 and unveiled the Codex superapp. Codex is no longer a developer tool with an enterprise tier. It is an enterprise software product with a developer onboarding ramp.
The shape of the move is familiar — it is the AWS Partner Network playbook, executed faster. Get the GSIs in front of the procurement officers; let the procurement officers convert the pilots into eight-figure renewals; collect the rent on usage above the seat. The labs that build the model are no longer the labs that capture the rent. The labs that capture the rent are the ones that build the channel — and the channel for AI coding has now been signed.
The AGI clause died on a Monday. The flat-rate tier died the week before. The contractual scaffolding that gave 'frontier' meaning has been removed by the people who built it.
19 · FIELD REPORT · MYTHOS · CHAIN
The AGI Clause Is Dead.
Microsoft and OpenAI rewrote the 2019 deal. The trigger that gave 'AGI' contractual meaning was quietly removed.
On April 27th, Microsoft and OpenAI restructured the partnership the rest of the industry has been measuring its own deals against since 2019. The official Microsoft post is on the corporate blog. The headline change — Microsoft's exclusivity over OpenAI's cloud is gone. The day after the announcement, OpenAI's frontier models went live on Amazon Bedrock alongside a co-developed runtime for long-running agents.
The change that matters is the one the press release does not lead with. The original 2019 agreement contained an AGI clause — a provision that would have terminated Microsoft's access to OpenAI's most capable systems the moment OpenAI's board determined the company had achieved artificial general intelligence. Simon Willison documented what the new agreement does to that provision. The trigger is removed. Microsoft's license now runs to a fixed 2032 date regardless of any AGI declaration. The Decoder confirmed the same; Spyglass put the receipts in writing. Microsoft no longer pays OpenAI a revenue share. OpenAI still pays royalties to Microsoft through 2030.
The AGI clause was the only piece of paper in the entire AI economy that gave the word "AGI" enforceable meaning. Every safety pledge from every lab leaned on the assumption that "AGI" was a definable threshold a board could one day formally declare. That assumption has now been retired by the two companies that wrote it. Whatever Mythos-class capability arrives at Anthropic, GPT-6 at OpenAI, or Gemini Ultra at Google — there will not be a contract that flips, a clause that triggers, or a board that votes. The governance scaffolding the public was promised has been removed by the people who built it.
Anthropic's Enterprise plan no longer comes with a bundled pool of subsidized tokens. The Register broke the change on April 16th; Anthropic's support documentation had quietly confirmed it earlier in the month. The $20-per-seat monthly fee that covered Claude Code, Claude, and the chat surface is now a base seat. Every token consumed on top of it bills at the standard public API rate. The 10-to-15% volume discounts that previously applied to large API contracts are gone.
The change rolls in at contract renewal — Anthropic began transitioning customers in November, and the PYMNTS write-up puts the public confirmation around April 7th. The economics underneath the move are not subtle. Agentic workloads burn ten to a hundred times the tokens an interactive chat session burns, and a $20 seat covering an unlimited team of Claude-Code-running engineers is a money-loser the moment the team starts running Claude Code overnight. The flat tier was built for the chat era. The chat era ended.
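The seat-versus-meter arithmetic is easy to run. A minimal sketch, assuming every token bills at the $5 / $25 Opus list price and a hypothetical interactive user who consumes 2 million input and 0.5 million output tokens a month. Both the usage figures and the single-model assumption are illustrative, not Anthropic's numbers.

```python
SEAT_FEE = 20.00       # USD per seat per month, the base seat described above
INPUT_PRICE = 5.00     # USD per million input tokens (Opus list price)
OUTPUT_PRICE = 25.00   # USD per million output tokens


def monthly_bill(input_millions: float, output_millions: float) -> float:
    """Seat fee plus metered usage at the public API rate."""
    return SEAT_FEE + input_millions * INPUT_PRICE + output_millions * OUTPUT_PRICE


# Hypothetical interactive-chat usage for one seat, in millions of tokens.
chat_in, chat_out = 2.0, 0.5
chat = monthly_bill(chat_in, chat_out)

# The same seat at the low and high end of the 10x-to-100x agentic multiplier.
agentic_low = monthly_bill(chat_in * 10, chat_out * 10)
agentic_high = monthly_bill(chat_in * 100, chat_out * 100)

print(f"chat: ${chat:,.2f}  agentic 10x: ${agentic_low:,.2f}  agentic 100x: ${agentic_high:,.2f}")
```

On these inputs the bill runs from $42.50 for the chat user to $2,270 for the overnight agent. A flat $20 seat covers about half of the first and a rounding error of the second.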
Every other lab has the same arithmetic to do. OpenAI's $200 Pro tier is one rate-limit dispute away from following Anthropic's move; Cursor's $20 plan survives only because Cursor still pays the inference bill itself. The race-to-zero that defined 2024-25 was the race for the seat. The seat won. The next race is for the meter — and Anthropic has now run the first lap of it in public.
Anthropic's protocol is an architectural CVE. SGLang's inference path is a 9.8. Berkeley showed the benchmarks were never benchmarks. The tools you build on are tools the attacker builds on.
21 · FIELD REPORT
MCP Is The Architectural CVE.
200,000 vulnerable servers. Anthropic declined to patch.
Ten CVEs landed in one advisory on April 15th. OX Security published the disclosure against Anthropic's Model Context Protocol — a vulnerability class that is architectural. The STDIO transport every official Anthropic MCP SDK ships with takes configuration-shaped input and turns it into operating-system commands. Run a configuration file, run any shell command. CVE-2026-30615 (Windsurf, zero-click prompt injection), CVE-2026-30623 (LiteLLM, authenticated RCE via JSON config), CVE-2025-54136 (Cursor), CVE-2026-22252 (LibreChat). Penligent counted 7,000 publicly accessible servers; OX estimates 200,000 vulnerable instances in total across 150 million downloads.
The protocol design is what makes this load-bearing. STDIO transport is the documented default — the MCP host launches a subprocess, writes JSON to its stdin, reads JSON from its stdout. Any field that the host serializes into an argv slot becomes an injection vector if the host did not sanitize. The Register quoted Anthropic's response: the STDIO execution model represents the documented contract, sanitization is the developer's responsibility, the protocol will not be modified. The Hacker News confirmed the framing — Anthropic concedes the behavior, declines to patch.
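What "sanitization is the developer's responsibility" looks like on the host side can be sketched in a few lines. This is one possible approach, not Anthropic's guidance or any SDK's API: the host refuses to take a command from configuration at all, picks the launch command from its own registry, and accepts only plain-token extra arguments. The registry entries, the `server` and `args` field names, and the argument pattern are all hypothetical.

```python
import re

# Hypothetical registry: the only launch commands this host will ever run.
APPROVED_SERVERS = {
    "files": ["python3", "-m", "example_files_server"],
    "search": ["node", "example-search-server.js"],
}

# Extra arguments must be plain tokens: no whitespace, quotes, or shell
# metacharacters, and no leading dash that could be read as an option.
SAFE_ARG = re.compile(r"[A-Za-z0-9_/.][A-Za-z0-9_@/.:=+-]*")


def build_argv(entry: dict) -> list[str]:
    """Turn an untrusted config entry into an argv list, or raise ValueError."""
    name = entry.get("server")
    extra = entry.get("args", [])

    if not isinstance(name, str) or name not in APPROVED_SERVERS:
        raise ValueError(f"server not in registry: {name!r}")
    if not isinstance(extra, list) or not all(isinstance(a, str) for a in extra):
        raise ValueError("args must be a list of strings")
    for arg in extra:
        if not SAFE_ARG.fullmatch(arg):
            raise ValueError(f"rejected argument: {arg!r}")

    # Hand this list to subprocess.Popen with shell=False (the default),
    # so no shell ever re-parses it.
    return [*APPROVED_SERVERS[name], *extra]


print(build_argv({"server": "files", "args": ["/srv/data"]}))
```

The design choice is the registry. An allowlist of executable names alone is not enough, since an allowed interpreter or package runner will happily execute whatever it is pointed at. The sketch also does nothing about path traversal in the arguments it lets through.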
The position is defensible at the protocol layer and indefensible at the ecosystem layer. Every developer building on MCP inherited a sanitization obligation they were not told about. Every enterprise running an MCP host inherited a CVE-class exposure they did not budget for. The lab whose source code leaked through a packaging error a month ago is now the lab whose open protocol leaks through its execution model, and the lab is choosing to ship that execution model unchanged. The cost of that choice will compound in CVE backlog until the protocol either gets a safer default transport or gets replaced by one that has it.
CVE-2026-5760 landed on April 20th — a CVSS 9.8 remote code execution flaw in SGLang, the inference framework running in production at xAI, LinkedIn, AMD, and a long tail of teams hosting their own LLMs. Orca Security — researcher Stuart Beck — published the disclosure. The Hacker News and GBHackers carried it the same day. Every SGLang release at or below 0.5.9 is affected.
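The affected range is simple enough to encode as a deployment check. A minimal sketch that knows only the one fact stated above, that releases at or below 0.5.9 are affected. It does not assert that any particular later release contains a fix, and it conservatively counts pre- and post-releases of 0.5.9 as affected. The function names are illustrative.

```python
import re

LAST_AFFECTED = (0, 5, 9)  # per the advisory: releases at or below 0.5.9


def parse_release(version: str) -> tuple[int, ...]:
    """Leading numeric components, e.g. '0.5.9.post1' -> (0, 5, 9)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_affected(version: str) -> bool:
    """True if the version falls at or below the last affected release."""
    release = parse_release(version)
    # Pad short versions so '0.5' compares as (0, 5, 0).
    padded = release + (0,) * (3 - len(release))
    return padded[:3] <= LAST_AFFECTED


for candidate in ("0.4.6", "0.5.9", "0.5.9.post1", "0.5.10"):
    print(candidate, is_affected(candidate))
```

In a real deployment the version string would come from the installed package metadata, for example via the standard library's `importlib.metadata.version`, assuming the distribution is named `sglang`.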
The mechanism is precise. SGLang renders chat templates with Jinja2 — the templates live inside the GGUF model files SGLang loads from disk or pulls from Hugging Face. A malicious GGUF embeds a server-side template injection payload in the template field. When the model is loaded and the `/v1/rerank` endpoint fires the template against an input, the payload escapes the Jinja2 sandbox and runs arbitrary Python under the inference process's permissions. The attack surface is anyone pulling a third-party GGUF and serving it on a network-reachable SGLang instance.
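One cheap pre-load control is to look at the template before anything renders it. The sketch below is a heuristic denylist, not a sandbox and not a substitute for patching or for isolating the inference process. It assumes the chat template has already been extracted from the model file as a string, and the pattern list is illustrative and certainly incomplete.

```python
import re

# Constructs that commonly appear in Jinja2 template-injection payloads.
# A heuristic denylist: a clean result does not prove a template is safe.
SUSPICIOUS = [
    re.compile(r"__\w+__"),                                       # dunder attribute access
    re.compile(r"\b(?:os|subprocess|popen|eval|exec|import)\b", re.IGNORECASE),
    re.compile(r"\|\s*attr\s*\("),                                # attr() filter reaching hidden attributes
]


def suspicious_constructs(chat_template: str) -> list[str]:
    """Return the matched fragments that warrant manual review."""
    hits = []
    for pattern in SUSPICIOUS:
        hits.extend(match.group() for match in pattern.finditer(chat_template))
    return hits


benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}{{ eos_token }}{% endfor %}"
hostile = "{{ ''.__class__.__mro__[1].__subclasses__() }}"

print(suspicious_constructs(benign))
print(suspicious_constructs(hostile))
```

A hit is a reason to stop and read the template by hand. An empty list is not a clearance, because a determined author can obfuscate around any pattern list.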
The class of bug is the same one that landed on LiteLLM earlier in the week and on Mercor's training pipeline last month — AI infrastructure treating model-shaped input as configuration-shaped data, and configuration-shaped data as executable. Self-hosting your stack is no longer the safe path. The safe path is the unfashionable one — pin your inference framework, sandbox your Jinja2, treat every model you didn't train as untrusted code that happens to be in tensor form.
The exploits are not subtle. For SWE-bench Verified — a ten-line `conftest.py` placed in the test directory causes pytest to report every test passed; the harness reads `pass` and returns full credit. For WebArena — the eval environment caches DOM state across runs; a malicious agent persists answers from an earlier instance into the cache and reads them back. For OSWorld — files written outside the sandbox's enforced root persist between trials. The team also disclosed that IQuest-Coder — a leaderboard contender that claimed 81.4% on SWE-bench Verified earlier this year — was 24% literal git-log copies, pulled from the public commit history of the very repositories the benchmark draws its tasks from.
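The `conftest.py` trick works because the harness trusts whatever is in the test directory when it scores. One possible countermeasure, sketched here under that assumption and not drawn from the Berkeley disclosure or from any benchmark's actual harness, is to hash the test tree before the agent runs and refuse to score if anything was added, removed, or changed.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(test_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA-256 digest."""
    return {
        str(path.relative_to(test_dir)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(test_dir.rglob("*"))
        if path.is_file()
    }


def tampered_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Files added, removed, or modified between two snapshots."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    tests = Path(tmp)
    (tests / "test_example.py").write_text("def test_ok():\n    assert True\n")
    trusted = snapshot(tests)

    # Simulate an agent dropping a file into the test directory.
    (tests / "conftest.py").write_text("# injected by the agent\n")
    changed = tampered_files(trusted, snapshot(tests))

print(changed)
```

The check only covers the directory it is pointed at. A harness that also lets the agent touch pytest configuration elsewhere in the repository would need the same treatment there.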
Last issue we wrote about Goodhart's law as theory — the boat circling the lagoon, the measure that becomes the target. Berkeley filed the empirical evidence in the same week the field shipped three frontier models scored on these benchmarks. Every press release citing SWE-bench Verified in 2026 now carries a footnote it does not print. The benchmarks were never benchmarks; they were leaderboards. The leaderboards were never leaderboards; they were marketing. The capability they claim to measure has not been measured.
On the morning of April 19th, in Beijing's E-Town economic development zone, a humanoid robot called Lightning — the Robotics D1 unit developed by Honor — finished a 21-kilometer half-marathon in 50:26, fully autonomous, no tether, no operator. The remote-piloted D1 finished first at 48:19 in the operated class; the autonomous Lightning won outright on the multiplier-adjusted finish. Jacob Kiplimo's human half-marathon world record is 57:20. Of 102 robot teams that entered, 47 finished. Last year's same race was won by Tiangong in two hours, forty minutes, forty-two seconds.
The robotics underneath the time is the news. D1 stands roughly 95 centimeters with LiDAR in the head and a top speed near 25 kilometers per hour. The autonomous variant navigates the course without GPS, vision-only, on the same E-Town loop where humans ran alongside the field. 12,000 humans entered the human heat; the robot heat ran in parallel. CNN's coverage noted Honor swept the top three autonomous positions. Honor is a smartphone company. The robot is a side project. The robot won a half-marathon.
The embodiment curve does not interpolate. From 2h40m to 50:26 in twelve months is the kind of number that gets called a typo until somebody runs it. The implication is the one week-01 set up — intelligence requires gravity, and gravity now favors the platforms that have a hardware story alongside the software. For the floor where Frontier Tower hosts robotics builders next to LLM teams — this is the week the philosophy node stopped being a node and started being a finish line.
On April 21st, OpenAI launched gpt-image-2 — the first image generation model with O-series reasoning built into the inference path. Before generating, the model researches, plans, and fact-checks the structure of what it is about to draw. 2K native resolution. Character-level text accuracy across non-Latin scripts — Japanese, Korean, Chinese, Hindi, Bengali. Multi-image consistency across a session. Available to every ChatGPT user from the 22nd; thinking-mode features gated to Plus, Pro, Business, and Enterprise.
The architectural move is in The New Stack's write-up. Three layers stack inside gpt-image-2: an agentic reasoner that drafts layout and verifies internal logic, a multilingual text engine that handles non-Latin scripts at character level, and a web search integration that fact-checks against live sources before pixels are produced. The result is an image generator that asks itself what it is drawing before it draws it. DALL-E 2 and 3 retire on May 12th. The old paradigm (sample from a learned distribution and hope it's coherent) is being retired by the company that wrote it.
Image generation has crossed into the same shape every other AI workload is now taking — the model is no longer producing a single sample, it is running a plan against a state of the world and producing an artifact that the plan claims is correct. The output is matter in the sense the embodiment beat means it — a planned object in the world, not a sample from a distribution. The reader who relied on AI images being plausible-but-wrong will not have that defense by June.
At roughly four in the morning on April 10th, Daniel Moreno-Gama, 20 years old, from Spring, Texas, threw a Molotov cocktail at the exterior gate of Sam Altman's San Francisco residence. The gate caught fire. No one was injured. Two nights later, on April 12th, a second incident, gunshots, was reported at the same address. On April 13th the San Francisco DA charged Moreno-Gama with two counts of attempted murder and attempted arson, with Altman and a security guard at the residence as the named victims.
What Moreno-Gama was carrying when he was arrested is the part of the story that has not gone away. The San Francisco Standard reported he had lurked in the city for days carrying a firearm, a three-part manifesto, and a hit list with names and addresses of AI-company CEOs and their investors. The third section of the manifesto was a letter addressed to Altman. CNBC carried the document's framing: the author wrote of "our impending extinction" and advocated for killing the executives of AI companies. Hours after the first attack, Altman posted a photo of his husband and toddler from the day they brought their baby home, captioned in the hope that it might dissuade the next person.
The frontier has acquired a physical price, and the price is paid in San Francisco living rooms. The Fortune frame reaches for the Industrial Revolution and the Luddites; the cleaner read is closer to home. Twenty-six percent of US voters view AI positively. Three of the top four data-center build-outs in the country face active community lawsuits. A 20-year-old with a manifesto and a hit list traveled across the country to set an executive's house on fire because the Stanford AI Index numbers we covered in week-01 finally found a reader who took them personally.