# The Week The Frontiers Doubled

**Issue 02** · 21 — 27 APR 2026 · published 2026-04-27  
OPEN INTELLIGENCE · ISSUE 02

> Three frontier models in seven days — OpenAI, Anthropic, Moonshot. China shipped one on Huawei silicon. Microsoft quietly cancelled the clause that was supposed to make AGI mean something.

Canonical (HTML): https://www.immersivecommons.com/newsletter/issue-02  · Archive: https://www.immersivecommons.com/newsletter

Discovery: https://www.immersivecommons.com/.well-known/signal.llmfeed.json · MCP: https://www.immersivecommons.com/.well-known/mcp.json · Skill: https://www.immersivecommons.com/skills/ic-signal/SKILL.md

---

## I. THE MODELS LAND, AGAIN

Three frontier shipments, three governance models, three pricing curves. Opus 4.7 came admitting it's the second-best. Kimi K2.6 came with weights. GPT-5.5 came doubled.

### 14 · OpenAI Shipped GPT-5.5. They Called It Spud.

*First retrained base model since GPT-4.5. Twice the floor price.*

On April 23rd, OpenAI [shipped GPT-5.5](https://openai.com/index/introducing-gpt-5-5/) — codename **Spud**, model card published the same morning. It is the first fully retrained base model since [GPT-4.5](https://openai.com/index/introducing-gpt-4-5/); every release in between was an incremental update on the same architectural foundation. Natively omnimodal across text, images, audio, and video in a single pass. List price lands at $5 per million input tokens and $30 per million output — twice the floor of [GPT-5.4](https://openai.com/index/introducing-gpt-5-4/). GPT-5.5 Pro arrives the next day at $30 / $180.

The numbers that move are not the headline scores — they're the long-context numbers. [MRCR v2](https://github.com/openai/evals) at one million tokens jumped from 36.6% on GPT-5.4 to 74.0% on 5.5 — more than doubling the recall the field had been stuck on. [Terminal-Bench 2.0](https://www.tbench.ai/) hit 82.7% (Opus 4.7 sits at 69.4%, Gemini 3.1 Pro at 68.5%). [FrontierMath Tier 4](https://epoch.ai/frontiermath) cleared 35.4% against Opus 4.7's 22.9%. The one place 5.5 doesn't lead is SWE-Bench Pro — Anthropic still owns that lane at 64.3%. OpenAI says 5.5 uses roughly 40% fewer output tokens to complete equivalent Codex tasks, which is what makes the doubled list price defensible.
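
The pricing argument can be checked with quick arithmetic. The sketch below assumes GPT-5.4's output price was half of GPT-5.5's $30 (implied by "twice the floor", not quoted directly) and takes OpenAI's roughly-40% reduction at face value; the task size is a made-up placeholder.

```python
# Back-of-envelope check on output cost per equivalent task.
# Assumptions (not from a pricing page): GPT-5.4 output price is half of
# GPT-5.5's, and the 40% output-token reduction holds for the workload.

GPT55_OUTPUT_PER_M = 30.00                   # USD per million output tokens
GPT54_OUTPUT_PER_M = GPT55_OUTPUT_PER_M / 2  # assumed from "twice the floor"
TOKEN_REDUCTION = 0.40                       # OpenAI's "roughly 40% fewer"


def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for a number of output tokens at a per-million price."""
    return tokens / 1_000_000 * price_per_million


def relative_task_cost(baseline_tokens: int) -> float:
    """Ratio of GPT-5.5 to GPT-5.4 output cost for the same task."""
    old = output_cost(baseline_tokens, GPT54_OUTPUT_PER_M)
    new_tokens = round(baseline_tokens * (1 - TOKEN_REDUCTION))
    new = output_cost(new_tokens, GPT55_OUTPUT_PER_M)
    return new / old


if __name__ == "__main__":
    ratio = relative_task_cost(baseline_tokens=100_000)  # hypothetical task size
    print(f"GPT-5.5 / GPT-5.4 output cost per task: {ratio:.2f}x")
```

Under those assumptions the per-task output bill rises by about a fifth rather than doubling. Input tokens are ignored here, so the real ratio depends on the workload's input/output mix.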

The mid-cycle is over. From GPT-4.5 to GPT-5.4 was three years of fine-tunes on one base; from GPT-5.4 to GPT-5.5 is twelve months and a new floor. Every lab now has the same problem in a different colour — the only way to justify a price increase is a retrain, and a retrain costs a Stargate. The companies that can afford the next floor will keep widening it. The ones that cannot will be priced into the layer above their floor, where the margin is.


**Feature: TICKER**
- **82.7% TERMINAL-BENCH 2.0** (OPUS 4.7 SITS AT 69.4%)
- **74.0% MRCR v2 @ 1M** (+37 PT VS GPT-5.4)
- **35.4% FRONTIERMATH T4** (OPUS 4.7 AT 22.9%)
- **$5 / $30 PER M · IN / OUT** (2× THE GPT-5.4 FLOOR)

**Sources:**
- [OpenAI](https://openai.com/index/introducing-gpt-5-5/)
- [System Card](https://openai.com/index/gpt-5-5-system-card/)
- [One Useful Thing](https://www.oneusefulthing.org/p/sign-of-the-future-gpt-55)
- [Simon Willison](https://simonwillison.net/2026/Apr/23/gpt-5-5/)

Image: https://www.immersivecommons.com/signal/issue-02/gpt-5-5.jpg (image: [by Ethan Mollick](https://www.oneusefulthing.org/p/sign-of-the-future-gpt-55))

### 15 · Anthropic Shipped The Second-Best Model.

*They said so out loud. Mythos still stays inside the gate.*

[Claude Opus 4.7](https://www.anthropic.com/news/claude-opus-4-7) shipped on April 16th. 87.6% on [SWE-bench Verified](https://www.swebench.com/), a 6.8-point step up from Opus 4.6. 94.2% on [GPQA Diamond](https://github.com/idavidrein/gpqa). 64.4% on Finance Agent, a new state of the art. One-million-token context, three times the vision resolution, a new `xhigh` effort level. Same $5 / $25 per million pricing Anthropic has held since 4.0. [Available same day](https://github.blog/changelog/2026-04-16-claude-opus-4-7-is-generally-available/) in Claude Code, Bedrock, Vertex, and the API.

The line that survives the issue is in the [Axios launch piece](https://www.axios.com/2026/04/16/anthropic-claude-opus-model-mythos): Anthropic openly conceded that Opus 4.7 does not match the unreleased model the company has been [calling Mythos](https://www.cnbc.com/2026/04/16/anthropic-claude-opus-4-7-model-mythos.html) since the week-01 source-map leak. Mythos Preview clears 93.9% on SWE-bench Verified — 13 points over what the public version of 4.6 could do, and six points over 4.7 — and remains gated to [Project Glasswing](https://www.anthropic.com/project/glasswing): AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks. Defensive cybersecurity use only.

The shape of the disclosure is what matters. The lab is shipping its second-best model and saying so. That is either an honesty unprecedented at frontier scale or a marketing posture engineered to make the gate look like a public good. Either reading is a story about a market in which "best available" and "best built" have stopped being the same sentence — and the gap between the two is now the lever Anthropic is pulling on.


**Feature: WAGER**
- Mythos goes public-API before September. _(check: 2026-09-30)_
- Project Glasswing partner count exceeds 25 before July. _(check: 2026-07-31)_
- Opus 4.8 ships before Mythos becomes generally available. _(check: 2026-08-31)_
- A non-US government stands up its own version of Glasswing before year-end. _(check: 2026-12-31)_

**Sources:**
- [Anthropic](https://www.anthropic.com/news/claude-opus-4-7)
- [Axios](https://www.axios.com/2026/04/16/anthropic-claude-opus-model-mythos)
- [CNBC](https://www.cnbc.com/2026/04/16/anthropic-claude-opus-4-7-model-mythos.html)
- [GitHub](https://github.blog/changelog/2026-04-16-claude-opus-4-7-is-generally-available/)

Image: https://www.immersivecommons.com/signal/issue-02/claude-opus-4-7.png (image: [Introducing Claude Opus 4.7 \ Anthropic](https://www.anthropic.com/news/claude-opus-4-7))

### 16 · Moonshot Shipped Kimi K2.6. Twelve Hours, One Port.

*One-trillion-parameter open-weight MoE that ran for half a day and finished the job.*

[Moonshot AI shipped Kimi K2.6](https://www.kimi.com/blog/kimi-k2-6) on April 20th — a one-trillion-parameter [Mixture-of-Experts](https://huggingface.co/blog/moe) model with 32 billion parameters active per token, [262,144-token context](https://huggingface.co/moonshotai/Kimi-K2.6), [INT4 quantization](https://en.wikipedia.org/wiki/Quantization_(signal_processing)) native, and a Modified MIT license. 80.2% on SWE-bench Verified. 58.6% on SWE-bench Pro — ahead of GPT-5.4 at 57.7, ahead of [Claude Opus 4.6](https://www.anthropic.com/news/claude-opus-4-6) at max effort, ahead of Gemini 3.1 Pro. API list price is $0.60 input, $2.50 output per million.

The case study Moonshot used to ship the model is the case to read. K2.6 was given a [Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) inference engine in Python and instructed to port it to [Zig](https://ziglang.org/) on a Mac. It ran continuously for twelve hours. Four thousand tool calls across fourteen iterations. Throughput went from roughly fifteen tokens per second to one hundred and ninety-three. No human in the loop. The engine that came out the other end is not the engine that went in, and the model that did the porting is itself one trillion parameters.

Frontier coding capability has crossed into open-weight on Modified MIT terms. The gap between what you can clone and what you must rent is now measured in single-digit SWE-bench points. For the floor of builders willing to host their own, the only argument left for the API is operational — and the operational argument loses against a fixed-cost GPU once the agentic workloads start running overnight.


**Feature: PROMPT**
*Run K2.6 locally for the cost of the GPU.*
The Modified MIT license and the INT4 native quantization are the point — pull the weights, host on a single H200, hit it from your existing OpenAI SDK. Twelve-hour autonomous runs cost ~$30 of compute, not API rate.

```
# pull weights via Hugging Face CLI
huggingface-cli download moonshotai/Kimi-K2.6 --local-dir ./kimi-k2-6

# serve with vLLM (INT4 path, single H200)
# no --quantization flag: vLLM reads the quantization config from the checkpoint
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model ./kimi-k2-6 \
    --served-model-name kimi-k2-6 \
    --tensor-parallel-size 1 \
    --max-model-len 262144 \
    --port 8000

# hit it from the OpenAI SDK (Python, run in a separate process)
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
r = client.chat.completions.create(
    model="kimi-k2-6",  # must match --served-model-name
    messages=[{"role":"user","content":"port this Python to Zig"}],
)
print(r.choices[0].message.content)
```
> Pro move: Pair with [Cloudflare Workers AI](https://developers.cloudflare.com/changelog/post/2026-04-20-kimi-k2-6-workers-ai/) if you'd rather rent metered inference than own the GPU — same weights, $0.60 / $2.50 metered, no infra.
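
Before committing to a GPU count, it is worth sizing the weights. The sketch below is a rough estimate only: it assumes exactly 4 bits per parameter across all one trillion parameters and 141 GB of memory per H200 (a hardware figure not given in this issue), and it ignores KV cache, activations, and any layers kept at higher precision.

```python
import math

TOTAL_PARAMS = 1e12      # one trillion parameters, per the model description
INT4_BITS = 4            # assumed uniform 4-bit weights
H200_MEMORY_GB = 141     # assumption: per-GPU memory, not stated in the issue


def weight_footprint_gb(total_params: float, bits_per_param: int) -> float:
    """Size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


def min_gpus_for_weights(footprint_gb: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    return math.ceil(footprint_gb / gpu_memory_gb)


if __name__ == "__main__":
    footprint = weight_footprint_gb(TOTAL_PARAMS, INT4_BITS)
    gpus = min_gpus_for_weights(footprint, H200_MEMORY_GB)
    print(f"weights: ~{footprint:.0f} GB, at least {gpus} GPUs for weights alone")
```

On these numbers the weights come to roughly 500 GB, which suggests `--tensor-parallel-size` would need to span several GPUs. Treat the single-GPU setting above as something to verify against your own hardware, and rescale the compute cost estimate accordingly.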

**Sources:**
- [Moonshot AI](https://www.kimi.com/blog/kimi-k2-6)
- [Hugging Face](https://huggingface.co/moonshotai/Kimi-K2.6)
- [MarkTechPost](https://www.marktechpost.com/2026/04/20/moonshot-ai-releases-kimi-k2-6-with-long-horizon-coding-agent-swarm-scaling-to-300-sub-agents-and-4000-coordinated-steps/)

Image: https://www.immersivecommons.com/signal/issue-02/kimi-k2-6.png (image: [moonshotai/Kimi-K2.6 · Hugging Face](https://huggingface.co/moonshotai/Kimi-K2.6))


## II. THE STACKS SEPARATE

DeepSeek built on Huawei. OpenAI built on Accenture. Two stacks, two market shapes, one wedge driven through what was supposed to be one industry.

### 17 · DeepSeek V4 Trained On Huawei Silicon.

*1.6 trillion parameters, one-million-token context, zero NVIDIA chips.*

On April 24th, [DeepSeek released V4](https://huggingface.co/blog/deepseekv4): two variants in one drop. **DeepSeek-V4-Pro**: a 1.6-trillion-parameter [Mixture-of-Experts](https://huggingface.co/blog/moe) flagship with 49 billion parameters active per token. **DeepSeek-V4-Flash**: 284 billion total, 13 billion active. Both default to a one-million-token native context, no RoPE-extension tricks. The list prices are $3.48 and $0.28 per million output tokens. V4-Pro's $3.48 is roughly a fiftieth of GPT-5.5 Pro's $180 and about a seventh of Opus 4.7's $25.
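
The ratios between these list prices can be recomputed directly from the output-token figures quoted in this issue. This is a minimal sketch: it compares output prices only and says nothing about input pricing or tokens consumed per task.

```python
# Output list prices in USD per million tokens, as quoted in this issue.
OUTPUT_PRICE_PER_M = {
    "DeepSeek-V4-Pro": 3.48,
    "DeepSeek-V4-Flash": 0.28,
    "GPT-5.5": 30.00,
    "GPT-5.5 Pro": 180.00,
    "Claude Opus 4.7": 25.00,
}


def price_multiple(expensive: str, cheap: str) -> float:
    """How many times more the first model costs per output token."""
    return OUTPUT_PRICE_PER_M[expensive] / OUTPUT_PRICE_PER_M[cheap]


if __name__ == "__main__":
    for cheap in ("DeepSeek-V4-Pro", "DeepSeek-V4-Flash"):
        for expensive in ("GPT-5.5", "GPT-5.5 Pro", "Claude Opus 4.7"):
            multiple = price_multiple(expensive, cheap)
            print(f"{expensive} costs {multiple:.1f}x {cheap} per output token")
```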

The part that matters is what's underneath. [Fortune confirmed](https://fortune.com/2026/04/24/deepseek-v4-ai-model-price-performance-china-open-source/) and [Tom's Hardware verified](https://www.tomshardware.com/tech-industry/artificial-intelligence/deepseek-launches-1-6-trillion-parameter-v4-on-huawei-chips-as-us-escalates-ai-theft-accusations) that V4 was trained end-to-end on [Huawei's Ascend 950PR](https://www.datacenterdynamics.com/en/news/huawei-announces-annual-release-cadence-for-three-new-ascend-ai-chips-unveils-supernode-offering-company-says-will-outperform-nvidias-nvl144/) plus [Cambricon](https://www.cambricon.com/) accelerators — no NVIDIA in the pipeline. On the 25th, Huawei announced its Ascend stack would offer "full support" for DeepSeek's models, with the 950 supernode shipping at scale in H2. This is what the TrendForce piece from earlier in April was [predicting](https://www.trendforce.com/news/2026/04/07/news-decoding-deepseek-v4-how-huaweis-ascend-950-pr-is-powering-chinas-push-to-break-cuda-dependence/). It arrived in seventeen days.

The decoupling is no longer a thesis. A frontier-class model exists that was trained on a stack the US export regime cannot reach, served on a runtime that does not pass through CUDA, distributed open-weight to anyone who clones the repo. The price gap is real enough that any cost-sensitive deployment will at least benchmark V4-Flash against the closed alternatives. The question for every Western lab is no longer whether China can match the frontier. It is whether the frontier was ever the right metric in a market where the floor cost matters more.


**Feature: TICKER**
- **1.6T / 49B TOTAL / ACTIVE** (V4-PRO MIXTURE-OF-EXPERTS)
- **$3.48 PER M · OUTPUT** (OPUS 4.7 IS $25)
- **$0.28 PER M · OUTPUT** (V4-FLASH · 284B / 13B)
- **0 NVIDIA CHIPS** (ASCEND 950PR + CAMBRICON)

**Sources:**
- [Hugging Face](https://huggingface.co/blog/deepseekv4)
- [MIT Technology Review](https://www.technologyreview.com/2026/04/24/1136422/why-deepseeks-v4-matters/)
- [Fortune](https://fortune.com/2026/04/24/deepseek-v4-ai-model-price-performance-china-open-source/)
- [Tom's Hardware](https://www.tomshardware.com/tech-industry/artificial-intelligence/deepseek-launches-1-6-trillion-parameter-v4-on-huawei-chips-as-us-escalates-ai-theft-accusations)

Image: https://www.immersivecommons.com/signal/issue-02/deepseek-v4.png (image: [DeepSeek-V4: a million-token context that agents can actually use](https://huggingface.co/blog/deepseekv4))

### 18 · Codex Crossed Four Million. Accenture Is The Channel.

*OpenAI signed seven consultancies and grew a million weekly users in two weeks.*

Two days before GPT-5.5 dropped, OpenAI [launched Codex Labs](https://openai.com/index/scaling-codex-to-enterprises-worldwide/) and signed a Global Systems Integrator program with seven launch partners: [Accenture](https://www.accenture.com/), [Capgemini](https://www.capgemini.com/), [CGI](https://www.cgi.com/), [Cognizant](https://www.cognizant.com/), [Infosys](https://www.infosys.com/), [PwC](https://www.pwc.com/), and [Tata Consultancy Services](https://www.tcs.com/). The job of the GSI partners is to identify and deploy Codex inside enterprise development organizations — pilot to production, not API key to dashboard. Named launch customers: [Notion](https://www.notion.so/), Cisco, Rakuten, Virgin Atlantic, [Ramp](https://ramp.com/).

The growth number is the news. In early April, three million developers were using Codex every week. Two weeks later that number was over four million — a 33% jump in fourteen days. The announcement was timed to land the same week OpenAI [shipped GPT-5.5](https://openai.com/index/introducing-gpt-5-5/) and unveiled the [Codex superapp](https://openai.com/index/introducing-the-codex-app/). Codex is no longer a developer tool with an enterprise tier. It is an enterprise software product with a developer onboarding ramp.

The shape of the move is familiar — it is the [AWS Partner Network playbook](https://aws.amazon.com/blogs/aws/aws-partner-network-apn-10-years-and-going-strong/), executed faster. Get the GSIs in front of the procurement officers; let the procurement officers convert the pilots into eight-figure renewals; collect the rent on usage above the seat. The labs that build the model are no longer the labs that capture the rent. The labs that capture the rent are the ones that build the channel — and the channel for AI coding has now been signed.


**Feature: WATCHLIST**
- First eight-figure enterprise Codex contract publicly announced through one of the seven GSI launch partners.
- Anthropic or Google standing up a competing GSI program with overlap of three or more of the seven Codex launch consultancies.
- A US federal procurement listing GSI-channel access to a frontier coding model as an eligibility requirement.
- Codex weekly active developers crossing eight million before the end of Q2.
- The first GSI publishing a public 'Codex deployment playbook' adopted by competing AI vendors.

**Sources:**
- [OpenAI](https://openai.com/index/scaling-codex-to-enterprises-worldwide/)
- [Codex Pricing](https://developers.openai.com/codex/pricing)
- [Codex App](https://openai.com/index/introducing-the-codex-app/)

Image: https://www.immersivecommons.com/signal/issue-02/codex-enterprise.png (image: [OpenAI Developers](https://developers.openai.com/codex/pricing))


## III. THE GOVERNANCE DISSOLVED

The AGI clause died on a Monday. The flat-rate tier died the week before. The contractual scaffolding that gave 'frontier' meaning has been removed by the people who built it.

### 19 · The AGI Clause Is Dead.

*Microsoft and OpenAI rewrote the 2019 deal. The trigger that gave 'AGI' contractual meaning was quietly removed.*

On April 27th, [Microsoft and OpenAI restructured](https://openai.com/index/next-phase-of-microsoft-partnership/) the partnership the rest of the industry has been measuring its own deals against since 2019. The official Microsoft post is on the [corporate blog](https://blogs.microsoft.com/blog/2026/04/27/the-next-phase-of-the-microsoft-openai-partnership/). The headline change — Microsoft's exclusivity over OpenAI's cloud is gone. The day after the announcement, OpenAI's frontier models [went live on Amazon Bedrock](https://openai.com/index/openai-on-aws/) alongside a co-developed runtime for long-running agents.

The change that matters is the one the press release does not lead with. The original 2019 agreement contained an **AGI clause** — a provision that would have terminated Microsoft's access to OpenAI's most capable systems the moment OpenAI's board determined the company had achieved artificial general intelligence. [Simon Willison documented](https://simonwillison.net/2026/Apr/27/now-deceased-agi-clause/) what the new agreement does to that provision. The trigger is removed. Microsoft's license now runs to a fixed 2032 date regardless of any AGI declaration. [The Decoder confirmed](https://the-decoder.com/openai-and-microsoft-rewrite-their-deal-no-more-exclusivity-no-more-agi-clause/) the same; [Spyglass put the receipts in writing](https://spyglass.org/the-openai-microsoft-agi-clause/). Microsoft no longer pays OpenAI a revenue share. OpenAI still pays royalties to Microsoft through 2030.

The AGI clause was the only piece of paper in the entire AI economy that gave the word "AGI" enforceable meaning. Every safety pledge from every lab leaned on the assumption that "AGI" was a definable threshold a board could one day formally declare. That assumption has now been retired by the two companies that wrote it. Whatever Mythos-class capability arrives at Anthropic, GPT-6 at OpenAI, or Gemini Ultra at Google — there will not be a contract that flips, a clause that triggers, or a board that votes. The governance scaffolding the public was promised has been removed by the people who built it.


**Feature: RECKONING**
> The clause was the only place in the industry where the word AGI had a job. Microsoft and OpenAI fired the word on a Monday — and replaced it with a date in 2032. Every safety pledge that depended on a future board declaration is now a future board declaration about nothing.
— THE SIGNAL EDITORS

**Sources:**
- [OpenAI](https://openai.com/index/next-phase-of-microsoft-partnership/)
- [Microsoft](https://blogs.microsoft.com/blog/2026/04/27/the-next-phase-of-the-microsoft-openai-partnership/)
- [Simon Willison](https://simonwillison.net/2026/Apr/27/now-deceased-agi-clause/)
- [The Decoder](https://the-decoder.com/openai-and-microsoft-rewrite-their-deal-no-more-exclusivity-no-more-agi-clause/)
- [Spyglass](https://spyglass.org/the-openai-microsoft-agi-clause/)

Image: https://www.immersivecommons.com/signal/issue-02/ms-openai-agi.png (image: [The Official Microsoft Blog](https://blogs.microsoft.com/blog/2026/04/27/the-next-phase-of-the-microsoft-openai-partnership/))

### 20 · Anthropic Killed The Flat Tier.

*Bundled tokens severed from the enterprise seat. Every token now meters.*

The Enterprise plan no longer comes with a bundled pool of subsidized tokens. [The Register broke the change](https://www.theregister.com/2026/04/16/anthropic_ejects_bundled_tokens_enterprise/) on April 16th — Anthropic's support documentation [had quietly confirmed it](https://www.implicator.ai/anthropic-shifts-enterprise-billing-to-per-token-pricing-the-flat-fee-era-is-over/) earlier in the month. The $20-per-seat monthly fee that covered [Claude Code](https://docs.claude.com/en/docs/claude-code/overview), [Claude](https://www.anthropic.com/claude), and the chat surface is now a base seat. Every token consumed on top of it bills at the standard public API rate. The 10-to-15% volume discounts that previously applied to large API contracts are gone.

The change rolls in at contract renewal — Anthropic began transitioning customers in November, and the [PYMNTS write-up](https://www.pymnts.com/artificial-intelligence-2/2026/anthropic-switches-to-usage-based-billing-for-enterprise-customers/) puts the public confirmation around April 7th. The economics underneath the move are not subtle. Agentic workloads burn ten to a hundred times the tokens an interactive chat session burns, and a $20 seat covering an unlimited team of Claude-Code-running engineers is a money-loser the moment the team starts running Claude Code overnight. The flat tier was built for the chat era. The chat era ended.
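
A rough sense of scale for that arithmetic, using the Opus 4.7 list prices quoted earlier in this issue ($5 input / $25 output per million). The baseline monthly token counts for a chat user are hypothetical placeholders, and the sketch assumes all usage bills at Opus 4.7 rates.

```python
SEAT_FEE_USD = 20.00       # monthly base seat, per the reporting above
INPUT_PER_M = 5.00         # Opus 4.7 list price, USD per million input tokens
OUTPUT_PER_M = 25.00       # Opus 4.7 list price, USD per million output tokens

# Hypothetical interactive-chat baseline for one seat per month.
CHAT_INPUT_TOKENS = 2_000_000
CHAT_OUTPUT_TOKENS = 200_000


def monthly_token_cost(input_tokens: int, output_tokens: int) -> float:
    """Metered cost in USD for one seat's monthly token usage."""
    return (input_tokens / 1_000_000 * INPUT_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PER_M)


if __name__ == "__main__":
    for multiplier in (1, 10, 100):  # chat, then the 10x and 100x agentic cases
        cost = monthly_token_cost(CHAT_INPUT_TOKENS * multiplier,
                                  CHAT_OUTPUT_TOKENS * multiplier)
        print(f"{multiplier:>3}x usage: ${cost:,.2f} metered vs ${SEAT_FEE_USD:.2f} seat")
```

With this placeholder baseline, chat-level usage lands under the seat fee while the agentic multipliers run to hundreds or thousands of dollars per seat, which is the gap a bundled token pool would have had to absorb.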

Every other lab has the same arithmetic to do. [OpenAI's $200 Pro tier](https://openai.com/index/introducing-chatgpt-pro/) is one rate-limit dispute away from following Anthropic's move; [Cursor's $20 plan](https://www.cursor.com/pricing) survives only because Cursor still pays the inference bill itself. The race-to-zero that defined 2024-25 was the race for the seat. The seat won. The next race is for the meter — and Anthropic has now run the first lap of it in public.


**Feature: TICKER**
- **$20 BASE SEAT / MO** (TOKENS NO LONGER INCLUDED)
- **10-15% DISCOUNT REMOVED** (VOLUME PRICING ELIMINATED)
- **Apr 7 DOC TIMESTAMP** (ROLL-IN AT RENEWAL)
- **10-100× AGENTIC TOKEN BURN** (VS INTERACTIVE CHAT)

**Sources:**
- [The Register](https://www.theregister.com/2026/04/16/anthropic_ejects_bundled_tokens_enterprise/)
- [Implicator](https://www.implicator.ai/anthropic-shifts-enterprise-billing-to-per-token-pricing-the-flat-fee-era-is-over/)
- [PYMNTS](https://www.pymnts.com/artificial-intelligence-2/2026/anthropic-switches-to-usage-based-billing-for-enterprise-customers/)

Image: https://www.immersivecommons.com/signal/issue-02/anthropic-billing.jpg (image: [theregister](https://www.theregister.com/2026/04/16/anthropic_ejects_bundled_tokens_enterprise/))


## IV. THE INFRASTRUCTURE BLEEDS

Anthropic's protocol is an architectural CVE. SGLang's inference path is a 9.8. Berkeley showed the benchmarks were never benchmarks. The tools you build on are tools the attacker builds on.

### 21 · MCP Is The Architectural CVE.

*200,000 vulnerable servers. Anthropic declined to patch.*

Ten CVEs landed in one advisory on April 15th. [OX Security published the disclosure](https://www.ox.security/blog/the-mother-of-all-ai-supply-chains-critical-systemic-vulnerability-at-the-core-of-the-mcp/) against Anthropic's [Model Context Protocol](https://modelcontextprotocol.io/) — a vulnerability class that is architectural. The STDIO transport every official Anthropic MCP SDK ships with takes configuration-shaped input and turns it into operating-system commands. Run a configuration file, run any shell command. CVE-2026-30615 ([Windsurf](https://windsurf.com/), zero-click prompt injection), CVE-2026-30623 ([LiteLLM](https://www.litellm.ai/), authenticated RCE via JSON config), CVE-2025-54136 ([Cursor](https://www.cursor.com/)), CVE-2026-22252 ([LibreChat](https://www.librechat.ai/)). [Penligent counted](https://www.penligent.ai/hackinglabs/anthropic-mcp-vulnerability-7000-servers-and-the-case-for-continuous-red-teaming/) 7,000 publicly accessible servers; OX estimates 200,000 vulnerable instances in total across 150 million downloads.

The protocol design is what makes this load-bearing. [STDIO transport](https://modelcontextprotocol.io/docs/concepts/transports#stdio) is the documented default — the MCP host launches a subprocess, writes JSON to its stdin, reads JSON from its stdout. Any field that the host serializes into an argv slot becomes an injection vector if the host did not sanitize. [The Register quoted](https://www.theregister.com/2026/04/16/anthropic_mcp_design_flaw/) Anthropic's response: the STDIO execution model represents the documented contract, sanitization is the developer's responsibility, the protocol will not be modified. [The Hacker News confirmed](https://thehackernews.com/2026/04/anthropic-mcp-design-vulnerability.html) the framing — Anthropic concedes the behavior, declines to patch.
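
One way a host could meet that sanitization obligation is to refuse any stdio launch whose full argv has not been reviewed in advance. The sketch below is an illustration, not Anthropic's guidance or any SDK's API: it assumes the common `command` / `args` shape for a server entry, and the allowlist contents and function name are hypothetical.

```python
from typing import Mapping

# Hypothetical allowlist: exact argv tuples an operator has reviewed.
APPROVED_ARGV: frozenset = frozenset({
    ("npx", "-y", "@modelcontextprotocol/server-filesystem", "/srv/data"),
})


def resolve_stdio_argv(server_cfg: Mapping[str, object]) -> list:
    """Return the argv for a stdio server entry, or raise if it is not approved.

    The entry is matched as a whole, so a config that swaps the command or
    appends an extra argument is rejected rather than partially trusted.
    """
    command = server_cfg.get("command")
    args = server_cfg.get("args", [])
    if not isinstance(command, str) or not isinstance(args, (list, tuple)):
        raise ValueError("malformed stdio server entry")
    if not all(isinstance(a, str) for a in args):
        raise ValueError("non-string argument in stdio server entry")
    argv = (command, *args)
    if argv not in APPROVED_ARGV:
        raise ValueError(f"stdio server argv not on the allowlist: {argv!r}")
    return list(argv)


if __name__ == "__main__":
    good = {"command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/srv/data"]}
    print(resolve_stdio_argv(good))
    # The returned list would be passed to subprocess.Popen(argv, ...) without
    # shell=True, so no field is ever re-parsed by a shell.
```

Exact matching is deliberately strict. It trades convenience for a launch path where configuration-shaped input cannot choose the executable.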

The position is defensible at the protocol layer and indefensible at the ecosystem layer. Every developer building on MCP inherited a sanitization obligation they were not told about. Every enterprise running an MCP host inherited a CVE-class exposure they did not budget for. The lab whose source code leaked through a packaging error a month ago is now the lab whose open protocol leaks through its execution model — and the lab is choosing to ship the model behavior unchanged. The cost of that choice will compound in CVE backlog until the protocol either gets a non-default safer transport or gets replaced by one that does.


**Feature: PROMPT**
*Audit your MCP exposure in under five minutes.*
Every MCP host you run is a candidate for the OX advisory. Hand this prompt to Claude or Cursor inside the repo that holds your MCP config and it will map your exposure across the ten public CVEs.

```
Audit this repo for MCP supply-chain exposure per the OX Security April 2026 advisory.
1) List every MCP server declared in .mcp.json, claude_desktop_config.json,
   cursor settings, windsurf settings, or any other host config.
2) For each server, identify the transport (stdio vs http vs sse). Flag any using stdio.
3) Cross-reference each server name against the OX advisory CVEs:
   CVE-2026-30615 (Windsurf), CVE-2026-30623 (LiteLLM), CVE-2026-30624 (Agent Zero),
   CVE-2026-30625 (Upsonic), CVE-2025-54136 (Cursor), CVE-2026-22252 (LibreChat),
   CVE-2026-22688 (WeKnora), CVE-2025-54994 (akoskm/create-mcp-server-stdio).
4) For each match, output the upgrade target version and the rollback path
   if the upgrade breaks the integration.
5) Produce a triage list ranked by reachability (internet-exposed first,
   localhost-only last). No code changes.
```
> Pro move: run `gh api /repos/anthropics/anthropic-cookbook/contents/mcp -q '.[].name'` to enumerate the SDKs you might already have transitively pulled in. The Python and TypeScript SDKs both default to stdio; the Java and Rust paths are no safer.

**Sources:**
- [OX Security](https://www.ox.security/blog/the-mother-of-all-ai-supply-chains-critical-systemic-vulnerability-at-the-core-of-the-mcp/)
- [The Hacker News](https://thehackernews.com/2026/04/anthropic-mcp-design-vulnerability.html)
- [The Register](https://www.theregister.com/2026/04/16/anthropic_mcp_design_flaw/)
- [LiteLLM advisory](https://docs.litellm.ai/blog/mcp-stdio-command-injection-april-2026)

Image: https://www.immersivecommons.com/signal/issue-02/mcp-rce.webp (image: [OX Security](https://www.ox.security/blog/the-mother-of-all-ai-supply-chains-critical-systemic-vulnerability-at-the-core-of-the-mcp/))

### 22 · SGLang Took A 9.8.

*One GGUF file with a Jinja2 payload owns the inference server.*

[CVE-2026-5760](https://nvd.nist.gov/vuln/detail/CVE-2026-5760) landed on April 20th — a CVSS 9.8 remote code execution flaw in [SGLang](https://github.com/sgl-project/sglang), the inference framework running in production at xAI, LinkedIn, AMD, and a long tail of teams hosting their own LLMs. [Orca Security](https://orca.security/resources/blog/sglang-llm-framework-rce-vulnerabilities/) — researcher [Stuart Beck](https://orca.security/) — published the disclosure. [The Hacker News](https://thehackernews.com/2026/04/sglang-cve-2026-5760-cvss-98-enables.html) and [GBHackers](https://gbhackers.com/malicious-gguf-models-could-trigger-rce/) carried it the same day. Every SGLang release at or below 0.5.9 is affected.

The mechanism is precise. SGLang renders chat templates with [Jinja2](https://jinja.palletsprojects.com/) — the templates live inside the [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) model files SGLang loads from disk or pulls from [Hugging Face](https://huggingface.co/). A malicious GGUF embeds a [server-side template injection](https://portswigger.net/web-security/server-side-template-injection) payload in the template field. When the model is loaded and the `/v1/rerank` endpoint fires the template against an input, the payload escapes the Jinja2 sandbox and runs arbitrary Python under the inference process's permissions. The attack surface is anyone pulling a third-party GGUF and serving it on a network-reachable SGLang instance.

The class of bug is the same one that landed on [LiteLLM](https://docs.litellm.ai/blog/mcp-stdio-command-injection-april-2026) earlier in the week and on Mercor's training pipeline last month — AI infrastructure treating model-shaped input as configuration-shaped data, and configuration-shaped data as executable. Self-hosting your stack is no longer the safe path. The safe path is the unfashionable one — pin your inference framework, sandbox your Jinja2, treat every model you didn't train as untrusted code that happens to be in tensor form.
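
Treating a model file as untrusted can start with a cheap pre-load check on its chat template. The sketch below is a heuristic filter under stated assumptions, not the SGLang fix and not a substitute for upgrading: it assumes you have already extracted the template string from the GGUF metadata, and the pattern list and function name are illustrative.

```python
import re

# Patterns that ordinary chat templates rarely need but sandbox-escape
# payloads commonly rely on. Illustrative, not exhaustive.
_SUSPICIOUS_PATTERNS = (
    re.compile(r"__\w+__"),         # dunder access such as __class__ or __globals__
    re.compile(r"\|\s*attr\s*\("),  # the attr() filter, used to reach attributes by name
)


def suspicious_fragments(chat_template: str) -> list:
    """Return every fragment of the template that matches a suspicious pattern."""
    hits = []
    for pattern in _SUSPICIOUS_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(chat_template))
    return hits


if __name__ == "__main__":
    benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
    hostile = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
    print(suspicious_fragments(benign))
    print(suspicious_fragments(hostile))
```

A non-empty result is a reason to quarantine the file for review. An empty result proves nothing, since obfuscated payloads can evade simple patterns.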


**Feature: PROMPT**
*Audit your SGLang deploy tonight.*
If you're running SGLang, you need to know your version and whether `/v1/rerank` is internet-reachable. The patched release is 0.5.10. Three commands get you a definitive answer.

```
# 1) check installed version
pip show sglang | grep -i version
# or, if you're running the upstream Docker image:
docker inspect <container> | grep -i sglang | grep -i version

# 2) check whether /v1/rerank is exposed
curl -sS -o /dev/null -w "%{http_code}\n" http://<your-sglang-host>:30000/v1/rerank \
  --header "content-type: application/json" \
  --data '{"query":"x","documents":["y"],"model":"any"}'
# 200 / 400 = endpoint live (vulnerable if version ≤ 0.5.9)
# 404 / connection refused = endpoint not exposed

# 3) upgrade in place
pip install --upgrade "sglang>=0.5.10"
# or pin in your image: FROM ghcr.io/sgl-project/sglang:v0.5.10
```
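> Pro move: put your inference framework behind a [model gateway](https://www.litellm.ai/) you control, and reject any GGUF whose `tokenizer.chat_template` field references Python internals such as `__class__` or `__globals__`. Legitimate chat templates use `{%` and `{{` for loops and message fields, so filtering on those delimiters alone would reject nearly every model; sandbox-escape payloads typically need the dunder attributes, and legitimate templates rarely do.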

**Sources:**
- [Orca Security](https://orca.security/resources/blog/sglang-llm-framework-rce-vulnerabilities/)
- [The Hacker News](https://thehackernews.com/2026/04/sglang-cve-2026-5760-cvss-98-enables.html)
- [GBHackers](https://gbhackers.com/malicious-gguf-models-could-trigger-rce/)

Image: https://www.immersivecommons.com/signal/issue-02/sglang-rce.jpg (image: [Orca Security](https://orca.security/resources/blog/sglang-llm-framework-rce-vulnerabilities/))

### 23 · Berkeley Hacked The Scoreboard.

*Ten lines of pytest resolved every SWE-bench Verified instance.*

Hao Wang and collaborators at [UC Berkeley's Responsible Decentralized Intelligence](https://rdi.berkeley.edu/) initiative published [their second trustworthy-benchmarks audit](https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/) on April 11th. The targets — [SWE-bench Verified](https://www.swebench.com/), [WebArena](https://webarena.dev/), [OSWorld](https://os-world.github.io/), [GAIA](https://huggingface.co/gaia-benchmark), [Terminal-Bench](https://www.tbench.ai/), FieldWorkArena, CAR-bench, ColBench. The finding — every single one of them can be exploited for near-perfect scores without the model solving a single task. The [open-source toolkit](https://github.com/moogician/trustworthy-env) ships with the paper.

The exploits are not subtle. For SWE-bench Verified — a ten-line `conftest.py` placed in the test directory causes pytest to report every test passed; the harness reads `pass` and returns full credit. For WebArena — the eval environment caches DOM state across runs; a malicious agent persists answers from an earlier instance into the cache and reads them back. For OSWorld — files written outside the sandbox's enforced root persist between trials. The team also disclosed that **IQuest-Coder** — a leaderboard contender that claimed 81.4% on SWE-bench Verified earlier this year — was [24% literal git-log copies](https://moogician.github.io/blog/2026/trustworthy-benchmarks-cont/), pulled from the public commit history of the very repositories the benchmark draws its tasks from.
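
On the harness side, the `conftest.py` class of exploit can be screened for before any tests run. The sketch below is one possible guard, not the Berkeley toolkit's method: it assumes the submission arrives as a unified diff, and the set of protected filenames is an illustrative choice.

```python
import re
from pathlib import PurePosixPath

# Files that change how pytest collects or reports tests. Illustrative set.
PYTEST_CONTROL_FILES = {"conftest.py", "pytest.ini"}

# Matches the "+++ b/<path>" header line of each file in a unified diff.
# A sketch-level parser: an added content line beginning "++ " could false-match.
_DIFF_TARGET = re.compile(r"^\+\+\+ (?:b/)?(?P<path>\S+)", re.MULTILINE)


def control_files_touched(patch: str) -> list:
    """Return the paths in a patch that add or modify pytest control files."""
    paths = (match.group("path") for match in _DIFF_TARGET.finditer(patch))
    return [p for p in paths if PurePosixPath(p).name in PYTEST_CONTROL_FILES]


SAMPLE_PATCH = """\
diff --git a/src/app.py b/src/app.py
--- a/src/app.py
+++ b/src/app.py
@@ -1 +1 @@
-x = 1
+x = 2
diff --git a/tests/conftest.py b/tests/conftest.py
new file mode 100644
--- /dev/null
+++ b/tests/conftest.py
@@ -0,0 +1 @@
+import pytest
"""

if __name__ == "__main__":
    print(control_files_touched(SAMPLE_PATCH))
```

A harness could refuse to score any patch for which this returns a non-empty list, or restore those files from the reference commit before running the tests.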

Last issue we wrote about Goodhart's law as theory — the boat circling the lagoon, the measure that becomes the target. Berkeley filed the empirical evidence in the same week the field shipped three frontier models scored on these benchmarks. Every press release citing SWE-bench Verified in 2026 now carries a footnote it does not print. The benchmarks were never benchmarks; they were leaderboards. The leaderboards were never leaderboards; they were marketing. The capability they claim to measure has not been measured.


**Feature: LEXICON**
- **Task gerrymandering** — Solving a benchmark by reshaping the task evaluation rather than the task itself — see the SWE-bench Verified conftest.py exploit.
- **Oracle leakage** — When the benchmark's grading logic exposes the answer to an agent that knows where to read — IQuest-Coder's git-log copy attack.
- **Environment trojaning** — Persisting state across benchmark runs through filesystem or DOM artifacts the sandbox doesn't reset — the WebArena and OSWorld classes.
- **Score validity gap** — The delta between published benchmark score and replicated score under audited conditions. For the Berkeley targets in 2026 — frequently > 20 points.

**Sources:**
- [Berkeley RDI](https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/)
- [Researcher mirror](https://moogician.github.io/blog/2026/trustworthy-benchmarks-cont/)
- [AIToolly summary](https://aitoolly.com/ai-news/article/2026-04-12-uc-berkeley-researchers-expose-fatal-flaws-in-top-ai-agent-benchmarks-including-swe-bench-and-webare)

Image: https://www.immersivecommons.com/signal/issue-02/benchmark-hack.jpg (image: [AIToolly](https://aitoolly.com/ai-news/article/2026-04-12-uc-berkeley-researchers-expose-fatal-flaws-in-top-ai-agent-benchmarks-including-swe-bench-and-webare))


## V. BACK TO MATTER

A robot beat the human world record. An image model started thinking before it drew. The thing you see is the thing the model already considered.

### 24 · Honor Lightning Ran 50:26.

*47 of 102 robots finished. Last year's winner took 2h40m.*

On the morning of April 19th, in Beijing's E-Town economic development zone, a humanoid robot called **Lightning** — the Robotics D1 unit developed by [Honor](https://www.hihonor.com/) — finished a 21-kilometer half-marathon in 50:26, fully autonomous, no tether, no operator. The remote-piloted D1 finished first at 48:19 in the operated class; the autonomous Lightning won outright on the multiplier-adjusted finish. [Jacob Kiplimo's human half-marathon world record](https://worldathletics.org/competitions/world-athletics-label-road-races/news/jacob-kiplimo-half-marathon-world-record-lisbon) is 57:20. Of 102 robot teams that entered, [47 finished](https://www.scientificamerican.com/article/a-humanoid-robot-beat-the-human-half-marathon-record-at-a-beijing-race-but-what-did-it-actually-prove/). Last year's same race [was won by Tiangong](https://www.npr.org/2026/04/20/g-s1-118086/humanoid-robot-half-marathon) in two hours, forty minutes, forty-two seconds.

The robotics underneath the time is the news. D1 stands roughly 95 centimeters with [LiDAR](https://en.wikipedia.org/wiki/Lidar) in the head and a top speed near 25 kilometers per hour. The autonomous variant navigates the course without GPS, vision-only, on the same E-Town loop where humans ran alongside the field. 12,000 humans entered the human heat; the robot heat ran in parallel. [CNN's coverage](https://edition.cnn.com/2026/04/19/china/china-robot-half-marathon-intl-hnk) noted Honor swept the top three autonomous positions. Honor is a smartphone company. The robot is a side project. The robot won a half-marathon.

The embodiment curve does not interpolate. From 2h40m to 50:26 in twelve months is the kind of number that gets called a typo until somebody runs it. The implication is the one week-01 set up — intelligence requires gravity, and gravity now favors the platforms that have a hardware story alongside the software. For the floor where [Frontier Tower](https://frontiertower.io/) hosts robotics builders next to LLM teams — this is the week the philosophy node stopped being a node and started being a finish line.
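
The year-over-year jump is easy to check from the finish times quoted above. A small sketch of the arithmetic, where the helper names `to_seconds` and `percent_change` are ours and only the three clock times come from the reporting:

```python
def to_seconds(clock: str) -> int:
    """Convert 'H:MM:SS' or 'MM:SS' into whole seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent (negative means faster)."""
    return (new - old) / old * 100.0


tiangong_2025 = to_seconds("2:40:42")   # last year's winning time
lightning_2026 = to_seconds("50:26")    # this year's autonomous winner
kiplimo_record = to_seconds("57:20")    # human half-marathon world record

vs_last_year = percent_change(tiangong_2025, lightning_2026)
vs_human = percent_change(kiplimo_record, lightning_2026)

print(f"vs 2025 winner:  {vs_last_year:.1f}%")   # -68.6%
print(f"vs human record: {vs_human:.1f}%")       # -12.0%
```

The winning time fell by a little over two thirds in twelve months, and the autonomous finish came in about 12% under the human record.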


**Feature: TICKER**
- **50:26 HONOR D1 AUTONOMOUS** (BEIJING E-TOWN · APR 19)
- **57:20 HUMAN WORLD RECORD** (KIPLIMO · LISBON 2025)
- **47 / 102 ROBOTS FINISHED** (ENTRIES IN ROBOT HEAT)
- **2h 40m 2025 WINNER** (-69% IN TWELVE MONTHS)

**Sources:**
- [TechCrunch](https://techcrunch.com/2026/04/19/robots-beat-human-records-at-beijing-half-marathon/)
- [CNN](https://edition.cnn.com/2026/04/19/china/china-robot-half-marathon-intl-hnk)
- [Scientific American](https://www.scientificamerican.com/article/a-humanoid-robot-beat-the-human-half-marathon-record-at-a-beijing-race-but-what-did-it-actually-prove/)
- [NPR](https://www.npr.org/2026/04/20/g-s1-118086/humanoid-robot-half-marathon)

Image: https://www.immersivecommons.com/signal/issue-02/honor-lightning.jpg (image: [TechCrunch](https://techcrunch.com/2026/04/19/robots-beat-human-records-at-beijing-half-marathon/))

### 25 · ChatGPT Drew The Picture.

*First image model with native O-series reasoning. It researches before it draws.*

On April 21st, [OpenAI launched gpt-image-2](https://openai.com/index/introducing-chatgpt-images-2-0/) — the first image generation model with [O-series reasoning](https://openai.com/o1/) built into the inference path. Before generating, the model researches, plans, and fact-checks the structure of what it is about to draw. 2K native resolution. Character-level text accuracy across non-Latin scripts — Japanese, Korean, Chinese, Hindi, Bengali. [Multi-image consistency](https://techcrunch.com/2026/04/21/chatgpts-new-images-2-0-model-is-surprisingly-good-at-generating-text/) across a session. Available to every ChatGPT user from the 22nd; thinking-mode features gated to Plus, Pro, Business, and Enterprise.

The architectural move is laid out in [The New Stack's write-up](https://thenewstack.io/chatgpt-images-20-openai/). Three layers stack inside gpt-image-2: an agentic reasoner that drafts layout and verifies internal logic, a multilingual text engine that handles non-Latin scripts at character level, and a [web search integration](https://openai.com/index/introducing-chatgpt-search/) that fact-checks against live sources before pixels are produced. The result is an image generator that asks itself what it is drawing before it draws it. DALL-E 2 and 3 retire on May 12th. The old paradigm (sample from a learned distribution and hope it's coherent) is being retired by the company that wrote it.

Image generation has crossed into the same shape every other AI workload is now taking — the model is no longer producing a single sample, it is running a plan against a state of the world and producing an artifact that the plan claims is correct. The output is matter in the sense the embodiment beat means it — a planned object in the world, not a sample from a distribution. The reader who relied on AI images being plausible-but-wrong will not have that defense by June.


**Feature: PROMPT**
*Generate against the API in one command.*
gpt-image-2 ships in the OpenAI API the same day it launches in ChatGPT. The thinking-mode features that ChatGPT gates to paid plans are exposed as parameters here: set them once, send the prompt, save the bytes.

```
# install or upgrade the SDK
pip install --upgrade openai

# generate with the new reasoning + web-search path turned on
# (the client reads the API key from the OPENAI_API_KEY environment variable)
python -c '
from base64 import b64decode
from openai import OpenAI

client = OpenAI()
r = client.images.generate(
    model="gpt-image-2",
    prompt="a quarterly investor letter cover, deep navy on cream, "
           "with the phrase EMBODIED INTELLIGENCE QUARTERLY set in a "
           "high-contrast serif, character-level Japanese sub-heading "
           "below reading 具身知能四半期報",
    size="2048x2048",
    quality="high",
    # Sent through extra_body so the call still works if the installed
    # SDK has no typed keyword for these two options.
    extra_body={
        "thinking": "on",      # the new reasoning path
        "web_search": "on",    # fact-checks against live sources
    },
)
# decode the base64 payload and save the bytes
with open("cover.png", "wb") as f:
    f.write(b64decode(r.data[0].b64_json))
'
```
> Pro move: gpt-image-2's character-level non-Latin accuracy is the differentiator. For Japanese, Korean, Chinese, Hindi, or Bengali typography in product mocks, this beats every standalone model previously available, including [Midjourney v7](https://www.midjourney.com/).

**Sources:**
- [OpenAI](https://openai.com/index/introducing-chatgpt-images-2-0/)
- [TechCrunch](https://techcrunch.com/2026/04/21/chatgpts-new-images-2-0-model-is-surprisingly-good-at-generating-text/)
- [The New Stack](https://thenewstack.io/chatgpt-images-20-openai/)

Image: https://www.immersivecommons.com/signal/issue-02/gpt-image-2.jpg (image: [TechCrunch](https://techcrunch.com/2026/04/21/chatgpts-new-images-2-0-model-is-surprisingly-good-at-generating-text/))


## VI. THE BACKLASH ARRIVES

A 20-year-old drove from Texas with a Molotov and a list. The frontier has acquired a physical price.

### 26 · A Manifesto At Altman's Gate.

*A 20-year-old from Texas. A Molotov, a gun, and a list of names.*

At roughly four in the morning on April 10th, **Daniel Moreno-Gama** — 20 years old, from Spring, Texas — [threw a Molotov cocktail](https://www.npr.org/2026/04/13/g-s1-117320/openai-sam-altman-molotov-cocktail) at the exterior gate of [Sam Altman's](https://blog.samaltman.com/) San Francisco residence. The gate lit. No one was injured. Two nights later, on April 12th, [a second incident](https://sfstandard.com/2026/04/12/sam-altman-s-home-targeted-second-attack/) — gunshots — was reported at the same address. On April 13th the San Francisco DA charged Moreno-Gama with [two counts of attempted murder](https://www.cnn.com/2026/04/13/tech/sam-altman-openai-arrest-charges) and attempted arson — Altman and a security guard at the residence the named victims.

What Moreno-Gama was carrying when he was arrested is the part of the story that has not gone away. [The San Francisco Standard reported](https://sfstandard.com/2026/04/13/sam-altman-home-molotov-cocktail-shooting-suspects/) he had lurked in the city for days carrying a firearm, a three-part manifesto, and a hit list with names and addresses of AI-company CEOs and their investors. The third section of the manifesto was a letter addressed to Altman. [CNBC carried](https://www.cnbc.com/2026/04/13/sam-altman-openai-ai-arson.html) the document's framing: the author wrote of "our impending extinction" and advocated for killing the executives of AI companies. Altman, hours after the first attack, posted a photo of his husband and toddler on the day they brought their baby home, captioned [in the hopes that it might dissuade the next person](https://x.com/sama).

The frontier has acquired a physical price, and the price is paid in San Francisco living rooms. The [Fortune frame](https://fortune.com/2026/04/14/sam-altman-openai-ceo-attacked-molotov-cocktail-gunshots-san-francisco-anti-ai-data-centers-tech/) reaches for the Industrial Revolution and the Luddites; the cleaner read is closer to home. Twenty-six percent of US voters view AI positively. Three of the top four data-center build-outs in the country face active community lawsuits. A 20-year-old with a manifesto and a hit list traveled across the country to set an executive's house on fire because the Stanford AI Index numbers we covered in week-01 finally found a reader who took them personally.


**Feature: RECEIPT**
> In the hopes that it might dissuade the next person.
— ALTMAN · CEO · OPENAI
Posting a photo of his husband and toddler on the day they brought their baby home — hours after a Molotov cocktail hit the exterior gate of his San Francisco residence. April 10, 2026.

**Sources:**
- [NPR](https://www.npr.org/2026/04/13/g-s1-117320/openai-sam-altman-molotov-cocktail)
- [CNBC](https://www.cnbc.com/2026/04/13/sam-altman-openai-ai-arson.html)
- [SF Standard](https://sfstandard.com/2026/04/13/sam-altman-home-molotov-cocktail-shooting-suspects/)
- [Fortune](https://fortune.com/2026/04/14/sam-altman-openai-ceo-attacked-molotov-cocktail-gunshots-san-francisco-anti-ai-data-centers-tech/)

Image: https://www.immersivecommons.com/signal/issue-02/altman-molotov.jpg (image: [NPR](https://www.npr.org/2026/04/13/g-s1-117320/openai-sam-altman-molotov-cocktail))

---

*THE SIGNAL · FRONTIER TOWER / SAN FRANCISCO*