Payback's a Bitch, Baby

Government takes over AI product release schedule and scope

Keith Teare

Jun 27, 2026

This week’s video transcript summary is here. You can click on any bulleted section to see the actual transcript. Thanks to Granola for its software.

Editorial

Payback’s a bitch, baby. Apologies for the blatantly sexist language, but it seemed to capture the theme of the week, and possibly of the year.

The government is not merely getting involved in the conversation about AI. That has been true for years. No, the frontier AI companies invited the government into the room, and now the government is beginning to behave as if it owns the door, the guest list, the schedule, and the product roadmap.

This is not payback as punishment from critics. It is payback as the consequence of getting what you asked for, especially for Dario Amodei. Now, with the delayed rollout of GPT-5.6 and constraints on who gets it, the wheel has come full circle.

For several years the leaders of the frontier labs have argued that AI is unusually powerful, unusually risky, and unusually deserving of state attention. Sam Altman asked government to take the risks seriously. Dario Amodei has done the same, and has also tried to shape the acceptable uses of AI in sensitive domains such as defense. Whether each intervention was made in good faith is not the point. The point is that the companies helped create the premise that frontier AI is not normal technology. It is too important to be left to ordinary markets, ordinary product judgment, or ordinary scientific process.

Government heard the second half of that argument. It heard danger, exceptionalism, and national consequence. It heard an invitation. To be honest, it was reluctant to play the requested role. But with Anthropic acting tartly and OpenAI being obliging, it took the bait.

Now the answer is arriving in the bluntest possible form: product release control. As of yesterday, both who gets a model and when are under government control.

OpenAI’s own GPT-5.6 system card presents the model family as a broad-access technology that is wrapped in safety gates. It says Sol, Terra, and Luna remain below OpenAI’s Critical thresholds, while still reaching High capability in biological and chemical domains, cybersecurity, and AI self-improvement. The intended compromise is visible in the language of access:

broad access protected by baseline systems

and for riskier use:

stronger verification, accountability, and trust signals

That sounds like OpenAI was trying to keep responsibility inside the product and its operating environment. That is the right course, I believe.

But this week the story moved beyond the company. Two new reports make the control point concrete.

Axios has the OpenAI side. GPT-5.6 is now being rolled out first to about 20 companies whose participation has been approved by the government. The article says OpenAI is “limiting access to all three versions of the new model at the behest of the U.S. government.” OpenAI itself adds the warning label: “We don’t believe this kind of government access process should become the long-term default.”

Semafor has the Anthropic side. Commerce has lifted its block on Claude Mythos 5 for more than 100 US institutions, after deciding that “appropriate safeguards are in place to permit certain trusted partners” to access the model. Fable 5 remains unresolved, but Anthropic has committed to work with the US government on “protocols and standards and releases” for its models.

TechCrunch frames the two stories as the same new regime, with access to frontier models approved:

customer by customer

The Washington Post sharpens the point. It reports that the federal government will vet companies seeking access to OpenAI’s latest ChatGPT upgrade.

That is a very different control point. It is not a model gateway, a usage policy, a payment rail, an audit trail, a security review, or a liability regime. It is permission to receive the model at all.

Dean Ball names the procedural problem. If government is going to control release, then it must say what standard it is applying. I believe the companies should be trusted to be in charge of release, but that moment has clearly passed.

Why oppose the government stepping in?

The question is not whether safety matters. Of course it does. The question is whether release decisions are governed by clear tests, review procedures, appeal paths, and public criteria, or whether the default answer becomes no because nobody can say what yes requires. A frontier-model regime without legible standards is not just safety policy. It is discretionary industrial policy, and it adds chaos and friction that stand in the way of innovation.

Henry Farrell’s piece explains why this matters politically. AI regulation is not disappearing. It is migrating into national-security tools, export controls, executive discretion, trade restrictions, industrial policy, immigration rules, research funding, and state preemption. In that world, regulation does not always look like a statute or a notice-and-comment process. It can look like strategic ambiguity, where companies comply before the line is even drawn.

The intervention is at the wrong layer.

The right question is not whether frontier AI needs governance. It obviously does. The right question is where governance should live. This week’s strongest AI pieces point toward a different answer: responsibility should sit where AI operates in the world, at the point of sale and consumption.

Raphael’s payment-rail essay gets this exactly right. Once software can act on behalf of people, the issue is no longer a prettier chatbot or a smoother checkout page. It is permission, settlement, identity, fraud, and liability. The valuable layer decides whether an agent may spend money, how much, where, how often, with which credential, and with what audit trail. That is governance at the point of action.

Alex Lazarow makes the same point through the language of trust. Earlier technologies produced auditors, insurers, certificate authorities, underwriters, and standards bodies. AI is doing the same now because the question is changing from whether a model’s output looks right to whether someone can stand behind the work an autonomous system just performed. The trust layer is not decorative. It is the civilizational plumbing that lets powerful systems become ordinary enough to use. So far OpenAI and Anthropic (as well as Google and the Chinese models) have done a good job of that.

Kong CEO Augusto Marietti brings it down to enterprise infrastructure. The issue is who controls cost, policy, routing, and observability when thousands of employees and agents call models all day.

An AI gateway is a control surface. It can route simple prompts to cheaper models, compress context, cache common requests, enforce security rules, and prevent every request from defaulting to the most expensive or most dangerous capability. It can also police outputs and actions.
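To make the gateway idea concrete, here is a minimal Python sketch under toy assumptions: the model names `small-model` and `frontier-model`, the word-count routing threshold, and the blocked-term list are hypothetical placeholders, not Kong’s product or any real gateway’s API. It applies three of the controls described above in order: enforce a security rule, serve repeats from a cache, then route by prompt size while logging each decision.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model tiers; these names are placeholders, not real products.
CHEAP_MODEL = "small-model"
EXPENSIVE_MODEL = "frontier-model"

# Hypothetical security rule: refuse prompts that appear to contain secrets.
BLOCKED_TERMS = ("api_key=", "BEGIN PRIVATE KEY")


@dataclass
class Gateway:
    """Toy AI gateway: policy check, then cache, then cost-aware routing."""
    max_cheap_words: int = 50  # assumed threshold for the cheap tier
    cache: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def choose_model(self, prompt: str) -> str:
        # Short prompts go to the cheaper model; everything else escalates.
        if len(prompt.split()) <= self.max_cheap_words:
            return CHEAP_MODEL
        return EXPENSIVE_MODEL

    def handle(self, prompt: str, call_model: Callable[[str, str], str]) -> str:
        # 1. Enforce the security rule before any model sees the prompt.
        if any(term in prompt for term in BLOCKED_TERMS):
            self.log.append(("blocked", None))
            return "REQUEST BLOCKED BY POLICY"
        # 2. Serve identical repeat requests from the cache.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            self.log.append(("cache_hit", None))
            return self.cache[key]
        # 3. Route, call the model, cache the answer, and log for observability.
        model = self.choose_model(prompt)
        answer = call_model(model, prompt)
        self.cache[key] = answer
        self.log.append(("routed", model))
        return answer


if __name__ == "__main__":
    # Stand-in for a real model API call.
    fake_model = lambda model, prompt: f"[{model}] ok"
    gw = Gateway()
    print(gw.handle("Summarize this memo", fake_model))
    print(gw.handle("Summarize this memo", fake_model))
    print(gw.log)
```

A production gateway would route on richer signals than word count, but the ordering is the design point: policy runs before spend, and every decision leaves a log entry.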

Andrej Karpathy’s post makes a version of the same argument. The next AI interface is not merely a website or local app. It is:

a self-contained, persistent, asynchronous entity with org-wide tools and context, working alongside teams of humans.

If that is right, then the governance problem is not solved by deciding who gets a model preview. The governance problem becomes how organizations grant memory, context, permissions, tools, budget, identity, persistence, and authority to a new class of software actor. That is not model-release policy. That is operating policy. It is also about the agent joining the team and engaging with human processes.

Nathan Lambert’s TMax post shows the research version. He argues that progress in terminal agents now depends on recipe work: data, algorithms, harnesses, infrastructure, pitfalls, baselines, and reproducible decision steps. Agent systems do not become reliable because someone writes a better slogan about safety. They become reliable because the environment around them can measure, test, retry, evaluate, and improve.

Nilesh Barla’s self-improving agent piece says the same thing in production terms. The improvement is not in the model weights alone. It is in the harness around the model: tools, memory, evaluation loops, telemetry, and environment design. The system that learns is the system whose operating environment can observe, judge, retain, and reuse behavior.

This is why the government-vetting model feels both inappropriate and premature. It tries to control distribution before we have properly built the downstream institutions of use.

It also risks confusing access with responsibility. If a company receives GPT-5.6, has responsibility been solved? No. If another company is denied access, has risk disappeared? No. Models matter, but agency happens when models are connected to tools, money, code, data, workflows, identity, permissions, and people. That is where the consequences appear. That is where the controls should be strongest. Much of my own work with AI (mainly Codex and Claude Code) is iterating skills through trial and error until they are honed. The whole process creates an operating canvas that I control.

Exponential View estimates that generative AI produced $110 billion in customer revenue over the past 12 months and is running above a $175 billion annualized rate. It also says AI-linked CapEx has added $535 billion above the pre-AI trend by 2026, while 2026 depreciation approaches $111 billion. The demand is real, but so is the bill. If AI becomes a utility, the utility must pay for itself. Government intervention will slow the repayment of that investment.

The venture pieces fit the same pattern. Alfred Lin reminds us that:

You make the most money when you are right and contrarian.

But contrarianism only matters if it is attached to operating capability. If the government sees its role as setting the consensus, don’t bet on contrarianism.

That is also the lesson for AI governance. Capital is not enough. Models are not enough. Government permission is not enough. The system has to work. Self-governance across borders may be the right path.

The human question is still present. Andrew Keen’s interview with Kate O’Neill opens with the warning:

They profit when you think the chatbot cares.

That line is important because it names the moral hazard beneath the product excitement. The more human the interface feels, the more important it becomes to know who is responsible behind it. Humanism in the AI age cannot mean pretending machines have inner life. It has to mean designing systems that keep human agency, accountability, and meaning intact even as software gets more capable.

So yes, payback’s a bitch. The frontier labs asked government to treat AI as exceptional. Government is now doing exactly that. The danger is that the state will choose the easiest visible lever: model access. That may satisfy political anxiety, but it will not build the trust layer, payment layer, audit layer, gateway layer, liability layer, security layer, or operational layer that real deployment requires.

The companies should not outsource responsibility upward and then act surprised when the state keeps the keys. The government should not pretend release permission is the same as governance. And the rest of us should not confuse fear with seriousness.

AI needs controls. But the controls should follow the work. They should live where agents act, where money moves, where code ships, where infrastructure strains, where risk is insured, where failures are audited, and where humans decide what kind of civilization this technology is supposed to serve.

Editorial
Essays
AI
Venture
Regulation
Infrastructure
Interview of the Week
- What Makes Us Human?
Startup of the Week
- He made your free video player run smoothly. Now he’s doing that for robots.
Post of the Week
- TMax: An open RL recipe for terminal agents

Essays

Technology, Capital and Skills

Paul Krugman revisits the old economics question of whether powerful new technology helps workers or hurts them, using David Ricardo’s change of mind about industrial machinery as the frame. Ricardo first dismissed workers’ fears about mechanization, then later wrote that substituting machinery for labor could be “very injurious to the interests of the class of labourers.” Krugman translates that into modern terms as capital-biased technological change: innovation that can raise GDP while reducing demand for labor and pushing wages down.

The post’s intent is to connect that 19th century debate to AI. Krugman says his own views have shifted since he last wrote about the subject a few months earlier. He is now somewhat less pessimistic about the most extreme scenarios in which AI simply transfers income from labor to capital, but more concerned about how AI changes the returns to skills that have traditionally been valuable. The visible portion of the post sets up two questions for the full essay: whether AI will be capital-biased, and how it will affect the market for skills.

The caveat is access. The full argument is partly behind Substack’s paywall, so the summary here is limited to the public opening and its stated questions rather than Krugman’s detailed answers.

How not to forget what matters

Author: Henrik Karlsson Published: June 22, 2026

Henrik Karlsson writes about why important insights fade and how older practices of self-writing were designed to keep them active. The essay begins with a familiar experience: reading something that feels life-changing, remembering it for a few days, and then losing it as attention returns to small problems, news, and new ideas. Karlsson links that condition to the Latin term stultitia, drawing on Foucault’s description of a mind pulled forward by novelty and unable to keep “a fixed point” in acquired truth.

The practical center of the essay is the ancient practice of hypomnemata: notebooks of quotations, examples, conversations, and observations that readers returned to each morning as a form of meditation. Karlsson says the point was not collection for its own sake. It was to make a truth present enough to become character. He uses Epictetus’s idea of having the right reaction “ready at hand” to explain why character habits are harder than ordinary habits: piano practice can be chained to brushing your teeth, but patience with children or courage in a difficult moment has to be available when the trigger arrives unpredictably.

The essay’s caveat is aimed at writing itself. Karlsson says his own essays often function as meditations on ideas he wants to live by, but essay-writing is too slow and too craft-oriented to be a full substitute for a daily practice that keeps principles top of mind. His conclusion is narrower and more practical: if the goal is to remember what matters, choose a practice that actually returns attention to it again and again.

The First Principles of Space Warfighting

Authors: Christian Keil and Alex Oliver Published: June 22, 2026

Christian Keil and Alex Oliver argue that space is already a warfighting domain, and that the United States should stop treating orbital conflict as a future contingency. Their thesis is that American strategy has to defend the orbital commons as an ordered system, because communications, reconnaissance, targeting, positioning, navigation, and timing now shape modern warfare as much as ships, aircraft, or terrestrial networks.

The killer detail is the range of hostile behavior they treat as evidence, not speculation. Russia is jamming GPS receivers across Eastern Europe, Iran launches ballistic missiles through outer space, China has discussed destroying Starlink satellites, and Chinese spacecraft have already maneuvered aggressively around other satellites, including one that moved another object out of geostationary orbit. For the authors, those examples show that the question is not whether space will be militarized, but whether the U.S. can preserve access when adversaries are already contesting it.

The pull is that space deterrence is less about building a single exquisite weapon than about designing resilient, proliferated, commercially informed systems that make orbital coercion harder to execute and easier to answer.

Read more: Source

China is winning the other tech race

Author: Noah Smith Published: June 24, 2026

Noah Smith argues that the decisive technology race is not only about AI or semiconductors. Using Paul Kennedy’s framework that great powers rise by mastering the era’s key technologies, Smith says the United States still leads in AI and chips, but China is ahead in what he calls the parallel “electric revolution”: power systems, batteries, electric vehicles, factories, and the industrial stack around electrification.

The essay widens the strategic frame. Smith treats AI as one race and electrification as another, then argues that a country can lose the broader technological century if it focuses too narrowly on models and chips while a rival dominates the physical systems that power industry, transport, and infrastructure. In his account, China’s strength is not just one product category. It is a manufacturing and deployment system that connects batteries, renewables, grids, vehicles, and machinery.

Smith’s caveat is that the United States still has major advantages, especially in AI, advanced semiconductors, software, capital markets, and scientific talent. His claim is that those advantages are not enough if the complementary industrial base shifts elsewhere. The electric economy is a hard-technology race, not a slogan, and the United States needs to treat it as a central front rather than a secondary climate or manufacturing story.

GPT-5.6 Preview System Card

OpenAI | OpenAI Deployment Safety Hub | June 26, 2026

OpenAI’s GPT-5.6 system card describes Sol, Terra, and Luna as a new model family, with Sol as the flagship, Terra as a lower-cost capable model, and Luna as the fastest and most cost-efficient option. OpenAI says it plans broad access, but is initially deploying GPT-5.6 through a limited preview while preparing safeguards for wider release. Under its Preparedness Framework, the company treats all three models as High capability in biological and chemical domains, cybersecurity, and AI self-improvement, and says all remain below its Critical thresholds.

The card’s purpose is to document model data and training, model safety, robustness, health, hallucination, alignment, bias, and preparedness testing. OpenAI says GPT-5.6 Sol is the most capable model it has deployed and that its safeguards are built around two ideas: broad access protected by baseline systems, and more permissive access for higher-risk dual-use capabilities through stronger verification, accountability, and trust signals. The system card describes real-time monitoring, automated jailbreak red-teaming, actor-level enforcement, Trusted Access for Biology Research, Trusted Access for Cyber, and security controls for protecting model weights and sensitive data.

The preparedness sections give the most detailed caveats. In cyber, OpenAI says GPT-5.6 Sol sustained multi-day vulnerability research and identified vulnerabilities, but did not independently produce a functional full exploit chain against hardened software projects in its VulnLMP testing. In biology, the card says the models exceed or approach expert baselines on some troubleshooting and knowledge tasks, but still fall below some thresholds in protein-binding and DNA-sequence-design evaluations. OpenAI also says external evaluators found incremental gains in agentic biology tasks, no evidence of certain sandbagging risks, and on-par or slightly stronger cyber capability relative to recent predecessors, while noting that the testing is one part of an evolving deployment-safety process.

The state of the AI economy

Authors: Azeem Azhar, William Gildea, Hannah Petrovic, Nathan Warren and Marija Gavrilov Published: June 25, 2026

Exponential View publishes a bottom-up model of the generative AI economy, estimating that AI generated $110 billion in customer revenue over the past 12 months and is now running at an annualized rate above $175 billion. The report’s stated purpose is to measure real demand rather than supply-side spending, because the supply side is visible through public chipmakers and hyperscalers while the demand side is obscured by private labs and public companies that bury AI revenue inside segment totals. Exponential View says that demand matters because AI now underpins $22.7 trillion of stock market valuation and has driven US GDP growth over the past six quarters.

The methodology is the core of the report. Exponential View says it built a proprietary line-level revenue model across 1,000+ firms, tracing revenue lines to filings, audited accounts, transcripts, credible reporting, cloud-attribution signals, executive comments, proxy metrics, and other soft signals. Each line receives a confidence score, company models are triangulated against silicon, build cost, segment mix, traffic, capacity, and industry research, and revenue is deduplicated by value-add. In its example, a $100 app spend that sends $60 to a model provider and $30 to hosting is counted as $100 of end-customer demand, not $190 of stacked revenue.
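The deduplication rule in the $100 example can be sketched as value-add accounting: each firm counts only its revenue minus what it pays other firms in the same chain. This Python sketch is an illustration, not Exponential View’s actual model, and it assumes the app pays the model provider $60 and the model provider pays the host $30; that routing is a hypothetical choice. Because payments between firms cancel out, the deduplicated total equals what the end customer paid as long as every payment stays inside the chain.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

CUSTOMER = "end_customer"
Flow = Tuple[str, str, float]  # (payer, payee, amount)


def revenue_views(flows: List[Flow]) -> Tuple[float, float, Dict[str, float]]:
    """Return (stacked revenue, deduplicated demand, value-add per firm)."""
    revenue: Dict[str, float] = defaultdict(float)    # what each firm books as revenue
    purchases: Dict[str, float] = defaultdict(float)  # what each firm pays in-chain firms
    for payer, payee, amount in flows:
        revenue[payee] += amount
        if payer != CUSTOMER:
            purchases[payer] += amount
    stacked = sum(revenue.values())
    firms = set(revenue) | set(purchases)
    value_add = {firm: revenue[firm] - purchases[firm] for firm in firms}
    # Inter-firm payments cancel, so this equals what end customers paid.
    deduplicated = sum(value_add.values())
    return stacked, deduplicated, value_add


if __name__ == "__main__":
    flows = [
        (CUSTOMER, "app", 100.0),
        ("app", "model_provider", 60.0),
        ("model_provider", "host", 30.0),
    ]
    stacked, dedup, per_firm = revenue_views(flows)
    print(stacked, dedup, per_firm)
```

Stacking the three revenue lines gives the $190 the report rejects; summing value-add recovers the $100 of end-customer demand.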

The report’s central finding is that AI demand is real, large, and growing faster than earlier technology waves, but still early. It says hyperscalers and neoclouds have committed to $2 trillion of cumulative CapEx, with AI-linked CapEx adding $535 billion above the pre-AI trend by 2026, and that the 2026 estimated depreciation charge approaches $111 billion. On the report’s assumptions, current AI infrastructure revenue now just clears the quarterly depreciation hurdle, though cumulative revenue has not yet covered the full cumulative buildout. It also argues that falling token prices have so far expanded demand rather than reducing total spend: every 10% price cut is associated with 12-18% more token use. The report says tokens are a useful billing metric, but not yet the right economic unit, and proposes quality-adjusted output tokens as a better measure of the useful intelligence moving through the economy.
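The price-elasticity claim is easy to check with the report’s own figures. A 10% price cut multiplies spend by 0.9, so token use must grow by about 11.1% just to keep total spend flat. At the reported 12-18% response, spend ends up roughly 0.8% to 6.2% higher, which is why cheaper tokens have so far expanded the market. The function names below are illustrative.

```python
def spend_multiplier(price_cut: float, usage_increase: float) -> float:
    """Total spend after a price change, relative to before (1.0 = unchanged)."""
    return (1 - price_cut) * (1 + usage_increase)


def break_even_usage_increase(price_cut: float) -> float:
    """Usage growth needed just to keep total spend flat after a price cut."""
    return 1 / (1 - price_cut) - 1


if __name__ == "__main__":
    print(f"break-even usage growth: {break_even_usage_increase(0.10):.1%}")
    for growth in (0.12, 0.18):
        multiplier = spend_multiplier(0.10, growth)
        print(f"10% cut, {growth:.0%} more tokens -> spend x{multiplier:.3f}")
```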

The report’s scope is explicit. It is global ex-China, counts app, model, and infrastructure-hosting revenue, and excludes chips, hardware, AI ad uplift, legacy-software features, professional services, systems integration, CapEx, and financing from revenue.

Read the report, the landing page, and the accompanying essay

The Self-Improving AI Agent Is a Production Pattern Now

Author: Nilesh Barla Published: June 20, 2026

Nilesh Barla argues that the self-improving AI agent has moved from research curiosity to production pattern. The key claim is that the improvement is not happening inside the model weights. It is happening in the harness around the model: the tools, memory, evaluation loops, production telemetry, and environment design that let an agent learn from what happens after it ships.

The killer detail is the comparison between NVIDIA’s 2023 Voyager work and Airbnb’s 2025 customer-support flywheel. Voyager improved at Minecraft by writing programs, testing them, keeping successful behaviors in a skill library, and reusing them later. Airbnb applied a related pattern to real customer interactions, capturing production traces, scoring them against evaluation criteria, and feeding the signal back into the next version of the system. Barla calls the discipline “agentic harness engineering” and says Claude Code shows the same lesson in practice: the same model can behave like a different system when given a filesystem, shell, tool surface, context, and a write-test-fix loop.

His pull is that durable AI advantage may come less from model selection than from the closed loop around the model. The production agent that improves is the one whose operating environment can observe, judge, retain, and reuse its own behavior.
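The Voyager-style loop Barla describes (write a program, test it, keep what works, reuse it later) can be sketched in a few lines of Python. This is an illustrative assumption, not Voyager’s or Airbnb’s code: the candidate functions stand in for model-written programs, the input/output pairs stand in for an evaluation suite, and `SkillLibrary` is a hypothetical name. The model never changes here; only the harness around it accumulates verified behavior.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Skill = Callable[[int], int]


class SkillLibrary:
    """Stores behaviors that passed their tests so later tasks can reuse them."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}
        self.trace: List[Tuple[str, str]] = []  # simple telemetry: (task, event)

    def solve(
        self,
        task: str,
        candidates: Sequence[Skill],
        tests: Sequence[Tuple[int, int]],
    ) -> Optional[Skill]:
        # Reuse: if a verified skill already exists, skip generation entirely.
        if task in self.skills:
            self.trace.append((task, "reused"))
            return self.skills[task]
        # Write-test loop: try each candidate against the evaluation cases.
        for candidate in candidates:
            if all(candidate(x) == expected for x, expected in tests):
                self.skills[task] = candidate  # retain what worked
                self.trace.append((task, "learned"))
                return candidate
            self.trace.append((task, "failed_test"))
        return None


if __name__ == "__main__":
    lib = SkillLibrary()
    lib.solve("double", [lambda x: x + 2, lambda x: x * 2], [(1, 2), (3, 6)])
    lib.solve("double", [], [])  # second call reuses the stored skill
    print(lib.trace)
```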

Read more: Source

Karpathy on Claude as a persistent teammate

Andrej Karpathy argues that Claude’s next UI/UX shift is not another chat box or desktop app, but a persistent, asynchronous entity embedded into the organization’s normal flow of work. He says that once the engineering is done across tools, integrations, compute environments, memory, and security, “Claude basically joins the team in a seamless way” and can be spoken to as one would speak to a person.

The post frames this as the third major redesign of LLM interaction. The first paradigm was the LLM as a website. The second was the LLM as a local app. The third, in Karpathy’s words, is “a self-contained, persistent, asynchronous entity with org-wide tools and context, working alongside teams of humans.”

The pull is that this turns the agent debate from interface design into organizational design. If an AI system has memory, context, tools, permissions, and asynchronous persistence, the important question is no longer where the chat window sits. It is how the team works with a new kind of colleague-like software.

Read more: Source

The machine wants a payment rail

Author: Raphael Published: June 24, 2026

Raphael argues that AI commerce is not mainly a chatbot or checkout-interface story. It is a permission, settlement, identity, fraud, and liability story. Once software can act on behalf of people, the valuable layer becomes the system that decides whether an agent may spend money, how much, where, how often, with which credential, and with what audit trail.

The killer detail is the cluster of institutional moves he reads as one signal. Visa linked with OpenAI on secure AI commerce, Mastercard launched Agent Pay for Machines, and the Bank of England softened its stablecoin stance by replacing individual holding caps with a temporary issuance guardrail. In his framing, these are not isolated product announcements. They show payments networks and regulators preparing for transactions that may be continuous, conditional, high-frequency, and invisible until something fails.

The pull is that the most important agent-commerce companies may not make agents look smarter. They will make agent behavior boring, bounded, revocable, and auditable enough for merchants, banks, users, and regulators to trust.
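Raphael’s list of questions (may the agent spend, how much, where, how often, with which credential, and with what audit trail) maps naturally onto a policy check. The Python sketch below is a hypothetical illustration, not Visa’s, Mastercard’s, or any network’s API; `SpendPolicy` and its fields are assumed names. Every request, approved or not, lands in the audit log, and setting `revoked` stops the agent immediately.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set, Tuple


@dataclass
class SpendPolicy:
    """Hypothetical per-agent spending mandate: bounded, revocable, auditable."""
    credential_id: str
    allowed_merchants: Set[str]
    per_tx_limit: float
    daily_budget: float
    max_tx_per_day: int
    revoked: bool = False
    spent_today: float = 0.0
    tx_today: int = 0
    day: date = field(default_factory=date.today)
    audit_log: List[Tuple[str, str, str, float, bool, str]] = field(default_factory=list)

    def authorize(self, credential_id: str, merchant: str, amount: float,
                  today: Optional[date] = None) -> Tuple[bool, str]:
        today = today if today is not None else date.today()
        if today != self.day:  # new day: reset the daily counters
            self.day, self.spent_today, self.tx_today = today, 0.0, 0
        # Check each rule in turn; the first failure is the recorded reason.
        reason = "ok"
        if self.revoked:
            reason = "mandate revoked"
        elif credential_id != self.credential_id:
            reason = "wrong credential"
        elif merchant not in self.allowed_merchants:
            reason = "merchant not allowed"
        elif amount > self.per_tx_limit:
            reason = "over per-transaction limit"
        elif self.spent_today + amount > self.daily_budget:
            reason = "over daily budget"
        elif self.tx_today >= self.max_tx_per_day:
            reason = "too many transactions today"
        approved = reason == "ok"
        if approved:
            self.spent_today += amount
            self.tx_today += 1
        # Every decision, approved or denied, is written to the audit trail.
        self.audit_log.append((today.isoformat(), credential_id, merchant, amount, approved, reason))
        return approved, reason
```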

Read more: Source

GLM-5.2 Is The New Best Open Model

Author: Zvi Mowshowitz Published: June 22, 2026

Zvi Mowshowitz argues that GLM-5.2 is probably the strongest open model now available, but that its benchmark strength should be read as a ceiling rather than a full measure of real-world usefulness. The post’s thesis is that GLM-5.2 narrows the visible gap with frontier systems more than previous open releases, while still leaving important practical gaps in generalization, features, and deployment fit.

The killer detail is his estimate of the frontier lag. Setting aside missing capabilities, inferior generalization, and the fact that the model is distilled from Claude, Mowshowitz says GLM-5.2 can be argued to sit roughly four to seven months behind the public frontier on core tasks. That is closer than prior open-model moments, including the DeepSeek shock, and enough to change the update cadence for people asking where the next step-change was.

The pull is that open models may now be close enough to matter strategically but awkward enough to resist simple adoption. GLM-5.2 looks like a cost-benefit contender, not an automatic replacement for closed frontier systems.

Read more: Source

From Open Source Software to Open Source Strategy

Author: Bill Gurley Published: May 9, 2026

Bill Gurley argues that open source has moved beyond a software-development model into a deliberate business and geopolitical strategy. He calls the newer pattern “Open Source Strategy”: using open source to neutralize a stronger competitor, commoditize an expensive input, align an industry around a shared standard, or head off regulatory pressure. In his framing, most of these moves are defensive at first, but the offensive payoff often follows once the ecosystem coordinates around the open standard.

The essay builds the argument from open source history. Gurley starts with Stallman, Torvalds, Raymond’s The Cathedral and the Bazaar, Red Hat, and the classic economic model of open source companies. He then says four later developments changed the meaning of open source: neutral foundations such as the Linux Foundation and CNCF became referees for cross-company collaboration; CIOs became “open source first”; AWS showed what happens when the underlying stack is commoditized and the operator wins; and China embraced open source as national strategy, including in AI through DeepSeek R1, Moonshot’s Kimi, Zhipu’s GLM, and Alibaba’s Qwen.

The examples are the substance of the strategy claim. Gurley lists Android as Google’s answer to possible Apple control of mobile; Meta’s Open Compute Project as a way to commoditize data-center hardware; Kubernetes as Google’s answer to AWS lock-in; LF Networking as telecom’s shared response to proprietary networking stacks; RISC-V as an open instruction-set alternative to proprietary chip architectures; and Overture Maps as a neutral mapping-data effort backed by companies that did not want Google to own that strategic layer. The AI relevance is that open models can be read through the same lens: when a follower wants to weaken a frontier leader’s advantage, openness can turn the leader’s proprietary edge into an industry commodity.

Read this before you vibe-code another app

Author: Yael Grauer Published: June 22, 2026

Yael Grauer reports on the security risks created when AI-generated personal software moves from local experiments into hosted apps with real data. The article opens with Bob Starr’s “Boomberg” project, a vibe-coded site about U.S. tax money going to tech companies. Starr launched it quickly and only months later discovered a hidden SQL injection risk that could have exposed or altered data. Grauer uses that example to distinguish between the value of amateur software creation and the higher security standard required once an app touches customer logs, medical data, financial records, internal documents, or other people’s personal information.

The article’s evidence comes from both anecdotes and security research. It cites a coding agent wiping a production database, a demo app that attracted hackers, a critical flaw found before launch in a crypto-reward running app, and the viral AI-agent social network Moltbook, whose production database was reportedly found wide open by Wiz researchers. It also cites Wired’s reporting on Red Access finding roughly 5,000 publicly accessible apps built with popular vibe-coding tools that lacked authentication, with close to 2,000 apparently leaking sensitive material such as medical and financial information, strategy documents, and chatbot logs.

Grauer’s sources stress that AI-generated code is not uniquely insecure; plenty of human-written software has serious flaws too. The difference is scale and overconfidence. Security reviews often have to be explicitly invoked, and tools such as Claude Code’s /security-review, Codex Security, OWASP guidance, and security-oriented skills help only when they are actually set up and given the right threat model. Gabriel Bernadett-Shapiro of SentinelOne says the biggest concern is not only buggy code but the moment a local app is pushed to the cloud without authentication or with configuration options the builder does not understand. Jack Cable of Corridor puts the practical line simply: prototypes and low-risk personal tools are one thing; public apps with sensitive data need a different standard.
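The SQL injection risk in the Boomberg story is easy to show in miniature. This Python sketch uses the standard library’s `sqlite3` module with a hypothetical `grants` table, not Starr’s actual schema or code. The unsafe function pastes user input into the SQL text, so a crafted string rewrites the query and returns every row; the safe version passes the same input as a bound `?` parameter, so it is only ever treated as data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory demo table standing in for a hosted app's data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grants (company TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO grants VALUES (?, ?)", [("Acme", 100), ("Globex", 250)])
    return conn


def find_unsafe(conn: sqlite3.Connection, company: str):
    # VULNERABLE: user input becomes part of the SQL statement itself.
    query = f"SELECT company, amount FROM grants WHERE company = '{company}'"
    return conn.execute(query).fetchall()


def find_safe(conn: sqlite3.Connection, company: str):
    # SAFE: the driver binds `company` as a value, never as SQL syntax.
    query = "SELECT company, amount FROM grants WHERE company = ?"
    return conn.execute(query, (company,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    attack = "x' OR '1'='1"
    print(find_unsafe(conn, attack))  # leaks every row in the table
    print(find_safe(conn, attack))    # []
```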

What Should Be Done

Author: Dean W. Ball Published: June 26, 2026

Dean W. Ball argues that the Trump administration’s emerging frontier-model regime is beginning to look like a licensing system without a clear public standard for approval. His central complaint is that access to top models is being restricted while labs, users, and policymakers still lack concrete criteria for what counts as safe enough to release.

The essay focuses on institutional ambiguity. Ball says that if no one can specify the threshold for approval, the default answer to new model releases will keep becoming no. In that world, frontier AI policy becomes a discretionary bottleneck rather than a governed process. The result is not only safety review. It is a form of industrial policy that decides who can train, release, access, or build on frontier capability.

Ball’s proposed direction is procedural rather than anti-safety. He argues for making the release regime explicit: define the tests, define the review process, define appeal or revision paths, and make the government say what standard it is applying. The point is not that every model should be released. The point is that a frontier-model regime without legible standards will push major AI decisions into ad hoc negotiation between government and a small number of companies.

Every Technology Wave Builds A Trust Layer. AI Is Building Its Now.

Author: Alex Lazarow Published: June 26, 2026

Alex Lazarow argues that AI is entering the same institutional gap earlier technologies created before auditors, underwriters, certification labs, and certificate authorities appeared. His thesis is that the next important venture category may be the AI trust layer: companies that verify, insure, audit, and stand behind work done by autonomous systems rather than merely checking whether a model’s answer looks accurate.

The killer detail is the shift in the verification question. In the first wave, tools tried to detect hallucinations, prompt injection, and deepfakes because a human was still expected to read the output and decide what to do. In the second wave, AI systems are booking trades, issuing refunds, filing articles, preparing taxes, and shipping code. The question becomes not “is this output accurate?” but “can someone stand behind the work that just happened?” Lazarow points to AIUC’s insurance for agents, Oath’s AI-native audit firm for financial work, and journalism-verification networks as early signs of that new layer.

The pull is that AI’s trust market may look less like another model wrapper and more like Lloyd’s, UL, SSL, or the Big Four rebuilt for autonomous work.

Read more: Source

Augusto Marietti, CEO of Kong, on the end of tokenmaxxing

Author: Sacra Published: June 25, 2026

Sacra interviews Kong CEO Augusto Marietti about why the model-routing market is splitting into distinct businesses as AI moves from demos into enterprise consumption. His thesis is that “AI gateways” are not one category. OpenRouter is closer to a Costco-for-tokens marketplace for developers, Cloudflare and Vercel offer public gateways for startups that want many models behind one API key, and Kong sits behind the firewall where large companies need control over internal LLM usage.

The killer detail is the cost curve behind the phrase “the end of tokenmaxxing.” Sacra says top-end frontier model token prices have fallen 25% since 2022, but total spend is up 700x because agents use larger contexts, reasoning chains, and multi-step tool loops. Marietti argues that at enterprise scale, employees do not choose models with procurement discipline in mind. A gateway can route simple prompts to cheaper models, compress prompts, cache common queries, enforce security rules, and prevent every request from defaulting to the most expensive model.
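Here is a back-of-envelope check on Sacra's two figures. It assumes that "fallen 25%" means today's top-end price per token is 0.75 of its 2022 level, and that spend is simply price times tokens consumed, ignoring shifts in model mix. With $S$ as spend, $P$ as price per token, and $V$ as token volume, the implied growth in volume is:

$$\frac{V_{\text{now}}}{V_{2022}} = \frac{S_{\text{now}}/S_{2022}}{P_{\text{now}}/P_{2022}} = \frac{700}{0.75} \approx 933$$

On those assumptions, token consumption grew by a factor of roughly 930, so the price decline barely dents the bill.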

The pull is that the AI infrastructure buyer is no longer only asking which model is best. The harder question is who controls cost, policy, routing, and observability once 10,000 employees and agents start calling models all day.
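The gateway behavior Marietti describes (route simple prompts to cheaper models, cache common queries, and stop every request defaulting to the most expensive model) can be sketched in a few lines. This is a toy under stated assumptions, not Kong's or any vendor's API. The model names, per-token prices, the word-count heuristic for "simple," and the `Gateway` class are all hypothetical. Real gateways would use far better routing signals and a real tokenizer.

```python
from dataclasses import dataclass, field

# Hypothetical price list in dollars per 1,000 tokens.
PRICES = {"small-model": 0.0005, "frontier-model": 0.03}

@dataclass
class Gateway:
    simple_word_limit: int = 50                  # crude "is this simple?" cutoff
    cache: dict = field(default_factory=dict)    # (prompt, flag) -> answer
    spend: float = 0.0                           # running cost in dollars

    def pick_model(self, prompt: str, needs_reasoning: bool) -> str:
        # Route short prompts that need no reasoning to the cheap model.
        if not needs_reasoning and len(prompt.split()) <= self.simple_word_limit:
            return "small-model"
        return "frontier-model"

    def call(self, prompt: str, needs_reasoning: bool = False) -> str:
        key = (prompt, needs_reasoning)
        if key in self.cache:                    # cache hit: no model call, no cost
            return self.cache[key]
        model = self.pick_model(prompt, needs_reasoning)
        tokens = len(prompt.split())             # stand-in for a real tokenizer
        self.spend += tokens / 1000 * PRICES[model]
        answer = f"[{model}] response"           # placeholder for the upstream call
        self.cache[key] = answer
        return answer
```

In use, `Gateway().call("What is our refund policy?")` goes to the small model, and an identical second call is served from the cache at no extra cost. Passing `needs_reasoning=True` sends the request to the frontier model. Policy enforcement and observability would hang off the same `call` choke point.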

Read more: Source

Venture

Investing is Hard

Alfred Lin shared an X Article titled “Investing is Hard.” The visible preview states the core premise directly: “You make the most money when you are right and contrarian.” It opens by contrasting that lesson with the admiration many investors have for Warren Buffett and Charlie Munger, especially their teaching on consistent compounding and playing the long game.

The Strebulaev-Jackson VC ranking, and a capital-efficiency critique

Ilya Strebulaev announces the 2026 Strebulaev-Jackson Venture Ranking, a data-driven ranking of the top 100 U.S. venture-capital firms built with Blake Jackson. Strebulaev frames the project as an attempt to move founders and LPs beyond the same mental shortlist of firms that “matter” toward a transparent way to check which firms show up strongly in the data.

Credistick responds by arguing that any ranking based on a cumulative score will tend to reward scale and activity rather than investment quality. The post uses Tiger Global’s placement above Founders Fund as its prompt, and says that if the objective is impact or relevance, a cumulative ranking may be fine, but it should not be read as a performance ranking.

The proposed alternative is a capital-efficiency lens. Credistick suggests dividing the sum of individual net profits by total capital deployed, creating a quality score similar to TVPI but adjusted for issues such as overvaluation, monitoring, dilution, and human-capital decay. The post also proposes a “horizon capital efficiency” version that looks at decayed net profit and capital deployed over the last 10 years, on the theory that venture firms should be rewarded for successful exits delivered inside a normal fund term.
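Credistick's two metrics can be written down directly. The sketch below is one reading of the post, with loud assumptions. Each deal is a hypothetical `Deal(year, net_profit, capital)` record. "Decay" is modeled as exponential with a `half_life` parameter, which the post does not specify. The adjustments for overvaluation, monitoring, and dilution are left out.

```python
from dataclasses import dataclass

@dataclass
class Deal:
    # Hypothetical record; the post does not define a data format.
    year: int           # year the investment was made
    net_profit: float   # value returned (or marked) minus cost, $M
    capital: float      # capital deployed, $M

def capital_efficiency(deals: list[Deal]) -> float:
    """Sum of individual net profits divided by total capital deployed."""
    total_capital = sum(d.capital for d in deals)
    if total_capital == 0:
        return 0.0
    return sum(d.net_profit for d in deals) / total_capital

def horizon_capital_efficiency(deals: list[Deal], current_year: int,
                               horizon: int = 10,
                               half_life: float = 5.0) -> float:
    """Decayed net profit over capital deployed, last `horizon` years only.

    ASSUMPTION: profits decay exponentially with age, halving every
    `half_life` years. The post does not specify a decay form.
    """
    recent = [d for d in deals if current_year - d.year < horizon]
    total_capital = sum(d.capital for d in recent)
    if total_capital == 0:
        return 0.0
    decayed = sum(
        d.net_profit * 0.5 ** ((current_year - d.year) / half_life)
        for d in recent
    )
    return decayed / total_capital
```

Before any adjustments, the first metric is roughly TVPI minus one (gross of fees), because each deal's net profit is its value minus its cost. The horizon version shows the design intent: a firm whose big wins are all more than ten years old can score well on the first metric and poorly on the second.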

The post says a rough approximation, using the Strebulaev-Jackson score against an exponentially weighted moving average of fund size, produces a more intuitive ranking if the goal is firm quality. In that version, the top five become SV Angel, Ribbit Capital, Benchmark, Sequoia, and Kleiner Perkins; the largest upward moves are USV, First Round, SV Angel, Inflection Ventures, and Lux Capital. Credistick stresses that this is not a criticism of Strebulaev and Jackson’s work, but a different perspective on what the ranking is measuring.
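The rough approximation in the post divides the Strebulaev-Jackson score by an exponentially weighted moving average of fund size. It might look like the sketch below. The smoothing factor `alpha`, the fund-size series, and the firm names are hypothetical, because the post does not give its weighting.

```python
def ewma(values: list[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average; later values weigh more.

    alpha is an ASSUMED smoothing factor: 1.0 means "latest fund only".
    """
    avg = values[0]
    for v in values[1:]:
        avg = alpha * v + (1 - alpha) * avg
    return avg

def efficiency_ranking(firms: dict[str, tuple[float, list[float]]],
                       alpha: float = 0.5) -> list[str]:
    """Rank firms by score per dollar of smoothed fund size, best first.

    firms maps name -> (cumulative score, fund sizes in chronological order).
    """
    ratios = {name: score / ewma(sizes, alpha)
              for name, (score, sizes) in firms.items()}
    return sorted(ratios, key=ratios.get, reverse=True)

if __name__ == "__main__":
    # Hypothetical firms: a large multi-fund platform and a small seed fund.
    firms = {
        "BigFirm": (90.0, [1000.0, 3000.0, 8000.0]),
        "SmallFirm": (40.0, [100.0, 150.0, 200.0]),
    }
    print(efficiency_ranking(firms))
```

A higher raw score puts BigFirm first in a cumulative ranking. Dividing by smoothed fund size flips the order, which is the kind of reshuffle the post describes.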

Keith Teare adds a SignalRank-specific response: efficiency can be measured in several ways, and SignalRank’s investor score puts it at the center. He says the other key is the measurement window: SignalRank uses the past three years, stage-specific scores, and a daily-computed leaderboard to reward success that is sustained over time.

Read the ranking announcement, the capital-efficiency response, and Keith Teare’s SignalRank note

Software’s Biggest Winners, Losers, and Zombies

Author: OnlyCFO Published: June 20, 2026


OnlyCFO argues that the post-2021 software market has been brutal enough to separate durable compounders from companies that only looked strong in the zero-rate valuation boom. The headline number is stark: only seven of 75 public software companies have a higher stock price today than they did five years ago, and unless an investor owned one of the top three performers, they likely underperformed the broader market.

The analysis uses public software stocks, IPO performance, revenue multiples, and growth durability to explain why the sector reset has been so uneven. The bottom 10 software names have all lost more than 90% of their value, with Domo down 97%. Day-one IPO enthusiasm also looks poor in hindsight: the average software IPO was down 5% six months after listing while the broader market rose more than 6%. Figma is the cautionary example, with a 250% first-day pop followed by a 79% decline six months later. Palantir is the counterexample: it direct-listed, slipped on day one, and later became the defining AI software winner.
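Percentage moves compound multiplicatively, so a big pop followed by a big drop can still leave a stock below where it started. The sketch below assumes the 250% pop is measured from Figma's IPO price and the 79% decline from its first-day close. The summary does not state either baseline explicitly. The IPO price is normalized to 1.

```python
def compound(*pct_changes: float) -> float:
    """Apply successive percentage changes to a starting value of 1.0."""
    value = 1.0
    for pct in pct_changes:
        value *= 1 + pct / 100
    return value

if __name__ == "__main__":
    # ASSUMPTION: +250% from IPO price to first-day close, then -79% from that close.
    figma = compound(250, -79)
    print(f"Figma vs IPO price: {figma:.3f}x ({(figma - 1) * 100:+.1f}%)")
    # Seven of 75 public software companies above their price five years ago.
    print(f"Share of winners: {7 / 75:.1%}")
```

On these assumptions, an investor who bought at the IPO price would be down about 26.5% (3.5 × 0.21 = 0.735), while anyone who bought at the first-day close would be down the full 79%.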

The pull is that software valuation recovery is not just a question of multiples coming back. Companies need growth endurance strong enough to outrun multiple compression, and the AI era is making that divide sharper.

Read more: Source

Every Moat Becomes Moot

Author: Kyle Harrison Published: June 20, 2026

Kyle Harrison argues that competitive moats are less like walls than engines that require constant discomfort, maintenance, and renewal. The thesis turns the usual investor language inside out: defensibility is not a static asset a company accumulates, but an operating muscle it keeps exercising. Once the company stops accepting the pain required to preserve that muscle, the moat starts decaying.

The killer detail is Harrison’s pairing of two notes from Brad Stone’s Amazon history. One says Amazon’s willingness to accept the slimmest margins and build an organization lean enough not to need margin was itself a competitive advantage. The other notes that Amazon’s ebook share fell from 90% in 2010 to under 60% by 2012. Thin margins were not a weakness in Bezos’s model. They attracted customers, repelled margin-sensitive giants, and forced Amazon to productize its own cost structure.

The pull is that founders and investors should stop asking whether a moat exists. The harder question is whether the company can build the next moat faster than the current one decays.

Read more: Source

Why Pre-Seed Investing Has Never Been Harder

Author: Turner Novak Published: June 25, 2026


Turner Novak interviews Charles Hudson of Precursor Ventures about how pre-seed investing has changed since Hudson started the firm in 2015. Hudson says “pre-seed” was once closer to a round of $1 million or less, then maybe under $1.5 million, but now can describe anything from a $750,000 first check to a $10 million round if the company needs that much to prove the first hypothesis. His practical definition is not the label but the work being financed: pre-seed is product-market-fit finding and hypothesis validation, not management-team scale-out or ARR growth.

The conversation’s main venture-market point is that seed and pre-seed have become structurally harder. Hudson says multi-stage firms moved from dabbling in seed to treating it as a permanent product because they want lifecycle access and AUM growth. That changes pricing and incentives. A dedicated seed fund has to care about entry price, ownership, and cash-on-cash return on the initial check; a multi-stage firm can treat seed as access to a company where it hopes to deploy much more capital later. Hudson says that makes multi-stage seed and dedicated seed “two different businesses with different dynamics.”

The episode also covers founder and LP behavior. Hudson says round names are now partly optics: founders may call a large round “pre-seed” to preserve the ability to call the next one seed, and some companies rename rounds for recruiting or fundraising signaling. He argues that LPs are not irrational for backing mega-funds, because a large allocator may prefer a few broad relationships to the work of diligencing, monitoring, and re-upping many small seed funds. For founders, the later sections turn to raising as a non-consensus founder, what Precursor looks for, the grind of raising Precursor’s first fund through more than 300 LP meetings, and the “last $250k effect.”

Blue Origin: can capital plus time fix execution?

Author: Edoardo Zarghetta Published: June 24, 2026

Edoardo Zarghetta argues that Blue Origin’s possible first external fundraise should not be underwritten as “the next SpaceX,” but as a narrower question: whether new capital can turn an already well-funded space company into a reliable, high-cadence operator. The thesis is that capital plus time has produced real assets, but not yet the execution evidence an outside investor would need at a halo valuation.

The killer detail is the two-column ledger. Blue Origin has put New Glenn into orbit, landed a large reusable booster on its second flight, won a $3.4 billion NASA Artemis V lander contract, supplied BE-4 engines to ULA’s Vulcan, and filed for a 51,600-satellite Project Sunrise constellation aimed at orbital AI data centers. But in 2026 New Glenn also suffered an upper-stage failure, then lost a first stage in a static-fire explosion that damaged its only operational New Glenn pad and may slow launch cadence for more than a year.

The pull is that survival is not the same as return. Bezos’s backing may lower existential risk, but the investment case still depends on cadence, reliability, disclosure, and an entry price that leaves room for the outcome to matter.

Read more: Source

Regulation

Sovereign Is He Who Decides the Safeguards

Author: MTS Published: June 26, 2026

MTS argues that the Fable 5 export directive has made “sovereign AI” urgent for countries outside the United States and China, but also exposed how vague that phrase usually is. His thesis is that frontier access matters for economic and national-security reasons, yet middle powers should stop treating a domestic frontier model as the default answer and instead identify the specific capabilities they need to control.

The killer detail is the cost-capability mismatch. The UK has announced roughly $2 billion for sovereign AI efforts, including a state-owned data center, while European and Australian officials are talking about local AI capability after the Fable 5 restriction. MTS contrasts that ambition with the full-stack version sometimes implied by the slogan: chips, fabs, models, software, robotics, talent, power, and data. If the real goal is resilience, he argues, the more useful policy question is which safeguards, routing decisions, domain capabilities, and access rules a country must be able to decide for itself.

The pull is that sovereignty in AI may not mean owning every layer. It may mean knowing which decisions cannot be outsourced and building enough capability to make those decisions stick.

Read more: Source

OpenAI releases powerful new GPT-5.6 model under restrictions

Authors: Ina Fried and Ashley Gold Published: June 26, 2026

Axios reports that OpenAI is rolling out GPT-5.6 under US government restrictions, with all three versions of the model family initially limited at the government’s request. The three versions are Sol, the most powerful model; Terra, a balance of efficiency and capability; and Luna, the speed-and-cost option. OpenAI says the limited preview is going first to around 20 companies whose participation has been approved by the government, with broader access expected in the coming weeks.

The killer detail is OpenAI’s own discomfort with the precedent. The company says it does not believe “this kind of government access process should become the long-term default,” because it keeps powerful tools away from users, developers, enterprises, cyber defenders, and global partners. But it also says it is taking the short-term step while the administration develops a cyber Executive Order framework and a repeatable process for future model releases.

The caveat is cyber risk. Axios says the administration must establish a classified process by August to assess AI models’ cyber capabilities and determine which systems qualify as covered frontier models. OpenAI’s position is that GPT-5.6 Sol can substantially help legitimate defensive work while remaining below its Critical threshold for cyber misuse.

US releases powerful Anthropic model Mythos to some US companies

Authors: Reed Albergotti and Ben Smith Published: June 26, 2026

Semafor reports that the US government lifted its block on Anthropic’s Claude Mythos 5 model, allowing release to more than 100 US institutions, including major companies and government agencies. The decision partially de-escalates the confrontation that began when the Trump administration imposed export controls on Mythos and Fable 5 after warnings that the models could be jailbroken for malicious purposes.

The killer detail is the Commerce Department letter. Commerce Secretary Howard Lutnick wrote that “appropriate safeguards are in place” for certain trusted partners to access Mythos, and that Anthropic has committed to work with the US government on “protocols and standards and releases” for its models. The arrangement says a license will no longer be required for the entities listed in an annex, their foreign-national employees, or Anthropic’s foreign-national employees.

The open question is Fable 5. Semafor says the letter is silent on the weaker model, though people close to the talks say release discussions are moving forward. That makes the new regime look less like a single emergency action and more like a frontier-model access system being assembled in real time.

It’s not about Anthropic vs. OpenAI anymore

Author: Russell Brandom Published: June 26, 2026

Russell Brandom argues that the AI release-control story should no longer be treated as Anthropic versus OpenAI. Two weeks after the U.S. government pulled Anthropic’s Fable and Mythos models, TechCrunch says OpenAI’s GPT-5.6 is headed into the same kind of restricted preview, with release approved “customer by customer” until a broader launch is allowed. The point is that both leading labs now face the same problem: frontier models are becoming politically consequential enough that the government is starting to control who gets access and when.

The Washington Post reports the same shift in harder institutional terms: the federal government will vet companies seeking access to OpenAI’s latest ChatGPT upgrade. That makes the story less about one lab’s release timing and more about a new distribution layer, where access to frontier capability can depend on government approval.

The article’s central concern is process. Brandom says a temporary preview might be manageable, but Mythos has already been in preview for months, and even a few weeks of review can limit the economic upside of a costly model. He writes that a haphazard approval process for every frontier model would be expensive for the whole industry and could chill the data-center buildout if model-development cadence slows.

The piece cites Dean Ball’s argument that a release regime needs legible standards: what risks are being tested, who has the expertise to test them, and what assurance would satisfy regulators. Brandom does not dismiss safety concerns; he names cybersecurity, biorisk, and alignment as real issues. His conclusion is that restricting model releases cannot be the whole answer, and that labs will need collective action rather than treating safety and regulation as ways to disadvantage rivals.

What the Anthropic fight says about AI regulation

Author: Henry Farrell Published: June 24, 2026

Henry Farrell argues that the Anthropic/Fable fight should not be reduced to a simple argument between AI regulation and deregulation. His thesis is that national-security tools such as export controls are becoming a central form of AI regulation, and that they give the executive branch wide discretion over who may access models, compute, and technical knowledge.

The killer detail is Farrell’s use of Alondra Nelson’s warning that the Trump administration is not abolishing AI regulation but rearranging it through executive discretion, trade restrictions, industrial policy, immigration controls, equity stakes, research funding, and preemption of state authority. Farrell extends that frame backward to Biden’s proposed tiered system for global AI access and forward to Trump’s looser language of AI “dominance.” Both approaches treat American technological hegemony as the goal, but they differ on whether power should be exercised through technocratic architecture or raw dealmaking.

The pull is that AI regulation may increasingly look less like public rulemaking and more like strategic ambiguity: a state-controlled access regime where companies comply before the line is even drawn.

Read more: Source

Infrastructure

Yes, Transformers Are a Problem...

Dana Golden, writing in ChinaTalk, argues that the grid supply chain problem is much deeper than the widely discussed shortages of transformers, gas turbines, and permitting. Her thought experiment starts by granting an infinite supply of transformers, then gas turbines, then streamlined permitting, and still concludes that data center and grid buildout would face major constraints. The reason, she says, is that power electronics, HVDC systems, wide-bandgap semiconductors, gallium, silicon carbide, advanced magnets, converter stations, DC filters, and other inputs have their own concentrated and geopolitically exposed supply chains.

The essay’s central distinction is between shortages that money can solve and chokepoints that money cannot solve quickly. Golden separates “gallium from copper”: in a crisis, commodity traders and price signals can redirect many metals, but gallium and some rare earths used in power electronics are harder to replace because China dominates supply and has already used export controls. She connects that to grid modernization technologies such as HVDC transmission, solid-state circuit breakers, grid-forming inverters, FACTS devices, and solid-state transformers. Solid-state transformers can be far smaller and lighter than traditional equipment and can actively regulate voltage, but they trade exposure to grain-oriented electrical steel for exposure to gallium, silicon carbide, and advanced magnets.

Golden also links the grid problem directly to AI infrastructure. NVIDIA’s move toward 800 VDC distribution in AI data centers, she writes, uses the same broad power electronics ecosystem as grid-scale HVDC. The technologies that can make AI data centers more efficient can also intensify demand for the same wide-bandgap semiconductors needed by utilities. She cites Wolfspeed’s Chapter 11 as evidence of how brutal semiconductor supercycles can be, even for technically important suppliers.

The policy section calls for treating power electronics as a security priority, widening the CHIPS Act’s focus beyond leading-edge logic, accelerating allied gallium diversification, considering stockpiles, shortening mine lead times, building integrated analysis across DOE, Commerce, and national security agencies, and investing through the supply chain. Golden’s caveat is that this does not require nationalization or indiscriminate subsidies. Her argument is that markets can do much of the work, but only if government helps create demand, analysis, and resilience around the right bottlenecks.

Transmission Dominance with Chinese Characteristics

Dana Golden, writing in ChinaTalk, compares China’s high-voltage transmission buildout with the United States’ fragmented grid and argues that the common “China builds fast, America litigates” explanation is true but shallow. China has built more high-voltage transmission in the last 15 years than the United States has in its entire history, including the 3,293-kilometer Changji-Guquan 1,100 kV UHVDC link and 45 ultra-high-voltage projects in operation by late 2025. The United States, by contrast, has built only about 2,370 miles (roughly 3,800 km) of HVDC lines, not much more than the length of that single Chinese link, and added just 55 miles (about 89 km) of 345 kV-plus transmission in 2023.

Golden’s main claim is institutional. China built a modern, centrally planned grid after the 1990s, optimized to move bulk power from western resource regions to coastal load centers. The United States inherited a century-old patchwork of utilities, state regulators, RTOs, ISOs, and limited federal authority. It has sophisticated market tools such as locational marginal pricing, congestion pricing, and financial transmission rights, but those price signals do not translate into coordinated interregional buildout. China has weaker price signals but stronger planning authority. Golden writes that China is building the hardware while the United States has the software, and neither has both.

The AI relevance is explicit. Data center power demand could reach 12% of US electricity by 2028, and training and inference loads make energy a siting constraint more than a simple cost input. Golden says transmission lets energy production decouple from consumption location, which is exactly what AI infrastructure needs when training wants sustained cheap power and inference wants proximity to population centers. She also notes that batteries and transmission solve different problems: batteries shift energy across time, while transmission moves it across space. They are complements, not substitutes.
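Golden's point that batteries move energy across time while transmission moves it across space can be made concrete with a toy two-region, two-hour model. Everything here is hypothetical: the region names, the energy numbers, lossless storage and lines with unlimited capacity, and the greedy dispatch rule.

```python
# Toy data (hypothetical): the west has all the generation, all of it in
# hour 0; the east is a load center with no local generation.
SUPPLY = {"west": [12, 0], "east": [0, 0]}
DEMAND = {"west": [2, 2], "east": [4, 4]}

def unmet_demand(supply, demand, battery=False, line=False):
    """Total unserved energy across all hours in a toy west/east system.

    ASSUMPTIONS: lossless storage and lines with unlimited capacity, the
    battery sits in the west, the line runs west to east, and dispatch is
    greedy: serve local load, then ship east, then store what is left.
    """
    stored = 0.0      # energy carried in the battery from the previous hour
    unmet = 0.0
    for h in range(len(demand["west"])):
        west_avail = supply["west"][h] + stored
        stored = 0.0
        # Serve each region's own load from its own generation first.
        west_served = min(west_avail, demand["west"][h])
        west_avail -= west_served
        west_short = demand["west"][h] - west_served
        east_short = demand["east"][h] - min(supply["east"][h], demand["east"][h])
        # Transmission moves western surplus east within the same hour.
        if line:
            shipped = min(west_avail, east_short)
            west_avail -= shipped
            east_short -= shipped
        # The battery moves any remaining western surplus to the next hour.
        if battery:
            stored = west_avail
        unmet += west_short + east_short
    return unmet

if __name__ == "__main__":
    for battery, line in [(False, False), (True, False), (False, True), (True, True)]:
        print(f"battery={battery}, line={line}: "
              f"unmet demand = {unmet_demand(SUPPLY, DEMAND, battery, line)}")
```

In this toy, unmet demand is 10 units with neither asset, 8 with only the battery, 6 with only the line, and 0 with both. The battery cannot reach the eastern load, and the line cannot deliver energy that was generated in an earlier hour. Together they cover both gaps, which is the complementarity Golden describes.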

The essay does not argue that the United States should copy China’s grid governance. Golden lists advantages in the American model, including market-driven adoption of cheap renewables in Texas, regional experimentation, and containment of planning failures. Her policy conclusion is narrower: the United States needs interregional transmission, institutions that can plan it, and supply-chain capacity for HVDC equipment and power electronics. The harder constraint may be institutional architecture rather than capital, permitting, or technology alone.

Critical Mineral Security: The Endgame

Author: Farrell Gregory Published: June 22, 2026

Farrell Gregory argues in ChinaTalk that U.S. critical-minerals policy has failed because it treats too many materials as an undifferentiated category. After years of executive orders, legislation, and funding, he writes, the United States remains heavily dependent on China for minerals that Beijing has shown it is willing to restrict. He points to China’s rare-earth cutoff to Japan in 2010, export controls on gallium and germanium in 2023, restrictions on graphite and antimony, and rare-earth restrictions in 2025 that still left U.S. manufacturers scrambling even after a bilateral agreement to lift some limits.

The essay’s main proposal is prioritization. Gregory says the USGS critical minerals list has expanded from 35 to 50 to 60 materials, diluting benefits and policy attention across materials with very different risk profiles. Instead, he proposes a shorter list of 25 priority materials, selected by proven Chinese ability and willingness to interfere with those supply chains and by whether U.S. support can make a meaningful short-term difference. The list includes 16 rare earth elements and nine other minerals: gallium, germanium, tungsten, magnesium metal, antimony, indium, natural graphite, synthetic graphite, and bismuth. He excludes materials such as copper and potash from the priority framework because their risks, market size, or source countries call for different tools.

Gregory’s policy section distinguishes categorical support from targeted intervention. He criticizes programs such as the 45X Advanced Manufacturing Production Credit and some DPA III deployments for giving similar treatment to strategically different minerals. For the prioritized set, he proposes stockpiling, equity investments, direct loans, price floors, and offtake agreements matched to each supply chain’s economics. His measurable goal is that domestic and other ex-China sources should equal 100% of U.S. consumption for the priority materials within ten years. The caveat is that financing mines and processing capacity is complex, and each mineral needs a different instrument. The claim is not that all industrial policy is clean or efficient, but that continued reliance on weaponized Chinese chokepoints is costlier.

US Grid Constraints: Towards 40GW+ of Behind-The-Meter Datacenter by 2028?

Authors: Jeremie Eliahou Ontiveros, Sebastian Orejas, Ellie Holbrook and Dylan Patel Published: June 25, 2026

SemiAnalysis argues that the U.S. grid is approaching a data-center power bottleneck that will push a large share of new AI load behind the meter by 2028. The post’s premise is that grid-served data centers still dominate today, but AI labs and hyperscalers are asking for power on a timeline that transmission interconnection, utility planning, and regional queues cannot satisfy. The authors frame behind-the-meter generation as a practical workaround: colocate load and generation, meter the net exposure to the grid, and use grid connection as a supplement rather than the only source of power.

The article’s main evidence comes from the interconnection mechanics now emerging in Texas. It highlights ERCOT’s private-use network structures, including Withdrawal-Limited Private Use Networks, where a site can connect more total load than the transmission system could otherwise support if it caps grid withdrawals and relies on onsite generation for the rest. It also describes Provisional Controllable Load Resources, where a flexible load can connect at its requested size but ERCOT can dispatch it down in real time during constraints until transmission catches up.
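The arithmetic behind the two ERCOT structures is simple enough to sketch. This illustrates the idea as summarized above. It is not ERCOT's actual rules or settlement logic, and the function names, megawatt figures, and single-interval framing are all hypothetical.

```python
def wlpun_dispatch(load_mw: float, onsite_mw: float,
                   cap_mw: float) -> tuple[float, float]:
    """Toy withdrawal-limited site: returns (grid draw, load to shed) in MW.

    Onsite generation serves the campus first; the grid covers the rest,
    but only up to the withdrawal cap. Anything beyond the cap must be shed.
    """
    net_need = max(0.0, load_mw - onsite_mw)
    grid_draw = min(net_need, cap_mw)
    return grid_draw, net_need - grid_draw

def pclr_served_mw(requested_mw: float, transmission_headroom_mw: float) -> float:
    """Toy provisional controllable load: connected at full size, but
    dispatched down to whatever the constrained transmission can carry."""
    return min(requested_mw, transmission_headroom_mw)
```

For example, a hypothetical 500 MW campus with 400 MW of onsite generation and a 150 MW withdrawal cap draws 100 MW from the grid. If onsite output falls to 300 MW, the grid covers only 150 MW and 50 MW of load must be shed. In both cases, the connected load is far larger than the grid alone could serve. A 300 MW controllable load facing a constraint that leaves only 180 MW of headroom runs at 180 MW until transmission catches up.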

The caveat is that these structures are bridges, not magic shortcuts. ERCOT explicitly treats them as ways to energize more load sooner within existing transmission limits, not as faster interconnection. SemiAnalysis says the result will create winners and losers across turbine makers, fuel-cell and reciprocating-engine vendors, independent power producers, grid equipment suppliers, and data-center developers. The broader claim is that power procurement is becoming a strategic AI infrastructure layer, with behind-the-meter economics determining which builders can secure enough energy while the grid is still catching up.

What Makes Us Human?

Author: Andrew Keen (Keen On America)


Andrew Keen interviews Kate O’Neill, the self-styled “Tech Humanist,” in a piece framed around whether humanism still has meaning in an AI age. The post opens with O’Neill’s warning that AI companies benefit from people’s tendency to ascribe inner life to chatbots: “They profit when you think the chatbot cares.” Keen then asks how to use the word humanist without sounding like a chatbot, and presents O’Neill’s answer as a claim that humans are distinguished by the quest for meaning.

The setup is skeptical rather than celebratory. Keen notes that “humanism” is fashionable in AI discourse and asks whether O’Neill can pass what he calls the “Keen Test,” the reverse of the Turing Test: saying something that would elude Claude or ChatGPT. He invokes Kazuo Ishiguro’s Klara and the Sun as a caveat, warning that bots may become better than humans at extracting or simulating meaning. The interview is therefore less a simple profile than a test of whether the language of humanity can survive automation without becoming another corporate slogan.

He made your free video player run smoothly. Now he’s doing that for robots.

TechCrunch profiles Kyber, a Paris-based infrastructure startup from VLC lead developer Jean-Baptiste Kempf. Kyber is building an SDK for controlling remote devices in real time by synchronizing video, audio, sensor data, and control inputs with minimal latency. Kempf tells TechCrunch that if “hundreds of millions of robots and drones” are operating in the world, remote control and speed become foundational infrastructure because “every millisecond matters.”
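Synchronizing streams usually comes down to timestamps. Every video frame, sensor reading, and control input carries a capture time, and the receiver pairs them up. The sketch below shows one common approach: it matches each control input to the newest video frame captured at or before it and reports how stale that frame was. It is a generic illustration with made-up millisecond timestamps, not Kyber's SDK or protocol, and the function names are hypothetical.

```python
from bisect import bisect_right

def latest_frame_before(frame_times_ms: list[int], t_ms: int):
    """Index of the newest frame captured at or before t_ms, or None.

    frame_times_ms must be sorted in ascending order.
    """
    i = bisect_right(frame_times_ms, t_ms)
    return i - 1 if i > 0 else None

def pair_controls_with_frames(frame_times_ms: list[int],
                              control_times_ms: list[int]):
    """For each control input, return (input time, frame index, staleness ms).

    Staleness is the gap between the input and the newest frame available
    when it was issued; None marks inputs that arrive before any frame.
    """
    pairs = []
    for t in control_times_ms:
        idx = latest_frame_before(frame_times_ms, t)
        if idx is None:
            pairs.append((t, None, None))
        else:
            pairs.append((t, idx, t - frame_times_ms[idx]))
    return pairs

if __name__ == "__main__":
    frames = [0, 33, 67, 100]   # roughly 30 fps, hypothetical timestamps
    print(pair_controls_with_frames(frames, [10, 70, 120]))
```

The staleness column is the quantity Kempf's "every millisecond matters" refers to. At fleet scale, a system would track such gaps per device as an observability signal.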

The article ties Kyber to physical AI but says its market is broader than AI. Kempf describes the platform as useful wherever the operator, compute, and physical action are in different places. Kyber began as a side project while he was CTO of cloud gaming startup Shadow, and its technical approach draws on video streaming as well as IoT optimization. TechCrunch reports that the company raised a $5 million round led by Lightspeed, has 25 full-time employees, offices in Paris, San Francisco, and Singapore, and is already in commercial deployment with customers in defense, telco, robotics, and AI.

Kyber is open source at the core and sells a productized enterprise version, with forward-deployed engineers handling custom deployments. The company is prioritizing robotics, drones, and remote IT access. The caveat in the piece is scale: companies have built similar systems for fleets of a few thousand vehicles, but Kempf argues that managing millions of devices requires a different infrastructure layer and much stronger observability, especially when AI agents rather than people are operating fleets.


Post of the Week

TMax: An open RL recipe for terminal agents

Nathan Lambert @natolambert

TMax: An open RL recipe for terminal agents I’m very excited to get to share a new RL paper today that I got to have a small part in – a type of paper I suspect we’ll see much more of in the future. The key is that RL research is very different today, in mid-2026, than what most

Hamish Ivison @hamishivi

Trained some terminal agents with friends! Introducing Tmax, open RL terminal agent models. Under default settings and shorter length (65k) token budgets, tmax outperforms prior open work on terminal use. We are releasing all data+weights+rollouts publically!

1:49 PM · Jun 22, 2026 · 124K Views


Nathan Lambert shared a long X post about TMax, an open reinforcement-learning recipe for terminal agents. Lambert says the important shift is that RL research in mid-2026 no longer mainly follows the early-2025 pattern of applying RLVR (reinforcement learning with verifiable rewards) libraries to climb math benchmarks. Agent tasks require complex tool use, harnesses that manage model history, more infrastructure, and much more training to produce smaller evaluation gains. He describes TMax as a release of open data and recipe lessons for hill-climbing smaller dense Qwen 3.5 models on frontier terminal tasks.

The post’s main contribution is the idea of “recipe work.” Lambert defines it as empirical work that documents the data, algorithm, codebase, pitfalls, infrastructure, and decision steps needed to make model improvements reproducible. He argues that meaningful RL experiments now resemble pretraining work from earlier years: expensive, fragile, infrastructure-heavy, and reliant on clear baselines. He says a standard TMax training job used 8 H100 nodes for 2 to 3 days, while establishing the recipe took on the order of 100 such jobs.
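To give a rough sense of scale for those figures, here is a back-of-the-envelope sketch in Python. It assumes 8 GPUs per H100 node, which is common for H100 servers but is not stated in the post, and it treats "on the order of 100" jobs as exactly 100.

```python
# Back-of-the-envelope compute estimate for the TMax figures quoted in the post.
# Assumption (not stated in the post): each H100 node holds 8 GPUs,
# as in common 8-GPU H100 server configurations.

NODES_PER_JOB = 8        # from the post: 8 H100 nodes per standard job
GPUS_PER_NODE = 8        # assumption: typical 8-GPU H100 server
JOBS_FOR_RECIPE = 100    # from the post: "on the order of 100" jobs, taken as 100


def gpu_hours(days: float) -> float:
    """Return GPU-hours for one training job running for the given number of days."""
    return NODES_PER_JOB * GPUS_PER_NODE * days * 24


# The post gives a range of 2 to 3 days per job, so estimate both ends.
for days in (2, 3):
    per_job = gpu_hours(days)
    total = per_job * JOBS_FOR_RECIPE
    print(f"{days} days: {per_job:,.0f} GPU-hours per job, "
          f"~{total:,.0f} GPU-hours for the recipe")
```

Under those assumptions, one job comes to roughly 3,000 to 4,600 GPU-hours and the full recipe search to roughly 300,000 to 460,000 GPU-hours, which gives a sense of the cost Lambert has in mind when he compares this work to pretraining.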

Lambert’s caveat is about incentives. He says academic gatekeepers often reward new algorithms more than clean empirical work that improves a recipe by 1% or 2%, even though the community needs stable recipes across agent types so new ideas can be tested more clearly. He points to frameworks such as SLIME and SkyRL as examples of a fast-changing tooling layer where more continuity would help.

A reminder for new readers: each week, That Was The Week includes a collection of selected essays on critical issues in tech, startups, and venture capital.

I choose the articles based on their interest to me. The selections often include viewpoints I can't entirely agree with. I include them if they make me think or add to my knowledge. Click on the headline, the contents section link, or the ‘Read More’ link at the bottom of each piece to go to the original.

I express my point of view in the editorial and the weekly video.