The Era of Token Scarcity

Throughout 2023 and 2024, the AI industry sold you on the idea that you could have unlimited access to the most advanced models for a flat fee. It was a lie, of course — but it worked. Millions of developers got hooked on GitHub Copilot, Claude, and ChatGPT with $10, $20, or $30 monthly subscriptions. It felt like an all-you-can-eat buffet. The problem is that all-you-can-eat buffets always end when customers learn to eat a lot.

And the customers learned. Autonomous coding agents — Claude Code, Cursor, Copilot’s agents — discovered they could consume 10, 50, 100 times more tokens than a human chatting. And inference costs skyrocketed.

Within months, the entire industry did a 180-degree turn. GitHub Copilot, which for years had an unlimited flat subscription model, migrated on June 1, 2026 to a token-based credit system. They’re called “GitHub AI Credits”: 1 credit = $0.01 USD, and each model consumes a different amount per token. The $10 monthly Pro plan gives 1,500 credits; the $39 Pro+ gives 7,000; the $100 Max gives 20,000. Code completions remain unlimited, but everything else — chat, terminal, cloud agents — is metered and billed.

Anthropic had been doing the same thing for a while. It started migrating its enterprise Claude customers from a per-seat model to per-token pricing in November 2025. By April 2026, the shift was complete. As of June 15, 2026, Claude’s agent tools and third-party integrators are billed at full API rates. The subsidy is over.

The case that best illustrates the earthquake is Uber. In February 2026, 32% of its engineers were using Claude Code. In March, 84%. By April, nearly 95% were using AI tools monthly, and 70% of committed code was AI-generated. That sounds like a productivity success story — except the cost ate up the entire 2026 AI budget in just four months. CTO Praveen Neppalli Naga confirmed it to The Information. COO Andrew Macdonald was more blunt: in an interview with Fortune, he said the link between AI spending and visible user features “isn’t there yet.” Monthly costs per engineer ranged between $500 and $2,000. Power users were running 10 or more worktrees in parallel, each one burning through Claude tokens nonstop.

Microsoft also felt the impact. In May 2026, it started canceling Claude Code licenses for its Windows, Teams, Outlook, and Surface engineers, redirecting them to GitHub Copilot CLI. The reason wasn’t technical — it was cost. The token model made Anthropic’s tools significantly more expensive than Microsoft’s own. Access is cut off by June 30.

All of this is Jevons’ paradox applied to AI: as per-token prices drop — Anthropic reduced Opus from $15/$75 to $5/$25 per million tokens; Nvidia promises 50x improvements with its Vera Rubin platform — the volume of consumption grows faster than prices fall. The result: total bills go up even as unit prices go down. Jensen Huang, CEO of Nvidia, summed it up at GTC 2026 by describing data centers as “Token Factories.” They’re no longer file warehouses; they’re inference production plants.

The hit isn’t just for companies. OpenAI loses roughly $1.35 for every dollar it takes in. Its internal projections, reported by Yahoo Finance, anticipate $14 billion in losses for 2026 and a cumulative $44 billion between 2023 and 2028 before turning profitable in 2029. Deloitte predicts inference will account for two-thirds of all AI compute in 2026, up from one-third in 2023. Gartner projects $401 billion in AI infrastructure alone this year.

And yet, there’s an uncomfortable nuance. VentureBeat reports that GPU utilization in enterprises averages just 5%. Companies bought more hardware than they need, underutilize it, and now face 3-to-5 year depreciation cycles. The problem isn’t so much that tokens are scarce — it’s that the current consumption architecture is terribly inefficient and the market is migrating from a subsidized model to one that reflects real costs.

Why It Matters

The era of flat-rate unlimited subscriptions is over. For startups and individual developers, this means the cost of using AI tools is now variable and can scale unpredictably. For companies, it means AI adoption is no longer just a productivity decision — it’s a financial decision that requires budgeting, monitoring, and optimization.

The winners in this transition are infrastructure providers (Nvidia, cloud platforms), optimization platforms, and models that achieve efficient inference at scale. Everyone else — companies, developers, startups — will have to learn to live in a world where every token counts.

The free buffet is over. Welcome to the era of scarcity.

Main source: Uber burned through its entire 2026 AI budget in four months — Fortune