The cost of intelligence just hit zero


The companies that build the layer value migrates through don’t usually see it coming. I was an executive at Exodus Communications from 1999 to 2001 (the world’s largest web hosting provider, with 46 data centers and a $32 billion valuation at peak). We sold rack space and connectivity to the companies running the internet. When our customers started going bankrupt, we went bankrupt too. When WorldCom and Global Crossing followed us, they left a glut of dark fiber that made bandwidth essentially free. The companies that survived were building above the commodity layer. We were the commodity layer.

Storage followed. Hard drive prices fell for decades until per-gigabyte cost stopped mattering, and the value shifted to what the storage enabled and the services extracting signal from it. Amazon S3 launched in 2006 not because storage had gotten cheap but because cheap storage made a different kind of business possible. In each case, the companies that captured value were not the ones that owned the commodity layer. They were the ones already building above it before the floor arrived. Intelligence is on the same curve.

In November 2021, when OpenAI made GPT-3 Davinci generally available at $60 per million tokens, the question shaping what anyone would attempt to build was whether the inference bill fit the budget. At roughly a thousand tokens per call, a product making a million calls a month cost $60,000 to run. That number sorted serious enterprise work from consumer experiments and made large categories of application economically implausible: not technically out of reach, just too expensive to justify.

By April 2026, DeepSeek matches GPT-3 Davinci’s MMLU performance at $0.14 per million input tokens. Gemini 2.5 Flash-Lite is even lower at $0.10. The million-call application that cost $60,000 in 2021 costs roughly a hundred dollars today. a16z tracked a thousand-fold performance-adjusted price decline in three years.
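The collapse reduces to back-of-envelope arithmetic. A minimal sketch, assuming roughly a thousand tokens per call (my assumption to make the figures line up; the prices are the ones cited above):

```python
# Back-of-envelope inference bill: per-million-token price -> monthly cost.
# Assumes ~1,000 tokens per call; prices are those cited in the text.

def monthly_cost(price_per_m_tokens: float, calls: int,
                 tokens_per_call: int = 1000) -> float:
    """Monthly inference bill in dollars for a given call volume."""
    total_tokens = calls * tokens_per_call
    return total_tokens / 1_000_000 * price_per_m_tokens

calls = 1_000_000  # one million calls a month

print(f"GPT-3 Davinci (2021):     ${monthly_cost(60.00, calls):,.2f}")
print(f"DeepSeek (input tokens):  ${monthly_cost(0.14, calls):,.2f}")
print(f"Gemini 2.5 Flash-Lite:    ${monthly_cost(0.10, calls):,.2f}")
```

The same workload drops from $60,000 a month to on the order of $100, which is the roughly six-hundred-fold raw decline the text describes; a16z’s thousand-fold figure is performance-adjusted on top of that.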

The floor is permanent because no company can raise it. Google can price Gemini competitively and reconsider next quarter. Meta, DeepSeek, and Moonshot have released their flagship models under permissive licenses. Anyone can run them, fine-tune them, or build a competing service on the same weights. There is no pricing committee for these models, no revenue targets, no terms of service preventing a competitor from deploying a direct substitute.

The weights are already in the world. They sit on Hugging Face, mirrored across every cloud provider, fine-tuned into tens of thousands of variants. That distribution does not get unwound. Any of the publishers could revoke their licenses tomorrow and the existing weights would still be running. Anyone who wants DeepSeek at $0.14 per million input tokens can get it. Anyone who wants to undercut that price on their own hardware can do that too.

The closest historical analog is Linux. In the late 1990s, Sun and SGI and IBM sold Unix workstations and servers at margins that paid for elaborate sales organizations. Linux was something else. A substrate. Free, modifiable, good enough that companies could build above it without paying the Unix tax. The proprietary Unix market got hollowed out from below. Sun was acquired for a fraction of its peak. SGI went bankrupt twice. IBM eventually bought Red Hat and conceded the substrate.

None of this means intelligence is cheap to produce. Each frontier model costs more to train than the last. Microsoft, Google, Meta, and Amazon have committed hundreds of billions to AI infrastructure through 2026. Stargate is a half-trillion-dollar bet. Two different things are happening at once. The cost of producing the next frontier model goes up. The marginal cost of serving the previous one goes to zero.

OpenAI and Anthropic still have moves. They can ship better models. They can offer reliability and integration the open-weight ecosystem cannot match. They cannot set a floor under their own pricing. The floor is set by whatever the cheapest sufficient model costs to serve, and hosts running open-weight models on commodity GPUs are discovering that number every week. That’s a permanent price anchor sitting under the entire market.

Inference consumption is growing. Agents and automated workflows are consuming tokens at a rate that may outpace price declines. If frontier supply gets constrained (physical limits on training runs, compliance overhead, data center buildout lagging demand), API prices could rise. The argument has merit on the demand side. It misses where the floor sits. OpenAI can raise prices; the open-weight models cannot follow. There is no mechanism for coordinating a floor across models already in the wild: no license provision, no way to call them back. That asymmetry holds regardless of what happens at the frontier.

The open-weight models running on commodity hardware are the floor. At Exodus, we were the commodity layer. Google built above it. The next layer is forming above intelligence now.