The gold rush of artificial intelligence has a hidden tax. Across Silicon Valley, a quiet panic is spreading as startups confront the staggering operational cost of running large language models. Token anxiety, the gnawing fear of burning through venture capital on API calls, is reshaping the landscape for founders who bet their futures on AI. The cost of inference, the process of generating an answer from a model, has become the elephant in the room: a single query to a state-of-the-art system looks cheap, but multiply it across thousands of users and the bills spiral. For many, the economics no longer add up. This is a story about the user experience of society: how a technology once hailed as a democratising force is now dividing those who can afford to run it from those who cannot.
Consider the SaaS startups integrating chatbots. A typical conversational AI might cost $0.01 per query, but real-world usage involves long context windows, multiple turns, and retrieval-augmented generation, and per-session costs quickly exceed $0.10. With user acquisition costs already high, margins vanish; the promise of AI as a utility evaporates the moment usage spikes. One founder I spoke with described the dread of opening the monthly cloud bill: a six-figure sum that wiped out two years of hard work. "We built a product that users loved," they said, "but we couldn't afford to give it away."
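To see how a $0.01 query becomes a $0.13 session, here is a back-of-envelope cost model. All prices and token counts are illustrative assumptions, not any provider's actual rates; the point is only that re-sent context and retrieved documents compound per turn.

```python
# Back-of-envelope session cost model. Every number here is an assumption
# chosen for illustration, not a real provider's price list.

PRICE_PER_1K_INPUT = 0.003   # USD per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.006  # USD per 1,000 output tokens (assumed)

def session_cost(turns, context_tokens, rag_tokens, reply_tokens):
    """Cost of a multi-turn chat where each turn re-sends the growing
    conversation history plus retrieved documents (RAG), then generates
    a reply that is itself appended to the history."""
    total = 0.0
    history = context_tokens                   # system prompt + initial context
    for _ in range(turns):
        prompt = history + rag_tokens          # context window + retrieval
        total += prompt / 1000 * PRICE_PER_1K_INPUT
        total += reply_tokens / 1000 * PRICE_PER_1K_OUTPUT
        history += reply_tokens                # replies accumulate into context
    return total

# A single short query looks cheap...
print(f"one turn:  ${session_cost(1, 500, 0, 300):.4f}")    # ~ a third of a cent
# ...but a ten-turn RAG session with accumulating history costs far more.
print(f"ten turns: ${session_cost(10, 500, 2000, 300):.4f}")
```

Under these assumed prices, the ten-turn session comes to roughly $0.13, matching the order of magnitude described above.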
The root cause lies in the architecture of current AI. Transformers require immense compute, Moore's Law is slowing, and while hardware like NVIDIA's H100 improves efficiency, demand outstrips supply. The cost per token has dropped, but not as fast as usage has grown. For startups, this forces a stark choice: raise more money or pivot to lighter models, and both options carry existential risk. The VCs who once funded AI-first companies are now demanding sound unit economics; some are pulling back, fearing a market where only deep-pocketed giants like Google and OpenAI can compete.
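The arithmetic behind "prices drop, bills rise" is worth making explicit. A minimal sketch, with all prices and usage figures assumed for illustration:

```python
# If the price per token falls 10x but usage grows 50x, the monthly bill
# still grows 5x. All figures below are assumptions for illustration.

old_price, new_price = 0.02, 0.002   # USD per 1K tokens (assumed)
old_usage, new_usage = 1e6, 50e6     # tokens per month (assumed)

old_bill = old_usage / 1000 * old_price   # $20/month
new_bill = new_usage / 1000 * new_price   # $100/month

print(f"bill grew {new_bill / old_bill:.0f}x despite a 10x price cut")
```

The same multiplication explains why per-token price cuts alone cannot rescue a startup whose usage is compounding faster.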
Yet there is a nuanced story here. Not all startups are suffering. Those leveraging open-source models like Llama or Mistral are finding ways to optimise. By fine-tuning smaller models for specific tasks, they reduce token burn. Others are experimenting with edge computing, running inference on-device to avoid cloud costs. But these workarounds require technical expertise that early-stage companies often lack. The digital sovereignty of startups, their ability to control their own data and destiny, is compromised when the infrastructure is owned by a handful of providers.
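The optimisation strategy these startups are converging on can be sketched as a cost-aware router: easy queries go to a cheap fine-tuned small model, and only hard ones reach the expensive frontier API. The model names, prices, and the difficulty heuristic below are all hypothetical assumptions, not a recipe.

```python
# Sketch of a cost-aware model router. Model names, prices, and the
# difficulty heuristic are hypothetical assumptions for illustration.

from dataclasses import dataclass

@dataclass
class Model:
    name: str
    price_per_1k: float  # USD per 1K tokens (assumed)

SMALL = Model("local-llama-finetuned", 0.0002)  # self-hosted, near-free
LARGE = Model("frontier-api", 0.01)             # hosted frontier model

def route(query: str) -> Model:
    """Crude difficulty heuristic: long or explicitly multi-step queries
    go to the large model; everything else stays on the cheap one."""
    hard = len(query.split()) > 50 or "step by step" in query.lower()
    return LARGE if hard else SMALL

print(route("What are your opening hours?").name)
print(route("Explain step by step how to migrate our database").name)
```

Real systems replace the heuristic with a trained classifier, but the economic logic is the same: reserve expensive tokens for queries that actually need them.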
Meanwhile, the problem is trickling down to users. Free tiers are shrinking. Pricing is shifting to consumption-based models that penalise heavy use. For a user, the experience becomes one of constant metering. The magic of AI is replaced by a transaction. This is the Black Mirror consequence we feared: a technology that was supposed to augment human potential becomes a luxury good.
The industry needs a reckoning. We are at a pivotal moment where the next wave of AI startups will be defined by their ability to manage token budgets. The winners will be those who reimagine the user experience of society, designing systems that are not just intelligent but cost-aware. Perhaps this means returning to simpler algorithms or embracing federated learning. Maybe it requires a shift in how we value compute: treating it as a scarce resource, not an infinite cloud.
For now, the anxiety is real. The gold rush is giving way to a reckoning. Silicon Valley must solve the cost problem before it solves the intelligence problem. Otherwise, the AI revolution will be a short-lived mirage, leaving only the big players standing. And that, my friends, is a future none of us should want.