AI was supposed to make work faster. And it is.
But somewhere between “Let’s spin up a quick workflow in Clay” and “Can HubSpot draft these sequences?” a new line item quietly took over your budget: tokens.
Not in the crypto way. In the “every prompt costs something and nobody prices it the same way” way.
If you’re using AI across tools like Lovable, Genspark, HubSpot, and Clay, you’ve probably felt the same tension: these platforms are powerful, but their token systems are not consistent. The result is a cost model that feels less like a subscription and more like a utility bill. Some months it is fine. Other months it spikes for reasons that are hard to explain to anyone holding the budget.
And the hardest part is this: token usage does not map cleanly to how businesses plan work.
Most organizations think in projects or in predictable monthly capacity. “This campaign will take X hours.” “This tool costs Y per month.” Tokens live in a different universe. They behave more like compute, where tiny changes in how you work can create meaningful swings in cost. A longer prompt, a heavier context window, a few extra iterations to get the output right, or an agent that takes multiple steps behind the scenes can all quietly compound.
That compounding is what makes token spend feel unpredictable. The work might look the same on the surface, but the token footprint underneath can vary wildly.
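If you want to see the mechanics, here is a minimal back-of-envelope sketch in Python. Every number in it is a made-up assumption, not any vendor’s actual pricing; the point is how quickly context size and iteration count multiply.

```python
# A rough model of how token costs compound. All prices and token counts
# below are illustrative assumptions, not real vendor rates.

PRICE_PER_1K_INPUT = 0.01   # assumed $ per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.03  # assumed $ per 1,000 output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single model call under the assumed prices."""
    return (
        (input_tokens / 1000) * PRICE_PER_1K_INPUT
        + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    )

# The "same" task, two ways of working.
lean = request_cost(input_tokens=500, output_tokens=800)

# Heavier context, three revision rounds, and an agent that makes
# two model calls per round behind the scenes.
heavy = sum(
    request_cost(input_tokens=6_000, output_tokens=1_200)
    for _ in range(3 * 2)
)

print(f"lean run:  ${lean:.2f}")
print(f"heavy run: ${heavy:.2f} ({heavy / lean:.0f}x the lean run)")
```

Same deliverable on paper, roughly twenty times the token cost underneath. That is the kind of swing a budget owner feels without anyone doing anything “wrong.”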
When AI lived in one place, you could at least squint at a dashboard and make sense of the month. But most teams are not using one AI tool. They are using a mix.
Maybe Lovable is helping generate early creative directions. Genspark is supporting research and drafting. HubSpot is pulling AI into email, content creation, and CRM workflows. Clay is doing enrichment, segmentation, and automation that is part data, part inference. Of the four, Clay is the only one with reasonably accurate tracking of credit burn, and even there, spend depends on what you are asking for and where it pulls its source data from, so it varies from project to project.
So even if you are trying to be responsible, you run into a basic visibility problem. It becomes difficult to answer simple questions like: how much AI did this project really consume? Was it worth it? And what should we budget for next month if we plan to do this again?
It is not that teams are careless. It is that the measurement is fragmented.
This is where the real frustration shows up.
Leadership approves a tool because it looks like a manageable monthly subscription. Teams adopt it because it saves time and improves output. Then a big initiative hits: a product launch, a website rebuild, a lead gen sprint, a new content engine. Usage spikes.
Now you are in the danger zone.
Either the platform throttles usage at the worst possible time or you end up buying add-ons that were not part of the original plan. Suddenly the AI line item is unpredictable, and the conversation shifts from “AI is helping” to “AI is expensive.”
The mismatch is structural. Projects create surges. Subscription limits assume consistency. Token pricing punishes spikes.
Token spend is hard to forecast because it is tied to behaviors that are not stable.
Output quality is rarely perfect on the first try, especially when the work matters. Good messaging takes iteration. Strong positioning takes rounds. A sales sequence that actually sounds like your brand takes feedback and refinement. Each one of those cycles burns more tokens.
On top of that, the best advice for getting higher-quality output is to add context. Share examples, brand guidelines, audience notes, product details, competitive context. That is smart. It also increases input tokens. So the very thing that improves quality can increase cost.
Then there are the background processes you do not always see. Some tools run multi-step workflows where one request triggers several model calls. That is not a flaw; it is a feature. It is also one more reason your usage can climb without anyone feeling like they “used AI more.”
So you end up with a weird reality: the team does not feel like they are doing anything differently, but the bill says otherwise.
The solution is not to tell teams to stop using AI. If it is helping, you want more of that, not less. The solution is to treat token usage like a real operational resource.
Start by separating two types of usage in your mind, and in your budget.
There is baseline usage, the everyday work that will happen whether you are launching something big or not. Then there is surge usage, the spikes tied to campaigns, launches, sprints, and one-time initiatives. When you plan these as one combined monthly pool, you will always feel like you are guessing. When you acknowledge that surges are normal, you can plan for them.
Practically, this looks like creating a simple internal system: an AI budget with an owner and a rhythm. Not a gatekeeper. Just someone responsible for watching usage trends and calling it out early when you are tracking toward a problem. A weekly check beats a month-end surprise every time.
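Here is a minimal sketch of that rhythm, assuming the baseline-plus-surge split from above. The dollar figures and the flag threshold are placeholders to replace with your own; the useful part is projecting the full month from spend-to-date instead of waiting for the invoice.

```python
from datetime import date
import calendar

# Illustrative numbers only: split your real monthly budget into the two
# pools described above and plug in actual spend from each platform.
BASELINE_BUDGET = 400.0  # everyday usage, $ per month (assumed)
SURGE_BUDGET = 250.0     # planned spikes for this month's initiatives (assumed)

def weekly_check(spend_to_date: float, today: date) -> None:
    """Project month-end spend from the run rate so far and flag overruns."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    projected = spend_to_date / today.day * days_in_month
    total_budget = BASELINE_BUDGET + SURGE_BUDGET
    print(f"Spent ${spend_to_date:.0f}, projecting ${projected:.0f} "
          f"against ${total_budget:.0f}")
    if projected > total_budget:
        print("On pace to overrun. Flag it this week, not at month-end.")

weekly_check(spend_to_date=380.0, today=date(2025, 6, 14))
```

A naive run-rate projection like this will overshoot early in a surge month, and that is fine. The owner’s job is to notice and explain the spike, not to silence it.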
It also helps to assign usage to initiatives. Not perfectly, not down to the penny, just enough to create accountability. If a lead enrichment push is going to run heavy in Clay this month, that is not “random overage.” That is a planned investment. Now you can compare cost to outcome, and you can decide whether it is worth repeating next month.
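Even a crude tally does the job. A sketch, assuming you can export or hand-collect per-tool spend and tag each row with an initiative (the tools are real, but every row and amount below is invented):

```python
from collections import defaultdict

# Hypothetical usage log: (tool, initiative, spend in $).
usage = [
    ("Clay", "lead-enrichment-push", 120.0),
    ("Clay", "lead-enrichment-push", 95.0),
    ("HubSpot", "q3-nurture-sequences", 40.0),
    ("Lovable", "website-rebuild", 60.0),
]

by_initiative = defaultdict(float)
for tool, initiative, spend in usage:
    by_initiative[initiative] += spend

for initiative, total in sorted(by_initiative.items(), key=lambda x: -x[1]):
    print(f"{initiative}: ${total:.0f}")
```

Now the Clay-heavy month is not “random overage.” It is a $215 line item for lead enrichment that you can weigh against the pipeline it produced.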
Once you have visibility, the next lever is efficiency. Not in a technical way. In a human way.
Most token waste comes from two places: starting with too much context and iterating too many times because the first output is vague.
The fix is surprisingly simple. Encourage teams to start smaller. Ask for a rough structure before asking for a polished deliverable. Get the outline, then feed back the brand nuance. Get the direction, then expand. When you front-load everything, you pay for it even if the direction is wrong.
The other easy win is to be intentional about output length. A lot of tools default to being helpful by being long. But longer is rarely better for real work. Shorter outputs are easier to review, easier to refine, and they cost less. You do not need to turn every prompt into an essay to get value.
And for repeatable tasks, build a few reusable prompt starters that already reflect your brand and your standards. Not a giant library. Just a handful that prevent teams from reinventing the wheel every time. Less wandering equals fewer retries. Fewer retries equals lower spend.
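In code form, a couple of starters might look like the sketch below. The names, wording, and fields are all invented, and a shared doc works just as well as a script.

```python
# Hypothetical prompt starters. Note that both cap scope and length up
# front, which is where most retry-driven token waste comes from.
STARTERS = {
    "outline_first": (
        "Draft an outline only, no prose, for: {task}. "
        "Audience: {audience}. Keep it under 10 bullets."
    ),
    "brand_refine": (
        "Revise the draft below to match our voice: plainspoken, concrete, "
        "no hype. Keep it under {max_words} words.\n\n{draft}"
    ),
}

prompt = STARTERS["outline_first"].format(
    task="launch email for the Q3 webinar",
    audience="RevOps leads",
)
print(prompt)
```

The pattern mirrors the advice above: ask for the outline first, then feed the draft back through a refinement starter with your brand nuance, instead of front-loading everything into one giant prompt.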
Tokens feel annoying when they are just a meter running in the background. They feel manageable when they are tied to impact.
If AI usage is helping you ship more content with the same team, improve close rates with stronger sequences, accelerate research and strategy, or reduce manual enrichment work, then token spend is not just a cost. It is leverage.
But leverage only stays leverage when it is measured and planned.
If you are already operating with a multi-tool AI stack, you are not behind. You are just early in a new phase of operations where usage-based pricing is part of the deal.
The companies that win here will not be the ones that avoid tokens. They will be the ones that build a lightweight system around them, one that keeps teams moving fast while keeping budgets predictable.
And that is the real goal. Not less AI. More AI, with fewer surprises.