Fitra Kacamarga

Token-Maxing Is Not Waste. It Is How Builders Buy the Future.


There is a phase in using AI coding agents where it looks like you are burning money.

The context window is full. Claude Code runs multiple times. Cursor regenerates again. An agent reads hundreds of files, writes a patch, fails, retries, refactors, and eventually produces something you still need to review.

From the outside, this looks wasteful.

I am increasingly convinced this is the wrong way to see it.

For founders, engineers, and builders working on things whose final shape is not yet obvious, token-maxing is not just an operational cost. It is an R&D investment.

You are not only buying output.

You are buying a map.

Token-maxing is discovery budget

The biggest mistake in calculating AI cost is treating tokens only as units of work.

If 10 million tokens produce one exploration sprint, people ask:

“Was that sprint worth it?”

The better question is:

“After those 10 million tokens, do we now understand a new way of working that was invisible before?”

Because in practice, token-maxing often produces things that do not immediately show up in the diff:

  • we discover architectures we were previously afraid to try
  • we test three to five approaches in one afternoon
  • we see failure modes earlier
  • we learn prompt, review, evaluation, and workflow patterns that can be reused
  • we build confidence to take on more ambitious scope
  • we absorb work that normally requires several people: engineer, analyst, QA, tech writer, researcher

The tokens that were “burned” are often not waste.

They are exploration cost.

And in software, exploration that expands the ambition frontier is often far more valuable than small efficiency gains on old tasks.

The old way of calculating AI ROI is too narrow

Many people calculate AI like this:

“I paid this many dollars for tokens. How many engineering hours did I get back?”

That is useful, but too narrow.

The biggest ROI from AI agents is not just replacing engineering hours. The biggest ROI appears when a small team can build something that previously did not make sense to attempt.

Without AI, founders often think:

“This needs a backend engineer, frontend engineer, data engineer, QA, DevOps, and a domain expert. Later.”

With AI agents, the question changes:

“What is the first version I can prove this week?”

That change in question is valuable.

Once the ambition frontier moves upward, the roadmap changes with it. A product that used to feel like a six-month project can become a two-week prototype. An experiment that used to require a small team can be handled by a founder plus an agentic workflow. Documentation, benchmarks, tests, demos, and internal tools can run in parallel.

This is where token-maxing becomes a long-term investment.

The Uber story is the tension in one headline

A recent Uber example captures the exact tension.

In April 2026, India Today reported, citing The Information, that Uber CTO Praveen Neppalli Naga said the company had already exceeded its AI spending assumptions because adoption of AI coding tools such as Claude Code had grown faster than expected. His quote is the kind of line that makes token-maxing look irresponsible:

“I’m back to the drawing board, because the budget I thought I would need is blown away already.”

Other summaries of the same story framed it even more dramatically: Uber had burned through its full-year AI budget in roughly four months, driven by Claude Code and Cursor usage. SmarterX described the episode as part of a larger enterprise problem: AI agents are arriving faster than budgets, governance, permissions, and architecture can absorb.

At first glance, this sounds like waste.

But the more interesting read is not “Uber wasted money on AI coding.” The more interesting read is that Uber discovered a budgeting model mismatch.

AI coding agents do not behave like ordinary SaaS seats. A per-seat tool can be forecast from headcount. A token-consuming agent workflow compounds with usage intensity: longer contexts, parallel agents, repeated retries, test generation, code review, and background tasks. Once a large engineering organization actually finds the tools useful, spend can rise much faster than the original plan.

That does not automatically make the spend good. It also does not automatically make it bad.

It means the unit of planning changed.
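To make the mismatch concrete, here is a minimal sketch comparing the two planning models. Every number in it is hypothetical (seat price, session counts, token rate); none of it comes from Uber or from any vendor's price list. It only shows the shape: seat cost moves with headcount, while token cost multiplies across every intensity factor.

```python
def seat_based_monthly_cost(engineers: int, price_per_seat: float) -> float:
    """Per-seat SaaS: cost depends only on headcount."""
    return engineers * price_per_seat


def token_based_monthly_cost(
    engineers: int,
    sessions_per_engineer: int,
    tokens_per_session: int,
    parallel_agents: int,
    retry_factor: float,
    usd_per_million_tokens: float,
) -> float:
    """Token-metered agents: cost multiplies across usage-intensity factors."""
    tokens = (
        engineers
        * sessions_per_engineer
        * tokens_per_session
        * parallel_agents
        * retry_factor
    )
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs, chosen only to illustrate the compounding.
seats = seat_based_monthly_cost(engineers=100, price_per_seat=40.0)

# The plan: light usage, one agent, no retries.
planned = token_based_monthly_cost(100, 40, 200_000, 1, 1.0, 10.0)

# After real adoption: same headcount, but more sessions, longer contexts,
# parallel agents, and retries.
adopted = token_based_monthly_cost(100, 80, 400_000, 2, 1.5, 10.0)

print(f"seat model:    ${seats:,.0f}")
print(f"planned spend: ${planned:,.0f}")
print(f"adopted spend: ${adopted:,.0f} ({adopted / planned:.0f}x the plan)")
```

Headcount never changed in this sketch. Doubling sessions, doubling context, doubling agents, and adding a 1.5x retry factor is enough to land at 12x the plan, which is why a headcount-based forecast breaks.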

The lesson from Uber is not “stop using AI coding tools.” It is: if token-maxing is becoming a serious workflow, finance, engineering, and product leadership need a new operating model. Track outcomes, not just tokens. Separate waste from exploration. Put observability around loops and context decay. Budget for learning, not only for seats.

This is exactly why I prefer to call it R&D budget rather than AI usage cost.

A simple model: 10 million tokens as a learning sprint

Sometimes the right unit is not one prompt, one PR, or one session. It is a 10-million-token learning sprint - enough to load a full codebase context, run multiple agent passes, generate tests, write docs, retry failed approaches, and produce a reviewable artifact.

Let us use rough numbers.

Two to four such sprints per week means roughly:

  • 80-160 million tokens per month (counting four weeks per month), or about 5 million per day at the upper end during intensive phases
  • around US$750-3,000 per month depending on model mix and subscription vs API
  • around Rp13-53 million per month

At 150 million tokens per month, token-maxing stops looking like casual AI usage and starts looking like an R&D line item.

In Indonesia, Rp13-53 million per month is comparable to 0.5-2 fully-loaded engineers, depending on seniority and pricing model.
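The arithmetic behind these ranges can be sketched in a few lines. Two things here are my assumptions rather than quoted prices: the exchange rate of Rp17,500 per US dollar (implied by the US$200 ≈ Rp3.5 million figure later in this post) and the blended per-million-token rates, which are simply back-solved from the US$750-3,000 range.

```python
WEEKS_PER_MONTH = 4             # assumption: matches the 80-160M monthly range
TOKENS_PER_SPRINT = 10_000_000  # one learning sprint
IDR_PER_USD = 17_500            # assumption: implied by US$200 ~ Rp3.5 million


def monthly_tokens(sprints_per_week: int) -> int:
    """Tokens consumed per month at a given sprint cadence."""
    return sprints_per_week * TOKENS_PER_SPRINT * WEEKS_PER_MONTH


def monthly_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost at a blended (input + output, cached + uncached) token rate."""
    return tokens / 1_000_000 * usd_per_million


low_tokens = monthly_tokens(2)   # 80 million
high_tokens = monthly_tokens(4)  # 160 million

# Hypothetical blended rates, back-solved from the US$750-3,000 range.
low_usd = monthly_cost_usd(low_tokens, 9.375)
high_usd = monthly_cost_usd(high_tokens, 18.75)

low_idr = low_usd * IDR_PER_USD
high_idr = high_usd * IDR_PER_USD

print(f"{low_tokens:,} - {high_tokens:,} tokens/month")
print(f"US${low_usd:,.0f} - US${high_usd:,.0f} per month")
print(f"Rp{low_idr:,.0f} - Rp{high_idr:,.0f} per month")
```

Swap in your own model mix and exchange rate. The point of the sketch is that the monthly figure is driven by sprint cadence, not by the price of any single prompt.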

So the bar cannot be:

“Did this token spend generate more code?”

That is too narrow. The better bar is:

“Did this spending reduce our uncertainty faster than normal working methods?”

Because proper token-maxing does not buy lines of code. It buys learning velocity.

With 150 million tokens per month, the outcomes worth seeking are not just patches, but:

  • discovery cycles that normally take 4 weeks compressed to 1-2 weeks
  • 1-2 wrong implementation paths eliminated before they become expensive
  • prototypes real enough to test with users or prospective customers
  • test suites, evals, benchmarks, and harness rules that can be reused
  • architecture notes and decision memos that clarify tradeoffs
  • founders knowing faster whether a product direction is worth pursuing
  • hiring delayed until the problem shape is clearer, not because AI replaces humans permanently

This is the critical difference.

If tokens only produce untested code, unused documents, or agent loops that never change a decision, that is waste.

But if tokens produce clarity, artifacts, and sharper decisions, the spending starts to look like R&D.

Three ROI scenarios in Indonesia

This is not final accounting. It is a mental model for seeing the scale of impact.

Conservative case

Say token-maxing costs around Rp13-18 million per month.

The realistic outcome is not “replacing two engineers.” The more defensible outcome is:

  • avoiding one wrong build path
  • accelerating one small prototype
  • producing initial tests and evals
  • helping a founder decide whether a feature is worth pursuing

If one wrong decision typically costs 2 weeks of engineering time, then avoiding a single false start can already justify the token cost.

This is not “millions of dollars” yet. But it is already huge for an early-stage Indonesian startup.

Base case

Say the cost is around Rp26-36 million per month.

At this level, token-maxing should produce more than ad-hoc output. It should produce a working system:

  • reusable prompts
  • regression tests
  • benchmark scenarios
  • architecture notes
  • internal tools
  • review checklists
  • deployment or validation workflows

The defensible outcome: one discovery cycle that normally takes a month compressed to 1-2 weeks, and one specialist hire delayed until scope is genuinely clear.

Not because AI replaces that hire forever, but because we are no longer hiring in the fog.

Here, token-maxing starts to look like serious leverage. Not because tokens are cheap, but because uncertainty is expensive, and tokens reduce it.

Ambitious case

Say the cost approaches or exceeds Rp53 million per month.

At this point, token-maxing should be treated like a real R&D budget. There should be clear exploration targets:

  • a new product wedge
  • a faster pilot
  • a claim review workflow that can be tested
  • an eval harness that makes quality measurable
  • a revenue hypothesis that can be validated earlier

If spending this much only produces more raw output, that is waste.

But if it accelerates revenue validation, opens a new product direction, or prevents the team from building the wrong system for 2-3 months, the return can far exceed the token cost.

This is where token-maxing can enter million-dollar territory.

Bottom line

Token-maxing only pays off when the output is not just code.

It has to reduce uncertainty.

It has to produce reusable artifacts.

It has to make the next decision sharper.

At 150 million tokens per month, we are no longer talking about casual AI usage. We are talking about a deliberate R&D budget.

And like any R&D budget, the question is not whether every experiment succeeds.

The question is whether the portfolio of experiments moves the company toward a future it could not reach otherwise.

For individuals: US$200/month is not an AI splurge

This argument is not only for companies.

For individuals - engineers, founders, operators, analysts, researchers, or anyone trying to level up in the AI era - a maximum-tier AI subscription, for example US$200 per month, should also be seen as a tactical tool for a long-term strategy.

In Indonesia, US$200 per month is roughly Rp3.5 million per month. In one year, around Rp42 million.

That sounds expensive compared with normal consumer subscriptions. But compared with career investment, the number starts to make sense.

An engineer earning Rp20-30 million per month makes Rp240-360 million per year. An AI subscription costing Rp42 million per year is about 12-18% of that annual income, spent on daily leverage.

The question is not:

“Is US$200 expensive?”

The question is:

“Can US$200/month raise my skill, output, and ambition level faster than the alternatives?”

If used properly, the answer is often yes.

Because a maximum-tier AI subscription is not just a tool for answering questions. It can become:

  • a coding partner for reading codebases and producing patches
  • a private tutor for learning new frameworks
  • a reviewer for writing, proposals, and technical decisions
  • a research assistant for papers, docs, and competitor landscapes
  • a product thinking partner for turning ideas into experiments
  • a QA assistant for test cases and edge-case checklists
  • a tactical career coach for portfolios, CVs, interview prep, and technical communication

For individuals, token-maxing is a way to buy compound learning.

If a US$200/month subscription helps someone increase their salary from Rp25 million to Rp35 million per month, the additional income is Rp10 million/month, or Rp120 million/year.

The AI cost is Rp42 million/year.

Rough net benefit: Rp78 million/year.

That does not include side projects, consulting, or the ability to build your own product. Even one small micro-SaaS that makes US$1,000 MRR becomes around US$12,000 ARR - roughly 5x the annual AI subscription cost of US$2,400.
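The numbers in this section are easy to re-run with your own salary. This is a rough sketch, not financial advice: the exchange rate of Rp17,500 per US dollar is my assumption (it is what makes US$200 come out to Rp3.5 million), and the salary figures are the illustrative ones used above.

```python
IDR_PER_USD = 17_500  # assumption: implied by US$200 ~ Rp3.5 million

subscription_usd_per_month = 200
subscription_idr_per_year = subscription_usd_per_month * IDR_PER_USD * 12


def share_of_income(monthly_salary_idr: int) -> float:
    """Subscription cost as a fraction of annual salary."""
    return subscription_idr_per_year / (monthly_salary_idr * 12)


# Salary scenario: Rp25 million -> Rp35 million per month.
raise_idr_per_year = (35_000_000 - 25_000_000) * 12
net_benefit_idr = raise_idr_per_year - subscription_idr_per_year

# Micro-SaaS scenario: US$1,000 MRR against the yearly subscription cost.
arr_usd = 1_000 * 12
arr_multiple = arr_usd / (subscription_usd_per_month * 12)

print(f"Subscription: Rp{subscription_idr_per_year:,} per year")
print(f"Share of income at Rp20M/month: {share_of_income(20_000_000):.1%}")
print(f"Share of income at Rp30M/month: {share_of_income(30_000_000):.1%}")
print(f"Net benefit of the raise: Rp{net_benefit_idr:,} per year")
print(f"Micro-SaaS ARR vs subscription: {arr_multiple:.0f}x")
```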

But there is an important caveat: an expensive subscription does not automatically make someone more productive.

US$200/month only becomes an investment if it is actively used to build assets:

  • repositories you can show
  • technical writing that strengthens your personal brand
  • automation for daily work
  • reusable templates and workflows
  • deeper domain understanding
  • new skills that are actually practiced

If it is only used for random chatting, article summaries, or one-off generated answers, then yes, it can be waste.

But if it is used as a daily gym for thinking, building, writing, evaluating, and shipping, US$200/month is a tactical tool for a long-term strategy.

At that point, token-maxing is not a lifestyle.

It is a career strategy.

The compound effect is confidence

The biggest saving from AI does not always show up in this month’s P&L.

The biggest saving appears when the team becomes more courageous.

Before AI agents, many ideas die inside the founder’s head because they feel too expensive:

  • “We do not have a data engineer yet.”
  • “Backend bandwidth is not available.”
  • “Nobody is handling QA.”
  • “Maybe after we hire.”
  • “This is too complex for now.”

After a few months of token-maxing, the thinking changes:

  • “We can prototype it first.”
  • “We can ask the agent to read the codebase and propose a patch.”
  • “We can generate the test harness.”
  • “We can build an internal validation tool.”
  • “We can explore a new workflow without waiting for headcount.”

This confidence compounds.

One internal tool makes the next experiment faster. One benchmark makes the next refactor safer. One agent workflow makes the team brave enough to take on a more complex use case. One successful product wedge opens a new revenue stream.

Viewed per session, token-maxing looks messy.

Viewed over a year, token-maxing looks like an organization building a new muscle.

Case study: ClaimMind

ClaimMind is a useful example because the problem is not simply “use AI to answer claim questions”.

The real problem is deeper: how do you build workflow intelligence for claim review that is evidence-grounded, policy-bound, and auditable?

In the context of BPJS and hospital claim operations, AI cannot merely produce an answer that sounds correct. The system must be able to:

  • read claim documents and metadata
  • check evidence completeness
  • understand ICD, procedure, and tariff paths
  • detect conflicts with policy
  • choose the right branch: auto-pass, clarification, escalate, reject/block
  • preserve an audit trail
  • explain why a decision was made

This is not a chatbot.

This is a workflow system.
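To show what "workflow, not chatbot" means in code, here is a minimal sketch of the branch decision with an audit trail attached. It is not ClaimMind's actual implementation. The `Claim` fields, the `route` function, and the order of the rules are all hypothetical, picked only to illustrate that every decision returns a branch plus the reasons behind it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Branch(Enum):
    AUTO_PASS = "auto-pass"
    CLARIFICATION = "clarification"
    ESCALATE = "escalate"
    REJECT = "reject/block"


@dataclass
class Claim:
    # Hypothetical, heavily simplified view of a claim after document parsing.
    claim_id: str
    evidence_complete: bool
    policy_conflict: bool
    coding_ambiguous: bool  # e.g. unclear ICD, procedure, or tariff path


@dataclass
class Decision:
    claim_id: str
    branch: Branch
    reasons: list[str] = field(default_factory=list)  # the audit trail


def route(claim: Claim) -> Decision:
    """Pick a branch and record why. The rule order is an assumption."""
    if claim.policy_conflict:
        return Decision(claim.claim_id, Branch.REJECT, ["policy conflict detected"])
    if not claim.evidence_complete:
        return Decision(claim.claim_id, Branch.CLARIFICATION, ["evidence incomplete"])
    if claim.coding_ambiguous:
        return Decision(claim.claim_id, Branch.ESCALATE, ["coding path ambiguous"])
    return Decision(
        claim.claim_id,
        Branch.AUTO_PASS,
        ["evidence complete", "no policy conflict", "coding path clear"],
    )


decision = route(Claim("C-001", evidence_complete=False,
                       policy_conflict=False, coding_ambiguous=False))
print(decision.branch.value, decision.reasons)
```

In a real system the booleans would come from rule engines, LLM-assisted checks, and human review. The sketch only fixes the contract: no branch without a recorded reason.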

And to build a system like this, the token-maxing phase matters.

At the beginning, we do not know every branch. We do not know which edge cases appear most often. We do not know the right harness shape. We do not know which prompts are robust under user pressure. We do not know which parts should be verified by a rule engine, which parts can be assisted by an LLM, and which parts must remain human-in-the-loop.

Token-maxing lets ClaimMind explore that possibility space faster:

  • agents can help map BPJS exception workflow branches
  • agents can generate adversarial scenarios: override pressure, shortcut requests, misleading summaries, state confusion
  • agents can create a minimum audit schema
  • agents can propose scoring axes: branch accuracy, evidence sufficiency, policy compliance, escalation precision
  • agents can write test scenarios and regression suites
  • agents can help design a harness so the AI does not merely “answer”, but follows a policy-bound workflow
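Two of the scoring axes above, branch accuracy and escalation precision, are simple enough to sketch. The scenario results below are made up for illustration, and the function names are mine, not part of any existing harness.

```python
from typing import Optional

# Each pair is (expected_branch, predicted_branch) for one test scenario.
# Hypothetical results, for illustration only.
results = [
    ("auto-pass", "auto-pass"),
    ("clarification", "clarification"),
    ("escalate", "escalate"),
    ("auto-pass", "escalate"),  # over-escalation: safe but costly
    ("reject/block", "reject/block"),
]


def branch_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of scenarios where the predicted branch matches the expected one."""
    return sum(expected == predicted for expected, predicted in pairs) / len(pairs)


def escalation_precision(pairs: list[tuple[str, str]]) -> Optional[float]:
    """Of all scenarios the system escalated, how many truly needed escalation."""
    escalated = [expected for expected, predicted in pairs if predicted == "escalate"]
    if not escalated:
        return None  # undefined when nothing was escalated
    return sum(expected == "escalate" for expected in escalated) / len(escalated)


print(f"branch accuracy:      {branch_accuracy(results):.2f}")
print(f"escalation precision: {escalation_precision(results):.2f}")
```

Once scores like these exist, a regression suite can fail the build when a prompt or harness change pushes them down. That is what turns exploration tokens into a reusable artifact.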

If all of this is done manually by a small team, you need several roles at once: domain analyst, product engineer, backend engineer, QA, technical writer, and applied AI engineer.

With token-maxing, those roles do not disappear. But the exploration phase compresses. The founder can see the system map faster, make sharper decisions, and hire after the shape of the problem becomes clearer.

That is the difference between burning tokens and buying clarity.

Observability matters, but do not become cheap with tokens

Interestingly, a recent 30-day scan showed that token-maxing is already real enough to have spawned dedicated observability tooling.

Projects like tokscale, codeburn, claude-usage, and token-optimizer help teams see token usage, cost, session history, ghost tokens, and context decay.

This is an important signal.

Whenever a new behavior creates observability tooling, it usually means the behavior has become a serious workflow. People do not build dashboards for things that do not matter.

But observability is not an excuse to become cheap with tokens.

Observability should help us distinguish two things:

  1. Waste - tokens consumed because of loops, poor prompts, broken context, or unclear task definition.
  2. Investment - tokens used for exploration, building, testing, learning, and creating reusable workflows.

The first should be reduced.

The second should be managed like an R&D portfolio.
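Here is a rough sketch of that split over a session log. The log format and the tag names are hypothetical. I am not describing the output of tokscale, codeburn, or any other tool, only the kind of question an observability layer should be able to answer.

```python
# Tags that count as waste, following the definition above.
WASTE_TAGS = {"loop", "poor_prompt", "broken_context", "unclear_task"}

# Hypothetical session log: one entry per agent session, tagged after review.
sessions = [
    {"tokens": 4_000_000, "tag": "exploration"},
    {"tokens": 2_500_000, "tag": "testing"},
    {"tokens": 1_500_000, "tag": "loop"},
    {"tokens": 2_000_000, "tag": "unclear_task"},
]


def split_spend(log: list[dict]) -> dict[str, int]:
    """Total tokens per bucket: 'waste' vs 'investment'."""
    totals = {"waste": 0, "investment": 0}
    for session in log:
        bucket = "waste" if session["tag"] in WASTE_TAGS else "investment"
        totals[bucket] += session["tokens"]
    return totals


totals = split_spend(sessions)
waste_share = totals["waste"] / sum(totals.values())

print(totals)
print(f"waste share: {waste_share:.0%}")
```

The useful number is the waste share and its trend, not the raw total. A falling waste share on a rising total is a healthy pattern.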

Good token-maxing has discipline

I am not saying every use of tokens is automatically good.

Token-maxing without discipline can absolutely become waste. If an agent loops without acceptance criteria, reads irrelevant files, or keeps generating without tests, that is not investment. That is noise.

Healthy token-maxing has a few rules:

  1. There is a learning objective. We know what we are trying to discover: architecture, failure mode, benchmark, product wedge, or implementation path.
  2. There is an artifact. A large session should produce something reusable: a document, patch, test, eval, prompt pattern, harness rule, or decision memo.
  3. There is a verification gate. AI output must be tested, reviewed, or at least checked against a rubric. Without verification, tokens only generate false confidence.
  4. There is compaction discipline. Large context must be managed. Once context decays, the agent can become more expensive and worse at the same time.
  5. There is a lightweight postmortem. After an expensive session, ask: what did we learn, what is reusable, and what should we avoid tomorrow?
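The five rules can double as a lightweight checklist at the end of an expensive session. This is a minimal sketch, with a hypothetical `SessionReview` record and field names of my own choosing, that reports which disciplines a session skipped.

```python
from dataclasses import dataclass, field


@dataclass
class SessionReview:
    # Hypothetical record, filled in by a human after a large session.
    learning_objective: str = ""
    artifacts: list[str] = field(default_factory=list)
    verified: bool = False           # tested, reviewed, or checked against a rubric
    context_compacted: bool = False  # context was pruned or summarized when it grew
    postmortem_notes: str = ""


def missing_disciplines(review: SessionReview) -> list[str]:
    """Names of the rules this session did not satisfy, in rule order."""
    checks = [
        ("learning objective", bool(review.learning_objective.strip())),
        ("artifact", len(review.artifacts) > 0),
        ("verification gate", review.verified),
        ("compaction discipline", review.context_compacted),
        ("postmortem", bool(review.postmortem_notes.strip())),
    ]
    return [name for name, passed in checks if not passed]


review = SessionReview(
    learning_objective="Find the failure modes of the retry workflow",
    artifacts=["tests/test_retry.py", "docs/decision-memo.md"],
    verified=True,
)
print(missing_disciplines(review))
```

For the example record, the function reports the two skipped rules: compaction discipline and the postmortem. An empty list means the session met all five.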

With this discipline, token-maxing is not “high AI usage”.

It becomes a structured learning system.

Do not optimize too early

In the early phase of AI-native building, I am more afraid of founders saving tokens too early than using too many tokens.

Premature optimization in the AI era can take a new form:

  • “Do not use too much context.”
  • “Do not run the agent too often.”
  • “Do not try that approach, it is expensive.”
  • “Do not explore too far.”

But in the early phase, our job is precisely to discover what is possible.

Correct token-maxing does not mean spending without limits. Correct token-maxing means being willing to buy a faster learning loop, then converting that learning into software, workflow, revenue streams, and organizational leverage.

For a company like ClaimMind, this can be the difference between building a generic claim chatbot and building infrastructure for auditable claim intelligence.

One saves cost.

The other opens a new category.

And sometimes, 10 million tokens are not a cost.

They are a down payment on a more ambitious future.


Build ambitious products with Claude Code workflow engineering

If this essay resonates, I am opening a class on Claude Code workflow engineering.

The class is for builders who want to use Claude Code as more than a coding assistant. We will learn how to turn AI coding agents into a practical operating system for building ambitious products: shaping tasks, managing context, designing agent workflows, reviewing output, creating verification loops, and using token-maxing as disciplined R&D instead of random prompting.

The goal is simple: move from idea to working product faster, with better judgment and less wasted exploration.
