Token economy 01/19

Presentation deck

AI Coding Pricing in 2026

The end of the flat-fee era: compare coding tools by agent-session capacity, not monthly sticker price.

Yingting Huang · May 2026

The thesis 02/19

Flat-fee pricing broke when coding became agentic

One human prompt can now trigger dozens of model calls, repeated repository context, tool schemas, test logs, and retry loops.

Autocomplete was one cheap inference path.

Agents run multi-step loops that compound context cost.

The useful unit is no longer a message; it is a completed task.

Cost anatomy 03/19

A coding request becomes an economic loop

flowchart LR
  Prompt[user prompt] --> Loop[agent loop<br/>read - edit - test - retry]
  Loop --> OpenAI[OpenAI API<br/>direct token meter]
  Loop --> CodexPlan[ChatGPT Codex<br/>plan windows + credits]
  Loop --> Copilot[GitHub Copilot<br/>AI Credits bucket]
  Loop --> Claude[Claude Code<br/>rolling plan windows]
  OpenAI --> OCost[fresh + cached input<br/>output]
  CodexPlan --> XCost[message/task limits<br/>then add-on credits]
  Copilot --> CCost[token cost converted<br/>1 credit = 1 cent]
  Claude --> ACost[plan credits<br/>session and weekly caps]
Rendered live from the article's first Mermaid diagram.
  • The same workflow can be billed as tokens, credits, or rolling windows.
  • Repeated context and output length dominate long sessions.
  • Model routing changes the cost curve more than plan price alone.
Market map 04/19

Pricing guardrails moved to different layers

OpenAI / Codex

Raw token utility billing or plan windows with add-on credits.

  • Best for tool builders
  • Risk: costs scale directly

GitHub Copilot

Subscription plus AI Credits for premium and agentic work.

  • Best IDE integration
  • Risk: budget predictability

Claude Code

CLI subscription windows or direct API billing.

  • Best terminal workflow
  • Risk: interrupted sessions
Normalization 05/19

Translate plans into token-like units

Comparing sticker prices hides the actual unit of work. Normalize everything to agent-call capacity.

  • API billing: fresh input + cached input + cache writes + output.
  • GitHub Copilot: API-equivalent dollars converted into AI Credits.
  • Claude subscriptions: rolling internal credit-like windows.
  • A full task may use 10, 25, or 50+ model calls.
  • Reference call: 100K cached + 10K fresh + 2K output tokens.
China market lens 06/19

Chinese coding products expose more billing units

PatternExamplesVisible meterCost lesson
API tokensDeepSeek / Kimiinput + cached + outputTransparent, router-friendly
Usage walletTrae IDEplan balanceCan drain fast in agent loops
CreditsTencent CodeBuddytask + model weightCopilot-like abstraction
Action capsTongyi / Trae freeturns or completionsEasy entry, hidden context cost
Per-seatAlibaba / Tencent teamsuser/monthPredictable enterprise budget

Tokens are the substrate; products expose wallets, credits, action limits, seats, or future value metrics.

Metering 07/19

A single agent call has four cost surfaces

flowchart TB
  T[one agent call] --> FI[fresh input tokens]
  T --> CI[cached input tokens]
  T --> CW[cache write tokens]
  T --> OUT[output tokens]
  FI --> R[provider rate card]
  CI --> R
  CW --> R
  OUT --> R
  R --> USD[API-equivalent dollars]
  USD --> GH[GitHub Copilot<br/>multiply by 100<br/>AI Credits]
  USD --> API[OpenAI or Anthropic API<br/>bill directly]
  USD --> ClaudePlan[Claude subscription<br/>compare with inferred<br/>plan credit bucket]
Live render of the article's token accounting diagram.
  • Cached context is cheaper, but not always free.
  • Output is often the expensive side of the bill.
  • Credits are a wrapper over provider-specific token economics.
OpenAI 08/19

Model routing is the OpenAI cost model

ModelReference callCalls per $10Implication
GPT-5.5$0.16062Reserve for hard reasoning
GPT-5.4$0.080125Mid-tier agent work
GPT-5.3-Codex$0.063158Coding default candidate
GPT-5.4 mini$0.024416Cheap verifiable work

Reference call: 100K cached input + 10K fresh input + 2K output.

Codex plans 09/19

Codex through ChatGPT is plan-window economics

Included usage and purchased credits are not the same economic product.

  • Plus and Business expose five-hour local/cloud-task windows.
  • The public docs publish ranges, not stable monthly dollar quotas.
  • Purchased credits use a token-based Codex rate card.
  • Community dollar-equivalent conversions are useful but unofficial.
  • Fast mode improves latency but burns credits faster.
GitHub Copilot 10/19

AI Credits turn Copilot into a shared budget

PlanBaseFlexTotalEst. tasks/mo
Pro1,0005001,5009
Pro+3,9003,1007,00044
Max10,00010,00020,000126
Business1,900pooled12/seat
Enterprise3,900pooled24/seat

Same formula for every plan: total credits ÷ (6.3 credits/call × 25 calls/task).

Claude Code 11/19

Claude Code trades wallet risk for window risk

Subscription

Predictable spend with rolling limits.

  • Good monthly value
  • Can interrupt refactors

API billing

No artificial session lockout.

  • Better automation path
  • Costs spike in long loops

Team / Enterprise

More usage plus admin controls.

  • Useful governance
  • Higher baseline seat cost
Claude limits 12/19

Inferred Claude plan buckets are not linear

Plan5h bucketWeekly bucketOpus equiv/moAPI value
Pro ($20)550K5.0M32.5M in / 6.5M out~$163 · 8.1×
Max 5x ($100)3.3M41.7M270.8M in / 54.2M out~$1,354 · 13.5×
Max 20x ($200)11.0M83.3M541.7M in / 108.3M out~$2,708 · 13.5×

Buckets from ShellaC; monthly = weekly × 52/12; Opus equiv = credits ÷ Opus rates; API value = Opus tokens × API price.

Caching 13/19

Claude's caching surprise changes repeated-context math

flowchart LR
  Context[100K repeated context] --> Cold[cold turn<br/>cache write]
  Context --> Warm[warm turn<br/>cache read]
  Cold --> SubCold[Claude plan<br/>100K input counts]
  Cold --> ApiCold[API<br/>100K write at 1.25x]
  Warm --> SubWarm[Claude plan<br/>cache read = 0 credits<br/>only new input + output]
  Warm --> ApiWarm[API or Copilot meter<br/>100K cache read at 10%]
  SubWarm --> Value[subscription value expands<br/>in repeated agent loops]
  ApiWarm --> MeterAnxiety[long loops still draw down<br/>credits or dollars]
Live render of the cold-cache and warm-cache examples.
  • Cold turns pay for cache writes.
  • Warm subscription turns may avoid charging cache reads.
  • API-style meters still price repeated context reads.
Total cost 14/19

The cheapest model can be the expensive choice

A cheap failed attempt can cost more than a strong first pass once retries, review time, and human cleanup enter the bill.

More iterations resend context and tool schemas.

Subtle code errors move cost into review and CI.

Senior-engineer cleanup dominates small token savings.

Operating model 15/19

Route by verification cost, not habit

flowchart LR
  Task[coding task] --> Verifiable{cheap to verify<br/>and cheap to retry?}
  Verifiable -->|yes| Small[start with small model]
  Verifiable -->|no| Strong[start with strong model]
  Small --> Pass{passed in 1-2 tries?}
  Pass -->|yes| Done[ship - cheapest path]
  Pass -->|no| Escalate[escalate to stronger model<br/>log the escalation]
  Escalate --> Strong
  Strong --> Review[human review]
  Review --> Done
Live render of the article's model-routing loop.
  • Start cheap only when verification is cheap.
  • Escalate after one or two failed attempts.
  • Log escalation reasons to tune policy.
Playbook 16/19

Five controls keep agentic coding costs sane

  1. Make prompts cache-friendly with stable prefixes.
  2. Prune context aggressively before each retry.
  3. Route models by task risk and verification cost.
  4. Put budgets, alerts, and kill switches near the workflow.
  5. Standardize agent operating procedures.
Governance 17/19

Make cost visible without shaming developers

The goal is feedback loops, not guilt. Put cost signals where engineers already work.

  • Track per-user, per-team, model-level, and CI-linked usage.
  • Use soft warnings before hard limits.
  • Attribute background agents and automated code review separately.
  • Create a named reason for every model escalation.
  • Review failed cheap attempts, not just expensive successful ones.
Conclusion 18/19

AI coding is now priced like cloud computing

  1. Choose plans by real agent-session capacity.
  2. Cache static context and avoid file dumps.
  3. Use the cheapest model whose mistakes are affordable.
  4. Measure human cleanup as part of total cost.
  5. Treat prompt and workflow design as FinOps controls.
Appendix 19/19

OpenAI reference-call assumptions

Token bucketAmountWhy it matters
Cached input100KRepeated repository context
Fresh input10KNew prompt, logs, diffs
Output2KPlans, patches, explanations
Session25 callsA medium agent task
Budget$10Comparison denominator

Use this slide only when the audience wants the math behind the model table.