Token economy 01/19

Presentation deck

AI Coding Pricing in 2026

The end of the flat-fee era: compare coding tools by agent-session capacity, not monthly sticker price.

Yingting Huang · May 2026

The thesis 02/19

Flat-fee pricing broke when coding became agentic

One human prompt can now trigger dozens of model calls, repeated repository context, tool schemas, test logs, and retry loops.

Autocomplete was a single, cheap inference call.

Agents run multi-step loops that compound context cost.

The useful unit is no longer a message; it is a completed task.

Cost anatomy 03/19

A coding request becomes an economic loop

flowchart LR
  Prompt[user prompt] --> Loop[agent loop<br/>read - edit - test - retry]
  Loop --> OpenAI[OpenAI API<br/>direct token meter]
  Loop --> CodexPlan[ChatGPT Codex<br/>plan windows + credits]
  Loop --> Copilot[GitHub Copilot<br/>AI Credits bucket]
  Loop --> Claude[Claude Code<br/>rolling plan windows]
  OpenAI --> OCost[fresh + cached input<br/>output]
  CodexPlan --> XCost[message/task limits<br/>then add-on credits]
  Copilot --> CCost[token cost converted<br/>1 credit = 1 cent]
  Claude --> ACost[plan credits<br/>session and weekly caps]

Rendered live from the article's first Mermaid diagram.

The same workflow can be billed as tokens, credits, or rolling windows.
Repeated context and output length dominate long sessions.
Model routing changes the cost curve more than plan price alone.

Market map 04/19

Pricing guardrails moved to different layers

OpenAI / Codex

Utility-style per-token billing, or plan windows with add-on credits.

Best for tool builders
Risk: costs scale directly with usage

GitHub Copilot

Subscription plus AI Credits for premium and agentic work.

Best IDE integration
Risk: budget predictability

Claude Code

CLI subscription windows or direct API billing.

Best terminal workflow
Risk: interrupted sessions

Normalization 05/19

Translate plans into token-like units

Comparing sticker prices hides the actual unit of work. Normalize everything to agent-call capacity.

API billing: fresh input + cached input + cache writes + output.
GitHub Copilot: API-equivalent dollars converted into AI Credits.
Claude subscriptions: rolling internal credit-like windows.
A full task may use 10, 25, or 50+ model calls.
Reference call: 100K cached + 10K fresh + 2K output tokens.
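To make the normalization concrete, the reference call can be scaled to task level. The sketch below uses only the token and call counts listed on this slide; the names `REFERENCE_CALL` and `task_tokens` are illustrative, not part of any tool.

```python
# Token counts for the reference call stated on this slide.
REFERENCE_CALL = {"cached_input": 100_000, "fresh_input": 10_000, "output": 2_000}


def task_tokens(calls: int) -> dict[str, int]:
    """Scale the reference call to a task made of `calls` model calls."""
    return {kind: count * calls for kind, count in REFERENCE_CALL.items()}


for calls in (10, 25, 50):
    totals = task_tokens(calls)
    print(f"{calls} calls: {totals}, total = {sum(totals.values()):,} tokens")
```

At 25 calls the reference task moves 2.8M tokens, of which 2.5M are repeated cached context.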

China market lens 06/19

Chinese coding products expose more billing units

Pattern	Examples	Visible meter	Cost lesson
API tokens	DeepSeek / Kimi	input + cached + output	Transparent, router-friendly
Usage wallet	Trae IDE	plan balance	Can drain fast in agent loops
Credits	Tencent CodeBuddy	task + model weight	Copilot-like abstraction
Action caps	Tongyi / Trae free	turns or completions	Easy entry, hidden context cost
Per-seat	Alibaba / Tencent teams	user/month	Predictable enterprise budget

Tokens are the substrate; products expose wallets, credits, action limits, seats, or future value metrics.

Metering 07/19

A single agent call has four cost surfaces

flowchart TB
  T[one agent call] --> FI[fresh input tokens]
  T --> CI[cached input tokens]
  T --> CW[cache write tokens]
  T --> OUT[output tokens]
  FI --> R[provider rate card]
  CI --> R
  CW --> R
  OUT --> R
  R --> USD[API-equivalent dollars]
  USD --> GH[GitHub Copilot<br/>multiply by 100<br/>AI Credits]
  USD --> API[OpenAI or Anthropic API<br/>bill directly]
  USD --> ClaudePlan[Claude subscription<br/>compare with inferred<br/>plan credit bucket]

Live render of the article's token accounting diagram.

Cached context is cheaper, but not always free.
Output tokens are usually priced well above input tokens, so long outputs are expensive.
Credits are a wrapper over provider-specific token economics.
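The accounting in the diagram can be sketched in a few lines of Python. The rate card below is HYPOTHETICAL and chosen only for easy arithmetic; it is not any provider's real price list. The credit conversion follows the deck's rule of 1 credit = 1 cent.

```python
from decimal import Decimal

# HYPOTHETICAL rate card in USD per million tokens; not real provider prices.
RATES_PER_MTOK = {
    "fresh_input": Decimal("2.00"),
    "cached_input": Decimal("0.20"),
    "cache_write": Decimal("2.50"),
    "output": Decimal("10.00"),
}


def call_cost_usd(tokens: dict[str, int]) -> Decimal:
    """Sum the four cost surfaces of one agent call into API-equivalent dollars."""
    return sum(
        (Decimal(count) * RATES_PER_MTOK[kind] / Decimal(1_000_000)
         for kind, count in tokens.items()),
        Decimal(0),
    )


def to_ai_credits(usd: Decimal) -> Decimal:
    """Apply the deck's conversion: 1 credit = 1 cent, so multiply dollars by 100."""
    return usd * 100


reference_call = {"fresh_input": 10_000, "cached_input": 100_000,
                  "cache_write": 0, "output": 2_000}
usd = call_cost_usd(reference_call)
print(f"${usd:.3f} per call = {to_ai_credits(usd):.1f} AI Credits")
```

Swapping in a real rate card changes the numbers but not the structure: every credit or plan unit is a wrapper around these four token counts.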

OpenAI 08/19

Model routing is the OpenAI cost model

Model	Cost per reference call	Calls per $10	Implication
GPT-5.5	$0.160	62	Reserve for hard reasoning
GPT-5.4	$0.080	125	Mid-tier agent work
GPT-5.3-Codex	$0.063	158	Coding default candidate
GPT-5.4 mini	$0.024	416	Cheap verifiable work

Reference call: 100K cached input + 10K fresh input + 2K output.
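The "Calls per $10" column is the budget divided by the per-call cost, rounded down. A small sketch that reproduces it from the per-call costs in the table; costs are held in tenths of a cent so the division stays in exact integer math, and the variable names are illustrative.

```python
# Per-call costs from the table above, in tenths of a cent ($0.160 -> 160).
REFERENCE_CALL_MILLS = {
    "GPT-5.5": 160,
    "GPT-5.4": 80,
    "GPT-5.3-Codex": 63,
    "GPT-5.4 mini": 24,
}
BUDGET_MILLS = 10 * 1000  # the $10 comparison budget


def calls_per_budget(cost_mills: int, budget_mills: int = BUDGET_MILLS) -> int:
    """Whole reference calls that fit in the budget, rounded down."""
    return budget_mills // cost_mills


calls = {model: calls_per_budget(cost) for model, cost in REFERENCE_CALL_MILLS.items()}
print(calls)
```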

Codex plans 09/19

Codex through ChatGPT is plan-window economics

Included usage and purchased credits are not the same economic product.

Plus and Business expose five-hour local/cloud-task windows.
The public docs publish ranges, not stable monthly dollar quotas.
Purchased credits use a token-based Codex rate card.
Community dollar-equivalent conversions are useful but unofficial.
Fast mode improves latency but burns credits faster.

GitHub Copilot 10/19

AI Credits turn Copilot into a shared budget

Plan	Base credits	Flex credits	Total credits	Est. tasks/mo
Pro	1,000	500	1,500	9
Pro+	3,900	3,100	7,000	44
Max	10,000	10,000	20,000	126
Business	1,900	—	pooled	12/seat
Enterprise	3,900	—	pooled	24/seat

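Same formula for every plan, rounded down: total credits ÷ (6.3 credits/call × 25 calls/task). The 6.3 credits/call matches the $0.063 GPT-5.3-Codex reference call at 1 credit = 1 cent; pooled plans use base credits per seat.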
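The task estimates can be reproduced with a short sketch. It assumes the pooled Business and Enterprise rows are computed from base credits per seat, which is what the table's figures imply; `Fraction` keeps 6.3 exact.

```python
from fractions import Fraction

CREDITS_PER_CALL = Fraction(63, 10)  # 6.3 credits per call, from the slide
CALLS_PER_TASK = 25

# Monthly credits: plan totals for individual plans, base credits per seat for pooled plans.
PLAN_CREDITS = {
    "Pro": 1_500,
    "Pro+": 7_000,
    "Max": 20_000,
    "Business (per seat)": 1_900,
    "Enterprise (per seat)": 3_900,
}


def tasks_per_month(credits: int) -> int:
    """Whole tasks covered by a credit budget, rounded down."""
    return credits // (CREDITS_PER_CALL * CALLS_PER_TASK)


estimates = {plan: tasks_per_month(c) for plan, c in PLAN_CREDITS.items()}
print(estimates)
```

Changing `CREDITS_PER_CALL` to another model's per-call cost shows how strongly model routing moves the task count for a fixed plan.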

Claude Code 11/19

Claude Code trades wallet risk for window risk

Subscription

Predictable spend with rolling limits.

Good monthly value
Can interrupt refactors

API billing

No artificial session lockout.

Better automation path
Costs spike in long loops

Team / Enterprise

More usage plus admin controls.

Useful governance
Higher baseline seat cost

Claude limits 12/19

Inferred Claude plan buckets are not linear

Plan	5h bucket	Weekly bucket	Opus equiv/mo	API value
Pro ($20)	550K	5.0M	32.5M in / 6.5M out	~$163 · 8.1×
Max 5x ($100)	3.3M	41.7M	270.8M in / 54.2M out	~$1,354 · 13.5×
Max 20x ($200)	11.0M	83.3M	541.7M in / 108.3M out	~$2,708 · 13.5×

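Buckets from ShellaC; monthly = weekly × 52/12; Opus equiv = monthly credits ÷ Opus credit rates, shown as all-input or all-output alternatives rather than a sum; API value = Opus tokens × API price; the multiple is API value ÷ plan price.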
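A quick way to see the non-linearity is to compare price ratios with bucket ratios between adjacent plans. The sketch below uses only the inferred bucket sizes from the table; the function names are illustrative.

```python
# (price in USD/month, 5-hour bucket, weekly bucket), buckets in inferred credits.
PLANS = {
    "Pro": (20, 550_000, 5_000_000),
    "Max 5x": (100, 3_300_000, 41_700_000),
    "Max 20x": (200, 11_000_000, 83_300_000),
}


def monthly_credits(weekly: int) -> float:
    """Convert a weekly bucket to a monthly one using the slide's 52/12 factor."""
    return weekly * 52 / 12


def ratios(smaller: str, larger: str) -> list[float]:
    """Price, 5-hour bucket, and weekly bucket ratios between two plans."""
    return [y / x for x, y in zip(PLANS[smaller], PLANS[larger])]


for name, (_, _, weekly) in PLANS.items():
    print(f"{name}: {monthly_credits(weekly) / 1e6:.1f}M credits/month")
print("Pro -> Max 5x:", [round(r, 2) for r in ratios("Pro", "Max 5x")])
print("Max 5x -> Max 20x:", [round(r, 2) for r in ratios("Max 5x", "Max 20x")])
```

Moving from Pro to Max 5x multiplies the price by 5 but the 5-hour bucket by 6 and the weekly bucket by about 8.3. Moving from Max 5x to Max 20x doubles the price, raises the 5-hour bucket about 3.3 times, and only doubles the weekly bucket.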

Caching 13/19

Claude's caching surprise changes repeated-context math

flowchart LR
  Context[100K repeated context] --> Cold[cold turn<br/>cache write]
  Context --> Warm[warm turn<br/>cache read]
  Cold --> SubCold[Claude plan<br/>100K input counts]
  Cold --> ApiCold[API<br/>100K write at 1.25x]
  Warm --> SubWarm[Claude plan<br/>cache read = 0 credits<br/>only new input + output]
  Warm --> ApiWarm[API or Copilot meter<br/>100K cache read at 10%]
  SubWarm --> Value[subscription value expands<br/>in repeated agent loops]
  ApiWarm --> MeterAnxiety[long loops still draw down<br/>credits or dollars]

Live render of the cold-cache and warm-cache examples.

Cold turns pay for cache writes.
Warm subscription turns may avoid charging cache reads.
API-style meters still price repeated context reads.
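The cold and warm paths in the diagram can be compared over a whole session. This is a minimal sketch that counts only the repeated 100K context, ignores new input and output, and ASSUMES the cache stays warm after the first turn (no expiry between turns).

```python
CONTEXT_TOKENS = 100_000  # repeated context per turn, as in the diagram


def api_input_equivalent(turns: int) -> int:
    """Input-token equivalents billed for the repeated context on an API-style meter.

    Turn 1 is a cache write at 1.25x; every later turn is a cache read at 10%.
    """
    if turns < 1:
        return 0
    cold = CONTEXT_TOKENS * 125 // 100
    warm = CONTEXT_TOKENS * 10 // 100
    return cold + warm * (turns - 1)


def plan_input_equivalent(turns: int) -> int:
    """Same context on the Claude plan as this slide describes it:
    the cold turn counts 100K input, warm cache reads count as 0 credits."""
    return CONTEXT_TOKENS if turns >= 1 else 0


for turns in (1, 10, 25):
    print(turns, api_input_equivalent(turns), plan_input_equivalent(turns))
```

Over 25 turns the API-style meter bills 365K input-token equivalents for the repeated context, while the plan meter, under the slide's inference, stays at 100K.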

Total cost 14/19

The cheapest model can be the expensive choice

A cheap failed attempt can cost more than a strong first pass once retries, review time, and human cleanup enter the bill.

More iterations resend context and tool schemas.

Subtle code errors move cost into review and CI.

Senior-engineer cleanup dominates small token savings.
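A rough total-cost comparison makes the point numerically. Per-call costs come from the deck's OpenAI model table and the 25-call task from the appendix; the attempt counts, human minutes, and the $120/hour engineer rate are HYPOTHETICAL inputs chosen for illustration.

```python
CALLS_PER_ATTEMPT = 25  # medium agent task, as in the deck's appendix


def total_cost_usd(cost_per_call: float, attempts: int,
                   human_minutes: float, hourly_rate: float) -> float:
    """Token spend across all attempts plus human review and cleanup time."""
    token_cost = cost_per_call * CALLS_PER_ATTEMPT * attempts
    human_cost = human_minutes / 60 * hourly_rate
    return token_cost + human_cost


# HYPOTHETICAL scenario: the cheap model needs 3 attempts and 30 minutes of cleanup,
# the strong model needs 1 attempt and 5 minutes of review.
cheap_first = total_cost_usd(0.024, attempts=3, human_minutes=30, hourly_rate=120)
strong_first = total_cost_usd(0.160, attempts=1, human_minutes=5, hourly_rate=120)
print(f"cheap-first: ${cheap_first:.2f}, strong-first: ${strong_first:.2f}")
```

In this scenario the token bill is under $5 on either path; the gap comes almost entirely from human time.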

Operating model 15/19

Route by verification cost, not habit

flowchart LR
  Task[coding task] --> Verifiable{cheap to verify<br/>and cheap to retry?}
  Verifiable -->|yes| Small[start with small model]
  Verifiable -->|no| Strong[start with strong model]
  Small --> Pass{passed in 1-2 tries?}
  Pass -->|yes| Done[ship - cheapest path]
  Pass -->|no| Escalate[escalate to stronger model<br/>log the escalation]
  Escalate --> Strong
  Strong --> Review[human review]
  Review --> Done

Live render of the article's model-routing loop.

Start cheap only when verification is cheap.
Escalate after one or two failed attempts.
Log escalation reasons to tune policy.
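The routing loop above can be sketched as a small policy object. `Task`, `Router`, and the `attempt` callback are hypothetical names for illustration; `attempt(model, task)` stands in for whatever runs the agent and returns whether verification passed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    name: str
    cheap_to_verify: bool
    cheap_to_retry: bool


@dataclass
class Router:
    max_cheap_tries: int = 2  # escalate after one or two failed attempts
    escalations: list[str] = field(default_factory=list)

    def run(self, task: Task, attempt: Callable[[str, Task], bool]) -> str:
        """Return the path taken: 'small' or 'strong+review'."""
        if task.cheap_to_verify and task.cheap_to_retry:
            for _ in range(self.max_cheap_tries):
                if attempt("small", task):
                    return "small"  # ship on the cheapest path
            # Log the escalation reason so the policy can be tuned later.
            self.escalations.append(
                f"{task.name}: small model failed {self.max_cheap_tries} tries")
        attempt("strong", task)
        return "strong+review"  # the strong path always ends in human review


router = Router()
# Stub runner for the demo: the small model always fails verification.
path = router.run(Task("rename-helper", True, True), lambda model, _: model == "strong")
print(path, router.escalations)
```

Keeping the escalation log on the router makes it easy to review which task types rarely succeed on the small model and should start strong.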

Playbook 16/19

Five controls keep agentic coding costs sane

Make prompts cache-friendly with stable prefixes.
Prune context aggressively before each retry.
Route models by task risk and verification cost.
Put budgets, alerts, and kill switches near the workflow.
Standardize agent operating procedures.
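The first control can be illustrated with a prompt builder that keeps static material at the front and volatile material at the end. This is a sketch that ASSUMES the provider's cache matches on an identical prompt prefix; `build_prompt` and `prefix_fingerprint` are hypothetical helpers, not part of any SDK.

```python
import hashlib


def build_prompt(system_rules: str, tool_schemas: list[str],
                 repo_summary: str, turn_input: str) -> tuple[str, str]:
    """Return (stable_prefix, full_prompt) with per-turn content placed last."""
    # Sort schemas so ordering differences do not change the prefix bytes.
    stable_prefix = "\n\n".join([system_rules, *sorted(tool_schemas), repo_summary])
    return stable_prefix, f"{stable_prefix}\n\n{turn_input}"


def prefix_fingerprint(prefix: str) -> str:
    """Short hash for checking that the prefix is unchanged between turns."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()[:12]


p1, _ = build_prompt("rules", ["tool_b", "tool_a"], "repo map", "fix failing test")
p2, _ = build_prompt("rules", ["tool_a", "tool_b"], "repo map", "now update docs")
print(prefix_fingerprint(p1) == prefix_fingerprint(p2))  # True
```

Logging the fingerprint per call is a cheap way to spot workflows that accidentally invalidate the cache, for example by embedding timestamps in the system prompt.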

Governance 17/19

Make cost visible without shaming developers

The goal is feedback loops, not guilt. Put cost signals where engineers already work.

Track per-user, per-team, model-level, and CI-linked usage.
Use soft warnings before hard limits.
Attribute background agents and automated code review separately.
Create a named reason for every model escalation.
Review failed cheap attempts, not just expensive successful ones.
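"Soft warnings before hard limits" can be as simple as a guard object sitting next to the agent loop. A minimal sketch; the `BudgetGuard` name and the 80% warning threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    limit_usd: float
    warn_fraction: float = 0.8  # HYPOTHETICAL soft-warning threshold
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> str:
        """Add one call's cost and return 'ok', 'warn', or 'stop'."""
        self.spent_usd += cost_usd
        if self.spent_usd >= self.limit_usd:
            return "stop"  # hard limit: the caller should halt the agent loop
        if self.spent_usd >= self.warn_fraction * self.limit_usd:
            return "warn"  # soft warning: surface it where the engineer works
        return "ok"


guard = BudgetGuard(limit_usd=10.0)
statuses = [guard.record(cost) for cost in (5.0, 3.5, 2.0)]
print(statuses)  # ['ok', 'warn', 'stop']
```

Keeping one guard per user, team, or background agent lines up with the attribution bullets above.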

Conclusion 18/19

AI coding is now priced like cloud computing

Choose plans by real agent-session capacity.
Cache static context and avoid file dumps.
Use the cheapest model whose mistakes are affordable.
Measure human cleanup as part of total cost.
Treat prompt and workflow design as FinOps controls.

Appendix 19/19

OpenAI reference-call assumptions

Parameter	Amount	Why it matters
Cached input	100K	Repeated repository context
Fresh input	10K	New prompt, logs, diffs
Output	2K	Plans, patches, explanations
Session	25 calls	A medium agent task
Budget	$10	Comparison denominator

Use this slide only when the audience wants the math behind the model table.