OpenAI / Codex
Raw token utility billing or plan windows with add-on credits.
- Best for tool builders
- Risk: costs scale directly
Presentation deck
The end of the flat-fee era: compare coding tools by agent-session capacity, not monthly sticker price.
Yingting Huang · May 2026
One human prompt can now trigger dozens of model calls, repeated repository context, tool schemas, test logs, and retry loops.
Autocomplete was one cheap inference path.
Agents run multi-step loops that compound context cost.
The useful unit is no longer a message; it is a completed task.
flowchart LR Prompt[user prompt] --> Loop[agent loop<br/>read - edit - test - retry] Loop --> OpenAI[OpenAI API<br/>direct token meter] Loop --> CodexPlan[ChatGPT Codex<br/>plan windows + credits] Loop --> Copilot[GitHub Copilot<br/>AI Credits bucket] Loop --> Claude[Claude Code<br/>rolling plan windows] OpenAI --> OCost[fresh + cached input<br/>output] CodexPlan --> XCost[message/task limits<br/>then add-on credits] Copilot --> CCost[token cost converted<br/>1 credit = 1 cent] Claude --> ACost[plan credits<br/>session and weekly caps]
Raw token utility billing or plan windows with add-on credits.
Subscription plus AI Credits for premium and agentic work.
CLI subscription windows or direct API billing.
Comparing sticker prices hides the actual unit of work. Normalize everything to agent-call capacity.
| Pattern | Examples | Visible meter | Cost lesson |
|---|---|---|---|
| API tokens | DeepSeek / Kimi | input + cached + output | Transparent, router-friendly |
| Usage wallet | Trae IDE | plan balance | Can drain fast in agent loops |
| Credits | Tencent CodeBuddy | task + model weight | Copilot-like abstraction |
| Action caps | Tongyi / Trae free | turns or completions | Easy entry, hidden context cost |
| Per-seat | Alibaba / Tencent teams | user/month | Predictable enterprise budget |
Tokens are the substrate; products expose wallets, credits, action limits, seats, or future value metrics.
flowchart TB T[one agent call] --> FI[fresh input tokens] T --> CI[cached input tokens] T --> CW[cache write tokens] T --> OUT[output tokens] FI --> R[provider rate card] CI --> R CW --> R OUT --> R R --> USD[API-equivalent dollars] USD --> GH[GitHub Copilot<br/>multiply by 100<br/>AI Credits] USD --> API[OpenAI or Anthropic API<br/>bill directly] USD --> ClaudePlan[Claude subscription<br/>compare with inferred<br/>plan credit bucket]
| Model | Reference call | Calls per $10 | Implication |
|---|---|---|---|
| GPT-5.5 | $0.160 | 62 | Reserve for hard reasoning |
| GPT-5.4 | $0.080 | 125 | Mid-tier agent work |
| GPT-5.3-Codex | $0.063 | 158 | Coding default candidate |
| GPT-5.4 mini | $0.024 | 416 | Cheap verifiable work |
Reference call: 100K cached input + 10K fresh input + 2K output.
Included usage and purchased credits are not the same economic product.
| Plan | Base | Flex | Total | Est. tasks/mo |
|---|---|---|---|---|
| Pro | 1,000 | 500 | 1,500 | 9 |
| Pro+ | 3,900 | 3,100 | 7,000 | 44 |
| Max | 10,000 | 10,000 | 20,000 | 126 |
| Business | 1,900 | — | pooled | 12/seat |
| Enterprise | 3,900 | — | pooled | 24/seat |
Same formula for every plan: total credits ÷ (6.3 credits/call × 25 calls/task).
Predictable spend with rolling limits.
No artificial session lockout.
More usage plus admin controls.
| Plan | 5h bucket | Weekly bucket | Opus equiv/mo | API value |
|---|---|---|---|---|
| Pro ($20) | 550K | 5.0M | 32.5M in / 6.5M out | ~$163 · 8.1× |
| Max 5x ($100) | 3.3M | 41.7M | 270.8M in / 54.2M out | ~$1,354 · 13.5× |
| Max 20x ($200) | 11.0M | 83.3M | 541.7M in / 108.3M out | ~$2,708 · 13.5× |
Buckets from ShellaC; monthly = weekly × 52/12; Opus equiv = credits ÷ Opus rates; API value = Opus tokens × API price.
flowchart LR Context[100K repeated context] --> Cold[cold turn<br/>cache write] Context --> Warm[warm turn<br/>cache read] Cold --> SubCold[Claude plan<br/>100K input counts] Cold --> ApiCold[API<br/>100K write at 1.25x] Warm --> SubWarm[Claude plan<br/>cache read = 0 credits<br/>only new input + output] Warm --> ApiWarm[API or Copilot meter<br/>100K cache read at 10%] SubWarm --> Value[subscription value expands<br/>in repeated agent loops] ApiWarm --> MeterAnxiety[long loops still draw down<br/>credits or dollars]
A cheap failed attempt can cost more than a strong first pass once retries, review time, and human cleanup enter the bill.
More iterations resend context and tool schemas.
Subtle code errors move cost into review and CI.
Senior-engineer cleanup dominates small token savings.
flowchart LR
Task[coding task] --> Verifiable{cheap to verify<br/>and cheap to retry?}
Verifiable -->|yes| Small[start with small model]
Verifiable -->|no| Strong[start with strong model]
Small --> Pass{passed in 1-2 tries?}
Pass -->|yes| Done[ship - cheapest path]
Pass -->|no| Escalate[escalate to stronger model<br/>log the escalation]
Escalate --> Strong
Strong --> Review[human review]
Review --> Done The goal is feedback loops, not guilt. Put cost signals where engineers already work.
| Token bucket | Amount | Why it matters |
|---|---|---|
| Cached input | 100K | Repeated repository context |
| Fresh input | 10K | New prompt, logs, diffs |
| Output | 2K | Plans, patches, explanations |
| Session | 25 calls | A medium agent task |
| Budget | $10 | Comparison denominator |
Use this slide only when the audience wants the math behind the model table.