Why is OpenAI Codex draining quota so fast?
OpenAI Codex is draining quota faster than expected because cloud tasks and local tasks now share the same weekly quota pool - a silent change made in late 2025 with no announcement. This is compounded by a confirmed scaling bug in v0.40.0 where the 5-hour and weekly limits don’t increase proportionally, and an undocumented API key routing issue that can bill organisation API credits instead of your ChatGPT subscription.
After tracking quota consumption across my own Pro account and correlating it with reports from GitHub issue #13186, Reddit, and the OpenAI community forums, the pattern is clear: a single cloud task that spins up a sandboxed VM and loads a 192K-token context window can silently consume the same budget as dozens of local CLI prompts. There is no pre-task estimate, no confirmation prompt, and no way to know the cost until your dashboard reads 0%.
We’re calling the worst part of this Phantom Quota - quota consumed by tasks that hang, time out, or fail without returning any usable output. You pay for the tokens. You get nothing back. There’s no reclaim mechanism.
TL;DR: OpenAI Codex Quota Drain
- The Silent Change: Cloud tasks no longer have their own quota pool - they now share the same 5-hour and weekly budget as local tasks. No announcement was made.
- The Scaling Bug: In v0.40.0, the 5-hour limit and weekly limit don’t scale proportionally. OpenAI acknowledged this as a bug.
- The API Key Trap: Codex CLI may silently route requests through your org API key, billing your paid API credits at pay-as-you-go rates instead of your subscription.
- Phantom Quota: Hung or failed tasks still consume your token allocation with no reclaim mechanism.
- The Response: OpenAI’s collaborator-badged engineer blamed users: “It’s likely in how you’re using Codex.” The comment earned a wall of downvotes.
- The Fix: Run `/status` to check limits, prefer local tasks, unset `OPENAI_API_KEY` for org accounts, use `codex-mini-latest` for routine work, and fall back to the Web UI when CLI quota runs out. Pro users can also use the spark model.
Timeline: How the Codex Quota Crisis Unfolded
Understanding the sequence matters - this wasn’t a single event, but a slow erosion of limits with no communication:
- Pre-November 2025: Cloud tasks have a separate, generous quota pool. OpenAI pricing page advertises “generous limits for a limited time.”
- November 2025: “Generous limits” language silently removed from the pricing page. Cloud and local tasks merged into a shared quota pool. No announcement, no changelog.
- Late 2025 (v0.40.0): Users confirm the 5-hour and weekly limits don’t scale proportionally - a 2% increase in the 5-hour window yields only 1% weekly.
- December 2025: OpenAI acknowledges the scaling bug and provides credits to affected cloud task users.
- February 2026: GPT-5.3-Codex released. Codex macOS app launched. Quota structure remains unchanged, apart from a temporary 2x limit boost.
- March 2026: Core quota caps, Phantom Quota, and transparency gaps remain unaddressed. GitHub issue #13186 opened. etraut-openai responds: “It’s likely in how you’re using Codex.” Mass downvotes follow.
The quota math that doesn’t add up
Users on both Plus ($20/month) and Pro ($200/month) plans are hitting their weekly limits after as little as two 30-minute coding sessions:
| Report | Plan | What Happened |
|---|---|---|
| 2-minute task, 70 lines of code | Plus | Consumed 23% of the 5-hour limit in a single request |
| $40 in extra credits | Pro | Disappeared within a handful of prompts |
| 256K tokens | Pro | Burns roughly 10% of the weekly limit - implying a cap of ~2.5M tokens/week |
| Hung sessions (Phantom Quota) | All plans | Sessions that return no results still consume quota with no recourse |
What a Pro plan actually costs per session
Based on reports that roughly 256K tokens burns about 10% of the weekly limit, a Pro plan user has a theoretical weekly cap of approximately 2.5M tokens:
- Average cloud task: 150K-300K tokens (context window loading + generated output)
- Tasks before weekly limit: Approximately 8-16 cloud tasks per week
- Tasks per session (typical): 3-5 cloud tasks
- Sessions before exhaustion: 2-5 sessions per week
- Effective cost per session: $10-$25 (on a $200/month subscription)
For context, OpenAI’s pay-as-you-go API pricing for codex-mini-latest is $1.50 per million input tokens and $6.00 per million output tokens. At 2.5M tokens/week, the raw API cost would be roughly $3.75-$15/week depending on input/output ratio. A $200/month Pro subscription is only competitive if the quota is generous enough to justify the premium - and at 2-5 sessions per week, it isn’t. If you’re looking for ways to maximize AI coding output without burning through quota, we broke down a $0 AI coding agency workflow that sidesteps subscription limits entirely.
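Those bounds fall out of simple multiplication. A quick sanity check using the article's 2.5M-token weekly figure and the codex-mini-latest prices, where the all-input and all-output splits are the two extremes:

```shell
# Weekly raw API cost bounds for ~2.5M tokens at codex-mini-latest rates
tokens_m=2.5                                                      # millions of tokens/week
low=$(awk -v t="$tokens_m" 'BEGIN { printf "%.2f", t * 1.50 }')   # 100% input tokens
high=$(awk -v t="$tokens_m" 'BEGIN { printf "%.2f", t * 6.00 }')  # 100% output tokens
echo "raw API equivalent: \$${low}-\$${high} per week"
```

Even at the all-output extreme, the raw API cost is well under the roughly $46/week that a $200/month Pro subscription works out to.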
Why cloud tasks consume so much more
Cloud tasks burn through quota faster than local tasks for three structural reasons:
- Sandbox VM overhead: Each cloud task spins up an isolated container with its own filesystem, adding system-prompt tokens and environment context to every request.
- Larger context windows: Cloud tasks load up to 192K tokens of input context (your codebase, instructions, file contents) before generating any output, compared to local tasks which operate on smaller, focused contexts.
- Token metering is opaque: OpenAI measures consumption via input tokens (TPM) and output tokens, but the CLI doesn’t show per-task token counts. You can’t see how much a cloud task will cost before it runs.
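To make the gap concrete, here is a back-of-the-envelope comparison. Every figure below is an assumption for illustration; only the 192K cloud context ceiling comes from the reports above:

```shell
# Rough per-task token totals: context + system/sandbox overhead + output (assumed figures)
cloud_ctx=192000; cloud_overhead=8000; cloud_out=20000
local_ctx=20000;  local_overhead=2000; local_out=5000
cloud_total=$((cloud_ctx + cloud_overhead + cloud_out))
local_total=$((local_ctx + local_overhead + local_out))
echo "cloud task: ~${cloud_total} tokens | local task: ~${local_total} tokens"
echo "ratio: ~$((cloud_total / local_total))x"
```

Under these assumptions, one cloud task costs about as much as eight local ones - in line with the 5-10x range users report.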
OpenAI’s response: “It’s probably you”
When dozens of developers flooded GitHub issue #13186 and related threads with reproducible reports, an OpenAI collaborator - etraut-openai - responded with this:
“We have not changed anything on the server side related to usage accounting or metering. If you’re noticing a change in usage, it’s likely in how you’re using Codex. If you suspect this might be a client-side regression, you could try downgrading to an earlier version of the CLI.”
The response earned widespread downvotes - and for good reason:
1. The pattern doesn’t match isolated user error. Multiple independent users across different accounts, plans, operating systems, and use cases reported the exact same behavior simultaneously. Isolated user error doesn’t produce a synchronized flood across GitHub, Reddit, and the OpenAI forums.
2. Phantom Quota is a platform problem, not a user problem. When a task hangs indefinitely without returning results and still deducts from your token allocation, that’s a metering issue. Users have no mechanism to cancel a hung task and reclaim the consumed tokens.
3. “Try downgrading” is a deflection, not a fix. Telling paying customers to roll back their CLI version without any further investigation shifts the burden of diagnosis onto the people experiencing the problem. No logs were requested, no reproduction steps gathered, no escalation offered.
When a Collaborator-badged engineer responds to a mass-reported issue with “it’s likely how you’re using it” and gets a wall of downvotes, that’s not a communication gap. That’s a trust problem.
When did OpenAI change Codex cloud task quota?
OpenAI merged Codex cloud task quota into the shared local task pool in late 2025, removing the “generous limits for a limited time” language from its pricing page without any announcement, changelog, or in-product notification. Previously, cloud tasks had a separate, larger quota allocation. Now, a single cloud task consumes the same weekly budget as dozens of local CLI prompts.
| Aspect | Before (Pre-Late 2025) | After (Current) |
|---|---|---|
| Cloud task quota | Separate, generous pool | Shared with local tasks |
| Local task quota | Own pool | Shared with cloud tasks |
| Pricing page language | "Generous limits for a limited time" | Language removed |
| User notification | N/A | None - no announcement, no changelog, no UI banner |
| Pre-task cost estimate | Not available | Still not available |
| Token usage visibility | Not available | Only via /status command and dashboard - no per-task breakdown |
This is the equivalent of a cloud provider changing your instance pricing overnight and expecting you to discover it through your next invoice. AWS, GCP, and Azure all publish pricing changes with advance notice and deprecation timelines. OpenAI updated a web page.
Why don’t the 5-hour and weekly Codex limits scale equally?
There’s a compounding issue. In Codex CLI version 0.40.0, users confirmed that the 5-hour rolling limit and the weekly limit don’t scale proportionally:
- When the 5-hour limit increases by 2%, the weekly limit only goes up by 1%
- This means heavy usage within a 5-hour window disproportionately impacts your weekly ceiling
- The mismatch creates a situation where burst usage is penalized more heavily than spread-out usage
OpenAI acknowledged this as a bug they were actively working on. However, even with the scaling fixed, the core quota caps remain too tight for professional development workflows.
The OpenAI community forum has users explicitly calling for OpenAI to revert quota to the previous specification, noting Codex “hits the limit almost immediately under normal use and is effectively unusable.” Multiple users have reported canceling their Pro subscriptions or evaluating alternatives directly in those threads.
Does Codex CLI bill my API key instead of my subscription?
Yes. Codex CLI can silently route requests through your organisation’s OPENAI_API_KEY environment variable instead of your ChatGPT Plus or Pro subscription. This means API credits are consumed at pay-as-you-go rates ($1.50-$6.00 per million tokens for codex-mini-latest) even though you’re paying $20-$200/month for a subscription that should cover the usage.
This affects users who have both a ChatGPT subscription and an organisation API key set in their environment (via .env files, shell profiles, or system variables). The CLI picks up the OPENAI_API_KEY and uses it for authentication, bypassing your subscription quota entirely.
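A quick way to see which path the CLI will take is to inspect the environment before launching. A minimal sketch - the wording of the messages is ours, not the CLI's:

```shell
# Report which auth path Codex CLI would pick up from the environment
check_codex_auth() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "API key auth: requests will bill org API credits"
  else
    echo "Subscription auth: requests count against ChatGPT quota"
  fi
}
check_codex_auth
```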
Fix: Unset OPENAI_API_KEY to stop double billing
Unset your API key before launching Codex to force authentication through your ChatGPT subscription:
```shell
# macOS / Linux
unset OPENAI_API_KEY
codex

# Windows CMD
cmd /C "set "OPENAI_API_KEY=" && codex"

# Windows PowerShell
$env:OPENAI_API_KEY = ""; codex
```
To make this permanent, check for OPENAI_API_KEY in your ~/.bashrc, ~/.zshrc, ~/.bash_profile, and any project-level .env files. If you need the API key for other tools, create a wrapper script that unsets it before launching Codex:
```shell
#!/bin/bash
# ~/bin/codex-sub - launch Codex with subscription auth
unset OPENAI_API_KEY
codex "$@"
```
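To install the wrapper, write it out and mark it executable. The `~/bin` location is just a convention; any directory on your `PATH` works:

```shell
# Write the wrapper, mark it executable, and confirm the shebang is line one
mkdir -p ~/bin
cat > ~/bin/codex-sub <<'EOF'
#!/bin/bash
# Launch Codex with no org API key in the environment (subscription auth)
unset OPENAI_API_KEY
codex "$@"
EOF
chmod +x ~/bin/codex-sub
head -1 ~/bin/codex-sub   # prints: #!/bin/bash
```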
How do I stretch my OpenAI Codex quota?
Until OpenAI addresses quota transparency and fixes the scaling mismatch, here’s how to stretch your limits. (For a broader look at how security and trust compare across AI coding tools, see our dedicated breakdown.)
| Workaround | Why It Helps | How to Do It |
|---|---|---|
| Run /status to check limits | Shows current rate limits and session info before you run expensive tasks | Type /status in the Codex CLI. Cross-reference with your usage dashboard for weekly totals. |
| Prefer local tasks over cloud tasks | Cloud tasks spin up sandbox VMs and load larger context windows, consuming 5-10x more tokens | Reserve cloud tasks for complex, multi-file work only. Use local mode for single-file edits and quick fixes. |
| Use codex-mini-latest (`--model` flag) | Offers up to 4x higher usage limits for local messages compared to the full model | Run `codex --model codex-mini-latest` or select it via the /model command in the CLI. Reserve GPT-5.3-Codex for complex architecture work. |
| Use the Codex Web UI + GitHub fallback | When CLI quota is exhausted, the web UI can continue working | Connect your repo to the Codex web UI to work in sandboxed environments and push PRs back to your local branch. |
| Consolidate prompts | Reduces per-request overhead and context re-loading tokens | Instead of multiple small back-and-forth prompts, batch context into one well-scoped task with clear requirements. |
| Monitor proactively via API | Prevents surprise walls when you’re mid-session | Check /status in the CLI, the usage dashboard, or use the OpenAI Usage API for programmatic monitoring. Check x-ratelimit-remaining-requests headers in API responses. |
| Unset OPENAI_API_KEY | Stops silent billing to your org API credits at pay-as-you-go rates | Run unset OPENAI_API_KEY before launching Codex if you have an organisation account. |
| Consider the Batch API for async work | 50% cost savings on input and output tokens for non-real-time tasks | If you’re running batch processing or large refactors, the Batch API processes tasks asynchronously at half the token cost. |
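Since OpenAI provides no pre-task cost estimate, you can approximate a guard client-side. A toy sketch - the percentages and the 90% threshold are made-up illustrations, and in practice you would feed in real numbers from `/status` and your usage dashboard:

```shell
# quota_guard USED_PCT TASK_EST_PCT - warn before a task pushes weekly usage past 90%
quota_guard() {
  used=$1; est=$2
  total=$((used + est))
  if [ "$total" -ge 90 ]; then
    echo "WARNING: ~${total}% of weekly quota after this task - consider a local run"
  else
    echo "OK: ~${total}% of weekly quota after this task"
  fi
}
quota_guard 72 10   # prints the OK line
quota_guard 82 10   # prints the WARNING line
```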
The Web UI fallback workflow
When your CLI quota is exhausted, you don’t have to stop working entirely:
- Push your current branch to GitHub
- Open the Codex Web UI and connect your repository
- Create tasks in the sandboxed web environment
- Review and merge the resulting PRs back to your local branch
- Pull the changes and continue development
This adds friction and breaks your flow - but it’s better than staring at a “quota exceeded” wall for the rest of the week. If your project uses TypeScript, our TypeScript patterns guide can help you write cleaner prompts that generate better code on the first pass - reducing the back-and-forth that eats quota.
What OpenAI should actually do
The developer community isn’t asking for unlimited usage. They’re asking for basic transparency that every mature developer platform already provides:
| Feature | Status | Industry Standard | Why It Matters |
|---|---|---|---|
| Pre-task cost estimate | ❌ Not available | AWS, GCP, Azure all provide this | Users can’t make informed decisions about which tasks to run |
| Real-time consumption warning | ❌ Not available | Standard in cloud billing | A task consuming >10% of your limit should ask for confirmation |
| Clear reset time display | ❌ Not in CLI | Standard in rate-limited APIs | Users can’t plan their work around reset windows |
| Per-task token breakdown | ❌ Not available | Standard in LLM APIs (Anthropic, Google) | Users can’t identify which tasks consumed the most quota |
| Policy change announcements | ❌ Silent changes | Changelogs, blog posts, deprecation notices | Users discover changes through Reddit, not official channels |
| Phantom Quota recovery | ❌ Not available | Common in cloud compute billing | Users pay for compute that delivered no output |
The fact that developers are discovering policy changes through Reddit threads and GitHub issue floods - not official announcements - says a lot about how OpenAI views its developer community right now.
Alternatives worth considering
If the Codex quota situation is pushing you toward alternatives, here’s how competing AI coding tools compare on transparency and limits:
| Tool | Pricing | Quota Transparency | Best For |
|---|---|---|---|
| Claude Code (Anthropic) | $20/mo (Max plan) | Token-based, visible limits, per-message token counts | Terminal workflows, backend |
| GitHub Copilot | $19-39/mo | Unlimited completions, premium request counters | Enterprise teams, compliance |
| Cursor | $20/mo | Request-based, visible counter in status bar | Full-stack, codebase-aware |
| Windsurf | $15/mo | Credit-based, dashboard visible | Agentic workflows |
Need a detailed comparison? Read our comprehensive AI-Augmented Development Guide for in-depth analysis of Cursor, Windsurf, Claude Code, and more - including quota models and billing transparency. For a security-focused comparison, see Antigravity vs Cursor, Kiro & Windsurf.
The bottom line
OpenAI Codex’s quota drain is a combination of a silent policy change, a confirmed scaling bug, an undocumented API key routing issue for org users, Phantom Quota from hung tasks, and - most frustratingly - an official response that deflects blame onto the users experiencing it.
Pro subscribers publicly stating they’ve canceled, GitHub issues with hundreds of reproductions, and forum threads calling for quota rollbacks aren’t isolated complaints. They’re a pattern. OpenAI has provided credits as a short-term bandage, but credits don’t fix the underlying issues: no pre-task visibility, no per-task token breakdown, no Phantom Quota recovery, and no honest communication when policies change.
What to do right now:
- Run `/status` in the CLI to check your current rate limits before running expensive tasks
- Prefer local tasks and use `codex-mini-latest` via the `--model` flag for routine work (up to 4x higher limits)
- Unset `OPENAI_API_KEY` if you have an org account - stop paying twice
- Use the Web UI + GitHub fallback when CLI quota is exhausted
- Consolidate prompts into well-scoped tasks to minimize token overhead
- Monitor via the Usage API or dashboard proactively
- Add your voice to GitHub issue #13186 with your own reproduction data
- Consider alternatives if the limits break your workflow
As engineers, we deserve better observability into the tools we depend on. Every cloud provider, every SaaS platform, every API service has learned that billing transparency isn’t optional - it’s foundational to developer trust. OpenAI is the outlier here, and the community pressure is making that harder to ignore.
Building with AI coding tools? Make sure your AI-generated code is production-ready. Read Why 73% of Vibe-Coded Apps Fail for the full production readiness checklist.