Why is OpenAI Codex draining quota so fast?
OpenAI Codex is draining quota faster than expected because cloud tasks and local tasks now share the same weekly quota pool - a silent change made in late 2025 with no announcement. This is compounded by a confirmed scaling bug in v0.40.0 where the 5-hour and weekly limits don’t increase proportionally, and an undocumented API key routing issue that can bill organisation API credits instead of your ChatGPT subscription.
After tracking quota consumption across my own Pro account and correlating it with reports from GitHub issue #13186, Reddit, and the OpenAI community forums, the pattern is clear: a single cloud task that spins up a sandboxed VM and loads a 192K-token context window can silently consume the same budget as dozens of local CLI prompts. There is no pre-task estimate, no confirmation prompt, and no way to know the cost until your dashboard reads 0%.
We’re calling the worst part of this Phantom Quota - quota consumed by tasks that hang, time out, or fail without returning any usable output. You pay for the tokens. You get nothing back. There’s no reclaim mechanism.
TL;DR: OpenAI Codex Quota Drain
- The Silent Change: Cloud tasks no longer have their own quota pool - they now share the same 5-hour and weekly budget as local tasks. No announcement was made.
- The Scaling Bug: In v0.40.0, the 5-hour limit and weekly limit don’t scale proportionally. OpenAI acknowledged this as a bug.
- The API Key Trap: Codex CLI may silently route requests through your org API key, billing your paid API credits at pay-as-you-go rates instead of your subscription.
- Phantom Quota: Hung or failed tasks still consume your token allocation with no reclaim mechanism.
- The Response: OpenAI’s collaborator-badged engineer blamed users: “It’s likely in how you’re using Codex.” The comment earned a wall of downvotes.
- The Fix: Run `/status` to check limits, prefer local tasks, unset `OPENAI_API_KEY` for org accounts, use `codex-mini-latest` for routine work, and fall back to the Web UI when CLI quota runs out. Pro users can also use the spark model.
Timeline: How the Codex Quota Crisis Unfolded
Understanding the sequence matters - this wasn’t a single event, but a slow erosion of limits with no communication:
- Pre-November 2025: Cloud tasks have a separate, generous quota pool. OpenAI pricing page advertises “generous limits for a limited time.”
- November 2025: “Generous limits” language silently removed from the pricing page. Cloud and local tasks merged into a shared quota pool. No announcement, no changelog.
- Late 2025 (v0.40.0): Users confirm the 5-hour and weekly limits don’t scale proportionally - a 2% increase in the 5-hour window yields only 1% weekly.
- December 2025: OpenAI acknowledges the scaling bug and provides credits to affected cloud task users.
- February 2026: GPT-5.3-Codex released. Codex macOS app launched. Quota structure remains unchanged, apart from a temporary 2x limit boost.
- March 2026: Core quota caps, Phantom Quota, and transparency gaps remain unaddressed. GitHub issue #13186 opened. etraut-openai responds: “It’s likely in how you’re using Codex.” Mass downvotes follow.
The quota math that doesn’t add up
Users on both Plus ($20/month) and Pro ($200/month) plans are hitting their weekly limits after as little as two 30-minute coding sessions:
| Report | Plan | What Happened |
|---|---|---|
| 2-minute task, 70 lines of code | Plus | Consumed 23% of the 5-hour limit in a single request |
| $40 in extra credits | Pro | Disappeared within a handful of prompts |
| 256K tokens | Pro | Burns roughly 10% of the weekly limit - implying a cap of ~2.5M tokens/week |
| Hung sessions (Phantom Quota) | All plans | Sessions that return no results still consume quota with no recourse |
What a Pro plan actually costs per session
Based on reports that roughly 256K tokens burns about 10% of the weekly limit, a Pro plan user has a theoretical weekly cap of approximately 2.5M tokens:
- Average cloud task: 150K-300K tokens (context window loading + generated output)
- Tasks before weekly limit: Approximately 8-16 cloud tasks per week
- Tasks per session (typical): 3-5 cloud tasks
- Sessions before exhaustion: 2-5 sessions per week
- Effective cost per session: $10-$25 (on a $200/month subscription)
For context, OpenAI’s pay-as-you-go API pricing for codex-mini-latest is $1.50 per million input tokens and $6.00 per million output tokens. At 2.5M tokens/week, the raw API cost would be roughly $3.75-$15/week depending on input/output ratio. A $200/month Pro subscription is only competitive if the quota is generous enough to justify the premium - and at 2-5 sessions per week, it isn’t. If you’re looking for ways to maximize AI coding output without burning through quota, we broke down a $0 AI coding agency workflow that sidesteps subscription limits entirely.
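Those bounds fall out of simple multiplication. A quick sanity check using the article's 2.5M-token weekly figure and the codex-mini-latest prices, where the all-input and all-output splits are the two extremes:

```shell
# Weekly raw API cost bounds for ~2.5M tokens at codex-mini-latest rates
tokens_m=2.5                                                      # millions of tokens/week
low=$(awk -v t="$tokens_m" 'BEGIN { printf "%.2f", t * 1.50 }')   # 100% input tokens
high=$(awk -v t="$tokens_m" 'BEGIN { printf "%.2f", t * 6.00 }')  # 100% output tokens
echo "raw API equivalent: \$${low}-\$${high} per week"
```

Even at the all-output extreme, the raw API cost is well under the roughly $46/week that a $200/month Pro subscription works out to.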
Why cloud tasks consume so much more
Cloud tasks burn through quota faster than local tasks for three structural reasons:
- Sandbox VM overhead: Each cloud task spins up an isolated container with its own filesystem, adding system-prompt tokens and environment context to every request.
- Larger context windows: Cloud tasks load up to 192K tokens of input context (your codebase, instructions, file contents) before generating any output, compared to local tasks which operate on smaller, focused contexts.
- Token metering is opaque: OpenAI measures consumption via input tokens (TPM) and output tokens, but the CLI doesn’t show per-task token counts. You can’t see how much a cloud task will cost before it runs.
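To make the gap concrete, here is a back-of-the-envelope comparison. Every figure below is an assumption for illustration; only the 192K cloud context ceiling comes from the reports above:

```shell
# Rough per-task token totals: context + system/sandbox overhead + output (assumed figures)
cloud_ctx=192000; cloud_overhead=8000; cloud_out=20000
local_ctx=20000;  local_overhead=2000; local_out=5000
cloud_total=$((cloud_ctx + cloud_overhead + cloud_out))
local_total=$((local_ctx + local_overhead + local_out))
echo "cloud task: ~${cloud_total} tokens | local task: ~${local_total} tokens"
echo "ratio: ~$((cloud_total / local_total))x"
```

Under these assumptions, one cloud task costs about as much as eight local ones - in line with the 5-10x range users report.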
OpenAI’s response: “It’s probably you”
When dozens of developers flooded GitHub issue #13186 and related threads with reproducible reports, an OpenAI collaborator - etraut-openai - responded with this:
“We have not changed anything on the server side related to usage accounting or metering. If you’re noticing a change in usage, it’s likely in how you’re using Codex. If you suspect this might be a client-side regression, you could try downgrading to an earlier version of the CLI.”
The response earned widespread downvotes - and for good reason:
1. The pattern doesn’t match isolated user error. Multiple independent users across different accounts, plans, operating systems, and use cases reported the exact same behavior simultaneously. Isolated user error doesn’t produce a synchronized flood across GitHub, Reddit, and the OpenAI forums.
2. Phantom Quota is a platform problem, not a user problem. When a task hangs indefinitely without returning results and still deducts from your token allocation, that’s a metering issue. Users have no mechanism to cancel a hung task and reclaim the consumed tokens.
3. “Try downgrading” is a deflection, not a fix. Telling paying customers to roll back their CLI version without any further investigation shifts the burden of diagnosis onto the people experiencing the problem. No logs were requested, no reproduction steps gathered, no escalation offered.
When a Collaborator-badged engineer responds to a mass-reported issue with “it’s likely how you’re using it” and gets a wall of downvotes, that’s not a communication gap. That’s a trust problem.
When did OpenAI change Codex cloud task quota?
OpenAI merged Codex cloud task quota into the shared local task pool in late 2025, removing the “generous limits for a limited time” language from its pricing page without any announcement, changelog, or in-product notification. Previously, cloud tasks had a separate, larger quota allocation. Now, a single cloud task consumes the same weekly budget as dozens of local CLI prompts.
| Aspect | Before (Pre-Late 2025) | After (Current) |
|---|---|---|
| Cloud task quota | Separate, generous pool | Shared with local tasks |
| Local task quota | Own pool | Shared with cloud tasks |
| Pricing page language | "Generous limits for a limited time" | Language removed |
| User notification | N/A | None - no announcement, no changelog, no UI banner |
| Pre-task cost estimate | Not available | Still not available |
| Token usage visibility | Not available | Only via /status command and dashboard - no per-task breakdown |
This is the equivalent of a cloud provider changing your instance pricing overnight and expecting you to discover it through your next invoice. AWS, GCP, and Azure all publish pricing changes with advance notice and deprecation timelines. OpenAI updated a web page.
Why don’t the 5-hour and weekly Codex limits scale equally?
There’s a compounding issue. In Codex CLI version 0.40.0, users confirmed that the 5-hour rolling limit and the weekly limit don’t scale proportionally:
- When the 5-hour limit increases by 2%, the weekly limit only goes up by 1%
- This means heavy usage within a 5-hour window disproportionately impacts your weekly ceiling
- The mismatch creates a situation where burst usage is penalized more heavily than spread-out usage
OpenAI acknowledged this as a bug they were actively working on. However, even with the scaling fixed, the core quota caps remain too tight for professional development workflows.
The OpenAI community forum has users explicitly calling for OpenAI to revert quota to the previous specification, noting Codex “hits the limit almost immediately under normal use and is effectively unusable.” Multiple users have reported canceling their Pro subscriptions or evaluating alternatives directly in those threads.
Does Codex CLI bill my API key instead of my subscription?
Yes. Codex CLI can silently route requests through your organisation’s OPENAI_API_KEY environment variable instead of your ChatGPT Plus or Pro subscription. This means API credits are consumed at pay-as-you-go rates ($1.50-$6.00 per million tokens for codex-mini-latest) even though you’re paying $20-$200/month for a subscription that should cover the usage.
This affects users who have both a ChatGPT subscription and an organisation API key set in their environment (via .env files, shell profiles, or system variables). The CLI picks up the OPENAI_API_KEY and uses it for authentication, bypassing your subscription quota entirely.
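A quick way to see which path the CLI will take is to inspect the environment before launching. A minimal sketch - the wording of the messages is ours, not the CLI's:

```shell
# Report which auth path Codex CLI would pick up from the environment
check_codex_auth() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "API key auth: requests will bill org API credits"
  else
    echo "Subscription auth: requests count against ChatGPT quota"
  fi
}
check_codex_auth
```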
Fix: Unset OPENAI_API_KEY to stop double billing
Unset your API key before launching Codex to force authentication through your ChatGPT subscription:
```shell
# macOS / Linux
unset OPENAI_API_KEY
codex

# Windows CMD
cmd /C "set "OPENAI_API_KEY=" && codex"

# Windows PowerShell
$env:OPENAI_API_KEY = ""; codex
```
To make this permanent, check for OPENAI_API_KEY in your ~/.bashrc, ~/.zshrc, ~/.bash_profile, and any project-level .env files. If you need the API key for other tools, create a wrapper script that unsets it before launching Codex:
```shell
#!/bin/bash
# ~/bin/codex-sub - launch Codex with subscription auth
unset OPENAI_API_KEY
codex "$@"
```
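To install the wrapper, write it out and mark it executable. The `~/bin` location is just a convention; any directory on your `PATH` works:

```shell
# Write the wrapper, mark it executable, and confirm the shebang is line one
mkdir -p ~/bin
cat > ~/bin/codex-sub <<'EOF'
#!/bin/bash
# Launch Codex with no org API key in the environment (subscription auth)
unset OPENAI_API_KEY
codex "$@"
EOF
chmod +x ~/bin/codex-sub
head -1 ~/bin/codex-sub   # prints: #!/bin/bash
```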
How do I stretch my OpenAI Codex quota?
Until OpenAI addresses quota transparency and fixes the scaling mismatch, here’s how to stretch your limits. (For a broader look at how security and trust compare across AI coding tools, see our dedicated breakdown.)
| Workaround | Why It Helps | How to Do It |
|---|---|---|
| Run /status to check limits | Shows current rate limits and session info before you run expensive tasks | Type /status in the Codex CLI. Cross-reference with your usage dashboard for weekly totals. |
| Prefer local tasks over cloud tasks | Cloud tasks spin up sandbox VMs and load larger context windows, consuming 5-10x more tokens | Reserve cloud tasks for complex, multi-file work only. Use local mode for single-file edits and quick fixes. |
| Use codex-mini-latest (`--model` flag) | Offers up to 4x higher usage limits for local messages compared to the full model | Run `codex --model codex-mini-latest` or select it via the /model command in the CLI. Reserve GPT-5.3-Codex for complex architecture work. |
| Use the Codex Web UI + GitHub fallback | When CLI quota is exhausted, the web UI can continue working | Connect your repo to the Codex web UI to work in sandboxed environments and push PRs back to your local branch. |
| Consolidate prompts | Reduces per-request overhead and context re-loading tokens | Instead of multiple small back-and-forth prompts, batch context into one well-scoped task with clear requirements. |
| Monitor proactively via API | Prevents surprise walls when you’re mid-session | Check /status in the CLI, the usage dashboard, or use the OpenAI Usage API for programmatic monitoring. Check x-ratelimit-remaining-requests headers in API responses. |
| Unset OPENAI_API_KEY | Stops silent billing to your org API credits at pay-as-you-go rates | Run unset OPENAI_API_KEY before launching Codex if you have an organisation account. |
| Consider the Batch API for async work | 50% cost savings on input and output tokens for non-real-time tasks | If you’re running batch processing or large refactors, the Batch API processes tasks asynchronously at half the token cost. |
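Since OpenAI provides no pre-task cost estimate, you can approximate a guard client-side. A toy sketch - the percentages and the 90% threshold are made-up illustrations, and in practice you would feed in real numbers from `/status` and your usage dashboard:

```shell
# quota_guard USED_PCT TASK_EST_PCT - warn before a task pushes weekly usage past 90%
quota_guard() {
  used=$1; est=$2
  total=$((used + est))
  if [ "$total" -ge 90 ]; then
    echo "WARNING: ~${total}% of weekly quota after this task - consider a local run"
  else
    echo "OK: ~${total}% of weekly quota after this task"
  fi
}
quota_guard 72 10   # prints the OK line
quota_guard 82 10   # prints the WARNING line
```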
The Web UI fallback workflow
When your CLI quota is exhausted, you don’t have to stop working entirely:
- Push your current branch to GitHub
- Open the Codex Web UI and connect your repository
- Create tasks in the sandboxed web environment
- Review and merge the resulting PRs back to your local branch
- Pull the changes and continue development
This adds friction and breaks your flow - but it’s better than staring at a “quota exceeded” wall for the rest of the week. If your project uses TypeScript, our TypeScript patterns guide can help you write cleaner prompts that generate better code on the first pass - reducing the back-and-forth that eats quota.
What OpenAI should actually do
The developer community isn’t asking for unlimited usage. They’re asking for basic transparency that every mature developer platform already provides:
| Feature | Status | Industry Standard | Why It Matters |
|---|---|---|---|
| Pre-task cost estimate | ❌ Not available | AWS, GCP, Azure all provide this | Users can’t make informed decisions about which tasks to run |
| Real-time consumption warning | ❌ Not available | Standard in cloud billing | A task consuming >10% of your limit should ask for confirmation |
| Clear reset time display | ❌ Not in CLI | Standard in rate-limited APIs | Users can’t plan their work around reset windows |
| Per-task token breakdown | ❌ Not available | Standard in LLM APIs (Anthropic, Google) | Users can’t identify which tasks consumed the most quota |
| Policy change announcements | ❌ Silent changes | Changelogs, blog posts, deprecation notices | Users discover changes through Reddit, not official channels |
| Phantom Quota recovery | ❌ Not available | Common in cloud compute billing | Users pay for compute that delivered no output |
The fact that developers are discovering policy changes through Reddit threads and GitHub issue floods - not official announcements - says a lot about how OpenAI views its developer community right now.
Alternatives worth considering
If the Codex quota situation is pushing you toward alternatives, here’s how competing AI coding tools compare on transparency and limits:
| Tool | Pricing | Quota Transparency | Best For |
|---|---|---|---|
| Claude Code (Anthropic) | $20/mo (Max plan) | Token-based, visible limits, per-message token counts | Terminal workflows, backend |
| GitHub Copilot | $19-39/mo | Unlimited completions, premium request counters | Enterprise teams, compliance |
| Cursor | $20/mo | Request-based, visible counter in status bar | Full-stack, codebase-aware |
| Windsurf | $15/mo | Credit-based, dashboard visible | Agentic workflows |
Need a detailed comparison? Read our comprehensive AI-Augmented Development Guide for in-depth analysis of Cursor, Windsurf, Claude Code, and more - including quota models and billing transparency. For a security-focused comparison, see Antigravity vs Cursor, Kiro & Windsurf.
The bottom line
OpenAI Codex’s quota drain is a combination of a silent policy change, a confirmed scaling bug, an undocumented API key routing issue for org users, Phantom Quota from hung tasks, and - most frustratingly - an official response that deflects blame onto the users experiencing it.
Pro subscribers publicly stating they’ve canceled, GitHub issues with hundreds of reproductions, and forum threads calling for quota rollbacks aren’t isolated complaints. They’re a pattern. OpenAI has provided credits as a short-term bandage, but credits don’t fix the underlying issues: no pre-task visibility, no per-task token breakdown, no Phantom Quota recovery, and no honest communication when policies change.
What to do right now:
- Run `/status` in the CLI to check your current rate limits before running expensive tasks
- Prefer local tasks and use `codex-mini-latest` via the `--model` flag for routine work (up to 4x higher limits)
- Unset `OPENAI_API_KEY` if you have an org account - stop paying twice
- Use the Web UI + GitHub fallback when CLI quota is exhausted
- Consolidate prompts into well-scoped tasks to minimize token overhead
- Monitor via the Usage API or dashboard proactively
- Add your voice to GitHub issue #13186 with your own reproduction data
- Consider alternatives if the limits break your workflow
As engineers, we deserve better observability into the tools we depend on. Every cloud provider, every SaaS platform, every API service has learned that billing transparency isn’t optional - it’s foundational to developer trust. OpenAI is the outlier here, and the community pressure is making that harder to ignore.
Building with AI coding tools? Make sure your AI-generated code is production-ready. Read Why 73% of Vibe-Coded Apps Fail for the full production readiness checklist.