OpenAI Codex is draining your quota too fast & they're blaming you for it

OpenAI Codex quota is vanishing after a single session for Plus and Pro users. A confirmed scaling bug draining limits - and OpenAI's response blames users.

Inzimam Ul Haq
· 14 min read
Photo by Vojtech Bruzek on Unsplash

Why is OpenAI Codex draining quota so fast?

OpenAI Codex is draining quota faster than expected because cloud tasks and local tasks now share the same weekly quota pool - a silent change made in late 2025 with no announcement. This is compounded by a confirmed scaling bug in v0.40.0 where the 5-hour and weekly limits don’t increase proportionally, and an undocumented API key routing issue that can bill organisation API credits instead of your ChatGPT subscription.

After tracking quota consumption across my own Pro account and correlating it with reports from GitHub issue #13186, Reddit, and the OpenAI community forums, the pattern is clear: a single cloud task that spins up a sandboxed VM and loads a 192K-token context window can silently consume the same budget as dozens of local CLI prompts. There is no pre-task estimate, no confirmation prompt, and no way to know the cost until your dashboard reads 0%.

We’re calling the worst part of this Phantom Quota - quota consumed by tasks that hang, time out, or fail without returning any usable output. You pay for the tokens. You get nothing back. There’s no reclaim mechanism.

TL;DR: OpenAI Codex Quota Drain

  • The Silent Change: Cloud tasks no longer have their own quota pool - they now share the same 5-hour and weekly budget as local tasks. No announcement was made.
  • The Scaling Bug: In v0.40.0, the 5-hour limit and weekly limit don’t scale proportionally. OpenAI acknowledged this as a bug.
  • The API Key Trap: Codex CLI may silently route requests through your org API key, billing your paid API credits at pay-as-you-go rates instead of your subscription.
  • Phantom Quota: Hung or failed tasks still consume your token allocation with no reclaim mechanism.
  • The Response: OpenAI’s collaborator-badged engineer blamed users: “It’s likely in how you’re using Codex.” The comment earned a wall of downvotes.
  • The Fix: Run /status to check limits, prefer local tasks, unset OPENAI_API_KEY for org accounts, use codex-mini-latest for routine work, and fall back to the Web UI when the CLI is exhausted; Pro users can also try the spark model.

Timeline: How the Codex Quota Crisis Unfolded

Understanding the sequence matters - this wasn’t a single event, but a slow erosion of limits with no communication:

  • Pre-November 2025: Cloud tasks have a separate, generous quota pool. OpenAI pricing page advertises “generous limits for a limited time.”
  • November 2025: “Generous limits” language silently removed from the pricing page. Cloud and local tasks merged into a shared quota pool. No announcement, no changelog.
  • Late 2025 (v0.40.0): Users confirm the 5-hour and weekly limits don’t scale proportionally - a 2% increase in the 5-hour window yields only 1% weekly.
  • December 2025: OpenAI acknowledges the scaling bug and provides credits to affected cloud task users.
  • February 2026: GPT-5.3-Codex released and the Codex macOS app launched. The quota structure remains unchanged, though limits are temporarily doubled.
  • March 2026: GitHub issue #13186 opened. etraut-openai responds: “It’s likely in how you’re using Codex.” Mass downvotes follow. Core quota caps, Phantom Quota, and transparency gaps remain unaddressed.

The quota math that doesn’t add up

Users on both Plus ($20/month) and Pro ($200/month) plans are hitting their weekly limits after as few as two 30-minute coding sessions:

| Report | Plan | What Happened |
| --- | --- | --- |
| 2-minute task, 70 lines of code | Plus | Consumed 23% of the 5-hour limit in a single request |
| $40 in extra credits | Pro | Disappeared within a handful of prompts |
| 256K tokens | Pro | Burns roughly 10% of the weekly limit - implying a cap of ~2.5M tokens/week |
| Hung sessions (Phantom Quota) | All plans | Sessions that return no results still consume quota, with no recourse |

What a Pro plan actually costs per session

Based on reports that a 256K-token task burns roughly 10% of the weekly limit, a Pro plan user has a theoretical weekly cap of approximately 2.5M tokens:

  • Average cloud task: 150K-300K tokens (context window loading + generated output)
  • Tasks before weekly limit: Approximately 8-16 cloud tasks per week
  • Tasks per session (typical): 3-5 cloud tasks
  • Sessions before exhaustion: 2-5 sessions per week
  • Effective cost per session: $10-$25 (on a $200/month subscription)

For context, OpenAI’s pay-as-you-go API pricing for codex-mini-latest is $1.50 per million input tokens and $6.00 per million output tokens. At 2.5M tokens/week, the raw API cost would be roughly $3.75-$15/week depending on input/output ratio. A $200/month Pro subscription is only competitive if the quota is generous enough to justify the premium - and at 2-5 sessions per week, it isn’t. If you’re looking for ways to maximize AI coding output without burning through quota, we broke down a $0 AI coding agency workflow that sidesteps subscription limits entirely.
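The arithmetic above can be checked with a quick back-of-envelope sketch. It assumes the community-reported ~2.5M token/week Pro cap and the codex-mini-latest pay-as-you-go rates quoted in this article; none of these are official figures.

```python
# Back-of-envelope quota economics, assuming the community-reported
# ~2.5M token/week Pro cap and codex-mini-latest pay-as-you-go pricing
# ($1.50 per 1M input tokens, $6.00 per 1M output tokens).
WEEKLY_CAP = 2_500_000  # tokens/week (reported estimate, not an official figure)

low = WEEKLY_CAP * 1.50 / 1_000_000    # all-input worst case: $3.75
high = WEEKLY_CAP * 6.00 / 1_000_000   # all-output worst case: $15.00

# What each session effectively costs on a $200/month Pro subscription
weekly_fee = 200 * 12 / 52                        # ~$46.15/week
per_session = [weekly_fee / s for s in (5, 2)]    # ~$9.23 to ~$23.08

print(f"Raw API equivalent: ${low:.2f}-${high:.2f}/week")
print(f"Effective Pro cost per session: ${per_session[0]:.2f}-${per_session[1]:.2f}")
```

Even at the all-output extreme, the raw API cost of the entire weekly cap is a fraction of what Pro charges per week.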

Why cloud tasks consume so much more

Cloud tasks burn through quota faster than local tasks for three structural reasons:

  1. Sandbox VM overhead: Each cloud task spins up an isolated container with its own filesystem, adding system-prompt tokens and environment context to every request.
  2. Larger context windows: Cloud tasks load up to 192K tokens of input context (your codebase, instructions, file contents) before generating any output, compared to local tasks which operate on smaller, focused contexts.
  3. Token metering is opaque: OpenAI measures consumption via input tokens (TPM) and output tokens, but the CLI doesn’t show per-task token counts. You can’t see how much a cloud task will cost before it runs.
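The gap between the two modes can be sketched numerically. The 192K cloud context figure comes from this article; the local-task numbers are assumptions for illustration only.

```python
# Rough cloud-vs-local token footprint. The 192K cloud context window is
# the figure cited above; the local-task numbers are illustrative guesses.
def task_tokens(context, output):
    """Total tokens metered for one task: input context plus generated output."""
    return context + output

cloud = task_tokens(192_000, 8_000)   # sandbox VM loads near-full repo context
local = task_tokens(30_000, 3_000)    # focused, single-file context

print(cloud, local, round(cloud / local, 1))  # 200000 33000 6.1
```

Under these assumptions a single cloud task costs about six local tasks, which lands inside the 5-10x range users report.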

OpenAI’s response: “It’s probably you”

When dozens of developers flooded GitHub issue #13186 and related threads with reproducible reports, an OpenAI collaborator - etraut-openai - responded with this:

“We have not changed anything on the server side related to usage accounting or metering. If you’re noticing a change in usage, it’s likely in how you’re using Codex. If you suspect this might be a client-side regression, you could try downgrading to an earlier version of the CLI.”

The response earned widespread downvotes - and for good reason:

1. The pattern doesn’t match isolated user error. Multiple independent users across different accounts, plans, operating systems, and use cases reported the exact same behavior simultaneously. Isolated user error doesn’t produce a synchronized flood across GitHub, Reddit, and the OpenAI forums.

2. Phantom Quota is a platform problem, not a user problem. When a task hangs indefinitely without returning results and still deducts from your token allocation, that’s a metering issue. Users have no mechanism to cancel a hung task and reclaim the consumed tokens.

3. “Try downgrading” is a deflection, not a fix. Telling paying customers to roll back their CLI version without any further investigation shifts the burden of diagnosis onto the people experiencing the problem. No logs were requested, no reproduction steps gathered, no escalation offered.

When a Collaborator-badged engineer responds to a mass-reported issue with “it’s likely how you’re using it” and gets a wall of downvotes, that’s not a communication gap. That’s a trust problem.

When did OpenAI change Codex cloud task quota?

OpenAI merged Codex cloud task quota into the shared local task pool in late 2025, removing the “generous limits for a limited time” language from its pricing page without any announcement, changelog, or in-product notification. Previously, cloud tasks had a separate, larger quota allocation. Now, a single cloud task consumes the same weekly budget as dozens of local CLI prompts.

| Aspect | Before (Pre-Late 2025) | After (Current) |
| --- | --- | --- |
| Cloud task quota | Separate, generous pool | Shared with local tasks |
| Local task quota | Own pool | Shared with cloud tasks |
| Pricing page language | “Generous limits for a limited time” | Language removed |
| User notification | N/A | None - no announcement, no changelog, no UI banner |
| Pre-task cost estimate | Not available | Still not available |
| Token usage visibility | Not available | Only via /status command and dashboard - no per-task breakdown |

This is the equivalent of a cloud provider changing your instance pricing overnight and expecting you to discover it through your next invoice. AWS, GCP, and Azure all publish pricing changes with advance notice and deprecation timelines. OpenAI updated a web page.

Why don’t the 5-hour and weekly Codex limits scale equally?

There’s a compounding issue. In Codex CLI version 0.40.0, users confirmed that the 5-hour rolling limit and the weekly limit don’t scale proportionally:

  • When the 5-hour limit increases by 2%, the weekly limit only goes up by 1%
  • This means heavy usage within a 5-hour window disproportionately impacts your weekly ceiling
  • The mismatch creates a situation where burst usage is penalized more heavily than spread-out usage
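The mismatch in the bullets above is easy to see with unit-free numbers:

```python
# Illustration of the reported v0.40.0 scaling mismatch: the 5-hour window
# grows by 2% while the weekly ceiling grows by only 1% (arbitrary units).
FIVE_HOUR_BASE, WEEKLY_BASE = 1_000, 10_000

five_hour = FIVE_HOUR_BASE + FIVE_HOUR_BASE * 2 // 100   # +2% -> 1020
weekly = WEEKLY_BASE + WEEKLY_BASE * 1 // 100            # +1% -> 10100

# Before the change the weekly pool covered 10 full 5-hour windows;
# after it, only ~9.9 - so burst usage hits the weekly wall sooner.
print(WEEKLY_BASE / FIVE_HOUR_BASE, round(weekly / five_hour, 2))  # 10.0 9.9
```

Each round of disproportionate increases shrinks the number of full 5-hour bursts a week can absorb, which is exactly the burst-usage penalty users describe.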

OpenAI acknowledged this as a bug they were actively working on. However, even with the scaling fixed, the core quota caps remain too tight for professional development workflows.

The OpenAI community forum has users explicitly calling for OpenAI to revert quota to the previous specification, noting Codex “hits the limit almost immediately under normal use and is effectively unusable.” Multiple users have reported canceling their Pro subscriptions or evaluating alternatives directly in those threads.

Does Codex CLI bill my API key instead of my subscription?

Yes. Codex CLI can silently route requests through your organisation’s OPENAI_API_KEY environment variable instead of your ChatGPT Plus or Pro subscription. This means API credits are consumed at pay-as-you-go rates ($1.50-$6.00 per million tokens for codex-mini-latest) even though you’re paying $20-$200/month for a subscription that should cover the usage.

This affects users who have both a ChatGPT subscription and an organisation API key set in their environment (via .env files, shell profiles, or system variables). The CLI picks up the OPENAI_API_KEY and uses it for authentication, bypassing your subscription quota entirely.

Fix: Unset OPENAI_API_KEY to stop double billing

Unset your API key before launching Codex to force authentication through your ChatGPT subscription:

```bash
# macOS / Linux
unset OPENAI_API_KEY
codex
```

```powershell
# Windows PowerShell (assigning an empty string removes the variable)
$env:OPENAI_API_KEY = ""
codex
```

```cmd
:: Windows CMD
set OPENAI_API_KEY=
codex
```

To make this permanent, check for OPENAI_API_KEY in your ~/.bashrc, ~/.zshrc, ~/.bash_profile, and any project-level .env files. If you need the API key for other tools, create a wrapper script that unsets it before launching Codex:

```bash
#!/bin/bash
# ~/bin/codex-sub - launch Codex with subscription auth (no API key billing)
unset OPENAI_API_KEY
exec codex "$@"
```

Make the script executable with chmod +x ~/bin/codex-sub.

How do I stretch my OpenAI Codex quota?

Until OpenAI addresses quota transparency and fixes the scaling mismatch, here’s how to stretch your limits. (For a broader look at how security and trust compare across AI coding tools, see our dedicated breakdown.)

| Workaround | Why It Helps | How to Do It |
| --- | --- | --- |
| Run /status to check limits | Shows current rate limits and session info before you run expensive tasks | Type /status in the Codex CLI. Cross-reference with your usage dashboard for weekly totals. |
| Prefer local tasks over cloud tasks | Cloud tasks spin up sandbox VMs and load larger context windows, consuming 5-10x more tokens | Reserve cloud tasks for complex, multi-file work. Use local mode for single-file edits and quick fixes. |
| Use codex-mini-latest (--model flag) | Offers up to 4x higher usage limits for local messages compared to the full model | Run codex --model codex-mini-latest or select it via the /model command in the CLI. Reserve GPT-5.3-Codex for complex architecture work. |
| Use the Codex Web UI + GitHub fallback | When CLI quota is exhausted, the web UI can continue working | Connect your repo to the Codex web UI to work in sandboxed environments and push PRs back to your local branch. |
| Consolidate prompts | Reduces per-request overhead and context re-loading tokens | Batch context into one well-scoped task with clear requirements instead of multiple small back-and-forth prompts. |
| Monitor proactively | Prevents surprise walls when you’re mid-session | Check /status in the CLI, the usage dashboard, or the OpenAI Usage API for programmatic monitoring. Check x-ratelimit-remaining-requests headers in API responses. |
| Unset OPENAI_API_KEY | Stops silent billing to your org API credits at pay-as-you-go rates | Run unset OPENAI_API_KEY before launching Codex if you have an organisation account. |
| Consider the Batch API for async work | 50% cost savings on input and output tokens for non-real-time tasks | For batch processing or large refactors, the Batch API processes tasks asynchronously at half the token cost. |
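The consolidate-prompts row deserves a quick worked example: every separate request re-sends the same codebase context, so batching amortizes it. The token counts below are illustrative assumptions, not measured values.

```python
# Why consolidating prompts saves tokens: each request re-sends the shared
# codebase context. Token counts here are illustrative assumptions.
CONTEXT = 50_000   # tokens of shared codebase context per request
PROMPT = 500       # tokens per individual ask

separate = 5 * (CONTEXT + PROMPT)   # five small back-and-forth prompts
batched = CONTEXT + 5 * PROMPT      # one consolidated, well-scoped task

print(separate, batched)                     # 252500 52500
print(f"savings: {1 - batched / separate:.0%}")  # savings: 79%
```

Under these assumptions, one well-scoped task costs roughly a fifth of what five incremental prompts do.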

The Web UI fallback workflow

When your CLI quota is exhausted, you don’t have to stop working entirely:

  1. Push your current branch to GitHub
  2. Open the Codex Web UI and connect your repository
  3. Create tasks in the sandboxed web environment
  4. Review and merge the resulting PRs back to your local branch
  5. Pull the changes and continue development

This adds friction and breaks your flow - but it’s better than staring at a “quota exceeded” wall for the rest of the week. If your project uses TypeScript, our TypeScript patterns guide can help you write cleaner prompts that generate better code on the first pass - reducing the back-and-forth that eats quota.

What OpenAI should actually do

The developer community isn’t asking for unlimited usage. They’re asking for basic transparency that every mature developer platform already provides:

| Feature | Status | Industry Standard | Why It Matters |
| --- | --- | --- | --- |
| Pre-task cost estimate | ❌ Not available | AWS, GCP, Azure all provide this | Users can’t make informed decisions about which tasks to run |
| Real-time consumption warning | ❌ Not available | Standard in cloud billing | A task consuming >10% of your limit should ask for confirmation |
| Clear reset time display | ❌ Not in CLI | Standard in rate-limited APIs | Users can’t plan their work around reset windows |
| Per-task token breakdown | ❌ Not available | Standard in LLM APIs (Anthropic, Google) | Users can’t identify which tasks consumed the most quota |
| Policy change announcements | ❌ Silent changes | Changelogs, blog posts, deprecation notices | Users discover changes through Reddit, not official channels |
| Phantom Quota recovery | ❌ Not available | Common in cloud compute billing | Users pay for compute that delivered no output |

The fact that developers are discovering policy changes through Reddit threads and GitHub issue floods - not official announcements - says a lot about how OpenAI views its developer community right now.

Alternatives worth considering

If the Codex quota situation is pushing you toward alternatives, here’s how competing AI coding tools compare on transparency and limits:

| Tool | Pricing | Quota Transparency | Best For |
| --- | --- | --- | --- |
| Claude Code (Anthropic) | $20/mo (Max plan) | Token-based, visible limits, per-message token counts | Terminal workflows, backend |
| GitHub Copilot | $19-39/mo | Unlimited completions, premium request counters | Enterprise teams, compliance |
| Cursor | $20/mo | Request-based, visible counter in status bar | Full-stack, codebase-aware |
| Windsurf | $15/mo | Credit-based, dashboard visible | Agentic workflows |

Need a detailed comparison? Read our comprehensive AI-Augmented Development Guide for in-depth analysis of Cursor, Windsurf, Claude Code, and more - including quota models and billing transparency. For a security-focused comparison, see Antigravity vs Cursor, Kiro & Windsurf.

The bottom line

OpenAI Codex’s quota drain is a combination of a silent policy change, a confirmed scaling bug, an undocumented API key routing issue for org users, Phantom Quota from hung tasks, and - most frustratingly - an official response that deflects blame onto the users experiencing it.

Pro subscribers publicly stating they’ve canceled, GitHub issues with hundreds of reproductions, and forum threads calling for quota rollbacks aren’t isolated complaints. They’re a pattern. OpenAI has provided credits as a short-term bandage, but credits don’t fix the underlying issues: no pre-task visibility, no per-task token breakdown, no Phantom Quota recovery, and no honest communication when policies change.

What to do right now:

  1. Run /status in the CLI to check your current rate limits before running expensive tasks
  2. Prefer local tasks and use codex-mini-latest via the --model flag for routine work (up to 4x higher limits)
  3. Unset OPENAI_API_KEY if you have an org account - stop paying twice
  4. Use the Web UI + GitHub fallback when CLI quota is exhausted
  5. Consolidate prompts into well-scoped tasks to minimize token overhead
  6. Monitor via the Usage API or dashboard proactively
  7. Add your voice to GitHub issue #13186 with your own reproduction data
  8. Consider alternatives if the limits break your workflow

As engineers, we deserve better observability into the tools we depend on. Every cloud provider, every SaaS platform, every API service has learned that billing transparency isn’t optional - it’s foundational to developer trust. OpenAI is the outlier here, and the community pressure is making that harder to ignore.

Building with AI coding tools? Make sure your AI-generated code is production-ready. Read Why 73% of Vibe-Coded Apps Fail for the full production readiness checklist.

Frequently Asked Questions

Why is my OpenAI Codex quota draining so fast?
Codex cloud tasks now share the same quota pool as local CLI tasks. A single cloud task can consume as much quota as dozens of smaller local prompts because cloud tasks spin up sandboxed VMs and load larger context windows, consuming more input and output tokens. Additionally, hung sessions that never return results still drain your allocation (Phantom Quota). Use the /status command in the CLI to check your current rate limits.
What changed about OpenAI Codex quota in late 2025?
Before late 2025, cloud tasks had their own separate, generous quota pool. OpenAI's pricing page previously advertised 'generous limits for a limited time.' That language was silently removed, and cloud tasks now pull from the same 5-hour and weekly quota as lightweight local tasks - with no announcement or changelog.
What is the OpenAI Codex 5-hour vs weekly limit scaling bug?
In Codex CLI version 0.40.0, users confirmed the 5-hour limit and weekly limit don't scale proportionally. When the 5-hour limit increases by 2%, the weekly limit only goes up by 1%. OpenAI acknowledged this as a bug they were working on, but the core quota caps remain restrictive regardless.
How do I fix the Codex OPENAI_API_KEY organisation billing bug?
If you have an OpenAI organisation account, Codex CLI may silently route requests through your org API key instead of your ChatGPT Plus or Pro subscription, billing at pay-as-you-go API rates. Run 'unset OPENAI_API_KEY' before launching Codex on macOS/Linux, or run 'set OPENAI_API_KEY=' followed by 'codex' in Windows CMD. This forces authentication through your subscription.
How much quota does a single Codex cloud task consume?
Cloud tasks are significantly more resource-intensive than local tasks because they spin up sandboxed VMs and load larger context windows (up to 192K tokens of input context). Users have reported a single 2-minute cloud task producing roughly 70 lines of code consuming 23% of a 5-hour limit. On Pro plans, 256K tokens reportedly burns roughly 10% of the weekly limit, implying a weekly cap of approximately 2.5M tokens.
Does OpenAI Codex charge me if a task hangs or fails?
Yes. This is what we call 'Phantom Quota' - sessions that hang without returning results still consume quota, and users have no way to reclaim that lost allocation. Users are billed for compute time (token processing) regardless of whether they received usable output, with no mechanism to cancel a hung task and recover the spent tokens.
How do I check my remaining Codex quota?
Use the /status command in the Codex CLI to see your current session info and rate limits. For a comprehensive overview, check the OpenAI platform dashboard at platform.openai.com/usage. Developers can also use the OpenAI Usage API and Costs API for programmatic monitoring, or check x-ratelimit-remaining-requests headers in API responses.
What is the best way to stretch my Codex quota?
Prefer local tasks over cloud tasks, use codex-mini-latest (or GPT-5.1-Codex-Mini via the --model flag) for routine work (up to 4x higher limits), consolidate multiple small prompts into one well-scoped task, use the Codex Web UI with GitHub as a fallback when CLI quota is exhausted, monitor your usage dashboard proactively, and unset your OPENAI_API_KEY if you have an org account to prevent silent API billing.
Can I get a refund for Codex quota lost to bugs?
OpenAI has provided free credits to affected Plus and Pro users who experienced issues with cloud task usage tracking. Contact OpenAI support with your usage data if you believe a bug consumed your quota. However, there is no formal refund process for quota consumed by normal usage or Phantom Quota from hung sessions.
What did OpenAI say about the Codex quota drain reports?
An OpenAI collaborator (etraut-openai) responded to mass-reported quota drain issues on GitHub issue #13186 by stating 'We have not changed anything on the server side related to usage accounting or metering' and suggested the issue was 'likely in how you're using Codex.' The response received significant community backlash with numerous downvotes across GitHub and Reddit.
Should I cancel my ChatGPT Pro subscription over the Codex quota issue?
That depends on your usage patterns. If Codex is your primary tool, the quota restrictions may make the $200/month Pro plan poor value - users report exhausting their weekly allocation in under 3.5 sessions, effectively paying $57+ per usable session. Consider whether local tasks with codex-mini-latest meet your needs, try the workarounds listed in this article, and monitor the OpenAI community forums for policy updates.
Is there an alternative to OpenAI Codex with better quota policies?
Claude Code by Anthropic, GitHub Copilot, Cursor, and Windsurf are alternatives with different pricing and quota models. Each has trade-offs in capabilities and limits. Our AI-Augmented Development guide compares these tools in detail, including quota transparency and billing models.
Will OpenAI fix the Codex quota issues?
OpenAI acknowledged the 5-hour vs weekly scaling mismatch as a bug and stated they were working on it. They have also provided credits to affected cloud task users. However, the core quota caps and the lack of pre-task cost estimates, real-time warnings, and clear reset time displays remain unaddressed as of March 2026.
What is Phantom Quota in OpenAI Codex?
Phantom Quota is quota consumed by Codex tasks that hang, time out, or fail without returning usable output. The user's token allocation is debited for compute time, but no code or useful result is delivered. There is currently no mechanism to cancel a hung task, recover the spent tokens, or receive a credit for failed executions.
How does OpenAI Codex metering work?
Codex metering counts input tokens (your prompt, codebase context, and system instructions) and output tokens (generated code and explanations) against your subscription limits. Rate limits are measured in tokens per minute (TPM) and requests per minute (RPM). Cloud tasks consume more tokens because they load larger context windows (up to 192K tokens) and run inside sandboxed VMs with additional system overhead.
