Rate Limits & Quotas
Understanding and working within Claude API rate limits and usage quotas
The Claude API implements rate limits to ensure fair usage and system stability. This guide explains how limits work and strategies to optimize your usage.
Rate Limit Types
Request Limits
| Tier | Requests Per Minute (RPM) |
|------|---------------------------|
| Free | 5 |
| Build | 50 |
| Scale | 1,000 |
| Enterprise | Custom |
Token Limits
| Tier | Tokens Per Minute (TPM) | Tokens Per Day (TPD) |
|------|-------------------------|----------------------|
| Free | 20,000 | 300,000 |
| Build | 100,000 | 2,500,000 |
| Scale | 400,000 | 10,000,000 |
| Enterprise | Custom | Custom |
Model-Specific Limits
Different models may have different limits:
| Model | Max Context | Max Output |
|-------|-------------|------------|
| Claude Opus 4 | 200K | 8,192 |
| Claude Sonnet 4 | 200K | 8,192 |
| Claude 3.5 Haiku | 200K | 8,192 |
Understanding Rate Limit Headers
Every API response includes rate limit information:
anthropic-ratelimit-requests-limit: 50
anthropic-ratelimit-requests-remaining: 45
anthropic-ratelimit-requests-reset: 2024-01-01T12:00:00Z
anthropic-ratelimit-tokens-limit: 100000
anthropic-ratelimit-tokens-remaining: 85000
anthropic-ratelimit-tokens-reset: 2024-01-01T12:00:00Z
Accessing Headers
const { data: message, response } = await anthropic.messages
  .create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello" }],
  })
  .withResponse();
// Rate limit headers are exposed on the raw HTTP response
console.log("Requests remaining:", response.headers.get("anthropic-ratelimit-requests-remaining"));
console.log("Tokens remaining:", response.headers.get("anthropic-ratelimit-tokens-remaining"));
Handling Rate Limits
Basic Rate Limit Handler
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic();
async function sendWithRateLimitHandling(message: string) {
try {
return await anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
messages: [{ role: "user", content: message }],
});
} catch (error) {
if (error instanceof Anthropic.RateLimitError) {
const retryAfter = error.headers?.["retry-after"];
const waitTime = retryAfter ? parseInt(retryAfter, 10) : 60;
console.log(`Rate limited. Waiting ${waitTime} seconds...`);
await new Promise((resolve) => setTimeout(resolve, waitTime * 1000));
// Retry the request
return sendWithRateLimitHandling(message);
}
throw error;
}
}
Token-Based Rate Limiting
class TokenRateLimiter {
private tokensUsed = 0;
private windowStart = Date.now();
private readonly maxTokensPerMinute: number;
constructor(maxTokensPerMinute = 100000) {
this.maxTokensPerMinute = maxTokensPerMinute;
}
async waitIfNeeded(estimatedTokens: number): Promise<void> {
const now = Date.now();
const elapsedMs = now - this.windowStart;
// Reset window if a minute has passed
if (elapsedMs >= 60000) {
this.tokensUsed = 0;
this.windowStart = now;
}
// Check if we'd exceed the limit
if (this.tokensUsed + estimatedTokens > this.maxTokensPerMinute) {
const waitMs = 60000 - elapsedMs;
console.log(`Token limit reached. Waiting ${waitMs}ms...`);
await new Promise((resolve) => setTimeout(resolve, waitMs));
this.tokensUsed = 0;
this.windowStart = Date.now();
}
this.tokensUsed += estimatedTokens;
}
recordActualUsage(estimatedTokens: number, actualTokens: number): void {
  // Replace the earlier estimate with the usage actually reported by the API
  this.tokensUsed = Math.max(0, this.tokensUsed - estimatedTokens + actualTokens);
}
}
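A possible way to wire the limiter into a request, assuming a rough four-characters-per-token estimate for the prompt plus the output budget (the helper name `sendWithTokenLimit` is just for illustration):
const limiter = new TokenRateLimiter(100000);
async function sendWithTokenLimit(message: string) {
  // Rough estimate: ~4 characters per token for the prompt, plus the max output budget
  const estimated = Math.ceil(message.length / 4) + 1024;
  await limiter.waitIfNeeded(estimated);
  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [{ role: "user", content: message }],
  });
  // Reconcile the estimate with the usage actually reported by the API
  limiter.recordActualUsage(estimated, response.usage.input_tokens + response.usage.output_tokens);
  return response;
}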
Request Queuing
Simple Queue Implementation
class RequestQueue {
private queue: Array<() => Promise<void>> = [];
private processing = false;
private requestsThisMinute = 0;
private minuteStart = Date.now();
private readonly maxRequestsPerMinute: number;
constructor(maxRequestsPerMinute = 50) {
this.maxRequestsPerMinute = maxRequestsPerMinute;
}
async add<T>(fn: () => Promise<T>): Promise<T> {
return new Promise((resolve, reject) => {
this.queue.push(async () => {
try {
const result = await fn();
resolve(result);
} catch (error) {
reject(error);
}
});
this.processQueue();
});
}
private async processQueue(): Promise<void> {
if (this.processing || this.queue.length === 0) return;
this.processing = true;
while (this.queue.length > 0) {
// Reset counter each minute
const now = Date.now();
if (now - this.minuteStart >= 60000) {
this.requestsThisMinute = 0;
this.minuteStart = now;
}
// Wait if at limit
if (this.requestsThisMinute >= this.maxRequestsPerMinute) {
const waitTime = 60000 - (now - this.minuteStart);
await new Promise((r) => setTimeout(r, waitTime));
this.requestsThisMinute = 0;
this.minuteStart = Date.now();
}
const task = this.queue.shift();
if (task) {
this.requestsThisMinute++;
await task();
}
}
this.processing = false;
}
}
// Usage
const queue = new RequestQueue(50);
const results = await Promise.all(
messages.map((msg) =>
queue.add(() =>
anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
messages: [{ role: "user", content: msg }],
})
)
)
);
Optimizing Token Usage
Prompt Compression
Reduce token usage by being concise:
// Less efficient
const verbosePrompt = `
I would like you to please help me with the following task.
I need you to analyze the following code and tell me if there
are any issues or improvements that could be made.
Here is the code:
${code}
Please provide a detailed analysis.
`;
// More efficient
const concisePrompt = `
Review this code for issues and improvements:
${code}
`;
Response Length Control
Set appropriate max_tokens:
// For short responses
const quickAnswer = await anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 256, // Limit output
messages: [{ role: "user", content: "What's 2+2?" }],
});
// For detailed responses
const detailedAnswer = await anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages: [{ role: "user", content: "Explain quantum computing" }],
});
Context Window Management
Trim conversation history to stay within limits:
interface Message {
role: "user" | "assistant";
content: string;
}
function trimConversation(messages: Message[], maxTokens: number): Message[] {
// Estimate tokens (rough approximation: 4 chars = 1 token)
const estimateTokens = (text: string) => Math.ceil(text.length / 4);
let totalTokens = 0;
const trimmed: Message[] = [];
// Always keep the last message
const lastMessage = messages[messages.length - 1];
totalTokens += estimateTokens(lastMessage.content);
trimmed.unshift(lastMessage);
// Add messages from end, respecting limit
for (let i = messages.length - 2; i >= 0; i--) {
const msgTokens = estimateTokens(messages[i].content);
if (totalTokens + msgTokens > maxTokens) break;
totalTokens += msgTokens;
trimmed.unshift(messages[i]);
}
return trimmed;
}
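For example, a conversation can be trimmed before each request; the 150,000-token budget below is an arbitrary choice that leaves headroom under the 200K context window:
const history: Message[] = [
  { role: "user", content: "Summarize our project status." },
  { role: "assistant", content: "Here is the current status..." },
  { role: "user", content: "What are the next steps?" },
];
const reply = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  // Keep only the most recent messages that fit within the token budget
  messages: trimConversation(history, 150000),
});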
Model Selection for Cost Optimization
Choose the appropriate model for the task:
| Task Type | Recommended Model | Rationale |
|-----------|-------------------|-----------|
| Simple Q&A | Haiku | Fast, cheap |
| Code review | Sonnet | Good balance |
| Complex analysis | Opus | Highest capability |
| High volume | Haiku/Sonnet | Cost effective |
function selectModel(taskComplexity: "simple" | "medium" | "complex") {
switch (taskComplexity) {
case "simple":
return "claude-3-5-haiku-20241022";
case "medium":
return "claude-sonnet-4-20250514";
case "complex":
return "claude-opus-4-20250514";
}
}
Monitoring Usage
Usage Tracking
interface UsageStats {
inputTokens: number;
outputTokens: number;
requests: number;
startTime: number;
}
class UsageTracker {
private stats: UsageStats = {
inputTokens: 0,
outputTokens: 0,
requests: 0,
startTime: Date.now(),
};
record(response: Anthropic.Message): void {
this.stats.requests++;
this.stats.inputTokens += response.usage.input_tokens;
this.stats.outputTokens += response.usage.output_tokens;
}
getStats(): UsageStats & { elapsedMinutes: number } {
return {
...this.stats,
elapsedMinutes: (Date.now() - this.stats.startTime) / 60000,
};
}
estimateCost(): number {
// Sonnet pricing example
const inputCost = (this.stats.inputTokens / 1_000_000) * 3.0;
const outputCost = (this.stats.outputTokens / 1_000_000) * 15.0;
return inputCost + outputCost;
}
}
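One way to use the tracker is to record every response as it comes back and periodically log the totals; the cost figure is only as accurate as the example pricing above:
const tracker = new UsageTracker();
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
});
tracker.record(response);
console.log(tracker.getStats());
console.log(`Estimated cost so far: $${tracker.estimateCost().toFixed(4)}`);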
Best Practices
- Implement backoff - Always use exponential backoff for retries (see the sketch after this list)
- Monitor headers - Track remaining limits in responses
- Batch when possible - Reduce request overhead
- Use appropriate models - Don't use Opus for simple tasks
- Set max_tokens appropriately - Don't request more than needed
- Compress prompts - Remove unnecessary verbosity
- Manage conversation length - Trim history to stay within limits
- Plan for scale - Consider upgrading tiers for production
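As a sketch of the backoff recommendation above (the cap of five attempts and the full-jitter delays are assumptions; the SDK also retries some failures automatically via its `maxRetries` client option):
async function withBackoff<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const retryable =
        error instanceof Anthropic.RateLimitError ||
        error instanceof Anthropic.InternalServerError;
      if (!retryable || attempt === maxAttempts - 1) throw error;
      // Exponential backoff with full jitter: base delays of 1s, 2s, 4s, ... capped at 30s
      const baseMs = Math.min(30000, 1000 * 2 ** attempt);
      const delayMs = Math.random() * baseMs;
      await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw new Error("unreachable");
}
// Usage
const answer = await withBackoff(() =>
  anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello" }],
  })
);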
Tier Upgrades
If you're hitting limits regularly:
- Build tier - For development and small-scale production
- Scale tier - For production applications
- Enterprise - For custom limits and SLAs
Visit console.anthropic.com to manage your tier.