
RateLimit headers: let agents back off gracefully

What standard RateLimit headers are, why AI agents need them, a minimal example, right vs wrong, mistakes, and how to verify.

Updated: June 7, 2026

What it is

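The RateLimit header fields, specified in an IETF draft (draft-ietf-httpapi-ratelimit-headers), tell the client its quota directly in the response: how many requests remain and how many seconds are left until the counter resets. The newer form is a single structured RateLimit field with limit, remaining and reset parameters, often accompanied by RateLimit-Policy, which describes the quota itself. Earlier revisions used three separate headers: RateLimit-Limit, RateLimit-Remaining and RateLimit-Reset. The draft's syntax has changed between revisions, so check which one your clients expect.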
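To make the two forms concrete, here is a minimal client-side sketch that reads the quota from one response's headers. The names `Quota` and `parse_ratelimit` are hypothetical. It assumes the structured form looks like `limit=100, remaining=87, reset=42` and that the older headers carry plain integers. It is not a full Structured Fields parser: quoted strings, parameters and other member types are simply skipped.

```python
from typing import Mapping, NamedTuple, Optional


class Quota(NamedTuple):
    limit: int
    remaining: int
    reset: int  # seconds until the counter resets (not a timestamp)


def parse_ratelimit(headers: Mapping[str, str]) -> Optional[Quota]:
    """Read limit/remaining/reset from one response's headers, or None."""
    # HTTP field names are case-insensitive, so compare in lower case.
    h = {name.lower(): value for name, value in headers.items()}

    # Structured form, e.g. RateLimit: limit=100, remaining=87, reset=42
    if "ratelimit" in h:
        params = {}
        for member in h["ratelimit"].split(","):
            key, _, value = member.partition("=")
            try:
                params[key.strip()] = int(value)
            except ValueError:
                continue  # skip members this sketch does not understand
        if all(k in params for k in ("limit", "remaining", "reset")):
            return Quota(params["limit"], params["remaining"], params["reset"])

    # Older form: three separate headers, assumed to hold plain integers.
    legacy = ("ratelimit-limit", "ratelimit-remaining", "ratelimit-reset")
    if all(name in h for name in legacy):
        try:
            return Quota(*(int(h[name]) for name in legacy))
        except ValueError:
            return None
    return None
```

Applied to the header from the minimal example, `parse_ratelimit({"RateLimit": "limit=100, remaining=87, reset=42"})` returns `Quota(limit=100, remaining=87, reset=42)`. Returning `None` rather than raising lets an agent treat "no quota information" as its own case.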

Why it matters for AI agents

An automated agent doesn’t “feel” load the way a human does. Without an explicit quota it either hammers your server or hits a hard block (429) and fails. A machine-readable limit lets the agent slow down early and politely: when it sees remaining=2, it eases off. That protects the server and makes automation predictable for the client.
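One way an agent could turn those numbers into a pause is sketched below. The function `next_delay` and its policy are assumptions for illustration, not something the draft prescribes. It does nothing while plenty of quota is left (the `low_water` threshold of 5 is arbitrary). Near the limit, it spreads the remaining requests over the seconds left. On a 429, it waits for Retry-After, or for reset if Retry-After is missing.

```python
from typing import Optional


def next_delay(status: int, remaining: Optional[int], reset: Optional[int],
               retry_after: Optional[float] = None, low_water: int = 5) -> float:
    """Seconds an agent should wait before its next request (a pacing heuristic)."""
    if status == 429:
        # Blocked: honour Retry-After first, then the reset hint.
        if retry_after is not None:
            return float(retry_after)
        if reset is not None:
            return float(reset)
        return 1.0  # arbitrary fallback when the server gave no hint
    if remaining is None or reset is None:
        return 0.0  # no quota information, nothing to pace against
    if remaining <= 0:
        return float(reset)  # quota spent: wait for the window to reset
    if remaining > low_water:
        return 0.0  # plenty of quota left: no need to slow down
    # Close to the limit: spread the last requests over the time left.
    return reset / remaining
```

With remaining=2 and reset=30, the sketch waits 15 seconds between the last two requests instead of firing them at once and running into a 429.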

Minimal working example

Modern (structured) form, for a quota of 100 requests per 60-second window (RateLimit-Policy: 100;w=60):

HTTP/1.1 200 OK
RateLimit: limit=100, remaining=87, reset=42
RateLimit-Policy: 100;w=60

When exceeded:

HTTP/1.1 429 Too Many Requests
RateLimit: limit=100, remaining=0, reset=30
Retry-After: 30
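On the server side, the example responses above can be produced by any limiter that knows its window. Below is a minimal sketch using a fixed window. The class `FixedWindowLimiter` is hypothetical, keeps one shared in-memory counter, and takes an injectable clock so it can be tested. A real server would keep a counter per client or API key and might prefer a sliding window or token bucket.

```python
import math
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    def __init__(self, limit: int = 100, window: int = 60,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window  # window length in seconds
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def check(self) -> Tuple[int, Dict[str, str]]:
        """Count one request and return (status, headers) for its response."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # The window has elapsed: start a fresh one.
            self.window_start = now
            self.used = 0
        # Seconds until reset, rounded up so clients never come back too early.
        reset = max(1, math.ceil(self.window_start + self.window - now))
        allowed = self.used < self.limit
        if allowed:
            self.used += 1
        headers = {
            "RateLimit": f"limit={self.limit}, remaining={self.limit - self.used}, reset={reset}",
            "RateLimit-Policy": f"{self.limit};w={self.window}",
        }
        if not allowed:
            headers["Retry-After"] = str(reset)  # same pause as reset
            return 429, headers
        return 200, headers
```

Because reset is computed from the real window start, it stays consistent with Retry-After on the 429, which is exactly what the headers above promise the client.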

Right vs wrong

Right	Wrong
`RateLimit` with `limit/remaining/reset` on responses	No headers — the agent learns the limit only from a `429`
`Retry-After` on a `429`	A `429` with no hint when to retry
`reset` reflects the real window	Random/incorrect values
Consistent across all limited endpoints	Only on some endpoints
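A client honouring Retry-After should accept both forms HTTP allows: delta-seconds ("30") or an HTTP-date. Here is a minimal sketch. The function name `retry_after_seconds` is hypothetical. It uses the standard library's `email.utils.parsedate_to_datetime` for the date form and returns `None` for values it cannot parse.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After value into seconds to wait, or None if unparseable."""
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delta-seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # let the caller fall back to reset or a default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are in GMT
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # a past date means "now"
```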

Common mistakes

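Only a 429 with no RateLimit headers: the agent flies blind until it is blocked.
No Retry-After on a 429: the client doesn’t know how long to pause.
Wrong reset (wrong window or units, e.g. a Unix timestamp instead of seconds until reset): the agent backs off at the wrong time.
Mixing the old and new formats inconsistently, e.g. RateLimit on some endpoints and RateLimit-Remaining on others.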
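Most of the mistakes above are visible in a single response, so they can be checked mechanically. The sketch below is a heuristic, not the draft's conformance rules. The function `lint_ratelimit` is hypothetical. It treats any reset of one billion or more as a probable Unix timestamp. Consistency across endpoints needs several responses, so it is out of scope here.

```python
from typing import List, Mapping, Optional

LEGACY = ("ratelimit-limit", "ratelimit-remaining", "ratelimit-reset")
LOOKS_LIKE_TIMESTAMP = 1_000_000_000  # Unix time passed this in September 2001


def _reset_value(h: Mapping[str, str]) -> Optional[str]:
    """Raw reset value from either format, if present."""
    if "ratelimit" in h:
        for member in h["ratelimit"].split(","):
            key, _, value = member.partition("=")
            if key.strip() == "reset":
                return value.strip()
    return h.get("ratelimit-reset")


def lint_ratelimit(status: int, headers: Mapping[str, str]) -> List[str]:
    """List the common mistakes visible in one response."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    has_new = "ratelimit" in h
    has_old = any(name in h for name in LEGACY)
    problems = []
    if not (has_new or has_old):
        problems.append("no RateLimit headers")
    if status == 429 and "retry-after" not in h:
        problems.append("429 without Retry-After")
    if has_new and has_old:
        problems.append("old and new formats mixed")
    reset = _reset_value(h)
    if reset is not None:
        try:
            seconds = int(reset)
        except ValueError:
            problems.append(f"reset is not an integer: {reset!r}")
        else:
            if seconds < 0 or seconds >= LOOKS_LIKE_TIMESTAMP:
                problems.append(f"reset={seconds} is not plausible delta-seconds")
    return problems
```

Both example responses in this article pass with an empty list. A bare `429` with no headers yields `["no RateLimit headers", "429 without Retry-After"]`.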

How to verify

An automated scan checks whether responses carry RateLimit headers. To check manually:

curl -s -D - -o /dev/null https://example.com/api/endpoint | grep -iE 'ratelimit|retry-after'

Expect RateLimit: (or the older RateLimit-* headers) on a normal response. To see the 429 case, repeat the request until the quota is exhausted; that 429 should carry Retry-After.