API / Auth / MCP Easy

RateLimit headers: let agents back off gracefully

What standard RateLimit headers are, why AI agents need them, a minimal example, right vs wrong, mistakes, and how to verify.

Updated:

What it is

Standard RateLimit headers (an IETF draft) tell the client its quota right in the response: how many requests remain and when the counter resets. The modern form is a structured RateLimit field (with limit, remaining, reset parameters); previously three separate headers were used — RateLimit-Limit/-Remaining/-Reset.

Why it matters for AI agents

An automated agent doesn’t “feel” load the way a human does. Without an explicit quota it either hammers your server or hits a hard block (429) and fails. A machine-readable limit lets the agent slow down early and politely: it sees remaining: 2 and eases off. That’s both server protection and predictable automation for the client.

Minimal working example

Modern (structured) form:

HTTP/1.1 200 OK
RateLimit: limit=100, remaining=87, reset=42
RateLimit-Policy: 100;w=60

When exceeded:

HTTP/1.1 429 Too Many Requests
RateLimit: limit=100, remaining=0, reset=30
Retry-After: 30

Right vs wrong

RightWrong
RateLimit with limit/remaining/reset on responsesNo headers — the agent learns the limit only from a 429
Retry-After on a 429A 429 with no hint when to retry
reset reflects the real windowRandom/incorrect values
Consistent across all limited endpointsOnly on some endpoints

Common mistakes

  • Only a 429 with no RateLimit headers — the agent flies blind.
  • No Retry-After on a 429 — the client doesn’t know the pause.
  • Wrong reset (wrong window/units) — the agent backs off at the wrong time.
  • Mixing the old and new formats without consistency.

How to verify

A scan checks for RateLimit headers. Manually:

curl -sI https://example.com/api/endpoint | grep -i 'ratelimit\|retry-after'

Expect RateLimit: (or RateLimit-*) on a normal response and Retry-After on a 429.

Sources