RateLimit headers: let agents back off gracefully
What standard RateLimit headers are, why AI agents need them, a minimal example, right vs wrong, mistakes, and how to verify.
What it is
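Standard RateLimit headers (defined in an IETF draft) report the client's quota
in the response itself: how many requests remain and when the counter resets.
The modern form is a single structured RateLimit field with limit, remaining,
and reset parameters, where reset is the number of seconds until the window
resets. Earlier drafts used three separate headers: RateLimit-Limit,
RateLimit-Remaining, and RateLimit-Reset. The draft's syntax has changed between
revisions, so check the revision you implement.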
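To make the two forms concrete, here is a minimal client-side sketch that reads either one into the same record. It assumes the structured syntax shown in this article (limit=..., remaining=..., reset=...) and plain integer values in the legacy headers; the names Quota and parse_ratelimit are illustrative, not part of any library.

```python
from typing import Mapping, NamedTuple, Optional


class Quota(NamedTuple):
    limit: Optional[int]
    remaining: Optional[int]
    reset: Optional[int]  # seconds until the window resets


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int, or None if missing or not a plain integer."""
    try:
        return int(value.strip()) if value is not None else None
    except ValueError:
        return None


def parse_ratelimit(headers: Mapping[str, str]) -> Optional[Quota]:
    """Return the quota from a response, or None if no RateLimit headers exist."""
    h = {k.lower(): v for k, v in headers.items()}

    # Structured form as shown in this article: "limit=100, remaining=87, reset=42".
    if "ratelimit" in h:
        params = {}
        for item in h["ratelimit"].split(","):
            key, sep, value = item.partition("=")
            if sep:
                params[key.strip().lower()] = value
        return Quota(
            _to_int(params.get("limit")),
            _to_int(params.get("remaining")),
            _to_int(params.get("reset")),
        )

    # Legacy form: three separate headers (assumed to hold plain integers).
    legacy = [h.get(f"ratelimit-{name}") for name in ("limit", "remaining", "reset")]
    if any(v is not None for v in legacy):
        return Quota(*(_to_int(v) for v in legacy))
    return None
```

Returning None when nothing is present lets the caller tell "no quota advertised" apart from "quota exhausted".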
Why it matters for AI agents
An automated agent doesn’t “feel” load the way a human does. Without an explicit
quota it either hammers your server or hits a hard block (429) and fails. A
machine-readable limit lets the agent slow down early and politely: it sees
remaining: 2 and eases off. That’s both server protection and predictable
automation for the client.
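One way an agent might "ease off" is to spread its remaining requests evenly over the time left in the window. The sketch below is only one possible policy, and pacing_delay and its min_gap parameter are hypothetical names chosen for illustration.

```python
def pacing_delay(remaining: int, reset: float, min_gap: float = 0.0) -> float:
    """Seconds to wait before the next request.

    Spreads the remaining requests evenly over the time left in the window,
    so the agent slows down before it reaches zero instead of after.
    """
    if reset <= 0:
        return min_gap  # window already over: no quota-based wait
    if remaining <= 0:
        return max(reset, min_gap)  # nothing left: wait for the reset
    return max(reset / remaining, min_gap)
```

With remaining=2 and reset=30 this yields a 15-second gap, while a nearly full quota yields a gap well under a second. An agent that prefers to run at full speed until the quota is low could apply the same formula only below some threshold.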
Minimal working example
Modern (structured) form:
HTTP/1.1 200 OK
RateLimit: limit=100, remaining=87, reset=42
RateLimit-Policy: 100;w=60
When exceeded:
HTTP/1.1 429 Too Many Requests
RateLimit: limit=100, remaining=0, reset=30
Retry-After: 30
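On the client side, a 429 like the one above can be turned into a wait time. This sketch prefers Retry-After (which may be a number of seconds or an HTTP date), falls back to the reset parameter of the structured RateLimit field, and finally to a default. The function name retry_delay and the 60-second default are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay(headers: Mapping[str, str],
                now: Optional[datetime] = None,
                default: float = 60.0) -> float:
    """Seconds to wait after a 429, based on Retry-After or RateLimit reset."""
    h = {k.lower(): v.strip() for k, v in headers.items()}
    now = now or datetime.now(timezone.utc)

    retry_after = h.get("retry-after")
    if retry_after:
        if retry_after.isdigit():  # delay in seconds
            return float(retry_after)
        try:  # otherwise try an HTTP date
            when = parsedate_to_datetime(retry_after)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            return max((when - now).total_seconds(), 0.0)

    # Fall back to reset=... in the structured RateLimit field.
    for item in h.get("ratelimit", "").split(","):
        key, _, value = item.partition("=")
        if key.strip().lower() == "reset" and value.strip().isdigit():
            return float(value.strip())

    return default  # no usable hint: assumed conservative default
```

In practice an agent would usually add a little random jitter to the result so that many clients do not all retry at the same instant.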
Right vs wrong
| Right | Wrong |
|---|---|
| RateLimit with limit/remaining/reset on responses | No headers, so the agent learns the limit only from a 429 |
| Retry-After on a 429 | A 429 with no hint when to retry |
| reset reflects the real window | Random/incorrect values |
| Consistent across all limited endpoints | Only on some endpoints |
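On the server side, one way to keep the "Right" column consistent is to build the headers in a single helper that every limited endpoint calls. The sketch below assumes a simple fixed-window counter and the header syntax used in this article; rate_limited_response and its parameters are hypothetical, and the counter storage itself is left out.

```python
import math
from typing import Dict, Tuple


def rate_limited_response(limit: int, window: int, used: int,
                          window_start: float, now: float) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request under a fixed-window limit.

    `used` counts requests in the current window, including this one.
    `window_start` and `now` are timestamps in seconds.
    """
    remaining = max(limit - used, 0)
    # reset is derived from the real window, never a made-up value.
    reset = max(math.ceil(window_start + window - now), 0)

    headers = {
        "RateLimit": f"limit={limit}, remaining={remaining}, reset={reset}",
        "RateLimit-Policy": f"{limit};w={window}",
    }
    if used > limit:
        headers["Retry-After"] = str(reset)  # always pair a 429 with a retry hint
        return 429, headers
    return 200, headers
```

For example, the 13th request made 18 seconds into a 60-second window with a limit of 100 produces the same RateLimit value as the 200 response in the minimal example.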
Common mistakes
- Only a 429 with no RateLimit headers: the agent flies blind.
- No Retry-After on a 429: the client doesn't know how long to pause.
- Wrong reset (wrong window/units): the agent backs off at the wrong time.
- Mixing the old and new formats without consistency.
How to verify
A scan checks for RateLimit headers. Manually:
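curl -sI https://example.com/api/endpoint | grep -iE 'ratelimit|retry-after'
Expect RateLimit: (or RateLimit-*) on a normal response and Retry-After on a 429.
Note that -I sends a HEAD request; if the API answers HEAD differently from GET, dump the headers of a GET with curl -s -o /dev/null -D - instead.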
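The same check can be scripted. This sketch takes a status code and a header mapping from whichever HTTP client you use and reports the mistakes listed above that are detectable from a single response; check_rate_limit_headers is an illustrative name, not an existing tool.

```python
from typing import List, Mapping

LEGACY_NAMES = ("ratelimit-limit", "ratelimit-remaining", "ratelimit-reset")


def check_rate_limit_headers(status: int, headers: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in one response (empty list means OK)."""
    names = {k.lower() for k in headers}
    has_new = "ratelimit" in names
    has_legacy = any(n in names for n in LEGACY_NAMES)

    problems = []
    if not (has_new or has_legacy):
        problems.append("no RateLimit headers")
    if has_new and has_legacy:
        problems.append("old and new RateLimit formats mixed")
    if status == 429 and "retry-after" not in names:
        problems.append("429 without Retry-After")
    return problems
```

A single response cannot show whether reset matches the real window or whether all limited endpoints behave the same, so those two points still need several requests across endpoints and over time.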