Bot Access Control Easy

AI Bot Rules: explicit AI-crawler sections in robots.txt

Why to add separate sections for GPTBot, ClaudeBot, and PerplexityBot on top of the wildcard rule, with an example, common mistakes, and how we score it.


What it is

AI Bot Rules are separate sections in robots.txt for specific AI crawlers, on top of the wildcard User-agent: *. Each AI platform crawls with its own User-Agent, and an explicit section pins down how you treat it. It’s a refinement of the general robots.txt guide — here the focus is the AI bots themselves.

Why it matters for AI agents

By default AI bots follow the * rule. Explicit sections give three things:

  1. Guaranteed access: a bot that finds a section under its own name follows that section instead of *, so access works even if * is restricted.
  2. Targeted rules: /blog/ open, /api/private/ closed for a specific bot.
  3. A statement of intent: explicit trust in the platforms, a ticket into GEO/AEO results.
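
The first two points can be checked locally. The sketch below is only an illustration: it feeds a hypothetical robots.txt (closed wildcard, targeted GPTBot group) to Python's standard urllib.robotparser. The host example.com, the paths, and the helper name allowed are assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everything is closed by default,
# but GPTBot gets its own group with targeted rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /blog/
Disallow: /api/private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on the placeholder host."""
    return parser.can_fetch(agent, "https://example.com" + path)


print(allowed("GPTBot", "/blog/post"))         # True: the GPTBot group applies
print(allowed("GPTBot", "/api/private/keys"))  # False: closed for this bot
print(allowed("SomeOtherBot", "/blog/post"))   # False: falls back to *
```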

Current AI User-Agents:

User-Agent                             Platform
GPTBot, ChatGPT-User, OAI-SearchBot    OpenAI
ClaudeBot, anthropic-ai                Anthropic
PerplexityBot, Perplexity-User         Perplexity
Google-Extended                        Google AI / Gemini
Applebot-Extended                      Apple
YandexAdditional                       Yandex

Minimal working example

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
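
One way to keep bot names exact is to generate the file from a single list instead of typing each section by hand. This is a minimal sketch, not a required workflow; the function name render_robots and its rule parameter are hypothetical, and the bot list is copied from the example above.

```python
# Bot names copied from the minimal example; adjust to the bots you target.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def render_robots(bots: list[str], rule: str = "Allow: /") -> str:
    """Build a robots.txt body: a wildcard group, then one group per bot."""
    groups = [f"User-agent: {name}\n{rule}" for name in ["*", *bots]]
    return "\n\n".join(groups) + "\n"


print(render_robots(AI_BOTS))
```

Called with the default rule, this reproduces the minimal example; every group gets the same rule, so per-bot differences would still need to be edited in by hand.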

Right vs wrong

Right                        Wrong
3+ explicit AI sections      Only User-agent: *
Exact bot names (GPTBot)     Misspelled names (GTPBot): the section won’t match
Deliberate Allow/Disallow    A careless Disallow: / in an AI section

Common mistakes

  • Typos in names — the bot doesn’t match the section; * applies.
  • Only * — no explicit trust signal (and a lower score, see below).
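  • Conflicts between the wildcard and AI blocks: a bot that matches its own section ignores *, so restrictions written only under * are not inherited.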
  • Confusing it with Content Signals — that’s about usage, this is about access.
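
Two of these mistakes are easy to reproduce with Python's standard urllib.robotparser. The file below is hypothetical: GTPBot is a deliberate misspelling, and the paths and host are placeholders. Treat it as a sketch of how one parser behaves, not as a statement about every crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: the author meant GPTBot but misspelled the name,
# and gave ClaudeBot a group that does not repeat the /drafts/ restriction.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/

User-agent: GTPBot
Disallow: /

User-agent: ClaudeBot
Allow: /blog/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The misspelled group never matches the real crawler, so * decides.
print(parser.can_fetch("GPTBot", "https://example.com/pricing"))      # True
print(parser.can_fetch("GPTBot", "https://example.com/drafts/a"))     # False

# ClaudeBot matches its own group, so the * restriction is not inherited.
print(parser.can_fetch("ClaudeBot", "https://example.com/drafts/a"))  # True
```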

How to verify

This check depends on robots.txt and is scored on a gradient:

  • pass: 3+ AI-specific sections found;
  • warning: 1–2 sections;
  • fail: only * or nothing.

curl -s https://example.com/robots.txt | grep -iE 'gptbot|claudebot|perplexitybot|google-extended'
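
The gradient can be approximated in a few lines. The sketch below is an assumption about how such a check might work, not the actual scorer: it counts distinct AI bot names that appear on User-agent lines, using the names from the table in this article, and applies the thresholds listed above. The function name score_ai_sections is hypothetical.

```python
import re

# Bot names taken from the User-Agent table in this article (lowercased).
AI_BOTS = {
    "gptbot", "chatgpt-user", "oai-searchbot", "claudebot", "anthropic-ai",
    "perplexitybot", "perplexity-user", "google-extended",
    "applebot-extended", "yandexadditional",
}

# Matches "User-agent: <name>" at the start of a line, ignoring case.
USER_AGENT_LINE = re.compile(r"(?im)^[ \t]*user-agent[ \t]*:[ \t]*([^\s#]+)")


def score_ai_sections(robots_txt: str) -> str:
    """Grade a robots.txt body by how many distinct AI bots it names."""
    names = USER_AGENT_LINE.findall(robots_txt)
    found = {name.lower() for name in names} & AI_BOTS
    if len(found) >= 3:
        return "pass"
    if found:
        return "warning"
    return "fail"


print(score_ai_sections("User-agent: *\nAllow: /\n"))  # fail: only the wildcard
```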
