# robots.txt

## What is robots.txt?

`robots.txt` is a plain-text file served at the site root (`/robots.txt`). It contains directives for search bots and AI crawlers that state which paths they may and may not crawl.

The format dates back to 1994 and was formalized as RFC 9309 in 2022. Major search engines and most AI bots honor it, but compliance is voluntary: the file is a request to crawlers, not access control.

## Why does a site need robots.txt?

Without `robots.txt`, AI bots (GPTBot, ClaudeBot, PerplexityBot) treat every path as open to crawling, and the site has no way to state its preferences. The file does four things:

1. Explicitly allow AI bots to crawl public content
2. Block crawling of sensitive paths (`/admin/`, `/api/private/`)
3. Point to the sitemap via the `Sitemap:` directive
4. Add a [Content-Signal](/glossary/content-signals) to declare consent for AI use

A correct `robots.txt` for [GEO](/glossary/geo):

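```
# Default group for every crawler without a group of its own
User-agent: *
Content-Signal: ai-train=yes, search=yes, ai-input=yes
Allow: /
Disallow: /admin/
Disallow: /api/private/

# A crawler obeys only the most specific group that names it,
# so the Disallow rules are repeated for the named AI bots
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Content-Signal: ai-train=yes, search=yes, ai-input=yes
Allow: /
Disallow: /admin/
Disallow: /api/private/

Sitemap: https://example.com/sitemap.xml
```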
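Under RFC 9309, a crawler follows only the group whose `User-agent` line names it and falls back to the `*` group when no group does. The sketch below illustrates that selection step so the effect on `Disallow` rules is visible. It is a simplified illustration, not a full parser: `parseGroups` and `rulesFor` are hypothetical helper names, and matching is done on the exact product token only.

```typescript
// Minimal sketch of RFC 9309 group selection; not a complete robots.txt parser.
interface Group {
  userAgents: string[]; // product tokens from User-agent lines, lowercased
  rules: string[];      // normalized Allow/Disallow lines, kept for illustration
}

// Split a robots.txt body into groups. Consecutive User-agent lines share one group.
function parseGroups(body: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasUserAgent = false;

  for (const raw of body.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, '').trim(); // drop comments and padding
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (key === 'user-agent') {
      if (current === null || !lastWasUserAgent) {
        current = { userAgents: [], rules: [] };
        groups.push(current);
      }
      current.userAgents.push(value.toLowerCase());
      lastWasUserAgent = true;
    } else if (key === 'allow' || key === 'disallow') {
      if (current !== null) current.rules.push(key + ': ' + value);
      lastWasUserAgent = false;
    }
    // Other records (Sitemap, Content-Signal, ...) are ignored in this sketch.
  }
  return groups;
}

// Return the rules a crawler must obey: every group naming it, else the `*` groups.
function rulesFor(body: string, productToken: string): string[] {
  const groups = parseGroups(body);
  const token = productToken.toLowerCase();
  let chosen = groups.filter((g) => g.userAgents.indexOf(token) !== -1);
  if (chosen.length === 0) {
    chosen = groups.filter((g) => g.userAgents.indexOf('*') !== -1);
  }
  const rules: string[] = [];
  for (const g of chosen) {
    for (const r of g.rules) rules.push(r);
  }
  return rules;
}

const example = [
  'User-agent: *',
  'Disallow: /admin/',
  '',
  'User-agent: GPTBot',
  'Allow: /',
].join('\n');

console.log(rulesFor(example, 'GPTBot'));       // ['allow: /']
console.log(rulesFor(example, 'SomeOtherBot')); // ['disallow: /admin/']
```

In this example GPTBot never sees `Disallow: /admin/`, because its own group takes precedence over `*`. Any path that must stay off-limits for a named bot has to be listed in that bot's group as well.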

## How to configure robots.txt?

Create a `public/robots.txt` file (for static sites) or a `/robots.txt` endpoint (for dynamic sites).

**WordPress:** WordPress serves a virtual `robots.txt` by default, and the Yoast SEO and Rank Math plugins each include an editor for it. Add the AI-bot sections manually in that editor.

**Next.js:** create `app/robots.ts` or `public/robots.txt`:
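```typescript
// app/robots.ts
import type { MetadataRoute } from 'next';

// Next.js serves the object returned here as /robots.txt
export default function robots(): MetadataRoute.Robots {
  const disallow = ['/admin/', '/api/private/'];
  return {
    rules: [
      { userAgent: '*', allow: '/', disallow },
      // Named bots follow only their own group, so repeat the disallow rules
      { userAgent: ['GPTBot', 'ClaudeBot', 'PerplexityBot'], allow: '/', disallow },
    ],
    sitemap: 'https://example.com/sitemap.xml',
  };
}
```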

**Astro:** put the file at `public/robots.txt`; it is served as a static file.
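For a dynamic site on another stack, the `/robots.txt` endpoint can be sketched in a framework-agnostic way. The example below is an illustration under assumptions, not any framework's API: `RobotsConfig`, `buildRobotsTxt`, and `robotsResponse` are hypothetical names, and the returned object only describes the response that a real handler would send.

```typescript
// Hypothetical config shape for illustration; not part of any framework.
interface RobotsConfig {
  disallow: string[];  // paths no crawler should fetch
  namedBots: string[]; // bots that get their own explicit group
  sitemapUrl: string;
}

// Render the robots.txt body. A named bot follows only its own group,
// so the same rules are written for `*` and for the named bots.
function buildRobotsTxt(config: RobotsConfig): string {
  const rules = ['Allow: /'].concat(config.disallow.map((p) => 'Disallow: ' + p));
  const lines: string[] = ['User-agent: *'].concat(rules);
  if (config.namedBots.length > 0) {
    lines.push('');
    for (const bot of config.namedBots) lines.push('User-agent: ' + bot);
    for (const rule of rules) lines.push(rule);
  }
  lines.push('', 'Sitemap: ' + config.sitemapUrl);
  return lines.join('\n') + '\n';
}

// Describe the response for GET /robots.txt: status 200 and a text/plain body.
function robotsResponse(config: RobotsConfig) {
  return {
    status: 200,
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
    body: buildRobotsTxt(config),
  };
}

const res = robotsResponse({
  disallow: ['/admin/', '/api/private/'],
  namedBots: ['GPTBot', 'ClaudeBot', 'PerplexityBot'],
  sitemapUrl: 'https://example.com/sitemap.xml',
});
console.log(res.body);
```

Whatever the stack, the handler should return the body with status 200 and a `text/plain` content type, since a crawler that receives an HTML page at this path cannot read the rules.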

## How do we check robots.txt?

The scanner performs `GET /robots.txt` and checks sequentially:

1. **HTTP 200** — the file exists and is accessible
2. **Content-Type: text/plain** — served as text, not HTML
3. **Non-empty content** — the file is not empty
4. **Presence of `User-agent:` directives** — at least one block (RFC 9309)
5. **Format validity** — no structural errors

The result is graded: **1.0** if the file contains a `Sitemap:` directive or at least one non-wildcard `User-agent` block; **0.5** if it contains only `User-agent: *` and no `Sitemap:`. The status is **fail** on a non-200 HTTP response, an empty file, or missing directives.
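The grading rules above can be sketched as a small function. This is an illustrative reconstruction, not the scanner's actual code: `checkRobotsTxt`, the `CheckResult` shape, and the `pass` status name are assumptions. The text does not say how a wrong Content-Type or a structural error affects the score, so the sketch only reports the Content-Type as a separate flag and skips format validation.

```typescript
// Hypothetical result shape; the real scanner's output format is not documented here.
type CheckResult =
  | { status: 'fail'; reason: string }
  | { status: 'pass'; score: number; contentTypeOk: boolean };

function checkRobotsTxt(httpStatus: number, contentType: string, body: string): CheckResult {
  if (httpStatus !== 200) return { status: 'fail', reason: 'non-200 response' };
  if (body.trim() === '') return { status: 'fail', reason: 'empty file' };

  const agents: string[] = [];
  let hasSitemap = false;
  for (const raw of body.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, '').trim(); // drop comments and padding
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (key === 'user-agent' && value !== '') agents.push(value);
    if (key === 'sitemap' && value !== '') hasSitemap = true;
  }
  if (agents.length === 0) return { status: 'fail', reason: 'no User-agent directive' };

  // 1.0 with a Sitemap or any named (non-wildcard) agent, otherwise 0.5.
  const hasNamedAgent = agents.some((a) => a !== '*');
  const contentTypeOk = contentType.toLowerCase().indexOf('text/plain') === 0;
  return { status: 'pass', score: (hasSitemap || hasNamedAgent) ? 1.0 : 0.5, contentTypeOk };
}

const wildcardOnly = checkRobotsTxt(200, 'text/plain', 'User-agent: *\nAllow: /');
const withSitemap = checkRobotsTxt(
  200,
  'text/plain',
  'User-agent: *\nAllow: /\nSitemap: https://example.com/sitemap.xml',
);
console.log(wildcardOnly); // status 'pass', score 0.5
console.log(withSitemap);  // status 'pass', score 1
```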

The file is cached and passed to dependent checks: [AI bot rules](/glossary/ai-bot-rules), [Content Signals](/glossary/content-signals), [Sitemap](/glossary/sitemap).

