# Content Signals

## What are Content Signals?

**Content Signals** is a proposed extension to the [robots.txt](/glossary/robots-txt) standard that lets site owners explicitly declare their content-use policy for AI systems. The policy was introduced by Cloudflare and is published at contentsignals.org.

The directive is added to the `robots.txt` file as:

```
Content-Signal: ai-train=yes, search=yes, ai-input=yes
```

Three keys:
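- **`ai-train`**: whether your content may be used to train or fine-tune AI models (`yes` / `no`)
- **`search`**: whether your content may be indexed and returned in search results as links and short excerpts (`yes` / `no`); as defined at contentsignals.org, this does not cover AI-generated summaries
- **`ai-input`**: whether your content may be fed into AI models at answer time, for example for retrieval-augmented generation, grounding, or agents executing tasks (`yes` / `no`)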

If the `Content-Signal` line, or an individual key, is absent, no preference is declared, and each AI provider interprets that in its own way.
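
One way to model the three keys, including the "not declared" case, is a tri-state mapping. The sketch below is illustrative only: `parse_signal_line` and `KNOWN_KEYS` are hypothetical names, not part of the spec or of any library, and the handling of malformed pairs (they are ignored) is an assumption.

```python
from typing import Dict, Optional

# The three keys described above.
KNOWN_KEYS = ("ai-train", "search", "ai-input")


def parse_signal_line(line: str) -> Dict[str, Optional[bool]]:
    """Parse one 'Content-Signal: ...' line into a tri-state mapping.

    True means 'yes', False means 'no', None means the key was not declared.
    """
    # Every key starts as None: nothing declared, so no stated preference.
    signals: Dict[str, Optional[bool]] = {key: None for key in KNOWN_KEYS}

    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() != "content-signal":
        return signals  # not a Content-Signal line at all

    for pair in value.split(","):
        key, eq, setting = pair.partition("=")
        key, setting = key.strip().lower(), setting.strip().lower()
        # Assumption: unknown keys and values other than yes/no are skipped.
        if eq and key in signals and setting in ("yes", "no"):
            signals[key] = setting == "yes"
    return signals


print(parse_signal_line("Content-Signal: ai-train=no, search=yes"))
# {'ai-train': False, 'search': True, 'ai-input': None}
```

Keeping `None` separate from `False` preserves the difference between "the site said no" and "the site said nothing".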

## Why does a site need Content Signals?

Before Content Signals, site owners had only one tool to control AI bots: blocking them per crawler, for example `User-agent: GPTBot` followed by `Disallow: /`. For each bot that choice is binary: allow everything or block everything.

Content Signals adds **granularity**:

- You can set `search=yes` (appear in search results) together with `ai-train=no` (no use for model training)
- Media companies that sell training licenses set `ai-train=no` as an explicit declaration of their position
- Open-source and educational content often sets `ai-train=yes, search=yes, ai-input=yes` as support for the AI ecosystem

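**For GEO (generative engine optimization):** `search=yes` declares that your content may be indexed and linked, and `ai-input=yes` declares that AI answer engines (Perplexity, ChatGPT Search, Google AI Overviews) may use and cite it in generated answers. Without an explicit signal, each provider falls back to its own defaults, which may be more conservative.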

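**Legal aspect:** an explicit `ai-train=no` declaration in robots.txt is becoming one element of copyright protection in the context of AI training. In some jurisdictions a machine-readable declaration can carry legal weight: under Article 4 of EU Directive 2019/790, for example, rightsholders can reserve their rights against text and data mining by machine-readable means.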

## How to add Content Signals?

Add the directive to the end of your `robots.txt`:

```
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Content-Signal: ai-train=yes, search=yes, ai-input=yes
```

**Recommended values by site type:**

| Site type | Recommendation |
|---|---|
| Open content, documentation, education | `ai-train=yes, search=yes, ai-input=yes` |
| Commercial content, news, media | `ai-train=no, search=yes, ai-input=yes` |
| Closed/premium content | `ai-train=no, search=no, ai-input=no` |
| SaaS, product without editorial content | `ai-train=yes, search=yes, ai-input=yes` |
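
As a rough sketch, the table can be turned into a small helper that produces the directive line for a given site type. The site-type labels (`open`, `commercial`, `premium`, `saas`) and the function names are hypothetical and chosen only for this example; the values mirror the table above.

```python
# Hypothetical site-type labels; the values mirror the recommendation table.
RECOMMENDED = {
    "open": {"ai-train": "yes", "search": "yes", "ai-input": "yes"},
    "commercial": {"ai-train": "no", "search": "yes", "ai-input": "yes"},
    "premium": {"ai-train": "no", "search": "no", "ai-input": "no"},
    "saas": {"ai-train": "yes", "search": "yes", "ai-input": "yes"},
}


def content_signal_line(site_type: str) -> str:
    """Return the Content-Signal line recommended for a site type."""
    policy = RECOMMENDED[site_type]  # raises KeyError for unknown labels
    pairs = ", ".join(f"{key}={value}" for key, value in policy.items())
    return f"Content-Signal: {pairs}"


def append_signal(robots_txt: str, site_type: str) -> str:
    """Append the directive to robots.txt text, separated by a blank line."""
    return robots_txt.rstrip("\n") + "\n\n" + content_signal_line(site_type) + "\n"


print(content_signal_line("commercial"))
# Content-Signal: ai-train=no, search=yes, ai-input=yes
```

`append_signal` does not check whether a directive is already present, so a real deployment script would want to guard against adding a second one.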

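**Important:** in the example above, `Content-Signal` sits at the file level, outside any `User-agent` block, and acts as a global policy declaration for the site. The examples published at contentsignals.org instead place the line inside a `User-agent` group (for instance under `User-agent: *`), so check the current spec if you need a specific placement.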

**Different policies per section:** the Content Signals spec also supports per-path directives in its extended syntax, but a single global declaration is sufficient for most use cases.

## How do we check Content Signals?

Our scanner depends on the `robots_txt` check — we first retrieve the robots.txt file, then parse it for the `Content-Signal` directive.

Algorithm:

1. **Fetch robots.txt** (result cached from the `robots_txt` check)
2. **Find the directive** `Content-Signal:` (case-insensitive search)
3. **Parse values** — split by comma, extract keys `ai-train`, `search`, `ai-input`

Status **pass** — directive found with at least one key. Status **fail** — directive absent from robots.txt.

We deliberately do not penalize `ai-train=no` or other specific values — that is the site's policy, not an error. The scanner only records the presence of an explicit declaration, not its content.
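
The algorithm and status rule above can be sketched roughly as follows. This is not the scanner's actual code: `check_content_signals`, `DIRECTIVE_RE`, and the returned dictionary shape are hypothetical, the function takes already-fetched robots.txt text rather than performing the fetch, and letting the first occurrence of a key win is an assumption.

```python
import re
from typing import Dict

KNOWN_KEYS = ("ai-train", "search", "ai-input")

# Match a "Content-Signal:" line anywhere in the file, ignoring case.
# Lines commented out with "#" do not match.
DIRECTIVE_RE = re.compile(
    r"^[ \t]*content-signal[ \t]*:(.*)$", re.IGNORECASE | re.MULTILINE
)


def check_content_signals(robots_txt: str) -> Dict[str, object]:
    """Return {'status': 'pass' | 'fail', 'signals': {...}} for robots.txt text."""
    signals: Dict[str, str] = {}
    for match in DIRECTIVE_RE.finditer(robots_txt):
        value = match.group(1).split("#", 1)[0]  # drop a trailing comment
        for pair in value.split(","):
            key, eq, setting = pair.partition("=")
            key = key.strip().lower()
            if eq and key in KNOWN_KEYS:
                # Assumption: the first declaration of a key wins.
                signals.setdefault(key, setting.strip().lower())
    # Pass on presence alone; the values themselves are never judged.
    status = "pass" if signals else "fail"
    return {"status": status, "signals": signals}


sample = """User-agent: *
Allow: /

Content-Signal: ai-train=no, search=yes
"""
print(check_content_signals(sample))
# {'status': 'pass', 'signals': {'ai-train': 'no', 'search': 'yes'}}
print(check_content_signals("User-agent: *\nAllow: /\n")["status"])
# fail
```

Note that `ai-train=no` still yields `pass`: the status depends only on whether at least one key was declared.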

