Content Signals

Declaration of consent for AI training and search in robots.txt via the Content-Signal directive (contentsignals.org).

What are Content Signals?

Content Signals is an extension to the robots.txt standard that lets site owners explicitly declare how AI systems may use their content. The specification is published at contentsignals.org.

The directive is added to the robots.txt file as:

Content-Signal: ai-train=yes, search=yes, ai-input=yes

Three keys:

ai-train — whether you allow your content to be used for training AI models (yes / no)
search — whether you allow AI search engines to index and cite your content (yes / no)
ai-input — whether you allow your content to be fed as input to AI agents when they execute tasks (yes / no)

If robots.txt contains no Content-Signal directive, the site's policy is undeclared, and each AI provider interprets that silence in its own way.

Why does a site need Content Signals?

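Before Content Signals, site owners had only one tool to control AI bots: blocking them by name, for example with User-agent: GPTBot followed by Disallow: /. That control is binary: a bot is either allowed to fetch a path or blocked from it, regardless of what it intends to do with the content.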
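To make the binary nature of name-based blocking concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name OtherBot and the example.com URL are placeholders chosen for illustration, not values from any real configuration.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks a single crawler by name.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"  # placeholder URL

# The only question robots.txt rules can answer: may this bot fetch this path?
gptbot_allowed = parser.can_fetch("GPTBot", url)
otherbot_allowed = parser.can_fetch("OtherBot", url)  # hypothetical bot name

print(gptbot_allowed, otherbot_allowed)
```

The rule decides only whether a crawler may fetch a path. It has no way to say what the fetched content may be used for, which is the gap the Content-Signal directive is meant to fill.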

Content Signals adds granularity:

You can set search=yes (allow citation in AI search) together with ai-train=no (prohibit use for model training)
Media companies that sell training licenses set ai-train=no as an explicit declaration of their position
Open-source and educational content often sets ai-train=yes, search=yes, ai-input=yes as support for the AI ecosystem

For GEO (generative engine optimization): search=yes tells AI search engines (Perplexity, ChatGPT Search, Google AI Overviews) that your content may be cited. Without this signal the policy is undeclared, and AI systems may treat the content more cautiously.

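Legal aspect: an explicit ai-train=no declaration in robots.txt is becoming one element of copyright protection in the context of AI training. In some jurisdictions a machine-readable declaration of this kind may count as a legally meaningful statement of non-consent; EU Directive 2019/790 (Article 4), for example, lets rightsholders reserve text-and-data-mining rights by machine-readable means.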

How to add Content Signals?

Add the directive to the end of your robots.txt:

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Content-Signal: ai-train=yes, search=yes, ai-input=yes

Recommended values by site type:

Site type	Recommendation
Open content, documentation, education	`ai-train=yes, search=yes, ai-input=yes`
Commercial content, news, media	`ai-train=no, search=yes, ai-input=yes`
Closed/premium content	`ai-train=no, search=no, ai-input=no`
SaaS, product without editorial content	`ai-train=yes, search=yes, ai-input=yes`

Important: the Content-Signal directive is at the file level, not inside a User-agent block. It is a global policy declaration for the site.
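The table and the placement rule above can be sketched as a small helper. This is an illustrative sketch, not a reference implementation: the site-type keys ("open", "commercial", "premium", "saas") and the function names are hypothetical, and the helper simply follows the placement described here by writing one directive at the end of the file.

```python
# Recommended values from the table above, keyed by illustrative
# (hypothetical) site-type names.
RECOMMENDED = {
    "open": {"ai-train": True, "search": True, "ai-input": True},
    "commercial": {"ai-train": False, "search": True, "ai-input": True},
    "premium": {"ai-train": False, "search": False, "ai-input": False},
    "saas": {"ai-train": True, "search": True, "ai-input": True},
}


def render_directive(signals: dict[str, bool]) -> str:
    """Format the signals as a single Content-Signal line."""
    pairs = ", ".join(f"{key}={'yes' if value else 'no'}" for key, value in signals.items())
    return f"Content-Signal: {pairs}"


def set_content_signal(robots_txt: str, site_type: str) -> str:
    """Return robots_txt with exactly one Content-Signal line, placed at the end."""
    # Drop any existing Content-Signal lines so the result is idempotent.
    kept = [
        line for line in robots_txt.splitlines()
        if not line.strip().lower().startswith("content-signal:")
    ]
    # Remove trailing blank lines before appending.
    while kept and not kept[-1].strip():
        kept.pop()
    directive = render_directive(RECOMMENDED[site_type])
    body = kept + [""] if kept else []
    return "\n".join(body + [directive]) + "\n"


existing = "User-agent: *\nAllow: /\n\nSitemap: https://example.com/sitemap.xml\n"
print(set_content_signal(existing, "commercial"))
```

Removing any earlier Content-Signal line before appending keeps the file at a single declaration, so running the helper twice does not stack conflicting policies.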

Different policies per section: the Content Signals spec also supports per-path directives in its extended syntax, but a single global declaration is sufficient for most use cases.

How do we check Content Signals?

This check depends on the robots_txt check: the scanner first retrieves the robots.txt file, then parses it for the Content-Signal directive.

Algorithm:

Fetch robots.txt (result cached from the robots_txt check)
Find the directive Content-Signal: (case-insensitive search)
Parse values — split by comma, extract keys ai-train, search, ai-input

Status pass — directive found with at least one key. Status fail — directive absent from robots.txt.

We deliberately do not penalize ai-train=no or other specific values — that is the site’s policy, not an error. The scanner only records the presence of an explicit declaration, not its content.
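The algorithm and status rules above can be sketched roughly as follows. This is a minimal illustration, not the scanner's actual code: the names check_content_signals and CheckResult are hypothetical, the function takes the already-fetched robots.txt text as input, and a few details the description leaves open are assumptions (comments after # are stripped, only yes/no values are accepted, and a directive with no recognized key counts as fail).

```python
from dataclasses import dataclass, field

KNOWN_KEYS = ("ai-train", "search", "ai-input")
PREFIX = "content-signal:"


@dataclass
class CheckResult:
    status: str  # "pass" or "fail"
    signals: dict[str, str] = field(default_factory=dict)


def check_content_signals(robots_txt: str) -> CheckResult:
    """Look for a Content-Signal directive and extract its known keys."""
    signals: dict[str, str] = {}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: drop trailing comments
        if not line.lower().startswith(PREFIX):  # case-insensitive match
            continue
        # Split the value list by comma, then each pair by "=".
        for part in line[len(PREFIX):].split(","):
            key, sep, value = part.partition("=")
            key, value = key.strip().lower(), value.strip().lower()
            if sep and key in KNOWN_KEYS and value in ("yes", "no"):
                signals[key] = value
    # Pass depends only on an explicit declaration being present,
    # never on whether the values are yes or no.
    return CheckResult("pass" if signals else "fail", signals)


result = check_content_signals("User-agent: *\nAllow: /\n\nContent-Signal: ai-train=no, search=yes\n")
print(result.status, result.signals)
```

Because the status is derived from whether any signal was found, a site declaring ai-train=no passes exactly like one declaring ai-train=yes.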
