Content Signals
Declaration of consent for AI training and search in robots.txt via the Content-Signal directive (contentsignals.org).
What are Content Signals?
Content Signals is an extension to the robots.txt standard that lets site owners explicitly declare their content-use policy for AI systems. The standard is published at contentsignals.org.
The directive is added to the robots.txt file as:
Content-Signal: ai-train=yes, search=yes, ai-input=yes
Three keys:
- ai-train: whether you allow your content to be used for training AI models (yes/no)
- search: whether you allow AI search engines to index and cite your content (yes/no)
- ai-input: whether you allow your content to be fed as input to AI agents when they execute tasks (yes/no)
The absence of a Content-Signal means ambiguity — each AI provider interprets that in its own way.
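To make the three keys concrete, here is a minimal Python sketch of how a consumer might read one directive value. The function name `read_signal` and the three-state result (`True`, `False`, or `None` for a key that is not declared) are illustrative assumptions, not part of the Content Signals spec.

```python
from typing import Optional

KNOWN_KEYS = ("ai-train", "search", "ai-input")


def read_signal(value: str, key: str) -> Optional[bool]:
    """Return True for yes, False for no, None if the key is not declared.

    `value` is the text after "Content-Signal:", e.g. "ai-train=no, search=yes".
    """
    for pair in value.split(","):
        name, sep, answer = pair.partition("=")
        if not sep or name.strip().lower() != key:
            continue
        answer = answer.strip().lower()
        if answer == "yes":
            return True
        if answer == "no":
            return False
    return None  # undeclared or unrecognized value: the ambiguous case


value = "ai-train=no, search=yes"
result = {key: read_signal(value, key) for key in KNOWN_KEYS}
print(result)  # {'ai-train': False, 'search': True, 'ai-input': None}
```

The `None` result models the ambiguity described above: a missing key is neither consent nor refusal, so each consumer has to decide how to treat it.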
Why does a site need Content Signals?
Before Content Signals, site owners had only one tool to control AI bots — blocking them via User-agent: GPTBot / Disallow: /. That is binary: allow everything or block everything.
Content Signals adds granularity:
- You can allow search=yes (citation in AI search) while setting ai-train=no (don’t use for model training)
- Media companies that sell training licenses set ai-train=no as an explicit declaration of their position
- Open-source and educational content often sets ai-train=yes, search=yes, ai-input=yes as support for the AI ecosystem
For GEO (generative engine optimization): search=yes signals to AI search engines (Perplexity, ChatGPT Search, Google AI Overviews) that your content may be cited. Without this signal, AI systems may treat your content more cautiously.
Legal aspect: an explicit ai-train=no declaration in robots.txt is becoming one element of copyright protection in the context of AI training. Some jurisdictions treat such a declaration as a legally meaningful statement of non-consent.
How to add Content Signals?
Add the directive to the end of your robots.txt:
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Content-Signal: ai-train=yes, search=yes, ai-input=yes
Recommended values by site type:
| Site type | Recommendation |
|---|---|
| Open content, documentation, education | ai-train=yes, search=yes, ai-input=yes |
| Commercial content, news, media | ai-train=no, search=yes, ai-input=yes |
| Closed/premium content | ai-train=no, search=no, ai-input=no |
| SaaS, product without editorial content | ai-train=yes, search=yes, ai-input=yes |
Important: the Content-Signal directive is at the file level, not inside a User-agent block. It is a global policy declaration for the site.
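As a rough illustration of the table and the placement rule, the following Python sketch builds the directive line for a site type and appends it as the last line of an existing robots.txt. The site-type labels (`open`, `commercial`, `premium`, `saas`) and the function names are hypothetical shorthand for the table rows; the values themselves come from the table.

```python
# Recommended values from the table, keyed by hypothetical short labels.
RECOMMENDED = {
    "open": {"ai-train": "yes", "search": "yes", "ai-input": "yes"},
    "commercial": {"ai-train": "no", "search": "yes", "ai-input": "yes"},
    "premium": {"ai-train": "no", "search": "no", "ai-input": "no"},
    "saas": {"ai-train": "yes", "search": "yes", "ai-input": "yes"},
}


def content_signal_line(site_type: str) -> str:
    """Render the directive line for one of the site types above."""
    policy = RECOMMENDED[site_type]
    pairs = ", ".join(f"{key}={value}" for key, value in policy.items())
    return f"Content-Signal: {pairs}"


def append_signal(robots_txt: str, site_type: str) -> str:
    """Append the directive as the final line of an existing robots.txt."""
    body = robots_txt.rstrip("\n")
    prefix = body + "\n" if body else ""
    return prefix + content_signal_line(site_type) + "\n"


existing = "User-agent: *\nAllow: /\nSitemap: https://example.com/sitemap.xml\n"
print(append_signal(existing, "commercial"))
```

This sketch does not check whether the file already contains a Content-Signal line; a real tool would probably replace an existing declaration rather than append a second one.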
Different policies per section: the Content Signals spec also supports per-path directives in its extended syntax, but a single global declaration is sufficient for most use cases.
How do we check Content Signals?
Our scanner depends on the robots_txt check — we first retrieve the robots.txt file, then parse it for the Content-Signal directive.
Algorithm:
- Fetch robots.txt (result cached from the robots_txt check)
- Find the directive Content-Signal: (case-insensitive search)
- Parse values: split by comma, extract keys ai-train, search, ai-input
Status pass — directive found with at least one key. Status fail — directive absent from robots.txt.
We deliberately do not penalize ai-train=no or other specific values — that is the site’s policy, not an error. The scanner only records the presence of an explicit declaration, not its content.
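The check described above could be sketched in Python roughly as follows. This is not the scanner's actual code: the function name `check_content_signals`, the result shape, the stripping of `#` comments, and the choice to report fail when a directive is present but contains no recognized key are all assumptions made for illustration.

```python
import re

KNOWN_KEYS = {"ai-train", "search", "ai-input"}
# Case-insensitive match for a "Content-Signal:" line; group 1 is the value.
DIRECTIVE = re.compile(r"^\s*content-signal\s*:\s*(.*)$", re.IGNORECASE)


def check_content_signals(robots_txt: str) -> dict:
    """Report whether robots.txt declares at least one Content-Signal key."""
    signals = {}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0]  # assumption: ignore trailing comments
        match = DIRECTIVE.match(line)
        if not match:
            continue
        for pair in match.group(1).split(","):
            key, sep, value = pair.partition("=")
            key = key.strip().lower()
            if sep and key in KNOWN_KEYS:
                signals[key] = value.strip().lower()
    # Values are recorded but never judged: ai-train=no passes like ai-train=yes.
    status = "pass" if signals else "fail"
    return {"status": status, "signals": signals}


sample = "User-agent: *\nAllow: /\ncontent-signal: ai-train=no, search=yes\n"
print(check_content_signals(sample))
# {'status': 'pass', 'signals': {'ai-train': 'no', 'search': 'yes'}}
```

Note that the status depends only on whether a recognized key was found, which mirrors the rule that the site's chosen values are policy rather than an error.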