Discoverability
Sitemap
An XML sitemap lets search engines and AI bots discover every page on a site without having to crawl links.
What is a Sitemap?
sitemap.xml is an XML file that lists all public URLs on a site, with optional metadata for each: last modified date (lastmod), update frequency (changefreq), and priority (priority).
The standard is defined at sitemaps.org and is supported by the major search engines. Conventional path: /sitemap.xml. The sitemap URL should also be declared in robots.txt via the Sitemap: directive, so crawlers can find it even when it lives at a different path.
Why does a site need a Sitemap?
Without a sitemap, AI bots and search engines can only discover pages through incoming links. Pages with no such links are invisible.
For agent-readiness this is critical for the glossary (/glossary/*) and documentation: bots need these pages, but they may have no incoming links.
How to configure a Sitemap?
Most CMSs generate sitemaps automatically:
- WordPress: Yoast SEO or Rank Math (sitemap is enabled by default)
- Astro: @astrojs/sitemap integration
- Next.js: app/sitemap.ts or the next-sitemap package
Minimal example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
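For sites without a CMS plugin, a file like the one above can be generated with a few lines of code. The following is a minimal sketch using only the Python standard library; the function name build_sitemap and the (loc, lastmod) input shape are assumptions made for illustration, not part of any particular tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes for an iterable of (loc, lastmod) pairs.

    Hypothetical helper: lastmod may be None, since the tag is optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    # encoding="UTF-8" makes tostring return bytes with an XML declaration
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("https://example.com/", "2026-05-01"),
    ("https://example.com/glossary/sitemap", None),
])
```

The resulting bytes can be written to the file served at /sitemap.xml. Emitting lastmod only when a real modification date is known keeps the dates trustworthy.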
A single sitemap file can hold at most 50,000 URLs (and 50 MB uncompressed). For larger sites, use <sitemapindex> with links to separate sitemap files organized by section.
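As a rough sketch of the section-based split, the code below groups URLs by their first path segment and builds a sitemap index that points to one file per section. The helper names and the sitemap-<section>.xml naming scheme are hypothetical, and a real generator would also need to split any section that itself exceeds 50,000 URLs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def group_by_section(urls):
    """Group URLs by first path segment; the home page goes to 'root'."""
    sections = {}
    for url in urls:
        path = urlsplit(url).path.strip("/")
        section = path.split("/")[0] if path else "root"
        sections.setdefault(section, []).append(url)
    return sections


def build_sitemap_index(base_url, section_names):
    """Return <sitemapindex> XML (bytes) with one <sitemap> per section."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in sorted(section_names):
        sitemap = ET.SubElement(index, "sitemap")
        # Assumed naming scheme for the per-section files
        ET.SubElement(sitemap, "loc").text = f"{base_url}/sitemap-{name}.xml"
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


sections = group_by_section([
    "https://example.com/",
    "https://example.com/glossary/sitemap",
    "https://example.com/glossary/robots-txt",
    "https://example.com/docs/intro",
])
index_xml = build_sitemap_index("https://example.com", sections)
```

Each per-section file is then an ordinary <urlset> sitemap, and only the index needs to be referenced from robots.txt.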
How do we check the Sitemap?
The scanner resolves the sitemap URL from robots.txt (the Sitemap: directive). If the directive is absent, it falls back to /sitemap.xml.
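The resolution step could look roughly like the sketch below. It is not the scanner's actual code: the function name is hypothetical, and taking the first Sitemap: directive when several are present is an assumption.

```python
from urllib.parse import urljoin


def resolve_sitemap_url(site_root, robots_txt):
    """Return the sitemap URL from robots.txt, or fall back to /sitemap.xml.

    robots_txt is the file's text, or None if robots.txt could not be fetched.
    """
    for line in (robots_txt or "").splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        name, sep, value = line.partition(":")  # split at the first colon only
        if sep and name.strip().lower() == "sitemap" and value.strip():
            return value.strip()  # assumption: the first directive wins
    return urljoin(site_root, "/sitemap.xml")


robots = "User-agent: *\nAllow: /\nSitemap: https://example.com/sitemap-index.xml\n"
declared = resolve_sitemap_url("https://example.com", robots)
fallback = resolve_sitemap_url("https://example.com", "User-agent: *\nDisallow:\n")
```

Field names in robots.txt are case-insensitive, which is why the name is lowercased before comparison.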
Check sequence:
- Resolve URL: take the address from robots.txt or fall back to /sitemap.xml
- HTTP 200: the file is accessible
- Valid XML: parses without errors
- Root element: <urlset> (regular sitemap) or <sitemapindex>
- Presence of <lastmod>: at least one URL with a date
Gradient result: 1.0 if at least one <lastmod> is present; 0.6 if the sitemap is valid but has no dates. The status is fail on a non-200 HTTP response or invalid XML.
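The check sequence and scoring can be sketched as a single function. This is an illustration under stated assumptions rather than the scanner's implementation: the function name, the "pass" status label, the None score on failure, and treating an unexpected root element as a failure are all assumptions that the text above does not specify.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def score_sitemap(status_code, body):
    """Return (status, score) for a fetched sitemap response."""
    if status_code != 200:
        return ("fail", None)  # file is not accessible
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return ("fail", None)  # not well-formed XML
    if local_name(root.tag) not in ("urlset", "sitemapindex"):
        return ("fail", None)  # assumption: wrong root counts as a failure
    has_lastmod = any(
        local_name(el.tag) == "lastmod" and (el.text or "").strip()
        for el in root.iter()
    )
    return ("pass", 1.0 if has_lastmod else 0.6)


NS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
dated = f"<urlset {NS}><url><loc>https://example.com/</loc><lastmod>2026-05-01</lastmod></url></urlset>"
undated = f"<urlset {NS}><url><loc>https://example.com/</loc></url></urlset>"

result_dated = score_sitemap(200, dated)
result_undated = score_sitemap(200, undated)
```

Here result_dated is scored 1.0 and result_undated 0.6, matching the two gradient levels described above.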