
Sitemap with lastmod for AI: how to do it right

Why AI crawlers need a sitemap with lastmod dates, a minimal working example, right vs wrong, common mistakes, and how to verify.

Updated: June 7, 2026

What it is

A sitemap is an XML file (usually /sitemap.xml) that lists your site’s URLs. The element that matters most in the AI era is the <lastmod> tag — the date a page last changed. It tells crawlers which pages are fresh and worth re-fetching.

Why it matters for AI agents

AI engines (Perplexity, ChatGPT Search, YandexGPT) pull fresh content into answers. <lastmod> is a direct freshness signal: without it the crawler has to guess when to re-crawl and may serve a stale version in answers. An accurate lastmod helps updates reach AI results sooner. The sitemap is also the map an agent uses to find pages that aren’t linked from the homepage.

Minimal working example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-06-07</lastmod>
  </url>
  <url>
    <loc>https://example.com/guides/llms-txt</loc>
    <lastmod>2026-06-05</lastmod>
  </url>
</urlset>

And always reference the sitemap from robots.txt:

Sitemap: https://example.com/sitemap.xml
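
A file like the one above is usually generated rather than edited by hand. Here is a minimal Python sketch of that step, assuming you can read each page's real last-edit date from your CMS or version control. The names `build_sitemap` and `pages` are hypothetical and not part of any library.

```python
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (absolute_url, last_edit_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL for us.
        ET.SubElement(url, "loc").text = loc
        # date.isoformat() gives YYYY-MM-DD, the date-only W3C form.
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    ET.indent(urlset)  # pretty-printing only; needs Python 3.9+
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Assumed input: dates come from real edits, never from "today".
    pages = [
        ("https://example.com/", date(2026, 6, 7)),
        ("https://example.com/guides/llms-txt", date(2026, 6, 5)),
    ]
    print(build_sitemap(pages))
```

Passing the dates in, rather than calling `date.today()` inside the function, keeps the fake-freshness mistake out of the generator itself.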

Right vs wrong

Right	Wrong
`<lastmod>` on every URL, W3C Datetime format (`2026-06-07`)	No `lastmod`, so the crawler can’t see freshness
Absolute URLs in `<loc>`	Relative paths (`/page`) — invalid
A `Sitemap:` line in robots.txt	A sitemap that’s referenced nowhere
`lastmod` reflects the real edit date	“Today” everywhere, so the crawler stops trusting it
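
Three of the rows above (lastmod present and well formed, absolute URLs, no single date everywhere) can be checked mechanically. The sketch below is one way to do it with the Python standard library. `check_sitemap` is a hypothetical helper, and the "all dates identical" rule is only a heuristic: a small site that really was edited in one day would trigger it too.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# W3C Datetime: YYYY, YYYY-MM, YYYY-MM-DD, or a full timestamp with a zone.
W3C_DATETIME = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def check_sitemap(xml_bytes):
    """Return a list of human-readable problems found in a urlset sitemap."""
    problems = []
    days = []
    root = ET.fromstring(xml_bytes)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        parsed = urlparse(loc)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an absolute URL: {loc!r}")
        if lastmod is None:
            problems.append(f"missing lastmod: {loc!r}")
        elif not W3C_DATETIME.match(lastmod.strip()):
            problems.append(f"bad lastmod format {lastmod!r}: {loc!r}")
        else:
            days.append(lastmod.strip()[:10])  # compare by calendar day
    # Heuristic only: identical dates on every URL often mean a fake lastmod.
    if len(days) > 1 and len(set(days)) == 1:
        problems.append(f"every lastmod is {days[0]}; check the dates are real")
    return problems
```

An empty list means none of these checks failed; it does not prove the dates are truthful, only that they are present and parseable.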

Common mistakes

Fake lastmod (today’s date on every page) — crawlers detect this and start ignoring the signal.
Sitemap not referenced in robots.txt — lowers discovery odds.
Over 50,000 URLs or more than 50 MB (uncompressed) in one file: split it and list the parts in a sitemap index.
Relative URLs in <loc> — the spec requires absolute URLs.
404 or wrong Content-Type on the sitemap file itself.
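
For the size-limit case, the usual fix is to split the URL list into several child sitemaps and publish a sitemap index that points to them. The sketch below shows the idea under some assumptions: the child file names (`sitemap-1.xml`, `sitemap-2.xml`, ...) and the helpers `plan_index` and `build_sitemap_index` are hypothetical, and only the URL count is checked, not the 50 MB size.

```python
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit


def plan_index(pages, base_url, max_urls=MAX_URLS_PER_FILE):
    """Split (url, date) pairs into chunks and describe each child sitemap.

    Returns (entries, chunks): entries are (child_sitemap_url, lastmod)
    pairs for the index, chunks are the page lists for each child file.
    """
    chunks = [pages[i:i + max_urls] for i in range(0, len(pages), max_urls)]
    entries = []
    for number, part in enumerate(chunks, start=1):
        # A child sitemap is as fresh as its most recently edited page.
        newest = max(modified for _, modified in part)
        entries.append((f"{base_url}/sitemap-{number}.xml", newest.isoformat()))
    return entries, chunks


def build_sitemap_index(entries):
    """Build the sitemap index XML from (sitemap_url, lastmod) pairs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        child = ET.SubElement(index, "sitemap")
        ET.SubElement(child, "loc").text = loc
        ET.SubElement(child, "lastmod").text = lastmod
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Each chunk would then be written out as its own urlset file, and the `Sitemap:` line in robots.txt would point at the index instead of a single sitemap.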

How to verify

A free scan checks for the sitemap, valid lastmod, and the robots.txt reference. Manually:

curl -sI https://example.com/sitemap.xml   # expect HTTP 200 and Content-Type application/xml or text/xml
curl -s  https://example.com/sitemap.xml | head -20

Also submit the sitemap to Google Search Console and Yandex Webmaster — they show its indexing status.
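
The robots.txt reference and the Content-Type header can also be checked in a script once you have fetched them. The following sketch works on text you already downloaded, so it makes no network calls. Both function names are hypothetical, and matching the `Sitemap` field name case-insensitively is an assumption about how lenient the check should be.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs named on Sitemap: lines of a robots.txt body."""
    found = []
    for raw in robots_txt.splitlines():
        # Drop trailing comments (this would also cut a URL containing '#').
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


def is_xml_content_type(header_value):
    """True when a Content-Type header value names an XML media type."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ("application/xml", "text/xml")


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/sitemap.xml\n"
    print(sitemap_urls_from_robots(robots))
    print(is_xml_content_type("application/xml; charset=UTF-8"))
```

An empty list from `sitemap_urls_from_robots` corresponds to the "referenced nowhere" mistake listed earlier.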