Sitemap with lastmod for AI: how to do it right
Why AI crawlers need a sitemap with lastmod dates, a minimal working example, right vs wrong, common mistakes, and how to verify.
What it is
A sitemap is an XML file (usually /sitemap.xml) that lists your site’s URLs.
The element that matters most in the AI era is the <lastmod> tag — the date a
page last changed. It tells crawlers which pages are fresh and worth re-fetching.
Why it matters for AI agents
AI engines (Perplexity, ChatGPT Search, YandexGPT) pull fresh content into
answers. `<lastmod>` is a direct freshness signal: without it the crawler has to
guess when to re-crawl, and an answer may be built from a stale copy of the page.
An accurate lastmod helps updates reach AI results sooner. The sitemap is also the
map an agent uses to find pages that aren’t linked from the homepage.
Minimal working example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-06-07</lastmod>
  </url>
  <url>
    <loc>https://example.com/guides/llms-txt</loc>
    <lastmod>2026-06-05</lastmod>
  </url>
</urlset>
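If the sitemap is generated rather than written by hand, the same file can be produced by a short script. The following is a minimal sketch, not a definitive implementation: the function name `build_sitemap` is hypothetical, and it assumes you already know each page's real last-edit date (for example from your CMS or version control).

```python
from datetime import date
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: dict[str, date]) -> str:
    """Return sitemap XML for a mapping of absolute URL -> real last-edit date.

    Hypothetical helper for illustration; requires Python 3.9+ (ET.indent).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_edit in pages.items():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # date.isoformat() yields YYYY-MM-DD, the form used in the example above.
        ET.SubElement(url, "lastmod").text = last_edit.isoformat()
    ET.indent(urlset)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    print(build_sitemap({
        "https://example.com/": date(2026, 6, 7),
        "https://example.com/guides/llms-txt": date(2026, 6, 5),
    }))
```

Building the tree with `ElementTree` instead of string concatenation means characters such as `&` in a URL are escaped automatically.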
And always reference the sitemap from robots.txt:
Sitemap: https://example.com/sitemap.xml
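To confirm that a robots.txt body really exposes the sitemap, Python's standard-library parser can read the `Sitemap:` lines. This is a small sketch under the assumption that you have the robots.txt text in hand; `sitemaps_in_robots` is a hypothetical name, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt: str) -> list[str]:
    """Return the URLs named on Sitemap: lines of a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap: line.
    return parser.site_maps() or []


if __name__ == "__main__":
    sample = "User-agent: *\nAllow: /\nSitemap: https://example.com/sitemap.xml\n"
    print(sitemaps_in_robots(sample))
```

An empty list means a crawler reading robots.txt gets no pointer to the sitemap at all.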
Right vs wrong
| Right | Wrong |
|---|---|
| `<lastmod>` on every URL, W3C date format (`2026-06-07`) | No `lastmod`: the crawler can’t see freshness |
| Absolute URLs in `<loc>` | Relative paths (`/page`), which are invalid |
| A `Sitemap:` line in `robots.txt` | A sitemap that’s referenced nowhere |
| `lastmod` reflects the real edit date | “Today” everywhere, so the crawler stops trusting it |
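The rules in the table can be checked mechanically. The sketch below is one possible linter, not an official validator: `lint_sitemap` is a hypothetical name, the date pattern is a simplified take on the W3C format (full date, optionally with a time and timezone), and the "identical dates" check is only a heuristic, since a small site may legitimately have edited every page on the same day.

```python
import re
from urllib.parse import urlparse
from xml.etree import ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Simplified W3C date: YYYY-MM-DD, optionally followed by a time and timezone.
W3C_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)


def lint_sitemap(xml_text: str) -> list[str]:
    """Return human-readable problems found in a sitemap document."""
    problems = []
    dates = []
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        parsed = urlparse(loc)
        if not (parsed.scheme and parsed.netloc):
            problems.append(f"{loc or '<empty loc>'}: loc is not an absolute URL")
        if not lastmod:
            problems.append(f"{loc}: missing lastmod")
        elif not W3C_DATE.match(lastmod):
            problems.append(f"{loc}: lastmod {lastmod!r} is not a W3C date")
        else:
            dates.append(lastmod)
    # Heuristic only: one shared date across many URLs often means a fake lastmod.
    if len(dates) > 1 and len(set(dates)) == 1:
        problems.append("every lastmod is identical; check it reflects real edits")
    return problems
```

An empty result means none of these specific problems were found; it does not prove the dates are truthful.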
Common mistakes
- Fake `lastmod` (today’s date on every page): crawlers detect this and start ignoring the signal.
- Sitemap not referenced in `robots.txt`: lowers discovery odds.
- Over 50,000 URLs or more than 50 MB (uncompressed) in one file: you need a sitemap index.
- Relative URLs in `<loc>`: the spec requires absolute URLs.
- 404 or wrong `Content-Type` on the sitemap file itself.
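For the size limit in the list above, the usual remedy is to split the URLs across several sitemap files and list those files in a sitemap index. A rough sketch of both steps follows; `chunk` and `build_index` are hypothetical helpers, and file names like `sitemap-1.xml` are an assumption, not a requirement of the protocol.

```python
from datetime import date
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit


def chunk(urls: list[str], size: int = MAX_URLS) -> list[list[str]]:
    """Split a URL list into parts that each fit in one sitemap file."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(parts: list[tuple[str, date]]) -> str:
    """Return sitemap index XML.

    Each item is (absolute URL of a child sitemap, date that child last changed).
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, changed in parts:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        ET.SubElement(entry, "lastmod").text = changed.isoformat()
    ET.indent(index)  # Python 3.9+
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    print(build_index([
        ("https://example.com/sitemap-1.xml", date(2026, 6, 7)),
        ("https://example.com/sitemap-2.xml", date(2026, 6, 5)),
    ]))
```

This sketch only enforces the URL count. The 50 MB limit would need a separate check on the size of each generated file.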
How to verify
A free scan checks for the sitemap, valid lastmod, and the robots.txt
reference. Manually:
curl -sI https://example.com/sitemap.xml # expect 200 and an XML Content-Type (application/xml or text/xml)
curl -s https://example.com/sitemap.xml | head -20
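The same header check can be scripted, for instance to run after each deploy. This is a sketch under stated assumptions: `check_response` and `check_sitemap_url` are hypothetical names, and only `application/xml` and `text/xml` are treated as acceptable types here.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

XML_TYPES = {"application/xml", "text/xml"}


def check_response(status: int, content_type: str) -> list[str]:
    """Return problems with a sitemap response's status and Content-Type."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in XML_TYPES:
        problems.append(f"unexpected Content-Type: {content_type or 'missing'}")
    return problems


def check_sitemap_url(url: str) -> list[str]:
    """Send a HEAD request (like curl -I) and check the response headers."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=10) as response:
            return check_response(
                response.status, response.headers.get("Content-Type", "")
            )
    except HTTPError as err:
        # urlopen raises for 4xx/5xx; report those through the same checker.
        return check_response(err.code, err.headers.get("Content-Type", ""))
```

Call `check_sitemap_url` with your own sitemap address. Some servers reject HEAD requests, in which case a GET is the fallback.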
Also submit the sitemap to Google Search Console and Yandex Webmaster — they show its indexing status.