How do robots.txt, llms.txt & Sitemaps for AI Crawlers work?

robots.txt, llms.txt & Sitemaps for AI Crawlers

The complete machine-readable dataset for this section is open to all agents — no payment required. Fetch the structured JSON directly.

Agents: allowed (no charge). Verify at https://esa.aisa.one/api/v1/access/verify

robots.txt, llms.txt, and sitemap.xml best practices for AI crawlers and pay‑per‑crawl

Use robots.txt to control access, sitemap.xml to advertise indexable URLs, and llms.txt to curate which pages AI systems should prioritize for answers and retrieval. Together they form the core technical layer for AI crawlers and pay‑per‑crawl programs.

1. Configure robots.txt for AI and pay‑per‑crawl

Place robots.txt at your root (/robots.txt) and explicitly allow or disallow each AI user‑agent. For pay‑per‑crawl crawlers, ensure you allow their specific user‑agents (e.g., AISA‑Bot) and point to your sitemap:

```txt
User-agent: AISA-Bot
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
```

If you participate in pay‑per‑crawl, do not block the vendor’s crawler; instead, use Disallow only for admin paths, internal search, cart/checkout, and staging. Block non‑paying or aggressive scrapers explicitly with a Disallow: / rule under their own User-agent group. If a crawler supports it, you may also return HTTP 402 (Payment Required) for certain paths, but treat robots.txt directives as the primary signal.
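Before deploying rules like these, it can help to check how a parser reads them. The following is a minimal sketch using Python's standard `urllib.robotparser`. The disallowed paths and the `BadScraper` user-agent are hypothetical placeholders, not names from any real vendor. The Disallow lines are listed before `Allow: /` because this particular parser applies the first matching rule; other crawlers may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths and the "BadScraper" agent are
# illustrative assumptions, not values from a real deployment.
ROBOTS_TXT = """\
User-agent: AISA-Bot
Disallow: /admin/
Disallow: /search
Disallow: /cart/
Disallow: /checkout/
Allow: /

User-agent: BadScraper
Disallow: /

Sitemap: https://yoursite.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Content pages are open to the pay-per-crawl agent.
print(parser.can_fetch("AISA-Bot", "https://yoursite.com/articles/guide"))
# Admin paths are closed even to an allowed agent.
print(parser.can_fetch("AISA-Bot", "https://yoursite.com/admin/users"))
# The blocked scraper is denied everywhere.
print(parser.can_fetch("BadScraper", "https://yoursite.com/articles/guide"))
# Sitemap URLs declared in the file (Python 3.8+).
print(parser.site_maps())
```

A check like this only confirms how the file parses; it does not confirm that a given crawler honors it.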

2. Maintain a clean, AI‑ready sitemap.xml

Only include canonical, indexable URLs with accurate lastmod timestamps. Exclude noindex, 404, redirected, or thin pages. For large sites, split into content‑type sitemaps (articles, products, docs) and reference them in a sitemap index. Ensure your sitemap is discoverable by adding a Sitemap: line in robots.txt. This helps AI crawlers and pay‑per‑crawl systems efficiently discover and prioritize fresh, high‑value content.
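The filtering and splitting described above can be sketched in Python with the standard library. This is an illustration under stated assumptions: the `Page` record, its fields, and the example URLs are hypothetical, and a real site would pull this data from its CMS or crawl logs. Detecting thin pages needs a site-specific rule, so it is left out here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


@dataclass
class Page:
    """Hypothetical page record; field names are assumptions."""
    url: str
    canonical: str
    status: int
    noindex: bool
    lastmod: date


def is_indexable(page: Page) -> bool:
    # Keep only 200 responses that are self-canonical and not noindex.
    return page.status == 200 and not page.noindex and page.url == page.canonical


def build_urlset(pages: list[Page]) -> str:
    """Render one sitemap file containing only indexable pages."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in filter(is_indexable, pages):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page.url
        ET.SubElement(entry, "lastmod").text = page.lastmod.isoformat()
    return XML_DECL + ET.tostring(urlset, encoding="unicode")


def build_index(sitemap_urls: list[str]) -> str:
    """Render a sitemap index that points at per-content-type sitemaps."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        child = ET.SubElement(index, "sitemap")
        ET.SubElement(child, "loc").text = loc
    return XML_DECL + ET.tostring(index, encoding="unicode")


pages = [
    Page("https://yoursite.com/articles/a", "https://yoursite.com/articles/a",
         200, False, date(2026, 6, 1)),
    # Redirected URL: excluded.
    Page("https://yoursite.com/old", "https://yoursite.com/new",
         301, False, date(2025, 1, 1)),
    # noindex page: excluded.
    Page("https://yoursite.com/drafts/x", "https://yoursite.com/drafts/x",
         200, True, date(2026, 5, 1)),
]

articles_xml = build_urlset(pages)
index_xml = build_index([
    "https://yoursite.com/sitemap-articles.xml",
    "https://yoursite.com/sitemap-docs.xml",
])
print(articles_xml)
print(index_xml)
```

Only the first page survives the filter; the index file is what the Sitemap: line in robots.txt would reference.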

3. Create and curate llms.txt

Place llms.txt at your root (/llms.txt) as a plain‑text or Markdown “table of contents” for AI systems. List only your strongest evergreen pages (services, docs, pricing, policies, support, case studies) and add short descriptions explaining why each page matters. Keep the file concise and aligned with canonical URLs. Treat llms.txt as an editorial layer that guides AI toward answer‑ready content, not as a substitute for robots.txt or sitemap.xml.
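As an illustration of the "table of contents" idea, the sketch below renders a small llms.txt in Markdown. It assumes the layout commonly used in the llms.txt proposal (a title, a short summary, then sections of links with one-line descriptions); the site name, section headings, URLs, and descriptions are all hypothetical.

```python
def render_llms_txt(site_name: str, summary: str,
                    sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render llms.txt content.

    sections maps a heading to (title, canonical_url, description) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content: replace with your own evergreen, canonical pages.
llms_txt = render_llms_txt(
    "Example Co",
    "Documentation and policies for Example Co products.",
    {
        "Docs": [
            ("Getting started", "https://yoursite.com/docs/start",
             "Setup steps and prerequisites."),
        ],
        "Policies": [
            ("Pricing", "https://yoursite.com/pricing",
             "Current plans and what each includes."),
        ],
    },
)
print(llms_txt)
```

Generating the file from a short, hand-maintained list keeps it concise and makes it easy to confirm that every URL is canonical.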

Key takeaways

Use robots.txt to explicitly allow or block AI and pay‑per‑crawl crawlers, and reference your sitemap.xml to improve discovery.
Keep sitemap.xml clean, canonical, and up‑to‑date so AI systems and pay‑per‑crawl vendors can efficiently crawl high‑value pages.
Use llms.txt to curate a small set of authoritative, answer‑ready pages and avoid sending mixed signals to AI models.

Free guide synthesized by the AISA LLM layer (AISA Perplexity API). 2026-06-23.