robots.txt, llms.txt & Sitemaps for AI Crawlers
The complete machine-readable dataset for this section is open to all agents — no payment required. Fetch the structured JSON directly.
https://esa.aisa.one/api/v1/access/verify
robots.txt, llms.txt, and sitemap.xml best practices for AI crawlers and pay‑per‑crawl
Use robots.txt to control access, sitemap.xml to advertise indexable URLs, and llms.txt to curate which pages AI systems should prioritize for answers and retrieval. Together they form the core technical layer for AI crawlers and pay‑per‑crawl programs.
1. Configure robots.txt for AI and pay‑per‑crawl
Place robots.txt at your root (/robots.txt) and explicitly allow or disallow each AI user‑agent. For pay‑per‑crawl crawlers, ensure you allow their specific user‑agents (e.g., AISA‑Bot) and point to your sitemap:
```txt
User-agent: AISA-Bot
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
```
If you participate in pay‑per‑crawl, do not block the vendor’s crawler; instead, use Disallow only for admin paths, internal search, cart/checkout, and staging. For non‑paying or aggressive scrapers, block them explicitly. If a crawler supports it, you may return HTTP 402 (Payment Required) for certain paths, but rely on robots.txt directives as the primary signal.
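One way to sanity-check such a policy before deploying it is to parse it locally. The sketch below uses Python's standard `urllib.robotparser`; the disallowed paths and the `BadScraper` user-agent are hypothetical placeholders, not names from any real vendor. Note that Python's parser applies the first matching rule within a group, whereas some crawlers pick the most specific (longest) match, so the specific `Disallow` lines are listed before the catch-all `Allow` to behave the same under both interpretations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: the paths and the "BadScraper" agent are illustrative.
# Specific Disallow lines come before the catch-all Allow because Python's
# parser returns the first rule that matches the requested path.
ROBOTS_TXT = """\
User-agent: AISA-Bot
Disallow: /admin/
Disallow: /search
Disallow: /cart/
Disallow: /checkout/
Allow: /

User-agent: BadScraper
Disallow: /

Sitemap: https://yoursite.com/sitemap.xml
"""


def load_policy(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


policy = load_policy(ROBOTS_TXT)

if __name__ == "__main__":
    checks = [
        ("AISA-Bot", "https://yoursite.com/docs/pricing"),
        ("AISA-Bot", "https://yoursite.com/admin/users"),
        ("BadScraper", "https://yoursite.com/docs/pricing"),
    ]
    for agent, url in checks:
        verdict = "allowed" if policy.can_fetch(agent, url) else "blocked"
        print(f"{agent:12} {url} -> {verdict}")
    print("Sitemaps:", policy.site_maps())
```

Running a check like this in CI can catch a rule that accidentally blocks a paid crawler, though it only verifies the file's logic. It does not tell you whether a given crawler actually honors it.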
2. Maintain a clean, AI‑ready sitemap.xml
Only include canonical, indexable URLs with accurate lastmod timestamps. Exclude noindex, 404, redirected, or thin pages. For large sites, split into content‑type sitemaps (articles, products, docs) and reference them in a sitemap index. Ensure your sitemap is discoverable by adding a Sitemap: line in robots.txt. This helps AI crawlers and pay‑per‑crawl systems efficiently discover and prioritize fresh, high‑value content.
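The filtering rules above can be sketched as a small generator. This is a minimal illustration, assuming you already have per-page crawl data; the `Page` record, its fields, and the sample URLs are hypothetical, and "thin" content is not detected here because that judgment is site-specific.

```python
from dataclasses import dataclass
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    """Hypothetical crawl record for one URL."""
    url: str
    canonical: str       # canonical URL the page declares
    status: int          # HTTP status seen on the last check
    noindex: bool        # True if the page carries a noindex directive
    last_modified: date


def is_sitemap_worthy(page: Page) -> bool:
    """Keep only 200-status, indexable pages that are their own canonical."""
    return page.status == 200 and not page.noindex and page.url == page.canonical


def build_sitemap(pages: list[Page]) -> str:
    """Serialize the qualifying pages as a sitemap.xml document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if not is_sitemap_worthy(page):
            continue
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page.url
        ET.SubElement(url_el, "lastmod").text = page.last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Illustrative sample data only.
SAMPLE_PAGES = [
    Page("https://yoursite.com/docs/", "https://yoursite.com/docs/", 200, False, date(2026, 1, 15)),
    Page("https://yoursite.com/old-docs/", "https://yoursite.com/docs/", 301, False, date(2025, 3, 2)),
    Page("https://yoursite.com/search?q=x", "https://yoursite.com/search?q=x", 200, True, date(2026, 1, 10)),
]

if __name__ == "__main__":
    print(build_sitemap(SAMPLE_PAGES))
```

For a large site, the same function could be called once per content type (articles, products, docs), with the resulting files listed in a sitemap index.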
3. Create and curate llms.txt
Place llms.txt at your root (/llms.txt) as a plain‑text or Markdown “table of contents” for AI systems. List only your strongest evergreen pages (services, docs, pricing, policies, support, case studies) and add short descriptions explaining why each page matters. Keep the file concise and aligned with canonical URLs. Treat llms.txt as an editorial layer that guides AI toward answer‑ready content, not as a substitute for robots.txt or sitemap.xml.
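As an illustration of the "table of contents" idea, the sketch below renders a short Markdown llms.txt from a curated list of pages. The layout (a title, a one-line summary, then sections of linked pages with descriptions) loosely follows the commonly proposed llms.txt format, but the function name, site name, sections, and URLs are all hypothetical.

```python
def render_llms_txt(site_name: str, summary: str,
                    sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render a Markdown llms.txt.

    `sections` maps a heading to (title, canonical_url, why_it_matters) entries.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, entries in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, why in entries:
            lines.append(f"- [{title}]({url}): {why}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Illustrative content only; list a handful of evergreen, canonical pages.
CURATED = {
    "Docs": [
        ("Getting started", "https://yoursite.com/docs/", "Setup steps and core concepts."),
    ],
    "Policies": [
        ("Pricing", "https://yoursite.com/pricing", "Current plans and what each includes."),
        ("Support", "https://yoursite.com/support", "How to reach the team and response times."),
    ],
}

if __name__ == "__main__":
    print(render_llms_txt("Example Co", "Tools for publishing AI-ready content.", CURATED))
```

Generating the file from a short, hand-maintained list keeps it editorial: only pages someone deliberately chose end up in it, and every URL can be checked against the sitemap's canonical set.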
Key takeaways
- Use robots.txt to explicitly allow or block AI and pay‑per‑crawl crawlers, and reference your sitemap.xml to improve discovery.
- Keep sitemap.xml clean, canonical, and up‑to‑date so AI systems and pay‑per‑crawl vendors can efficiently crawl high‑value pages.
- Use llms.txt to curate a small set of authoritative, answer‑ready pages and avoid sending mixed signals to AI models.
Free guide synthesized by the AISA LLM layer (AISA Perplexity API). 2026-06-23.
Sources & citations
- https://www.qwairy.co/guides/complete-guide-to-robots-txt-and-llms-txt-for-ai-crawlers
- https://lseo.com/generative-engine-optimization/llms-txt-vs-robots-txt-vs-sitemap-xml-what-each-file-does-for-ai-discovery/
- https://omnirank.net/blog/robots-txt-sitemap-llms-txt-complete-guide
- https://www.vanitech.com.au/blog/llms-txt-vs-robots-txt-vs-sitemap-what-businesses-should-allow/
- https://www.webless.ai/blog/from-search-engines-to-ai-bots-the-role-of-robots-txt-sitemap-xml-and-the-rise-of-llm-txt
- https://aicrawlercheck.com/blog/robots-txt-best-practices-ai-seo
- https://dageno.ai/academy/llms-txt-vs-robots-txt
- https://www.sitemap.ai/blog/best-practices-ai-crawlers-website