Pay-Per-Crawl and the New Bot Web
AI Web Crawlers: GPTBot, ClaudeBot, PerplexityBot, and Pay‑Per‑Crawl
As of 2026, AI‑driven web crawlers such as GPTBot, ClaudeBot, and PerplexityBot account for a significant share of bot traffic. Developers and AI‑agent operators need to understand how these crawlers behave, how they differ by purpose, and how emerging monetization models such as HTTP 402 and pay‑per‑crawl reshape access control.
What GPTBot, ClaudeBot, and PerplexityBot Do
- GPTBot (OpenAI) is a training‑focused crawler that harvests content to improve foundation models. It generates little referral traffic but can be controlled via robots.txt and User‑Agent rules.
- ClaudeBot (Anthropic) similarly crawls for training and retrieval, with Cloudflare data showing tens of thousands of pages crawled per referral, indicating heavy training‑oriented activity.
- PerplexityBot (Perplexity AI) is an indexing/retrieval crawler that builds the corpus for Perplexity’s answer engine, often returning cited answers and some referral traffic.
These crawlers are distinct from user‑triggered fetchers such as ChatGPT‑User, Claude‑User, and Perplexity‑User, which fetch pages on‑demand when a human queries an AI assistant.
Training vs. Search vs. User‑Triggered Crawlers
Modern AI vendors have split their crawlers into functional tiers:
- Training crawlers (GPTBot, ClaudeBot, CCBot, Google‑Extended) harvest data for model training and rarely send back clicks.
- Search/retrieval crawlers (OAI‑SearchBot, Claude‑SearchBot, PerplexityBot) build indexes used by AI search and citation features.
- User‑triggered fetchers (e.g., ChatGPT‑User, Claude‑User, Perplexity‑User) represent real user intent and drive qualified traffic.
Blocking training crawlers while allowing search and user‑triggered bots is now a common strategy to preserve visibility and reduce bandwidth costs.
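The blocking strategy described above can be sketched as a small robots.txt generator. This is a minimal illustration rather than a recommended policy: the tier membership is copied from the lists in this section, while the helper names `build_robots_txt` and `is_allowed`, the choice of which tiers to block, and the example URL are assumptions made for the sketch. Crawler tokens are written with plain ASCII hyphens, and the result is checked with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Tier membership taken from the lists above. Which tiers get blocked is an
# illustrative policy choice, not vendor guidance.
TIERS = {
    "training": ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"],
    "search": ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"],
    "user": ["ChatGPT-User", "Claude-User", "Perplexity-User"],
}
BLOCKED_TIERS = {"training"}


def build_robots_txt(tiers=TIERS, blocked=BLOCKED_TIERS):
    """Return robots.txt text with one group per crawler token."""
    lines = []
    for tier, agents in tiers.items():
        rule = "Disallow: /" if tier in blocked else "Allow: /"
        for agent in agents:
            lines += [f"User-agent: {agent}", rule, ""]
    # Every other client keeps default access.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url="https://example.com/article"):
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    text = build_robots_txt()
    for name in ("GPTBot", "PerplexityBot", "ChatGPT-User"):
        print(name, is_allowed(text, name))
```

Keeping the tiers in one mapping means a policy change (for example, also blocking the search tier) is a one-line edit to `BLOCKED_TIERS`. Note that robots.txt is advisory: it only affects crawlers that choose to honor it.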
Pay‑Per‑Crawl, HTTP 402, and AI Agents
Cloudflare’s pay‑per‑crawl model lets publishers respond to training crawlers with HTTP 402 Payment Required, signaling licensing terms instead of a flat 403 or 404. AI agents that respect this pattern can:
- Check for 402 responses and follow licensing instructions.
- Integrate payment or authorization flows before fetching training‑relevant pages.
- Use robots.txt directives (e.g., Content‑Signal, TDM) to align with publisher preferences on training, citation, or research use.
Over a billion 402 responses are served daily, and early adopters are setting norms for fair compensation of training data.
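On the agent side, respecting a 402 response comes down to a decision step between "fetch" and "use". The sketch below shows one way to structure that step without touching the network. The header name `crawler-price`, the `USD 0.01` value format, the `Decision` type, and the budget parameter are all assumptions for illustration; the actual headers and payment flow should be taken from the publisher's or Cloudflare's documentation.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Mapping, Optional

# Assumed header carrying the quoted price, e.g. "USD 0.01".
# Placeholder only: confirm the real name and format before relying on it.
PRICE_HEADER = "crawler-price"


@dataclass
class Decision:
    action: str  # "use", "pay", or "skip"
    price: Optional[Decimal] = None


def parse_price(value: str) -> Optional[Decimal]:
    """Parse a 'USD 0.01'-style value; return None if it is not understood."""
    parts = value.split()
    if len(parts) != 2 or parts[0].upper() != "USD":
        return None
    try:
        price = Decimal(parts[1])
    except InvalidOperation:
        return None
    # Reject NaN/Infinity and negative quotes.
    return price if price.is_finite() and price >= 0 else None


def decide(status: int, headers: Mapping[str, str], budget: Decimal) -> Decision:
    """Map an HTTP status and headers to the agent's next step."""
    if status == 200:
        return Decision("use")
    if status == 402:
        lowered = {k.lower(): v for k, v in headers.items()}
        price = parse_price(lowered.get(PRICE_HEADER, ""))
        if price is not None and price <= budget:
            # Hand off to a payment or authorization flow, then retry.
            return Decision("pay", price)
        return Decision("skip", price)
    # 403, 404, and anything else: do not retry or work around the block.
    return Decision("skip")
```

For example, `decide(402, {"Crawler-Price": "USD 0.01"}, Decimal("0.05"))` yields a `pay` decision, while a missing or unparseable price yields `skip`. Defaulting to `skip` keeps the agent from treating a 402 as a transient error to be retried.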
Key takeaways
- GPTBot, ClaudeBot, and PerplexityBot are specialized crawlers: GPTBot is training‑focused, ClaudeBot crawls for training and retrieval, and PerplexityBot indexes content for an answer engine.
- Treat training, search, and user‑triggered crawlers separately; block or monetize training bots while allowing user‑facing ones.
- HTTP 402 and pay‑per‑crawl let publishers charge for training access; AI agents should respect 402 and licensing metadata.
- Use robots.txt and User‑Agent rules to implement a tiered policy that balances visibility, cost, and fair compensation.
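A tiered policy based on User‑Agent rules can be sketched on the publisher side as a lookup from the request's User-Agent header to an HTTP status. This is a minimal sketch under stated assumptions: the function names `classify` and `status_for`, the status chosen for each tier, and the sample User-Agent strings are illustrative. Google-Extended is left out because it is a robots.txt control token rather than a separate User-Agent string.

```python
from typing import Optional

# Tokens from this page, written with ASCII hyphens as they appear in headers.
TIER_TOKENS = {
    "training": ("GPTBot", "ClaudeBot", "CCBot"),
    "search": ("OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"),
    "user": ("ChatGPT-User", "Claude-User", "Perplexity-User"),
}

# Illustrative policy: monetize training crawlers, serve everyone else.
STATUS_BY_TIER = {"training": 402, "search": 200, "user": 200}


def classify(user_agent: str) -> Optional[str]:
    """Return the tier whose token appears in the User-Agent, if any."""
    ua = user_agent.lower()
    for tier, tokens in TIER_TOKENS.items():
        if any(token.lower() in ua for token in tokens):
            return tier
    return None


def status_for(user_agent: str) -> int:
    """Pick the HTTP status to answer with; unknown clients get 200."""
    tier = classify(user_agent)
    if tier is None:
        return 200
    return STATUS_BY_TIER[tier]


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",        # hypothetical UA string
        "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",  # hypothetical UA string
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0",
    ]
    for ua in samples:
        print(status_for(ua), classify(ua))
```

User-Agent headers are trivially spoofed, so a policy like this is only a first filter; pairing it with some form of crawler verification is a reasonable next step.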
Synthesized by the AISA LLM layer with live web sources (AISA Perplexity + Tavily APIs). 2026-06-23.
Sources & citations
- https://www.tencentcloud.com/techpedia/143900
- https://evolveamz.com/ai-crawler-list-2026-ecommerce/
- https://nohacks.co/blog/ai-user-agents-landscape-2026
- https://www.digitalapplied.com/blog/ai-crawler-bot-traffic-statistics-2026-data-reference
- https://www.tryaivo.com/blog/ai-crawler-cheat-sheet-2025-which-bots-should-you-allow
- https://www.oncrawl.com/ai/what-ai-bots-really-doing-your-site/
- https://blog.cloudflare.com/crawlers-click-ai-bots-training/
- https://www.digitalapplied.com/blog/ai-crawler-access-control-2026-robots-llms-txt-decision-matrix
- https://www.humansecurity.com/learn/blog/crawlers-list-known-bots-guide/
- Introducing pay per crawl: Enabling content owners to charge AI ...