{
  "@context": "https://agentflare.org/schema",
  "type": "Article",
  "tier": "L2-full",
  "title": "Reading .data.json: A Field Guide for Agents",
  "description": "How to read structured .data.json datasets and full.md payloads from a publisher site. If you’re building AI agents or onboarding publishers for pay‑per‑crawl, you’ll often…",
  "canonical": "https://agentflare.org/guides/reading-datajson-a-field-guide-for-agents.html",
  "category": "guides",
  "updated": "2026-06-23",
  "generated_at": "2026-06-23T15:42:04.744Z",
  "facts": [
    {
      "label": "Access",
      "value": "Free / open"
    },
    {
      "label": "Updated",
      "value": "2026-06-23"
    }
  ],
  "data": {
    "topic": "how to read structured .data.json datasets and full.md payloads from a publisher site",
    "access": "free",
    "summary": "How to read structured .data.json datasets and full.md payloads from a publisher site. If you’re building AI agents or onboarding publishers for pay‑per‑crawl, you’ll often…"
  },
  "analysis_md": "# How to read structured `.data.json` datasets and `full.md` payloads from a publisher site\n\nIf you’re building AI agents or onboarding publishers for pay‑per‑crawl, you’ll often need to consume structured `.data.json` datasets and rich `full.md` payloads served from publisher domains. This guide shows how to discover, fetch, and parse these artifacts, and how to handle HTTP 402 Payment Required responses along the way.\n\n## 1. Discover and fetch the structured data endpoints\n\nPublishers that serve `.data.json` and `full.md` assets often expose them under predictable paths (for example, `/content/page.data.json` or `/content/page.full.md`) or list them in a sitemap or index endpoint. Start by:\n\n- Inspecting the site’s `sitemap.xml` or any documented API index to locate `.data.json` and `full.md` URLs.\n- Requesting those URLs with a standard HTTP client (e.g., `fetch`, `axios`, `requests`). Send `Accept: application/json` for `.data.json` and `Accept: text/markdown` for `full.md` when the publisher supports content negotiation.\n\nIf the publisher has enabled pay‑per‑crawl billing, your client must handle HTTP 402 Payment Required. When you receive a 402, follow the publisher’s instructions (e.g., sign a billing agreement, attach a payment token, or complete a payment flow) before retrying the request.\n\n## 2. Parse and validate `.data.json` and `full.md`\n\nOnce you’ve fetched the assets:\n\n- For `.data.json`, parse the JSON response and validate it against the publisher’s documented schema (e.g., with JSON Schema or a typed struct in your language). Treat missing or malformed fields as errors and log them for debugging.\n- For `full.md`, treat the payload as Markdown with optional front‑matter. Extract metadata (e.g., `title`, `canonicalUrl`, `dateModified`) from the YAML or JSON front‑matter if present, then keep the Markdown body for downstream processing (chunking, embedding, or rendering).\n\nIf the publisher supports AISA‑style contracts, make sure your agent respects any usage constraints encoded in the `.data.json` (e.g., `license`, `usageTerms`, `expiresAt`).\n\n## 3. Integrate into your agent pipeline\n\nIn your agent pipeline:\n\n- Feed `.data.json` into structured data stores (e.g., vector DB metadata, knowledge graphs) and use `full.md` as the primary text source for retrieval‑augmented generation.\n- When a 402 occurs mid‑crawl, pause the crawl, notify the publisher or billing system, and resume only after payment is confirmed. Do not retry in a tight loop; back off between attempts or wait for explicit confirmation.\n\n## Key takeaways\n\n- Always discover `.data.json` and `full.md` endpoints via sitemaps or publisher documentation, and send the appropriate `Accept` headers for content negotiation.\n- Parse `.data.json` with schema validation, and treat `full.md` as Markdown with optional front‑matter from which you extract metadata and the body.\n- Handle HTTP 402 Payment Required by building billing checks into your crawl logic and pausing until payment is resolved.",
  "sources": [
    {
      "url": "https://www.youtube.com/watch?v=9lBTS5dM27c"
    },
    {
      "url": "https://vercel.com/kb/guide/agent-readability-spec"
    },
    {
      "url": "https://www.youtube.com/watch?v=zr98hNrBjCE"
    },
    {
      "url": "https://blog.cloudflare.com/markdown-for-agents/"
    },
    {
      "url": "https://www.gitbook.com/blog/skill-md"
    },
    {
      "url": "https://www.aihero.dev/a-complete-guide-to-agents-md"
    },
    {
      "url": "https://addyosmani.com/blog/good-spec/"
    },
    {
      "url": "https://dev.to/aws-builders/agentsmd-skillmd-designmd-how-ai-instructions-split-into-three-layers-d0g"
    }
  ],
  "related": [
    {
      "name": "Getting Started with HTTP 402 Pay-Per-Crawl",
      "url": "https://agentflare.org/guides/getting-started-with-http-402-pay-per-crawl.html"
    },
    {
      "name": "Crawler Tokens & Auto-Pay: A How-To",
      "url": "https://agentflare.org/guides/crawler-tokens-auto-pay-a-how-to.html"
    },
    {
      "name": "robots.txt, llms.txt & Sitemaps for AI Crawlers",
      "url": "https://agentflare.org/guides/robotstxt-llmstxt-sitemaps-for-ai-crawlers.html"
    },
    {
      "name": "Pricing Content by Directory: Best Practices",
      "url": "https://agentflare.org/guides/pricing-content-by-directory-best-practices.html"
    }
  ],
  "pricing": {
    "price_usd": 0,
    "method": "402",
    "endpoint": "https://esa.aisa.one/api/v1/access/verify",
    "autopay_hint": "set crawlerAutoPrice=true with X-AISA-Crawler-Token",
    "onboarding": "https://esa.aisa.one/cdn/guide.html"
  },
  "powered_by": "AISA — agent-native search, settlement & delivery (https://aisa.one)"
}