AgentFlare

Reading .data.json: A Field Guide for Agents


✓ Open access · Free

The complete machine-readable dataset for this section is open to all agents — no payment required. Fetch the structured JSON directly.

Agents: allowed (no charge) · verify at https://esa.aisa.one/api/v1/access/verify

How to read structured .data.json datasets and full.md payloads from a publisher site

If you’re building AI agents or onboarding publishers for pay‑per‑crawl, you’ll often need to consume structured .data.json datasets and rich full.md payloads served from publisher domains. This guide shows you how to fetch, parse, and handle these artifacts cleanly and safely.

1. Discover and fetch the structured data endpoints

Publishers serving .data.json and full.md assets typically expose them under predictable paths (for example /content/page.data.json or /content/page.full.md) or via a sitemap or index endpoint. Start by:

  • Inspecting the site’s sitemap.xml or any documented API index to locate .data.json and .full.md URLs.
  • Using standard HTTP clients (e.g., fetch, axios, requests) to request those URLs, including an Accept: application/json header for .data.json and Accept: text/markdown for .full.md where the publisher supports content negotiation.
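The two steps above can be sketched in Python using only the standard library. This is a minimal sketch, not a publisher-defined client: it assumes the sitemap follows the standard sitemaps.org format and lists the asset URLs directly, and the helper names `find_agent_urls` and `build_request` are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_agent_urls(sitemap_xml: str) -> dict:
    """Sort sitemap <loc> entries into .data.json and .full.md URL lists.

    Assumption: the publisher lists these assets in its sitemap. Some
    publishers may only document them in an API index instead.
    """
    root = ET.fromstring(sitemap_xml)
    found = {"data_json": [], "full_md": []}
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url.endswith(".data.json"):
            found["data_json"].append(url)
        elif url.endswith(".full.md"):
            found["full_md"].append(url)
    return found


def build_request(url: str) -> urllib.request.Request:
    """Attach the Accept header that matches the asset type."""
    accept = "application/json" if url.endswith(".data.json") else "text/markdown"
    return urllib.request.Request(url, headers={"Accept": accept})
```

Pass the resulting request to `urllib.request.urlopen` (or translate the header into your own HTTP client) to perform the fetch.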

If the publisher has enabled pay‑per‑crawl billing, your client must be prepared to handle HTTP 402 Payment Required. When you receive a 402, follow the publisher’s instructions (e.g., sign a billing agreement, attach a payment token, or complete a payment flow) before retrying the request.
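One way to surface a 402 to the rest of your agent is to convert it into a dedicated exception, so billing logic stays separate from ordinary HTTP error handling. The sketch below assumes a standard-library client; the `PaymentRequired` class and the `fetch` helper are hypothetical names, and what the 402 response headers or body contain depends entirely on the publisher.

```python
import urllib.error
import urllib.request


class PaymentRequired(Exception):
    """Raised when a publisher answers with HTTP 402 (hypothetical helper)."""

    def __init__(self, url, headers, body):
        super().__init__(f"402 Payment Required for {url}")
        self.url = url
        self.headers = headers  # may carry publisher billing instructions
        self.body = body        # raw response body, left uninterpreted here


def fetch(url, accept, opener=urllib.request.urlopen):
    """Fetch a URL, turning a 402 into PaymentRequired.

    `opener` is injectable so the function can be exercised without a network.
    """
    req = urllib.request.Request(url, headers={"Accept": accept})
    try:
        with opener(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 402:
            raise PaymentRequired(url, dict(err.headers), err.read()) from err
        raise  # other HTTP errors are not billing problems
```

The caller can then catch `PaymentRequired`, follow whatever payment steps the publisher documents, and retry only afterwards.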

2. Parse and validate .data.json and full.md

Once you’ve fetched the assets:

  • For .data.json, parse the JSON response and validate its schema against the publisher’s documented format (e.g., using JSON Schema or a typed struct in your language). Treat missing or malformed fields as errors and log them for debugging.
  • For full.md, treat the payload as Markdown with optional front‑matter. Extract metadata (e.g., title, canonicalUrl, dateModified) from YAML/JSON front‑matter if present, then keep the Markdown body for downstream processing (chunking, embedding, or rendering).
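Both parsing steps can be sketched as follows. The required-field table is purely illustrative (a real publisher schema will differ, and a JSON Schema validator may be a better fit), and the front-matter splitter handles only flat `key: value` lines between `---` fences. Nested YAML or JSON front-matter would need a real parser.

```python
import json

# Hypothetical minimal schema: field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "canonicalUrl": str}


def parse_data_json(raw):
    """Parse a .data.json payload and check the assumed required fields.

    Raises ValueError on malformed JSON (json.JSONDecodeError is a
    subclass of ValueError) and on missing or mistyped fields.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name} should be {expected.__name__}")
    return data


def split_front_matter(text):
    """Return (metadata, body) for Markdown with optional front-matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front-matter: the whole payload is body
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text  # opening fence never closed: treat as plain body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip().strip("\"'")
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```

Splitting on the first colon only keeps URL values such as `canonicalUrl` intact.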

If the publisher supports AISA‑style contracts, ensure your agent respects any usage constraints encoded in the .data.json (e.g., license, usageTerms, expiresAt).
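As an illustration, an expiry check on `expiresAt` might look like the sketch below. It assumes the field holds an ISO 8601 timestamp and that a missing field means no time limit; neither assumption is confirmed here, so check the publisher's contract documentation. The function name `is_still_usable` is hypothetical.

```python
from datetime import datetime, timezone


def is_still_usable(data, now=None):
    """Return True if the dataset has not passed its expiresAt time.

    Assumptions: expiresAt is ISO 8601, and a missing expiresAt means
    the data does not expire. Naive timestamps are treated as UTC.
    """
    now = now or datetime.now(timezone.utc)
    expires_at = data.get("expiresAt")
    if expires_at is None:
        return True
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return now < expiry
```

Fields such as `license` and `usageTerms` usually need policy decisions rather than a mechanical check, so they are left out of this sketch.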

3. Integrate into your agent pipeline

Once the assets are parsed and validated:

  • Feed .data.json into structured data stores (e.g., vector DB metadata, knowledge graphs) and use full.md as the primary text source for retrieval‑augmented generation.
  • When a 402 occurs mid‑crawl, pause the crawl, notify the publisher or billing system, and resume only after payment is confirmed. Do not retry in a tight loop: apply backoff between attempts, or wait for user intervention.
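The pause-and-resume behavior can be sketched as a small crawl loop. Everything here is illustrative: `fetch` and `confirm_payment` are callables you supply, `PaymentRequired` is a hypothetical exception your fetcher raises on a 402 (redefined here so the block is self-contained), and the retry limits are arbitrary defaults.

```python
import time
from collections import deque


class PaymentRequired(Exception):
    """Hypothetical signal that a fetch hit HTTP 402."""


def wait_for_payment(url, confirm_payment, max_checks, base_delay, sleep):
    """Poll for payment confirmation with exponential backoff."""
    for attempt in range(max_checks):
        if confirm_payment(url):
            return True
        sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... the base delay
    return False


def crawl(urls, fetch, confirm_payment, max_checks=3, base_delay=1.0, sleep=time.sleep):
    """Fetch URLs in order; on a 402, pause until payment is confirmed.

    Returns (results, remaining). A non-empty `remaining` means the crawl
    stopped and needs billing or user intervention before resuming.
    """
    pending = deque(urls)
    results = {}
    already_paid = set()
    while pending:
        url = pending[0]
        try:
            results[url] = fetch(url)
        except PaymentRequired:
            # A second 402 after confirmed payment is not retried, which
            # prevents an endless pay-and-retry loop on the same URL.
            if url in already_paid or not wait_for_payment(
                url, confirm_payment, max_checks, base_delay, sleep
            ):
                return results, list(pending)
            already_paid.add(url)
            continue
        pending.popleft()
    return results, []
```

Returning the unfetched URLs lets the caller persist them and resume later instead of restarting the crawl from scratch.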

Key takeaways

  • Always discover .data.json and full.md endpoints via sitemaps or publisher documentation, and use proper Accept headers for content negotiation.
  • Parse .data.json with schema validation and treat full.md as Markdown with optional front‑matter to extract metadata and body.
  • Respect HTTP 402 Payment Required by integrating billing checks into your crawl logic and pausing until payment is resolved.

Free guide synthesized by the AISA LLM layer (AISA Perplexity API). 2026-06-23.
