Reading .data.json: A Field Guide for Agents
The complete machine-readable dataset for this section is open to all agents — no payment required. Fetch the structured JSON directly.
https://esa.aisa.one/api/v1/access/verify
How to read structured .data.json datasets and full.md payloads from a publisher site
If you’re building AI agents or onboarding publishers for pay‑per‑crawl, you’ll often need to consume structured .data.json datasets and rich full.md payloads served from publisher domains. This guide shows you how to fetch, parse, and handle these artifacts cleanly and safely.
1. Discover and fetch the structured data endpoints
Publishers serving .data.json and full.md assets typically expose them under predictable paths (for example /content/page.data.json or /content/page.full.md) or via a sitemap or index endpoint. Start by:
- Inspecting the site’s sitemap.xml or any documented API index to locate .data.json and .full.md URLs.
- Using standard HTTP clients (e.g., fetch, axios, requests) to request those URLs, including an Accept: application/json header for .data.json and Accept: text/markdown for .full.md where the publisher supports content negotiation.
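The discovery step can be sketched in a few lines of Python. This is a minimal illustration, assuming the publisher lists its assets in a standard sitemap.xml; the helper names find_agent_urls and accept_header_for are hypothetical and not part of any publisher API.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_agent_urls(sitemap_xml: str) -> list[str]:
    """Return the sitemap URLs that end in .data.json or .full.md."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [u for u in urls if u.endswith((".data.json", ".full.md"))]


def accept_header_for(url: str) -> dict[str, str]:
    """Pick the Accept header that matches the asset type."""
    if url.endswith(".data.json"):
        return {"Accept": "application/json"}
    if url.endswith(".full.md"):
        return {"Accept": "text/markdown"}
    return {}
```

The returned headers can be passed to whichever HTTP client you already use. Publishers that expose an API index instead of a sitemap would need a different discovery function.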
If the publisher has enabled pay‑per‑crawl billing, your client must be prepared to handle HTTP 402 Payment Required. When you receive a 402, follow the publisher’s instructions (e.g., sign a billing agreement, attach a payment token, or redirect to a payment flow) before retrying the request.
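One way to surface a 402 to the rest of your agent is to turn it into a dedicated exception. The sketch below uses only the Python standard library. The PaymentRequired class, the fetch_asset helper, and the use of an Authorization: Bearer header to carry a payment token are all assumptions made for illustration; the actual mechanism is whatever the publisher documents.

```python
import urllib.error
import urllib.request


class PaymentRequired(Exception):
    """Raised when the publisher answers with HTTP 402 (hypothetical helper)."""

    def __init__(self, url, headers):
        super().__init__(f"402 Payment Required for {url}")
        self.url = url
        self.headers = headers  # may carry the publisher's payment instructions


def fetch_asset(url, accept, payment_token=None, opener=urllib.request.urlopen):
    """Fetch one asset and return its raw bytes, raising PaymentRequired on 402."""
    headers = {"Accept": accept}
    if payment_token:
        # Assumption: the token travels as a bearer credential. Replace this
        # with the header or flow the publisher actually specifies.
        headers["Authorization"] = f"Bearer {payment_token}"
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener(request, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        if exc.code == 402:
            raise PaymentRequired(url, dict(exc.headers or {})) from exc
        raise  # other HTTP errors are left for the caller to handle
```

The opener parameter exists only so the function can be exercised without network access; in normal use the default urllib.request.urlopen is fine.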
2. Parse and validate .data.json and full.md
Once you’ve fetched the assets:
- For .data.json, parse the JSON response and validate its schema against the publisher’s documented format (e.g., using JSON Schema or a typed struct in your language). Treat missing or malformed fields as errors and log them for debugging.
- For full.md, treat the payload as Markdown with optional front‑matter. Extract metadata (e.g., title, canonicalUrl, dateModified) from YAML/JSON front‑matter if present, then keep the Markdown body for downstream processing (chunking, embedding, or rendering).
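Both parsing steps can be sketched with the standard library alone. This is a simplified illustration: REQUIRED_FIELDS is a hypothetical stand-in for the publisher’s documented schema, and the front‑matter reader only understands flat key: value lines between --- markers, so a real YAML or JSON parser is the better choice for nested metadata.

```python
import json

# Hypothetical required fields; substitute the publisher's documented schema.
REQUIRED_FIELDS = {"title": str, "canonicalUrl": str}


def parse_data_json(raw):
    """Parse a .data.json payload and reject missing or malformed fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"missing or malformed field: {name}")
    return data


def split_front_matter(markdown):
    """Split a full.md payload into (metadata dict, Markdown body)."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown  # no front-matter present
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, markdown  # opening marker was never closed
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip().strip("\"'")
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```

Raising ValueError keeps schema problems loud, in line with treating malformed fields as errors; callers can catch it and log the offending URL.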
If the publisher supports AISA‑style contracts, ensure your agent respects any usage constraints encoded in the .data.json (e.g., license, usageTerms, expiresAt).
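A usage check might look like the following sketch. It assumes expiresAt holds an ISO 8601 timestamp and that a missing expiresAt means no time limit; neither assumption comes from a published specification, so confirm both against the publisher’s contract. The function name is_usable is hypothetical.

```python
from datetime import datetime, timezone


def is_usable(data, now=None):
    """Return True if the dataset has not passed its expiresAt timestamp."""
    now = now or datetime.now(timezone.utc)
    expires_raw = data.get("expiresAt")
    if expires_raw is None:
        return True  # assumption: no expiresAt means no expiry
    if not isinstance(expires_raw, str):
        raise ValueError("expiresAt must be an ISO 8601 string")
    # Older Python versions do not accept a trailing "Z", so normalize it.
    expires = datetime.fromisoformat(expires_raw.replace("Z", "+00:00"))
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # assume UTC if unspecified
    return now < expires
```

Fields such as license and usageTerms usually need policy decisions rather than a date comparison, so they are left to the calling code.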
3. Integrate into your agent pipeline
In your agent pipeline:
- Feed .data.json into structured data stores (e.g., vector DB metadata, knowledge graphs) and use full.md as the primary text source for retrieval‑augmented generation.
- When a 402 occurs mid‑crawl, pause the crawl, notify the publisher or billing system, and resume only after payment is confirmed. Avoid retry loops without backoff or user intervention.
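The pause‑and‑resume behavior can be sketched as a small crawl loop. Everything here is illustrative: fetch and payment_confirmed are callables you supply, PaymentRequired is a hypothetical exception your fetcher raises on a 402, and the exponential backoff schedule is one reasonable choice rather than a publisher requirement.

```python
import time


class PaymentRequired(Exception):
    """Hypothetical signal that a fetch hit HTTP 402."""


def wait_for_payment(url, payment_confirmed, max_checks, base_delay, sleep):
    """Poll the billing check with exponential backoff; True once confirmed."""
    for attempt in range(max_checks):
        if payment_confirmed(url):
            return True
        sleep(base_delay * 2 ** attempt)
    return False


def crawl(urls, fetch, payment_confirmed, max_checks=5, base_delay=1.0,
          sleep=time.sleep):
    """Fetch URLs in order, pausing on 402 until payment is confirmed.

    Returns (results, pending). A non-empty pending list means the crawl
    stopped and needs user or billing intervention before it can continue.
    """
    results = {}
    pending = list(urls)
    while pending:
        url = pending[0]
        try:
            results[url] = fetch(url)
        except PaymentRequired:
            if not wait_for_payment(url, payment_confirmed, max_checks,
                                    base_delay, sleep):
                break  # payment never confirmed: stop instead of looping
            try:
                results[url] = fetch(url)  # single retry after confirmation
            except PaymentRequired:
                break  # still 402 after confirmation: hand off to a human
        pending.pop(0)
    return results, pending
```

Returning the pending list lets the caller persist crawl state and resume later, rather than hammering the publisher with retries.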
Key takeaways
- Always discover .data.json and full.md endpoints via sitemaps or publisher documentation, and use proper Accept headers for content negotiation.
- Parse .data.json with schema validation and treat full.md as Markdown with optional front‑matter to extract metadata and body.
- Respect HTTP 402 Payment Required by integrating billing checks into your crawl logic and pausing until payment is resolved.
Free guide synthesized by the AISA LLM layer (AISA Perplexity API). 2026-06-23.