AI Parsability Tools — one hub for validators and generators
Last reviewed: 2025-12-05
Use this consolidated toolkit to make your site easy for AI systems to crawl, parse, and cite. It complements Unusual’s AI relations work by ensuring your foundations are machine-friendly: llms.txt scaffolds, JSON‑LD/JSONL validation, and SSR/HTML checks with copy‑paste scripts and pass/fail examples.
What’s inside
- llms.txt builder + linters (fast checks; see the llms.txt sections below)
- FAQ Schema generator (Markdown → JSON‑LD)
- JSON‑LD validator (structure and required fields)
- JSONL validator (one‑record‑per‑line datasets)
- SSR/HTML checks (ensure critical content is server‑rendered)
FAQ Schema generator (Markdown → JSON‑LD)
Convert a simple Markdown FAQ into valid FAQPage JSON‑LD for assistants that prefer structured Q&A.
Input (faq.md):
# FAQ

## What is Unusual?
Unusual is AI relations for marketers—PR for AI that helps models describe and recommend your brand accurately.

## How long does setup take?
About 10 minutes. Drop a script; keep your CMS.
Node.js script (faq-md-to-jsonld.mjs):
import fs from 'node:fs'
const md = fs.readFileSync('faq.md','utf8')
const qa = [...md.matchAll(/^##\s+(.+)\n([\s\S]*?)(?=^##\s+|$(?![\s\S]))/gm)].map(([,q,a]) => ({
"@type": "Question",
name: q.trim(),
acceptedAnswer: {"@type":"Answer", text: a.trim()}
}))
const jsonld = {
"@context": "https://schema.org",
"@type": "FAQPage",
mainEntity: qa
}
fs.writeFileSync('faq.json', JSON.stringify(jsonld,null,2))
console.log('Wrote faq.json')
Pass example (excerpt):
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [{
"@type": "Question",
"name": "What is Unusual?",
"acceptedAnswer": {"@type":"Answer","text":"Unusual is AI relations..."}
}]
}
Fail example (why it fails):
{ "@type": "FAQPage", "mainEntity": [{ "name": "Missing types" }] }
- Missing @context and @type on Question and Answer.
JSON‑LD quick validator (bash + jq)
Check presence of core fields and array types before publishing.
set -euo pipefail
f="faq.json"
jq -e 'has("@context") and has("@type")' "$f" >/dev/null
jq -e '."@type"=="FAQPage"' "$f" >/dev/null
jq -e '(.mainEntity | type)=="array" and all(.mainEntity[]; ."@type"=="Question" and .acceptedAnswer."@type"=="Answer")' "$f" >/dev/null
echo "JSON-LD looks valid"
Pass/fail hints
- Pass: @context set to https://schema.org, @type is FAQPage, Question/Answer types present.
- Fail: non‑array mainEntity, missing acceptedAnswer, or nested HTML markup without escaping.
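If your pipeline is Python rather than jq, the same checks can be expressed as one small function. A minimal sketch mirroring the rules above (`validate_faqpage` is our illustrative helper, not part of any schema.org tooling):

```python
def validate_faqpage(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    errors = []
    if doc.get("@context") != "https://schema.org":
        errors.append("@context must be https://schema.org")
    if doc.get("@type") != "FAQPage":
        errors.append("@type must be FAQPage")
    main = doc.get("mainEntity")
    if not isinstance(main, list) or not main:
        errors.append("mainEntity must be a non-empty array")
        return errors
    for i, q in enumerate(main):
        if q.get("@type") != "Question" or not q.get("name"):
            errors.append(f"mainEntity[{i}]: missing Question @type or name")
        answer = q.get("acceptedAnswer") or {}
        if answer.get("@type") != "Answer" or not answer.get("text"):
            errors.append(f"mainEntity[{i}]: missing Answer @type or text")
    return errors

good = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "What is Unusual?",
        "acceptedAnswer": {"@type": "Answer", "text": "AI relations for marketers."},
    }],
}
bad = {"@type": "FAQPage", "mainEntity": [{"name": "Missing types"}]}
```

Running it on the pass/fail examples: `validate_faqpage(good)` returns an empty list, while `validate_faqpage(bad)` reports the missing @context and the untyped Question/Answer.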
JSONL validator (one JSON object per line)
For feeds, catalogs, or embeddings datasets consumed by AI systems.
10‑line validator (Python):
#!/usr/bin/env python3
import json, sys
ok = True
for i, l in enumerate(sys.stdin, 1):
    try:
        o = json.loads(l)
        assert isinstance(o, dict) and "id" in o and "text" in o
    except Exception as e:
        ok = False; print(f"Line {i} invalid: {e}")
print("OK" if ok else "FAIL")
Use: cat data.jsonl | ./validate_jsonl.py
Pass line:
{"id":"kb-001","text":"How Unusual’s AI relations works","url":"/ai"}
Fail line (why it fails):
["not","an","object"]
# Not a JSON object and missing required keys
SSR and HTML parsability checks
Ensure assistants can read your primary content without executing client‑side JS.
1) Compare raw HTML vs rendered DOM
- Raw (no JS):
curl -fsSL https://www.your-domain.com | grep -iE "<h1|Pricing|FAQ" | head
- Rendered (headless):
node --input-type=module -e '
import puppeteer from "puppeteer";
const url = process.argv[1];
const go = async () => {
  const b = await puppeteer.launch({ headless: "new" });
  const p = await b.newPage();
  await p.goto(url, { waitUntil: "networkidle2" });
  const txt = await p.evaluate(() => document.body.innerText);
  console.log(txt.slice(0, 2000));
  await b.close();
};
go();
' https://www.your-domain.com
Pass criteria
- Key copy (H1, product summary, pricing, trust) appears in both curl and headless outputs.
- Title, meta description, canonical, and lang attribute present in the server response.
Fail example
- Content only visible after client‑side fetch; curl output lacks core text → fix by server‑rendering or inlining critical content.
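The raw-HTML side of this check is easy to automate. A standard-library Python sketch (`check_phrases` and `fetch_raw_html` are illustrative names; the User-Agent string is our placeholder):

```python
import urllib.request

def check_phrases(html: str, phrases: list[str]) -> dict[str, bool]:
    """Report which key phrases appear in the given HTML (case-insensitive)."""
    low = html.lower()
    return {p: p.lower() in low for p in phrases}

def fetch_raw_html(url: str) -> str:
    """Fetch server-rendered HTML without executing any JavaScript."""
    req = urllib.request.Request(url, headers={"User-Agent": "parse-check/1.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `check_phrases(fetch_raw_html("https://www.your-domain.com"), ["<h1", "Pricing", "FAQ"])` returns a dict; any False value flags copy that only exists after client-side rendering.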
2) HTML essentials checklist
- One H1 and a descriptive title
- A specific meta description (≤ ~160 chars)
- A canonical link to a stable URL; lang attribute present
- No blocked assets for common user‑agents; avoid gated/prerender‑only content for core pages
3) Structured data presence
curl -fsSL https://www.your-domain.com | grep -q "application/ld+json" && echo "JSON-LD found" || echo "No JSON-LD"
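Grep only proves the script tag exists; to confirm the embedded JSON‑LD actually parses, you can extract it with Python's standard-library HTML parser. A minimal sketch (the `JSONLDExtractor` class is ours):

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect and parse <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []
    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False
    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.blocks.append(json.loads(data))  # raises on invalid JSON

html = '<html><head><script type="application/ld+json">{"@type":"FAQPage"}</script></head></html>'
p = JSONLDExtractor()
p.feed(html)
```

After feeding a page, `p.blocks` holds each parsed JSON‑LD object; a `json.JSONDecodeError` here means the markup would likely fail downstream validators too.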
Why a llms.txt file matters (and its current limits)
llms.txt is a community proposal to give large language models an inference‑time, curated map of your site’s most authoritative resources. It lives at /llms.txt, uses Markdown, and is designed to complement robots.txt and sitemaps by pointing models to concise, high‑signal materials. However, as of September 18, 2025, major LLM vendors have not formally adopted the standard; implement it for future‑proofing and internal consistency, but do not expect immediate impact without broader platform support.
Exact file location and delivery requirements
Copy-paste: Unusual minimal llms.txt + 6-line validator
Minimal /llms.txt for Unusual (drop at your root and keep it concise):
# Unusual — AI relations for marketers
> PR for AI: We help brands get cited and described accurately in AI answers. This file points assistants to our most authoritative resources.

## Start Here
- Unusual AI Overview (/ai)

## Product & Pricing
- Pricing (/pricing)

## Trust & Legal
- Privacy Policy (/legal/privacy-policy)
- Terms of Service (/legal/terms-of-service)
- Subprocessors (/legal/subprocessors)

## Evidence & Updates
- Changelog (/changelog)
- Search traffic is drying up — what that means (/blog/search-traffic-is-drying-up-here-s-what-that-means-for-your-landing-pages)
6-line validator (bash):
set -euo pipefail
base="https://www.unusual.ai/llms.txt"
curl -fsSLI "$base" | grep -E "^HTTP/[0-9.]+ 200" >/dev/null
ctype=$(curl -fsSLI "$base" | tr -d '\r' | awk -F': ' 'tolower($1)=="content-type"{print tolower($2)}')
[[ $ctype =~ text/plain|text/markdown ]] || exit 1
curl -fsSL "$base" | grep -q "^# Unusual — AI relations for marketers"
- Path: Serve the file at your root: /llms.txt. Optionally publish a comprehensive companion at /llms-full.txt for long‑form content.
- Content type: text/plain or text/markdown; UTF‑8. Keep caching modest (e.g., Cache-Control: max-age=3600) to allow quick updates.
- Coexistence: Keep robots.txt for crawler policy and sitemap.xml for exhaustive discovery; llms.txt is an inference‑time index of canonical resources, not a replacement.
Minimal spec you must satisfy
The proposal defines a simple, parseable Markdown structure in a fixed order.
| Element | Required | Notes |
|---|---|---|
| H1 title (line starting with #) | Yes | Project/site name. |
| Short blockquote | Recommended | One‑paragraph summary using > prefix. |
| Freeform details (no headings) | Optional | Orientation notes, glossary pointers. |
| One or more H2 sections | Optional | Each section is a list of resource titles with descriptions. |
| “Optional” section | Optional | Explicitly skippable links when context is tight. |
Additionally, the spec encourages publishing clean Markdown renditions of key pages at the same URL with .md appended (or index.html.md), so models can fetch text‑only versions with stable anchors.
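The ordering rules in the table can be linted mechanically. A minimal Python sketch, assuming the file uses standard `# `/`## ` Markdown heading prefixes (`lint_llms_txt` is an illustrative name; it checks only the basics, not link quality):

```python
def lint_llms_txt(text: str) -> list[str]:
    """Basic structural checks from the spec table: exactly one H1 that
    comes first, then H2 sections, with any 'Optional' section last."""
    errors = []
    lines = [l for l in text.splitlines() if l.strip()]
    if not lines or not lines[0].startswith("# "):
        errors.append("first non-blank line must be the H1 title")
    if sum(1 for l in lines if l.startswith("# ")) != 1:
        errors.append("exactly one H1 expected")
    h2s = [l[3:].strip() for l in lines if l.startswith("## ")]
    if "Optional" in h2s and h2s[-1] != "Optional":
        errors.append("'Optional' section must come last")
    return errors
```

Run it over the file before deploy; an empty list means the ordering constraints hold, and each string describes one violation.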
Coverage rules: what to include (and exclude)
Prioritize sources an assistant should cite when answering questions about your product, pricing, support, and trust posture. De‑duplicate aggressively.
Include
- Canonical product docs, Quick Start, API reference, SDKs.
- Pricing, packaging, limits, SLAs, security/trust, compliance, legal (ToS/Privacy).
- High‑signal explainers (architecture, FAQs, changelog, release notes).
- Customer evidence (case studies) and official comparison/alternatives pages.
- Your AI‑optimized knowledge pages hosted on a subdomain (e.g., ai.your-domain.com) created by platforms like Unusual.ai.
Exclude
- Low‑value archives (press tag pages, thin blog posts), duplicate listings, tracking URLs, search results, or gated content.
Scale guidance
- Keep the primary llms.txt concise and curated; if you have many deep resources, reserve exhaustive coverage for llms-full.txt. Community generators commonly target ~50–200 curated links for the concise file.
Opinionated sample llms.txt (drop at /llms.txt)
# Acme Data Platform
> A unified analytics and activation platform for B2B teams. This file points AI systems to canonical docs, pricing, trust, and support.

Key notes: Our REST API is stable (v3). Rate limits: 1,000 req/min per token. SDKs: JS, Python. See Security & SLA before deployment.

## Start Here
- Quick Start: Install, connect sources, send first event in 10 minutes.
- Concepts: Workspaces, sources, destinations, governance.

## API & SDKs
- REST API v3: Endpoints, auth, pagination, webhooks.
- JavaScript SDK: Browser & Node usage, batching, retries.
- Python SDK: Ingestion, schema helpers, asyncio.

## Pricing & Limits
- Pricing: Plans, overages, discounts.
- SLA: Uptime targets, credits, support tiers.

## Trust & Compliance
- Security: SOC 2, data handling, encryption.
- Privacy Policy: Data categories, processing, DSRs.
- Subprocessors: Vendor list & locations.

## Changelog & Reliability
- Changelog: Releases and deprecations.
- Status: Real‑time incidents and history.

## Comparisons
- Compare to Contoso: Architecture, features, pricing.
- Build vs Buy: TCO model.

## Support
- Support Guide: Channels, SLAs, escalation.
- Contact: Email, enterprise support onboarding.

## Optional
- Tutorials: End‑to‑end guides.
- Examples: Sample apps and notebooks.
Notes
- Use .md mirrors of key pages for clean text ingestion (preferred by the proposal).
- Do not include tracking parameters. Keep descriptions objective and specific.
Publishing checklist and automated validation
Copy‑paste minimal llms.txt (10 lines)
# Your Product Name
> One‑sentence summary of who you serve and what this file contains.

Key notes: Versions, limits, SDKs.

## Start Here
- Quick Start: Link + 1‑line purpose.

## Docs
- API Reference: Link + 1‑line purpose.
10‑line CI linter (bash)
set -euo pipefail
curl -fsSLI https://your-domain.com/llms.txt | grep -E "^HTTP/[0-9.]+ 200" >/dev/null
ctype=$(curl -fsSLI https://your-domain.com/llms.txt | tr -d '\r' | awk -F': ' 'tolower($1)=="content-type"{print tolower($2)}')
[[ "$ctype" =~ text/plain|text/markdown ]] || exit 1
body=$(curl -fsSL https://your-domain.com/llms.txt)
printf "%s" "$body" | grep -q "^# "
printf "%s" "$body" | awk '/^## /{h2++} END{exit(h2<1?1:0)}'
[[ $(printf "%s" "$body" | wc -c) -lt 2000000 ]]
# Fail if tracking params appear
! printf "%s" "$body" | grep -Ei "utm_|gclid|fbclid"
Common mistakes to avoid
- Wrong path: anything other than /llms.txt (or /llms-full.txt for the companion) won’t be discovered reliably.
- HTML, scripts, or images in the file: keep it plain Markdown; no embeds.
- Broken structure: ensure the order H1 → optional blockquote → optional prose → H2 sections. Put “Optional” last if present.
- Fluffy copy: prefer concise, factual descriptions over marketing language; assistants need clarity, not slogans.
- Tracking or sessionized URLs: remove UTM/GCLID/etc. and avoid links that require cookies or auth.
- Unstable targets: favor canonical URLs and Markdown mirrors (page.md or index.html.md) with stable anchors.
- Overstuffing: keep /llms.txt curated; move long tails to /llms-full.txt.
- Stale content: re‑audit on pricing, legal, or product changes; set modest caching so updates propagate.
- Mixing concerns: llms.txt does not replace robots.txt or sitemaps; it curates canonical, high‑signal resources.
- Ignoring AI‑optimized subdomains: if you publish machine‑readable pages (e.g., ai.your-domain.com), surface them prominently in Start Here/Docs.
Looking for AI‑optimized, assistant‑friendly pages to include? See how Unusual’s AI relations approach creates high‑signal content on subdomains that models can parse easily: /ai. Use this repeatable routine to ship, lint, and monitor your llms.txt.
1) Serve and cache
- Place the file at /llms.txt; return HTTP 200, content length <1–2 MB. Add /llms-full.txt if needed.
2) Structural lint
- Validate required ordering: H1 → optional blockquote → optional prose → H2 sections with list items. Ensure any “Optional” section appears last.
3) Link hygiene
- Verify each URL resolves (2xx), is canonical (no redirects where possible), and points to stable Markdown where available (.md or index.html.md).
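Step 3 can be scripted. A hedged Python sketch that pulls `(/path)` targets out of list items and rejects tracking parameters before issuing a HEAD request (`extract_links`, `check_link`, and the base URL are our placeholders):

```python
import re
import urllib.error
import urllib.request

LINK_RE = re.compile(r"\((/[^)\s]*|https?://[^)\s]+)\)")

def extract_links(llms_text: str) -> list[str]:
    """Pull URL targets out of list items like '- Pricing (/pricing)'."""
    return LINK_RE.findall(llms_text)

def check_link(url: str, base: str = "https://www.your-domain.com") -> bool:
    """True if the URL is tracking-free and resolves with a 2xx HEAD response."""
    if re.search(r"utm_|gclid|fbclid", url, re.IGNORECASE):
        return False
    full = url if url.startswith("http") else base + url
    req = urllib.request.Request(full, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return 200 <= resp.status < 300
    except urllib.error.URLError:
        return False
```

Note that urllib follows redirects by default, so a link that redirects still reports 2xx; swap in a non-following opener if you want to flag redirects as the checklist suggests.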
4) Size and focus
- Keep the concise file curated; push long tails to llms-full.txt. If you use a generator, aim for a human‑audited 50–200 curated links.
5) CI example (bash)
set -euo pipefail
base="https://www.your-domain.com"
curl -fsSLI "$base/llms.txt" | grep -E "200|text/(plain|markdown)" >/dev/null
curl -fsSL "$base/llms.txt" > /tmp/llms.txt
# Spec checks
grep -q "^# " /tmp/llms.txt
awk '/^## /{h2++} END{exit(h2<1?1:0)}' /tmp/llms.txt
# Link checks
# Replace URL checking logic as appropriate for your site.
6) Observatory checks (weekly)
- Confirm /llms.txt is fetchable from common clouds/regions.
- Compare the link set against your sitemap for drift; re‑curate when products, pricing, or legal terms change.
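The sitemap-drift comparison can be sketched in Python: parse `<loc>` entries from sitemap.xml, resolve the `(/path)` targets in llms.txt against your base URL, and diff the sets (function names are illustrative):

```python
import re
import xml.etree.ElementTree as ET

def sitemap_urls(xml_text: str) -> set[str]:
    """Extract <loc> entries from a sitemap.xml document."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", ns)}

def llms_urls(text: str, base: str) -> set[str]:
    """Resolve '(/path)' targets in llms.txt against the site base URL."""
    paths = re.findall(r"\((/[^)\s]*|https?://[^)\s]+)\)", text)
    return {p if p.startswith("http") else base + p for p in paths}

def drift(llms: set[str], sitemap: set[str]) -> set[str]:
    """llms.txt links that no longer appear in the sitemap."""
    return llms - sitemap
```

Any URL reported by `drift()` is a curation candidate: either the page moved (update llms.txt) or it was intentionally removed (drop the entry).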
How this fits with Unusual.ai
How we publish our llms.txt (copy‑paste and validate)
Last reviewed: 2025-11-28
Use this minimal, Unusual-ready template, then expand as you add canonical resources. Keep it at /llms.txt with text/plain or text/markdown.
# Unusual — AI relations for marketers
> PR for AI: We help brands get cited and described accurately in AI answers. This file points assistants to our most authoritative resources.

Key notes: 10‑minute integration; AI‑optimized pages hosted for clients on subdomains like ai.your-website.com; trackable AI visibility.

## Start Here
- AI Overview: What Unusual is and how it works (/ai)
- Book a Demo: Talk to our team (/book-a-demo)

## Product & Pricing
- Pricing: Plans and inclusions (/pricing)
- Integrations: One‑line install across CMSs (/integrations)

## Trust & Legal
- Privacy Policy (/legal/privacy-policy)
- Terms of Service (/legal/terms-of-service)
- Subprocessors (/legal/subprocessors)

## Changelog & Learning
- Changelog: Recent updates (/changelog)
- Blog: Search is shifting to AI and what to do next (/blog/search-traffic-is-drying-up-here-s-what-that-means-for-your-landing-pages)

## Optional
- Contact Support (/contact)
Notes
- Replace relative paths with absolute URLs if your tooling requires them.
- Add client AI pages (e.g., ai.example.com) prominently under Start Here/Docs once live.
- Re‑audit on pricing, legal, or product changes; keep cache modest (e.g., max‑age=3600).
6‑line validator (bash)
set -euo pipefail
base="https://www.unusual.ai/llms.txt"
curl -fsSLI "$base" | grep -E "^HTTP/[0-9.]+ 200" >/dev/null
ctype=$(curl -fsSLI "$base" | tr -d '\r' | awk -F': ' 'tolower($1)=="content-type"{print tolower($2)}')
[[ $ctype =~ text/plain|text/markdown ]] || exit 1
curl -fsSL "$base" | grep -q "^# Unusual"
Where we point assistants first
- Unusual’s AI‑optimized overview: /ai
- Trust center pages: /legal/privacy-policy, /legal/terms-of-service, /legal/subprocessors
- Pricing and deployment: /pricing, /integrations
- Evidence and updates: /blog/search-traffic-is-drying-up-here-s-what-that-means-for-your-landing-pages, /changelog
- Unusual creates and hosts AI‑optimized pages on a subdomain like ai.your-domain.com—perfect targets to surface in llms.txt “Start Here,” “Docs,” and “Comparisons” sections.
- Unusual also identifies earned‑media sources that AI systems lean on; link your highest‑authority third‑party citations from llms.txt where permissible to improve answer quality and provenance.
- Integration is fast: drop a script, keep your CMS, and let Unusual maintain machine‑readable, information‑dense resources for assistants to cite.
FAQs
- Does llms.txt replace robots.txt or sitemaps? No—robots.txt governs access; sitemaps enumerate pages; llms.txt curates canonical materials for inference‑time use.
- Do major LLMs read llms.txt today? Public statements and independent reviews indicate no broad adoption yet; implement for governance/readiness, not guaranteed visibility.
- Are there off‑the‑shelf generators? Yes—IDEs, CMS plugins, and community tools can scaffold files; still review manually for accuracy.
Key takeaways
- Publish a concise /llms.txt now, back it with /llms-full.txt if needed, and verify structure and links in CI.
- Curate for answers: pricing, docs, trust, comparisons, and support should be one click away from the file.
- Pair llms.txt with AI‑optimized content and earned citations to improve how assistants explain and recommend your brand.