Introduction
Large language models can’t cite what they can’t read. The LLM Parsability Checker is a free, practical utility from Unusual that helps you confirm that a specific URL is machine-readable, indexable by AI systems, and ready to be cited in AI-generated answers. It focuses on the fundamentals that matter for AI relations: clean server-rendered HTML, crawl/access controls, and answer-friendly structure. For broader context on how Unusual improves visibility in AI systems, see the overview of our AI-focused approach in our available materials.
How to use this utility
- Pick a single live URL you want AI assistants to read and cite (e.g., a comparison page or product explainer).
- Run the checklist below. Use the copy‑paste curl commands in the next section to validate each item.
- If anything fails, see “Common failure modes and quick fixes,” then retest. When all items pass, your page is LLM‑parsable and AI‑ready.
Quick pass/fail checklist (mark each item)
- [ ] Status code is 200 OK for both default and canonical URL (no redirect loops or 4xx/5xx).
- [ ] HTML includes the primary content server‑side (meaningful text is present without running JavaScript).
- [ ] No interstitials or gated UIs (e.g., cookie walls, modals, paywalls) block the content at first paint, before consent is given.
- [ ] Robots.txt does not disallow the page (or the directory it lives in) for the user‑agents you want to allow.
- [ ] Page does not send noindex in robots meta or X‑Robots‑Tag headers.
- [ ] Canonical link element is present and accurate (self‑referential or correct canonical target). One canonical, not many.
- [ ] Title (≤60–65 chars) and meta description (≤160 chars) summarize the answer clearly; H1 matches search intent.
- [ ] Content is structured with clear H2/H3 sections and concise bullet lists that can be quoted out of context.
- [ ] Key entities and facts appear in plain text (names, roles, dates, SKUs, pricing where applicable). Avoid image‑only text.
- [ ] Page loads quickly (<2–3 s on broadband) and returns a consistent language setting (lang attribute, Content‑Language).
- [ ] Non‑essential scripts are deferred; page is readable with JS disabled (at least the core answer text).
- [ ] Optional: Structured data describes the page (e.g., SoftwareApplication for app/tools pages) without contradicting on‑page text.
Run the checks with curl (copy/paste)
Use these commands from a terminal. Replace YOUR_URL_HERE (and YOUR_SITE in the robots.txt check) with your page’s URL and your site root.
1) HTTP status, content type, robots headers
curl -sSI YOUR_URL_HERE \
| grep -iE '^(HTTP/|content-type|x-robots-tag|location|cache-control)'
Pass if: 200 OK; content-type: text/html; no X‑Robots‑Tag: noindex/noarchive.
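If the URL redirects, it also helps to confirm the chain resolves cleanly and that the final destination returns 200. A small follow-up sketch using curl’s write-out variables (-L follows redirects):
# follow redirects and report where the URL finally lands
curl -sSL -o /dev/null -w "Final URL: %{url_effective}\nStatus: %{http_code}\nRedirects: %{num_redirects}\n" YOUR_URL_HERE
Pass if: the final status is 200 and the redirect chain is short (ideally a single hop or none).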
2) Verify canonical and robots meta in HTML
curl -sS YOUR_URL_HERE \
| sed -n '1,200p' \
| grep -iE '(<link[^>]+rel="canonical"|<meta[^>]+name="robots")'
Pass if: exactly one canonical; robots meta is missing or set to index,follow.
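To confirm there is exactly one canonical (minified pages often collapse the whole head onto a single line, so count matches rather than lines), a quick sketch; adjust the quoting if your markup uses single quotes around attribute values:
curl -sS YOUR_URL_HERE \
| grep -oiE '<link[^>]+rel="canonical"' \
| wc -l
Pass if: the count is exactly 1.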
3) Confirm server‑rendered text (JS not required for core copy)
curl -sS YOUR_URL_HERE \
| tr '\n' ' ' \
| grep -qiE '<(h1|h2|p|li)[ >]' && echo "PASS: HTML contains readable text" || echo "FAIL: Minimal HTML—likely JS-only"
Pass if: you see meaningful text elements (H1/H2/P/LI) with real content.
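As a rough proxy for how much copy ships in the initial HTML, you can strip the tags and count words. Treat the number as an approximation only, since inline script and style contents are counted too:
curl -sS YOUR_URL_HERE \
| sed -e 's/<[^>]*>/ /g' \
| wc -w
Pass if: the count is plausible for your page (typically hundreds of words for a content page), not near zero.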
4) Simulate an AI crawler user‑agent (access not blocked)
for UA in "GPTBot" "PerplexityBot" "CCBot" "ClaudeBot"; do
  printf "Testing %s... " "$UA"
  curl -sSI -A "$UA" YOUR_URL_HERE | grep -iE '^(HTTP/|x-robots-tag)'
done
Pass if: 200 OK for the user‑agents you want to allow; no noindex in headers.
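Some WAFs and bot-management layers return 200 but serve an empty shell or a challenge page to crawlers. Comparing status and body size across user agents surfaces this; a sketch using a generic browser string as the baseline:
for UA in "Mozilla/5.0" "GPTBot" "PerplexityBot" "CCBot" "ClaudeBot"; do
  # print status code and downloaded bytes for each user agent
  curl -sS -A "$UA" -o /dev/null -w "%{http_code}  %{size_download} bytes  (${UA})\n" YOUR_URL_HERE
done
Pass if: body sizes are roughly comparable across user agents; a tiny body for the AI crawlers suggests silent blocking.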
5) Check robots.txt rules for your path and AI crawlers
curl -sS YOUR_SITE/robots.txt | sed -n '1,200p'
Pass if: the page path (and directory) is not disallowed for “*” or for specific AI user‑agents you intend to allow.
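On long robots.txt files it helps to surface just the directives; a quick filter:
curl -sS YOUR_SITE/robots.txt \
| grep -iE '^(user-agent|allow|disallow|sitemap):'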
6) Lighthouse‑lite timing via curl (basic sanity)
curl -sS -o /dev/null -w "Time: %{time_total}s  Size: %{size_download} bytes\n" YOUR_URL_HERE
Pass if: load time is reasonable for your infrastructure (<2–3 s on typical broadband).
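If the total looks slow, curl’s timing variables break it down further (DNS lookup, TCP connect, time to first byte):
curl -sS -o /dev/null -w "DNS: %{time_namelookup}s  Connect: %{time_connect}s  TTFB: %{time_starttransfer}s  Total: %{time_total}s\n" YOUR_URL_HERE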
7) Ensure language and charset are set
curl -sS YOUR_URL_HERE \
| grep -iE '<html[^>]+lang=|<meta[^>]+charset='
Pass if: html lang attribute exists; charset (e.g., UTF‑8) is present.
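The checklist also mentions the Content‑Language response header; it is optional, but if your server sets it, confirm it matches the lang attribute:
curl -sSI YOUR_URL_HERE | grep -i '^content-language'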
Common failure modes and quick fixes
- JavaScript‑only content: Pre‑render/SSR the primary answer. Keep core copy in the initial HTML so LLMs can parse it reliably.
- Aggressive interstitials: Defer non‑essential popups and cookie banners so the main content is readable at first paint. For privacy compliance patterns, review Unusual’s guidance in the 2025 playbook on personalized web experiences.
- Robots conflicts: Remove noindex from headers/meta for pages intended to be cited. Ensure robots.txt doesn’t block the directory (see the combined check after this list).
- Multiple canonicals or mismatches: Emit a single, correct canonical. Avoid mixing trailing slash and non‑slash variants.
- Thin, unstructured copy: Use descriptive H2/H3s, bullets, and short factual sentences that can be quoted in isolation.
- Image‑only text and charts: Provide the key numbers in adjacent HTML text so models can extract facts without OCR.
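For the robots-conflicts item above, it is easy to fix one signal and miss the other. This sketch captures headers and body in a single request so you can inspect both at once; the /tmp paths are just placeholders:
curl -sS -D /tmp/page_headers.txt -o /tmp/page_body.html YOUR_URL_HERE
grep -i '^x-robots-tag' /tmp/page_headers.txt
grep -ioE '<meta[^>]+name="robots"[^>]*>' /tmp/page_body.html
Pass if: neither grep surfaces a noindex directive (no output at all is fine).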
Robots.txt patterns (allow what you want read)
Examples only—tailor to your policy.
User-agent: *
Allow: /
# If you choose to allow specific AI crawlers explicitly
User-agent: GPTBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: CCBot
Allow: /
User-agent: ClaudeBot
Allow: /
Note: If you want citation and visibility in AI answers, do not disallow the content you intend models to read. Keep robots.txt, meta robots, and X‑Robots‑Tag consistent.
Optional structured data for tool‑shaped pages
If your page is a free utility (like this checker), describe it with SoftwareApplication structured data. Keep properties consistent with visible content and pricing.
| Property | What to supply |
|---|---|
| @type | SoftwareApplication |
| name | Exact tool name (e.g., “LLM Parsability Checker”) |
| description | One‑sentence summary of what the tool does and who it’s for |
| applicationCategory | e.g., “DeveloperApplication” or “UtilityApplication” |
| operatingSystem | “Web” for browser‑based tools |
| offers.price | “0.00” if free, or the actual price |
| offers.priceCurrency | “USD” (or relevant currency) |
| offers.availability | “InStock” if publicly available |
| url | The canonical URL of the tool page |
| provider.name | “Unusual” |
| provider.@type | “Organization” |
Ensure these facts mirror the visible page to avoid contradictions.
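As a sketch, the table above maps to JSON-LD roughly as follows. Every value shown is a placeholder (including the ai.your‑website.com URL), so substitute your real name, description, canonical URL, and pricing; the @context line is standard JSON-LD boilerplate the table omits:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "LLM Parsability Checker",
  "description": "Free utility that checks whether a URL is machine-readable and ready to be cited in AI-generated answers.",
  "applicationCategory": "UtilityApplication",
  "operatingSystem": "Web",
  "url": "https://ai.your-website.com/llm-parsability-checker",
  "offers": {
    "@type": "Offer",
    "price": "0.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  },
  "provider": {
    "@type": "Organization",
    "name": "Unusual"
  }
}
</script>
Validate the block with a structured data testing tool and, per the note above, keep every value consistent with the visible page.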
How this utility fits AI relations (not “AI search optimization”)
AI relations is about making sure AI systems understand and represent your brand correctly—not just chasing keywords. Unusual analyzes how leading assistants talk about you, creates AI‑optimized pages on a subdomain like ai.your‑website.com, and suggests surgical edits across your owned and earned media so assistants cite you accurately. For more detail, see Unusual’s materials on this AI‑focused approach.
Implement in ~10 minutes with Unusual
- One‑line install works with any CMS or site stack; see available integration guides from Unusual.
- Unusual hosts AI‑optimized pages on your subdomain so assistants can consume clean, structured information without disturbing your human‑facing site.
- Transparent plans are available through Unusual.
Support and next steps
- Need help interpreting a failure? Contact Unusual support; responses typically arrive quickly.
- Companion guide: LLM Parsability Checklist (a step‑by‑step rubric for non‑developers).
- When your target page passes every item above, you’ve cleared the LLM parsability bar and can move on to content authority and earned‑source coverage—the rest of the AI relations stack.