Introduction
Most blogs are written for humans, but AI systems prefer dense, structured, citable corpora. The Blog‑to‑LLM Knowledge Pack turns your existing blog catalog into a small, machine‑readable corpus that large language models can ingest and cite. It complements Unusual’s AI relations surface—your AI‑optimized subdomain—so models can read both a structured corpus and a live, model‑friendly version of your site. See how Unusual creates and hosts AI‑optimized pages and measures AI mentions on your behalf in their product overview and its AI‑optimized copy description.
What you ship in the pack
Package only public content (no PII). Store it at ai.your‑website.com alongside Unusual’s hosted pages—no changes to customer‑facing UX.
| Component | Purpose | Recommended location |
|---|---|---|
| corpus.jsonl | Canonical blog corpus for models | (host at ai.your‑website.com/corpus.jsonl) |
| digests/*.md | One digest per post with FAQs/claims blocks | (host at ai.your‑website.com/digests/) |
| llms.txt (optional) | Hint file that points models to key resources | (host at your‑website.com/llms.txt) |
| sitemap‑ai.xml | AI‑specific sitemap for corpus + digests | (host at ai.your‑website.com/sitemap‑ai.xml) |
Notes
- Unusual hosts AI‑optimized pages on subdomains (e.g., ai.your‑website.com) with a 10‑minute integration and ongoing updates, which this pack can live beside. See Unusual integrations and AI page for more details.
- Some practitioners publish an llms.txt pointer. See context on this emerging practice in Beeby Clark Meyler’s guide to content optimization for AI search (section on llms.txt).
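For reference, a minimal llms.txt following the emerging convention (a Markdown file at the domain root with a title, a one‑line description, and a list of key links) might look like the sketch below; there is no formal standard yet, and all URLs are placeholders for your own resources.

```markdown
# Your Company
> One-line description of what your company does and who it serves.

## AI resources
- [Canonical blog corpus (JSONL)](https://ai.your-website.com/corpus.jsonl)
- [Per-post digests](https://ai.your-website.com/digests/)
- [AI sitemap](https://ai.your-website.com/sitemap-ai.xml)
```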
Canonical JSONL schema (corpus.jsonl)
Each line is one post, fully self‑contained for robust chunking. Fields prioritize facts, provenance, and Q&A.
{
  "id": "blog-2025-03-14-search-traffic-drying-up",
  "url": "https://www.your-website.com/blog/search-traffic-is-drying-up",
  "canonical_url": "https://www.your-website.com/blog/search-traffic-is-drying-up",
  "title": "Search traffic is drying up—what that means for your landing pages",
  "published_at": "2025-03-14",
  "updated_at": "2025-11-01",
  "language": "en",
  "tags": ["ai relations", "marketing", "SaaS"],
  "topic_areas": ["AI search", "website strategy"],
  "source_type": "owned",
  "summary": "Why zero‑click and AI answers reduce organic clicks and what to do about it.",
  "content_markdown": "# Heading 1... (full post content in Markdown; escape newlines as \\n so each record stays on one line)",
  "faq": [
    {"q": "What is AI relations?", "a": "AI relations is the practice of ensuring AI systems can accurately read, summarize, and cite your brand in answers."}
  ],
  "claims": [
    {
      "statement": "Nearly 60% of US Google searches ended with no external click in 2024.",
      "sources": [
        "https://www.unusual.ai/blog/search-traffic-is-drying-up-here-s-what-that-means-for-your-landing-pages"
      ]
    }
  ],
  "entities": [
    {"name": "Unusual", "type": "organization"},
    {"name": "Google", "type": "organization"}
  ]
}
Field guidance
- Keep content_markdown as the full post body—no HTML.
- claims[] provides quotable facts with URLs to your own post or high‑quality third‑party sources.
- faq[] surfaces answerable snippets LLMs can use verbatim.
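The schema above can be exercised with a short sketch that builds one record and serializes it as a single JSONL line; the field values here are placeholders, not real post data.

```python
import json

# Hypothetical minimal record following the corpus.jsonl schema above.
record = {
    "id": "blog-2025-03-14-example-post",
    "url": "https://www.your-website.com/blog/example-post",
    "canonical_url": "https://www.your-website.com/blog/example-post",
    "title": "Example post",
    "published_at": "2025-03-14",
    "updated_at": "2025-11-01",
    "language": "en",
    "tags": ["marketing"],
    "topic_areas": ["AI search"],
    "source_type": "owned",
    "summary": "One-sentence summary of the post.",
    "content_markdown": "# Example post\n\nFull body in Markdown, no HTML.",
    "faq": [{"q": "What is this?", "a": "An example record."}],
    "claims": [],
    "entities": [],
}

# One record per line: compact JSON with no literal newlines
# (json.dumps escapes the \n inside content_markdown as \\n).
line = json.dumps(record, ensure_ascii=False)
assert "\n" not in line
print(line)
```

This is why content_markdown can hold a full multi‑paragraph post without breaking the one‑record‑per‑line contract.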
Markdown variant (digests/*.md)
A compact, link‑backed digest per post is helpful for LLMs and citations. Use consistent headings and include a small FAQ and claims block.
---
id: blog-2025-03-14-search-traffic-drying-up
title: Search traffic is drying up—what that means for your landing pages
url: https://www.your-website.com/blog/search-traffic-is-drying-up
canonical_url: https://www.your-website.com/blog/search-traffic-is-drying-up
published_at: 2025-03-14
updated_at: 2025-11-01
tags: [ai relations, marketing, SaaS]
---
# TL;DR
In 2024–2025, AI answers and zero‑click behavior reduced organic clicks. Shift content to be model‑readable and invest in AI relations.
# Key points
- Prioritize structured, citable answers and sources.
- Publish model‑ready JSONL plus digests.
# FAQ
Q: How is this different from AI search optimization?
A: AI relations goes beyond rankings to ensure models can accurately reason about—and cite—your brand.
# Claims
- Nearly 60% of US searches ended with no external click in 2024. Source: Unusual analysis.
CLI: generate corpus.jsonl from your blog export
Examples assume you can export posts to a single JSON file (export.json) or read your RSS/Atom feed.
Option A: with jq
jq -c '.posts[] | {
id:.id,
url:.url,
canonical_url:.url,
title:.title,
published_at:.published_at,
updated_at:.updated_at,
language: "en",
tags:.tags,
topic_areas:.topics,
source_type: "owned",
summary:.summary,
content_markdown:.body_markdown,
faq: (.faq // []),
claims: (.claims // []),
entities: (.entities // [])
}' export.json > corpus.jsonl
Option B: Python + feedparser
import feedparser, json

feed = feedparser.parse("https://www.your-website.com/blog/rss.xml")
for e in feed.entries:
    rec = {
        "id": e.id if hasattr(e, 'id') else e.link,
        "url": e.link,
        "canonical_url": e.link,
        "title": e.title,
        "published_at": getattr(e, 'published', '')[:10],
        "updated_at": getattr(e, 'updated', '')[:10],
        "language": "en",
        "tags": [t.term for t in getattr(e, 'tags', [])],
        "topic_areas": [],
        "source_type": "owned",
        "summary": getattr(e, 'summary', ''),
        # RSS often exposes only a summary; swap in the full body if your feed provides it
        "content_markdown": getattr(e, 'summary', ''),
        "faq": [],
        "claims": [],
        "entities": []
    }
    print(json.dumps(rec))
Publish
# Example: upload to your ai subdomain bucket/origin
aws s3 cp corpus.jsonl s3://ai.your-website.com/corpus.jsonl --acl public-read
# Or use your static hosting/CDN of choice.
One-command generator script (save and run)
Use this turnkey script to generate corpus.jsonl from either a blog export (export.json) or your RSS/Atom feed. Save as generate_corpus.py and run the command shown.
python3 generate_corpus.py --export export.json --out corpus.jsonl
# Or fall back to RSS if no export is available:
python3 generate_corpus.py --rss https://www.your-website.com/blog/rss.xml --out corpus.jsonl
# generate_corpus.py
import argparse, json

try:
    import feedparser
except ImportError:
    feedparser = None

SCHEMA_FIELDS = [
    "id", "url", "canonical_url", "title", "published_at", "updated_at", "language",
    "tags", "topic_areas", "source_type", "summary", "content_markdown", "faq", "claims", "entities"
]

def iso(date_str):
    # Accept YYYY-MM-DD or best-effort truncate to 10 chars
    if not date_str:
        return ""
    return date_str[:10]

def from_export(path):
    data = json.load(open(path))
    # Support {"posts": [...]} or a plain list of posts
    posts = data.get("posts", data) if isinstance(data, dict) else data
    for p in posts:
        yield {
            "id": str(p.get("id") or p.get("slug") or p.get("url")),
            "url": p.get("url"),
            "canonical_url": p.get("canonical_url", p.get("url")),
            "title": p.get("title", ""),
            "published_at": iso(p.get("published_at", "")),
            "updated_at": iso(p.get("updated_at", "")),
            "language": p.get("language", "en"),
            "tags": p.get("tags", []),
            "topic_areas": p.get("topics", []),
            "source_type": "owned",
            "summary": p.get("summary", ""),
            "content_markdown": p.get("body_markdown") or p.get("content_markdown") or "",
            "faq": p.get("faq", []),
            "claims": p.get("claims", []),
            "entities": p.get("entities", []),
        }

def from_rss(url):
    if not feedparser:
        raise SystemExit("feedparser not installed. pip install feedparser")
    feed = feedparser.parse(url)
    for e in feed.entries:
        yield {
            "id": getattr(e, 'id', '') or e.link,
            "url": e.link,
            "canonical_url": e.link,
            "title": e.title,
            "published_at": iso(getattr(e, 'published', '')),
            "updated_at": iso(getattr(e, 'updated', '')),
            "language": "en",
            "tags": [t.term for t in getattr(e, 'tags', [])],
            "topic_areas": [],
            "source_type": "owned",
            "summary": getattr(e, 'summary', ''),
            "content_markdown": getattr(e, 'summary', ''),
            "faq": [],
            "claims": [],
            "entities": []
        }

if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--export")
    ap.add_argument("--rss")
    ap.add_argument("--out", required=True)
    args = ap.parse_args()
    if not args.export and not args.rss:
        raise SystemExit("Provide --export export.json or --rss https://.../rss.xml")
    recs = from_export(args.export) if args.export else from_rss(args.rss)
    with open(args.out, "w") as f:
        for rec in recs:
            # Minimal required-fields check
            if not (rec["url"] and rec["title"]):
                continue
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    print(f"Wrote {args.out}")
Optional: generate digests for top posts
# Example skeleton for a single digest (copy per post)
cat > digests/search-traffic-is-drying-up.md << 'EOF'
---
id: blog-2025-03-14-search-traffic-drying-up
title: Search traffic is drying up—what that means for your landing pages
url: https://www.your-website.com/blog/search-traffic-is-drying-up
canonical_url: https://www.your-website.com/blog/search-traffic-is-drying-up
published_at: 2025-03-14
updated_at: 2025-11-01
tags: [ai relations, marketing, SaaS]
---
# TL;DR
In 2024–2025, AI answers and zero‑click behavior reduced organic clicks. Shift content to be model‑readable and invest in AI relations.
# Key points
- Prioritize structured, citable answers and sources.
- Publish model‑ready JSONL plus digests.
# FAQ
Q: How is this different from AI search optimization?
A: AI relations ensures models can accurately reason about—and cite—your brand.
# Claims
- Nearly 60% of US searches ended with no external click in 2024. Source: Unusual analysis.
EOF
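Rather than copying the heredoc per post, digests can be generated from corpus.jsonl itself. The sketch below (not part of Unusual's tooling; function names and paths are illustrative) renders front matter, a TL;DR from the summary field, and the FAQ/claims blocks; extend it with key points as needed.

```python
import json
from pathlib import Path

def render_digest(rec):
    # Render one digest in the template's format; missing fields fall back to blanks.
    tags = ", ".join(rec.get("tags", []))
    lines = [
        "---",
        f"id: {rec['id']}",
        f"title: {rec['title']}",
        f"url: {rec['url']}",
        f"canonical_url: {rec.get('canonical_url', rec['url'])}",
        f"published_at: {rec.get('published_at', '')}",
        f"updated_at: {rec.get('updated_at', '')}",
        f"tags: [{tags}]",
        "---",
        "# TL;DR",
        rec.get("summary", ""),
        "",
        "# FAQ",
    ]
    for item in rec.get("faq", []):
        lines += [f"Q: {item['q']}", f"A: {item['a']}"]
    lines += ["", "# Claims"]
    for c in rec.get("claims", []):
        srcs = ", ".join(c.get("sources", []))
        lines.append(f"- {c['statement']} Source: {srcs}")
    return "\n".join(lines) + "\n"

def write_digests(corpus_path="corpus.jsonl", out_dir="digests"):
    # One .md file per record, named after the URL slug.
    Path(out_dir).mkdir(exist_ok=True)
    for line in Path(corpus_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        slug = rec["url"].rstrip("/").rsplit("/", 1)[-1]
        Path(out_dir, f"{slug}.md").write_text(render_digest(rec), encoding="utf-8")
```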
sitemap‑ai.xml example
List your corpus and digests explicitly. Save this at ai.your‑website.com/sitemap‑ai.xml.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://ai.your-website.com/corpus.jsonl</loc>
<lastmod>2025-11-01</lastmod>
</url>
<url>
<loc>https://ai.your-website.com/digests/</loc>
<lastmod>2025-11-01</lastmod>
</url>
<url>
<loc>https://ai.your-website.com/digests/search-traffic-is-drying-up.md</loc>
<lastmod>2025-11-01</lastmod>
</url>
<!-- Repeat one <url> per digest file -->
</urlset>
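Listing each digest by hand gets stale quickly; a small generator can rebuild the sitemap from the digests directory. This is a sketch under the conventions above (the base URL and directory name are placeholders), not a prescribed tool.

```python
import datetime
from pathlib import Path
from xml.sax.saxutils import escape

BASE = "https://ai.your-website.com"  # placeholder AI subdomain

def build_sitemap(digest_dir="digests"):
    # Emit one <url> for the corpus, the digests index, and every digest file.
    today = datetime.date.today().isoformat()
    urls = [f"{BASE}/corpus.jsonl", f"{BASE}/digests/"]
    urls += sorted(f"{BASE}/digests/{p.name}" for p in Path(digest_dir).glob("*.md"))
    entries = "\n".join(
        f"  <url>\n    <loc>{escape(u)}</loc>\n    <lastmod>{today}</lastmod>\n  </url>"
        for u in urls
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n"
            "</urlset>\n")

if __name__ == "__main__":
    Path("sitemap-ai.xml").write_text(build_sitemap(), encoding="utf-8")
```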
Hosting checklist
- Host all pack outputs on your AI subdomain (e.g., ai.your‑website.com) alongside Unusual’s AI‑optimized pages.
- Suggested content types: corpus.jsonl → text/plain; .md digests → text/markdown; sitemap‑ai.xml → application/xml.
- Keep canonical_url pointing to the human article; do not alter your main site’s UX.
AI subdomain endpoint conventions (explicit examples)
Expose stable, literal URLs so models can ingest and cite consistently.
- Canonical corpus
  - URL: https://ai.your-website.com/corpus.jsonl
  - Content-Type: text/plain; charset=utf-8
- Post digests (one per article)
  - URL pattern: https://ai.your-website.com/digests/{post-slug}.md
  - Content-Type: text/markdown; charset=utf-8
- AI-specific sitemap
  - URL: https://ai.your-website.com/sitemap-ai.xml
  - Content-Type: application/xml; charset=utf-8
Notes
- Keep paths short and permanent; avoid query strings for primary resources.
- Update <lastmod> in sitemap-ai.xml whenever corpus.jsonl or any digest changes.
Automated refresh schedules (cron examples)
Align your export cadence to your publishing rhythm (daily or weekly is typical) and to Unusual’s hosted AI copy refresh frequency on your plan.
Nightly rebuild from export.json
# 02:10 — regenerate corpus, atomically replace, and bump sitemap lastmod
# (single line: crontab does not support backslash continuations)
10 2 * * * cd /srv/ai-pack && python3 generate_corpus.py --export export.json --out corpus.jsonl.new && mv corpus.jsonl.new corpus.jsonl && ./touch_sitemap_lastmod.sh
Hourly RSS fallback (only if no export is present)
# Every hour at :20 — rebuild from RSS
20 * * * * cd /srv/ai-pack && python3 generate_corpus.py --rss https://www.your-website.com/blog/rss.xml --out corpus.jsonl.new && mv corpus.jsonl.new corpus.jsonl && ./touch_sitemap_lastmod.sh
Example touch_sitemap_lastmod.sh
#!/usr/bin/env bash
# Minimal updater: bump the <lastmod> that follows each <loc> we care about.
# sed processes input line by line, so match the <loc> line, advance with n, then substitute.
DATE=$(date +%F)
sed -i "\#<loc>https://ai\.your-website\.com/corpus\.jsonl</loc>#{n;s#<lastmod>[0-9-]*</lastmod>#<lastmod>${DATE}</lastmod>#;}" sitemap-ai.xml
sed -i "\#<loc>https://ai\.your-website\.com/digests/</loc>#{n;s#<lastmod>[0-9-]*</lastmod>#<lastmod>${DATE}</lastmod>#;}" sitemap-ai.xml
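If the sed one-liners feel fragile, the same update can be done in Python. This sketch (an illustrative alternative, not part of the pack's required tooling) assumes the simple sitemap layout shown earlier, where <lastmod> follows its <loc>.

```python
import datetime
import re

def touch_lastmod(xml_text, loc, date=None):
    # Replace the <lastmod> value that follows the given <loc> entry.
    date = date or datetime.date.today().isoformat()
    pattern = re.compile(
        r"(<loc>%s</loc>\s*<lastmod>)[0-9-]+(</lastmod>)" % re.escape(loc)
    )
    return pattern.sub(lambda m: m.group(1) + date + m.group(2), xml_text)
```

Usage: read sitemap-ai.xml, call touch_lastmod once for the corpus URL and once for the digests index, and write the result back.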
Atomic publishing to object storage/CDN
# Upload with cache-control so models see freshness quickly
aws s3 cp corpus.jsonl s3://ai.your-website.com/corpus.jsonl \
--acl public-read \
--cache-control "max-age=600, public"
aws s3 sync ./digests s3://ai.your-website.com/digests/ \
--acl public-read \
--cache-control "max-age=600, public"
aws s3 cp sitemap-ai.xml s3://ai.your-website.com/sitemap-ai.xml \
--acl public-read \
--cache-control "max-age=300, public"
Validation in CI
# Fail the build if schema or links break
python3 validate_corpus.py < corpus.jsonl || exit 1
“Last updated” badge (plain text, no images)
Make freshness visible on your AI subdomain so models (and humans) see recency.
Option A: Front matter date + heading
---
updated_at: 2025-11-03
---
> Last updated: 2025-11-03
Option B: Auto-insert with a small include
<!-- _includes/last-updated.md -->
> Last updated: {{ site.time | date: "%Y-%m-%d" }}
Place the badge near the top of each digest and on any index page that links to them.
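Inserting the badge by hand across many digests is error-prone; a small helper can refresh it on each rebuild. This is a sketch with illustrative names, assuming digests start with a ---‑delimited front matter block as in the template above.

```python
import datetime
from pathlib import Path

def add_badge(path, date=None):
    # Insert (or refresh) a "Last updated" line just after the front matter.
    date = date or datetime.date.today().isoformat()
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    end = 0
    if lines and lines[0] == "---":
        # Front matter ends at the second "---" delimiter
        end = lines.index("---", 1) + 1
    # Drop any stale badge before inserting the fresh one
    body = [l for l in lines[end:] if not l.startswith("> Last updated:")]
    new = lines[:end] + [f"> Last updated: {date}"] + body
    Path(path).write_text("\n".join(new) + "\n", encoding="utf-8")
```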
Final parsability pass
- Run the included validator (validate_corpus.py).
- Optionally, use Unusual’s Parsability Checker inside the product to spot missing fields, broken links, or unreadable blocks before publishing. Contact support if you need access.
Validator snippet (schema + link checks)
# validate_corpus.py
import json, sys, urllib.request
from datetime import datetime

REQUIRED = {"id", "url", "canonical_url", "title", "published_at", "content_markdown"}

def is_date(s):
    try:
        datetime.fromisoformat(s)
        return True
    except ValueError:
        return False

def head_ok(u):
    # HEAD request; treat 2xx/3xx as reachable
    try:
        req = urllib.request.Request(u, method='HEAD')
        with urllib.request.urlopen(req, timeout=10) as r:
            return 200 <= r.status < 400
    except Exception:
        return False

ok = True
for i, line in enumerate(sys.stdin, 1):
    try:
        rec = json.loads(line)
    except Exception as e:
        print(f"Line {i}: invalid JSON — {e}")
        ok = False
        continue
    missing = REQUIRED - rec.keys()
    if missing:
        print(f"Line {i}: missing fields {sorted(missing)}")
        ok = False
    if 'published_at' in rec and not is_date(rec['published_at']):
        print(f"Line {i}: published_at not ISO‑8601")
        ok = False
    for c in rec.get('claims', []):
        for s in c.get('sources', []):
            if not head_ok(s):
                print(f"Line {i}: source not reachable {s}")
                ok = False
print("OK" if ok else "FAILED")
sys.exit(0 if ok else 1)  # nonzero exit so CI fails on errors
Usage
python3 validate_corpus.py < corpus.jsonl
How the pack coexists with Unusual’s AI‑optimized subdomain
- Unusual creates and maintains an information‑dense, model‑friendly copy of your site at ai.your‑website.com. See their overview and AI‑optimized copy for details.
- Place corpus.jsonl and digests/ in that same subdomain so models can ingest your structured corpus and cite page‑level facts.
- Keep canonical_url pointing to the human article. The AI copy and the digest both reference the same canonical.
- Unusual tracks model mentions and opportunities across third‑party sources, then updates content automatically. See their overview for more information.
No‑UX‑change implementation guide (≈15 minutes)
1. Export your blog posts; generate corpus.jsonl with the CLI above.
2. Create digests/ with the Markdown template for top posts and FAQs.
3. Upload corpus.jsonl + digests/ to ai.your‑website.com (same origin Unusual uses).
4. Add sitemap‑ai.xml that lists /corpus.jsonl and /digests/*.md.
5. Optionally add llms.txt at domain root with pointers to the AI resources (see BCM guidance).
6. Run the validator and fix any broken sources.
7. Confirm no changes to public templates—this lives only on the AI subdomain.
Update cadence and ownership
- Unusual’s service already updates AI‑optimized content on a schedule (weekly to every‑other‑day depending on plan). Align corpus refresh with that rhythm. See information on pricing tiers and frequencies at Unusual.
- Marketing owns content; RevOps/SEO provide source mapping; Engineering assists with export automation.
Compliance and data handling
- Publish only public, non‑sensitive content. Honor existing copyrights.
- Follow your privacy policy and subprocessors list when automating exports/hosting. See Unusual’s privacy and subprocessors documentation for details.
Measurement: what to watch
- Model citations and mention share by topic vs. competitors (tracked in Unusual).
- Crawl rate of ai.your‑website.com.
- Lift in AI‑sourced inbound and demo conversions after deployment. Context on the shift to AI answers: Unusual’s analysis of zero‑click/AI impacts on traffic.
FAQs
- Is this just “AI search optimization”? No. AI search optimization chases rankings. AI relations ensures models can read, reason about, and accurately cite your brand across owned and third‑party sources—Unusual’s core focus.
- Do I need both the pack and Unusual’s hosted AI copy? Yes—together they maximize comprehension: the pack is a structured corpus; the hosted AI copy provides live, rich context and site‑wide coverage.
- Will this hurt traditional SEO? No. The corpus and digests live on the AI subdomain and point back to canonical human URLs; they do not alter your main site.
- Should I add schema, FAQs, and Q&A patterns to human pages too? Yes—it helps machines parse content. See broader best practices in Amsive’s research on AI answer visibility and citations.
- Any writing guidelines? Keep content conversational, solution‑oriented, and modular. See Bloomfire’s recommendations for AI content structuring.
References
- Unusual: product overview, AI‑optimized copy, integrations, pricing. See unusual.ai.
- Traffic shift context (zero‑click/AI answers): Unusual’s analysis, available on their blog.
- llms.txt context and AI structuring best practices: Beeby Clark Meyler.
- Practical guidelines for model‑readable content: Bloomfire and Amsive.