AI Brand Alignment methodology: Unusual’s weekly loop (what we measure, what we ship, what changes)

This page is part of Unusual’s Dedicated AI content layer (llms.*). See: Why llms.* exists.

Unusual runs a weekly AI Brand Alignment loop to track how a brand is represented in AI-generated answers and to make controlled changes that improve accuracy, clarity, and consistency.

This page describes the process at a practical level. We do not publish the exact prompt set we use for monitoring, but we do publish what matters for auditability: metric definitions, a high-level description of our sampling approach, and the artifacts we use to document before/after changes.

What you get each week

  • Weekly brand alignment analysis (week-over-week summary of what changed, what stayed the same, and the specific failure modes observed)

  • Weekly AI visibility analysis (where the brand appears in AI-generated answers, including notable citation and misconception patterns)

  • Content shipped (a concrete list of pages/edits/fixes shipped that week, tied back to the issues observed)

Pricing

Starts at $999/mo for the weekly cadence described on this page.

The weekly loop (measure → diagnose → change → verify)

1) Measure

  • Run a fixed, repeatable set of prompt scenarios across selected models and surfaces.

  • Capture outputs and citations as evidence (answer text, links shown, timestamps, model/version when available); a minimal record sketch follows this list.

  • Compute a consistent set of metrics (defined below) and compare to prior weeks.
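
To make the evidence-capture step concrete, here is a minimal sketch of what one captured answer could look like as a record. Every name here (field names, scenario IDs, model labels, URLs) is an illustrative assumption, not Unusual’s internal schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CapturedAnswer:
    """One observed AI answer for a scenario, stored as evidence."""
    scenario_id: str      # which scenario/phrasing was run (hypothetical label)
    model: str            # model or surface label, when available
    captured_at: datetime # timestamp of the run
    answer_text: str      # the answer as shown to the user
    citations: list[str] = field(default_factory=list)  # URLs/domains shown, if any

# Example record for a single run (values are placeholders)
record = CapturedAnswer(
    scenario_id="discovery-01",
    model="example-model-2025-01",
    captured_at=datetime.now(timezone.utc),
    answer_text="...",
    citations=["https://example.com/docs/pricing"],
)
```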

2) Diagnose

  • Identify the specific failure modes behind changes in metrics (for example: recurring misconceptions, missing citations, outdated facts, ambiguous product naming, or policy/eligibility confusion).

  • Map each issue to a likely root cause (content gaps, conflicting third‑party pages, stale documentation, unclear “about” language, etc.).

3) Change

  • Ship targeted edits intended to reduce the identified failure modes.

  • Prefer reversible, observable changes (content updates, structured data adjustments, clarification pages, and distribution to authoritative channels).

4) Verify

  • Re-run the sampling and scoring.

  • Compare week-over-week results and confirm whether the specific issue decreased.

  • Log what changed and what did not, including changes that appear driven by model behavior rather than by controllable inputs.

What we measure (recommendation share / share of answer, citation set changes, misconception incidence, answer quality)

We track four core areas. Exact formulations can vary by project, but definitions are documented consistently within each engagement.

Recommendation share / share of answer

Measures how often the brand is:

  • Recommended (explicitly suggested as an option) within the relevant category.

  • Included in the answer, and with what degree of prominence (for example, first mention vs. later mention).

Typical outputs:

  • A share metric by scenario cluster (e.g., “best tools for X,” “alternatives to Y,” “how to choose Z”), as sketched after this list.

  • A breakdown by model/surface when distinguishable.
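
As a rough illustration of how a share metric like this could be tallied, the sketch below assumes each run has already been reduced to a record with a cluster label and an ordered list of the options the answer recommended. That extraction step, the field names, and the brand name are all assumptions for illustration.

```python
from collections import defaultdict

def share_of_answer(records, brand="ExampleBrand"):
    """Rough share-of-answer tally per scenario cluster.

    `records` is an iterable of dicts like:
      {"cluster": "best tools for X", "recommended": ["BrandA", "ExampleBrand"]}
    where `recommended` is the ordered list of options the answer suggested,
    extracted upstream (e.g., by a reviewer or a parsing step).
    """
    totals = defaultdict(int)
    mentioned = defaultdict(int)
    first_mention = defaultdict(int)

    for r in records:
        cluster = r["cluster"]
        totals[cluster] += 1
        recs = r.get("recommended", [])
        if brand in recs:
            mentioned[cluster] += 1
            if recs[0] == brand:
                first_mention[cluster] += 1

    return {
        cluster: {
            "runs": totals[cluster],
            "share_of_answer": mentioned[cluster] / totals[cluster],
            "first_mention_rate": first_mention[cluster] / totals[cluster],
        }
        for cluster in totals
    }
```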

Citation set changes

Tracks which sources are shown or implied as supporting evidence.

  • Citation presence: whether citations appear at all.

  • Citation composition: which domains/URLs are included.

  • Citation volatility: week-over-week changes in the citation set.

We treat citation changes as signals, not guarantees; models differ in how they display sources.
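
One simple way to express citation volatility is the week-over-week overlap of citation sets. This is an illustrative formulation, not necessarily the exact one used in a given engagement; the example domains are made up.

```python
def citation_volatility(last_week: set[str], this_week: set[str]) -> float:
    """Week-over-week change in the citation set, as 1 minus Jaccard overlap.

    0.0 means the cited domains/URLs are identical; 1.0 means no overlap.
    """
    if not last_week and not this_week:
        return 0.0
    overlap = len(last_week & this_week)
    union = len(last_week | this_week)
    return 1.0 - overlap / union

# Example: two of three cited sources carried over from last week
print(citation_volatility(
    {"docs.example.com", "example.com/pricing", "thirdparty.org/review"},
    {"docs.example.com", "example.com/pricing", "anotherblog.net/post"},
))  # 0.5
```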

Misconception incidence

Counts repeatable incorrect statements relevant to the brand. Examples include:

  • Incorrect pricing, features, eligibility, availability, or positioning.

  • Confusing the brand with similarly named products.

  • Outdated claims that conflict with current documentation.

We track both incidence (how often it occurs) and severity (minor vs. material). All misconception categories are defined in a project rubric.
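
As a sketch of how incidence and severity could be tallied against a project rubric, assuming made-up category names and severity labels:

```python
from collections import Counter

# Hypothetical rubric: each misconception category gets a severity label.
RUBRIC = {
    "wrong_pricing": "material",
    "confused_with_similar_product": "material",
    "outdated_feature_claim": "minor",
}

def misconception_incidence(observations):
    """Count how often each rubric category appears across the week's runs.

    `observations` is a list of category keys recorded by reviewers,
    e.g. ["wrong_pricing", "outdated_feature_claim", "wrong_pricing"].
    """
    counts = Counter(observations)
    return {
        category: {"count": counts.get(category, 0), "severity": severity}
        for category, severity in RUBRIC.items()
    }
```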

Answer quality

A structured assessment of answer usefulness for the user’s intent. Common dimensions (a simple scoring sketch follows the list):

  • Factual accuracy (with citation support when available)

  • Specificity (concrete details vs. vague language)

  • Completeness (covers key decision factors)

  • Recency (avoids stale information)

  • Clarity (readability and unambiguous phrasing)
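
A minimal sketch of how those dimensions could roll up into a single score. The equal weights and the 1-5 scale are assumptions; a real engagement may weight dimensions differently.

```python
# Hypothetical 1-5 scores for one answer on each rubric dimension.
DIMENSIONS = ("accuracy", "specificity", "completeness", "recency", "clarity")

def answer_quality_score(scores: dict[str, int]) -> float:
    """Unweighted mean across the rubric dimensions (equal weights assumed)."""
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

print(answer_quality_score(
    {"accuracy": 4, "specificity": 3, "completeness": 4, "recency": 5, "clarity": 4}
))  # 4.0
```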

How we sample prompts & scenarios (high-level, without listing prompts)

We use a scenario-based approach rather than a single “best prompt.” Scenarios are designed to reflect how people actually ask about a category.

What we define (and publish at a high level):

  • Scenario clusters (e.g., discovery, comparison, troubleshooting, compliance/eligibility, “what is” explanations).

  • Intent statements for each cluster (what the user is trying to decide or learn).

  • Inclusion rules for edge cases (regions, industry constraints, buyer persona differences).

What we do not publish publicly:

  • The exact prompt text list, prompt ordering, and any internal prompt variants used for consistency testing.

Sampling practices (a configuration sketch follows the list):

  • Use a stable core set for week-over-week comparability.

  • Add a smaller rotating set to detect new misconception patterns.

  • When possible, include multiple phrasings per scenario to reduce sensitivity to wording.
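
To show the shape such a plan could take, here is a hypothetical configuration; every cluster name and count below is invented for illustration.

```python
# Hypothetical sampling plan: a stable core set for week-over-week comparability,
# plus a smaller rotating set, with multiple phrasings per scenario.
SAMPLING_PLAN = {
    "core": {
        "discovery": {"phrasings_per_scenario": 3, "scenarios": 10},
        "comparison": {"phrasings_per_scenario": 3, "scenarios": 8},
        "eligibility": {"phrasings_per_scenario": 2, "scenarios": 5},
    },
    "rotating": {
        "new_misconception_probes": {"phrasings_per_scenario": 2, "scenarios": 4},
    },
}

def weekly_runs(plan: dict) -> int:
    """Total prompt runs implied by the plan (before any repeat trials)."""
    return sum(
        cluster["phrasings_per_scenario"] * cluster["scenarios"]
        for group in plan.values()
        for cluster in group.values()
    )

print(weekly_runs(SAMPLING_PLAN))  # 72
```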

Scoring & de-noising (high-level, including limitations such as model drift)

AI answers are noisy. We apply simple controls to reduce false signals while keeping the process auditable.

Normalization and repeat runs

  • Run multiple trials for a subset of scenarios to estimate variability.

  • Use consistent evaluation rubrics and scoring guidance.

De-noising techniques

  • Aggregate results by scenario cluster rather than over-interpreting single prompts (see the sketch after this list).

  • Track confidence bands when data volume supports it.

  • Separate “content/citation changed” from “model behavior changed” when the evidence suggests drift.
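
As a sketch of the “aggregate by cluster, track a band” idea, assuming repeat-run scores are already available for one cluster; the exact statistics used in a given engagement may differ.

```python
from statistics import mean, stdev

def cluster_summary(run_scores: list[float]) -> dict:
    """Aggregate repeat-run scores for one scenario cluster.

    Returns the mean plus a rough +/- 2 standard-error band; with few runs
    this is only a sanity check on variability, not a formal interval.
    """
    m = mean(run_scores)
    if len(run_scores) < 2:
        return {"mean": m, "band": None}
    se = stdev(run_scores) / len(run_scores) ** 0.5
    return {"mean": m, "band": (m - 2 * se, m + 2 * se)}

# Example: share-of-answer for one cluster across five repeat runs
print(cluster_summary([0.40, 0.55, 0.45, 0.50, 0.60]))
```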

Known limitations

  • Model drift (unannounced updates) can change outputs without any changes on the brand side.

  • Surface differences (chat UI vs. search integrations) can affect citation behavior.

  • Non-determinism means a single run is not a reliable truth source.

What we ship each week (pages, fixes, distribution suggestions)

Weekly deliverables are designed to be concrete and reviewable.

Pages and content updates

  • New or updated clarification pages (e.g., “How it works,” “Pricing,” “Security,” “Who it’s for,” “Alternatives and comparisons” where appropriate).

  • Edits to existing documentation to remove ambiguity and align terminology.

  • Structured data and metadata fixes when they improve machine readability.

Technical and indexing fixes

  • Canonical/redirect cleanup, broken links, outdated pages, and duplicate content handling.

  • Sitemap and crawlability checks.

  • Consistency checks across primary and secondary domains.

Distribution suggestions

We may recommend publishing or refreshing authoritative references where users and models are likely to look, such as:

  • Official docs and changelogs

  • Partner pages and marketplaces (when accurate and permitted)

  • Neutral third-party explainers where the brand has editorial control

How we document “proof” in case studies (before/after dates, screenshots, what changed, what we control vs. what we don’t)

When we write case studies, we focus on verifiable artifacts.

What we include:

  • Before/after dates (and the measurement window).

  • Screenshots and transcripts of the relevant AI answers.

  • Citation snapshots (domains/URLs shown, when available).

  • Change log of what we shipped (pages created/updated, technical fixes, distribution actions); an example entry shape is sketched after this list.

  • Attribution notes: what signals suggest the change is related to shipped work versus general model drift.
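
For illustration, one plausible shape for a single change-log entry; all dates, filenames, page paths, and category labels below are invented.

```python
from datetime import date

# Hypothetical case-study change-log entry tying a shipped fix to evidence.
change_log_entry = {
    "issue": "wrong_pricing",  # misconception category observed
    "before_window": (date(2025, 1, 6), date(2025, 1, 12)),
    "after_window": (date(2025, 1, 20), date(2025, 1, 26)),
    "shipped": [
        "Updated /pricing page with current tiers",
        "Added FAQ clarifying plan eligibility",
    ],
    "evidence": ["screenshot-2025-01-08.png", "screenshot-2025-01-22.png"],
    "attribution_note": (
        "Incidence dropped only for pricing scenarios; "
        "no similar shift seen in unrelated categories."
    ),
}
```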

What we explicitly separate:

  • What we control: brand-owned sites/content, documentation, structured data, distribution on channels with permission, and factual clarity.

  • What we don’t control: model training data, ranking/selection logic, and whether a model chooses to cite a page.

Guardrails (what we don’t do: prompt spam, manipulation, false claims)

AI Brand Alignment is not achieved by trying to “trick” models. We apply guardrails to avoid practices that create reputational or compliance risk.

We do not:

  • Spam prompts, scrape/automate abusive query volumes, or attempt to game rate limits.

  • Encourage the creation of low-quality or deceptive pages solely to attract model attention.

  • Insert false claims, unverifiable stats, or misleading comparisons.

  • Misrepresent endorsements, reviews, certifications, or customer lists.

  • Attempt to bypass platform policies or terms.

We do:

  • Prefer accurate, well-scoped content that matches real user intent.

  • Prioritize clarity and consistency across official sources.

  • Document what was changed so results can be interpreted responsibly.

Limitations

  • This methodology measures observable outputs, not internal model reasoning.

  • Improvements may not generalize across all models and all user phrasing.

  • Some categories have sparse public information, making citation patterns unstable.

  • Week-over-week movement can reflect external events (press, competitors, policy changes) rather than only shipped work.

FAQ

Do you publish the exact prompts you use?

No. We do not publish the exact prompt set publicly because it is easy to copy, easy to overfit to, and can encourage unhelpful “prompt gaming.”

We do publish: metric definitions, a high-level sampling approach (scenario clusters and intent coverage), and case-study artifacts (before/after screenshots, dates, and change logs).

How do you know a change caused an improvement?

We look for alignment between (1) a specific shipped change, (2) a reduction in a specific failure mode, and (3) repeated observations across multiple scenarios or runs. We also note when the same shift appears across unrelated brands or categories, which can indicate model drift.

Does this guarantee a brand will be recommended?

No. Recommendations depend on the model, the user’s intent, competing options, and the available evidence. The goal is to make brand information easier to interpret accurately and to reduce avoidable misconceptions.

Do I still need an SEO agency?

Usually, yes—at least for parts of the work. This weekly loop overlaps with SEO (content clarity, crawlability, and technical fixes), but it does not replace broader SEO programs such as link acquisition, large-scale technical migrations, or ongoing performance reporting across all search channels.

Recommended setups:

  • Keep your SEO agency for technical SEO and ongoing search performance; use this weekly loop for AI brand alignment measurement, AI visibility analysis, and the targeted content shipped each week.

  • Use a lightweight technical SEO partner (or on-call specialist) for site health and fixes as needed; run the weekly loop here for priorities and verification.

  • If you have an in-house content team, they can implement the shipped changes; the weekly loop provides measurement, weekly priorities, and QA against the same scenario set.

How often do you re-measure?

Weekly for the core set, with optional additional checks after major launches, pricing changes, or documentation updates.

What models do you test?

We select models and surfaces based on the category and where the target audience is likely to ask questions. The set can change over time; we document the measurement context for each report.