# wordpress default — same since 2008
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
# zero ai-specific directives.
# every ai crawler gets exactly the same treatment.
# this is not a stance; it's the absence of one.
// act ii · stack · layer 2 // the strategic config
1. The return is zero: Training crawlers take, don't give. No citation, no referral, no attribution. Bandwidth on the way in is your bill.
2. Legal direction: Publishers vs major AI labs in active litigation. Direction of travel: more control, not less. A clear opt-out today is cheap insurance.
3. Track record: Some operators have a documented history of ignoring opt-outs. The compute they spend crawling you is bandwidth you pay for.
4. Re-audit quarterly: The line between training and search is shifting. Some vendors run unified crawlers. Not a permanent answer, a current one.
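A minimal sketch of what an explicit per-tier decision could look like, generated and checked with Python's standard `urllib.robotparser`. The tier assignments are assumptions for illustration, not from the slide: `GPTBot`, `CCBot` and `Google-Extended` are treated as training-tier and blocked, `OAI-SearchBot` and `PerplexityBot` as citation-tier and allowed. User-agent tokens change, so check each vendor's current documentation before shipping.

```python
from urllib.robotparser import RobotFileParser

# Assumed tier assignments (illustrative only); re-check them quarterly.
TRAINING_BOTS = ["GPTBot", "CCBot", "Google-Extended"]
CITATION_BOTS = ["OAI-SearchBot", "PerplexityBot"]


def build_robots_txt():
    """Return a robots.txt with one explicit group per AI crawler."""
    lines = [
        "# wordpress default",
        "User-agent: *",
        "Disallow: /wp-admin/",
        "Allow: /wp-admin/admin-ajax.php",
        "",
    ]
    for bot in TRAINING_BOTS:
        # Training tier: opt out of the whole site.
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    for bot in CITATION_BOTS:
        # Citation tier: allowed, but a bot with its own group ignores the
        # "*" group, so the admin restriction is repeated here.
        lines += [f"User-agent: {bot}", "Disallow: /wp-admin/", ""]
    return "\n".join(lines)


def is_allowed(robots_txt, user_agent, url):
    """Check one (user agent, URL) pair against the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(build_robots_txt())
```

Python's parser applies rules in file order, while major crawlers use longest-match, so treat `is_allowed` as a sanity check on the per-bot groups rather than a model of any specific crawler. And this is still the honor system: it only binds bots that choose to read it.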
// act ii · stack · layer 2 // when honor-system fails
enforcement layers
WHEN HONOR
SYSTEM FAILS.
belt & braces. both.
1. WAF rule: Cloudflare AI Crawl Control, Fastly's equivalent, Akamai's. One toggle in a dashboard.
2. Rate-limit + IP block: Server-level. More granular. Useful when you want to allow a vendor sometimes, not always.
3. Web Bot Auth: HTTP message signatures. Cryptographic identity for well-behaved bots. Standards work in progress.
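The "allow a vendor sometimes, not always" idea in item 2 can be sketched as a sliding-window rate limit keyed by crawler name. This is an illustration only: the class name `CrawlerRateLimiter` and the limits are hypothetical, and in production the same logic would normally live in the web server or WAF rather than in application code.

```python
from collections import defaultdict, deque


class CrawlerRateLimiter:
    """Allow each crawler at most max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # crawler name -> recent timestamps

    def allow(self, crawler, now):
        """Return True if this request fits in the window, else False.

        `now` is a timestamp in seconds, passed in so the logic is testable.
        """
        hits = self._hits[crawler]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over budget: the caller would answer 429 or block
        hits.append(now)
        return True
```

A caller would pass `time.time()` as `now` and key on a verified identity where possible, since user-agent strings are trivially spoofed.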
// act ii · stack · layer 3 // the reframe
schema is the data layer
“Schema used to be an SEO trick.
In 2026 it's something else.
Treat it like an API. The rest of this layer makes sense.”
- a.s.
// act ii · stack · layer 3 // receipts
why we know this
THREE
INDEPENDENT RECEIPTS.
1. Platform: Fabrice Canel, Principal PM Bing, SMX Munich: "Bing & Copilot use schema to feed their LLMs."
2. Training time: LLM pre-training pulls from Common Crawl. Your structured data is in that snapshot. Every model inherits the structure. Effect is one-way and permanent.
3. Protocol layer: NLWeb, MCP-based approaches, llms.txt. Several proposals worth taking seriously. All of them speak schema.
// act ii · stack · layer 3 // the schema trap
quality beats quantity
STUB SCHEMA
HURTS YOU.
▸ Citation rate · minimal / generic schema
41.6%
Pages with no schema were cited 59.8% of the time. Rich attribute-complete schema hit 61.7%. Half-finished schema looks like low effort. AI systems notice.
— Growth Marshal · n=730
Either commit, or don't bother.
61.7%
DO IT PROPERLY
// act ii · stack · layer 3 // the @id graph
graph beats blobs
CONNECT
EVERY @ID.
graph > blob. every. single. time.
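A minimal sketch of what "connect every @id" can mean in practice: one JSON-LD `@graph` in which each node has an `@id` and other nodes point at it by reference, plus a small check for references that resolve to nothing. The domain, names and headline are placeholders, and `dangling_ids` is a hypothetical helper, not part of any schema tooling.

```python
import json

SITE = "https://example.com"  # placeholder domain

GRAPH = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "@id": f"{SITE}/#org", "name": "Example Co"},
        {
            "@type": "WebSite",
            "@id": f"{SITE}/#website",
            "url": SITE,
            "publisher": {"@id": f"{SITE}/#org"},
        },
        {
            "@type": "Person",
            "@id": f"{SITE}/#author",
            "name": "A. Author",
            "worksFor": {"@id": f"{SITE}/#org"},
        },
        {
            "@type": "Article",
            "@id": f"{SITE}/post/#article",
            "headline": "Placeholder headline",
            "author": {"@id": f"{SITE}/#author"},
            "publisher": {"@id": f"{SITE}/#org"},
            "isPartOf": {"@id": f"{SITE}/#website"},
        },
    ],
}


def dangling_ids(doc):
    """Return referenced @id values that no node in the graph defines."""
    nodes = doc["@graph"]
    defined = {node["@id"] for node in nodes if "@id" in node}
    referenced = set()

    def walk(value):
        if isinstance(value, dict):
            if set(value) == {"@id"}:
                # A dict holding only "@id" is a pointer to another node.
                referenced.add(value["@id"])
            else:
                for child in value.values():
                    walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    for node in nodes:
        walk(node)
    return sorted(referenced - defined)


if __name__ == "__main__":
    # The serialized graph is what would go inside one JSON-LD script tag.
    print(json.dumps(GRAPH, indent=2))
    print("dangling:", dangling_ids(GRAPH))
```

The isolated-blob version of the same page would repeat the organization's name inside every block with nothing tying the copies together. Here each entity is stated once and referenced everywhere else.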
// act ii · stack · layer 4 // content patterns
evidence-based
THREE EDITS.
+30–40 %.
PRINCETON GEO-BENCH
// content edit pass
cite. quote. count. → ~30–40 % lift
1. Cite sources: Pages that cite get cited. Add references. Hyperlink to primary sources. Position-adjusted lift: ~30–40 %.
2. Quote authorities: Pull direct quotes into blockquotes. AI treats your page as quoting an authority. Same lift band.
3. Use statistics: Replace "a lot" with the actual number. Specific figures get pulled when AI grounds its own claims.
// act ii · stack · layer 4 // the two heuristics
freshness + entities
FRESH BEATS
FOSSIL.
— signal the index —
1. Freshness signal: Microsoft's Fabrice Canel: "Generative AIs value fresh content, as a reference check of LLM training." His channel: IndexNow. WP plugins exist. Cheap.
2. Entity density: Named entities + internal links improve AI comprehension. King (iPullRank): relevance, not authority, drives AI Overview placement.
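For sites without a plugin, the IndexNow ping from item 1 can be sketched in a few lines. This follows the published IndexNow protocol as I understand it (a JSON POST carrying `host`, `key`, `keyLocation` and `urlList`); confirm the details against indexnow.org before relying on it. The host, key and URLs are placeholders, and the sketch assumes the key file sits at the site root.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host, key, urls):
    """Build (but do not send) an IndexNow submission for changed URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumption: the key file is hosted at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_indexnow_request(
        "www.example.com",
        "placeholder-key",
        ["https://www.example.com/updated-post/"],
    )
    # Sending is left commented out so the sketch has no side effects:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
    print(request.full_url, request.get_method())
```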
// act ii · stack · layer 4 // awkward truth
becoming the substrate
WIKIPEDIA.
REDDIT. YOU.
SEMRUSH 100M+ CITES
1. ChatGPT: Wikipedia 26–48 % of top-10 share. Reddit went 60 % → 10 % in six weeks after a Google parameter change.
2. Google AI Mode: Reddit, LinkedIn, YouTube growing. Medium, Quora declining.
3. Perplexity: Reddit at ~40 %. WSJ, NYT, Bloomberg absent from top 20.
4. The play: You can't out-Reddit Reddit. Be the original-research source. The thing the forum thread links to.
// act ii · stack · layer 5 // measurement
the referrer lies
REFERRERS
TO WATCH.
1. chat.openai.com: The biggest single source for most categories.
2. perplexity.ai: Smaller but converts hard. Power users.
3. gemini.google.com: Growing fast. Tied into the Google account flow.
5. x.com (Grok): Sometimes stripped. Shows up as x.com.
6. kagi.com: Small, paid, high-intent.
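A minimal sketch of bucketing visits by the referrer hosts listed above. The host list comes from the slide; the display labels and the function name `classify_referrer` are illustrative. Stripped referrers never reach this code, which is why the result is a floor, not a count.

```python
from urllib.parse import urlparse

# Hosts from the slide; labels are illustrative.
AI_REFERRER_HOSTS = {
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "x.com": "Grok (via x.com)",  # also catches ordinary x.com traffic
    "kagi.com": "Kagi",
}


def classify_referrer(referrer):
    """Map a Referer header value to an AI source label, or None."""
    if not referrer:
        return None  # stripped or direct visit
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return AI_REFERRER_HOSTS.get(host)
```

For example, `classify_referrer("https://www.perplexity.ai/search?q=x")` returns `"Perplexity"`, while an empty or unknown referrer returns `None`.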
// act ii · stack · layer 5 // minimal dashboard
four metrics
FOUR METRICS.
NO MORE.
start small. measure something. beats nothing.
1. Visits by AI referrer: Undercounted baseline. Trend over time is what matters, not absolute.
2. Conversion / engagement by referrer: Pages, time, goal completions. Does the Adobe pattern hold for you?
3. Citation share: Pick ten queries. Rotate weekly. Log which sites get cited and track your share over time.
4. Crawler share: Grep your access log weekly. Training-tier vs citation-tier bots. The bandwidth picture you don't see in GA.
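The weekly log grep in item 4 can be sketched as a short script. It assumes the combined log format, where the user agent is the last double-quoted field, and the bot-to-tier mapping is an assumption for illustration rather than anything stated on the slide.

```python
import re
from collections import Counter

# Assumed tier mapping (illustrative); extend it to the bots you actually see.
TIERS = {
    "GPTBot": "training",
    "CCBot": "training",
    "OAI-SearchBot": "citation",
    "PerplexityBot": "citation",
}

# In combined log format the user agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def crawler_share(log_lines):
    """Return the share of requests per tier ("training", "citation", "other")."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not end in a quoted field
        user_agent = match.group(1).lower()
        tier = next(
            (t for token, t in TIERS.items() if token.lower() in user_agent),
            "other",
        )
        counts[tier] += 1
    total = sum(counts.values())
    return {tier: n / total for tier, n in counts.items()} if total else {}
```

Usage would be along the lines of `crawler_share(open("access.log"))`, with the path being whatever your server writes to. Counting requests is the simplest proxy; summing the bytes-sent field instead would get closer to the bandwidth picture.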
// bridge // what's next
the callable web
“Schema is the foundation. Several proposals are competing for what gets built on top, and we don't yet know which one wins.
The next layer isn't about being readable. It's about being callable.
NLWeb · MCP-for-web · llms.txt · agent-skill manifests. Pick wisely. Or don't: the prerequisite work is the same.”
- a.s.
// section divider // act three of three
4 min · 4 slides
ACT III.
DO THE WORK
▸ ACTION ▸ TONIGHT
Two timelines. Tonight, and the next thirty days.
You can be ahead of 96 % of the web before you sleep.
// act iii · checklist // figure 23.1
same-day actions
DO THIS
TONIGHT.
▸ ACTION ▸ ITEMS
1. Audit your robots.txt: Explicit decision per tier: training, citation, agent.
2. Verify your schema graph: @id cross-references. Not isolated JSON-LD blobs.