The AI-First WordPress Site — Crawler to Citation

// wceu / 2026
// krakow · jun 06

Saturday · 2:45 PM CEST
Development · Track 1

CRAWLER

TO CITATION.

— server rack —

▸ Abstract

The AI-first WordPress site. Make it readable, citable, and measurable by the machine layer that just appeared on top of the web.

▸ Author

Alain Schlesser · Agentic architect & engineer
Google Developers Expert for
Web Technologies and AI

// section divider
// act one of three

5 min
5 slides

ACT I.

THE PROBLEM

▸ PROBLEM
▸ 5 MIN

AI is now a traffic source. It sends visitors, and it takes content.
The shift is settled — we're here to talk about what to do.

// act i · the problem
// the opener

picture this

“

Sometime in the last week, an AI system decided whether to recommend your site as the answer to someone's question. You weren't in the room.

And most of the time, the answer was no. — a.s.

// act i · the problem
// figure 03.1

the shift, in one number

THE AUDIENCE

JUST DOUBLED.

▸ AI bot traffic · YoY · open web

+187 %

Growth across the open web, Jan – Dec 2025. The direction is one-way. Don't argue with it — design for it.

— Cloudflare Radar · Q1 2026

Humans, same window: +3.1 %

▸ READ THIS

— page 14 · fold C —

BOTS
OUTNUMBER
HUMANS

// act i · the problem
// figure 04.1

quality, not just volume

A FULL

REVERSAL.

▸ March 2025

AI traffic converted −38 % vs non-AI.
The skeptic position made sense.

−38 %

▸ March 2026

AI traffic converts +42 % vs non-AI. Revenue per visit (RPV) +37 %, time on site +48 %, pages/visit +13 %.

+42 %

BEFORE → AFTER
in 12 months.

// act i · the problem
// the gap

where most wp sites are

THREE WAYS

TO BE INVISIBLE.

96 %
OF THE WEB

1. Invisible: No robots.txt strategy. No schema. No measurement. The AI ecosystem flows past you in both directions.

2. Vulnerable: Default-open to training crawlers. No governance. Content harvested by anyone. No idea who, or how much.

3. Old-paradigm: Optimised hard for Google 2018. Nothing ready for the AI layer that just appeared on top of it.

// section divider
// act two of three

18 min
16 slides · 5 layers

ACT II.

THE STACK

▸ STACK
▸ 5 LAYERS

Five layers — crawlers · robots · schema · content · measurement.
Take notes on the ones you don't have yet.

// act ii · stack · layer 1
// crawler awareness

the three tiers

THREE BOTS.

THREE STANCES.

→ tip:
agents = users.
treat them so.

▸ Tier 01

Training

GPTBot · ClaudeBot · CCBot
Google-Extended · Bytespider · Amazonbot

Block or rate-limit
unless paid

▸ Tier 02

Search / citation

OAI-SearchBot · PerplexityBot
ChatGPT-User · AppleBot-Extended · Bingbot

Allow.
These drive citations.

▸ Tier 03

Agent

OpenAI Operator
Anthropic Computer Use
Perplexity Comet

Allow.
Treat like a user.

// act ii · stack · layer 1
// vendor share, q1 2026

multi-vendor reality

89 %

IS TRAINING.

CLOUDFLARE
Q1 2026

— q1 '26 notes —

1. Googlebot: 31.6% — the elephant. Still leads, no longer a monopoly.

2. Meta-ExternalAgent: 16.7% — quietly enormous. Mixed-purpose crawler.

3. GPTBot: 12.0% — OpenAI training. Clear opt-out.

4. ClaudeBot: 11.7% — Anthropic. Distinct UAs for training vs search.

5. AppleBot-Extended: 5.8% — up 124% in Q1. Siri Intelligence is here.

6. The rest: 22.2% — Bytespider, Amazonbot, CCBot, PerplexityBot, dozens more.

// act ii · stack · layer 1
// who's at your door

tools you already have

YOU CAN SEE

ALL OF THEM.

— field guide —

WHO'S
AT YOUR DOOR

1. User-agent: Every major vendor publishes theirs. UA-sniffing works — until it doesn't. Trivially spoofable.

2. IP / ASN: OpenAI, Anthropic, Apple publish ranges. UA + ASN = verified bot.

3. Server logs: One grep one-liner on your access log. Visibility today, no plugins.
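The server-log check can also be sketched in a few lines of Python instead of grep. This is a minimal sketch, assuming an access log in the common "combined" format, where the user-agent is the last quoted field. The tier lists are copied from the tier slide, and the sample log lines and user-agent strings are illustrative, not real traffic.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Tier membership taken from the tier slide; extend it for your own site.
TIERS = {
    "training": ["GPTBot", "ClaudeBot", "CCBot", "Bytespider", "Amazonbot"],
    "citation": ["OAI-SearchBot", "PerplexityBot", "ChatGPT-User", "Bingbot"],
}

# In the combined log format the user-agent is the last quoted field.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')


def classify(user_agent: str) -> Optional[str]:
    """Return the tier name for a user-agent string, or None if no token matches."""
    ua = user_agent.lower()
    for tier, tokens in TIERS.items():
        if any(token.lower() in ua for token in tokens):
            return tier
    return None


def crawler_share(lines: Iterable[str]) -> Counter:
    """Count requests per tier across an iterable of access-log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = UA_AT_END.search(line)
        if match:
            tier = classify(match.group(1))
            if tier:
                counts[tier] += 1
    return counts


# Illustrative log lines (made up for this sketch).
SAMPLE_LINES = [
    '203.0.113.5 - - [06/Jun/2026:14:45:00 +0200] "GET /post/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.6 - - [06/Jun/2026:14:45:01 +0200] "GET /post/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [06/Jun/2026:14:45:02 +0200] "GET /post/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.8 - - [06/Jun/2026:14:45:03 +0200] "GET /post/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 Firefox/126.0"',
]

print(crawler_share(SAMPLE_LINES))
```

To run it against a real log, pass an open file object to `crawler_share`. Keep in mind that this only trusts the user-agent string, which the slide already flags as spoofable.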

// act ii · stack · layer 2
// robots.txt strategy

what wp ships

DEFAULT

IS NOT A STRATEGY.

96 %
OF SITES

robots.txt · wordpress 6.5 default

# wordpress default: essentially unchanged for a decade

User-agent: *
Disallow: /wp-admin/
Allow:    /wp-admin/admin-ajax.php

# zero ai-specific directives.
# every ai crawler gets exactly the same treatment.
# this is not a stance — it's the absence of one.

// act ii · stack · layer 2
// the strategic config

THREE TIERS.

THREE DECISIONS.

robots.txt · the 2026 config

# tier 1 — training (default: deny)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

# tier 2 — citation (default: allow)
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /

# tier 3 — agent (allow, like users)
User-agent: ChatGPT-User
Allow: /
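One way to sanity-check a config like this before deploying it is Python's standard-library `urllib.robotparser`. The sketch below is an assumption-laden illustration: it parses the three-tier config from the slide and reports the decision per user-agent. The test URL is a placeholder, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The three-tier config from the slide, reproduced as a string.
ROBOTS_TXT = """\
# tier 1: training (default: deny)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

# tier 2: citation (default: allow)
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /

# tier 3: agent (allow, like users)
User-agent: ChatGPT-User
Allow: /
"""


def decisions(robots_txt, agents, url="https://example.com/a-post/"):
    """Map each user-agent token to True (may fetch) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


AGENTS = ["GPTBot", "ClaudeBot", "CCBot", "OAI-SearchBot", "PerplexityBot", "ChatGPT-User"]
result = decisions(ROBOTS_TXT, AGENTS)
for agent, allowed in result.items():
    print(f"{agent:15} {'allow' if allowed else 'deny'}")
```

With this parser, user-agents that are not listed at all are allowed, because the config has no `User-agent: *` group. That is worth a deliberate decision rather than an accident.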

// act ii · stack · layer 2
// the defense case

publisher math

YOUR CONTENT.

YOUR DECISION.

— legal archives —

1. The return is zero: Training crawlers take, don't give. No citation, no referral, no attribution. Bandwidth on the way in is your bill.

2. Legal direction: Publishers vs major AI labs in active litigation. Direction of travel: more control, not less. A clear opt-out today is cheap insurance.

3. Track record: Some operators have a documented history of ignoring opt-outs. The compute they spend crawling you is bandwidth you pay for.

4. Re-audit quarterly: The line between training and search is shifting. Some vendors run unified crawlers. Not a permanent answer — a current one.

// act ii · stack · layer 2
// when honor-system fails

enforcement layers

WHEN HONOR

SYSTEM FAILS.

belt &
braces.
both.

1. WAF rule: Cloudflare AI Crawl Control, Fastly's equivalent, Akamai's. One toggle in a dashboard.

2. Rate-limit + IP block: Server-level. More granular. Useful when you want to allow a vendor sometimes, not always.

3. Web Bot Auth: HTTP message signatures. Cryptographic identity for well-behaved bots. Standards work in progress.
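Before rate-limiting or blocking at server level, it helps to know whether a request that claims a bot user-agent really comes from that vendor. The sketch below pairs the user-agent token with an IP-range check using the standard `ipaddress` module. The ranges are placeholders from the reserved documentation blocks, not real vendor ranges, and the `PUBLISHED_RANGES` table and `verify_bot` helper are hypothetical names for illustration. In practice you would load the ranges each vendor publishes.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT real vendor ranges.
# Replace them with the ranges each vendor publishes.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "ClaudeBot": ["198.51.100.0/24"],
}


def verify_bot(user_agent: str, remote_ip: str) -> str:
    """Return 'verified', 'spoofed', or 'unknown' for a request.

    'verified': UA token matches and the IP is inside a published range.
    'spoofed':  UA token matches but the IP is outside every published range.
    'unknown':  the UA does not claim to be one of the listed bots.
    """
    ip = ipaddress.ip_address(remote_ip)
    ua = user_agent.lower()
    for token, cidrs in PUBLISHED_RANGES.items():
        if token.lower() in ua:
            if any(ip in ipaddress.ip_network(cidr) for cidr in cidrs):
                return "verified"
            return "spoofed"
    return "unknown"


print(verify_bot("Mozilla/5.0 (compatible; GPTBot/1.2)", "192.0.2.10"))
print(verify_bot("Mozilla/5.0 (compatible; GPTBot/1.2)", "203.0.113.9"))
```

A "spoofed" result is a reasonable candidate for an outright block, while "verified" traffic can be routed to whatever per-tier policy you chose in robots.txt.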

// act ii · stack · layer 3
// the reframe

schema is the data layer

“

Schema used to be an SEO trick. In 2026 it's something else.

Treat it like an API, and the rest of this layer makes sense. — a.s.

// act ii · stack · layer 3
// receipts

why we know this

THREE

INDEPENDENT RECEIPTS.

1. Platform: Fabrice Canel, Principal PM Bing, SMX Munich: "Bing & Copilot use schema to feed their LLMs."

2. Training time: LLM pre-training pulls from Common Crawl. Your structured data is in that snapshot. Every model inherits the structure. Effect is one-way and permanent.

3. Protocol layer: NLWeb, MCP-based approaches, llms.txt — several proposals worth taking seriously. All of them speak schema.

// act ii · stack · layer 3
// the schema trap

quality beats quantity

STUB SCHEMA

HURTS YOU.

▸ Citation rate · minimal / generic schema

41.6 %

Pages with no schema were cited 59.8% of the time. Rich attribute-complete schema hit 61.7%. Half-finished schema looks like low effort. AI systems notice.

— Growth Marshal · n=730

Either commit, or don't bother. Attribute-complete schema: 61.7%.

DO IT
PROPERLY

// act ii · stack · layer 3
// the @id graph

graph beats blobs

CONNECT

EVERY @ID.

graph > blob.
every. single.
time.
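To make "graph, not blob" concrete, here is a minimal sketch of a JSON-LD `@graph` in which Organization, Person, and Article point at each other by `@id`, plus a small check that every reference resolves to a node defined in the same graph. The URLs, names, and the `dangling_ids` helper are hypothetical. The check is deliberately simple: it only treats top-level `@graph` entries as definitions.

```python
import json

# Hypothetical example graph: three nodes cross-referenced by @id.
GRAPH = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "@id": "https://example.com/#org", "name": "Example Co"},
        {
            "@type": "Person",
            "@id": "https://example.com/#author",
            "name": "Jane Doe",
            "worksFor": {"@id": "https://example.com/#org"},
        },
        {
            "@type": "Article",
            "@id": "https://example.com/post/#article",
            "headline": "An example post",
            "author": {"@id": "https://example.com/#author"},
            "publisher": {"@id": "https://example.com/#org"},
        },
    ],
}


def dangling_ids(doc: dict) -> list:
    """Return @id references that no top-level node in @graph defines."""
    defined = {node["@id"] for node in doc["@graph"] if "@id" in node}
    referenced = set()

    def walk(value, top_level: bool) -> None:
        if isinstance(value, dict):
            # A nested dict whose only key is @id is a pure reference.
            if not top_level and set(value) == {"@id"}:
                referenced.add(value["@id"])
            for child in value.values():
                walk(child, False)
        elif isinstance(value, list):
            for child in value:
                walk(child, False)

    for node in doc["@graph"]:
        walk(node, True)
    return sorted(referenced - defined)


print(json.dumps(GRAPH, indent=2))
print("dangling references:", dangling_ids(GRAPH))
```

An empty result means every reference lands on a defined node. A non-empty one points at exactly the isolated fragments the slide warns about.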

// act ii · stack · layer 4
// content patterns

evidence-based

THREE EDITS.

+30 – 40 %.

PRINCETON
GEO-BENCH

// content edit pass

cite. quote. count.
→ ~30–40 % lift

1. Cite sources: Pages that cite get cited. Add references. Hyperlink to primary sources. Position-adjusted lift: ~30–40 %.

2. Quote authorities: Pull direct quotes into blockquotes. AI treats your page as quoting an authority. Same lift band.

3. Use statistics: Replace "a lot" with the actual number. Specific figures get pulled when AI grounds its own claims.

// act ii · stack · layer 4
// the two heuristics

freshness + entities

FRESH BEATS

FOSSIL.

— signal the index —

1. Freshness signal: Microsoft's Fabrice Canel says "Generative AIs value fresh content — as a reference check of LLM training." His channel: IndexNow. WP plugins exist. Cheap.

2. Entity density: Named entities + internal links improve AI comprehension. King (iPullRank): relevance, not authority, drives AI Overview placement.
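For sites that prefer not to add a plugin, an IndexNow submission is a single HTTP POST. The sketch below only builds the request, following the publicly documented IndexNow JSON shape as I understand it (`host`, `key`, `keyLocation`, `urlList`). The host, key, and URLs are placeholders, and `build_indexnow_request` is a hypothetical helper name. Sending it is left to the caller.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host: str, key: str, urls: list) -> urllib.request.Request:
    """Build (but do not send) an IndexNow POST for a list of changed URLs.

    Assumes the key file is served from the site root as <key>.txt.
    """
    payload = {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_indexnow_request(
    "example.com",
    "0123456789abcdef",
    ["https://example.com/updated-post/"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would submit it. Hooking the call to the publish or update event is what gives the freshness signal.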

// act ii · stack · layer 4
// awkward truth

becoming the substrate

WIKIPEDIA.

REDDIT. YOU.

SEMRUSH
100M+ CITES

1. ChatGPT: Wikipedia 26–48 % of top-10 share. Reddit went 60 % → 10 % in six weeks after a Google parameter change.

2. Google AI Mode: Reddit, LinkedIn, YouTube growing. Medium, Quora declining.

3. Perplexity: Reddit at ~40 %. WSJ, NYT, Bloomberg absent from top 20.

4. The play: You can't out-Reddit Reddit. Be the original-research source. The thing the forum thread links to.

// act ii · stack · layer 5
// measurement

the referrer lies

REFERRERS

TO WATCH.

1. chatgpt.com (formerly chat.openai.com): The biggest single source for most categories.

2. perplexity.ai: Smaller but converts hard. Power users.

3. gemini.google.com: Growing fast. Tied into the Google account flow.

4. copilot.microsoft.com: Enterprise traffic. Edge default.

5. x.com (Grok): Referrer sometimes stripped. When present, visits show up as x.com.

6. kagi.com: Small, paid, high-intent.
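The referrer list above maps naturally onto a small lookup that tags each visit with its AI source. This is a minimal sketch: the hostnames come from the slide, `chatgpt.com` is added on the assumption that ChatGPT traffic now arrives from that domain, and the labels and the `ai_source` helper are illustrative names.

```python
from typing import Optional
from urllib.parse import urlparse

# Hostnames from the slide, plus chatgpt.com (assumed current ChatGPT domain).
AI_REFERRERS = {
    "chat.openai.com": "ChatGPT",
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
    "x.com": "Grok / X",  # x.com referrals cannot be told apart from ordinary X traffic
    "kagi.com": "Kagi",
}


def ai_source(referrer: str) -> Optional[str]:
    """Return the AI source label for a referrer URL, or None if it is not listed."""
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return AI_REFERRERS.get(host)


print(ai_source("https://www.perplexity.ai/search?q=wordpress"))
print(ai_source("https://news.example.org/story"))
```

Visits with a stripped referrer return `None` here, which is one reason the resulting numbers are an undercount.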

// act ii · stack · layer 5
// minimal dashboard

four metrics

FOUR METRICS.

NO MORE.

start small.
measure something.
beats nothing.

1. Visits by AI referrer: Undercounted baseline. Trend over time is what matters, not absolute.

2. Conversion / engagement by referrer: Pages, time, goal completions. Does the Adobe pattern hold for you?

3. Citation share: Pick ten queries. Rotate weekly. Log which sites get cited and track your share over time.

4. Crawler share: Grep your access log weekly. Training-tier vs citation-tier bots. The bandwidth picture you don't see in GA.
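Citation share is the one metric here without an off-the-shelf report, so a worked definition may help. The sketch below assumes you log, per query, the list of domains the AI answer cited, and defines share as the fraction of queries in which your domain appears at least once. The log entries are made up, and other definitions (for example weighting by citation position) are equally defensible.

```python
from typing import List, Tuple


def citation_share(observations: List[Tuple[str, List[str]]], domain: str) -> float:
    """Fraction of logged queries whose cited domains include `domain`."""
    if not observations:
        return 0.0
    hits = sum(1 for _query, cited in observations if domain in cited)
    return hits / len(observations)


# Made-up weekly log: (query, domains cited in the AI answer).
WEEK_LOG = [
    ("best wordpress caching plugin", ["example.com", "wikipedia.org"]),
    ("how to block gptbot", ["reddit.com", "example.com"]),
    ("what is llms.txt", ["reddit.com", "wikipedia.org"]),
]

print(f"{citation_share(WEEK_LOG, 'example.com'):.0%}")
```

Running the same function over each week's log gives the trend line, which matters more than any single week's value.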

// bridge
// what's next

the callable web

“

Schema is the foundation. Several proposals are competing for what gets built on top — and we don't yet know which one wins. The next layer isn't about being readable. It's about being callable.

NLWeb · MCP-for-web · llms.txt · agent-skill manifests. Pick wisely. Or don't — the prerequisite work is the same. — a.s.

// section divider
// act three of three

4 min
4 slides

ACT III.

DO THE WORK

▸ ACTION
▸ TONIGHT

Two timelines. Tonight, and the next thirty days.
You can be ahead of 96 % of the web before you sleep.

// act iii · checklist
// figure 23.1

same-day actions

DO THIS

TONIGHT.

▸ ACTION
▸ ITEMS

1. Audit your robots.txt: Explicit decision per tier: training, citation, agent.

2. Verify your schema graph: @id cross-references. Not isolated JSON-LD blobs.

3. Attribute-complete schema: Organization. Person. Article. Stub schema underperforms having none.

4. Run a readiness scanner: One scan today. Capture the baseline score.

5. Add llms.txt: Markdown index at site root. Five minutes. Cheap groundwork for the next layer.

6. Add referrer tracking: ChatGPT, Perplexity, Copilot, Gemini. Without it: blind.
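For the llms.txt item, a minimal sketch of generating the file is shown below. It follows the shape of the llms.txt proposal as I understand it: an H1 with the site name, a blockquote summary, then H2 sections with Markdown link lists. The site name, sections, URLs, and the `render_llms_txt` helper are placeholders, and the proposal itself is still evolving.

```python
from typing import Dict, List, Tuple


def render_llms_txt(
    site_name: str,
    summary: str,
    sections: Dict[str, List[Tuple[str, str, str]]],
) -> str:
    """Render a Markdown llms.txt body from (title, url, note) link tuples."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content for illustration only.
LLMS_TXT = render_llms_txt(
    "Example Site",
    "Guides and original research on running WordPress sites.",
    {
        "Guides": [
            ("Caching basics", "https://example.com/guides/caching/", "How page caching works"),
        ],
        "About": [
            ("Pricing", "https://example.com/pricing/", "Plans and prices"),
        ],
    },
)
print(LLMS_TXT)
```

Saving the output as `llms.txt` at the site root covers the checklist item. Regenerating it when cornerstone pages change keeps the index honest.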

// act iii · checklist
// figure 24.1

the 30-day arc

FOUR WEEKS.

FOUR THEMES.

— calendar block —

FOUR
WEEKS

1. Week 01 — Instrument: Wire the dashboard. Baseline referrals, citations, crawler share. Measurement is slowest to start, so front-load it.

2. Week 02 — Schema: Top 10 pages to full attribute-completeness. Don't add types — fill in what's there.

3. Week 03 — Content: Top 5 cornerstone articles. Citations, quotations, statistics. Five well-done beats fifty half-done.

4. Week 04 — Review & publish: Iterate where the signal is. Publish your AI-readiness page on your site. Public commitment for round two.

// act iii · take it home
// interactive guide

audit your own site

RUN THE

CHECKLIST.

▸ interactive guide

Walk every layer
on your own site,
one tick-box at a time.

Crawlers. Robots. Schema.
Content. Measurement.

— start tonight.

▸ scan to start

tiki.tf/wceu2026

// the end
// thank you + q&a

fin.
over to you

THANK YOU.

QUESTIONS?

▸ your turn

Ask me anything.

Pick a layer,
share what's working,
or push back on a claim.

— let's talk.

▸ scan to connect

@schlessera · alainschlesser.com
