// wceu / 2026
// krakow · jun 06
Saturday · 2:45 PM CEST
Development · Track 1
CRAWLER
TO CITATION.
★ WORDCAMP EUROPE · KRAKOW ★ REGISTERED · DEV TRACK 01 WCEU 2026 No. 06 · 06 · 26
— server rack —
▸ Abstract
The AI-first WordPress site. Make it readable, citable, and measurable by the machine layer that just appeared on top of the web.
▸ Author
Alain Schlesser · Agentic architect & engineer
Google Developers Expert for
Web Technologies and AI
// section divider
// act one of three
5 min
5 slides
ACT I.
THE PROBLEM
▸ PROBLEM
▸ 5 MIN
AI is now a traffic source. It sends visitors, and it takes content.
The shift is settled — we're here to talk about what to do.
// act i · the problem
// the opener

picture this
Sometime in the last week, an AI system decided whether to recommend your site as the answer to someone's question. You weren't in the room.
And most of the time, the answer was no.
— a.s.
// act i · the problem
// figure 03.1

the shift, in one number
THE AUDIENCE
JUST DOUBLED.
▸ AI bot traffic · YoY · open web
+187%
Growth across the open web, Jan – Dec 2025. The direction is one-way. Don't argue with it — design for it.
— Cloudflare Radar · Q1 2026
Humans, same window: +3.1%
▸ READ THIS

BOTS
OUTNUMBER
HUMANS

// act i · the problem
// figure 04.1

quality, not just volume
A FULL
REVERSAL.
▸ March 2025
AI traffic converted −38 % vs non-AI.
The skeptic position made sense.
−38%
▸ March 2026
AI traffic converts +42 %, revenue per visit (RPV) +37 %, time on site +48 %, pages per visit +13 %.
+42%
BEFORE → AFTER
in 12 months.
// act i · the problem
// the gap

where most wp sites are
THREE WAYS
TO BE INVISIBLE.
96 %
OF THE WEB
1. Invisible: No robots.txt strategy. No schema. No measurement. The AI ecosystem flows past you in both directions.
2. Vulnerable: Default-open to training crawlers. No governance. Content harvested by anyone. No idea who, or how much.
3. Old-paradigm: Optimised hard for the Google of 2018. Nothing ready for the AI layer that just appeared on top of it.
// section divider
// act two of three
18 min
16 slides · 5 layers
ACT II.
THE STACK
▸ STACK
▸ 5 LAYERS
Five layers — crawlers · robots · schema · content · measurement.
Take notes on the ones you don't have yet.
// act ii · stack · layer 1
// crawler awareness

the three tiers
THREE BOTS.
THREE STANCES.
→ tip:
agents = users.
treat them so.
▸ Tier 01

Training

GPTBot · ClaudeBot · CCBot
Google-Extended · Applebot-Extended · Bytespider · Amazonbot
Block or rate-limit
unless paid
▸ Tier 02

Search /
citation

OAI-SearchBot · PerplexityBot
Applebot · Bingbot
Allow.
These drive citations.
▸ Tier 03

Agent

ChatGPT-User
OpenAI Operator
Anthropic Computer Use
Perplexity Comet
Allow.
Treat like a user.
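The three tiers can be expressed as a small lookup table. A minimal sketch in Python, assuming simple case-insensitive substring matching on the User-Agent header; the token lists are a partial, illustrative selection and `classify_user_agent` is a hypothetical helper name. Google-Extended and Applebot-Extended are left out because they are robots.txt control tokens rather than crawler user agents, and agent products such as Operator or Comet may present ordinary browser UAs, so they may not be catchable this way.

```python
# Hypothetical tier map: partial token lists, matched case-insensitively.
# Order matters: the first tier whose token appears in the UA wins.
TIER_TOKENS = {
    "training": ["GPTBot", "ClaudeBot", "CCBot", "Bytespider", "Amazonbot"],
    "citation": ["OAI-SearchBot", "PerplexityBot", "bingbot"],
    "agent": ["ChatGPT-User"],
}


def classify_user_agent(user_agent: str) -> str:
    """Return 'training', 'citation', 'agent', or 'other' for a UA string.

    This only reads what the client *claims*; UAs are trivially spoofable.
    """
    ua = user_agent.lower()
    for tier, tokens in TIER_TOKENS.items():
        if any(token.lower() in ua for token in tokens):
            return tier
    return "other"


print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"))
# training
```

Anything that lands in "other" is either a human, an unlisted bot, or a bot hiding behind a browser UA, which is why the next slide pairs the UA with an IP check.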
// act ii · stack · layer 1
// vendor share, q1 2026

multi-vendor reality
89 %
IS TRAINING.
CLOUDFLARE
Q1 2026
— q1 '26 notes —
1. Googlebot: 31.6%, the elephant. Still leads, no longer a monopoly.
2. Meta-ExternalAgent: 16.7%, quietly enormous. Mixed-purpose crawler.
3. GPTBot: 12.0%, OpenAI training. Clear opt-out.
4. ClaudeBot: 11.7%, Anthropic. Distinct UAs for training and search.
5. Applebot: 5.8%, up 124% in Q1. Apple Intelligence is here.
6. The rest: 22.2%. Bytespider, Amazonbot, CCBot, PerplexityBot, and dozens more.
// act ii · stack · layer 1
// who's at your door

tools you already have
YOU CAN SEE
ALL OF THEM.

WHO'S
AT YOUR DOOR

1. User-agent: Every major vendor publishes theirs. UA sniffing works until it doesn't: user agents are trivially spoofable.
2. IP / ASN: OpenAI, Anthropic, and Apple publish ranges. UA + ASN = verified bot.
3. Server logs: One grep one-liner on your access log. Visibility today, no plugins.
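The server-log approach, combined with the IP check from the second point, might look like this in Python. A sketch under stated assumptions: the log uses the common Apache/Nginx combined format, and `PUBLISHED_RANGES` is a hypothetical stand-in built from RFC 5737 documentation networks; in practice you would load the range lists each vendor publishes. A hit counts as "verified" only when the UA token and the source IP agree; `tally_bots` is a hypothetical name.

```python
import ipaddress
import re
from collections import Counter

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# HYPOTHETICAL ranges: RFC 5737 documentation networks stand in for the
# range lists vendors publish. Replace them with the real, current lists.
PUBLISHED_RANGES = {
    "GPTBot": [ipaddress.ip_network("192.0.2.0/24")],
    "Applebot": [ipaddress.ip_network("198.51.100.0/24")],
}


def tally_bots(lines):
    """Count hits per bot token as 'verified' (UA and IP agree) or 'claimed' (UA only)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not a combined-format line
        ua = match["ua"].lower()
        try:
            ip = ipaddress.ip_address(match["ip"])
        except ValueError:
            continue  # hostname or garbage in the IP field
        for token, networks in PUBLISHED_RANGES.items():
            if token.lower() in ua:
                status = "verified" if any(ip in net for net in networks) else "claimed"
                counts[(token, status)] += 1
                break
    return counts
```

To run it against a real log, pass an open file: `tally_bots(open("access.log", encoding="utf-8", errors="replace"))`. A large "claimed" count for a vendor that publishes ranges is a hint that someone is borrowing its name.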
// act ii · stack · layer 2
// robots.txt strategy

what wp ships
DEFAULT
IS NOT A STRATEGY.
96 %
OF SITES
robots.txt · wordpress 6.5 default
# wordpress default: core rules largely unchanged for a decade
# (core sitemaps, since 5.5, also append a Sitemap: line)

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# zero ai-specific directives.
# every ai crawler gets exactly the same treatment.
# this is not a stance — it's the absence of one.
// act ii · stack · layer 2
// the strategic config
THREE TIERS.
THREE DECISIONS.
robots.txt · the 2026 config
# tier 1 — training (default: deny)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

# tier 2 — citation (default: allow)
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /

# tier 3 — agent (allow, like users)
User-agent: ChatGPT-User
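Allow: /

# everyone else: keep the wordpress default
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php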
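One way to check what a robots.txt actually says to each crawler is Python's standard-library `urllib.robotparser`. The sketch below feeds it the WordPress default and the three tier groups from the config above (comments simplified) and asks whether a few crawlers may fetch a hypothetical post URL on example.com. Caveat: Python's parser applies the first matching Allow/Disallow line rather than RFC 9309's longest-match rule; that makes no difference for these whole-site rules, but it can for overlapping paths.

```python
from urllib.robotparser import RobotFileParser

WP_DEFAULT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

TIERED = """\
# tier 1: training (deny)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

# tier 2: citation (allow)
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /

# tier 3: agent (allow, like users)
User-agent: ChatGPT-User
Allow: /
"""

AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "Bytespider"]


def verdicts(robots_txt: str, url: str = "https://example.com/blog/some-post/") -> dict:
    """Ask the stdlib parser whether each agent may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


for name, text in (("wp default", WP_DEFAULT), ("tiered", TIERED)):
    print(name, verdicts(text))
# wp default {'GPTBot': True, 'OAI-SearchBot': True, 'ChatGPT-User': True, 'Bytespider': True}
# tiered {'GPTBot': False, 'OAI-SearchBot': True, 'ChatGPT-User': True, 'Bytespider': True}
```

Note the Bytespider result: a crawler you don't name falls through to the `*` group when one exists, and is allowed when none does. Tier 1 only denies the crawlers you actually list.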
// act ii · stack · layer 2
// the defense case

publisher math
YOUR CONTENT.
YOUR DECISION.
— legal archives —
1. The return is zero: Training crawlers take and don't give. No citation, no referral, no attribution. The bandwidth on the way in is your bill.
2. Legal direction: Publishers vs major AI labs in active litigation. Direction of travel: more control, not less. A clear opt-out today is cheap insurance.
3. Track record: Some operators have a documented history of ignoring opt-outs. The compute they spend crawling you is bandwidth you pay for.
4. Re-audit quarterly: The line between training and search is shifting. Some vendors run unified crawlers. It's a current answer, not a permanent one.
// act ii · stack · layer 2
// when honor-system fails

enforcement layers
WHEN HONOR
SYSTEM FAILS.
belt &
braces.
both.
1. WAF rule: Cloudflare AI Crawl Control, or Fastly's and Akamai's equivalents. One toggle in a dashboard.
2. Rate-limit + IP block: Server-level and more granular. Useful when you want to allow a vendor sometimes, not always.
3. Web Bot Auth: HTTP Message Signatures (RFC 9421) give well-behaved bots a cryptographic identity. Standards work is still in progress.
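For the rate-limit option, the usual building block is a token bucket per client. A minimal in-process sketch in Python, assuming you key buckets by a verified bot identity (for example, a UA token that passed the IP check); the class names, rates, and capacities are illustrative, not a recommendation. A real deployment would more likely live in the web server, CDN, or WAF.

```python
class TokenBucket:
    """Refills `rate` tokens per second, up to `capacity`; each request spends one."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so a short burst is allowed
        self.updated = None  # timestamp of the previous call

    def allow(self, now: float) -> bool:
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class BotRateLimiter:
    """One token bucket per bot key (hypothetical keying: a verified UA token)."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, bot: str, now: float) -> bool:
        if bot not in self.buckets:
            self.buckets[bot] = TokenBucket(self.rate, self.capacity)
        return self.buckets[bot].allow(now)
```

Per request, call `limiter.allow(bot, time.monotonic())` and answer with HTTP 429 when it returns False. Passing the clock in explicitly keeps the logic testable.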
// act ii · stack · layer 3
// the reframe

schema is the data layer
Schema used to be an SEO trick. In 2026 it's something else.
Treat it like an API, and the rest of this layer makes sense.
— a.s.
// act ii · stack · layer 3
// receipts

why we know this
THREE
INDEPENDENT RECEIPTS.
1. Platform: Fabrice Canel, Principal Product Manager at Bing, at SMX Munich: "Bing & Copilot use schema to feed their LLMs."
2. Training time: LLM pre-training draws heavily on Common Crawl. If your pages are in those snapshots, so is your structured data, and every model trained on them inherits that structure. The effect is one-way and permanent.
3. Protocol layer: NLWeb, MCP-based approaches, llms.txt. Several proposals worth taking seriously, and most of them build on schema.
// act ii · stack · layer 3
// the schema trap

quality beats quantity
STUB SCHEMA
HURTS YOU.
▸ Citation rate · minimal / generic schema
41.6%
Pages with no schema were cited 59.8% of the time. Rich attribute-complete schema hit 61.7%. Half-finished schema looks like low effort. AI systems notice.
— Growth Marshal · n=730
Either commit, or don't bother.
▸ Citation rate · attribute-complete schema
61.7%
DO IT
PROPERLY
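A rough way to catch stub schema before it ships is to score each JSON-LD node against a list of properties you expect to be filled. The property lists below are an assumption for illustration: every property is real schema.org vocabulary, but which ones count as "complete" is a judgment call, not a standard. `audit_node` is a hypothetical helper.

```python
# HYPOTHETICAL completeness checklist. Every property is real schema.org
# vocabulary, but which ones count as "complete" is this sketch's assumption.
EXPECTED_PROPERTIES = {
    "Article": ["headline", "author", "datePublished", "dateModified", "image", "publisher"],
    "Person": ["name", "url", "sameAs", "jobTitle"],
    "Organization": ["name", "url", "logo", "sameAs"],
}


def audit_node(node: dict) -> tuple[float, list[str]]:
    """Return (share of expected properties present, list of missing properties)."""
    node_type = node.get("@type")
    if isinstance(node_type, list):  # JSON-LD allows several types; use the first
        node_type = node_type[0] if node_type else None
    expected = EXPECTED_PROPERTIES.get(node_type, [])
    if not expected:
        return 1.0, []  # unknown type: nothing to check against
    missing = [prop for prop in expected if not node.get(prop)]
    return 1 - len(missing) / len(expected), missing
```

A stub Article with only a headline scores 1/6 and lists the five missing properties, which is exactly the "half-finished" case the slide warns about.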
// act ii · stack · layer 3
// the @id graph

graph beats blobs
CONNECT
EVERY @ID.
graph > blob.
every. single.
time.
▸ BEFORE · isolated blobs
Article (no @id) · Person (no @id) · Organization (no @id)
→ AI sees three loose objects.
▸ AFTER · interconnected @id graph
Article #article · Person #author · Organization #publisher · WebPage #webpage
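The AFTER graph can be emitted as a single JSON-LD `@graph` in which nodes point at each other by `@id`. A minimal Python sketch, assuming fragment-style ids as on the slide (`#article` and `#webpage` on the page URL, `#author` and `#publisher` on the site URL, which presumes a single-author site); `build_graph` and the example values are hypothetical. `author`, `publisher`, `worksFor`, `mainEntity`, and `mainEntityOfPage` are standard schema.org properties.

```python
import json


def build_graph(site_url, page_url, headline, author_name, org_name):
    """Return a JSON-LD document whose nodes reference each other by @id."""
    ids = {
        "article": f"{page_url}#article",
        "webpage": f"{page_url}#webpage",
        # Assumes a single-author site; multi-author sites need one id per person.
        "author": f"{site_url}#author",
        "publisher": f"{site_url}#publisher",
    }
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "@id": ids["publisher"], "name": org_name, "url": site_url},
            {"@type": "Person", "@id": ids["author"], "name": author_name,
             "worksFor": {"@id": ids["publisher"]}},
            {"@type": "WebPage", "@id": ids["webpage"], "url": page_url,
             "mainEntity": {"@id": ids["article"]}},
            {
                "@type": "Article",
                "@id": ids["article"],
                "headline": headline,
                "mainEntityOfPage": {"@id": ids["webpage"]},
                "author": {"@id": ids["author"]},
                "publisher": {"@id": ids["publisher"]},
            },
        ],
    }


graph = build_graph("https://example.com/", "https://example.com/crawler-to-citation/",
                    "Crawler to citation", "Jane Doe", "Example Org")
print(json.dumps(graph, indent=2))
```

The output goes into a single `<script type="application/ld+json">` element. The design choice is that each entity is defined once and referenced everywhere else, so a consumer can follow `author` → `worksFor` → `publisher` instead of reconciling three copies.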
// act ii · stack · layer 4
// content patterns

evidence-based
THREE EDITS.
+30 – 40 %.
PRINCETON
GEO-BENCH
// content edit pass
cite. quote. count.
→ ~30–40 % lift
1. Cite sources: Pages that cite get cited. Add references and hyperlink to primary sources. Position-adjusted lift: ~30–40 %.
2. Quote authorities: Pull direct quotes into blockquotes, so your page reads as one that quotes authorities. Same lift band.
3. Use statistics: Replace "a lot" with the actual number. Specific figures get pulled when AI grounds its own claims.
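Before and after an edit pass, it can help to count the three signals on a page. This Python sketch uses the standard-library `HTMLParser` and treats absolute http(s) links as citations, `<blockquote>` elements as quotations, and numeric tokens as statistics. Those proxies are assumptions of this sketch, not the GEO-bench methodology, and `geo_signals` is a hypothetical name.

```python
import re
from html.parser import HTMLParser

# Numeric tokens such as 187, 3.1 or 42% (a rough proxy for "statistics").
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


class GeoSignals(HTMLParser):
    """Collects rough proxies for the three edits: cite, quote, count."""

    def __init__(self) -> None:
        super().__init__()
        self.external_links = 0
        self.blockquotes = 0
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(("http://", "https://")):
                self.external_links += 1  # absolute link: treated as a citation
        elif tag == "blockquote":
            self.blockquotes += 1

    def handle_data(self, data):
        self.text.append(data)


def geo_signals(html: str) -> dict:
    parser = GeoSignals()
    parser.feed(html)
    parser.close()
    numbers = NUMBER.findall(" ".join(parser.text))
    return {
        "external_links": parser.external_links,
        "blockquotes": parser.blockquotes,
        "numbers": len(numbers),
    }
```

Run it on a cornerstone article before and after the pass; the point is the delta, not the absolute counts.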
// act ii · stack · layer 4
// the two heuristics

freshness + entities
FRESH BEATS
FOSSIL.
— signal the index —
1. Freshness signal: As Microsoft's Fabrice Canel puts it, "Generative AIs value fresh content — as a reference check of LLM training." His channel: IndexNow. WordPress plugins exist. Cheap.
2. Entity density: Named entities plus internal links improve AI comprehension. Mike King (iPullRank): relevance, not authority, drives AI Overview placement.
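An IndexNow ping is a single JSON POST. A minimal Python sketch that builds (but does not send) the request, assuming the shared `api.indexnow.org` endpoint and a key file hosted at the site root as `<key>.txt`, which is how the protocol lets the receiver verify ownership; the key value and URLs in the usage are placeholders, and `build_indexnow_request` is a hypothetical name. In WordPress, the plugins mentioned above handle this for you.

```python
import json
import urllib.request
from urllib.parse import urlparse

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(key: str, urls: list[str]) -> urllib.request.Request:
    """Build (not send) an IndexNow POST for URLs that all share one host."""
    if not urls:
        raise ValueError("need at least one URL")
    host = urlparse(urls[0]).netloc
    if any(urlparse(url).netloc != host for url in urls):
        raise ValueError("all URLs in one submission should share a host")
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file is served at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
```

After publishing or updating posts, send it with `urllib.request.urlopen(build_indexnow_request(key, changed_urls))`.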
// act ii · stack · layer 4
// awkward truth

becoming the substrate
WIKIPEDIA.
REDDIT. YOU.
SEMRUSH
100M+ CITES
1. ChatGPT: Wikipedia holds 26–48 % of top-10 share. Reddit went 60 % → 10 % in six weeks after a Google parameter change.
2. Google AI Mode: Reddit, LinkedIn, YouTube growing. Medium, Quora declining.
3. Perplexity: Reddit at ~40 %. WSJ, NYT, Bloomberg absent from the top 20.
4. The play: You can't out-Reddit Reddit. Be the original-research source, the thing the forum thread links to.
// act ii · stack · layer 5
// measurement

the referrer lies
REFERRERS
TO WATCH.
1. chatgpt.com (formerly chat.openai.com): The biggest single source for most categories.
2. perplexity.ai: Smaller, but converts hard. Power users.
3. gemini.google.com: Growing fast. Tied into the Google account flow.
4. copilot.microsoft.com: Enterprise traffic. Edge default.
5. x.com (Grok): The referrer is sometimes stripped; otherwise it shows up as plain x.com.
6. kagi.com: Small, paid, high-intent.
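Mapping referrers to assistants is a hostname lookup. A Python sketch, assuming the domains from the list above plus ChatGPT's legacy domain, with a fallback to a `utm_source` query parameter on the landing URL for visits where the referrer was stripped; whether a given assistant appends `utm_source` is an assumption to verify in your own data. x.com is left out on purpose, since many x.com referrals will not be Grok. `ai_source` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Referrer hostnames from the list above, plus ChatGPT's legacy domain.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
    "kagi.com": "Kagi",
}


def ai_source(referrer: str, landing_url: str = "") -> Optional[str]:
    """Name the AI assistant behind a visit, or None if it doesn't look like one."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, source in AI_REFERRERS.items():
        if host == domain or host.endswith("." + domain):
            return source
    # Fallback when no known referrer host matched: some assistants tag links
    # with utm_source (an assumption to verify against your own traffic).
    for value in parse_qs(urlparse(landing_url).query).get("utm_source", []):
        if value.lower() in AI_REFERRERS:
            return AI_REFERRERS[value.lower()]
    return None
```

Feed it the referrer and landing URL from your analytics export or access log, then group visits by the returned name.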
// act ii · stack · layer 5
// minimal dashboard

four metrics
FOUR METRICS.
NO MORE.
start small.
measure something.
beats nothing.
1. Visits by AI referrer: An undercounted baseline. The trend over time is what matters, not the absolute number.
2. Conversion / engagement by referrer: Pages, time, goal completions. Does the Adobe reversal pattern from Act I hold for you?
3. Citation share: Pick ten queries. Re-run them weekly. Log which sites get cited and track your share over time.
4. Crawler share: Grep your access log weekly. Training-tier vs citation-tier bots. The bandwidth picture you don't see in GA.
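For the citation-share metric, a weekly run reduces to a mapping from each tracked query to the domains an assistant cited for it. A Python sketch, assuming you record that mapping by hand or with whatever tool you use; it reports two numbers, your share of all citations and the fraction of queries where you were cited at least once. The function name and data shape are hypothetical.

```python
def citation_share(week: dict[str, list[str]], my_domain: str) -> tuple[float, float]:
    """Return (share of all citations that are yours, share of queries citing you).

    `week` maps each tracked query to the domains the assistant cited for it.
    """
    total = sum(len(cited) for cited in week.values())
    mine = sum(cited.count(my_domain) for cited in week.values())
    covered = sum(1 for cited in week.values() if my_domain in cited)
    share = mine / total if total else 0.0
    coverage = covered / len(week) if week else 0.0
    return share, coverage


week = {
    "best wordpress caching plugin": ["wikipedia.org", "example.com"],
    "wordpress robots.txt ai crawlers": ["reddit.com"],
    "schema @id graph wordpress": ["example.com", "wikipedia.org"],
}
print(citation_share(week, "example.com"))
```

Logging both numbers separates "cited more often where you already appear" from "appearing for more of your queries", which tend to move for different reasons.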
// bridge
// what's next

the callable web
Schema is the foundation. Several proposals are competing for what gets built on top — and we don't yet know which one wins. The next layer isn't
about being readable.
It's about being callable.
NLWeb · MCP-for-web · llms.txt · agent-skill manifests. Pick wisely. Or don't: the prerequisite work is the same.
— a.s.
// section divider
// act three of three
4 min
4 slides
ACT III.
DO THE WORK
▸ ACTION
▸ TONIGHT
Two timelines. Tonight, and the next thirty days.
You can be ahead of 96 % of the web before you sleep.
// act iii · checklist
// figure 23.1

same-day actions
DO THIS
TONIGHT.
▸ ACTION
▸ ITEMS
1. Audit your robots.txt: Make an explicit decision per tier (training, citation, agent).
2. Verify your schema graph: @id cross-references, not isolated JSON-LD blobs.
3. Attribute-complete schema: Organization. Person. Article. Stub schema underperforms no schema at all.
4. Run a readiness scanner: One scan today. Capture the baseline score.
5. Add llms.txt: A Markdown index at the site root. Five minutes. Cheap groundwork for the next layer.
6. Add referrer tracking: ChatGPT, Perplexity, Copilot, Gemini. Without it, you're blind.
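The llms.txt proposal (llmstxt.org) describes a Markdown file with an H1 site name, a blockquote summary, and H2 sections that list links with optional notes. A Python sketch that renders that shape from a dict; `build_llms_txt` and the example pages are hypothetical, and how you serve the result at `/llms.txt` (static file, plugin, rewrite rule) is up to you.

```python
def build_llms_txt(site_name: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render an llms.txt-style Markdown index: H1, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            # Each entry is a Markdown link, optionally followed by ": note".
            lines.append(f"- [{title}]({url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines)


print(build_llms_txt(
    "Example Site",
    "Notes on running an AI-ready WordPress site.",
    {"Guides": [
        ("Robots policy", "https://example.com/robots-policy/", "How we treat each crawler tier"),
        ("About", "https://example.com/about/", ""),
    ]},
))
```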
// act iii · checklist
// figure 24.1

the 30-day arc
FOUR WEEKS.
FOUR THEMES.

FOUR
WEEKS

1. Week 01 · Instrument: Wire the dashboard. Baseline referrals, citations, crawler share. Measurement is the slowest to start, so front-load it.
2. Week 02 · Schema: Bring the top 10 pages to full attribute-completeness. Don't add types; fill in what's there.
3. Week 03 · Content: Top 5 cornerstone articles. Citations, quotations, statistics. Five done well beats fifty half-done.
4. Week 04 · Review & publish: Iterate where the signal is. Publish your AI-readiness page on your site, a public commitment for round two.
// act iii · take it home
// interactive guide

audit your own site
RUN THE
CHECKLIST.
▸ interactive guide
Walk every layer
on your own site,
one tick-box at a time.

Crawlers. Robots. Schema.
Content. Measurement.
— start tonight.
▸ scan to start QR code to the interactive checklist at tiki.tf/wceu2026
tiki.tf/wceu2026
// the end
// thank you + q&a
fin.
over to you
THANK YOU.
QUESTIONS?
▸ your turn
Ask me anything.

Pick a layer,
share what's working,
or push back on a claim.
— let's talk.
▸ scan to connect QR code to connect with Alain Schlesser
@schlessera · alainschlesser.com