Your robots.txt is theater. The actual levers are somewhere else.
Most sites believe they've managed AI access because they edited robots.txt. Robots.txt is a polite request. Compliant crawlers honor it, sort of, and non-compliant ones don't read it. Neither case produces evidence.
The load-bearing mechanism is the verified-bot firewall rule inside your CDN. It checks User-Agent and reverse-DNS against the vendor's published IP ranges. Cloudflare exposes it as cf.client.bot. Every serious CDN has an equivalent. Almost nobody is using it on purpose.
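What the CDN rule actually does under the hood is forward-confirmed reverse DNS: resolve the client IP to a hostname, require the hostname to sit under the vendor's published domain, then resolve it forward and confirm it round-trips to the same IP. A minimal sketch (the suffix list is an assumption; pull the current one from each vendor's docs):

```python
import socket

# Published rDNS suffixes for common verified crawlers.
# Assumption: check vendor documentation for the current list.
VERIFIED_SUFFIXES = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "Bingbot": (".search.msn.com",),
}

def is_verified_bot(ip: str, vendor: str,
                    reverse=socket.gethostbyaddr,
                    forward=socket.gethostbyname) -> bool:
    """Forward-confirmed reverse DNS: the check robots.txt cannot give you."""
    suffixes = VERIFIED_SUFFIXES.get(vendor, ())
    try:
        host = reverse(ip)[0]           # 1. reverse lookup: IP -> hostname
    except OSError:
        return False
    if not host.endswith(suffixes):     # 2. hostname must be under vendor domain
        return False
    try:
        return forward(host) == ip      # 3. forward lookup must round-trip
    except OSError:
        return False
```

The resolver arguments are injectable so the logic can be tested without live DNS; in production the defaults do real lookups.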
We proved the mechanism on ourselves. Here's what 14 days of one misconfigured firewall rule looked like:
ChatGPT referrals (30d): 540 → 73 sessions (-86%)
Google organic (same window): 57 → 93 sessions (+63%)
Claude referrals: unchanged
Gemini referrals: unchanged
Bingbot blocks (23h snapshot): 86 events / 78 unique Microsoft IPs
The fix: one firewall clause, and not cf.client.bot
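The headline percentages above follow directly from the raw session counts; a one-liner reproduces them:

```python
def pct_change(before: int, after: int) -> int:
    """Percent change, rounded to the nearest whole percent."""
    return round((after - before) / before * 100)

# Reproduces the incident-window numbers
assert pct_change(540, 73) == -86   # ChatGPT referrals
assert pct_change(57, 93) == 63     # Google organic
```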
One vendor starved. Three untouched. One missing firewall clause. We didn't set out to demonstrate the mechanism — our own auto-malicious-IP rule drifted over Bingbot's range and ChatGPT stopped seeing us. That diagnostic trail is the product.
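Diagnosing that kind of drift is mechanical once you have both lists: check whether any blocked CIDR overlaps a vendor's published crawler range. A sketch using Python's stdlib ipaddress module (the example CIDRs are illustrative, not authoritative; pull real ranges from the vendor's published JSON):

```python
import ipaddress

def overlapping(blocked_cidrs, vendor_cidrs):
    """Return blocked rules that collide with a verified crawler's ranges."""
    vendor = [ipaddress.ip_network(c) for c in vendor_cidrs]
    return [b for b in blocked_cidrs
            if any(ipaddress.ip_network(b).overlaps(v) for v in vendor)]

# Hypothetical block list vs. a hypothetical crawler range
print(overlapping(["40.77.167.0/24", "203.0.113.0/24"],
                  ["40.77.167.0/24"]))
```

Running this nightly against each vendor's range file is the difference between a same-night fix and a two-week traffic hole.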
Read the full write-up with receipts →
The Honest Version
We did not invent this mechanism. Cloudflare did. Google did. Microsoft did. What we did was run our own site into a wall and discover the mechanism is load-bearing for AI visibility — and that almost nobody is using it on purpose. The product is: doing it continuously, per-vendor, with receipts, and diagnosing it fast when it breaks.
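The per-vendor piece can be sketched simply: classify each session by referrer hostname, then flag any vendor whose count falls far below its baseline. The hostnames below are an assumption to verify against your own analytics; vendors add hostnames over time, which is exactly how detection goes stale:

```python
from urllib.parse import urlparse

# Assumption: referrer hostnames per AI vendor; verify and extend these.
AI_REFERRERS = {
    "ChatGPT": {"chat.openai.com", "chatgpt.com"},
    "Claude": {"claude.ai"},
    "Gemini": {"gemini.google.com"},
}

def vendor_for(referrer_url: str):
    """Map a raw referrer URL to an AI vendor, or None if unrecognized."""
    host = urlparse(referrer_url).hostname or ""
    for vendor, hosts in AI_REFERRERS.items():
        if host in hosts or any(host.endswith("." + h) for h in hosts):
            return vendor
    return None

def drops(current: dict, baseline: dict, threshold: float = 0.5):
    """Vendors whose session count fell below threshold * baseline."""
    return [v for v, base in baseline.items()
            if base and current.get(v, 0) < base * threshold]
```

With a 0.5 threshold, a 540-to-73 collapse trips the alarm on day one instead of day fourteen.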
Who built this, and why you should believe the receipts.
We are DugganUSA LLC — a two-person shop in Minneapolis that built the first commercial HAIC benchmark for AI brand perception, audits 250+ domains on a public leaderboard, publishes a STIX threat feed consumed in 46 countries, and holds 35 patents with #104 in filing on exactly the mechanism described above.
228 domains on leaderboard
1st commercial HAIC benchmark
35 patents (#104 in filing)
When our own site lost 86% of ChatGPT traffic in two weeks, we had the telemetry to see it, the audit log to prove it, and the diagnostic muscle memory to fix it the same night. That's what we sell.
95% Epistemic Humility
We cap confidence at 95%. Murphy was an optimist. Something will be wrong on any given day — probably the crawler-ID list went stale, or a CDN shipped a breaking change, or a new model's bot appeared before we added detection. The receipts are real. The guarantees aren't absolute. Nobody's are. We just say it out loud.