lr001328
We built a free scanner that checks 7 layers of AI discoverability — llms.txt, JSON-LD structured data, OpenAPI spec, A2A agent cards, health endpoints, robots.txt/sitemap, and whether you have a machine-readable service catalog.
You enter a URL, it streams results in real time via SSE, and gives you a score out of 100 with specific findings per layer.
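For anyone curious what consuming that stream looks like, here is a minimal sketch of parsing SSE events into per-layer findings. The payload shape (`layer`, `found`, `points`) is an assumption for illustration, not the scanner's documented API:

```python
# Minimal SSE parsing sketch. The event payload fields are hypothetical,
# not the scanner's actual wire format.
import json

def parse_sse(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    buffer = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            buffer.append(line[len("data:"):].strip())
        elif line == "" and buffer:
            # A blank line terminates one SSE event
            yield json.loads("\n".join(buffer))
            buffer = []

# Example stream (hypothetical findings)
stream = [
    'data: {"layer": "llms.txt", "found": false, "points": 0}',
    "",
    'data: {"layer": "json-ld", "found": true, "points": 20}',
    "",
]
events = list(parse_sse(stream))
score = sum(e["points"] for e in events)  # running total toward the /100 score
```

In practice you'd read the lines from a streaming HTTP response instead of a list, but the framing rules (data lines, blank-line delimiter) are the same.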
Why we built it: 80% of URLs cited by ChatGPT, Perplexity, and Copilot don't rank in Google's top 100 for the same query. AI discovery is a fundamentally different layer from traditional search — and most sites are completely invisible to it.
Some things we learned building the audit engine:
- Structured data matters most. Sites with proper JSON-LD schema see measurably higher AI citation rates. Microsoft has confirmed schema markup helps their LLMs.
- llms.txt is aspirational. We check for it, but we should be honest: no major AI platform has publicly confirmed that it reads the file, and our statistical analysis shows no correlation with citation rates. We still think it's worth having as a context primer for dev docs, but it's not the silver bullet some people think it is.
- AI crawlers don't execute JavaScript. GPTBot, ClaudeBot, PerplexityBot — none of them run JS. If your site is a client-rendered SPA with no SSR, AI agents see an empty page.
- The A2A protocol is early but interesting. Google's Agent-to-Agent spec includes agent cards at /.well-known/agent-card.json. Almost nobody has one yet, but the spec exists and crawlers are starting to look for it.
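Since AI crawlers only see raw HTML (no JS execution), a structured-data check reduces to scanning the server-rendered source for `<script type="application/ld+json">` blocks. A rough sketch of that layer, assuming Python's stdlib HTML parser (the real audit engine may work differently):

```python
# Sketch: find and decode JSON-LD blocks in raw (server-rendered) HTML,
# roughly how a structured-data audit layer might work. Details assumed.
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_ld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_ld = False

    def handle_data(self, data):
        if self.in_ld and data.strip():
            try:
                self.blocks.append(json.loads(data))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is itself a reportable finding

html = '''<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "SoftwareApplication", "name": "Example"}
</script>
</head><body></body></html>'''
parser = JSONLDExtractor()
parser.feed(html)
```

Note that this is run against the HTML as fetched, with no JS execution: if your SPA injects its JSON-LD client-side, `parser.blocks` stays empty, which is exactly what GPTBot and friends see.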
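Several of the layers above boil down to probing well-known paths and checking for a 200. A sketch of that probe, with a fetcher injected so it's testable; the path list mirrors the layers we scan, but the exact paths and any scoring weights here are illustrative assumptions:

```python
# Sketch of probing well-known discovery endpoints. Paths mirror the
# layers described above; the mapping is illustrative, not our exact list.
WELL_KNOWN = {
    "/llms.txt": "llms.txt context primer",
    "/.well-known/agent-card.json": "A2A agent card",
    "/openapi.json": "OpenAPI spec",
    "/robots.txt": "crawler policy",
}

def probe(fetch):
    """fetch(path) -> HTTP status; return the layers that answered 200."""
    return {path: desc for path, desc in WELL_KNOWN.items() if fetch(path) == 200}

# Fake fetcher standing in for real HTTP requests against a target host
found = probe(lambda path: 200 if path in ("/robots.txt", "/llms.txt") else 404)
```

In production you'd swap the lambda for a real HTTP client with timeouts and redirect handling; injecting the fetcher keeps the scoring logic trivially testable.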
Try it: https://clarvia.dev