reddalo
5 hours ago
I wish people would follow this, instead of coming up with new standards in the root namespace. "llms.txt" [1] comes to mind, for example.
Let's stop polluting the root of a domain!
rickette
4 hours ago
LLMs.txt is also nonsense since it isn't adopted by any of the major AI players.
networked
3 hours ago
Google has recently added `llms.txt` to Chrome's Lighthouse check for agentic browsing (https://searchengineland.com/google-llms-txt-chrome-lighthou...), so adoption may be coming. Admittedly, I put more faith in
<link rel="alternate" type="text/markdown" href="https://example.com/foo.md" title="Markdown version of the <Foo> page">
that I copied from Gwern.net. This convention is discoverable (just read the HTML) and naturally adapts to any website size and structure.
I have created an `llms.txt` for my website anyhow. I use a fixed LLM prompt to generate it from the internal links in `index.md`.
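For anyone wondering what "just read the HTML" could look like on the client side, here is a minimal sketch using only Python's standard library. The class and function names (`MarkdownAlternateFinder`, `find_markdown_alternates`) are made up for illustration; this is not how any particular crawler or agent does it.

```python
from html.parser import HTMLParser


class MarkdownAlternateFinder(HTMLParser):
    """Collects href values of <link rel="alternate" type="text/markdown"> tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attributes = {name: (value or "") for name, value in attrs}
        rels = attributes.get("rel", "").lower().split()
        is_markdown = attributes.get("type", "").lower() == "text/markdown"
        if "alternate" in rels and is_markdown and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_markdown_alternates(html: str) -> list[str]:
    """Return the hrefs of all Markdown alternates declared in the page."""
    finder = MarkdownAlternateFinder()
    finder.feed(html)
    return finder.hrefs
```

A client would fetch the page, call `find_markdown_alternates(page_html)`, and request the first result if there is one. Relative `href` values would still need resolving against the page URL, which this sketch leaves out.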
iamacyborg
3 hours ago
Offering a Markdown version of a page seems like an interesting choice, compared with just adding schema markup to the HTML one.
vidarh
2 hours ago
Every page on code.claude.com has a markdown version available by just appending ".md", and Claude Code knows about it. E.g:
9dev
2 hours ago
After some consideration, I also applied this convention to every site I build, including content negotiation: clients can either send an Accept header with their preference, or append an explicit extension (.md|.markdown for Markdown, .json for JSON API responses, or .html for the human HTML page). Together with the content negotiation part, it feels very much like how HTTP was intended to work, especially the fact that API clients, AI agents, and humans all use the same URLs but get the content in the shape they need.
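Roughly, the selection logic described above could be sketched like this in plain Python. To be clear about what is assumed: the names `EXTENSION_TYPES`, `SUPPORTED`, `parse_accept`, and `negotiate` are hypothetical, letting an explicit extension win over the Accept header is my guess at a sensible precedence, and falling back to HTML is likewise an assumption. Wildcards other than `*/*` are ignored for brevity.

```python
import posixpath

# Assumed mapping from explicit URL extensions to media types.
EXTENSION_TYPES = {
    ".md": "text/markdown",
    ".markdown": "text/markdown",
    ".json": "application/json",
    ".html": "text/html",
}
# Representations the (hypothetical) site can produce; the first is the default.
SUPPORTED = ("text/html", "text/markdown", "application/json")


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media_type, q) pairs, highest q first; ties keep header order."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type, q))
    return sorted(entries, key=lambda entry: -entry[1])


def negotiate(path: str, accept: str = "") -> tuple[str, str]:
    """Return (resource_path, media_type) for a request path and Accept header."""
    base, ext = posixpath.splitext(path)
    if ext.lower() in EXTENSION_TYPES:
        # An explicit extension overrides the Accept header (assumed precedence).
        return base, EXTENSION_TYPES[ext.lower()]
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return path, media_type
        if media_type == "*/*":
            return path, SUPPORTED[0]
    return path, SUPPORTED[0]
```

With this, `negotiate("/docs/intro.md")` and `negotiate("/docs/intro", "text/markdown")` both resolve to the Markdown representation of `/docs/intro`, so every kind of client shares one URL space.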
vidarh
an hour ago
I've done this off and on for various sites over the years too, and probably should be more consistent about it. A number of sites do or used to do some variation of this, and I wish it was more widespread. E.g. Reddit will serve up a JSON version of a subreddit if you do /r/subreddit.json
dspillett
3 hours ago
The same could be said of robots.txt
And anything else that might tell them not to access something.