agentultra
7 months ago
A nice attempt, and another layer in the Swiss cheese of technology it will take to ease the burden AI companies are putting on people who run websites.
I'd be cautious about relying solely on Cloudflare's goodwill.
It's unfortunate that we need honeypots and tarpits to trap AI scrapers just so that our hosting bills don't get hosed. It's taking a good chunk of value out of running a site on the Internet.
OutOfHere
7 months ago
Feel free to waste your expensive outgoing bandwidth running malware. It's a genius idea, really, from the cloud companies to enrich themselves.
Definitely don't rewrite your web server more efficiently in Rust instead. /s
Retric
7 months ago
Serving poisoned text can be so cheap it’s effectively free as long as you don’t give them a lot of links.
OutOfHere
7 months ago
Another thing that doesn't make sense is why it has to be poisoned text. Why can't it just be a mix of whitespace? I doubt anyone is using LLMs with streaming inputs to determine whether to continue reading the page.
Retric
7 months ago
Companies actively harming you should be discouraged, preferably by running them out of business. Whitespace doesn’t do that, and it makes it easy for them to detect when their crawlers fail.
Swapping meanings poisons the LLM while making it really difficult for a preprocessing step to tell good inputs from bad ones.
Mars008
7 months ago
Yeah, and say goodbye to Google Search. You didn't want to be there anyway, right?
Retric
7 months ago
Google makes its bot easy to identify. People often want to do that so they can give it more access.
People are concerned about AI companies because they’re ignoring robots.txt and similar conventions.
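For reference, the usual way to identify Googlebot is a reverse DNS lookup on the connecting IP followed by a forward lookup on the resulting hostname, which is the method Google documents for verifying its crawlers. A minimal sketch, assuming IPv4 and the Python standard library resolver; the function name `is_verified_googlebot` and the suffix list are illustrative (check Google's docs for the current domains), and the resolvers are injectable only so the logic can be tested without network access.

```python
import socket
from typing import Callable, List, Tuple

# Assumed crawler domains; Google's documentation is the authority on the current list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_googlebot(
    ip: str,
    reverse: Lookup = socket.gethostbyaddr,
    forward: Lookup = socket.gethostbyname_ex,
) -> bool:
    """Hypothetical helper: reverse-then-forward DNS check (IPv4 only)."""
    try:
        # Step 1: reverse lookup the connecting IP to get its PTR hostname.
        host = reverse(ip)[0].lower().rstrip(".")
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    # Step 2: the hostname must sit under a Google crawler domain.
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # Step 3: forward lookup must map back to the same IP; otherwise anyone
        # who controls their own PTR record could claim to be Googlebot.
        _, _, addresses = forward(host)
    except OSError:
        return False
    return ip in addresses
```

In a real server you would cache the result per IP, since two DNS lookups per request add noticeable latency.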
techjamie
7 months ago
Many of these tarpits deliberately serve data at an excruciatingly slow rate to limit the load on server resources. It's cheaper than quickly serving the same crawlers your entire website at full speed, constantly.
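A rough sketch of the slow-drip idea, assuming Python's asyncio: each connection gets a few bytes, then a sleep, so the crawler's connection stays open while the server does almost no work per second. The names `chunks`, `drip`, `handle`, and `main`, the port, and the chunk size and delay values are arbitrary choices for illustration, not taken from any particular tarpit project.

```python
import asyncio
from typing import Iterator


def chunks(body: bytes, size: int) -> Iterator[bytes]:
    """Split body into pieces of at most `size` bytes."""
    for i in range(0, len(body), size):
        yield body[i : i + size]


async def drip(writer, body: bytes, chunk_size: int = 16, delay: float = 2.0) -> int:
    """Send body a few bytes at a time, sleeping between pieces; returns bytes sent."""
    sent = 0
    for piece in chunks(body, chunk_size):
        writer.write(piece)
        await writer.drain()  # wait if the client isn't reading (backpressure)
        sent += len(piece)
        await asyncio.sleep(delay)  # idle time costs the server almost nothing
    return sent


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Hypothetical handler: read the request headers, then answer very slowly."""
    try:
        await reader.readuntil(b"\r\n\r\n")
        writer.write(b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n")
        await drip(writer, b"nothing to see here " * 200)
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError, ConnectionError):
        pass  # client gave up or sent something we couldn't parse
    finally:
        writer.close()


async def main(host: str = "127.0.0.1", port: int = 8080) -> None:
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(main())`. Each stalled client still holds a socket and a coroutine, so in practice you would also want a cap on concurrent connections.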
OutOfHere
7 months ago
If we are going for cheaper, how is it cheaper than an HTTP 429 error? It's not.
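For comparison, a 429 really is cheap to emit. RFC 6585 defines 429 Too Many Requests and says the response may include a `Retry-After` header. A minimal per-client token-bucket sketch that decides between 200 and 429; the class name `RateLimiter`, its `rate`/`capacity` parameters, and keying on a client string (e.g. an IP) are assumptions for illustration. Whether crawlers actually honor the 429 is a separate question.

```python
import math
import time
from typing import Callable, Dict, Tuple


class RateLimiter:
    """Hypothetical per-client token bucket: `rate` requests/second (> 0), bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the logic can be tested deterministically
        self.buckets: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last_seen)

    def check(self, client: str) -> Tuple[int, Dict[str, str]]:
        """Return (status_code, extra_headers) for one request from `client`."""
        now = self.clock()
        tokens, last = self.buckets.get(client, (float(self.capacity), now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client] = (tokens - 1, now)
            return 200, {}
        self.buckets[client] = (tokens, now)
        # Seconds until one full token is available, rounded up (Retry-After takes integers).
        wait = math.ceil((1 - tokens) / self.rate)
        return 429, {"Retry-After": str(wait)}
```

A server would call `check(client_ip)` before doing any real work and return an empty 429 when asked to, which costs only a few bytes.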
felurx
7 months ago
I suppose they're going for a trade-off between "cheaper" and "satisfying" (as in, the satisfaction of sticking it to unethical, shitty companies).
DamonHD
7 months ago
In my observation, virtually nothing pays attention to 429s. More things pay attention to 500s and 503s, although some treat those as a trigger to re-poll immediately.