kitku
5 days ago
This reminds me of the Nepenthes tarpit [1], which endlessly generates garbled pages on the fly, each one linking back into the tarpit over and over.
Probably more effective at poisoning the dataset if one has the resources to run it.
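For anyone wondering what that looks like in practice, here is a rough Python sketch of the general idea. It is not Nepenthes' actual code. The corpus, the function names (`build_chain`, `babble`, `render_page`), and the link scheme are all made up for illustration. It assumes a word-level Markov chain for the text and seeds the random generator from the requested path, so every URL returns a stable page whose links point deeper into the same maze.

```python
import hashlib
import random
from collections import defaultdict

# Hypothetical seed corpus; a real tarpit would train on far more text.
CORPUS = (
    "the crawler follows every link and the link leads to another page "
    "and the page holds more text and the text holds another link"
)

def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def babble(chain, rng, length=40):
    """Walk the chain to produce meaningless but word-shaped text."""
    word = rng.choice(sorted(chain))
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Restart from a random known word if we hit a dead end.
        word = rng.choice(followers) if followers else rng.choice(sorted(chain))
        out.append(word)
    return " ".join(out)

def render_page(path, chain, n_links=5):
    """Return an HTML page for `path` whose links lead to more generated pages."""
    # Seed from the path so the same URL always yields the same page.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    base = path.rstrip("/")
    links = [f"{base}/{rng.getrandbits(32):08x}" for _ in range(n_links)]
    anchors = "\n".join(f'<a href="{href}">{href}</a>' for href in links)
    return f"<html><body><p>{babble(chain, rng)}</p>\n{anchors}</body></html>"
```

Calling `render_page("/trap", build_chain(CORPUS))` gives a page with five links under `/trap/`, and each of those paths can be rendered the same way, so a crawler that follows links never runs out of pages. No state needs to be stored, since each page is recomputed from its path.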
fleebee
5 days ago
I'm running Iocaine [1], which is essentially the same thing, on my tiny $3/mo VPS, and it's handling crawlers bombarding the honeypot with ~12 requests per second just fine. It's using about 30 MB of RAM.
treetalker
5 days ago
Odorless, tasteless, and among the more deadly poisons known to crawlers!
BrenBarn
4 days ago
Unfortunately they will spend the next several years building up an immunity.
8organicbits
4 days ago
Do we know if LLM scrapers are running JavaScript on the pages? If they are, maybe it's worth offloading the Markov model to the client side.