phdelightful
16 hours ago
I just put Anubis in front of my self-hosted forge this morning because AmazonBot had helped itself to 750 GiB (!) of traffic to my public repos this month!
At least, it claimed to be AmazonBot…
faangguyindia
40 minutes ago
In my logs it appears like this:
BOT","cluster_name":"EU","cluster_region":"EU","connection_type":"corporate","country":"US","device_type":"ROBOT","duration_ms":0.391,"duration_us":391,"filter":"","ip":"52.1.106.130","isp":"Amazon.com, Inc.","level":"info","msg":"Request evaluated","org":"Amazon.com, Inc.","os":"","ref":"","region":"Virginia","result":false,"time":"2026-05-15T13:33:20Z","ua":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36","why":"bot"}
3.227.180.70
23.21.175.228
23.23.137.202
from all these IPs.
Bender
15 hours ago
Are they in this space? [1] One could map the ranges into a web daemon and rate limit them, or just run 'ip route add blackhole ${cidr}' for each CIDR block.
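For anyone who wants to script the blackhole approach, here is a minimal sketch that turns a list of CIDR blocks into route commands. The function name `blackhole_commands` is made up for illustration, and the sketch only prints the commands rather than running them, so you can review the list first. The example block 44.192.0.0/10 is the AWS range mentioned elsewhere in this thread.

```python
import ipaddress


def blackhole_commands(cidrs):
    """Return one 'ip route add blackhole' command per CIDR block.

    Hypothetical helper: it only builds command strings, it does not
    touch the routing table.
    """
    commands = []
    for cidr in cidrs:
        # strict=False normalises entries that have host bits set,
        # e.g. 44.210.204.255/10 becomes 44.192.0.0/10.
        network = ipaddress.ip_network(cidr.strip(), strict=False)
        # Use the explicit IPv6 form of the command for IPv6 prefixes.
        prefix = "ip -6 route" if network.version == 6 else "ip route"
        commands.append(f"{prefix} add blackhole {network}")
    return commands


if __name__ == "__main__":
    for command in blackhole_commands(["44.192.0.0/10"]):
        print(command)
```

Printing instead of executing is deliberate: a typo in a prefix length can blackhole far more than intended, so it is worth reading the output before piping it to a shell as root.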
rnhmjoj
8 hours ago
I just do this for the IP ranges of Amazon, OpenAI, Huawei and other companies that run these insane crawlers: it's 100% effective and it doesn't annoy real users with a captcha or some PoW thing. There's simply no reason for them to reach my homeserver other than to scrape the hell out of it.
phdelightful
2 hours ago
I didn't check thoroughly, but the first one I happened to grep out was not on that list:
"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36"
"x-forwarded-for":"44.210.204.255" "x-real-ip":"44.210.204.255"
This is a bit outside my area of expertise, so I don't know how reliable these x-forwarded-for and x-real-ip are.
Bender
2 hours ago
One of the places to look it up would be bgp.tools [1]. The IP is purported to belong to Amazon, and the ASN has some interesting tags [2]. Any form of forwarded-for header can be spoofed and should only be trusted when it comes from an expected upstream proxy such as a CDN, which should have a CDN-specific IP header listed in its documentation. Typically the first column in access logs will be the REMOTE_ADDR, which is the actual network connection, but if you are using a CDN that would be the CDN's IP.
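As a rough sketch of that rule: only honour X-Forwarded-For when the connection itself comes from a proxy you expect, and otherwise fall back to the connection address. The function `client_ip` and the `trusted_proxies` list are assumptions for illustration; a real CDN will document its own header and IP ranges, which should be used instead.

```python
import ipaddress


def client_ip(remote_addr, forwarded_for, trusted_proxies):
    """Pick the client IP, honouring X-Forwarded-For only from trusted proxies.

    remote_addr     -- address of the actual network connection
    forwarded_for   -- raw X-Forwarded-For header value, or None
    trusted_proxies -- CIDR blocks of proxies you operate or expect (assumed)
    """
    trusted = [ipaddress.ip_network(cidr) for cidr in trusted_proxies]
    peer = ipaddress.ip_address(remote_addr)

    # The header is attacker-controlled unless a trusted proxy set it.
    if not forwarded_for or not any(peer in net for net in trusted):
        return remote_addr

    # Walk the hop list right to left, skipping our own proxies; the first
    # untrusted hop is the closest thing to a real client address.
    for hop in reversed([h.strip() for h in forwarded_for.split(",")]):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            return remote_addr  # malformed header: ignore it entirely
        if not any(addr in net for net in trusted):
            return hop
    return remote_addr
```

With `trusted_proxies=["10.0.0.0/8"]`, a request arriving directly from 203.0.113.7 keeps that address no matter what the header says, while a request arriving from 10.0.0.5 with the header `44.210.204.255` is attributed to 44.210.204.255.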
If a CDN does not have an option to block cloud and Tor CIDR blocks then that should be a feature request.
44.210.204.255 is included in 44.192.0.0/10 which is listed in the AWS CIDR ranges. Use one of the online subnet calculators to find IP ranges of CIDR blocks. This is likely a Tor exit node.
Blocking the CIDR blocks I listed in the thread would have included this node as well. Here [3] are a few shell functions for getting some of the cloud CIDR blocks. I must have been inebriated when I wrote those. This site may not be reachable during blood moons or when the nanosecond is divisible by zero.
Here [4a][4b] are a couple of decent subnet calculators. There are some command line tools for playing with CIDR blocks and IP addresses to see if an IP is included in a CIDR block, but this varies by Linux distribution, so perhaps look for a generic Python script.
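For what it's worth, Python's standard `ipaddress` module can do this check without any distribution-specific tools. A minimal sketch, using the address and block from this thread (the helper name `ip_in_cidr` is made up):

```python
import ipaddress


def ip_in_cidr(ip, cidr):
    """Return True if the address falls inside the CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


network = ipaddress.ip_network("44.192.0.0/10")

print(ip_in_cidr("44.210.204.255", "44.192.0.0/10"))  # True
print(network[0], network[-1])  # 44.192.0.0 44.255.255.255
```

Indexing the network with `[0]` and `[-1]` gives the first and last address of the block, which is the same range an online subnet calculator would report.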
To get a list of Tor exit nodes to blackhole route, look at [5]. This updates often. Just clone the entire repo. Unless your site is related to government dissent or anonymous porn then most traffic from Tor exit nodes will likely just be bots and thus riff-raff.
Seconds after I linked realhackers, bots showed up and got a zero byte response. Poor lil HN servers must get a lot of trash nonstop. I hope I get some delicious bots today.
[1] - https://bgp.tools/
[2] - https://bgp.tools/as/14618
[3] - https://ai.realhackers.org/_get_cloud_cidr.txt
[4a] - https://mxtoolbox.com/subnetcalculator.aspx
[4b] - https://www.vultr.com/resources/subnet-calculator/
[5] - https://github.com/firehol/blocklist-ipsets/blob/master/clea...
Symbiote
7 hours ago
That's all of Amazon AWS, not just Amazon's AI system.
Bender
3 hours ago
Yup, mostly. There are more ranges for the Amazon store too.
It would be rather nifty if Amazon and other companies would confine AI crawlers to specific CIDR blocks or a dedicated ASN, but I would not hold my breath on that one. AI crawlers will likely muddy the waters for everyone else.
lofaszvanitt
7 hours ago
That list is a tad too long. Why don't they enforce a rule on these big corps to publicly state which range does what?
Bender
3 hours ago
That would indeed be handy, but I think the answer is that people would block specific ranges. By not segmenting into specific groups, people are forced to either:
- play the game of whack-a-mole
- use difficult implementations of user validation checks that potentially cause pain for real humans
- block all Amazon CIDR blocks, which they know most corporations will not do.
This forces the majority to just tolerate whatever comes out of their networks.
userbinator
11 hours ago
> At least, it claimed to be AmazonBot
It's good that you mentioned this; smear campaigns are definitely not a new thing, and I suspect a lot of this DDoS'ing that's going on is a plot to accelerate towards Big Tech's authoritarian dystopia. Basically extortion.
faangguyindia
9 hours ago
I see bots with the ClaudeBot user agent using AWS IPs.
I've also seen Google bots with AWS IP ranges. You gotta look at their ASN/ISP/ORG.
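A quick way to eyeball that is to tally which bot the user agent claims to be against the org the IP actually resolves to. This is only a sketch: it assumes JSON log lines with `ua` and `org` fields like the one posted earlier in the thread, and both the regex and the function name `claimed_bot_by_org` are made up for illustration.

```python
import json
import re
from collections import Counter

# Assumed pattern: pulls "Amazonbot" out of "...compatible; Amazonbot/0.1; ...".
BOT_TOKEN = re.compile(r"compatible; ([A-Za-z]+)/")


def claimed_bot_by_org(log_lines):
    """Count (claimed bot, org) pairs across JSON log lines."""
    counts = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(record, dict):
            continue
        match = BOT_TOKEN.search(record.get("ua", ""))
        bot = match.group(1) if match else "unknown"
        counts[(bot, record.get("org", ""))] += 1
    return counts
```

Any pair where the claimed bot and the org don't belong together, such as a Google-branded user agent coming from an Amazon org, is worth a closer look.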
nathanmills
15 hours ago
Do you have a robots.txt?
xena
15 hours ago
> We are writing to inform you that starting Monday, June 15, 2026, crawl preferences for Amazonbot will be managed solely through the industry-standard directives.
They will in the future, but not today.