losteric
5 hours ago
> Plain headless Chromium is easy to detect by websites with anti-bot measures. Plain headless Chromium avoided getting blocked by websites only 2% of the time, according to our stealth benchmark.
> Our browsers avoid blocks 81% of the time on our stealth benchmark, and 84.8% on Halluminate BrowserBench, the highest of any provider.
Seems very unethical, no? Who uses service providers like this? The whole point of anti-bot measures is to get rid of bots - you are not wanted there.
These kinds of services inevitably make the web more human-hostile and expensive. Websites will continue pushing back on automated usage, meaning more hurdles to access content.
No doubt part of why we see this push for verified ID on the web - not just age gating and "protect the children", but also protect sites from bots, and protect ad revenue (not a statement of support; just seems like an obvious higher order effect)
baby_souffle
2 hours ago
> Who uses service providers like this?
I use change detection to monitor all sorts of websites for changes. Some of my favorite authors don't have RSS. I always set up price monitoring for any big ticket item I'm considering like appliances so I can see how their pricing changes over time. I also use scrapers for websites that don't have an API. I like having all of my purchase history indexed in a database where I can do analysis.
> These kinds of services inevitably make the web more human-hostile and expensive.
I would rather not have to spend more time circumventing stupid bot detection things. I would be more than happy to pay for access to some of this data that I cannot access any other way.. but sure, let's keep burning resources on a cat and mouse game that scrapers will always be able to win.
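For anyone curious what the change detection I mentioned boils down to, here is a minimal sketch, assuming the page text has already been fetched by some other means. The names `fingerprint` and `has_changed` are made up for illustration and are not taken from any particular tool.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the page text after collapsing whitespace, so that
    formatting-only differences do not register as a change."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: str, current_text: str) -> bool:
    """Compare the stored fingerprint with one computed from the new text."""
    return previous_fingerprint != fingerprint(current_text)
```

Store the fingerprint from the last check, run it once a day, and only look at the page when `has_changed` returns true.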
arianvanp
2 hours ago
The litmus test here is whether they support https://blog.cloudflare.com/introducing-pay-per-crawl/ out of the box or not
They do not.
mikeocool
4 hours ago
Whether or not scraping publicly available websites is unethical is probably up for debate. In some cases at least, courts have found it to be legal, even when the site throws up technical barriers or issues cease-and-desist letters.
What is likely unethical is the fact that they offer residential proxies. The residential providers of those proxies are frequently not aware they’ve been opted in to provide such a service.
eab-
3 hours ago
> courts have found it to be legal
≠ ethical
MayCXC
an hour ago
I built a similar system for an identity protection service that automated removing PII from directory websites like whitepages. Which was less ethical, stealth browser automation or monetized privacy invasion?
embedding-shape
4 hours ago
> Seems very unethical, no? Who uses service providers like this? The whole point of anti-bot measures is to get rid of bots - you are not wanted there.
Unethical just because it does something someone else doesn't want? I guess it depends on why and what the intention is. I don't have time to sit 24/7 in front of a computer to get a ticket to some events, does that mean it's unethical for me to use my own bot so I can purchase a ticket to bands I'm a fan of? Probably not. But if I did so for scalping purposes? Then yeah, I'd agree it's unethical.
The whole point of anti-anti-bot measures is to be able to do things even if others don't think that thing should be automated, so from the hacker news audience, I think quite a lot of us have at one point or another engaged in stuff like that. Doing so merely for profits of course stinks, but for you to be able to have a fighting chance against scalpers? Probably OK.
mystifyingpoi
4 hours ago
> even if others don't think that thing should be automated
It's an interesting thought that can be further explored. Could anything that's considered "unwanted" by a third party be considered unethical, if I do it anyway?
If the hotel self-service restaurant has a sign "don't take the food out" and I take 1 apple in my pocket for a snack, is it unethical? Or maybe the sign is just for people that would otherwise take $100 of watermelons out of the cantina daily and try to resell it on the beach.
turtlebits
4 hours ago
It's unethical because you're intentionally bypassing restrictions. Just because others do it doesn't mean it's okay.
If you saw a sign in a store that said "1 per person" or "for registered guests only", would you ignore it?
orf
3 hours ago
Was Rosa Parks unethical for sitting down on a bus?
The point is that the context matters: both the users context and the context of the restriction. It’s not as clear cut as “ignoring restrictions = bad”.
The restriction itself can be unethical, in the same way that bypassing a restriction can be unethical.
BoorishBears
an hour ago
Woah now, I'm for headless browsers but let's not start comparing any of this to Rosa Parks lol.
The reality is that a lot of interesting things, ranging from trivially harmful to harmless, are illegal, and we still do them anyway.
windexh8er
3 hours ago
Look at what Google's doing right now with Chrome. On June 30 Chrome will remove the last flag that let uBlock keep working, and there's no workaround. Google says it's about security and performance, but is it? $239 billion in ad revenue last year seems to be the motivational factor. The "restriction" is a rule written by the company that profits when you can't block its ads, dressed up as protecting you. But... CISA recommends ad blockers as a defense against malware spread through ad networks.
The rules aren't always right and sometimes have unintended consequences. I think a bigger issue than Browser Use is all of the copyrighted material in every LLM. Given that precedent has been set with zero legal consequences, I'm not sure there's much of a leg for you to stand on here.
embedding-shape
3 hours ago
> It's unethical because you're intentionally bypassing restrictions
I'd still consider why the restriction is there and why I'm thinking of breaking it, before deciding if it's unethical or not.
It depends, basically. Generally I follow the rules and restrictions, but maybe see them more as guidelines or suggestions.
kube-system
2 hours ago
There are many ethical reasons to bypass restrictions. Colloquially, we just call them exceptions.
There are many valid ethical exceptions for evading anti-bot detections. For example: you are a white hat actor scraping a black hat site. There are hundreds of other plausible examples.
jamiequint
3 hours ago
You're confusing law with ethics, they are not the same.
joatmon-snoo
4 hours ago
An example I ran into recently: I wanted to scrape pricing data for used cars, to better inform a friend's decision about what to purchase.
I know there's a relationship between mileage and depreciation, but wanted to have a better sense of what that relationship is to know whether a given car was over or underpriced.
Similarly, if I was pulling that data to build a service of my own to offer to users... is that unethical?
sroussey
4 hours ago
All of these questions are easily answered by the question: can I run the bot on the same PC I use regularly? If so, then do it there. If not, then don’t do it at all.
adolph
an hour ago
> scrape pricing data for used cars
Time was you could get lovely JSON feeds from every site by iterating on the curl command copied from the inspector. Nowadays you can't even use Selenium without Cloudflare getting grouchy. Last fall I had to build my spreadsheet like a cave-person: Ctrl-C, Ctrl-V. It wouldn't be so bad if the dealer aggregators' coverage was xor, but you have to dedupe listings. Then there is the whole issue of online salespeople who don't show up at the dealership.
skybrian
4 hours ago
What do you think of Anubis and Cloudflare? If they block your bot, is that unethical?
Seems like doing business with other people should normally be based on mutual consent, not whatever you can get away with technically.
wnevets
5 hours ago
> Who uses service providers like this?
People who don't want their headless browser to get blocked?
nateb2022
4 hours ago
> Seems very unethical, no? Who uses service providers like this? The whole point of anti-bot measures is to get rid of bots - you are not wanted there.
I'm familiar with companies automating access to software that is only accessible via the web and has poor or no API support. This is software they pay (usually a lot of money) for, and it usually has built-in captchas to guard logins. They aren't a large enough customer to ask for the captchas to be removed or to be whitelisted (they are just one of many SaaS tenants), so they simply work around that restriction.
mystifyingpoi
4 hours ago
> Seems very unethical, no?
I don't think one can judge it ethically without considering the context. Are we talking about mass automated scraping? Or are we talking about me trying to get a good deal by scraping local used car dealership listings once per day for my personal needs (just so I don't have to do it manually)?
One of these is strictly more ethical, but both will be blocked by Cloudflare, for example. I'd happily use such a service in my personal case.
dagi3d
3 hours ago
Obviously I don't know what percentage represents "legit" use cases vs. other, more morally questionable ones, but in our case we have a CMS where the content team can include external links, and we need to verify periodically whether those links still work, which is not as easy as making GET requests with a client.
sillysaurusx
4 hours ago
(I haven't tried this out yet.) My use case would be to take a snapshot of each HN story. This is surprisingly hard, because most websites prevent bots from doing that.
For example, Claude has a lot of trouble reading HN's front page. HN itself is fine, but the moment you ask it to pick out an article, it often chokes. The website has put up a verification captcha, or it's a paywall, etc. Paywalls can be bypassed by reading HN comments and looking for archive links. But those archives often block bots too, so you're back to square one.
Whether it's unethical is an interesting question. I believe I should have the right to do what I want with internet content, as long as I'm not abusive. Merely having a bot isn't abusive. It would be one thing if the bot is hammering a server or vacuuming up training data, but having a bot at all is presently very hard.
This service caught my attention because it could potentially solve the problem I'm running into. Simply taking snapshots of articles that hit HN shouldn't be so hard, but it is. HN sends millions of views to websites; one bot taking a snapshot isn't going to make a difference. I don't think it counts as "unethical" just because we're going against the website owner's wishes. When you post content to the internet, you sign up to share that content with everyone, other than what's denied by robots.txt. If it's not blacklisted by robots.txt, it should be possible for well-behaved bots to access.
I don't expect very many people here to care about the poor bot creators. Most of the bot creators are malicious anyway. But I personally lament the loss of being able to write a program that can process information from the browser in arbitrary ways. You should be able to, yet we're buying into the notion that it's okay for website owners to say "this content is only accessible by approved bots like Google, and everyone else can sod off."
HN proves it doesn't need to be like that. It gets tens of millions of page views a day, a lot of which is bot traffic. HN only uses captchas for creating accounts or logging in. You're free to scrape any content as long as you respect the crawl delay of 30 seconds specified in robots.txt, and don't try to visit links that perform actions a human would take (like adding things to favorites or voting). That's how the internet should work: just deliver content.
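As a rough sketch of what "well-behaved" could look like in practice, Python's standard `urllib.robotparser` can read both the disallow rules and the crawl delay. The robots.txt text below is a made-up example (only the 30 second delay comes from what I described above), and `example.com`, `my-bot`, `load_rules`, `may_fetch` and `delay_seconds` are hypothetical names for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real bot would download the site's own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 30
"""


def load_rules(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, agent: str, url: str) -> bool:
    """True if the rules allow this user agent to request the URL."""
    return parser.can_fetch(agent, url)


def delay_seconds(parser: RobotFileParser, agent: str, default: float = 1.0) -> float:
    """Crawl delay requested by the site, or a fallback if none is given."""
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default
```

A polite loop would call `may_fetch` before each request and `time.sleep(delay_seconds(...))` between requests.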
dist-epoch
3 hours ago
> one bot taking a snapshot isn't going to make a difference
until half of HN users start asking their agent to do the same, to summarize the top HN articles every day
ge96
4 hours ago
I briefly tried to do this job, where it was scraping Steam for CS:GO skins (think a knife skin for $2,000.00), and yeah, trying to find proxy providers and get around the IP limit... tough one, but there's a market for it: people pay for the tool (not mine).
figmert
3 hours ago
Anti-bot measures also block real users at the slightest change they don't like. Anti-fingerprinting measures? You're a bot. Adblockers? You're a bot.
__alexs
3 hours ago
There's no ethical consumption of... ad supported content.
cute_boi
4 hours ago
Exactly. Crappy companies like Browser Use are causing more captchas, etc. All these scraper companies should've been regulated heavily. They use residential proxies, creating an incentive for hacking IoT devices, etc.
stogot
4 hours ago
I wish simpler bots existed for consumers. I want to know when someone replies to me, when a price drops, when airlines open new seat reservations, when a new seat opens for a college class, when a concert is coming to my area for a musician I listen to, when my local grocer has new stock, when a new Hyatt offer is available in a city I want to visit, etc. That doesn’t mean I’m abusive. I can have it check once a day. In almost all those cases, I want to spend money with the business, but I don’t want to check manually.
hollerith
3 hours ago
The people who've been in charge of the web (i.e., mostly the browser makers, but also the owners of the most popular sites) have made decisions that are IMHO severely anti-user. Although these anti-user design decisions have been accumulating for 30 years, users have had no alternative because all the content was on the web with no way to get it other than to visit web sites with a web browser.
Now that there is an alternative (namely AI), people (including me) are flocking to it. You want to frame this as unethical bots versus ethically acceptable human site visitors, but the main motivation for the use of scraping bots these days is to provide services (i.e., AI-based question answering) that users (like me) consider far superior to going directly to web sites for information, because visiting web sites with a web browser is a frustrating, tedious experience.
ranger_danger
4 hours ago
Web archival/preservation services/projects that need to get past captchas and other bot checks are a prime target for a service like this... but I think their main customers are people just mass scraping parts of the internet for less altruistic reasons.
zuzululu
4 hours ago
Once again I'd like to remind everyone that violating Terms of Service isn't the same as violating ethics. They are literally just expectations with no enforceable or legal boundaries.
For example, I could write in my Terms of Service that you may not view more than one page on my website and that I expect you to send me a written request for permission to read the rest. I don't expect anybody to follow it, and I sure don't think less of those that do.
The push for verified IDs is not related to this; it's more of a politically motivated attempt at selling fear to justify more surveillance.