hackernews client

CAPTCHAs: 'a tracking cookie farm for profit masquerading as a security service'

198 pointsposted 2 months ago

142 Comments

jp191919

2 months ago

I'm at the point now that if I get a CAPTCHA, I'm just going to leave the site. I'll spend my money elsewhere or find an alternative

a2128

2 months ago

My government's websites require solving a reCAPTCHA for basic services, which is horrifying. They also use Cloudflare which blocks me sometimes. This is in the EU

Confirming this. I am also completely certain that gratuitous CAPTCHA use is banned for government systems by my country's set of laws governing their implementation. The judicial system and the community have not matured enough to consider this a breach of law worthy of fighting against...

openplatypus

2 months ago

Name and shame, please!!

ReCAPTCHA due lack of opt out is effectively illegal in the EU.

phoronixrly

2 months ago

reCAPTCHA (and others based outside of EU) is illegal on privacy ground (in any site, not just owned by EU entities). Homebrew CAPTCHAs are illegal due to their general lack of accessibility (in any site owned by an EU entity), and in Bulgaria their gratuitous use is banned in government sites on account of them being poor UX (not enforced unless caught during the acceptance phase of a project).

An example of an inaccessible homebrew CAPTCHA that causes very poor UX can be found on the portal that provides access to the legal acts of the Bulgarian judicial system: https://legalacts.justice.bg/ . Try taking the legal system to court. I tried for this one, you can see for yourself how it went.

veeti

2 months ago

If it's "effectively" illegal can you name a single court decision saying so?

cccbbbaaa

2 months ago

In my country: MED-2020-015 (against the ministry of health), SAN-2023-023. Maybe more.

openplatypus

2 months ago

Sure thing

https://wideangle.co/blog/is-recaptcha-illegal-under-gdpr#cn...

bmacho

2 months ago

Where in the EU? Maybe you can file a GDPR complaint

cyberax

2 months ago

This automatically means that you're penalizing smaller websites. And killing off the independent alternatives to Reddit/Disqus. Do you want this?

Large sites like Amazon or CNN can afford to eat the bot traffic. Smaller sites can't.

cryptoegorophy

2 months ago

Problem isn’t a bot traffic. I run an Ecommerce site and scammers run python scripts to test 1000s of cards per hour if there is no captcha. I hate it, my customers hate it, scammers hate it, but it is the only thing that keeps my merchant account running. Any advise is welcome!

technion

2 months ago

Logon forms are another whole issue. "Lock out the account" is just a DoS vector. People are quick to talk about systems that can defeat a captcha but if the brute force goes from 50 passwords/sec to one password/10 sec it's mission accomplished.

LightHugger

2 months ago

Can't you just put a 5 second "loading bar" delay instead of a captcha then i wonder?

mike_hearn

2 months ago

Not easily: if it's enforced client side it may as well not exist, if it's enforced server side you just let anyone lock anyone else out of their account by running a constant brute force attack against their account (a DoS vuln). It also does nothing for attackers who try a giant list of accounts but only one or two passwords for each.

I worked on Google's system for solving this. It's a pretty sophisticated analysis that decides whether to demand CAPTCHAs or not based on a lot of factors. The CAPTCHA was needed (at that time) because it's the only way to slow down the attacker without bothering the real user, who won't be shown one.

LightHugger

2 months ago

Serverside of course, but i would think the loading bar can be per connection rather than be per account right? Like a connection is attempted, starting the loading bar, and then 5s later you only allow that connection to continue the load? I do non-web dev stuff so maybe i'm missing something but it sounds like should be easy enough.

mike_hearn

2 months ago

What stops you dropping the connection the moment you realize you're being loading barred.

LightHugger

2 months ago

Presumably the point is that the user (or bot) wants to access the content, a connection would have to complete the load successfully to do what they came for. if they just drop it instantly then that's a bot turned away.

mike_hearn

2 months ago

But you don't want to throw an arbitrary five second delay into login for every good user, they hate that sort of thing whereas the bot doesn't care.

account42

2 months ago

> without bothering the real user, who won't be shown one.

bullshit

mike_hearn

2 months ago

Lots of signals used to prevent people seeing captchas when their passwords were being attacked, but you never see it so never think about it.

user

2 months ago

[deleted]

aja12

2 months ago

If you do that on the server side, per account, it works. Small DoS risk, but it remains acceptable

aja12

2 months ago

If you do that on the server side, per account, it works. Small DoS risk, but it remains acceptable

onetokeoverthe

2 months ago

would requiring a un/pw sent to an email address work￶?

Zak

2 months ago

> killing off the independent alternatives to Reddit/Disqus

I haven't encountered a captcha using Lemmy. There might be one on some servers for account creation.

KennyBlanken

2 months ago

What are you on about?

I've used Amazon from the same IP address for years and I still regularly get the "you look like a bot, solve this" crap.

mouse_

2 months ago

Did you read the article? What you said directly goes against the study's conclusion.

cyberax

2 months ago

I'm helping a neighbor to run a small e-commerce website with reviews. Review forms are being spammed by bots that get even through CAPTCHAs, and the owner needs to clean them up constantly. Without CAPTCHAs, it becomes unsustainable.

They don't get a lot of bots trying stolen credit cards, but mostly because they are pretty niche.

theamk

2 months ago

I can believe study's results on user interaction, but their "security analysis" section (6.2) is deeply flawed - it only looks at the best bots, and not at the average ones. Meanwhile, as many other people in this thread can attest, (1) most of the bots are not really sophisticated, and get stopped by CAPTCHa, (2) the defense does not have to be 100% efficient, as long as form spam goes from 100/day to 1/day, things are OK.

Of course authors really wanted to write their conclusion, so they just ignored all the practical considerations. It's really a shame on the part of paper's reviewers.

NoGravitas

2 months ago

My thinking is that these days, the unsophisticated bots will still be stopped by literally any effort, like a hidden form field that causes the form to be rejected if it's filled in. Almost nothing will stop sophisticated bots, and nothing will stop a boiler room. This doesn't really leave a place for more sophisticated CAPTCHAs.

noah_buddy

2 months ago

Sounds a heck of a lot like the bots are killing off these websites. Gross overuse of automated scraping is a fact of life but individual choice is intolerable. What if I told you they were the same thing?

cyberax

2 months ago

Yes, bot traffic is killing the open web. What's your point?

phoronixrly

2 months ago

Regulate against inaccessible and privacy invasive CAPTCHA (done and done in EU) and regulate against disruptive bot traffic (what's the hold-up there? Oh, I see, it requires actual competent law enforcement... That's a hard one). Me only half-joking while standing on my high EU horse.

cyberax

2 months ago

Have you seen this: https://trog.qgl.org/20081217/the-why-your-anti-spam-idea-wo... ? You're proposing a legislative change, and bots in the US, Russia, China, whatever can care less about the acts of the EU parliament.

I also don't see how EU solves issues with CAPTCHAs. Anonymous CAPTCHAs are allowed in the EU.

phoronixrly

2 months ago

Thank you for that, though I'd rather choose 'The police will not put up with it' and either 'The politicians are too incompetent' or 'Underestimate how much money there is in it'

Anonymous CAPTCHAs are fine so long as they're accessible for people with disabilities. I would not venture to say I know of such one as a service...

im3w1l

2 months ago

If you are a EU government wanting a no-captcha experience on your website you could solve it thusly:

* Don't have captcha.

* Make it illegal for bots to access.

* Block foreign ips.

* Make it illegal to provide a proxy for foreign bots.

PoignardAzur

2 months ago

Yeah, that's great until you're a european citizen who needs to access a government service while travelling in the US, or living in French Guyana, or any amount of exceptions to your clever idea.

avgd

2 months ago

If you travel to places unwilling to enforce basic rules of civility you should be willing to suffer additional consequences, rather than having the entirety of Europe continue to suffer because we are unwilling to do the right thing, which is to put an end to the far west of the internet.

Countries unwilling to sign regulations that would have them lock up scammers/DDoS botters and other toxic e-criminal, or unwilling to enforce their own laws when they exist, should not be allowed to continue to pollute the internet at large. Block them until they learn their lesson.

In the west, you can and will lose your access to the internet even if you are not doing this sort of shit on purpose but have an infected computer. See for example : https://it.slashdot.org/story/05/04/13/0320249/major-aussie-...

We enforce the rules on our own citizens, why are we tolerating this level of criminal traffic from China, India, and, the worst of them all, Russia, the country through which we are very much fighting a proxy war right now in funding Ukraine?

We can send missiles to kill russians but we can't cut them from the internet at large ? Really?

Internet access is not a human right. Just like driving on the roads is not a human right and terrible drivers get their license revoked.

im3w1l

2 months ago

I guess you could offer travelers a captcha. Also note that the fourth point didn't prohibit proxies in all cases, just proxying for bad things.

phoronixrly

2 months ago

Also confirming that banning foreign traffic is the go-to eGovernment without CAPTCHA experience in Bulgaria.

ChadNauseam

2 months ago

Once regulations against disruptive bot traffic are effective enough that you don't need captchas, please ban captchas. But I don't see why you would do it the other way around. (Unless you secretly know that effective regulations against bot traffic are impossible.)

phoronixrly

2 months ago

Oh I do believe they are possible. It is just high time that the units fighting against organised crime heavily expanded their scope to illicit online activities that actually cause financial harm by entities to entities smaller than the movie/music/publishing industries.

j16sdiz

2 months ago

> (Unless you secretly know that effective regulations against bot traffic are impossible.)

Have you tried register your phone number in the DO-NOT-CALL list (or your local equivalent)?

Did it stop anything?

Dotnaught

2 months ago

Google addressed the claims in this paper last year, and one of the authors challenged the company's responses. See: https://www.theregister.com/2024/07/24/googles_recaptchav2_l...

kevin_thibedeau

2 months ago

As of two weeks ago my locked down Firefox profile gets hit with captchas on every visit to Google search. DDG has also gone to shit with captchas and stupid low cache lifetime because I use their non-javascript site. I'm giving Bing a test run before making the leap to Kagi.

yegg

2 months ago

We (at DuckDuckGo) shouldn’t have a lot of captchas and when we do (intended to keep away non-human traffic) they are completely anonymous, self-hosted, and not related to any AI or other machine learning. As such, I’d love to figure out why you are getting ensnared in them when you aren’t supposed to. If you want to reach out via email (see my profile) I will look into it.

user

2 months ago

[deleted]

EVa5I7bHFq9mnYK

2 months ago

Try also Startpage, it doesn't give me any captchas even though I am a career criminal: guilty of adblocking under Firefox influence, while commiting a VPN. They also have a nice Anonymous view.

vitehozonage

2 months ago

You might want to try Mullvad Leta, it's what i use for this issue. I would try Kagi if it could be used privately but i suppose it still requires an account and has no way to pay privately

eykanal

2 months ago

The problem with this paper is that, while technically true, there are many website owners who have found that CAPTCHAs have effectively reduced the spam on their site to zero. The fact that a CAPTCHA _can_ be bypassed doesn't mean that it _will_, and most spam bots are not using cutting-edge tech because that's expensive.

To say "it's worthless from a security perspective" is a pretty harsh and largely inaccurate representation. It's been tremendously useful to those who have used it. If it wasn't valuable, it wouldn't be so widely used.

Definitely agree with the whole "tons of free $$$ for Google", but that's kind of their business model, so yeah, Google is being Google. In other breaking news, water is still wet.

Scaevolus

2 months ago

Far too many people talk about security as if it's a simple binary and not about effort levels and dissuading attackers.

rachofsunshine

2 months ago

People really struggle with things that have measurable, probabilistic effects. You see it with healthcare ("Steve smoked his whole life and never got cancer, so cigarettes aren't bad for you!"), environmental effects ("Alice was poor and she didn't rob anyone, so poverty is no excuse!"), hiring ("Charlie is a great employee and he had no experience, so you should never look at backgrounds!"), etc.

It should be a general standard of proof for any sort of sociological claim that you look at rates, not just examples, but it usually isn't.

jchw

2 months ago

Well, I would at least ask what the baseline was. The vast majority of websites on the internet don't really have to deal with sophisticated bot traffic, and a very simple traditional CAPTCHA, one that can be trivially solved using existing technology, will also cut SPAM to zero or very close. I don't know exactly why this is, but I suspect it's because most of the bot operations that scale far enough to hit low volume websites are very sensitive to cost (and hence unlikely to deploy relatively-expensive modern multi-modal LLMs to solve a problem) and not likely to deploy site-specific approaches to SPAM.

There are a lot of things that can trivially cut down SPAM ranging from utterly unhelpful to just simply a bad idea. Like for example, you can deny all requests from IPs that appear to be Russian or Chinese: that will cut out a lot of malicious traffic. It will also cut some legitimate traffic, but maybe not much if your demographics are narrow. ReCAPTCHA also cuts some legitimate traffic.

The actual main reason why people deployed reCAPTCHA is because it was free and easy, effectiveness was just table stakes. The problem with CAPTCHAs prior to reCAPTCHA is simply that they really weren't very good; the stock CAPTCHAs in software packages like MediaWiki or phpBB were just rather unsophisticated, and as a double whammy, they were big targets for attack since developing a reliable solver for them would unlock bot access to a very large number of web properties.

Do you need reCAPTCHA to make life hard for bots, though? Well, no. Having a bespoke solution is enough for most websites on the Internet. However, reCAPTCHA isn't even necessarily the best choice even for something extremely high-volume. Case-in-point, last I checked, Google's own DDoS protection system still used a bespoke CAPTCHA that largely hasn't changed since the early 2010s; you can see what it looks like by searching for the Google "sorry" page.

I agree that reCAPTCHA is not "worthless" but it's worth is definitely overstated. Automated services that solve CAPTCHAs charge less than a cent per-solve. For reCAPTCHA to be very effective against direct adversaries rather than easily-thwarted random bots, the actual value of bypassing your CAPTCHA has to be pretty damn low. At that point, it's very reasonably possible that even hashcash would be enough to keep people from SPAMing.

chrbr

2 months ago

Yeah, we've used CAPTCHAs to great effect as gracefully-degraded service protection for unauthenticated form submissions. When we detect that a particular form is being spammed, we automatically flip on a feature flag for it to require CAPTCHAs to submit, and the flood immediately stops. Definitely saves our databases from being pummeled, and I haven't seen a scenario since we implemented it a few years ago where the CAPTCHA didn't help immediately.

Reminds me of the advice around the deadbolt on your house - it won't stop a determined attacker, but it will deter less-determined ones.

lavezzi

2 months ago

Cutting edge tech like paying cents to captcha solving services?!

theamk

2 months ago

Yep. You'd be surprised how stupid some bots are.

(And while I don't have hard data on this, I suspect that bot authors that don't know how to properly set up rate-limits and don't know how to set up captcha solving service bypass, so captchas are especially effective against them)

lavezzi

2 months ago

You can easily get an API key from 2Captcha, AntiCap, CapSolver, etc. - it costs basically nothing and it works.

btown

2 months ago

The "cookie farm for profit" point is worth elaborating on. From the original paper https://arxiv.org/pdf/2311.10911 :

> More concretely, the current average value life-time of a cookie is €2.52 or $2.7 [58]. Given that there have been at least 329 billion reCAPTCHAv2 sessions, which created tracking cookies, that would put the estimated value of those cookies at $888 billion dollars.

The cited paper is https://www.sciencedirect.com/science/article/pii/S016781162... - but it doesn't deal with CAPTCHAs, just with the general economics of third-party cookies.

In practice, many of these cookies will have already been placed by other Google services on the site in question, with how ubiquitous Google's ad and analytics products are. And it's unclear whether Google uses the _GRECAPTCHA cookies for purposes other than the CAPTCHA itself (in the places where this isn't regulated).

But reCAPTCHA does gives Google an ability to have scripts running that fundamentally can't be ad-blocked without breaking site functionality, and it's an effective foot in the door if Google ever wanted to use it more broadly. It's absolutely something to be aware of.

ghuroo1

2 months ago

That made us spend 819 million hours clicking on traffic lights to generate nearly $1 trillion for Google.

voisin

2 months ago

At an approx 750,000 hours in a human lifespan, they wasted 1100 human lives in totality. Unbelievable.

thechao

2 months ago

There's a dystopian short story in your comment about AI that can't self-bootstrap without ground-truth from humans, so they keep us around just to mark images, music, etc. Lives wasted annotating things. I like to think they'd drag us from solar system to solar system for this purpose.

taftster

2 months ago

Gosh. This is too perfect. I feel like you've just captured the exact moment we're living in.

djmips

2 months ago

So that's what they are doing inside the pods in The Matrix!

extraduder_ire

2 months ago

Does solving captchas generate $1000/hour? I assume you're conflating amounts here, or messed up an order of magnitude somewhere.

rozab

2 months ago

That's just the headline of the article.

The researchers put the vast majority of this value to tracking cookies, and this revenue happens whether or not a manual challenge is completed.

scubadude

2 months ago

That's just the direct value, how about the whole dimension of training the AI models

ChrisArchitect

2 months ago

[dupe] Earlier: https://news.ycombinator.com/item?id=42997755

https://news.ycombinator.com/item?id=42970780

breppp

2 months ago

I get that people are here to hate on Google, but I am just here to say that reCAPTCHA albeit acquired, is an absolutely brilliant idea. The kind that solves two (three? if you count tracking) problems so elegantly

phoronixrly

2 months ago

Absolutely agreed on the 'very elegant solution for global-scale tracking' part!

extraduder_ire

2 months ago

The people who created the initial version that got bought went on to create duolingo, with a similar goal of getting people to produce translations of text.

therein

2 months ago

Multi-purpose trojan horse. Not only will it look beautiful in your city but you can use it as scaffolding to repair tall buildings or children in your community could use it as a play gym.

darkwater

2 months ago

Naive question: how can clicking on the motorbike or traffic light image help to train an ML algorithm if they already know what image has a motorbike in it, or otherwise the captcha would not make sense. Maybe they put 3 image which are already with a score of >0.90 and one which is just 0.40?

mbb70

2 months ago

Yes, known images are used for validation, unknown images are used for training.

user

2 months ago

[deleted]

inetknght

2 months ago

> Naive question: how can clicking on the motorbike or traffic light image help to train an ML algorithm if they already know what image has a motorbike in it, or otherwise the captcha would not make sense.

It's more than just your answers that are fed into ML and more than just what others have already said: there's also the way that your browser functions and the way you interact with it. Your IP address, browser, OS, screen size, input type, timezone and current time of day, how fast do you select different images, etc etc. All of this gets fed into ML algorithms and answers to the obvious images are used as corollaries to support/deny your ancillary information.

michaelt

2 months ago

Hypothetically speaking, if they've got a 97% good ML model, they could implement a captcha where if you disagree with their model you have to do a second image, and a third image and so on. Then they could show each image to several different humans, and only if a bunch of people disagree with the model do they take a closer look.

Frankly a lot of the images I get are... kinda easy? This isn't the classic book-reading recaptcha where you could see why the text had confused the OCR.

hyperman1

2 months ago

They ask the same question to multiple people. Whatever the majority answers is right.

woleium

2 months ago

they ask you to solve two. one they know, the other they don’t

DougN7

2 months ago

I’m not sure. If I don’t click on one that is a bus it won’t let me forward. It’s not like I click an “Ok, I’m done” button. I guess we could all delay clicking and maybe it would give up and assume the unknown bus wasn’t really a bus after all?

user

2 months ago

[deleted]

pupppet

2 months ago

What's the alternative?

josefresco

2 months ago

I can tell you on the small level asking a simple question to activate the form action stops 99% of spam. Something like "What color is snow?" Granted, with a well trained "AI" system solving these questions would be trivial but I have yet to see it in practice.

dewey

2 months ago

Sounds easy, but at this point everyone is trained to solve these captchas and implementing the questions is not a quick thing either on a bigger scale (Translations, cultural differences, bots easily bypassing them etc.). I've used captchas on my sites before because bots were just hammering the login form, checking checkboxes and causing me to rack up email sending bills.

phoronixrly

2 months ago

Sorry for nitpicking but you need a puzzle that is knowledge-agnostic (be it cultural or scientific), otherwise you're guarding your site from both bots and people unfamiliar with the concept of or lacking the pre-existing knowledge necessary to solve the puzzle.

What colour is snow is close but you can't assume that everyone knows what snow is, let alone what colour it is. This includes both people with disabilities and in parts of the world where there is no snow...

josefresco

2 months ago

I agree, and thankfully we're dealing with mostly regional visitors to small local business/organization websites. Not a global audience. That being said, it's hard to think of a simple question, with little to no ambiguity.

Once example is for a landscaper: What is the color of healthy grass?

The answer is "green" of course, but grass is common in our region. That question would not work in a culture or region unfamiliar with "lawn grass".

phoronixrly

2 months ago

Yes, I would go for simpler stuff (word or digit puzzles) and package it in a way that is friendly for screen readers. So... No images or video, or at least one alternative to them that at the same time does not make it easy for the bots...

This has the added benefit that translators will be forced to come up with a translation that makes sense when your projects gets to a point that it needs i18n.

throwaway7283

2 months ago

> What colour is snow is close but you can't assume that everyone knows what snow is, let alone what colour it is. This includes both people with disabilities and in parts of the world where there is no snow...

Google will happily ask you to point out which squares contain fire hydrants. Is there a captcha that meets your standards?

phoronixrly

2 months ago

Yes, I also saw the post about the fire hydrants the other day. No I was not able to confirm that fire hydrants are shown in countries in which they are not a common sight.

However, I am far from arguing in favour of reCAPTCHA. It is also an example of shit CAPTCHA that also bans people. I am often one of these people.

No there is no example of a CAPTCHA-as-a-service that I know of that I would be fine to impose on my users.

idunnoman1222

2 months ago

There are no humans that know the word snow who don’t know what colours Snow is

phoronixrly

2 months ago

> There are no humans that know the word snow who don’t know what colours Snow is

Sorry, I don't follow, English is a second language to me, but how does this stand against my statement that 'many people don't know the concept of snow, let alone what colour it is'?

harshreality

2 months ago

There's no reason for an English language website to cater to people who don't know what snow is. How can it be discriminatory to have a question a user can't comprehend, when they won't be able to comprehend the rest of the website either? Even blind people who can read English Braille and input text in English know that snow is white, even if they've never seen it.

If a website is multilingual, it can offer language/region selection and add appropriate questions for each of them.

phoronixrly

2 months ago

I did not say it was discriminatory -- I stick to basic terms -- you may inadvertently be guarding against people who for one reason or another don't possess the knowledge to solve the puzzle. For example I could copy over an integral from one of my undergrad exams. 'Please calculate the value of the integral and enter it in the field below' (completely accessible to screen readers as well). This would effectively ban not only people who have not taken a calculus class, but many of my uni colleagues who have happily forgotten everything about calculus after they took their exams 10 years ago...

Another example for an inadvertently hard puzzle, this time due to a lack knowledge as a consequence of being part of a different culture, would be asking US people what colour is the edelweiss. In my country children learn about it in first grade if not in kindergarten. Another -- asking Europeans/US people what colour is romduol... I don't consider this discriminatory, I don't consider people in the US or Europe uneducated because they cannot solve such a simple puzzle... It is just poor/lazy/stupid design that fails the single requirement to block bots and only bots. And I get it 'I would just google it'... But how many conversions will you lose if a considerable part of your users need to google something to go to the next step of your funnel? It's just inexcusably shit UX...

You would indeed be fine with the 'snow' question if your site must only be visited and used by fellow citizens of your country (where citizens implies similar education -- both cultural and scientific). You would indeed be fine if you can make sure the puzzle will be translated intelligently (including the solution) if your site may be used in a foreign country or by users speaking the language in your own country.

I usually cannot make any of these assumptions for any of the projects I work on. The site's audience is but a whim of the Product team, and I18n is outsourced to (once) translation agencies and now directly to an LLM... This can even be done (and frankly should be done) without the knowledge or input of the dev team. Also, neither translators nor LLMs can be expected to understand that they must come up with basically a new puzzle that will not be hard for people that use the specific language. And I as a developer that does not speak the specific foreign language while I can roughly validate their translation (if by any chance it passes by me for review and I go above and beyond what is expected of me and pass it trough a translation service) and return it with feedback for fixes, I cannot rely that they will abide by the feedback, or how long it would take... Those are a lot unknowns to consider these assumptions reliable, and it seems much less effort to come up with a simpler puzzle that contains the answer in itself... Its effectiveness against spam will be exactly the same.

Also, you will definitely not be fine if your puzzle contains a concept foreign for a considerable part of people who can't for example see or hear. You would also not be fine if your puzzle's technical implementation makes it impossible to be perceived by them. The latter part is very simple to get wrong. For example, one of the best ways to protect any site from blind people is to implement a hero image slidshow that steals the focus on each slide. Their screen readers' focus gets moved each second and they literally cannot perceive, let alone navigate the site...

Finally, none of the peculiarities above excuses straight up going for reCAPTCHA. Even if you don't give a f about your users' data EU users can and will get you in trouble with EU regulators exactly when you get to a scale at which CAPTCHA use is a necessity. There's a cultural difference for you.

LordKeren

2 months ago

e2le

2 months ago

There are two alternatives I'm aware of, one is Attestation of Personhood[1] proposed by Cloudflare, the other is a proof-of-work[2] which the Tor project have themselves introduced[3].

[1]: https://blog.cloudflare.com/introducing-cryptographic-attest...

[2]: https://github.com/mCaptcha/mCaptcha

[3]: https://blog.torproject.org/introducing-proof-of-work-defens...

jszymborski

2 months ago

While I get the draw, I never understood how PoW is ever supposed to work practically.

PoW tasks are meant to work on a wide range of mobile phones, desktops, single-board computers, etc... you have vastly different compute budgets in every environment. For a PoW task that is usable on a five year old mobile phone, an adversary with a consumer RTX 50 series card (or potentially even an ASIC) can easily perform it many, many, many orders of magnitude faster.

Am I missing something?

johnmaguire

2 months ago

PoW isn't meant to make something impossible, it's meant to attach a cost to it. Now you need to extract a value higher than the cost.

jszymborski

2 months ago

I understand that, but what I'm saying is that due to the wide gulf between the compute budget of the slowest device one is meant to support and a couple commodity VPSs adversaries need anyway to conduct a DDoS or to spam, there is ostensibly no extra cost.

In fact, all you are doing is slowing down legitimate clients with old equipment and doing nothing against adversaries.

phoronixrly

2 months ago

I've seen a PoW CAPTCHA https://github.com/mCaptcha/mCaptcha and at the time it did not make any sense to me. I would still get spam, just a tiny bit slower, and spammers would have to expend more resources for just my site, which would barely register on their bill.

I bet that requiring JS stops more spam than the PoW itself. Can anyone who tried it chime in?

Oh, I see, it's effective against 'someone [who] wants to hammer your site'. That is usually never the case with my sites. I do get a steady stream of spam, but it is quite gentle as to not trigger any WAFs. The load comes from LLMs scraping this everliving shit of my sites and fortunately they don't seem to bother with filling in forms...

lq9AJ8yrfs

2 months ago

You are not missing something, you are finding it: the game theory of bots vs anti-bots is subtle and somewhat different from regular software engineering and cyber security.

For the most part bots wish to be hidden and sites wish to reveal them, and this plays out over repeat games on small and large scales. Can be near-constantly or intermittently.

The bot usually gets to make the first move against a backdrop that the anti-bot may or may not have a hand in.

jszymborski

2 months ago

Are you suggesting that ultra-quick solves would be a signal that a user-agent is malicious? That's interesting...

vitehozonage

2 months ago

Perhaps you think all PoW algorithms are still crackable by ASICs? A few years ago that was the case, but some years ago Monero developers made a breakthrough with RandomX. Now it is no longer true that a GPU or ASIC can outperform a typical consumer device to the extent that you seem to imagine. The Tor project uses a similar algorithm, i think with the same developer contributing to it as RandomX. It is nothing like bitcoin's SHA256 PoW - with that, the performance of an ASIC does indeed mean a consumer PC becomes completely useless at the algorithm

theamk

2 months ago

Will RandomX work on the old cell phones, via Javascript interface only?

The website says: "Fast mode - requires 2080 MiB of shared memory. Light mode - requires only 256 MiB of shared memory, but runs significantly slower"

If you want your website challenge to work on the cheap phone - slow CPU, with little memory, and when implemented in Javascript, you'd have to tune complexity way down. And when a modern PC with fast CPU and tons of memory tries to solve it.. it probably will take only a few milliseconds, basically being useless.

vitehozonage

2 months ago

I don't know, I dont understand the details and your reasoning is confusing for me. My understanding is that the effectiveness of particular hardware is complex to predict; it depends on the sizes of the CPU caches and effectiveness at certain instructions, and the algorithm can of course be tuned in all sorts of ways. The Tor project is already using it so presumably it is working for them to some extent. More info here: https://blog.torproject.org/introducing-proof-of-work-defens...

nonchalantsui

2 months ago

Since this was focused on v2 and other interactive captcha, the alternative is to upgrade to new versions that don’t do so. Still some downsides (and the study does address very briefly the use of AI to trick v3), but at the very least it does address some of the concerns.

Important to note though that as AI gets more accessible then the downsides of v3 start to weigh more.

eurekka

2 months ago

I've been using Truesign [0] for several months and been impressed with the results, it detects bots, VPNs, disposable emails and suspicious traffic patterns all in one. I use it to protect my payment form but it seems it can also protect APIs.

Still in beta though.

[0] https://truesign.ai

Zak

2 months ago

For a lot of places where I've encountered captchas, they could just do nothing. Simple rate limiting should probably be the next step. It's not one-size-fits-all of course.

atoav

2 months ago

Building your own captcha or running one that doesn't sell your users data to the highest bidder?

What a time where people on a site called "Hacker News" ask such a question..

phoronixrly

2 months ago

And if you ever get so big that people start writing bespoke software to break your CAPTCHA, then investing some more engineering effort into it will quite likely not be a problem.

Of course reCAPTCHA is also still vulnerable to the use of a mechanical turk so even giving away your users' data won't save you.

ThatPlayer

2 months ago

I've come across a CAPTCHA on a website I was scraping that was absolutely terrible. It was 10 multiple image choice answer, with a question to click the image that had "X". Their implementation didn't even have a nonce, so I would just attempt every single answer and get past it.

gtsop

2 months ago

I think we need to critically re-evaluate what is it exactly we are doing on the internet, how we do it, and examine existing assumptions. For instance, do we really need all services to be centralised? Do we really need services to be "free" (part of the payment is selling your data ok). A server serving static files doesn't care about bot users, but apps... why would you let a stranger use your cpu/ram over the internet? I know i am not providing an answer but i believe we need to take a look again at all of these before we try to come up with an answer

theamk

2 months ago

Who are "we" and what are "we" going to do with the answer once "we" come up with it?

For example, there is a someone's personal blog, which is beset by comment spammers. The blog's owner is tired of deleting spammy comments, and do not want their comment section to look like garbage bin, so they want some bot protection. The website's author is not that technical, so they do some googling and install reCaptcha (or cloudflare) and this cuts off bad comments to 1/week, which is easy to clean manually.

So in that story, who should be re-evaluating what, and what answer do you expect?

(keep in mind the blog's author cannot host their own captcha service / AI bot detector, as they are not proficient enough to install all the required dependencies for such a complex task, nor is their VPS powerful enough to keep it running.)

gtsop

2 months ago

"we" is another word for "the people building the systems living on the web"

Once, "we" come up with the answer we are going to build based on the newly defined assumptions.

Here is what critically re-examining everything means: Do i really need to have a comment section? And if I do, shall I be the one responsible for managing it's technical infrastructure? The answer could be to use an off-the-shelf solution in your particular story. When you pivot to the side of the off-the-shelf solution developers who actually needs to do the spam filtering, then answers may differ, as will assumptions.

Edit: What are your thoughts of the mechanism hacker news have used to reduce bot comments?

cccbbbaaa

2 months ago

I've heard about form fields hidden with CSS multiple times. No idea how effective this is though.

etchalon

2 months ago

Honeypot have gotten less effective as bots have moved to using headless-browser agents vs simple POSTs.

kalleboo

2 months ago

It works for low-stakes stuff, e.g. it has eliminated the spam sent to our contact form.

user

2 months ago

[deleted]

unethical_ban

2 months ago

What proof of humanity is sufficient? Today it is a phone call, or a verification sent to a real address (limit one registration per household), or a video call. How will we verify humanity in 20 years when audio and video emulation is foolproof?

We'll have to have in-person attestation or make all services paid, perhaps.

phoronixrly

2 months ago

I would wager all services will be linked to a verified credit or debit (non-temporary) card. Most of them are now...

How are you going to connect the physical person with an identity with in-person attestation? Many (several of which major English-speaking) countries don't have mandatory government IDs...

A commenter below suggests that government eIDs could be used. I bet this will be harder to implement and will have much worse conversion rates than (the already terrible) mandatory credit/debit cards... Not to mention the hell that we as non-US citizens will have to endure if anyone tries to impose any form of mandatory ID there... One can only take so much complaining about government overreach about something that is basic necessity here in the EU...

thatguy0900

2 months ago

Realistically it will be a government or private service that everyone will have to have to verify that it is a real person. Or at least tied to a real person so that banning will be more sticky.

apitman

2 months ago

Like Google

nervysnail

2 months ago

Nothing less than drinking a "verification can", as presciently propounded by a 4chan user a decade ago.

kykeonaut

2 months ago

Wouldn't some sort of proof of work be a good solution to the captcha problem?

Specially since all of the sudden, a bot service running hundreds of thousands of requests will suddenly and inadvertedly have to compute cryptographic hashes at the cost of the user running the bots?

theamk

2 months ago

no, because a lot of bot service run on botnets, made out of hacked regular residential computers, routers and so on. They will feel a bit more sluggish, but it won't cost the botnet authors that much more.

On the other side, an amount of work reasonable for modern desktop will absolutely overwhelm an older cell phone.

user

2 months ago

[deleted]

jvdvegt

2 months ago

To prevent the cookie wall with no 'reject all': https://archive.is/oHc1e

user

2 months ago

[deleted]

nonrandomstring

2 months ago

You can get people to do almost anything if you lie to them that it's for "security".

nonrandomstring

2 months ago

Literally. Social engineering 101. Grab a clipboard, put on a hi-viz, speak in an authoritive, directive voice and people will absolutely do what you ask at the sound of the word "security". Social engineering defence 101: teach scepticism and not being intimidated by the word "security"... ask "whose security?", "security from what?", "security to what end?", and "show me your ID and the written policy".

catlikesshrimp

2 months ago

Except captcha is not supposed to be security for the user, but security for the website.

But in the end it is not (effective) security for a website, is an antifeature for users and is profit for google.

jisnsm

2 months ago

As a website developer and host, I can assure you recaptcha works very well to stop spam and automated login requests. It is not perfect, but no system is.

phoronixrly

2 months ago

As a website developer and host can you compare running your own CAPTCHA in place of any CAPTCHA-as-a-service? In my experience even a simple static how much is 3 + 39 stops the flood of spam in a form... It is also not perfect, but as you say no system is, and it does not pilfer my users' data...

nonrandomstring

2 months ago

I had a great conversation about this last week. I'll just casually leave this [0] here for anyone who has time (50 mins - ausio only) for a deep-dive into machine learning to protect sites (APIs). TLDR - a lot of serious defenders have given up on PoW/CAPTCHA human filters because the cost to AI solve them has dropped to almost nothing. YMMV.

[0] https://cybershow.uk/episodes.php?id=39

internetter

2 months ago

yeah, a sufficiently motivated attacker can deploy some countermeasures to bypass it, but only really worth it for targeted attacks. Anyone who has a form on the internet knows that without any sort of captcha, you get lots of stupid bots just typing in jumbo. Likely you could tone back the captchas and still get a similar result in stopping the dumb bots[0]

[0] on my contact page my email is protected via a custom cypher. if the bots execute javascript and wait 0.5s they can read it, but most don't. It’s the dumbest PoW imaginable, but it works

johnmaguire

2 months ago

> Anyone who has a form on the internet knows that without any sort of captcha, you get lots of stupid bots just typing in jumbo.

I recall a form of "CAPTCHA" that involved a text input which was hidden via CSS, but which bots would fill in anyway. Any text in the input caused the entire form to be rejected. I wonder if that style still works today.

phoronixrly

2 months ago

I've had an issue with this approach -- many browsers (via autofill/autocomplete) and many password managers (when filling in password, e-mail, etc.) tend to also get trapped in this honeypot... The spam does still get stopped though.

nonrandomstring

2 months ago

> It’s the dumbest PoW imaginable, but it works

Nice one! I guess you mainly need to get above a certain novelty threshold, because all ML is based on what has already been seen/learned rather than actually outsmarting the defence.

joseppudev

2 months ago

[dead]

aaron695

2 months ago

[dead]

jdietrich

2 months ago

loloquwowndueo

2 months ago

It’s not three years, it’s thirteen.

> A lifetime value of $888 billion for all of reCAPTCHAv2's tracking cookies produced between 2010 and 2023.

phoronixrly

2 months ago

jdietrich, I feel your pain, I am also completely convinced that 2010 was 3 years ago :(

bigbuppo

2 months ago

819 million hours of unpaid labor. And just think, a large chunk of that was performed by children. CAPTCHAs are slave labor in small doses. It's also a way of avoiding paying taxes on that labor. But hey, what's a few billion dollars in unpaid taxes and unpaid wages and child labor violations between friends?