
I dislike the widespread use of captcha regardless of provider.

I realize anything connected to the internet will be subject to automated abuse, and it's impossible to run some types of services without taking some steps to defend against it, but it seems to me there's usually a way to handle that without invading the user's privacy or wasting their time. The exact details will vary based on the type of service, of course.

One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password. An incorrect login says so without presenting a captcha. The potential reward for an attacker who successfully gains access to an account is high, so it seems almost certain anyone running a targeted attack would defeat this by handing it off to a human upon detecting that they had a good account.



> it seems to me there's usually a way to handle that without invading the user's privacy or wasting their time

As much as I agree with your dislike of captchas, I don't think this is true at scale (unless universal online identities existed, which could and should include anonymous identifiers by design). When you need to accept information from anonymous users (comments, votes, forms, registrations), there's no way to not invade users privacy and not waste their time, unless you are manually filtering / moderating all the input data, in which case you can't really say it scales. You might say emails can solve the problem. Well, they don't really solve the problem against dedicated attackers / spammers, and they do invade privacy for the average user. You can use statistical approaches to try to reduce privacy invasion or others, but I don't know of anything that really solves the problem without manual identity verification at some point.


I built an alternative[0] that takes a proof of work approach. As a site owner you set the difficulty that makes sense for you: so perhaps you would want 20 seconds of computation before you can submit. The nice thing is that this can happen entirely in the background while the user fills in the form.

Also with multiple requests from the same IP in a short timespan, the difficulty increases.

There are downsides to any captcha, but in my opinion this makes a much better tradeoff. Accessibility and privacy are respected, and there are no annoying tasks.

[0]: https://friendlycaptcha.com
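For readers unfamiliar with how a proof-of-work captcha works under the hood, here is a minimal hash-based sketch. This is illustrative only, not Friendly Captcha's actual implementation; function names and parameters are my own.

```python
# Minimal hash-based proof-of-work sketch (illustrative, not the real
# Friendly Captcha code). The client brute-forces a nonce until the
# hash of (challenge + nonce) has enough leading zero bits; the server
# verifies with a single hash.
import hashlib
import itertools
import os

def meets_difficulty(digest: bytes, difficulty_bits: int) -> bool:
    """True if the digest starts with `difficulty_bits` zero bits."""
    value = int.from_bytes(digest, "big")
    return value >> (len(digest) * 8 - difficulty_bits) == 0

def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce. This is the part that can run
    in the background while the user fills in the form."""
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if meets_difficulty(digest, difficulty_bits):
            return nonce

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash, so verification is cheap even though
    solving is expensive (~2^difficulty_bits hashes on average)."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return meets_difficulty(digest, difficulty_bits)

challenge = os.urandom(16)       # server issues a random challenge
nonce = solve(challenge, 12)     # ~4096 hashes on average at 12 bits
```

The asymmetry is the whole point: the site owner pays one SHA-256 per submission, while the submitter pays thousands to millions, tunable via the difficulty.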


Proof of work really doesn't work well in practice. Spammers have huge farms of compute, often on residential IPs, while legitimate users are accessing the service from devices that are often power-constrained (like phones). You end up either hugely penalizing legitimate users, or having to employ many of the standard antispam techniques (IP/ISP reputation, captcha, rate limiting, etc.) on top, so the proof of work adds much less incremental value.


It's not perfect, and you are right about the downsides. These resources that spammers have can be applied as easily to re/hcaptcha (either through ML or clickfarms). No CAPTCHA will actually lock out targeted attacks.

The difficulty increase per IP can be seen as a form of soft rate limiting, it's shared between all websites (which is where it's different from ordinary rate limiting). In the future we may use IP reputation lists to guide the initial difficulty too - but we haven't implemented that yet.

I think that no perfect captcha can exist, which is inherent to the problem. Proof of work makes different tradeoffs, and perhaps it is still cheaper to attack - but I think it's a much more friendly solution for users (accessibility, privacy, simplicity, fairness, UX).

Maybe in the future the solution would be something like this: a long PoW-based captcha that runs in the background as well as a vision task for the user, whichever gets solved first.
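The "soft rate limiting" idea above can be sketched concretely: escalate the difficulty as a function of how many challenges an IP has requested in a sliding window. This is my own illustrative sketch, not the actual Friendly Captcha logic; all constants are made up.

```python
# Illustrative per-IP difficulty escalation (not the real Friendly
# Captcha implementation). Each doubling of an IP's request rate in
# the window costs one extra difficulty bit, i.e. doubles the
# expected solving work.
import math
from collections import defaultdict, deque

BASE_DIFFICULTY_BITS = 12   # assumed baseline set by the site owner
WINDOW_SECONDS = 600        # assumed sliding-window length

class DifficultyScaler:
    def __init__(self):
        self.requests = defaultdict(deque)  # ip -> challenge timestamps

    def difficulty_for(self, ip: str, now: float) -> int:
        q = self.requests[ip]
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()                     # forget requests outside the window
        q.append(now)
        return BASE_DIFFICULTY_BITS + int(math.log2(len(q)))

scaler = DifficultyScaler()
first = scaler.difficulty_for("198.51.100.7", now=0.0)  # 1st request: 12 bits
for t in range(1, 8):
    latest = scaler.difficulty_for("198.51.100.7", now=float(t))
# After 8 requests in the window: 12 + log2(8) = 15 bits, 8x the work.
```

Unlike a hard rate limit, a burst from one IP never gets blocked outright; it just gets progressively more expensive.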


I get re-captcha'ed all the time from the same IP. And if I don't use Chrome, the captcha count is like 4x-5x higher just for using Firefox.


That's why I have even stopped using google services. If I literally have to get another browser to use your snowflake site, then why would I use your service anyway?


This reminds me of a similar solution I saw on PH last year. I think it's a great alternative for smaller websites that are less likely to be targets for spam/bots.

But say, there's a website and it's a likely target, you implement IP protection, fine, the user uses residential proxies. Now your best bet is to go off fingerprinting, but there are marketplaces which sell those too in bulk.

Maybe I'm wrong, but wouldn't the best approach be to stick to human interaction puzzles, which are hard and don't have a set way to be solved by a machine (for now)?


Bangladeshi click farms[0] are cheaper to use to bypass captchas than renting residential proxies to solve PoW. Also, image captchas cannot scale automatically in difficulty (as an incident response), but PoW can (see how Bitcoin adjusts with the miners).

[0] https://2captcha.com/
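The Bitcoin analogy can be made concrete: periodically rescale difficulty so the observed solve rate trends toward a target, with the step clamped to avoid wild swings. A minimal sketch, with entirely illustrative numbers:

```python
# Sketch of automatic difficulty adjustment, loosely modeled on
# Bitcoin's retargeting. All names and numbers are illustrative.
def retarget(current_difficulty: float,
             observed_solves_per_min: float,
             target_solves_per_min: float,
             max_step: float = 4.0) -> float:
    """Scale difficulty toward the target solve rate; clamp the
    adjustment factor (as Bitcoin does) to at most 4x per period."""
    ratio = observed_solves_per_min / target_solves_per_min
    ratio = max(1.0 / max_step, min(max_step, ratio))
    return current_difficulty * ratio

# A spam flood at 10x the target rate quadruples difficulty per
# adjustment period; when traffic normalizes, it eases back down.
```

An image captcha has no equivalent knob: you can't make a crosswalk 10x harder to see during an attack, but you can make a hash 10x more expensive to find.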


Just did the math from the numbers on their site: on average a "worker" doing captchas for them gets paid $0.20/hour.

Adjusting based on average monthly salary in Bangladesh ($157) [1] and the US ($4,056) [2], that would be similar to an American making $5.20/hour, which is surprisingly close to the current minimum wage in the US ($7.25/hour) [3]

So I guess this must be a fairly decent way to earn money if you're young/poor in Bangladesh...

[1] https://tradingeconomics.com/bangladesh/wages#:~:text=Wages%.... [2] https://www.thestreet.com/personal-finance/average-income-in... [3] https://en.wikipedia.org/wiki/Minimum_wage_in_the_United_Sta...


$5.20 is not close to $7.25. That's about 30% less than the minimum wage. Minimum wage itself is a massive struggle, but 30% less is just plain offensive and dehumanising.

> So I guess this must be a fairly decent way to earn money if you're young/poor in Bangladesh...

Solving dumb captchas is never a fair or decent way to earn money, not when you are poor and definitely not when you are young. Creating living conditions for other human beings where they can be easily exploited and used for mindless degrading work such as solving dumb captchas is one of the most grotesque things of the 21st century.


It's a bit like locking your bike. That doesn't work against targeted attacks, but the presumptive thief is more likely to choose another bike that has a smaller or no lock.

The arms race is bad for everyone, in both examples, but the underlying problem is a fundamental one of misaligned incentives.


Ignoring the other criticisms because they generally seem valid, to everyone saying that proof of work doesn't matter because bots can just use more machines, that depends a lot on the economics of any specific automation project. I scrape a little data here and there, and a reliable proof of work system costing ~20s on a commodity core would make some of my personal projects cost tens of thousands of dollars monthly. Maybe that's worth it to someone (e.g. if they have an army of hacked machines without anything better to do), but I think it'd keep a lot of the riffraff out.


> FriendlyCaptcha will prevent 99.9% of spam

For someone who has little expertise in this specific field, how are you calculating this?


As a counter-point, the uncaptcha[0] research project used Google's free Speech-to-Text service to solve reCAPTCHA at a reported 85% success rate.

I'm convinced CAPTCHA are no better than fake/dummy security cameras.

[0] https://github.com/ecthros/uncaptcha


Admittedly it's not calculated, so it may be a stretch. It's based on the assumption that the vast majority of spam out there just looks for forms to submit without smarts (which is also why honeypots can be pretty effective, especially on a small website where nobody will take the effort to work around them).

I've seen people report that they have reduced spam to near nothing already with just a honeypot, but of course I can't verify those claims.
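For anyone unfamiliar with the honeypot technique mentioned here: you add a form field that real users never see (hidden via CSS or JS) and reject any submission that fills it in. A minimal sketch; the field name is illustrative:

```python
# Minimal honeypot sketch. The "website" field is rendered with
# `display: none`, so a human leaves it empty; naive bots fill in
# every field they find. Field names are illustrative.
def is_spam(form: dict) -> bool:
    """Reject any submission that filled in the hidden field."""
    return bool(form.get("website", "").strip())

human = {"name": "Ada", "comment": "Nice post!", "website": ""}
bot = {"name": "x", "comment": "buy pills", "website": "http://spam.example"}
```

The appeal is that it costs legitimate users literally nothing, though, as noted below, it only stops unsophisticated drive-by bots.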


Judging by the downvotes (despite answering the question truthfully), I see it's not a good way to present ourselves, and frankly we don't have to make that claim. It's hard to estimate the real percentage, our customers are happy but measuring what is no longer there is tricky in the real world.

I will change the wording on the website and remove the percentage.


People take quantitative claims seriously. I wouldn't make them without being able to defend them in an intellectually rigorous way.


> I've seen people report that they have reduced spam to near nothing already with just a honeypot, but of course I can't verify those claims.

Can verify from personal experience. I once implemented a simple honeypot approach on a small blog site. It immediately cut down automated "drive by" comment spam to almost nothing. I never tried to quantify it, but it was the difference between dozens of spam comments a day and maybe one or two a week (which I assumed were probably manual submissions).

Most spam bots are pretty unsophisticated it seems, and do not pay any attention to a honeypot field being hidden either by CSS or JS.


It should be fairly easy to set up two open wordpress blogs, one with the captcha and one without.

After a few months you check how much spam arrived at either and get your number?


How do you handle low-end devices? Do you reduce the difficulty for them, and can this be abused by pretending to be a low-end device that really isn't?


Everybody gets the same difficulty initially which you determine as a site admin, so one should base this on their audience (e.g. Gitlab would have a different device profile from a government website).

The solving can be a few times slower on a low-end device, which you should keep in mind. To aid with this, when you set the difficulty for your website it shows you an estimate for various device types. This is indeed a downside of PoW approaches.

There is one factor that helps: solving can start as soon as the form loads, so it runs while the user enters their details/comment. I have a hunch that people on mobile devices are inherently slower at entering their data, which should help a bit.

Anyway - if you set the difficulty quite high, the solving takes 30 seconds, and it takes the user 15 seconds to fill in the form, the user would still have to wait 15 seconds. That's not very different from the time to solve image captchas (it's actually lower, and doesn't come with a 2MB payload download, which isn't great on phones either, and they keep their privacy + sanity). You could give the user something to do that makes sense for your website (ask them for feedback?).


> I have a hunch that people on mobile devices are inherently slower at entering their data.

In general, I use form autocomplete to fill this quickly. And on the contrary, my mobile signups are faster than desktop because the lastpass firefox extension on desktop takes longer to detect the form before the autocomplete can begin than my phone does.


Why bother with a proof of work scheme when you can just rate-limit directly? It accomplishes the same thing, while eating way fewer CPU cycles, doesn’t require JavaScript, and guarantees uniform cost between all client types.


This sibling comment was responding to you: https://news.ycombinator.com/item?id=25215024


So, if your app takes off on a college campus, you’d block them?


Do you collect metrics on how many people end up waiting for the PoW and share that with the admins?

In all, this sounds really promising though. I’d venture that most spammers already have higher end machines than end-users to solve existing captchas.

I’d probably approach this with a different strategy. I’d send an encrypted time stamp as the nonce. On the client, I’d first do a few easy PoW tasks and estimate the PoW difficulty for the given machine that would take at least X seconds to do. Then, send the PoW, the encrypted time stamp, and difficulty to the backend. If it’s been shorter than X (with a margin of error), or the PoW is wrong, it’s not valid.

In this scheme, it doesn’t matter how powerful the device is, a core is going to do some work for X seconds or at least be throttled by time.
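The timestamp idea above can be sketched as follows, using an HMAC-signed timestamp rather than an encrypted one (either works; the server only needs to be sure the client didn't forge it). This is a sketch of the commenter's proposal, not any real product's API; names and the 5-second minimum are assumptions:

```python
# Sketch of the signed-timestamp scheme: the server issues a timestamp
# it can later authenticate, and refuses submissions that come back
# faster than the required work time X. Illustrative only.
import hashlib
import hmac
import os

SERVER_KEY = os.urandom(32)  # secret known only to the server
MIN_SECONDS = 5.0            # the "X seconds" of required work

def issue_challenge(now: float):
    """Server: hand the client a timestamp plus an HMAC tag over it."""
    ts = str(now).encode()
    tag = hmac.new(SERVER_KEY, ts, hashlib.sha256).digest()
    return ts, tag

def accept(ts: bytes, tag: bytes, now: float) -> bool:
    """Server: reject forged timestamps and too-fast submissions."""
    expected = hmac.new(SERVER_KEY, ts, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False                    # tampered or forged timestamp
    return now - float(ts) >= MIN_SECONDS  # too fast => work was skipped

ts, tag = issue_challenge(now=1000.0)
```

Note this enforces elapsed wall-clock time rather than computation, so by itself it rate-limits each challenge to one submission per X seconds regardless of hardware, which is the property the commenter is after.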


> Why bother with a proof of work scheme when you can just rate-limit directly?

Tad amusing after all this time people still don't understand why proof-of-work schemes exist.

Rate limiting has zero cost to an adversary. PoW has physical costs. It's in the name :)


And Cloudflare already does that—that's what the "Checking your browser before accessing xyz—Please allow up to 5 seconds" message means. It's clearly not enough for them though, because they then go to also require CAPTCHAs.


First one to make this mine an altcoin for proof-of-work wins.

But seriously I like the idea, although it seems trivial for someone to attack a protected site by exhausting its subscription level? Are there any protections against that?


We don't disable the service if a protected site goes over their limit.

Right now we manually look at the limits and are reasonable with overages - also we can see how many captchas were unsolved.


Wow, this is an awesome idea. I can imagine this could be extended to solve tasks to mine cryptocurrency. If you get attacked by a botnet, you would actually make a profit!


Proof of work by itself is nearly useless, unfortunately. Compute is cheaper than people. This is one reason why CAPTCHA services will likely be with us always.

As someone working in the field, I also doubt your claim "will prevent 99.9% of spam" is based on real data. Modern headless browser spambots are not deterred by this kind of approach.

(Edit: looks like the poster admitted this number was entirely made up later in the thread.)


I just tried loading the demo of Friendly Captcha in 8 browser windows, clicking the verify button, refreshing the window, and repeating for about 3 minutes. Not once did it tell me that I'm a robot, so it seems your alternative fails the most basic captcha function: limiting people/machines from spamming functionality that the website owner wants limited.

Maybe not everyone, but a lot of people use captcha services to prevent automation from being used to extract/insert data. I know as a developer that there is always a chance of bypassing this, even with Google's reCaptcha, but your service seems to make it trivial, so many won't even go beyond your demo.


>Not once did it tell me that I'm a robot

Right, unfortunately you've completely misunderstood the point of Friendly Captcha, a question which is answered right there on its main page.

>>How does FriendlyCaptcha tell apart bots from humans?

>>It doesn't, FriendlyCaptcha adds a small cost and complexity for spammers that becomes large at scale.


Right, I guess it's time for you to upgrade the UI of your tool then, as when it's inactive it says "Anti-Robot Verification" and once the challenge is done it says "I'm not a robot", while in reality, none of those things are true, as you said yourself.

You might also want to rebrand to use a different word than "Captcha" as you're not actually telling robots and humans apart, you're simply adding PoW to an action, nevermind if they are robots or humans.

So instead of blaming users for misunderstanding your message, maybe try making your messaging a bit clearer, so that people who know what a captcha is don't get confused by your own definition of it.


Actually the user you're replying to is not the author of the service, from what I can tell.


Oh dear, it seems so. Thanks for letting me know, I guess I just assumed it would be the creator of the service who would defend it, not someone else, but seems you're right.


It doesn't work for me, comes back with the error: Verification failed: Background worker error undefined

I'm using the latest Firefox on GNU/Linux. Admittedly I've got a lot of stuff blocking all sorts of things, and I'm not really sure what's kicking in to block background workers, but I'm glad it's blocked. Anyway, after disabling literally all the blocking tools that I have, it still refuses to load.


That's not good, could you maybe provide more details in the Github repo [0]? The widget is open source, hopefully we can figure out what is blocking it here.

We test the captcha in browsers up to 8 years old and on many devices, do you perhaps have background workers disabled entirely? Here is a link to the widget on its own [1], does that have the same behavior? How about a minimal worker example [2]?

[0]: https://github.com/FriendlyCaptcha/friendly-challenge [1]: https://unpkg.com/friendly-challenge@0.6.1/index.html [2]: https://jsfiddle.net/christopheviau/90syrp0q/


Didn't try the other links but the jsfiddle link just says Preparing worker in Firefox here and neither button ever does anything.


Curious why it wouldn't start 'verifying' immediately on load? The fact that it runs in the background is really key--I'd hate to fill out an entire form, click the button at the end, and still have to wait around to submit.


You can change this behavior of the widget (data-start="auto" instead of default data-start="focus"), or you can start it programmatically.

The reason you wouldn't always want to start it in the background is if the user may not intend to submit the form (perhaps it's a form that is in your footer of every page and only a small percentage of users intend on sending it). Starting it on focus of the form is a good default.


So your solution is to deliberately waste electricity to replace captchas? It's an interesting concept for sure, but that first point and low-end devices requiring 20+ seconds to pass are not very good selling points for your service.


You're right that there is an electricity cost to solving this type of captcha - the same as there is an electricity cost to loading 2MB of JS+images and clicking the pictures with the fire hydrants (and the infrastructure behind that). It's hard to estimate how they compare (and what value you assign to the human labor performed and privacy loss).

20 seconds would be a fairly high difficulty. It's up to the site owner to decide what makes sense for them.

If anybody comes up with a useful computational task with a small bundle size that can be verified cheaply that would be the holy grail - until then the computation is only there as a form of hashcash.


20s doesn’t matter when it’s someone else’s hardware (eg spammers using malware installed on victim machines).

It’s also nonsense to compare the computational cost of N seconds of sustained, maxed out useless computation to the milliseconds of compute time needed to decode an image, or the minimal power usage of waiting on network data.


Electricity is a lot cheaper than my time.


Right? I think they’re describing those crypto mining scripts people were being inflicted with a while back :)


This is very interesting. Can you change the questions in the form? Those questions seem too personal and are offputting.


This looks very interesting and clean. Well done!


CAPTCHA does not scale. CAPTCHA spams real people with requests and wastes my VALUABLE time, and still labels disabled people as subhuman. It's offensive. It's ineffective. It's outdated.

It's reaching a point where encapsulating a VPN with anti-captcha is something I'd pay for.


> CAPTCHA is the worst option, except for all the others that have been tried.


I would happily pay 15¢ or so per site to bypass captchas if it were done in a way that would preserve my privacy. Has something like this ever been offered?


> unless universal online identities existed, which could and should include anonymous identifiers by design

Yes but no. Anonymized identifiers can be deanonymized. They should utilize zero-knowledge proofs in such a way that they can prove "yes, I have an identity verified by entity X (and Y and Z) (based on passport/phone number/...)", without disclosing any of those details.

It could, optionally, yield an identifier unique to each requester and unlinkable to others unless an explicit proof of the link is provided. Though if this is included, there has to be some mechanism to avoid huge ad networks sharing the same "requester entity".

This is a solved problem. All that's left is politics, implementation and alignment.


Maybe "anonymous identifiers" has some very technical and exact meaning that I didn't know, but when I said "anonymous identifiers" I did it very abstractly, no need to assume a specific underlying implementation from those two words.

I have actually discussed the concept in the past [0], and I exchanged some emails with the guy in that thread to talk more about technical details. We all seem to agree that design and political will are the problems, not technology.

What I was basically saying in the comments, in general terms, is that you might have one primary identifier, and then somehow you can get more identifiers that are tied to your main one, but that might have different expiration periods, might grant access to different levels of information about you, and might be limited to a certain number for each service you use. Of course, there are quite a few ways to implement such a system. And that's precisely why I'm more focused on the design, usability and characteristics than the underlying technical implementation; I think the best we can do if we ever want to see this happen is to spread the idea in terms that anyone can understand [1]. I mean, I'm interested in the technical details too, so I'm just complementing and contextualizing a bit here.

[0] https://news.ycombinator.com/item?id=22180120

[1] ...or discuss more the idea among those that are interested and setup a demo website to make it easier to spread the word, even if there's no actual implementation behind it and it's just a mock-up. I'm quite busy at the moment, but I'll definitely do something along those lines when I have some time.


>"yes, I have an identity verified by entity X (and Y and Z) (based on passport/phone number/...)", without disclosing any of those details.

Actually, I would prefer: "Yes, this one-use temporary ID is tied to an identity which is known to behave on public websites." Or "... is tied to an identity which is known to be knowledgeable on topic X". Or whatever information is needed at the time.

Next time a new ID will be generated and the identity provider will vouch for it. No passport or phone number should be required.

Edit: fixed spelling.


One such solution would be a small payment, something like 1 cent for access. That's not too much, because I am already paying 3 cents to a service solving captchas for me.


Maybe but how would you transfer 1 cent in a way that's fast enough not to impair UX and cheap (where the transfer doesn't cost more than the validation fee)? Additionally (as with all online payments) there are privacy concerns.


Please, what is the service? I want to pay someone to solve Captchas for me.


https://anti-captcha.com/ is one such service. There are others, but this one that has browser plugins for visually impaired people in addition to APIs.

I've used the service in the past, though it's far enough in the past all I can say is it worked once upon a time, no clue if it's still reliable.


WTF. Did you see the Superman-like guy shooting at the sweatshop workers? This looks pretty bad...

https://imgur.com/a/CvYyBQH


Holy crap that's terrible and offensive.


Websites spamming me with captchas is terrible and offensive.

"Because scaling. It is our God-given right."


It is possible to solve captchas without glorifying violence against workers.


Interestingly, it takes them under 20 seconds to solve a recaptcha and 70 for hCaptcha.

I wonder if they’ve partially automated recaptcha, or if hCaptcha is just a bigger pain in the neck. (I usually can’t solve a reCaptcha in 15 seconds...)


I use the Buster browser extension with a free IBM Watson speech-to-text node; the free tier limit is 500 minutes of speech per month, which is plenty for solving captchas. The price of all this privacy (blocked 3rd-party cookies) is that I need to solve 3 reCaptchas every time.


For a lot of people, they want to run a service and not have to spend a significant amount of time and energy investing in anti-abuse. In general anti-abuse work is not nearly as useful as product work, a day off, or a variety of other things.

I agree, there should be better ways to do anti-abuse. Yet I find myself coming up empty when I try to find better options for the common scenario where people would really rather invest deeply in their service than in anti-abuse.

I would love to hear some ideas about how to solve this nasty general problem while also respecting user time and privacy. Unfortunately, I've found that entirely too often the vague sense that there must be a better way fails to translate into a substantive better way.


Better way? I'd be hard pushed to come up with a worse way.

The number of things that are "wrong" with reCatcha etc, have been mentioned on here ad nauseam. In fact, I'll quote myself from another debate on the subject, a while back:

  >1: It's never made clear exactly what you're supposed to click on. For example. If I'm told to click on "traffic lights" does that mean just the lights?... or the poles as well?... and what about a square that only has a tiny bit in it? Does that count too, or is it only squares which are mostly filled by the object in question?

  >2: They make no concession to non-US English speakers. I've been asked to identify things before, where I had to guess what the word means because the same thing is called something completely different in UK English.

  >The only thing that approaches the level of rage that reCaptchas instil in me are those captchas where you've got to transcribe what's in a photo of some letters & numbers and where they NEVER fecking tell you whether it's case sensitive or not, or where they use identical characters for zero and letter O, one and letter I, etc.


So in other words you have no better ideas either?


I have lots of better ideas.


Are you executing on them? Otherwise you should share them here, so that others may.


Also, the outcome of the captcha is only loosely correlated to whether you answer correctly.


  >One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password

Is it eBay by any chance?

That recently started randomly showing reCaptchas to me when I'm already logged in and have been using the site for some time. When this happens, it descends into a never-ending cycle of more login screens and then more reCaptchas.

But thankfully eBay have taken note of the dozens of complaints about this on their user forums, dating back to 2018 and rushed their best people in to fix it.

[That last sentence was dripping with sarcasm, in case anyone unfamiliar with the company thought eBay ever took any notice whatsoever of their users' concerns]

I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.


I'm about as anti-Google as it comes, but I didn't even mind the first incarnation of reCaptcha as a concept. You prove that you're human, and you also help transcribe books so that they're more accessible/searchable! Sure, it's in Google's interest in that it improves Google Books, but it at least seems like a symbiotic exchange (to, e.g. humanity in general.)

Contrast that with today's form of reCaptcha where you identify stop signs/crosswalks/et c. for Google's benefit, but at the same time you're also improving...oh, wait, Google again. It almost seems like forced labor, in a sense.


It is forced labour (to a very light degree, but still).

It is additionally resource-theft, when recaptcha-protected sites are used for business purposes. You are stealing valuable business time (possibly very valuable business time, if the person in question is a high-paid role like a CEO or surgeon) to power your pet "spot the crosswalk" project.


Yeah, I certainly don't disagree. I was just trying to use 'forced labor' in a literal sense rather than try to imply any of the awful things that usually come to mind...


>eBay

>I'm not a violent person at all. But if I ever meet the person who spawned reCaptcha and all its equally annoying clones, which are a pox on the internet, I won't be responsible for my own actions.

be careful, else they start sending you dead horses head and planting gpses in your cars https://www.justice.gov/usao-ma/pr/two-former-ebay-executive...


With our hCaptcha Enterprise product (https://www.botstop.com), showing a CAPTCHA actually only happens in rare cases (relatively speaking..) - vast majority of bots are caught and stopped in the background (using ML), and most users will never see one.


I'm curious how rare it is / what triggers it. In my experience, at least Google triggers hard mode if you use any sort of privacy-preserving technology, e.g. uBlock, Brave, etc. It's very frustrating.


Gave up on google search because of this ... not a big loss.


I have a VPN so like ~50% of web sites present a captcha to me ... had to subscribe to a service solving captchas automatically.


> had to subscribe to a service solving captchas automatically.

What a bizarre world we've made for ourselves.


Do you allow by click type?


Not sure what you mean by click type?


I find that when I solve a Captcha too quickly, I get another one. And another one. And another one. So instead, I wait a short time, click a few wrong boxes, then enter the correct Captcha. Maybe this is part of it, but I don't like it.


If the Buster plugin can't solve the reCaptcha for me [It does fail from time to time] then I just don't bother visiting that website. Or if it's a site I need to use, then I'll try again later and see if I either get let in without being asked to jump through hoops, or get a reCaptcha Buster can solve.

I simply refuse to waste my time and drive up my blood pressure by doing unpaid training work for Google's AI, in order to visit some crappy website. I really wish more people would start boycotting any site which uses reCaptcha [or its derivatives], so we could get rid of this blight on the internet.

I've spotted this new hCaptcha junk show up recently on a couple of sites I used to frequent. I don't visit those sites any more. So well done webmasters. Apparently annoying the shit out of visitors to your site tends to drive them away. Who'da thunk it?!


I sent an email to my representative, which got me automatically added to her newsletter. But the unsubscribe link doesn't work without solving one...


Where? That would not be legal in many countries and I suggest you try reporting it.


Let me guess... You can try reporting it to the authorities, but they need you to solve a CAPTCHA first.


As in the US House of Representatives. Cheri Busto, to be exact, in case someone works with her.


Can you provide an alternative that doesn’t involve my contact form getting spammed to hell with crap?


I have my email address listed in plain text on my website, and with a simple regular expression to reject the standard pharma/bitcoin/etc. spam at the SMTP level based on subject, there are at most a couple of spam emails per day. Hardly takes any time to go through that.
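The subject-line filter described here can be sketched as a single regular expression applied before the message body is accepted. The patterns below are examples of my own, not the commenter's actual rules:

```python
# Sketch of SMTP-time subject filtering: one regex over the Subject
# header, rejecting matches with a 550 before accepting the body.
# The patterns are illustrative examples only.
import re

SPAM_SUBJECT = re.compile(
    r"(viagra|cialis|bitcoin\s+invest|crypto\s+profit|cheap\s+pills)",
    re.IGNORECASE,
)

def smtp_verdict(subject: str) -> str:
    """Return an SMTP-style response code for the given subject line."""
    if SPAM_SUBJECT.search(subject):
        return "550 Message rejected"
    return "250 OK"
```

Rejecting at the SMTP level (rather than filtering after delivery) has the nice property that legitimate senders caught by a false positive get a bounce telling them their mail didn't arrive.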


Use the recaptcha after form submission only, rather than on the whole website. Then at least the user is incentivised to do it as a last step of a process, as opposed to being stopped in their tracks before they even got to visit the website.


What service are you using? Some browser plugin I assume?


    One particularly egregious misuse of captcha in a
    service I use presents one after I enter a correct
    username and password.
That's nothing.

eBay will CAPTCHA me after I enter my e-mail address, and then again after I enter my password too. Every time. And I'll be damned if I don't "fail" this CAPTCHA at least once a week, with it telling me to try again.

Come on, there are only so many mountains/hills, taxis, traffic lights, bicycles, and cross-walks I can look at before I go cross-eyed.

They even have the nerve to suggest that I can avoid this by using the latest version of my browser (Firefox), which I already am and always do.


> The potential reward for an attacker who successfully gains access to an account is high, so it seems almost certain anyone running a targeted attack would defeat this by handing it off to a human upon detecting that they had a good account.

Then it may surprise you to know that simply preventing automation makes many types of account takeover attacks infeasible in practice. It won't mitigate the attack if you are personally a high value, named target. But most account takeover attacks operate en masse and are coordinated after large security breaches, so having to hand over accounts to a human operator as part of the auth loop would make the campaign uneconomical. It also introduces another step at which an attack can be logged, recognized, fingerprinted and stopped by an incident response team.

This is something your security team would probably gladly tell you about if you asked them. There's also a bunch of talks about this presented at conferences like Blackhat, DEFCON, USENIX, etc.

Stated in another way: not all potential rewards for successful account takeover are high. The modal account in the modal campaign is low value, which is made up for by volume and particular purpose of accessing accounts. If you model these campaigns economically, you can eliminate entire classes of "low margin, high volume" attacks simply by introducing friction that mitigates automation.

Then there is a natural cost-benefit tradeoff as to how much friction is allowable on a per-user basis to prevent the most common types of account takeover attacks.
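To make that concrete, here is a back-of-the-envelope model in Python. Every number is hypothetical, chosen only to show how a per-attempt captcha cost eats the margin of a high-volume campaign:

```python
# All figures are hypothetical, for illustration only.
credentials_tested = 1_000_000   # stolen credentials replayed after a breach
hit_rate = 0.001                 # fraction of credentials that still work
value_per_account = 5.00         # assumed resale value of a compromised account, USD
solve_cost = 0.003               # assumed price per human captcha solve at a farm

revenue = credentials_tested * hit_rate * value_per_account
captcha_cost = credentials_tested * solve_cost

print(f"revenue={revenue:.0f} captcha_cost={captcha_cost:.0f}")
# With these numbers: revenue=5000 captcha_cost=3000, so most of the margin is gone.
```

The point is not the specific figures but the shape: the captcha cost scales with attempts, while revenue scales with hits, so friction on every attempt can push a low-margin campaign below break-even.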


>One particularly egregious misuse of captcha in a service I use presents one after I enter a correct username and password.

I run a problem-validation community platform. A couple of days back an individual launched an automated spam/DDoS attack, posting abusive, demoralising text on every single thread using newly created users.

Fortunately, I had systems in place to identify and mitigate it with Cloudflare. So in this case even genuine users would have received a captcha. I found out soon enough from the firewall who the attacker was: he had earlier created an account under his own name and was using the same IP to attack. After I blocked his IP he tried a couple of other IP addresses, incl. Tor, but stopped his activity after a couple of hours.

I generally don't like reCAPTCHA because it takes cultural background for granted (e.g. 'pie' is not a common food worldwide), because of accessibility (as a disabled person myself), and because it has no mitigation for captcha-solving farms.

But in nuisance cases like the one I detailed above, a captcha is the easiest method available at scale.


Why can't they just allow automated user agents? I should be able to scrape websites if I want to. Why do user agents have to be browsers?


Exactly, or be able to just use a text-mode browser.

Or wget to save a set of pages for later.

I understand protecting commenting with captcha, or contact forms. But captcha on regular read-only access to public web pages in the style of Cloudflare is a bit ridiculous.

One thing contact forms should have is a static indication that a captcha is in use. I've filled out all too many forms that just sent my written text into the void, because I block some domains.


This doesn't mix well with the ad-based compensation model.

Sadly, there still doesn't seem to be much in the way of micropayment infrastructure.


That's a feature if you ask me. The whole point of scraping websites is to get the data I want while discarding noise like interface chrome and advertising directly to the garbage.

If they'd like me to pay for access, they should return HTTP 402 Payment Required instead of letting me download the page for free. Perhaps they could also rate limit the network connection to prevent denial of service. Why straight up block automated user agents though? That sucks.


> they should return HTTP 402 Payment Required

That's illegal.

They must first return HTTP 418 Know Your Customer Required.


> there still doesn't seem to be much in the way of micropayment infrastructure.

Anti Money Laundering regulation killed it: KYC doesn't scale down to micropayment levels.

If you want to fix the web, you have to roll back the AML/KYC insanity. Until that happens, the web will stay broken, because paying with attention (ads) is magically exempt from the AML/KYC insanity, whereas paying with money or anything money-equivalent (fungible and transferable) is not.


> the AML/KYC insanity

Just finished reading about this and I completely agree. I can't imagine having a company and being literally obligated by law to violate everything I personally believe in about privacy and freedom just to help the government be even more efficient at marginalizing people.


Being scraped isn't free, if it's at a large enough scale.

Plus, it's not just benign read-only scrapers. Have you looked at the spam folder of your email recently? That's what every comment section and user bio and god knows what else would look like if you just blindly allow all automated traffic.


It's exactly how email spam filters evolved.

They used to be completely local (there were even some DIY solutions), then evolved to signature updates, but eventually the attacks grew so advanced that only online services could be updated and aggressive enough, which is of course how Gmail took over the internet with its near-perfect spam filter (when was the last time you checked a Gmail spam folder?).

The last generation of local spam filters were pretty good though. Anyone remember Eudora and Spamnix?


Local spam filtering still works quite fine. It just needs a lot of data most users probably don't have when starting out.

I just use bogofilter, and it worked almost perfectly from the start, just because I saved years upon years of SPAM and HAM: tens of thousands of messages each.

It got slightly worse over years, because I incrementally only train it on new SPAM but not on new HAM, because of laziness.

People probably have HAM archives but don't usually save their SPAM, so they can't start using Bayesian spam filters right away with great results.

Personally I find it much better than whatever Google uses. I don't even bother with SMTP-level domain/IP blacklists or reverse IP/domain checks anymore. All mail is passed right to the mailbox and is then pre-filtered by bogofilter into a SPAM folder that I check once a week, and I barely find any HAM there. I receive about 500k mails a year.
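For the curious, the core of such a filter fits in a screenful. This toy sketch is not bogofilter (which is far more sophisticated), just the naive Bayes log-odds idea behind it:

```python
import math
from collections import Counter

class TinyBayes:
    """Toy Bayesian spam filter: log-odds over word frequencies,
    with add-one smoothing. Illustrative only."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        words = text.lower().split()
        if is_spam:
            self.spam.update(words)
            self.n_spam += 1
        else:
            self.ham.update(words)
            self.n_ham += 1

    def spamminess(self, text):
        # Positive score means "more likely spam"; starts from the class prior.
        score = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for w in text.lower().split():
            p_s = (self.spam[w] + 1) / (sum(self.spam.values()) + 2)
            p_h = (self.ham[w] + 1) / (sum(self.ham.values()) + 2)
            score += math.log(p_s / p_h)
        return score
```

This is exactly why a big saved corpus matters: with no training data the word probabilities are all flat and every message scores near zero.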


And don't spammers just click farm captchas out to Facebook users filling out "what Hogwarts House are you?" quizzes, anyway?


That's a little amusing just to imagine: 'Which Hogwarts house are you? Identify these traffic signals and we'll sort you into the proper house!'


Here's a thought experiment. This one requires some long-term thinking, outside the box and well past recent history and the status quo.

What if the majority of internet usage is non-interactive, from so-called "bots", what we may refer to as "automated use"? Google and Facebook, among others, rely on the use of automation and "bots". The non-interactive clients ("bots") being used by these companies are not asked to solve captchas. (In turn, after collecting data from public sources, these websites attempt to prohibit the use of automation by their own users wishing to access it. What is interesting is that neither company provides any definition of "automated", nor any clearly stated limits on the speed at which a user may access resources or the quantity of resources they may access in a stated time period. One might be apt to find such limits associated with an "API".)

In 2013 an Incapsula report suggested that the majority of internet usage is in fact automated and not "malicious"^1 -- what if public information sources on the internet catered to the use of automation rather than trying to limit such use, e.g., with speed bumps^2 like "captchas". What if servers treated all clients equally, instead of having data forcibly collected by a few large clients that receive preferential treatment, then siloed and protected from "automation". What effects would this have on "centralisation" and levelling the playing field.

"Do not ask for permission, ask for forgiveness." What does it really mean when applied to the internet. Perhaps it means there is an endemic lack of clarity about "the rules". Prohibiting "automation" is far too vague and in many cases it makes no sense. The growth of computers and the internet is the growth of automation. Both servers and clients may have concerns about resource utilisation. Websites do not ask for permission when they decide to use large amounts of the user's computer resources.

Consider that a Google could not exist without being "given permission" to use automation. Does the GoogleBot have to solve captchas. No automation means no company such as this could exist. How useful would the web be without anyone being able to use automation to create an index. Based on the HN comments about web search I have read over the years, I would guess that for many commenters, it means the usefulness of the web would be dramatically reduced.

Imagine an automation-friendly internet. The truth is, I think (the data shows) we already have one, except we are in denial that "the rules" actually allow it. An early metaphor for internet and web use was "surfing". It may be that those who are constantly fighting against automation are fighting against the waves instead of riding them. Time will tell. It stands to reason, IMO, that every internet user, whether a server or a client, should be expected to use automation.

1. https://www.incapsula.com/blog/bot-traffic-report-2013.html

2. An early metaphor for the internet was a "superhighway". Speed bumps would seem out of place on a superhighway.


Could the captcha be there to keep spam bots from posting? Sometimes it is trivial to get a new or otherwise valid account, so just checking for that wouldn't stop spam bots.


It's easier for me to switch from Google to DDG than to actually complete a captcha. I don't understand why businesses don't understand this.


There's a good reason for what you're identifying as misuse.

If you show a captcha after a failed password, you need to show one after a correct password as well. Otherwise you leak information. You can have other solutions; e.g. in a login flow that splits the username and password entry, it's advantageous to put the captcha between those two steps. But even in those solutions the display of the captcha must be independent of password correctness.


There's a lot of arguments against captchas, but I do not agree with this one. You will always leak whether or not a password is correct based on how your app behaves - a correct password will grant entry to the application. If you only ask for a captcha when a user account exists but fail to ask if they use a made up username, that's an information leak.


The trick is to ask for captcha before validation.


> If you show a captcha after a failed password, you need to show a one after a correct password as well. Otherwise you leak information.

Presumably, if the person has entered the right username and password they're going to get access to the service at which point they'll know they entered the right one. What information exactly is leaked here?


The reason you'd want a captcha on a login page is to protect against brute-forcing of some sort. For example credential stuffing or a dictionary attack.

The information the attacker is looking for is the validity of the password. If you want to use a captcha to protect against this, the outcome must be the same whether the password is valid or not. Because if you only show the captcha for failed logins, the attacker can find out that the password was incorrect without solving a captcha, which by symmetry means they can also find out if it's correct without solving one.
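A sketch of that symmetric flow; all helper names here (`risk_score`, `verify_captcha`, `check_password`) are hypothetical stand-ins, not a real API:

```python
def attempt_login(username, password, captcha_answer,
                  risk_score, verify_captcha, check_password):
    """Decide whether a captcha is required BEFORE touching the password,
    so the captcha requirement cannot leak whether the password is correct."""
    needs_captcha = risk_score(username) > 0.5
    if needs_captcha and not verify_captcha(captcha_answer):
        # Identical response for right and wrong passwords.
        return "captcha_required"
    if check_password(username, password):
        return "ok"
    return "bad_credentials"
```

Since the captcha gate fires on the risk score alone, an attacker who refuses to solve it learns nothing about the password either way.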


Usually when you Captcha on a failed attempt, you captcha every request from that IP (or other session identifier) for a period of time. Try Google Accounts for instance. They behave this way.

You don't captcha the success path because you don't need it. You captcha the pre-login flow once you have a failed attempt. It's a trip switch that is a prelude to the flow.


But this entire thread is about a case where the captcha happens after password entry!

The point is that it is an entirely legit design, and kind of is the way you have to go when the username and password are entered together. As long as the captcha is shown regardless of the password validity, both the security properties and the amount of user annoyance due to having to solve unnecessary captchas is the same as if you had had to pass a captcha up front.

The example of Google Accounts is interesting, because they use split username and password entries. So there is indeed a natural point in the flow to show the captcha between the username and the password, which is what they do. But at least up to a year ago they were doing it after the password. So enter username + submit, enter password + submit, and if the login attempt was sufficiently dodgy get shown a captcha regardless of the validity of the password.


Sorry, because of how common the method I described is and how absurd the idea of showing a captcha only to give you a login failed message is, I "corrected" it before responding.


You use rate limiting to stop brute-force attacks, not a captcha.
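Rate limiting is easy to sketch. A per-client token bucket is the usual shape (the injectable clock below is just for testability):

```python
import time

class TokenBucket:
    """Allow roughly `rate` attempts per second with bursts up to `capacity`.
    One bucket per client key (IP, username, session) in a real deployment."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.now = now
        self.last = now()

    def allow(self):
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Unlike a captcha, this costs legitimate users nothing until they actually exceed the budget, and it throttles scripted guessing just as well.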


That's not what's happening:

> An incorrect login says so without presenting a captcha.


Thanks, I misread. Then that indeed makes no sense!



