Among the many topics in digital fraud detection and identity proofing that I discuss with my clients, bot detection and mitigation is one that reliably sparks a lively debate, especially once CAPTCHAs come up. I find that clients divide quite cleanly into two camps: for or against using CAPTCHAs. And those in the latter camp tend to have a rather visceral reaction to the very notion of a CAPTCHA and are militantly against it.
I find the topic of CAPTCHAs supremely fascinating. Let’s take a quick detour into the stages of their evolution (if you don’t want the history lesson, just skip to the next section).
The evolution of CAPTCHA
- Stage 1. CAPTCHAs first came into use in the late 1990s, most prominently at the search engine AltaVista, which had a huge spam problem – it introduced the concept of the warped letters.
- Stage 2. After this, reCAPTCHA was created, which involved a pair of words – and was the start of using this process to solve AI problems, in this case book scanning. The first word was known to the system; the second word was unknown to the system and came from a book scan. Once a number of people had typed in the same answer for the second word, it would be resolved for the book scan. Google acquired reCAPTCHA in 2009.
- Stage 3. reCAPTCHA v2 then came along, which involves ticking a box to say you're not a robot, Google running checks on your user behaviour, and then a potential secondary check: the now infamous selection of images. This is another good example of CAPTCHAs being used to solve machine vision problems, in this case in the traffic environment – one would assume for self-driving cars. Given the academic and commercial focus on this particular machine vision problem, bad actors have been able to leverage plenty of tools to build systems that defeat it. In fact, given that Google Cloud sells machine-learning systems, it's entirely likely that some of Google's servers are creating CAPTCHAs while others are breaking them.
- Stage 4. But Google is not the only player in town. CAPTCHAs continue to evolve, with many vendors investing heavily in their own versions. Some claim exceptionally high solve rates for humans and low solve rates for bots. The approach some have taken is to use machine vision challenges that are easy for humans to solve and have no commercial applications, so that – unlike the self-driving car problem – bad actors can't leverage academic, commercial and open source research to build tools that solve these CAPTCHAs.
The case for and against
So back to the point at hand – why do some businesses like to use CAPTCHAs? Well, the logic is straightforward. If your bot detection solution believes the user is a bot, you can block that user. But no solution is perfect – what if the user is not a bot? Then you've just blocked a good user. And so the CAPTCHA represents an opportunity, it represents hope: if this is actually a good user, they can prove it by solving the CAPTCHA and continue on their way.
So why do some businesses hate CAPTCHAs? Well, the solve rate for (good) humans on CAPTCHAs can sometimes be quite low, resulting in good users not being able to proceed. Plus, for those good users who do solve the CAPTCHA, it's often an annoying addition to the UX. And the solve rates for bad actors on CAPTCHAs can sometimes be quite high… although this is a nuanced point: in some cases, yes, bots have been trained to solve the CAPTCHAs, but in other cases the bots hand the session over to a human to solve it. There is a thriving industry of human-powered CAPTCHA-solving services – typically people in low-income environments being paid a pittance per CAPTCHA, solving thousands of them daily.
So should you use a CAPTCHA or not?
I think so. Just don't use the 'pick a street sign from this matrix of images' Google version of a CAPTCHA. There are many far more evolved CAPTCHAs available today from the likes of Arkose Labs, GeeTest or PerimeterX that have their own approaches and nuances but consistently do a better job than the dreaded matrix of traffic images. Given the pressure on most digital commerce businesses to reduce false positives, giving users a chance to prove that they're human rather than just blocking them is worth exploring. In a world where bad actors use humans to augment bots, though, you need a CAPTCHA that's smart enough to detect when the humans solving it are just a little too fast – perhaps a sign that they spend their day solving these things over and over. When that's detected, the CAPTCHA should dynamically be made harder, discouraging such activity and rendering it economically nonviable. Using a CAPTCHA also forces interaction with the user in a controlled manner, allowing more telemetry to be gathered about the user and their level of humanity.
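The escalation idea above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual logic; the solve-time threshold and difficulty levels are assumptions chosen for the example.

```python
# Illustrative sketch: escalate CAPTCHA difficulty when challenges are
# solved suspiciously fast (a possible sign of professional human solvers).
# The 2-second threshold and 3 difficulty levels are assumed values.

FAST_SOLVE_SECONDS = 2.0  # assumed cutoff for a "too fast" solve


def next_difficulty(current: int, solve_seconds: float, max_level: int = 3) -> int:
    """Return the difficulty level for the next challenge in this session."""
    if solve_seconds < FAST_SOLVE_SECONDS:
        # Suspiciously quick: step the difficulty up, capped at max_level,
        # to make bulk solving slower and less economically viable.
        return min(current + 1, max_level)
    # Normal human pace: keep the difficulty where it is.
    return current
```

A real implementation would track solve times per session over multiple challenges rather than reacting to a single fast solve, but the principle is the same.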
Just don't use a CAPTCHA as the default for all sessions. Rely on your bot detection vendor to spot and block most of the bot traffic, and deploy CAPTCHAs only in those genuine grey-area cases where you're not sure of humanity – I'd expect that to be well under 5% of all sessions.
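The decision logic this implies can be sketched as a simple three-way gate. This is a hypothetical example, assuming the bot detection vendor returns a bot-probability score between 0 and 1; the threshold values and function name are illustrative, not from any real product.

```python
# Illustrative sketch of risk-based CAPTCHA gating: block confident bots,
# challenge only the grey area, and let everyone else through friction-free.
# Both thresholds are assumed values to be tuned against real traffic.

BLOCK_THRESHOLD = 0.9      # assumed: above this, confident it's a bot
CHALLENGE_THRESHOLD = 0.6  # assumed: grey area starts here


def decide(bot_score: float) -> str:
    """Map a vendor's bot-probability score to an action for the session."""
    if bot_score >= BLOCK_THRESHOLD:
        return "block"       # high confidence bot: no CAPTCHA, just block
    if bot_score >= CHALLENGE_THRESHOLD:
        return "challenge"   # unsure: give the user a chance to prove humanity
    return "allow"           # likely human: no added friction


print(decide(0.95))  # block
print(decide(0.70))  # challenge
print(decide(0.10))  # allow
```

Tuning the two thresholds is exactly where the under-5% challenge rate comes from: widen or narrow the grey band until only genuinely ambiguous sessions see a CAPTCHA.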
And perform A/B testing: split your traffic, use a CAPTCHA on one segment only, and compare the results – make decisions based on actual data and metrics, not on preconceptions formed by outdated approaches such as the traffic-image matrix.
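A simple way to do that split deterministically is to hash a stable session or user identifier, so the same user always lands in the same segment across visits. This is a generic sketch; the 50/50 split and segment names are illustrative choices.

```python
# Illustrative sketch of a deterministic A/B split for a CAPTCHA test:
# hash a stable user/session ID into [0, 1] and assign a segment, so
# assignment is sticky across requests without storing any state.
import hashlib


def assign_segment(user_id: str, treatment_share: float = 0.5) -> str:
    """Assign a user to the 'captcha' or 'control' segment, deterministically."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a value in [0, 1].
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "captcha" if bucket < treatment_share else "control"
```

With the assignment fixed, you can then compare solve rates, abandonment and downstream fraud outcomes between the two segments.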
I’m always interested to hear about experiences of implementing different CAPTCHAs – reach out and let me know what you experienced!
As an interesting final aside, Amazon filed a patent in 2017 for a new type of CAPTCHA that is easy for machines to solve but presents a visual challenge that humans would typically get wrong – thus subverting the process. Human fallibility may in fact be the future when it comes to defeating bots…