A lot of the technology we use in our everyday lives has far more advanced features than we realize. One example is reCAPTCHA, i.e. those small “tests” you have to complete, e.g. when logging in or when you want to leave a comment, in order to prove that “I am not a robot”.
Did you know that when you click on “all images showing traffic lights”, you not only prove that you are a human being, you also help to make artificial intelligence smarter?
Let’s go through how this technology really works and what purpose it serves besides stopping spam.
What is reCAPTCHA and who created the system?
CAPTCHA stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart” and was originally created by computer scientists at Carnegie Mellon University in the United States. The Turing test was proposed in 1950 by Alan Turing and states that if you talk to a machine and cannot determine whether it is a human or a machine, the machine can be said to have human intelligence.
In this case, the point is not to determine how intelligent an AI is. reCAPTCHA is there to stop spam and automated bots.
There are (unfortunately) people who create programs that browse the web looking for forms to fill out with advertising or scam messages, or that try to log in to other people's accounts or leave comments on blogs with links to spam pages.
This is what drove the computer scientists at Carnegie Mellon to develop software that could easily separate people from robots online. The technology was then acquired by Google, which runs reCAPTCHA today.
You have helped digitize books and train AI in image recognition
What the inventors of reCAPTCHA did was not just create a simple test to identify human users. They also created a smart way to use the data entered into the service.
You probably remember what reCAPTCHA looked like not so long ago. Back then, you were always shown two very strange-looking words that you had to decipher. They were blurred, warped, set in odd fonts, or struck through with lines.
These images came from books and newspaper articles! The technique worked by showing you one “confirmed” word and one new word. Only the confirmed word was the real test, the one you had to type correctly. Your answer for the second word was sent to a database, and other users were then shown the same word. When enough people had typed the same answer for that particular image, the word was confirmed. In this way, the system became self-checking and automatic.
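To make the idea concrete, here is a minimal sketch of how such a two-word check could work. It is not Google's actual implementation: the image IDs, the `submit` function and the threshold of three matching answers are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Assumption: three matching answers are enough to confirm a word.
CONFIRM_THRESHOLD = 3

# Words the system already knows (image id -> correct text). Hypothetical data.
confirmed = {"img_control": "morning"}

# Answers collected so far for words that are not yet confirmed.
votes = defaultdict(Counter)


def normalize(text):
    """Ignore case and surrounding spaces when comparing answers."""
    return text.strip().lower()


def submit(control_id, control_guess, unknown_id, unknown_guess):
    """Return True if the user passes the test.

    Only the control word decides pass or fail. The answer for the
    unknown word is recorded as a vote, but only from users who passed.
    """
    if normalize(control_guess) != confirmed[control_id]:
        return False  # failed the real test, so the second answer is discarded

    if unknown_id not in confirmed:
        guess = normalize(unknown_guess)
        votes[unknown_id][guess] += 1
        if votes[unknown_id][guess] >= CONFIRM_THRESHOLD:
            # Enough people agree: the word can now serve as a control word.
            confirmed[unknown_id] = guess
            del votes[unknown_id]
    return True
```

In this sketch, once three different visitors have passed the control word and typed the same text for an unknown image, that image moves into `confirmed` and can itself be used as a test for the next visitor.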
So all of us internet users who sat there trying to decipher hard-to-read words have actually done a very big job for Google: by 2011 we had already digitized the entire Google Books archive. We have also helped to digitize the New York Times article archive back to 1851. That job would have been enormous, expensive, and time-consuming if it had been left to dedicated transcribers working through the old printed texts.
Self-driving cars need to recognize traffic lights and cyclists
As you have probably noticed lately, you get fewer of those difficult-to-read words. Instead, you are shown a series of pictures where you have to “click on all pictures that contain a cyclist”, “all that contain a traffic light” or, as in the example above, “all pictures showing a soup”.
A major challenge for machine learning and AI programs is image recognition: being able to look at things and identify what they are. By “tagging” thousands of images of pedestrian crossings and buses, we help AI get better at this.
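As a rough illustration of how many people's clicks could be turned into training labels, here is a small sketch. The function name `aggregate_labels`, the tile numbering and the 60 percent agreement level are assumptions for this example, not a description of how Google actually does it.

```python
def aggregate_labels(clicks_per_user, num_tiles, min_agreement=0.6):
    """Turn many users' clicks into one yes/no label per image tile.

    clicks_per_user: one set of clicked tile numbers per user.
    A tile is labeled True if at least `min_agreement` of the users clicked it.
    """
    total_users = len(clicks_per_user)
    if total_users == 0:
        return [False] * num_tiles

    counts = [0] * num_tiles
    for clicked in clicks_per_user:
        for tile in set(clicked):
            if 0 <= tile < num_tiles:  # ignore clicks outside the grid
                counts[tile] += 1

    return [count / total_users >= min_agreement for count in counts]


# Four users looking at a 3x3 grid (tiles 0 to 8), asked for traffic lights.
clicks = [{0, 1}, {0, 1, 4}, {0, 2}, {0, 1}]
labels = aggregate_labels(clicks, num_tiles=9)
# Tiles 0 and 1 are labeled True; the stray clicks on tiles 2 and 4 are outvoted.
```

The point of the agreement level is that one careless or mistaken click does not end up as a wrong label in the training data.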
Traffic is so often in focus because it is especially important for self-driving car software, something that Google obviously prioritizes. But the same software can also be used to automatically sort images in large image archives and make the images searchable (you know Google is very interested in search…).
Smarter AI programs also lead to better spam programs
By now you may have started to wonder: if we use reCAPTCHA to teach programs to recognize images, won't the spam programs use the same technology too, and thus be able to beat reCAPTCHA?
Well, that is exactly what happens, and Google has of course realized the problem. They have therefore come up with a new solution that seems very simple on the surface but is much more advanced than it looks.
Nowadays, you are increasingly met by a small simple check box where you confirm: “I am not a robot”.
A simple check box with an advanced back end
Google realized that we humans were becoming less and less tolerant of those time-consuming small tests. reCAPTCHA was starting to be seen as a nuisance, and they realized that something much faster and easier for us users was needed.
What the “I’m not a robot” box actually does behind the scenes is, among other things, check how we move the mouse pointer before we click the box. That is where we reveal our humanity! Since the service comes from Google, it can also check our “Google cookie” on the computer, which tells them how we have browsed online, what we have searched for, and so on.
So the system is structured so that we first get this very simple test, which most people “pass”. If there is any doubt about our human status, we get an image test like the ones described above to confirm that we are not an unusually cleverly designed spam program.
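The two-step flow can be sketched roughly like this. Google does not publish which signals it uses or how it weighs them, so the `Signals` fields, the scoring rules and the threshold below are pure assumptions, chosen only to show the structure.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a score of 0.7 or more counts as "clearly human".
PASS_THRESHOLD = 0.7


@dataclass
class Signals:
    mouse_moves: int        # pointer movements seen before the click
    has_known_cookie: bool  # whether the browser carries a previously seen cookie


def human_score(signals: Signals) -> float:
    """Made-up scoring: natural mouse movement and a known cookie add trust."""
    score = 0.0
    if signals.mouse_moves > 5:
        score += 0.5
    if signals.has_known_cookie:
        score += 0.4
    return score


def verify(signals: Signals, image_challenge: Callable[[], bool]) -> str:
    """Step 1: the check box. Step 2: an image test, only when in doubt."""
    if human_score(signals) >= PASS_THRESHOLD:
        return "passed"
    # Not convinced yet: fall back to the image test.
    return "passed after challenge" if image_challenge() else "blocked"
```

Here `image_challenge` stands in for the picture test: a visitor with plenty of mouse movement and a known cookie never sees it, while a visitor with neither has to solve it to get through.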
In the long run, reCAPTCHA is becoming more and more invisible: software that works entirely in the background of web pages, where we do not have to actively do anything at all to confirm our human status.
There is always a bit of a race going on between those who develop software meant to be helpful and those who develop software with darker motives. In the middle of it all are we users, who want a spam-free internet while still being able to use functions easily and quickly.
The price we pay is that we give Google more and more data about ourselves, which they can use to develop their products. At least now you know that this is what you do when you prove that you are a human being online!