The evolution of those annoying online security tests

Captcha images (clockwise from top left): Facebook, YouTube, Ticketmaster, Captcha

A free tutorial website, Duolingo, aims to translate the entire web with the help of people starting to learn a new language. It's a project born out of the guilt felt by the man behind one of the most annoying features of web surfing - those online security checks involving random words.

Duolingo hopes to convince millions of people to work for free and thus translate all web content in a matter of years.

It may sound like an ambitious plan but it's not the first time founder Luis von Ahn and his colleagues at Carnegie Mellon University have enlisted a global workforce to work for nothing.

As a 22-year-old graduate student in 2000, von Ahn invented the Captcha - those distorted images of words and numbers used to sign in to ticketing and social media websites, among others, which users have to decipher to prove they are human.

The software is used by more than 350,000 websites to prevent computer programs from attacking them with spam. In 2007, von Ahn realised that 200 million Captchas were being typed by people all over the world every day.

The evolution of Captcha

  • Captcha: Security check of two randomly generated words - stands for Completely Automated Public Turing test to tell Computers and Humans Apart
  • ReCaptcha: Another security check, pairing one randomly generated word with a photo of a word from a text that's being digitised
  • Duolingo: Website that applies a similar principle to language tutorials in order to translate the web

"At first I felt really good about that because I thought, 'Look at the impact that I've had'," he says. "But then I started feeling bad."

Typing each Captcha takes about 10 seconds, he estimates. Multiply that by 200 million, and humanity as a whole is wasting about 500,000 hours on these security codes every day.
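The arithmetic behind that estimate can be checked in a few lines. This is only a rough sketch using the two figures quoted above (10 seconds per Captcha, 200 million Captchas a day); the constant and function names are illustrative and not taken from any real system.

```python
# Figures quoted in the article; names are illustrative assumptions.
SECONDS_PER_CAPTCHA = 10
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_HOUR = 60 * 60


def hours_spent_per_day(captchas=CAPTCHAS_PER_DAY, seconds_each=SECONDS_PER_CAPTCHA):
    """Total human time spent typing Captchas in one day, in hours."""
    return captchas * seconds_each / SECONDS_PER_HOUR


print(round(hours_spent_per_day()))  # 555556
```

The result is a little over 550,000 hours a day, which is in line with the rounded figure of about 500,000.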

He decided to put these hours to good use and devised ReCaptcha, a system that uses each human-typed response as both a security check and a means to digitise books one word at a time.

At the same time the New York Times was digitising 156 years of its archive using a team of typists. Over a decade, the typists had transcribed 27 years of newspapers. The paper began using von Ahn's software and in 24 months had transcribed the remaining 129 years of archived newspapers.

ReCaptcha was acquired by Google in 2009, and it is still used widely to tell humans and spamming programmes apart. But its digitising software is exclusively available to Google's Books project, which aims to transcribe every book in the world.

How ReCaptchas work

Ticketmaster captcha

ReCaptchas use two words - one generated by the computer, the other taken from the pages of an old book, newspaper or journal that the system is digitising.

Each page has to be scanned individually, then run through a programme that transcribes every word. Computers have trouble reading text when pages are more than 50 years old, or where the paper is torn or yellowed or the typeface has faded.

A human can do this easily - but can't always be relied on to get it right. When a user gets the computer-generated word right, the system logs their response to the second, scanned word.

It then collates the most popular responses from a number of people.
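The steps described above can be sketched in code. This is a minimal illustration, not ReCaptcha's actual implementation: the class name, the case-insensitive matching and the `min_votes` threshold are all assumptions made for the example.

```python
from collections import Counter


class WordChallenge:
    """One two-word check: a known control word plus an unknown scanned word.

    Hypothetical sketch; the matching rule and min_votes are assumptions.
    """

    def __init__(self, control_answer, min_votes=3):
        self.control_answer = control_answer.lower()
        self.min_votes = min_votes
        self.votes = Counter()  # guesses for the scanned word

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passes the security check."""
        if control_guess.strip().lower() != self.control_answer:
            return False  # failed the known word, so ignore the second guess
        self.votes[unknown_guess.strip().lower()] += 1
        return True

    def transcription(self):
        """Most popular guess for the scanned word, once enough users agree to count."""
        if sum(self.votes.values()) < self.min_votes:
            return None
        word, _count = self.votes.most_common(1)[0]
        return word


challenge = WordChallenge("morning")
challenge.submit("morning", "upon")
challenge.submit("morning", "upon")
challenge.submit("mourning", "open")  # wrong control word, guess discarded
challenge.submit("morning", "upan")
print(challenge.transcription())  # upon
```

Only users who get the known word right contribute a vote, so a single mistyped response is outvoted by the majority.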

All of that doesn't detract from the fact that for most people, these security codes are nothing more than a frustrating waste of time. For those with dyslexia or sight problems, they can be a serious barrier to internet use.

Dr Sue Fowler, at the Dyslexia Research Trust, says the codes only add to the trouble dyslexics have filling in web forms. "Even looking at it closely, I wouldn't know what to do with it," she says.

There are audio alternatives, but these can be even more confusing, as most just sound like a flurry of noise.

And the automated security codes are getting more and more difficult. Some of the latest manifestations can appear as a jumbled blur of letters, numbers and punctuation that is almost indecipherable.

"As of a few months ago, if we showed someone a ReCaptcha they were successful at it about 93% of the time," says von Ahn, adding that once that drops to 75%, users give up on trying to access a site.

Since selling ReCaptcha, von Ahn has teamed up with one of his graduate students, Severin Hacker, to create software that gives the user something in return for their time and effort.

The answer is Duolingo, a site that gives free language tutorials and in exchange solicits aspiring linguists to translate sentences from the internet.

At present, it only caters for English speakers looking to learn French, German or Spanish, and Spanish speakers who want to learn English. They start with very simple sentences and work up towards more complex ones, increasing their value as a translator as they progress.

Say what?

  • 1.2bn people are learning another language
  • With 100,000 active users, von Ahn says Duolingo could translate Wikipedia from English into Spanish in five weeks
  • With one million users, it would take about 80 hours
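The two estimates above are consistent with each other if translation speed grows in direct proportion to the number of users. That proportionality is an assumption made for this sketch, not something von Ahn states, and the function name is illustrative.

```python
HOURS_PER_WEEK = 7 * 24


def estimated_hours(users, baseline_users=100_000, baseline_weeks=5):
    """Scale the quoted baseline (100,000 users, five weeks) to another user count.

    Assumes throughput is proportional to the number of active users.
    """
    baseline_hours = baseline_weeks * HOURS_PER_WEEK
    return baseline_hours * baseline_users / users


print(estimated_hours(1_000_000))  # 84.0
```

Five weeks is 840 hours, so ten times as many users would need about 84 hours, close to the 80 hours quoted.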

Human input is needed because, although computers can translate individual words, they struggle to put them in context and construct sentences that make sense.

"The computer always knows what each word can translate to, all the possibilities - that's just a bilingual dictionary. But the computer doesn't know that in this case, a word means girl, and in that case, it means daughter," says von Ahn.

So when Duolingo presents the user with a sentence, it offers all the possible translations for each individual word. The user has to build the sentence, using their understanding of their native language.

To weed out bad translations, the site asks users to rate each other's answers and chooses only the top-ranked solutions.
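A rating step of this kind might look something like the following. It is a hedged sketch only: the article does not say how Duolingo scores answers, so the 1-to-5 scale, the averaging, the `min_ratings` cut-off and the example sentences are all assumptions.

```python
from statistics import mean


def top_translations(ratings, keep=1, min_ratings=2):
    """Pick the best-rated candidate translations.

    ratings maps each candidate sentence to a list of peer scores
    (assumed here to run from 1 to 5). Candidates with too few
    ratings are skipped rather than trusted.
    """
    scored = [
        (mean(scores), text)
        for text, scores in ratings.items()
        if len(scores) >= min_ratings
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _score, text in scored[:keep]]


ratings = {
    "The girl eats an apple.": [5, 4, 5],
    "The daughter eats an apple.": [2, 3],
    "Girl eat apple.": [1],  # only one rating, so not considered
}
print(top_translations(ratings))  # ['The girl eats an apple.']
```

Averaging peer scores is one simple way to let the crowd resolve the girl-or-daughter ambiguity that a dictionary lookup cannot.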

Find out more

  • Luis von Ahn will appear on The Forum on the BBC World Service at 22.05 on 23 June.

After only a few weeks, users work on real sentences taken from Creative Commons websites.

And the site has echoes of a computer game. Points are offered for each translation attempted; completing a round earns the user a shiny gold medal; and learners can follow each other, adding a competitive edge.

But is Duolingo really able to teach people enough to reach fluency? Mickael Pointecourteau, an experienced language teacher who has used the software, has his doubts.

"There are some mistakes in their translation from the very first level, which worries me for when users will get to a higher level," he says. "Four main skills must be taken into account when learning a new language - speak, listen, write, read. I doubt this kind of software prepares for that."

Luis von Ahn wants to make amends for the annoyance of his security checks

Von Ahn, who grew up in Guatemala and is himself bilingual, argues that it will.

"We've been doing a lot of tests and we can get you to the point where you are an intermediate speaker of a language, you can go to a country that speaks that language and you can get around," he says.

"Of course in order to become bilingual you probably need to go to a particular country and live there for a few months, it takes that level of practice."

For some people, Duolingo will be nothing more than a game or distraction from work, but von Ahn believes its potential goes far beyond that.

"In the US and in the UK too, learning a language is more of a hobby. In South America you learn a language, particularly English, to make more money and to climb the social ladder."

He hopes his software will offer that leg-up to some who can't otherwise afford it.

