Recycling clicks to benefit humanity
At the height of its construction, 44,733 people worked on the Panama Canal. The Great Pyramid of Giza required 50,000 workers and the Apollo Project 400,000. No matter what you put on this list, humanity's largest achievements have been accomplished with fewer than a few hundred thousand workers because it has been impossible to assemble more people to work together - until now. With the Internet, we can coordinate the efforts of millions or even billions of humans. If 400,000 people put a man on the moon, what can we do with 400 million? That's the question that motivates my work.
An example of this is the reCAPTCHA project, in which hundreds of millions of people have helped digitize books by solving CAPTCHAs on the Internet. CAPTCHAs are widespread security measures that you've all seen: images of squiggly characters on the Web that people must type to obtain free email accounts and access to other sites. By asking humans to do a task that computers cannot, CAPTCHAs prevent automated programs from abusing online services.
For example, CAPTCHAs prevent scalpers from writing programs to buy millions of tickets for concerts or sporting events. It is estimated that over 200 million CAPTCHAs are typed every day, each taking roughly ten seconds of human effort - that's more than 500,000 hours a day. ReCAPTCHA recycles this human mental effort by putting it to a second purpose: transcribing books.
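The daily total is easy to check with a back-of-the-envelope calculation. The short Python sketch below uses only the two estimates quoted above (200 million CAPTCHAs a day, roughly ten seconds each); the function name is purely illustrative.

```python
def daily_human_hours(captchas_per_day: int, seconds_each: float) -> float:
    """Total human effort spent on CAPTCHAs per day, in hours."""
    return captchas_per_day * seconds_each / 3600  # 3600 seconds in an hour


# Estimates taken from the text: 200 million CAPTCHAs, about 10 seconds each.
hours = daily_human_hours(200_000_000, 10)
print(f"{hours:,.0f} hours per day")  # 555,556 hours per day
```

The result is a little over half a million hours a day, in line with the rounded figure in the text.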
Physical books and other texts written before the computer age are currently being digitized en masse (e.g., by Google Books and the Internet Archive) to preserve human knowledge and make information more accessible. The pages are photographically scanned and then computers must decipher each word in the scanned images in order to index the books and allow people to search through them. Unfortunately, computers are not perfect at deciphering this text. In older prints where the ink has faded, computers cannot recognize about 30 per cent of the words. On the other hand, humans are extremely accurate at doing this.
ReCAPTCHA demonstrates that old print material can be transcribed, one word at a time, by people typing CAPTCHAs on the Internet. Whereas the original CAPTCHAs displayed images of random characters rendered by a computer, reCAPTCHA displays words taken from scanned texts that computers could not decipher. The solutions entered by humans are then used to improve the digitization process.
It is important, of course, that the ultimate purpose of clicks online be revealed to the users. Sites using reCAPTCHA display a message that the words entered are being used to digitize books.
To date, over 400 million people - 6% of humanity! - have helped transcribe at least one word through reCAPTCHA, making it perhaps the largest example of massive collaboration in the history of humanity.
Image above: the reCAPTCHA system displays words from scanned texts to humans on the World Wide Web. In this example, the computer could not recognize the word 'morning'. ReCAPTCHA isolated the word, distorted it using random transformations, including adding a line through it, and then presented it as a challenge to a user. Since the computer had not recognized the original word ('morning'), a second word for which the answer was already known ('overlooks') was presented alongside it to check whether the user was answering correctly.
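The pairing of an unknown word with a known one can be sketched in a few lines of Python. This is a minimal illustration, not the actual reCAPTCHA implementation: the names (UnknownWord, submit, min_votes) are hypothetical, and the rule of accepting a transcription only after several matching guesses is an assumption added here to show how answers from many users might be combined.

```python
from collections import Counter
from typing import Optional


def normalize(text: str) -> str:
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


class UnknownWord:
    """A scanned word the computer could not read (hypothetical helper)."""

    def __init__(self, min_votes: int = 3) -> None:
        # Assumption: a guess is trusted only after min_votes users agree.
        self.min_votes = min_votes
        self.votes: Counter = Counter()

    def record(self, guess: str) -> None:
        self.votes[normalize(guess)] += 1

    def transcription(self) -> Optional[str]:
        """Return the most common guess once it has enough votes."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.min_votes else None


def submit(unknown: UnknownWord, control_answer: str,
           typed_control: str, typed_unknown: str) -> bool:
    """Check the known (control) word; keep the other guess only on a pass."""
    passed = normalize(typed_control) == normalize(control_answer)
    if passed:
        unknown.record(typed_unknown)
    return passed


# Usage with the words from the example: 'overlooks' is known, 'morning' is not.
word = UnknownWord(min_votes=2)
submit(word, "overlooks", "overlooks", "morning")
submit(word, "overlooks", "overlocks", "mourning")  # control wrong: discarded
submit(word, "overlooks", "Overlooks ", "morning")
print(word.transcription())  # morning
```

A user who gets the known word right is treated as human, and only then is their reading of the unknown word counted. A user who gets it wrong fails the challenge, and their guess is thrown away.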