Google Acquires reCAPTCHA

By kfreels (September 22, 2009)

Someone had their thinking cap on over at Google.


On September 16th, 2009, Google announced in a post on the Official Google Blog that it had acquired reCAPTCHA, the company that provides CAPTCHAs to help protect websites from spam and fraud. (CAPTCHA actually stands for Completely Automated Public Turing test to tell Computers and Humans Apart.) Computers have a hard time reading CAPTCHAs, so websites use them as a way to let humans in and keep malicious computer programs out. I've always run into them when buying tickets online, among other things. Without the CAPTCHAs, it would be fairly easy (well, not for me) to write a program that automatically buys tickets for the purpose of later scalping them.

So why would Google care about CAPTCHAs and what does this have to do with search capability?

The interesting thing about reCAPTCHA is that its images are scanned from old newspapers and books. Computers have a hard time distinguishing those letters at first, but as human users fill out CAPTCHA text boxes on legitimate sites, their answers can be used to train the computer to recognize what the scanned letters or words are. Google already uses a basic form of this technology, which converts scanned images into plain text, for large-scale text scanning projects like Google Books and Google News Archive. Google will now use the reCAPTCHA technology to let everyday users decode decades- and centuries-old texts that computers can't read. Eventually, it will enable us to search newspaper articles and snippets of books from way back when, and Google will actually be able to pinpoint keywords and phrases in those articles.
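For the curious, here is a rough sketch of how typed answers could be turned into a transcription. This is not Google's or reCAPTCHA's actual code, just my illustration. It assumes each challenge shows two words: a control word whose answer is already known and a scanned word the computer couldn't read. Only answers from people who got the control word right are counted, and the scanned word is accepted once enough of them agree. The function names and the thresholds (`min_votes`, `min_share`) are made up for the example.

```python
from collections import Counter


def normalize(text):
    """Ignore case and surrounding spaces when comparing answers."""
    return text.strip().lower()


def transcribe(responses, control_truth, min_votes=3, min_share=0.6):
    """Guess the unknown scanned word from human answers.

    responses: list of (control_answer, unknown_answer) pairs, one per user.
    control_truth: the known answer for the control word.
    min_votes / min_share: hypothetical thresholds, not real reCAPTCHA values.
    Returns the agreed word, or None if there is not enough agreement yet.
    """
    # Only trust users who typed the control word correctly
    # (that is the part that tells humans and bots apart).
    votes = Counter(
        normalize(unknown)
        for control, unknown in responses
        if normalize(control) == normalize(control_truth)
    )

    total = sum(votes.values())
    if total < min_votes:
        return None  # too few trusted answers so far

    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None


answers = [
    ("morning", "upon"),
    ("morning", "upon"),
    ("mornin", "up0n"),    # failed the control word, so this vote is ignored
    ("Morning", "Upon "),
    ("morning", "open"),
]
print(transcribe(answers, "morning"))
```

With the sample answers above, three of the four trusted users typed "upon", so that is the word the sketch settles on.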

My Take on This CAPTCHA Thing:

I believe this is a brilliant move on Google's part. They're going to use a technology that was originally meant as a protective mechanism to expand the number of publicly available books and newspapers on the internet. Additionally, it's good for the University of Michigan, which has a big deal with Google through the Michigan Digitization Contract to scan the entire print collection at UofM libraries. This new technology will benefit UofM as we try to scan the many historical documents and rare books we have here.

However, I feel a bit weird about the fact that Google is now using the masses to do all the work for them and teach computers to read. Next time I buy concert tickets, I'm going to ask myself, do I really want to type in this squiggly word so that some employee at Google doesn't have to? I'm not getting paid!

Also, this is probably something that some smart person at Google has already figured out, but what will happen when we have succeeded in teaching computers how to read? Will CAPTCHAs then no longer be able to protect sites against malicious programs and spam bots? Will the computers then be able to take over the world? Never mind that last question…

Another reaction from this blog post focuses on the demise of the print industry caused by Google's large text scanning projects:

"As with other forms of media, when books are digitized, value is destroyed- the $25B US book industry won't be worth $25B in 5 years, but new value will be also be created in the process. Google intends to benefit from the value being created in digital books through advertising, and this is huge. There are few other media that currently contain no ads- cocktail napkins, highways, and bathroom stalls have all succumbed, but books are blue ocean for advertising."

Aha! So Google isn't out to spread knowledge to the masses, they're just looking to make a profit. Again, brilliant.

In Conclusion:

Academically, Google's acquisition of reCAPTCHA is really going to help them achieve their goal of making a publicly available library online. Their use of this technology is definitely unique, but it also leaves some unanswered questions. While some would argue that this is another blow to the print media industry, I would argue that the materials Google will be targeting with this technology are the kinds of things that are out of copyright or out of print (i.e. really old stuff). We'll have to see what Google does with the protective function of the CAPTCHA technology, and how, if at all, that improves Google's search tools. Stay tuned!

Also, if you're super interested in this, watch this 50-minute video about the technology, how it works, and what it can do. I watched the first 5 minutes and it was very cool. Teaser: you'll find out how the porn industry uses CAPTCHAs!
