Friday, March 12, 2004

[Image: Alan Turing - The Enigma]

I've been thinking about Captchas recently (see below for details if you're not familiar with the term). Namely, are there better approaches to Captchas than digitally altered text? OCR software -- say, the kind that the Post Office uses to read handwritten ZIP codes -- may already be able to defeat the "munged text" strategy.

For example, here's a different kind of image-based approach (I don't have any suitable clip-art handy, so bear with me). Imagine, if you would, that each "bird image" and "car image" -- below -- is a different photo of a bird or car, respectively:

Bird image
Bird image
Car image
Bird image
Bird image
Bird image
Bird image
Bird image
Bird image
Bird image
Car image
Bird image
Bird image
Bird image
Bird image
Bird image


Check the boxes underneath the 2 cars, then press the button.
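To make the mock-up above concrete, here is a minimal sketch of how a server might assemble such a grid and grade the response. Everything named here is an assumption for illustration: the function names make_challenge and is_correct, the photo file names, and the idea that the server keeps the answer set to itself and sends only the shuffled photos to the browser.

```python
import random

def make_challenge(bird_photos, car_photos, grid_size=16, car_count=2, rng=random):
    """Return (grid, answer): a shuffled list of photos and the set of
    grid positions that hold cars. Hypothetical helper, not a real API."""
    cars = rng.sample(car_photos, car_count)                # distinct car photos
    birds = rng.sample(bird_photos, grid_size - car_count)  # distinct bird photos
    tagged = [(p, True) for p in cars] + [(p, False) for p in birds]
    rng.shuffle(tagged)                                     # hide where the cars are
    grid = [photo for photo, _ in tagged]
    answer = {i for i, (_, is_car) in enumerate(tagged) if is_car}
    return grid, answer

def is_correct(checked_positions, answer):
    """The user passes only by ticking exactly the car positions."""
    return set(checked_positions) == answer

if __name__ == "__main__":
    birds = [f"bird{i}.jpg" for i in range(30)]  # placeholder file names
    cars = [f"car{i}.jpg" for i in range(10)]
    grid, answer = make_challenge(birds, cars)
    print(grid)
    print(is_correct(answer, answer))
```

Drawing from a large pool of photos, so the same picture rarely appears twice, is what would keep an attacker from simply memorizing the answers.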


My stats are rusty, but I believe the odds that a computer could pick the correct two images (the cars, in this example) by blind guessing are 2/16 * 1/15, or 1 in 120. Not good enough? Making the user match 3 images cuts the guesser's odds to 1 in 560 (3/16 * 2/15 * 1/14). 4 matches yields odds of 1 in 1,820.
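For anyone who wants to check the arithmetic, here is a small sketch that computes the blind-guess odds two ways: by multiplying the chances pick by pick, as in the paragraph above, and by counting combinations. The function names are mine, and it assumes the guesser knows how many images to pick and chooses at random.

```python
from fractions import Fraction
from math import comb

def sequential_odds(total_images, targets):
    """Multiply the chance of each successive pick being a target:
    targets/total * (targets-1)/(total-1) * ..."""
    p = Fraction(1)
    for i in range(targets):
        p *= Fraction(targets - i, total_images - i)
    return p

def blind_guess_odds(total_images, targets):
    """Same quantity via counting: one correct selection out of C(total, targets)."""
    return Fraction(1, comb(total_images, targets))

if __name__ == "__main__":
    for k in (2, 3, 4):
        assert sequential_odds(16, k) == blind_guess_odds(16, k)
        print(k, "matches:", blind_guess_odds(16, k))
    # 2 matches: 1/120
    # 3 matches: 1/560
    # 4 matches: 1/1820
```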

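Still not good enough? What if we randomly produce 2, 3 or 4 matches and force the user to pick all of them? (Obviously, we would change the caption to Check the boxes underneath ALL of the cars.) Again, I'm not a stat-dude, but the gain is smaller than it first looks: there are 120 + 560 + 1,820 = 2,500 possible selections of 2, 3 or 4 boxes, so a guess picked uniformly from them is right about 1 time in 2,500. Even counting every possible pattern of 16 checkboxes gives only 65,536, so getting to 1 in a million by guessing alone would take more than 16 images. The stronger argument is that the scheme relies upon recognition of dynamically chosen images -- not alphanumeric characters -- which requires substantially more computing power to analyze.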
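The variable-count case can be checked by counting the possible selections directly. This is a rough sketch under assumptions the post does not spell out: a 16-image grid, car counts of 2, 3 or 4 that are equally likely, and a guesser who knows that range but not the actual count. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

TOTAL = 16
COUNTS = (2, 3, 4)  # assumed equally likely

def candidate_selections(total=TOTAL, counts=COUNTS):
    """Number of ways to tick exactly 2, 3 or 4 of the boxes."""
    return sum(comb(total, k) for k in counts)

def uniform_guess_odds(total=TOTAL, counts=COUNTS):
    """Guesser picks one candidate selection uniformly at random."""
    return Fraction(1, candidate_selections(total, counts))

def fixed_count_guess_odds(guess_count, total=TOTAL, counts=COUNTS):
    """Guesser always ticks `guess_count` boxes; wins only when the
    challenge happens to contain that many cars."""
    return Fraction(1, len(counts)) * Fraction(1, comb(total, guess_count))

if __name__ == "__main__":
    print(candidate_selections())     # 2500
    print(uniform_guess_odds())       # 1/2500
    print(fixed_count_guess_odds(2))  # 1/360
    print(2 ** TOTAL)                 # 65536 possible checkbox patterns
```

Under these assumptions a guesser who always ticks two boxes does better (1 in 360) than one who spreads guesses over all 2,500 selections, so hiding the count adds less protection than it might seem.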
