Online search giant Google has added a new kind of content to
its search engine to make it even more useful: scanned documents. According to a
recent announcement, scanned documents were not previously included in
search results because Google could not be sure of their content. Now, with
the help of Optical Character Recognition (OCR) technology, it is able
to convert the images in scanned PDFs into words that can be indexed and searched.
Google explained that while it has been able to index
documents saved as PDF before, scanned documents remained out of reach, since
they are merely digital images of the physical text or paper. That made it easy
for certain characters to be misread, because of the quality of the paper,
ink smudges, or fold creases in the pages.
For a human reader, a picture of a document is hardly
different from the document in its physical form. A computer, however, finds it
hard to distinguish some symbols or words from, for example, ink smudges.
Optical Character Recognition technology makes the symbols
in scanned PDF documents recognizable, so that they can be indexed and
included in search results. “This is a small step forward in our mission of
making all the world’s information accessible and useful,” Google wrote in the
blog post announcing the breakthrough.
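To illustrate why recognized text matters for search, here is a minimal Python sketch of indexing. It is not Google's implementation. It assumes the OCR step has already produced plain text for each scanned file, and the file names, sample text, and the functions `tokenize`, `build_index`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR output: made-up text standing in for scanned pages.
ocr_output = {
    "scan-001.pdf": "Repairing aluminum wiring in older homes",
    "scan-002.pdf": "Spin lock performance on multiprocessor systems",
}

index = build_index(ocr_output)
print(search(index, "aluminum wiring"))
```

Before OCR, a scanned page gives the indexer only pixels, so there is nothing to put into a structure like `index`. Once the page has been turned into words, it can be matched against a query like any other text document.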
From now on, search results will include scanned PDF documents,
and users will be able to choose how to view them, either as the original PDF
or as HTML.
Google included a few examples of how the new system works:
[repairing aluminum wiring], [spin lock performance], [Mumps and Severe
Neutropenia], [Steady success in a volatile world].
© 2007 - 2009 - eFluxMedia