OCR, DPI and accuracy
2012/06/14
What is OCR? OCR is short for optical character recognition. It’s a technology that enables you to convert scanned images of handwritten, typewritten or printed text into machine-encoded text. OCR is now used in many enterprises, not just libraries and governments. (Note: OCR is not new; it has been in development since 1912.)
Today, OCR can reach 98% accuracy, but accuracy drops when processing lower-quality images. For OCR software, the quality of the input image is really important. Basically, OCR software requires at least a 2-megapixel camera with auto focus.
The quality of an image can be measured in DPI (Dots Per Inch). Unofficially, 300 DPI is the standard quality for images, because 300 DPI scanning can achieve the best accuracy without sacrificing speed and file size. To take an example, the accuracy improvement from 200 DPI to 300 DPI is almost twice the improvement you get between any other pair of resolutions, while the improvement from 300 DPI to 400 DPI is nearly zero. Below is an example of a problem that not having enough DPI can cause.
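To make the DPI arithmetic concrete, here is a small sketch that estimates the DPI of a scan from its pixel dimensions and physical page size, and checks it against the 300 DPI target mentioned above. The function names and the US Letter page dimensions are illustrative assumptions, not part of any particular OCR product.

```python
# Sketch: estimating the DPI of a scanned page and checking it against
# the commonly cited 300 DPI target for OCR. Function names and the
# example page size (US Letter, 8.5 x 11 in) are assumptions for
# illustration only.

def scan_dpi(pixel_width, pixel_height, page_width_in, page_height_in):
    """Return (horizontal DPI, vertical DPI) for a scanned image."""
    return (pixel_width / page_width_in, pixel_height / page_height_in)

def meets_ocr_target(pixel_width, pixel_height,
                     page_width_in, page_height_in, target_dpi=300):
    """True if both axes reach the target resolution."""
    h_dpi, v_dpi = scan_dpi(pixel_width, pixel_height,
                            page_width_in, page_height_in)
    return h_dpi >= target_dpi and v_dpi >= target_dpi

# A US Letter page scanned at 2550 x 3300 pixels is exactly 300 DPI.
print(scan_dpi(2550, 3300, 8.5, 11.0))          # (300.0, 300.0)
# The same page at 1700 x 2200 pixels is only 200 DPI, below target.
print(meets_ocr_target(1700, 2200, 8.5, 11.0))  # False
```

The same check is useful in reverse: to hit 300 DPI on a Letter-size page, a scan needs at least 2550 × 3300 pixels.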