Prepared by Elisabeth Lindsay.
OCR stands for Optical Character Recognition, a type of software, or a feature within a software program, that converts scanned images into text that can then be manipulated or edited like normal text. For example, the individual lines of text in a document or book scanned onto a computer using OCR software can be edited just as if the text had been entered manually. Information that is not relevant can be deleted, and relevant information can be stored for reference or copied elsewhere and saved. Although OCR has improved significantly over the years, the process is not perfect, and errors may occur in character recognition; however, these errors can be corrected in the editing process. OCR allows for the conversion of materials, such as newspapers, that would be extremely time-consuming or cost-prohibitive to transcribe by hand. Perhaps the biggest benefit of OCR over simply scanning documents as images is the search capability: documents processed with OCR are word searchable.
The application of OCR to newspapers is of great advantage to genealogy researchers. Many online resources offer digitized newspapers, but not all are searchable. OCR is also applied to books that have been digitized and made searchable, such as yearbooks, local area histories, family histories, and biographies. Not only can you search a book of your choosing, but, as a result of OCR technology, you may also be able to enter a name, place, or term into an online search engine and find a book or other document of which you were unaware.