Identification problems for OCR characters for text retrieval in ancient books: A case study in the Ancient Collections of the Central Library at UNAM
DOI: https://doi.org/10.22201/dgb.0187750xp.2012.1.39
Keywords: Text recognition, OCR, ancient collections, digitization.
Abstract
This article gives a general description of the problems that hinder accurate text retrieval through optical character recognition (OCR) in ancient books. It draws on a sample of works from the fifteenth to the eighteenth centuries that are held in the Ancient Collections of the Central Library at UNAM and were digitized by the General Directorate of Libraries. The article first presents a theoretical and conceptual overview of OCR and its application to text retrieval. It then illustrates, through tests carried out with Adobe Acrobat 8 Professional, the factors that determine whether the graphemes in these books are identified correctly or incorrectly. Finally, it reports the findings obtained from analyzing and interpreting the data for the variables studied.
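The abstract does not say how correct and incorrect grapheme identification was quantified. One common way to score OCR output against a manually verified transcription is the character error rate (CER): the edit (Levenshtein) distance between the two strings divided by the length of the reference. The sketch below assumes such a ground-truth transcription exists for each page. The function names `levenshtein` and `character_error_rate` are hypothetical, and this is offered as an illustration, not as the authors' method.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions and substitutions separating the two strings."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # add a hypothesis character
                previous[j - 1] + cost,  # match (cost 0) or substitute (cost 1)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: edit distance normalized by the length of the
    manually verified (ground-truth) transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Usage: OCR misreads one 'e' as 'c' in a 10-character word.
print(character_error_rate("Biblioteca", "Bibliotcca"))
```

A CER of 0 means every character was recognized correctly. Working at the character level rather than the word level makes it easier to isolate individual problem graphemes.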
Published
2012-06-20
How to Cite
Ballesteros Estrada, S. S., Morales Romero, G. and Cedillo Pérez, P. A. (2012) “Identification problems for OCR characters for text retrieval in ancient books: A case study in the Ancient Collections of the Central Library at UNAM”, Biblioteca Universitaria, 15(1), pp. 25–34. doi: 10.22201/dgb.0187750xp.2012.1.39.
Section
Articles
License
Download the copyright transfer form in PDF format:
Copyright Transfer Form (Formato de Cesión de derechos)
Print it and, once you have signed it, send it to us by fax or by mail at:
Biblioteca Universitaria journal
Dirección General de Bibliotecas y Servicios Digitales de Información de la UNAM
Publications Department
Central Library Building, 11th Floor
Circuito Interior, Ciudad Universitaria, 04510 México, D.F.
Tel.: 5622-1616
Fax: 5622-1601