MANUSCRIPT DOCUMENT DIGITALIZATION AND RECOGNITION: A FIRST APPROACH
Keywords: patrimonial conservation, digitalization, thinning, connected components
Handwritten manuscript recognition belongs to a set of initiatives aimed at preserving the cultural patrimony gathered in libraries and archives, which hold a great wealth of documents, including the handwritten cards that accompany incunabula. This work is the starting point of a research and development project oriented to the digitalization and recognition of manuscript materials. The paper discusses different algorithms used in the first stage, which is dedicated to image noise cleaning and improves the image before the character recognition process begins. For handwritten-text recognition and image digitalization to be efficient, they must be preceded by a preprocessing stage applied to the image, which includes thresholding, noise cleaning, thinning, baseline alignment and image segmentation, among other steps. Each of these steps reduces the variability that harms manuscript recognition (noise, random gray levels, slanted characters, uneven ink levels in different zones) and so increases the probability of obtaining a suitable text recognition. In this paper, two image thinning methods are considered and implemented. Finally, an evaluation is carried out, yielding several conclusions related to efficiency, speed and requirements, as well as ideas for future implementations.
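As an illustration of two of the preprocessing steps listed above, the following minimal sketch binarizes a grayscale image with a fixed threshold and then thins the result. The abstract does not name the two thinning methods the paper evaluates, so the well-known Zhang-Suen algorithm is used here purely as an assumed example. The function names, the cutoff value of 128 and the list-of-lists image representation are likewise hypothetical choices made for this sketch.

```python
def threshold(gray, cutoff=128):
    """Binarize a grayscale image: dark (ink) pixels become 1, background 0.

    The fixed cutoff is an assumption; a real pipeline might pick it per image.
    """
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def _neighbours(img, r, c):
    """Return the 8 neighbours P2..P9, clockwise, starting from the pixel above."""
    return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
            img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]


def zhang_suen_thin(binary):
    """Thin a binary image (1 = ink) with the Zhang-Suen two-pass scheme."""
    width = len(binary[0])
    # Pad with a background border so every ink pixel has 8 neighbours.
    img = [[0] * (width + 2)]
    img += [[0] + list(row) + [0] for row in binary]
    img += [[0] * (width + 2)]
    rows, cols = len(img), len(img[0])

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = _neighbours(img, r, c)
                    ink_count = sum(p)
                    # Number of 0 -> 1 transitions around the pixel.
                    transitions = sum(
                        1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1
                    )
                    if not (2 <= ink_count <= 6 and transitions == 1):
                        continue
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_clear.append((r, c))
            # Pixels are cleared only after the full pass (parallel update).
            for r, c in to_clear:
                img[r][c] = 0
            if to_clear:
                changed = True
    # Strip the padding border before returning.
    return [row[1:-1] for row in img[1:-1]]


# Example: a dark bar three pixels thick on a light background.
gray_bar = [[255] * 9] + [[255] + [0] * 7 + [255] for _ in range(3)] + [[255] * 9]
binary_bar = threshold(gray_bar)
skeleton = zhang_suen_thin(binary_bar)
```

In this sketch the skeleton only ever removes ink pixels from the thresholded image, so it can be overlaid on the original to inspect how much of each stroke survives. Note that this kind of thinning can shorten stroke ends, which is one of the trade-offs an evaluation of thinning methods would need to weigh.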