Machine printed text and handwriting identification in noisy document images

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 26(2004), 3, 24 March, pages 337-53
First author: Zheng, Yefeng (author)
Other authors: Li, Huiping; Doermann, David
Format: Article
Language: English
Published: 2004
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Comparative Study; Evaluation Study; Journal Article; Research Support, U.S. Gov't, Non-P.H.S.; Validation Study
Description
Summary: In this paper, we address the problem of the identification of text in noisy document images. We are especially focused on segmenting and distinguishing between handwriting and machine printed text because: 1) Handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content and 2) the segmentation and recognition techniques required for machine printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to identify machine printed text and handwriting from noise and we further exploit context to refine the classification. A Markov Random Field-based (MRF) approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.
Description: Date Completed 12.10.2004
Date Revised 10.12.2019
published: Print
Citation Status MEDLINE
ISSN: 1939-3539
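
Note on the summary: the abstract mentions trained Fisher classifiers but gives no implementation details. As a rough illustration only, a textbook two-class Fisher linear discriminant on 2-D feature vectors might look like the sketch below. The function names, the two-feature setup, and the midpoint threshold are assumptions made for illustration and are not taken from the paper, which separates three classes (machine printed text, handwriting, noise) using its own selected features.

```python
# Minimal two-class Fisher linear discriminant on 2-D feature vectors.
# All names and the midpoint threshold are illustrative assumptions,
# not the classifier configuration used in the paper.

def mean(points):
    """Component-wise mean of a list of (x, y) feature vectors."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    """Scatter matrix sum((p - m)(p - m)^T), returned as (a, b, d)
    for the symmetric matrix [[a, b], [b, d]]."""
    a = sum((p[0] - m[0]) ** 2 for p in points)
    b = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    d = sum((p[1] - m[1]) ** 2 for p in points)
    return a, b, d

def fit_fisher(class0, class1):
    """Return (w, t): projection direction and decision threshold."""
    m0, m1 = mean(class0), mean(class1)
    a0, b0, d0 = scatter(class0, m0)
    a1, b1, d1 = scatter(class1, m1)
    # Within-class scatter Sw is the sum of the per-class scatter matrices.
    a, b, d = a0 + a1, b0 + b1, d0 + d1
    det = a * d - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # w = Sw^{-1} (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    w = ((d * dx - b * dy) / det, (-b * dx + a * dy) / det)
    # Threshold halfway between the two projected class means.
    t = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, t

def predict(w, t, x):
    """Return 1 if x projects above the threshold, otherwise 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] > t else 0
```

A classifier fitted this way projects each feature vector onto the direction that best separates the two class means relative to the within-class spread. Handling three classes, as the summary describes, would need several such discriminants or a multi-class extension, and the contextual MRF refinement is a separate step not sketched here.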