Performance evaluation and benchmarking of six-page segmentation algorithms
Published in: | IEEE transactions on pattern analysis and machine intelligence. - 1998. - 30(2008), 6, dated 20 June, pages 941-54 |
---|---|
First author: | |
Other authors: | , |
Format: | Online article |
Language: | English |
Published: | 2008 |
Access to the parent work: | IEEE transactions on pattern analysis and machine intelligence |
Keywords: | Comparative Study; Evaluation Study; Journal Article; Research Support, Non-U.S. Gov't; Validation Study |
Summary: | Informative benchmarks are crucial for optimizing the page segmentation step of an OCR system, frequently the performance-limiting step for overall OCR system performance. We show that current evaluation scores are insufficient for diagnosing specific errors in page segmentation and fail to identify some classes of serious segmentation errors altogether. This paper introduces a vectorial score that is sensitive to, and identifies, the most important classes of segmentation errors (over-, under-, and mis-segmentation) and which page components (lines, blocks, etc.) are affected. Unlike previous schemes, our evaluation method has a canonical representation of ground truth data and guarantees pixel-accurate evaluation results for arbitrary region shapes. We present the results of evaluating widely used segmentation algorithms (x-y cut, smearing, whitespace analysis, constrained text-line finding, docstrum, and Voronoi) on the UW-III database and demonstrate that the new evaluation scheme permits the identification of several specific flaws in individual segmentation methods. |
---|---|
Description: | Date Completed 27.06.2008. Date Revised 10.12.2019. Published: Print. CommentIn: IEEE Trans Pattern Anal Mach Intell. 2009 Apr;31(4):762; discussion 763-4. doi: 10.1109/tpami.2008.192. - PMID 19358365. Citation Status MEDLINE |
ISSN: | 0162-8828 |
DOI: | 10.1109/TPAMI.2007.70837 |