Performance evaluation and benchmarking of six page segmentation algorithms

Informative benchmarks are crucial for optimizing the page segmentation step of an OCR system, which is frequently the performance-limiting step for the OCR system as a whole. We show that current evaluation scores are insufficient for diagnosing specific errors in page segmentation and fail to identify...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1998. - 30(2008), 6, 20 June, pages 941-54
Main author: Shafait, Faisal (author)
Other authors: Keysers, Daniel; Breuel, Thomas
Format: Online article
Language: English
Published: 2008
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Comparative Study; Evaluation Study; Journal Article; Research Support, Non-U.S. Gov't; Validation Study
LEADER 01000caa a22002652 4500
001 NLM179010530
003 DE-627
005 20250209095735.0
007 cr uuu---uuuuu
008 231223s2008 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2007.70837  |2 doi 
028 5 2 |a pubmed25n0597.xml 
035 |a (DE-627)NLM179010530 
035 |a (NLM)18421102 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Shafait, Faisal  |e verfasserin  |4 aut 
245 1 0 |a Performance evaluation and benchmarking of six page segmentation algorithms 
264 1 |c 2008 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 27.06.2008 
500 |a Date Revised 10.12.2019 
500 |a published: Print 
500 |a CommentIn: IEEE Trans Pattern Anal Mach Intell. 2009 Apr;31(4):762; discussion 763-4. doi: 10.1109/tpami.2008.192. - PMID 19358365 
500 |a Citation Status MEDLINE 
520 |a Informative benchmarks are crucial for optimizing the page segmentation step of an OCR system, which is frequently the performance-limiting step for the OCR system as a whole. We show that current evaluation scores are insufficient for diagnosing specific errors in page segmentation and fail to identify some classes of serious segmentation errors altogether. This paper introduces a vectorial score that is sensitive to, and identifies, the most important classes of segmentation errors (over-, under-, and mis-segmentation) and which page components (lines, blocks, etc.) are affected. Unlike previous schemes, our evaluation method has a canonical representation of ground truth data and guarantees pixel-accurate evaluation results for arbitrary region shapes. We present the results of evaluating widely used segmentation algorithms (x-y cut, smearing, whitespace analysis, constrained text-line finding, docstrum, and Voronoi) on the UW-III database and demonstrate that the new evaluation scheme permits the identification of several specific flaws in individual segmentation methods. 
650 4 |a Comparative Study 
650 4 |a Evaluation Study 
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
650 4 |a Validation Study 
700 1 |a Keysers, Daniel  |e verfasserin  |4 aut 
700 1 |a Breuel, Thomas  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1998  |g 30(2008), 6 vom: 20. Juni, Seite 941-54  |w (DE-627)NLM098212257  |x 0162-8828  |7 nnns 
773 1 8 |g volume:30  |g year:2008  |g number:6  |g day:20  |g month:06  |g pages:941-54 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2007.70837  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 30  |j 2008  |e 6  |b 20  |c 06  |h 941-54