Semi-Supervised Scene Text Recognition

Scene text recognition has been widely researched with supervised approaches. Most existing algorithms require a large amount of labeled data, and some methods even require character-level or pixel-wise supervision. However, labeled data is expensive, while unlabeled data is relatively easy to collect, especially for many low-resource languages. In this paper, we propose a novel semi-supervised method for scene text recognition. Specifically, we design two global metrics, i.e., an edit reward and an embedding reward, to evaluate the quality of the generated string, and adopt reinforcement learning techniques to directly optimize these rewards. The edit reward measures the distance between the ground-truth label and the generated string. In addition, the image feature and string feature are embedded into a common space, and the embedding reward is defined by the similarity between the input image and the generated string. It is natural that the generated string should be nearest to the image it is generated from; therefore, the embedding reward can be obtained without any ground-truth information. In this way, we can effectively exploit a large number of unlabeled images to improve recognition performance without additional laborious annotations. Extensive experimental evaluations on five challenging benchmarks, the Street View Text, IIIT5K, and ICDAR datasets, demonstrate the effectiveness of the proposed approach; our method significantly reduces annotation effort while maintaining competitive recognition performance.

Detailed Description

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 30(2021), dated: 01, pages 3005-3016
Main Author: Gao, Yunze (Author)
Other Authors: Chen, Yingying, Wang, Jinqiao, Lu, Hanqing
Format: Online Article
Language: English
Published: 2021
Access to parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000naa a22002652 4500
001 NLM32034133X
003 DE-627
005 20231225173352.0
007 cr uuu---uuuuu
008 231225s2021 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2021.3051485  |2 doi 
028 5 2 |a pubmed24n1067.xml 
035 |a (DE-627)NLM32034133X 
035 |a (NLM)33471761 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Gao, Yunze  |e verfasserin  |4 aut 
245 1 0 |a Semi-Supervised Scene Text Recognition 
264 1 |c 2021 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 19.02.2021 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Scene text recognition has been widely researched with supervised approaches. Most existing algorithms require a large amount of labeled data, and some methods even require character-level or pixel-wise supervision. However, labeled data is expensive, while unlabeled data is relatively easy to collect, especially for many low-resource languages. In this paper, we propose a novel semi-supervised method for scene text recognition. Specifically, we design two global metrics, i.e., an edit reward and an embedding reward, to evaluate the quality of the generated string, and adopt reinforcement learning techniques to directly optimize these rewards. The edit reward measures the distance between the ground-truth label and the generated string. In addition, the image feature and string feature are embedded into a common space, and the embedding reward is defined by the similarity between the input image and the generated string. It is natural that the generated string should be nearest to the image it is generated from; therefore, the embedding reward can be obtained without any ground-truth information. In this way, we can effectively exploit a large number of unlabeled images to improve recognition performance without additional laborious annotations. Extensive experimental evaluations on five challenging benchmarks, the Street View Text, IIIT5K, and ICDAR datasets, demonstrate the effectiveness of the proposed approach; our method significantly reduces annotation effort while maintaining competitive recognition performance. 
650 4 |a Journal Article 
700 1 |a Chen, Yingying  |e verfasserin  |4 aut 
700 1 |a Wang, Jinqiao  |e verfasserin  |4 aut 
700 1 |a Lu, Hanqing  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 30(2021) vom: 01., Seite 3005-3016  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:30  |g year:2021  |g day:01  |g pages:3005-3016 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2021.3051485  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 30  |j 2021  |b 01  |h 3005-3016
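The abstract describes two global reward signals: an edit reward that compares the generated string with a ground-truth label, and an embedding reward based on the similarity between image and string features in a shared space. A minimal sketch of plausible definitions follows; the exact formulas, normalization, and embedding functions are assumptions, not the paper's actual implementation.

```python
# Hedged sketch of the two rewards described in the abstract.
# The normalization and similarity measure are assumptions; the
# paper's actual definitions may differ.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def edit_reward(generated: str, ground_truth: str) -> float:
    """Supervised reward: 1.0 for an exact match, decreasing toward 0
    as the normalized edit distance grows."""
    denom = max(len(generated), len(ground_truth), 1)
    return 1.0 - edit_distance(generated, ground_truth) / denom

def embedding_reward(img_emb: list, str_emb: list) -> float:
    """Unsupervised reward: cosine similarity between an image embedding
    and a string embedding in a common space. Needs no ground truth, so
    it can be computed for unlabeled images."""
    dot = sum(x * y for x, y in zip(img_emb, str_emb))
    norm_i = sum(x * x for x in img_emb) ** 0.5
    norm_s = sum(y * y for y in str_emb) ** 0.5
    return dot / (norm_i * norm_s)
```

In a policy-gradient setup, labeled images would use the edit reward while unlabeled images fall back on the embedding reward, letting both data sources contribute to the same training signal.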