Instruction-Guided Scene Text Recognition
Multi-modal models have shown appealing performance in visual recognition tasks, as free-form text-guided training evokes the ability to understand fine-grained visual content. However, current models cannot be trivially applied to scene text recognition (STR) due to the compositional difference bet...
Bibliographic details
| Published in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 47(2025), 4, dated 28 Apr., pages 2723-2738 |
| Main author: | Du, Yongkun (author) |
| Other authors: | Chen, Zhineng; Su, Yuchen; Jia, Caiyan; Jiang, Yu-Gang |
| Format: | Online article |
| Language: | English |
| Published: | 2025 |
| Access to the parent work: | IEEE transactions on pattern analysis and machine intelligence |
| Subjects: | Journal Article |