Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering

Text-based Visual Question Answering (TextVQA) aims to answer questions about images that contain multiple scene texts. In most cases, the text is naturally attached to the surfaces of objects. Therefore, spatial reasoning between texts and objects is crucial in TextVQA. Howeve...

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 32(2023) dated: 31., pages 3367-3382
Main Author: Li, Hao (Author)
Other Authors: Huang, Jinfa; Jin, Peng; Song, Guoli; Wu, Qi; Chen, Jie
Format: Online Article
Language: English
Published: 2023
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article