Variational Context : Exploiting Visual and Textual Context for Grounding Referring Expressions

We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., "largest elephant standing behind baby elephant". This is a general yet challenging vision-language task since it does not only require the localization of objects, but also the multimodal comprehens...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 43(2021), 1 vom: 08. Jan., Seite 347-359
1. Verfasser:	Niu, Yulei (VerfasserIn)
Weitere Verfasser:	Zhang, Hanwang, Lu, Zhiwu, Chang, Shih-Fu
Format:	Online-Aufsatz
Sprache:	English
Veröffentlicht:	2021
Zugriff auf das übergeordnete Werk:	IEEE transactions on pattern analysis and machine intelligence
Schlagworte:	Journal Article Research Support, Non-U.S. Gov't

Online verfügbar	Volltext