Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., "largest elephant standing behind baby elephant". This is a general yet challenging vision-language task since it does not only require the localization of objects, but also the multimodal comprehens...
Published in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 43(2021), 1 (8 Jan.), pages 347-359 |
---|---|
Main author: | |
Other authors: | , , |
Format: | Online article |
Language: | English |
Published: | 2021 |
Collection access: | IEEE transactions on pattern analysis and machine intelligence |
Subjects: | Journal Article; Research Support, Non-U.S. Gov't |