Saying the Unseen : Video Descriptions via Dialog Agents
Current vision and language tasks usually take complete visual data (e.g., raw images or videos) as input; however, practical scenarios often include situations in which part of the visual information becomes inaccessible for various reasons, e.g., a restricted view from a fixed camera or intenti...
Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), no. 10, dated 29 Oct., pages 7190-7204
First author: Zhu, Ye (author)
Other authors: Wu, Yu; Yang, Yi; Yan, Yan
Format: Online article
Language: English
Published: 2022
Parent work: IEEE transactions on pattern analysis and machine intelligence
Keywords: Journal Article; Video-Audio Media; Research Support, U.S. Gov't, Non-P.H.S.