Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Recent progress has been made in using attention-based encoder-decoder frameworks for image and video captioning. Most existing decoders apply the attention mechanism to every generated word, including both visual words (e.g., "gun" and "shooting") and non-visual words (e.g., "...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 42(2020), 5, 1 May, pages 1112-1131
Main author: Gao, Lianli (Author)
Other authors: Li, Xiangpeng; Song, Jingkuan; Shen, Heng Tao
Format: Online article
Language: English
Published: 2020
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article; Research Support, Non-U.S. Gov't