Multimodal Cross-Lingual Summarization for Videos: A Revisit in Knowledge Distillation Induced Triple-Stage Training Method

Multimodal summarization (MS) for videos aims to generate summaries from multi-source information (e.g., video and text transcript), showing promising progress recently. However, existing works are limited to monolingual scenarios, neglecting non-native viewers' needs to understand videos in ot...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 12 (19 Nov.), pages 10697-10714
First author: Liu, Nayu (author)
Other authors: Wei, Kaiwen, Yang, Yong, Tao, Jianhua, Sun, Xian, Yao, Fanglong, Yu, Hongfeng, Jin, Li, Lv, Zhao, Fan, Cunhang
Format: Online article
Language: English
Published: 2024
Parent work: IEEE transactions on pattern analysis and machine intelligence
Keywords: Journal Article