A Survey on Video Temporal Grounding with Multimodal Large Language Model
The recent advancement in video temporal grounding (VTG) has significantly enhanced fine-grained video understanding, primarily driven by multimodal large language models (MLLMs). With superior multimodal comprehension and reasoning abilities, VTG approaches based on MLLMs (VTG-MLLMs) are gradually...
Bibliographic Details
| Published in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2025), dated 29 Sept. |
| Main author: | Wu, Jianlong (author) |
| Other authors: | Liu, Wei; Liu, Ye; Liu, Meng; Nie, Liqiang; Lin, Zhouchen; Chen, Chang Wen |
| Format: | Online article |
| Language: | English |
| Published: | 2025 |
| Access to the parent work: | IEEE transactions on pattern analysis and machine intelligence |
| Subjects: | Journal Article |