Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning

3D dense captioning requires a model to translate its understanding of an input 3D scene into several captions associated with different object regions. Existing methods adopt a sophisticated "detect-then-describe" pipeline, which builds explicit relation modules upon a 3D detector with nu...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), no. 11, 27 Oct., pp. 7331-7347
Main author: Chen, Sijin (author)
Other authors: Zhu, Hongyuan, Li, Mingsheng, Chen, Xin, Guo, Peng, Lei, Yinjie, Yu, Gang, Li, Taihao, Chen, Tao
Format: Online article
Language: English
Published: 2024
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article