Vote2Cap-DETR++ : Decoupling Localization and Describing for End-to-End 3D Dense Captioning

3D dense captioning requires a model to translate its understanding of an input 3D scene into several captions associated with different object regions. Existing methods adopt a sophisticated "detect-then-describe" pipeline, which builds explicit relation modules upon a 3D detector with nu...

Bibliographic Details
Published in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 11, 27 Oct., pages 7331-7347
Main Author:	Chen, Sijin (Author)
Other Authors:	Zhu, Hongyuan, Li, Mingsheng, Chen, Xin, Guo, Peng, Lei, Yinjie, Yu, Gang, Li, Taihao, Chen, Tao
Format:	Online Article
Language:	English
Published:	2024
Access to the parent work:	IEEE transactions on pattern analysis and machine intelligence
Subjects:	Journal Article

Available online	Full text