Vote2Cap-DETR++ : Decoupling Localization and Describing for End-to-End 3D Dense Captioning
3D dense captioning requires a model to translate its understanding of an input 3D scene into several captions associated with different object regions. Existing methods adopt a sophisticated "detect-then-describe" pipeline, which builds explicit relation modules upon a 3D detector with nu...
Bibliographic Details
Published in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), issue 11, 27 Oct., pages 7331-7347 |
Main Author: | Chen, Sijin (Author) |
Other Authors: | Zhu, Hongyuan; Li, Mingsheng; Chen, Xin; Guo, Peng; Lei, Yinjie; Yu, Gang; Li, Taihao; Chen, Tao |
Format: | Online article |
Language: | English |
Published: | 2024 |
Access to the parent work: | IEEE transactions on pattern analysis and machine intelligence |
Subjects: | Journal Article |