MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation

This paper proposes a large-scale multi-modal dataset for referring motion expression video segmentation, focusing on segmenting and tracking target objects in videos based on language description of objects' motions. Existing referring video segmentation datasets often focus on salient objects...


Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2025), 19 Aug.
Main author: Ding, Henghui (Author)
Other authors: Liu, Chang; He, Shuting; Ying, Kaining; Jiang, Xudong; Loy, Chen Change; Jiang, Yu-Gang
Format: Online article
Language: English
Published: 2025
Collection access: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article