Semantic Object Segmentation in Tagged Videos via Detection

Semantic object segmentation (SOS) is a challenging task in computer vision that aims to detect and segment all pixels of the objects within predefined semantic categories. In image-based SOS, many supervised models have been proposed and achieved impressive performances due to the rapid advances of...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 40(2018), 7 vom: 25. Juli, Seite 1741-1754
1. Verfasser:	Zhang, Yu (VerfasserIn)
Weitere Verfasser:	Chen, Xiaowu, Li, Jia, Wang, Chen, Xia, Changqun, Li, Jun, Yu Zhang, Xiaowu Chen, Jia Li, Chen Wang, Changqun Xia, Jun Li
Format:	Online-Aufsatz
Sprache:	English
Veröffentlicht:	2018
Zugriff auf das übergeordnete Werk:	IEEE transactions on pattern analysis and machine intelligence
Schlagworte:	Journal Article Research Support, Non-U.S. Gov't

Beschreibung
Zusammenfassung:	Semantic object segmentation (SOS) is a challenging task in computer vision that aims to detect and segment all pixels of the objects within predefined semantic categories. In image-based SOS, many supervised models have been proposed and achieved impressive performances due to the rapid advances of well-annotated training images and machine learning theories. However, in video-based SOS it is often difficult to directly train a supervised model since most videos are weakly annotated by tags. To handle such tagged videos, this paper proposes a novel approach that adopts a segmentation-by-detection framework. In this framework, object detection and segment proposals are first generated using the models pre-trained on still images, which provide useful cues to roughly localize the semantic objects. Based on these proposals, we propose an efficient algorithm to initialize object tracks by solving a joint assignment problem. As such tracks provide rough spatiotemporal configurations of the semantic objects, a voting-based refinement algorithm is further proposed to improve their spatiotemporal consistency. Extensive experiments demonstrate that the proposed framework can robustly and effectively segment semantic objects in tagged videos, even when the image-based object detectors provide inaccurate proposals. On various public benchmarks, the proposed approach obtains substantial improvements over the state-of-the-arts
Beschreibung:	Date Completed 05.04.2019 Date Revised 05.04.2019 published: Print-Electronic Citation Status PubMed-not-MEDLINE
ISSN:	1939-3539
DOI:	10.1109/TPAMI.2017.2727049