Semantic Object Segmentation in Tagged Videos via Detection
Semantic object segmentation (SOS) is a challenging task in computer vision that aims to detect and segment all pixels of the objects within predefined semantic categories. In image-based SOS, many supervised models have been proposed and achieved impressive performances due to the rapid advances of...
Veröffentlicht in: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - 40(2018), 7 vom: 25. Juli, Seite 1741-1754 |
---|---|
1. Verfasser: | |
Weitere Verfasser: | , , , , , , , , , , |
Format: | Online-Aufsatz |
Sprache: | English |
Veröffentlicht: |
2018
|
Zugriff auf das übergeordnete Werk: | IEEE transactions on pattern analysis and machine intelligence |
Schlagworte: | Journal Article Research Support, Non-U.S. Gov't |
Zusammenfassung: | Semantic object segmentation (SOS) is a challenging task in computer vision that aims to detect and segment all pixels of the objects within predefined semantic categories. In image-based SOS, many supervised models have been proposed and achieved impressive performances due to the rapid advances of well-annotated training images and machine learning theories. However, in video-based SOS it is often difficult to directly train a supervised model since most videos are weakly annotated by tags. To handle such tagged videos, this paper proposes a novel approach that adopts a segmentation-by-detection framework. In this framework, object detection and segment proposals are first generated using the models pre-trained on still images, which provide useful cues to roughly localize the semantic objects. Based on these proposals, we propose an efficient algorithm to initialize object tracks by solving a joint assignment problem. As such tracks provide rough spatiotemporal configurations of the semantic objects, a voting-based refinement algorithm is further proposed to improve their spatiotemporal consistency. Extensive experiments demonstrate that the proposed framework can robustly and effectively segment semantic objects in tagged videos, even when the image-based object detectors provide inaccurate proposals. On various public benchmarks, the proposed approach obtains substantial improvements over the state-of-the-arts |
---|---|
Beschreibung: | Date Completed 05.04.2019 Date Revised 05.04.2019 published: Print-Electronic Citation Status PubMed-not-MEDLINE |
ISSN: | 1939-3539 |
DOI: | 10.1109/TPAMI.2017.2727049 |