Multi-Granularity Context Network for Efficient Video Semantic Segmentation

Current video semantic segmentation tasks involve two main challenges: how to take full advantage of multi-frame context information, and how to improve computational efficiency. To tackle the two challenges simultaneously, we present a novel Multi-Granularity Context Network (MGCNet) by aggregating...


Bibliographic details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 32(2023) of: 10., pages 3163-3175
Main author:	Liang, Zhiyuan (Author)
Other authors:	Dai, Xiangdong, Wu, Yiqian, Jin, Xiaogang, Shen, Jianbing
Format:	Online article
Language:	English
Published:	2023
Access to the parent work:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects:	Journal Article


LEADER	01000naa a22002652 4500
001	NLM356193918
003	DE-627
005	20231226065938.0
007	cr uuu---uuuuu
008	231226s2023 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TIP.2023.3269982 \|2 doi
028	5	2	\|a pubmed24n1187.xml
035			\|a (DE-627)NLM356193918
035			\|a (NLM)37115829
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Liang, Zhiyuan \|e verfasserin \|4 aut
245	1	0	\|a Multi-Granularity Context Network for Efficient Video Semantic Segmentation
264		1	\|c 2023
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 04.06.2023
500			\|a Date Revised 04.06.2023
500			\|a published: Print-Electronic
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a Current video semantic segmentation tasks involve two main challenges: how to take full advantage of multi-frame context information, and how to improve computational efficiency. To tackle the two challenges simultaneously, we present a novel Multi-Granularity Context Network (MGCNet) by aggregating context information at multiple granularities in a more effective and efficient way. Our method first converts image features into semantic prototypes, and then conducts a non-local operation to aggregate the per-frame and short-term contexts jointly. An additional long-term context module is introduced to capture the video-level semantic information during training. By aggregating both local and global semantic information, a strong feature representation is obtained. The proposed pixel-to-prototype non-local operation requires less computational cost than traditional non-local ones, and is video-friendly since it reuses the semantic prototypes of previous frames. Moreover, we propose an uncertainty-aware and structural knowledge distillation strategy to boost the performance of our method. Experiments on Cityscapes and CamVid datasets with multiple backbones demonstrate that the proposed MGCNet outperforms other state-of-the-art methods with high speed and low latency
650		4	\|a Journal Article
700	1		\|a Dai, Xiangdong \|e verfasserin \|4 aut
700	1		\|a Wu, Yiqian \|e verfasserin \|4 aut
700	1		\|a Jin, Xiaogang \|e verfasserin \|4 aut
700	1		\|a Shen, Jianbing \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society \|d 1992 \|g 32(2023) vom: 10., Seite 3163-3175 \|w (DE-627)NLM09821456X \|x 1941-0042 \|7 nnns
773	1	8	\|g volume:32 \|g year:2023 \|g day:10 \|g pages:3163-3175
856	4	0	\|u http://dx.doi.org/10.1109/TIP.2023.3269982 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 32 \|j 2023 \|b 10 \|h 3163-3175