ISTR: Mask-Embedding-Based Instance Segmentation Transformer

Transformer-based instance-level recognition has attracted increasing research attention recently due to the superior performance. However, although attempts have been made to encode masks as embeddings into Transformer-based frameworks, how to combine mask embeddings and spatial information for a t...

Bibliographic Details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024), day 14, pages 2895-2907
Main Author:	Hu, Jie (Author)
Other Authors:	Lu, Yao, Zhang, Shengchuan, Cao, Liujuan
Format:	Online Article
Language:	English
Published:	2024
Collection:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects:	Journal Article


LEADER	01000caa a22002652c 4500
001	NLM370970101
003	DE-627
005	20250306015904.0
007	cr uuu---uuuuu
008	240413s2024 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TIP.2024.3385980 \|2 doi
028	5	2	\|a pubmed25n1235.xml
035			\|a (DE-627)NLM370970101
035			\|a (NLM)38607701
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Hu, Jie \|e verfasserin \|4 aut
245	1	0	\|a ISTR \|b Mask-Embedding-Based Instance Segmentation Transformer
264		1	\|c 2024
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 17.04.2024
500			\|a published: Print-Electronic
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a Transformer-based instance-level recognition has attracted increasing research attention recently due to the superior performance. However, although attempts have been made to encode masks as embeddings into Transformer-based frameworks, how to combine mask embeddings and spatial information for a transformer-based approach is still not fully explored. In this paper, we revisit the design of mask-embedding-based pipelines and propose an Instance Segmentation TRansformer (ISTR) with Mask Meta-Embeddings (MME), leveraging the strengths of transformer models in encoding embedding information and incorporating spatial information from mask embeddings. ISTR incorporates a recurrent refining head that consists of a Dynamic Box Predictor (DBP), a Mask Information Generator (MIG), and a Mask Meta-Decoder (MMD). To improve the quality of mask embeddings, MME interprets the mask encoding-decoding processes as a mutual information maximization problem, which unifies the objective functions of different decoding schemes such as Principal Component Analysis (PCA) and Discrete Cosine Transform (DCT) with a meta-formulation. Under the meta-formulation, a learnable Spatial Mask Tuner (SMT) is further proposed, which fuses the spatial and embedding information produced from MIG and can significantly boost the segmentation performance. The resulting varieties, i.e., ISTR-PCA, ISTR-DCT, and ISTR-SMT, demonstrate the effectiveness and efficiency of incorporating mask embeddings with the query-based instance segmentation pipelines. On the COCO dataset, ISTR surpasses all predominant mask-embedding-based models by a large margin, and achieves competitive performance compared to concurrent state-of-the-art models. On the Cityscapes dataset, ISTR also outperforms several strong baselines. Our code has been made available at: https://github.com/hujiecpp/ISTR
650		4	\|a Journal Article
700	1		\|a Lu, Yao \|e verfasserin \|4 aut
700	1		\|a Zhang, Shengchuan \|e verfasserin \|4 aut
700	1		\|a Cao, Liujuan \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society \|d 1992 \|g 33(2024) vom: 14., Seite 2895-2907 \|w (DE-627)NLM09821456X \|x 1941-0042 \|7 nnas
773	1	8	\|g volume:33 \|g year:2024 \|g day:14 \|g pages:2895-2907
856	4	0	\|u http://dx.doi.org/10.1109/TIP.2024.3385980 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 33 \|j 2024 \|b 14 \|h 2895-2907