LEADER |
01000caa a22002652 4500 |
001 |
NLM369415213 |
003 |
DE-627 |
005 |
20240313234631.0 |
007 |
cr uuu---uuuuu |
008 |
240308s2024 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TIP.2024.3372468
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1326.xml
|
035 |
|
|
|a (DE-627)NLM369415213
|
035 |
|
|
|a (NLM)38451758
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Xie, Zhifeng
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a CSFwinformer
|b Cross-Space-Frequency Window Transformer for Mirror Detection
|
264 |
|
1 |
|c 2024
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Revised 13.03.2024
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status PubMed-not-MEDLINE
|
520 |
|
|
|a Mirror detection is a challenging task since mirrors do not possess a consistent visual appearance. Even the Segment Anything Model (SAM), which boasts superior zero-shot performance, cannot accurately detect the position of mirrors. Existing methods determine the position of the mirror under hypothetical conditions, such as the correspondence between objects inside and outside the mirror, and the semantic association between the mirror and surrounding objects. However, these assumptions do not apply to all scenarios. For instance, the scene may contain no real objects corresponding to the reflected ones, or it may be challenging to extract meaningful semantic associations in complex scenes. On the other hand, humans can easily recognize mirrors through the specular texture caused by their materials. To mine mirror features in more general scenes, we propose a Cross-Space-Frequency Window Transformer (CSFwinformer) to extract spatial and frequency features for texture analysis. Specifically, we design a Spatial-Frequency Window Alignment module (SFWA) to calculate spatial-frequency feature affinities and learn the difference between mirror and non-mirror textures. We then propose a Dilated Window Attention (DWA) to extract global features that complement the locality limitation of window alignment. Besides, we propose a Cross-Modality Context Contrast module (CMCC) to fuse cross-modality features and global features, which enables information flow between different windows to take full advantage of cross-modality information. Extensive experiments show that our method performs favorably against state-of-the-art methods on three mirror detection benchmarks and significantly improves SAM's performance on mirror detection. The code is available at https://github.com/wangsen99/CSFwinformer
|
650 |
|
4 |
|a Journal Article
|
700 |
1 |
|
|a Wang, Sen
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Yu, Qiucheng
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Tan, Xin
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Xie, Yuan
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
|d 1992
|g 33(2024) vom: 07., Seite 1853-1867
|w (DE-627)NLM09821456X
|x 1941-0042
|7 nnns
|
773 |
1 |
8 |
|g volume:33
|g year:2024
|g day:07
|g pages:1853-1867
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TIP.2024.3372468
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|d 33
|j 2024
|b 07
|h 1853-1867
|