Semantic Concentration for Self-Supervised Dense Representations Learning

Image-level self-supervised learning (SSL) has made significant progress, yet learning dense representations for patches remains challenging. Mainstream methods suffer from over-dispersion, in which patches from the same instance or category scatter, harming downstream perfor...

Bibliographic details
Published in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2025) of 23 Sept.
Main author:	Wen, Peisong (Author)
Other authors:	Xu, Qianqian, Dai, Siran, Cong, Runmin, Huang, Qingming
Format:	Online article
Language:	English
Published:	2025
Collection access:	IEEE transactions on pattern analysis and machine intelligence
Subjects:	Journal Article


LEADER	01000naa a22002652c 4500
001	NLM392994534
003	DE-627
005	20250925232613.0
007	cr uuu---uuuuu
008	250925s2025 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TPAMI.2025.3609758 \|2 doi
028	5	2	\|a pubmed25n1579.xml
035			\|a (DE-627)NLM392994534
035			\|a (NLM)40986579
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Wen, Peisong \|e verfasserin \|4 aut
245	1	0	\|a Semantic Concentration for Self-Supervised Dense Representations Learning
264		1	\|c 2025
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 23.09.2025
500			\|a published: Print-Electronic
500			\|a Citation Status Publisher
520			\|a Image-level self-supervised learning (SSL) has made significant progress, yet learning dense representations for patches remains challenging. Mainstream methods suffer from over-dispersion, in which patches from the same instance or category scatter, harming downstream performance on dense tasks. This work reveals that image-level SSL avoids over-dispersion through implicit semantic concentration. Specifically, non-strict spatial alignment ensures intra-instance consistency, while shared patterns, i.e., similar parts of within-class instances in the input space, ensure inter-image consistency. Unfortunately, these approaches are infeasible for dense SSL due to their spatial sensitivity and complicated scene-centric data. These observations motivate us to explore explicit semantic concentration for dense SSL. First, to break the strict spatial alignment, we propose to distill the patch correspondences. To cope with noisy and imbalanced pseudo labels, we propose a noise-tolerant ranking loss. The core idea is to extend the Average Precision (AP) loss to continuous targets, so that its decision-agnostic and adaptive focusing properties prevent the student model from being misled. Second, to discriminate the shared patterns from complicated scenes, we propose an object-aware filter that maps the output space to an object-based space. Specifically, patches are represented by learnable prototypes of objects via cross-attention. Finally, empirical studies across various tasks support the effectiveness of our method.
650		4	\|a Journal Article
700	1		\|a Xu, Qianqian \|e verfasserin \|4 aut
700	1		\|a Dai, Siran \|e verfasserin \|4 aut
700	1		\|a Cong, Runmin \|e verfasserin \|4 aut
700	1		\|a Huang, Qingming \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on pattern analysis and machine intelligence \|d 1979 \|g PP(2025) vom: 23. Sept. \|w (DE-627)NLM098212257 \|x 1939-3539 \|7 nnas
773	1	8	\|g volume:PP \|g year:2025 \|g day:23 \|g month:09
856	4	0	\|u http://dx.doi.org/10.1109/TPAMI.2025.3609758 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d PP \|j 2025 \|b 23 \|c 09