Rethinking Generalized Zero-Shot Learning : A Synthesized Per-Instance Attribute Perspective

Generalized zero-shot learning (GZSL) shows great potential for improving generalization to unseen classes in real-world scenarios. However, most GZSL methods depend on benchmark datasets with per-class attribute annotations, which creates a large semantic gap and worsens the domain shift problem in...

Bibliographic details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 34(2025), day 12, pages 5847-5859
Main author:	Tang, Chenwei (Author)
Other authors:	Wang, Ying, Xie, Wei, Zhang, Qianjun, Xiao, Rong, He, Zhenan, Lv, Jiancheng
Format:	Online article
Language:	English
Published:	2025
Collection access:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects:	Journal Article


LEADER	01000caa a22002652c 4500
001	NLM392639793
003	DE-627
005	20250923233015.0
007	cr uuu---uuuuu
008	250917s2025 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1109/TIP.2025.3607612 \|2 doi
028	5	2	\|a pubmed25n1578.xml
035			\|a (DE-627)NLM392639793
035			\|a (NLM)40953421
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Tang, Chenwei \|e verfasserin \|4 aut
245	1	0	\|a Rethinking Generalized Zero-Shot Learning \|b A Synthesized Per-Instance Attribute Perspective
264		1	\|c 2025
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Revised 22.09.2025
500			\|a published: Print
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a Generalized zero-shot learning (GZSL) shows great potential for improving generalization to unseen classes in real-world scenarios. However, most GZSL methods depend on benchmark datasets with per-class attribute annotations, which creates a large semantic gap and worsens the domain shift problem in the visual-semantic space. To address these challenges, instance-level attributes offer an intuitive solution, but they require expensive manual annotation. In this paper, we propose a simple yet effective approach called per-instance attribute synthesis (PIAS) to generate diverse semantic representations for each instance. Our method first uses the Vision Transformer (ViT) model to extract visual features and then generates per-instance attributes. The patch splitting, positional embedding, and multi-head self-attention mechanisms in ViT improve the discriminability of both visual and semantic representations. Next, we define the generated attributes of class-average images as class anchor points. These anchor points are calibrated in the semantic space by minimizing the cosine similarity between the anchor points and per-class attribute annotations. Finally, we improve the diversity of generated per-instance attributes by aligning the topological structure between per-class attribute annotations and synthesized per-instance attributes with that between class-average visual features and per-instance visual features. We conduct comprehensive experiments on three challenging ZSL datasets: AWA2, CUB, and SUN. The results show that PIAS significantly outperforms state-of-the-art methods under both ZSL and GZSL settings. We further demonstrate the generalization ability of PIAS by applying it to attribute-based zero-shot image retrieval tasks
650		4	\|a Journal Article
700	1		\|a Wang, Ying \|e verfasserin \|4 aut
700	1		\|a Xie, Wei \|e verfasserin \|4 aut
700	1		\|a Zhang, Qianjun \|e verfasserin \|4 aut
700	1		\|a Xiao, Rong \|e verfasserin \|4 aut
700	1		\|a He, Zhenan \|e verfasserin \|4 aut
700	1		\|a Lv, Jiancheng \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society \|d 1992 \|g 34(2025) vom: 12., Seite 5847-5859 \|w (DE-627)NLM09821456X \|x 1941-0042 \|7 nnas
773	1	8	\|g volume:34 \|g year:2025 \|g day:12 \|g pages:5847-5859
856	4	0	\|u http://dx.doi.org/10.1109/TIP.2025.3607612 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 34 \|j 2025 \|b 12 \|h 5847-5859