Part-Object Progressive Refinement Network for Zero-Shot Learning

Bibliographic Details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024) dated: 18., pages 2032-2043
Main Author:	Liu, Man (Author)
Other Authors:	Zhang, Chunjie, Bai, Huihui, Zhao, Yao
Format:	Online Article
Language:	English
Published:	2024
Access to parent work:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects:	Journal Article

Description
Summary:	Zero-shot learning (ZSL) recognizes unseen images by sharing semantic knowledge transferred from seen images, encouraging the investigation of associations between semantic and visual information. Prior works have been devoted to the alignment of global visual features with semantic information, i.e., attribute vectors, or further mining the local part regions related to each attribute and then simply concatenating them for category decisions. Although effective, these works ignore intrinsic interactions between local parts and the whole object, which enables a more discriminative and representative knowledge transfer for ZSL. In this paper, we propose a Part-Object Progressive Refinement Network (POPRNet), where discriminative and transferable semantics are progressively refined by the cooperation between parts and the whole object. Specifically, POPRNet incorporates discriminative part semantics and object-centric semantics guided by semantic intensity to improve cross-domain transferability. To achieve part-object learning, a semantic-augment transformer (SaT) is proposed to model the part-object relation at the part-level via an encoder and at the object-level via a decoder, generating a comprehensive semantic representation to boost discriminability and transferability. By introducing the prototype updating module embedded with the prototype selection layers, the discriminative ability of the updated category prototype is enhanced to further improve the recognition performance of ZSL. Extensive experiments are conducted to demonstrate the superiority and competitiveness of our proposed POPRNet method on three public benchmark datasets. The code is available at https://github.com/ManLiuCoder/POPRNet
Notes:	Date Revised: 18.03.2024; Published: Print-Electronic; Citation Status: PubMed-not-MEDLINE
ISSN:	1941-0042
DOI:	10.1109/TIP.2024.3374217