Disentangling Before Composing: Learning Invariant Disentangled Features for Compositional Zero-Shot Learning

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2024), 28 Oct.
First author: Zhang, Tian (author)
Other authors: Liang, Kongming, Du, Ruoyi, Chen, Wei, Ma, Zhanyu
Format: Online article
Language: English
Published: 2024
Parent work: IEEE transactions on pattern analysis and machine intelligence
Subject terms: Journal Article
LEADER 01000naa a22002652 4500
001 NLM379535238
003 DE-627
005 20241029233039.0
007 cr uuu---uuuuu
008 241029s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2024.3487222  |2 doi 
028 5 2 |a pubmed24n1584.xml 
035 |a (DE-627)NLM379535238 
035 |a (NLM)39466858 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Zhang, Tian  |e author  |4 aut 
245 1 0 |a Disentangling Before Composing  |b Learning Invariant Disentangled Features for Compositional Zero-Shot Learning 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
500 |a Date Revised 28.10.2024 
500 |a published: Print-Electronic 
500 |a Citation Status Publisher 
520 |a Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen attribute-object compositions in the training set. Previous works mainly project an image and its corresponding composition into a common embedding space to measure their compatibility score. However, attributes and objects then share the same learned visual representation, which leads the model to exploit spurious correlations and become biased toward seen compositions. Instead, we reconsider CZSL as an out-of-distribution generalization problem. If an object is treated as a domain, we can learn object-invariant features to reliably recognize attributes attached to any object, and vice versa. Specifically, we propose an invariant feature learning framework that aligns different domains at the representation and gradient levels to capture the intrinsic characteristics associated with each task. To further encourage the disentanglement of attributes and objects, we propose an "encoding-reshuffling-decoding" process that helps the model avoid spurious correlations by randomly regrouping the disentangled features into synthetic features. Ultimately, our method improves generalization by learning disentangled features that represent attributes and objects as two independent factors. Experiments demonstrate that the proposed method achieves state-of-the-art or competitive performance in both closed-world and open-world scenarios. Code is available at https://github.com/PRIS-CV/Disentangling-before-Composing 
650 4 |a Journal Article 
700 1 |a Liang, Kongming  |e author  |4 aut 
700 1 |a Du, Ruoyi  |e author  |4 aut 
700 1 |a Chen, Wei  |e author  |4 aut 
700 1 |a Ma, Zhanyu  |e author  |4 aut 
773 0 8 |i Contained in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g PP(2024), 28 Oct.  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:PP  |g year:2024  |g day:28  |g month:10 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2024.3487222  |3 Full text 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d PP  |j 2024  |b 28  |c 10
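The abstract describes an "encoding-reshuffling-decoding" process that randomly regroups disentangled attribute and object features into synthetic features. The following is a minimal Python sketch of that regrouping idea only, under loudly stated assumptions: `encode` and `decode` are hypothetical stand-ins (splitting a feature vector in half, then concatenating) for the learned networks, and the function name `encode_reshuffle_decode`, the toy vectors, and the labels are illustrative. None of this is taken from the authors' repository.

```python
import random
from typing import Hashable, List, Sequence, Tuple

Vector = List[float]


def encode(feature: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical stand-in encoder: the first half of the vector plays the
    role of the attribute feature, the second half the object feature."""
    mid = len(feature) // 2
    return feature[:mid], feature[mid:]


def decode(attr_feat: Vector, obj_feat: Vector) -> Vector:
    """Hypothetical stand-in decoder: concatenate the two parts into one
    synthetic feature vector."""
    return attr_feat + obj_feat


def encode_reshuffle_decode(
    features: Sequence[Vector],
    attr_labels: Sequence[Hashable],
    obj_labels: Sequence[Hashable],
    rng: random.Random,
) -> List[Tuple[Vector, Hashable, Hashable]]:
    """Pair the attribute part of image i with the object part of image
    perm[i], carrying each part's label along with it."""
    if not (len(features) == len(attr_labels) == len(obj_labels)):
        raise ValueError("features and labels must have the same batch size")
    encoded = [encode(f) for f in features]  # encoding: split each feature
    perm = list(range(len(features)))
    rng.shuffle(perm)  # reshuffling: random permutation of the batch
    synthetic = []
    for i, j in enumerate(perm):
        attr_feat, _ = encoded[i]
        _, obj_feat = encoded[j]
        # decoding: build a synthetic feature with a (possibly unseen) label pair
        synthetic.append((decode(attr_feat, obj_feat), attr_labels[i], obj_labels[j]))
    return synthetic


if __name__ == "__main__":
    batch = [[1.0, 2.0, 10.0, 20.0], [3.0, 4.0, 30.0, 40.0], [5.0, 6.0, 50.0, 60.0]]
    attrs = ["red", "wet", "old"]
    objs = ["apple", "dog", "car"]
    for feat, attr, obj in encode_reshuffle_decode(batch, attrs, objs, random.Random(0)):
        print(attr, obj, feat)
```

Because a random permutation can map an index to itself, some synthetic samples may simply reproduce a real image's composition; this sketch does not filter those out, and the published method may handle regrouping differently.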