State-Aware Compositional Learning Toward Unbiased Training for Scene Graph Generation

How to avoid biased predictions is an important and active research question in scene graph generation (SGG). Current state-of-the-art methods employ debiasing techniques such as resampling and causality analysis. However, the role of intrinsic cues in the features causing biased training has remain...

Description complète

Détails bibliographiques
Publié dans:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 32(2023) vom: 13., Seite 43-56
Auteur principal: He, Tao (Auteur)
Autres auteurs: Gao, Lianli, Song, Jingkuan, Li, Yuan-Fang
Format: Article en ligne
Langue:English
Publié: 2023
Accès à la collection:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Sujets:Journal Article
Description
Résumé:How to avoid biased predictions is an important and active research question in scene graph generation (SGG). Current state-of-the-art methods employ debiasing techniques such as resampling and causality analysis. However, the role of intrinsic cues in the features causing biased training has remained under-explored. In this paper, for the first time, we make the surprising observation that object identity information, in the form of object label embeddings (e.g. GLOVE), is principally responsible for biased predictions. We empirically observe that, even without any visual features, a number of recent SGG models can produce comparable or even better results solely from object label embeddings. Motivated by this insight, we propose to leverage a conditional variational auto-encoder to decouple the entangled visual features into two meaningful components: the object's intrinsic identity features and the extrinsic, relation-dependent state feature. We further develop two compositional learning strategies on the relation and object levels to mitigate the data scarcity issue of rare relations. On the two benchmark datasets Visual Genome and GQA, we conduct extensive experiments on the three scenarios, i.e., conventional, few-shot and zero-shot SGG. Results consistently demonstrate that our proposed Decomposition and Composition (DeC) method effectively alleviates the biases in the relation prediction. Moreover, DeC is model-free, and it significantly improves the performance of recent SGG models, establishing new state-of-the-art performance
Description:Date Completed 05.04.2023
Date Revised 05.04.2023
published: Print-Electronic
Citation Status PubMed-not-MEDLINE
ISSN:1941-0042
DOI:10.1109/TIP.2022.3224872