AdaGCL+: An Adaptive Subgraph Contrastive Learning Toward Tackling Topological Bias


Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1979. - 47(2025), 9, 04 Aug., pages 8073-8087
First author: Wang, Yili (author)
Other authors: Liu, Yaohua, Liu, Ninghao, Miao, Rui, Wang, Ying, Wang, Xin
Format: Online article
Language: English
Published: 2025
Access to parent work: IEEE Transactions on Pattern Analysis and Machine Intelligence
Keywords: Journal Article
Description
Abstract: Large-scale graph data poses a training scalability challenge, which is generally treated by employing batch sampling methods to divide the graph into smaller subgraphs and train them in batches. However, such an approach introduces a topological bias in the local batches compared with the complete graph structure, missing either node features or edges. This topological bias is empirically shown to affect the generalization capabilities of graph neural networks (GNNs). To address this issue, we propose adaptive subgraph contrastive learning (AdaGCL), which mitigates the poor generalization caused by large-scale batch sampling. Specifically, AdaGCL augments graphs depending on the sampled batches and leverages a subgraph-granularity contrastive loss to learn node embeddings invariant across the augmented imperfect graphs. To optimize the augmentation strategy for each downstream application, we introduce a node-centric information bottleneck (Node-IB) to control the trade-off between similarity and diversity of the original and augmented graphs. This enhanced version of AdaGCL, referred to as AdaGCL+, automates the graph augmentation process by dynamically adjusting graph perturbation parameters (e.g., edge dropping rate) to minimize the downstream loss. Extensive experimental results showcase the scalability of AdaGCL+ to graphs with millions of nodes using batch sampling methods. AdaGCL+ consistently outperforms existing methods on numerous benchmark datasets in terms of node classification accuracy and runtime efficiency.
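The two ingredients the abstract names — edge-drop augmentation with a tunable rate, and a contrastive loss that pulls together a node's embeddings across two augmented views — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names (`drop_edges`, `subgraph_contrastive_loss`), the toy edge list, and the InfoNCE-style loss form are illustrative assumptions based only on the abstract's description.

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_edges(edges, drop_rate, rng):
    """Randomly remove a fraction of edges.

    AdaGCL+ reportedly tunes this drop rate adaptively; here it is a
    fixed hyperparameter for illustration.
    """
    keep = rng.random(len(edges)) >= drop_rate
    return [e for e, k in zip(edges, keep) if k]

def subgraph_contrastive_loss(z1, z2, tau=0.5):
    """InfoNCE-style loss over two views of the same sampled subgraph.

    Node i's embedding in view 1 and view 2 form the positive pair;
    all other cross-view nodes act as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                          # cosine similarities / temperature
    sim = np.exp(sim - sim.max(axis=1, keepdims=True))  # stabilized softmax numerator
    pos = np.diag(sim)                             # matching nodes across views
    return float(-np.log(pos / sim.sum(axis=1)).mean())

# Toy subgraph batch: 4 nodes, 5 edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
kept = drop_edges(edges, drop_rate=0.4, rng=rng)

# Stand-in embeddings for the two augmented views (a real model would
# produce these with a GNN encoder over each perturbed subgraph).
z1 = rng.standard_normal((4, 8))
z2 = z1 + 0.1 * rng.standard_normal((4, 8))
loss = subgraph_contrastive_loss(z1, z2)
```

In the paper's setup, gradients of a loss like this (combined with the Node-IB objective) would drive both the encoder and the perturbation parameters such as `drop_rate`; the sketch only shows the forward computation.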
Description: Date Revised 07.08.2025
Published: Print
Citation Status PubMed-not-MEDLINE
ISSN:1939-3539
DOI:10.1109/TPAMI.2025.3574354