SANCUS: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks

Jingshu Peng, Zhao Chen, Yingxia Shao*, Yanyan Shen*, Lei Chen, Jiannong Cao

*Corresponding author for this work

Research output: Contribution to journal › Conference article published in journal › peer-review

Abstract

Graph neural networks (GNNs) have emerged due to their success at modeling graph data. Yet, it is challenging for GNNs to efficiently scale to large graphs. Thus, distributed GNNs come into play. To avoid communication caused by expensive data movement between workers, we propose SANCUS, a staleness-aware communication-avoiding decentralized GNN system. By introducing a set of novel bounded embedding staleness metrics and adaptively skipping broadcasts, SANCUS abstracts decentralized GNN processing as sequential matrix multiplication and uses historical embeddings via cache. Theoretically, we show bounded approximation errors of embeddings and gradients with convergence guarantee. Empirically, we evaluate SANCUS with common GNN models via different system setups on large-scale benchmark datasets. Compared to SOTA works, SANCUS can avoid up to 74% communication with at least 1.86× faster throughput on average without accuracy loss.
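The idea of adaptively skipping broadcasts and reusing cached historical embeddings can be illustrated with a small sketch. This is not the SANCUS implementation: the staleness metric used here (relative Frobenius-norm change against the last broadcast copy), the threshold `epsilon`, and the class name `StalenessAwareBroadcaster` are all assumptions chosen for illustration, and the actual broadcast is replaced by a counter.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def relative_change(current, cached):
    """Hypothetical staleness metric: ||current - cached||_F / ||cached||_F."""
    diff = [[c - h for c, h in zip(cur_row, old_row)]
            for cur_row, old_row in zip(current, cached)]
    denom = frobenius_norm(cached)
    return frobenius_norm(diff) / denom if denom > 0 else float("inf")


class StalenessAwareBroadcaster:
    """Illustrative only: decides per step whether to broadcast or reuse the cache."""

    def __init__(self, epsilon):
        self.epsilon = epsilon  # assumed staleness bound
        self.cache = None       # last embeddings that were actually broadcast
        self.sent = 0
        self.skipped = 0

    def step(self, embeddings):
        """Return the embeddings that peers would use in this step."""
        fresh_enough = (
            self.cache is not None
            and relative_change(embeddings, self.cache) <= self.epsilon
        )
        if fresh_enough:
            # Cached copy is within the bound: skip the broadcast, peers reuse it.
            self.skipped += 1
            return self.cache
        # Otherwise "broadcast" (simulated) and refresh the cache with a copy.
        self.cache = [row[:] for row in embeddings]
        self.sent += 1
        return self.cache


b = StalenessAwareBroadcaster(epsilon=0.1)
b.step([[1.0, 0.0], [0.0, 1.0]])            # no cache yet: broadcast
used = b.step([[1.01, 0.0], [0.0, 1.0]])    # small change: broadcast skipped
b.step([[2.0, 0.0], [0.0, 1.0]])            # large change: broadcast again
```

In this toy run, two of the three steps broadcast and one reuses the cached embeddings. The trade-off sketched here is the one the abstract describes: a looser bound saves more communication at the cost of staler embeddings.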

Original language: English
Pages (from-to): 1937-1950
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 15
Issue number: 9
Publication status: Published - 2022
Event: 48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia
Duration: 5 Sept 2022 to 9 Sept 2022

Bibliographical note

Publisher Copyright:
© 2022, VLDB Endowment.
