An I/O-Efficient Disk-based Graph System for Scalable Second-Order Random Walk of Large Graphs

Hongzheng Li, Yingxia Shao*, Junping Du, Bin Cui, Lei Chen

*Corresponding author for this work

Research output: Contribution to journalConference article published in journalpeer-review

11 Citations (Scopus)

Abstract

Random walk is widely used in many graph analysis tasks, especially the first-order random walk. However, as a simplification of real-world problems, the first-order random walk is poor at modeling higher-order structures in the data. Recently, second-order random walk-based applications (e.g., Node2vec, Second-order PageRank) have become attractive. Due to the complexity of the second-order random walk models and memory limitations, it is not scalable to run second-order random walk-based applications on a single machine. Existing disk-based graph systems are only friendly to the first-order random walk models and suffer from expensive disk I/Os when executing the second-order random walks. This paper introduces an I/O-effcient disk-based graph system for the scalable second-order random walk of large graphs, called Gra-Sorw. First, to eliminate massive light vertex I/Os, we develop a bi-block execution engine that converts random I/Os into sequential I/Os by applying a new triangular bi-block scheduling strategy, the bucket-based walk management, and the skewed walk storage. Second, to improve the I/O utilization, we design a learning-based block loading model to leverage the advantages of the full-load and on-demand load methods. Finally, we conducted extensive experiments on six large real datasets as well as several synthetic datasets.. The empirical results demonstrate that the end-to-end time cost of popular tasks in GraSorw is reduced by more than one order of magnitude compared to the existing disk-based graph systems.

Original languageEnglish
Pages (from-to)1619-1631
Number of pages13
JournalProceedings of the VLDB Endowment
Volume15
Issue number8
DOIs
Publication statusPublished - 2022
Event48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia
Duration: 5 Sept 20229 Sept 2022

Bibliographical note

Publisher Copyright:
© 2022, American Mathematical Society. All rights reserved.

Fingerprint

Dive into the research topics of 'An I/O-Efficient Disk-based Graph System for Scalable Second-Order Random Walk of Large Graphs'. Together they form a unique fingerprint.

Cite this