ST4ML: Machine Learning Oriented Spatio-Temporal Data Processing at Scale

Jianqiang Huang, Panrong Tong, Yue Wu, Mo Li, Kaiqi Liu

Research output: Contribution to conference › Conference Paper › peer-review

Abstract

Data scientists and researchers use enormous volumes of spatio-temporal data to build machine learning models that solve practical problems in diverse domains, including intelligent transportation, urban planning, epidemic prediction, and many more. Extracting application-specific features from big spatio-temporal data imposes system requirements: support for heterogeneous data, efficient and scalable computing over the spatial and temporal dimensions, and a user-friendly programming interface. This paper presents ST4ML, a distributed spatio-temporal data processing system that supports scalable machine-learning-oriented applications. We propose a three-stage pipelined computing framework, namely "selection-conversion-extraction", to abstract the distributed computing flow, and implement it on Apache Spark. To the best of our knowledge, ST4ML is the first system of its kind to realize these design considerations. Extensive experiments with real-world datasets show that ST4ML outperforms straightforward extensions of existing spatio-temporal (ST) data processing systems by up to an order of magnitude. ST4ML is open-sourced at https://github.com/Panrong/st4ml.
Original language: English
Pages: 46753
Publication status: Published - May 2023
Externally published: Yes
Event: Proceedings of the ACM on Management of Data
Duration: 1 May 2023 → 1 May 2023

Conference

Conference: Proceedings of the ACM on Management of Data
Period: 1/05/23 → 1/05/23
