Cross-matching big astronomic catalogs on heterogeneous clusters

  • Xiaoying JIA

Student thesis: Doctoral thesis

Abstract

In astronomy, cross-match is a central operation for integrating multi-wavelength information by identifying the same celestial objects across multiple catalogs. With the rapid increase in data volume from space-based and ground-based surveys, it has become essential to process large astronomic catalogs efficiently. In this thesis, we study how to accelerate the cross-match of billion-record catalogs on a cluster of heterogeneous computers equipped with both CPUs and GPUs. Specifically, we present two cross-match algorithms, IB-CM (Index-Based Cross-Match) and MASJ-CM (Multi-Assignment Single-Join Cross-Match), and study the performance impact of indexing methods, design choices, and optimizations for both algorithms on a heterogeneous computer cluster. We have implemented these algorithms to fully utilize the computation and communication resources of the cluster, and compared them with implementations on Spark and SpatialHadoop, two popular distributed computing platforms. Our evaluations on real-world astronomic catalogs show that our native implementations were orders of magnitude faster than those on Spark or SpatialHadoop, and that self-matching billion-record catalogs on a six-node cluster finished in under five minutes.
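
For readers unfamiliar with the operation, the following is a minimal sketch of what a cross-match computes: pairs of records from two catalogs whose positions on the sky lie within a given angular radius. It is a naive all-pairs baseline written for illustration only, not the thesis's IB-CM or MASJ-CM algorithm. The function names, the `(id, ra, dec)` record layout, and the sample coordinates are all assumptions made for this example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).

    All inputs are in degrees. Hypothetical helper, not from the thesis.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(catalog_a, catalog_b, radius_deg):
    """Return (id_a, id_b) pairs separated by at most radius_deg.

    Naive O(|A| * |B|) scan over records of the assumed form (id, ra, dec).
    """
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        for id_b, ra_b, dec_b in catalog_b:
            if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                matches.append((id_a, id_b))
    return matches


# Made-up sample records: (id, right ascension, declination) in degrees.
catalog_a = [("a1", 10.0000, 20.0000), ("a2", 150.0, -30.0)]
catalog_b = [("b1", 10.0002, 20.0001), ("b2", 200.0, 45.0)]

ONE_ARCSEC_DEG = 1.0 / 3600.0
matches = cross_match(catalog_a, catalog_b, radius_deg=ONE_ARCSEC_DEG)
print(matches)
```

The all-pairs scan grows with the product of the catalog sizes, which is impractical at the billion-record scale; avoiding that cost is the usual motivation for indexing and partitioning the sky before matching.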
Date of Award: 2017
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
