Data-parallel primitives, such as gather, scatter, scan (prefix sum), and split, are widely used in parallel programs. Multi-way joins are a common operator in data analytics applications. In this thesis, we design and implement efficient algorithms for these primitives and join operators on heterogeneous processors, including multi-core CPUs, Intel Xeon Phi (KNC) processors, and Graphics Processing Units (GPUs). Specifically, we first revisit the performance of scatter and gather on new-generation GPUs, and propose a new model for their optimization. We then propose optimization strategies for these two primitives as well as scan and split that work well for an Intel multi-core CPU, an NVIDIA GPU, and a KNC. Finally, we propose a GPU-based multi-way hash join solution that effectively utilizes the primitives to achieve high bandwidth utilization on GPUs. Our core idea is to design a warp-based parallelization strategy for reducing thread divergence and for coalescing memory accesses. We further enhance our implementation with a set of GPU-friendly optimizations, including dynamic skew handling for load balance and a sampling-based probing strategy for result counting. We experimentally compare our method with cascaded pairwise hash joins as well as merge-based multi-way joins on GPUs and CPUs. The results show that our method outperforms prior work.
| Date of Award | 2020 |
|---|
| Original language | English |
|---|
| Awarding Institution | - The Hong Kong University of Science and Technology
|
|---|
Accelerating data-parallel primitives and multi-way joins on heterogeneous processors
LAI, Z. (Author). 2020
Student thesis: Doctoral thesis