Performance optimization on GPUs is still not well understood. This thesis discusses the principles and automation of performance optimizations on NVIDIA GPUs, with a special focus on compute-bound kernels. It concentrates on the abstraction layers between portable virtual instruction sets (e.g., LLVM IR, NVIDIA PTX) and native hardware assembly. We first introduce the native GPU instruction set, Shader ASSembly (SASS). Previously, the public could not customize SASS generation, because the only way to generate SASS was to use the closed-source proprietary compiler ptxas, which hides many important optimizations, including instruction scheduling. We develop an open-source assembler, TuringAs, that lets the public manipulate SASS, and we identify new optimization opportunities at the SASS level. For instance, using certain native SASS instructions helps reduce register pressure, and reordering SASS instructions leads to better instruction-level parallelism and thus higher throughput. We evaluate the effectiveness of our optimizations on two examples: Winograd convolution (a fast convolution algorithm) and Tensor Core matrix multiplication. Next, we introduce our effort to automate SASS optimizations to improve productivity. Programming in SASS does not scale to a large number of kernels or to new GPU architectures. We develop GASS, an LLVM-based compiler that automatically translates a high-level virtual representation (i.e., LLVM IR) into optimized SASS. We highlight our newly proposed instruction scheduler for compute-bound deep learning kernels, our customization of the if-conversion pass, and our algorithms for resolving data dependencies. The evaluation shows that our algorithms in GASS outperform LLVM’s algorithms by a considerable margin and that GASS is on par with the highly optimized proprietary compiler ptxas.
| Date of Award | 2022 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
| Supervisor | Wei WANG (Supervisor) |
Principles and automation of low-level optimizations on GPUs
YAN, D. (Author). 2022
Student thesis: Doctoral thesis