Abstract
Sparse Matrix-Matrix Multiplication (SpMM) is a building-block operation in scientific computing and machine learning applications. Recent advancements in hardware, notably Tensor Cores (TCs), have created promising opportunities for accelerating SpMM. However, harnessing these hardware accelerators to speed up general SpMM requires considerable effort. In this paper, we undertake a comprehensive analysis of the state-of-the-art techniques for accelerating TC-based SpMM and identify crucial performance gaps. Drawing upon these insights, we propose DTC-SpMM, a novel approach with systematic optimizations tailored for accelerating general SpMM on TCs. DTC-SpMM encompasses several aspects, including efficient compression formats, reordering methods, and runtime pipeline optimizations. Our extensive experiments on modern GPUs with a diverse range of benchmark matrices demonstrate substantial performance improvements when TC-based SpMM is combined with our proposed optimizations. A case study also shows that DTC-SpMM speeds up end-to-end GNN training by up to 1.91× compared with popular GNN frameworks.
| Original language | English |
|---|---|
| Title of host publication | Fall Cycle |
| Publisher | Association for Computing Machinery |
| Pages | 253-267 |
| Number of pages | 15 |
| ISBN (Electronic) | 9798400703867 |
| Publication status | Published - 27 Apr 2024 |
| Event | 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024 - San Diego, United States. Duration: 27 Apr 2024 → 1 May 2024 |
Publication series
| Name | International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS |
|---|---|
| Volume | 3 |
Conference
| Conference | 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024 |
|---|---|
| Country/Territory | United States |
| City | San Diego |
| Period | 27/04/24 → 1/05/24 |
Bibliographical note
Publisher Copyright: © 2024 Copyright held by the owner/author(s).
Keywords
- GPU
- SpMM
- Sparse Matrix-Matrix Multiplication
- Tensor Core
- unstructured sparsity
Fingerprint
Dive into the research topics of 'DTC-SpMM: Bridging the Gap in Accelerating General Sparse Matrix Multiplication with Tensor Cores'. Together they form a unique fingerprint.