HSViT: A Hardware and Software Collaborative Design for Vision Transformer via Multi-level Compression

Hong Rui Song*, Liang Xu, Ya Wang, Xiao Wu, Meiqi Wang, Zhongfeng Wang

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/Report, Conference Paper published in a book, peer-review

Abstract

The rapid advancement of Vision Transformer (ViT) models has greatly enhanced performance in computer vision tasks. However, deploying ViTs in resource-constrained environments is challenging because attention computation forms a bottleneck that demands extensive memory and computation resources. To address this issue, we propose HSViT, a hardware and software co-design framework dedicated to ViT. HSViT introduces a configurable and efficient accelerator with dedicated dataflows that take advantage of multi-level compression, including feature map compression, token pruning, and hardware-friendly sparsity. The proposed accelerator reduces intermediate transmission of feature maps and of the Query, Key, and Value matrices while enhancing data reuse and processing element (PE) utilization for chain matrix multiplications. Moreover, an innovative Top-k engine, integrated into the accelerator, is presented to support various selection scenarios with high speed and low resource consumption. Experiments validate that the proposed HSViT delivers significant speedups of 123.91×, 29.5×, and 3.01× to 20.65× over conventional CPUs, GPUs, and prior art, respectively. HSViT also achieves a throughput of up to 731.5 GOP/s and PE utilization as high as 92%.
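
As a rough software illustration of the token pruning and Top-k selection mentioned in the abstract, the sketch below keeps the k highest-scoring tokens and preserves their original order. It is not the HSViT Top-k engine or its dataflow, which the abstract does not detail. The function names `topk_indices` and `prune_tokens`, the idea of one importance score per token, and the tie-breaking rule are all assumptions made for illustration.

```python
import heapq
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def topk_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest scores, in original token order.

    Hypothetical helper: ties are broken in favour of the earlier token.
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and len(scores)")
    # Rank indices by (score, -index) so that equal scores prefer the lower index.
    best = heapq.nlargest(k, range(len(scores)), key=lambda i: (scores[i], -i))
    # Sort the surviving indices so the kept tokens stay in sequence order.
    return sorted(best)


def prune_tokens(tokens: Sequence[T], scores: Sequence[float], k: int) -> List[T]:
    """Keep only the k tokens with the highest (assumed) importance scores."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    return [tokens[i] for i in topk_indices(scores, k)]


# Illustrative data only: five tokens with made-up importance scores.
example_tokens = ["t0", "t1", "t2", "t3", "t4"]
example_scores = [0.10, 0.70, 0.30, 0.70, 0.05]
kept = prune_tokens(example_tokens, example_scores, 3)
```

Here `kept` holds the three tokens with the largest scores, still in their original sequence order. A hardware Top-k engine would typically replace the heap with a structure better suited to parallel logic; this sketch only shows the selection semantics.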

Original language: English
Title of host publication: ISCAS 2024 - IEEE International Symposium on Circuits and Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350330991
DOIs
Publication status: Published - 2024
Externally published: Yes
Event: 2024 IEEE International Symposium on Circuits and Systems, ISCAS 2024 - Singapore, Singapore
Duration: 19 May 2024 to 22 May 2024

Publication series

Name: Proceedings - IEEE International Symposium on Circuits and Systems
ISSN (Print): 0271-4310

Conference

Conference: 2024 IEEE International Symposium on Circuits and Systems, ISCAS 2024
Country/Territory: Singapore
City: Singapore
Period: 19/05/24 to 22/05/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Keywords

  • FPGA
  • Hardware and Software Co-design
  • Model Compression
  • Vision Transformer
