
Searching for Efficiency: Automated Search for Hardware-Aware Neural Networks and High-Performance FPGA Accelerators

  • Afzal AHMAD

Student thesis: Doctoral thesis

Abstract

The quest for computational efficiency in deep learning requires optimization across all levels of the system stack, from arithmetic logic to neural network topology. This thesis presents a hierarchical framework of automated search techniques to co-optimize hardware-aware neural networks and their high-performance FPGA accelerators. Its central claim is that principled, automated search can systematically uncover superior solutions that are inaccessible through manual design.

Our hierarchical approach begins at the compute kernel level, where we first enrich the design space for hardware synthesis. We introduce a fast and practical FPGA implementation of Strassen’s algorithm, demonstrating for the first time that it is viable on practically sized (non-asymptotic) matrices, with up to a 1.85× speedup over highly optimized standard kernels. This work provides a novel, resource-complementary building block that is then exploited by HeteroGEMM, our framework for automated search at the accelerator architecture level. By intelligently composing a mix of standard and Strassen’s kernels into a tailored, multi-kernel accelerator, HeteroGEMM significantly outperforms traditional monolithic designs on complex DNN workloads.
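Strassen’s algorithm replaces the eight block products of a 2×2 blocked matrix multiplication with seven, paying for the saved multiplication with extra additions and subtractions. The pure-Python sketch below illustrates only that arithmetic; it is not the thesis’s FPGA kernel. The helper names and the `cutoff` parameter are hypothetical. Falling back to the standard triple-loop product for small or odd-sized blocks reflects the common practice of mixing Strassen steps with standard kernels rather than recursing all the way down.

```python
from typing import List

Matrix = List[List[int]]

def add(A: Matrix, B: Matrix) -> Matrix:
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A: Matrix, B: Matrix) -> Matrix:
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def naive_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Standard O(n^3) product, used as the base-case kernel."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def split(M: Matrix):
    """Split a square matrix of even size into four equal quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A: Matrix, B: Matrix, cutoff: int = 2) -> Matrix:
    """Square matmul using Strassen's 7-product recursion above `cutoff`
    (hypothetical parameter); odd sizes fall back to the standard kernel."""
    n = len(A)
    if n <= cutoff or n % 2:
        return naive_matmul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive block products instead of eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    # Recombine the products into the four output quadrants.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

Integer inputs keep the comparison exact; with floating-point data, Strassen’s extra additions change rounding behaviour slightly relative to the standard kernel.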

Addressing the complementary software challenge, we introduce PertNAS, a memory-efficient evolutionary methodology for Neural Architecture Search. By decoupling memory requirements from search space size, PertNAS reduces the cost of finding optimal network architectures by 80%, making the software side of the co-design problem more scalable and practical. To bridge the gap between this software-level search and hardware reality, we develop Accel-NASBench, the first large-scale, bi-objective benchmark with true on-device performance data from a diverse suite of FPGAs, GPUs, and TPUs. This benchmark serves as a high-fidelity platform to validate co-design strategies and enables realistic hardware-aware research at zero cost.
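A bi-objective benchmark pairs each architecture with both a quality metric and a measured hardware metric, so the natural query is for the Pareto front: architectures that no other candidate beats on both objectives at once. The sketch below shows that generic filter under stated assumptions. The `Candidate` record, its `accuracy` and `throughput` fields, and any values fed to it are hypothetical placeholders, not Accel-NASBench’s actual schema, API, or data.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical architecture identifier
    accuracy: float    # higher is better (e.g. top-1 accuracy)
    throughput: float  # higher is better (e.g. images/s on one device)

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on both objectives
    and strictly better on at least one."""
    return (a.accuracy >= b.accuracy and a.throughput >= b.throughput
            and (a.accuracy > b.accuracy or a.throughput > b.throughput))

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates not dominated by any other, sorted by throughput."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.throughput)
```

This quadratic filter is adequate for illustration. Because measured throughput differs per device, one would typically compute a separate front for each FPGA, GPU, or TPU target.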

Collectively, these contributions provide a holistic, multi-layered methodology to automate the search for efficiency, demonstrating a complete and practical approach to hardware/software co-design for next-generation AI systems.

Date of Award: 2025
Original language: English
Awarding Institution:
  • The Hong Kong University of Science and Technology
Supervisor: Wei ZHANG
