Architectural exploration and memory management for emerging CPU-FPGA systems

  • Liang FENG

Student thesis: Doctoral thesis

Abstract

Heterogeneous computing is a promising way to address the performance and power walls facing today’s high-performance computing. CPU-FPGA systems are especially attractive because the flexibility of the FPGA allows the hardware to be customized for different computing tasks, improving both performance and energy efficiency. Tightly coupled CPU-FPGA systems with a shared cache hierarchy (such as Intel HARP and IBM POWER with CAPI) have been proposed to make communication between the CPU and FPGA more efficient and to simplify the programming model. In such systems, multi-core CPUs and the FPGA coherently share the same cache system, and an FPGA cache is attached to the FPGA for fast memory access. These emerging architectures bring new challenges to the design of collaborative CPU-FPGA systems. In this thesis, we address these challenges from several perspectives. First, we develop a simulation framework for CPU-FPGA systems to aid design evaluation. It supports fast architectural exploration over the number of cores, the number of accelerated units on the FPGA, and different cache hierarchies between the CPU and FPGA, and it reports various performance metrics for performance analysis and for optimizing the architectural configuration. Then, motivated by the fact that the behavior of the FPGA cache often dominates performance in emerging shared-cache CPU-FPGA systems, we design two cache management approaches to improve FPGA cache utilization, targeting two different scenarios. The first relies on cache bypassing to improve the FPGA cache hit rate for a single accelerated unit; the second alleviates cache contention among multiple accelerated units by combining cache partitioning and cache bypassing. Both approaches rely on static analysis of the applications and on dynamic control guided by that analysis.
Finally, we target the recently released Intel HARP2 CPU-FPGA system, which has three bus links between the CPU and FPGA: one QPI bus with an attached FPGA cache and two PCIe buses. For this system, we develop an access management framework that selects the bus link for each access using a hybrid of static and dynamic decisions. The framework adaptively routes memory accesses to the preferred link to improve the utilization of all links and to increase the reuse benefit of the FPGA cache. It also solves the data inconsistency problem caused by the multiple links. A complete set of software services and hardware IPs for the framework is provided on the real HARP2 system. In summary, this thesis presents a deep, multi-dimensional study of emerging CPU-FPGA systems: from architectural exploration to performance optimization, from simulation environment development to real system design, and from the algorithm level to the hardware logic level, covering various types of emerging CPU-FPGA systems.
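
The effect of cache bypassing mentioned in the abstract can be illustrated with a toy model. The sketch below is not the thesis's method: it assumes a small fully associative LRU cache, a synthetic trace in which a reused working set is interleaved with single-use streaming blocks, and a whole-trace reuse count as a stand-in for static analysis. All names (`lru_hits`, `single_use_blocks`, `make_trace`) are hypothetical.

```python
from collections import Counter, OrderedDict


def lru_hits(trace, capacity, bypass=frozenset()):
    """Count hits in a fully associative LRU cache.

    Addresses in `bypass` are served from memory and never inserted,
    so they cannot evict blocks that will be reused.
    """
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in bypass:
            continue  # bypassed access: always a miss, cache untouched
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits


def single_use_blocks(trace):
    """Stand-in for static analysis: blocks touched exactly once."""
    counts = Counter(trace)
    return frozenset(a for a, n in counts.items() if n == 1)


def make_trace(iterations=100, hot_blocks=4):
    """Interleave a reused working set (0..hot_blocks-1) with streaming blocks."""
    trace = []
    for i in range(iterations):
        trace.append(i % hot_blocks)  # reused block
        trace.append(1000 + i)        # streaming block, never reused
    return trace


if __name__ == "__main__":
    trace = make_trace()
    print("hits without bypass:", lru_hits(trace, capacity=4))
    print("hits with bypass:   ", lru_hits(trace, capacity=4, bypass=single_use_blocks(trace)))
```

In this synthetic trace, the streaming blocks push every reused block out of the 4-entry cache before it is touched again, so the plain LRU cache gets no hits. Bypassing the single-use blocks lets the working set stay resident, and all reused accesses after the first four hit.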
Date of Award: 2019
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
