High Efficiency Inference Accelerating Algorithm for NOMA-Based Edge Intelligence

Xin Yuan, Ning Li*, Tuo Zhang, Muqing Li, Yuwen Chen, Jose Fernan Martinez Ortega, Song Guo

*Corresponding author for this work

Research output: Contribution to journal › Journal Article › peer-review

6 Citations (Scopus)

Abstract

Although artificial intelligence (AI) has been widely used and has significantly changed our lives, deploying large AI models directly on resource-limited edge devices is not appropriate. Thus, model split inference is proposed to improve the performance of edge intelligence (EI): the AI model is divided into sub-models, and the resource-intensive sub-model is offloaded wirelessly to an edge server to reduce resource requirements and inference latency. Unfortunately, with the sharp increase in the number of edge devices, the shortage of spectrum resources in edge networks has become serious in recent years, which limits the performance improvement of EI. Inspired by NOMA-based edge computing (EC), integrating non-orthogonal multiple access (NOMA) technology with split inference in EI is attractive. However, previous works on model split inference in EI fail to properly consider the NOMA-based communication aspect and the influence of intermediate data transmission, and the sophistication of resource allocation under the NOMA scheme complicates the problem further. Thus, this paper proposes the Effective Communication and Computing resource allocation algorithm, abbreviated ECC, for accelerating split inference in NOMA-based EI. Specifically, ECC takes both energy consumption and inference latency into account to find the optimal model split strategy and resource allocation strategy (subchannel, transmission power, and computing resource). Since minimum inference delay and minimum energy consumption cannot be achieved simultaneously, a gradient descent (GD) based algorithm is adopted to find the optimal tradeoff between them. Moreover, the loop iteration GD approach (Li-GD) is developed to reduce the complexity of the GD algorithm caused by parameter discretization.
The key idea of Li-GD is that the initial value of the ith layer's GD procedure is selected from the optimal results of the former (i-1) layers' GD procedures, choosing the layer whose intermediate data size is closest to that of the ith layer. Additionally, the properties of the proposed algorithms are investigated, including convergence, complexity, and approximation error. The experimental results demonstrate that ECC performs much better than previous approaches.
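The Li-GD warm-start idea described above can be sketched in a few lines. The helper names, the toy per-layer cost, and the hyperparameters below are illustrative assumptions, not the paper's actual formulation; the sketch only shows how each layer's GD run is seeded from the earlier optimum with the closest intermediate data size.

```python
def gd(cost_grad, x0, lr=0.05, steps=200):
    """Plain gradient descent from a given initial point (toy version)."""
    x = x0
    for _ in range(steps):
        x = x - lr * cost_grad(x)
    return x

def li_gd(layer_data_sizes, cost_grad_for_layer, x_init):
    """Loop-iteration GD sketch: for each candidate split layer i, warm-start
    GD from the optimum of an earlier layer whose intermediate data size is
    closest to layer i's, instead of restarting from scratch."""
    optima = []  # (intermediate_data_size, optimal_x) for layers done so far
    for i, size in enumerate(layer_data_sizes):
        if optima:
            # pick the earlier optimum with the closest intermediate data size
            _, x0 = min(optima, key=lambda p: abs(p[0] - size))
        else:
            x0 = x_init
        x_star = gd(cost_grad_for_layer(i), x0)
        optima.append((size, x_star))
    return optima
```

Because consecutive layers with similar intermediate data sizes tend to have similar optimal resource allocations, the warm start shortens each GD run relative to independent restarts.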

Original language: English
Pages (from-to): 17539-17556
Number of pages: 18
Journal: IEEE Transactions on Wireless Communications
Volume: 23
Issue number: 11
DOIs
Publication status: Published - 2024

Bibliographical note

Publisher Copyright:
© 2002-2012 IEEE.

Keywords

  • Edge intelligence
  • NOMA
  • inference accelerating
  • model split

