TY - JOUR
T1 - Fast Multimodal Edge Inference via Selective Feature Distillation
AU - Chen, Jinyu
AU - Xu, Wenchao
AU - Fan, Yunfeng
AU - Wang, Haozhao
AU - Chen, Quan
AU - Li, Jing
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Inferring user status at the edge is essential for delivering personalized services, such as detecting emotional states. However, deploying large-scale models directly on user devices is impractical due to substantial computational overhead and the scarcity of labeled data. Conversely, uploading raw data to the cloud for processing raises significant privacy concerns and incurs prohibitive communication costs. To address this challenge, we propose a privacy-preserving multimodal inference framework that leverages large-scale public data while safeguarding sensitive information and optimizing computational efficiency. Specifically, we first train a teacher model in the cloud using publicly available data. Through a feature distillation process, the knowledge from this teacher model is transferred to a lightweight encoder deployed at the user end. This transfer is tailored to the user's data, ensuring that only relevant knowledge is distilled. To accommodate varying communication constraints, we introduce a feature compression mechanism that significantly reduces communication overhead without compromising inference accuracy. Extensive experiments on emotion recognition tasks demonstrate that the proposed framework effectively balances privacy preservation, resource efficiency, and inference accuracy, facilitating seamless collaboration between cloud and edge devices.
KW - Cloud-edge collaborative inference
KW - knowledge distillation
KW - multimodal inference
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:001587233000006
UR - https://openalex.org/W4411550720
UR - https://www.scopus.com/pages/publications/105009301013
DO - 10.1109/TMC.2025.3580102
M3 - Journal Article
SN - 1536-1233
JO - IEEE Transactions on Mobile Computing
JF - IEEE Transactions on Mobile Computing
ER -