Federated transfer learning under heterogeneous data

  • Xueyang WU

Student thesis: Doctoral thesis

Abstract

Recent advances in artificial intelligence (AI) applications rely on massive amounts of training data. In practice, these valuable data are distributed among multiple independent data owners (e.g., companies and individuals), each of whom typically holds only a modest quantity, and the data are usually heterogeneous. Collecting data from individual users or acquiring data from data owners is the conventional and straightforward solution to this issue. However, such solutions are no longer viable because of rising data privacy and data security concerns. AI systems therefore face the problem of utilizing fragmented and diverse data that are distributed across several data owners. Federated learning (FL), a novel privacy-preserving collaborative machine learning paradigm, was proposed to address the problem of learning from small, privately isolated datasets. Its main idea is to form a federation of data owners in which all participants virtually pool their data without sacrificing data security and privacy. Federated learning faces several challenges, including communication efficiency, data security and privacy protection, and statistical learning. Among these, the statistical learning challenge caused by heterogeneous data significantly degrades the performance of FL systems and thus hinders FL’s application in practice. In recent years, researchers have developed a machine learning paradigm known as transfer learning, which utilizes heterogeneous data to solve the statistical learning problem in a target domain with limited or no data. This naturally motivates us to incorporate the spirit of transfer learning into federated learning to overcome the difficulty of statistical learning in practical FL. In this thesis, we focus on federated transfer learning, a class of federated learning methods that employ the transfer learning methodology to tackle the statistical learning difficulty posed by heterogeneous data.
Compared to other federated learning approaches, which presume that the datasets held by data owners are independently and identically distributed (i.i.d.), federated transfer learning focuses on addressing data heterogeneity across data owners in practice and achieves superior performance. The thesis consists of two parts. First, we provide a brief overview of federated learning, including its concept, evolution, and categorization, and cover its statistical learning challenges in depth. We offer a precise categorization of the algorithms that address these challenges in federated learning, which we refer to as federated transfer learning. We then examine current representative works and incorporate them into our proposed federated transfer learning architecture. Second, we identify three typical scenarios of data heterogeneity in federated learning with practical applications and investigate how our proposed federated transfer learning methods overcome the challenge in these scenarios. We believe that these federated transfer learning methods hold great promise for wider applications of federated learning.
Date of Award: 2022
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
Supervisor: Qiang YANG (Supervisor) & Lei CHEN (Supervisor)
