Abstract
Pre-trained vision-language models such as CLIP show great potential for learning representations that capture latent user characteristics. A recently proposed method, Context Optimization (CoOp), introduces the concept of prompt training for adapting pre-trained vision-language models. Given the lightweight nature of this method, researchers have migrated the paradigm from centralized to decentralized systems, innovating the collaborative training framework of Federated Learning (FL). However, current prompt training in FL mainly focuses on modeling user consensus and lacks adaptation to user characteristics, leaving the personalization of prompts largely under-explored. Research over the past few years has applied personalized FL (pFL) approaches to customize models for heterogeneous users. Unfortunately, we find that, owing to differences in modality and training behavior, directly applying pFL methods to prompt training leads to insufficient personalization and performance. To bridge the gap, we present pFedPrompt, which leverages the unique multimodality of vision-language models by learning user consensus in linguistic space and adapting to user characteristics in visual space in a non-parametric manner. Through this dual collaboration, the learned prompt is fully personalized and aligned to each user's local characteristics. We conduct extensive experiments across various datasets under the FL setting with statistical heterogeneity. The results demonstrate the superiority of pFedPrompt over alternative approaches, with robust performance.
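The abstract names two ingredients: consensus learned in linguistic space (shared, federated prompt parameters) and non-parametric personalization in visual space (adapting to each user's local features). The paper's exact procedure is not reproduced here; the following is a minimal numpy sketch of that shape, where `fedavg`, `local_prototypes`, `personalized_logits`, and the blending weight `alpha` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fedavg(prompts):
    # Server side: average clients' learned prompt vectors,
    # modeling user consensus in linguistic space (assumed FedAvg-style aggregation).
    return np.mean(np.stack(prompts), axis=0)

def local_prototypes(features, labels, num_classes):
    # Client side: non-parametric class prototypes (per-class mean of the
    # user's local visual features), capturing user characteristics without
    # extra trainable parameters.
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def personalized_logits(img_feat, text_feats, protos, alpha=0.5):
    # Blend similarity to prompt-conditioned text features (consensus) with
    # similarity to local visual prototypes (personalization).
    def cos(a, B):
        return B @ a / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-8)
    return alpha * cos(img_feat, text_feats) + (1 - alpha) * cos(img_feat, protos)
```

In this toy setup, the final prediction per class mixes a global signal (text-feature similarity, shaped by the federated prompt) with a local signal (prototype similarity), which is the dual collaboration the abstract describes.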
| Original language | English |
|---|---|
| Title of host publication | ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023 |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 1364-1374 |
| Number of pages | 11 |
| ISBN (Electronic) | 9781450394161 |
| DOIs | |
| Publication status | Published - 30 Apr 2023 |
| Externally published | Yes |
| Event | 32nd ACM World Wide Web Conference, WWW 2023 - Austin, United States, 30 Apr 2023 → 4 May 2023 |
Publication series
| Name | ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023 |
|---|
Conference
| Conference | 32nd ACM World Wide Web Conference, WWW 2023 |
|---|---|
| Country/Territory | United States |
| City | Austin |
| Period | 30/04/23 → 4/05/23 |
Bibliographical note
Publisher Copyright: © 2023 ACM.
Keywords
- federated learning
- prompt learning
- user modeling and personalization
- vision-language models
Fingerprint
Dive into the research topics of 'pFedPrompt: Learning Personalized Prompt for Vision-Language Models in Federated Learning'.