Abstract
Federated learning (FL) is typically performed in a synchronous parallel manner, so the involvement of a slow client delays the training progress. Current FL systems employ a participant selection strategy to select fast clients with quality data in each iteration. However, this is not always possible in practice, and the selection strategy has to navigate a knotty tradeoff between speed and data quality. This paper makes a case for asynchronous FL by presenting Pisces, a new FL system with intelligent participant selection and model aggregation for accelerated training despite slow clients. To avoid incurring excessive resource cost and stale training computation, Pisces uses a novel scoring mechanism to identify suitable clients to participate in each training iteration. It also adapts the aggregation pace dynamically to bound the progress gap between the participating clients and the server, with a provable convergence guarantee in a smooth non-convex setting. We have implemented Pisces in an open-source FL platform, Plato, and evaluated its performance in large-scale experiments with popular vision and language models. Pisces outperforms the state-of-the-art synchronous and asynchronous alternatives, reducing the time-to-accuracy by up to 2.0× and 1.9×, respectively.
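The abstract says Pisces scores clients to balance data quality against staleness when picking participants for each iteration. The exact formula is not given here, so the following is only a minimal sketch of the idea, with an assumed score of the form utility / (1 + α·staleness); the function names, weights, and example values are all illustrative, not the paper's actual mechanism.

```python
# Hypothetical sketch of staleness-aware participant selection.
# Each client is described by (statistical_utility, staleness), where
# staleness counts how many server iterations behind its last update is.

def client_score(statistical_utility, staleness, alpha=0.5):
    """Rank a client by data utility, discounted by staleness:
    a larger staleness yields a lower score (assumed form, not Pisces')."""
    return statistical_utility / (1.0 + alpha * staleness)

def select_clients(clients, k):
    """Pick the k highest-scoring clients for the next iteration.

    `clients` maps client id -> (utility, staleness).
    """
    ranked = sorted(clients, key=lambda c: client_score(*clients[c]), reverse=True)
    return ranked[:k]

clients = {
    "fast-fresh": (0.8, 0),   # good data, up to date
    "fast-stale": (0.9, 5),   # slightly better data, 5 iterations behind
    "slow-fresh": (0.3, 1),   # fresh but low-utility data
}
print(select_clients(clients, 2))
```

Under these assumed weights, the heavily stale client is discounted below the fresh one despite its higher raw utility, which is the tradeoff the abstract describes between data quality and stale computation.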
| Original language | English |
|---|---|
| Title of host publication | SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 370-385 |
| Number of pages | 16 |
| ISBN (Electronic) | 9781450394147 |
| DOIs | |
| Publication status | Published - 7 Nov 2022 |
| Event | 13th Annual ACM Symposium on Cloud Computing, SoCC 2022 - San Francisco, United States |
| Duration | 7 Nov 2022 → 11 Nov 2022 |
Publication series
| Name | SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing |
|---|---|
Conference
| Conference | 13th Annual ACM Symposium on Cloud Computing, SoCC 2022 |
|---|---|
| Country/Territory | United States |
| City | San Francisco |
| Period | 7/11/22 → 11/11/22 |
Bibliographical note
Publisher Copyright: © 2022 ACM.
Keywords
- asynchronous training
- efficiency
- federated learning