Abstract
Online recommender systems use deep learning recommendation models (DLRMs) to provide accurate, personalized recommendations to improve customer experience. However, efficiently provisioning DLRM services at scale is challenging. DLRMs exhibit distinct resource usage patterns: they require a large number of CPU cores and a tremendous amount of memory, but only a small number of GPUs. Running them in multi-GPU servers quickly exhausts the servers' CPU and memory resources, leaving a large number of unallocated GPUs stranded and unusable by other tasks. This paper describes Prism, a production DLRM serving system that eliminates GPU fragmentation by means of resource disaggregation. In Prism, a fleet of CPU nodes (CNs) interconnects with a cluster of heterogeneous GPU nodes (HNs) through RDMA, leading to two disaggregated resource pools that can scale independently. Prism automatically divides DLRMs into CPU- and GPU-intensive subgraphs and schedules them on CNs and HNs for disaggregated serving. Prism employs various techniques to minimize the latency overhead caused by disaggregation, including optimal graph partitioning, topology-aware resource management, and SLO-aware communication scheduling. Evaluations show that Prism effectively reduces CPU and GPU fragmentation by 53% and 27%, respectively, in a crowded GPU cluster. During seasonal promotion events, it efficiently enables capacity loaning from training clusters, saving over 90% of GPUs. Prism has been deployed in production clusters for over two years and now runs on over 10k GPUs.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, NSDI 2025 |
| Publisher | USENIX Association |
| Pages | 847-863 |
| Number of pages | 17 |
| ISBN (Electronic) | 9781939133465 |
| Publication status | Published - 2025 |
| Event | 22nd USENIX Symposium on Networked Systems Design and Implementation, NSDI 2025 - Philadelphia, United States; Duration: 28 Apr 2025 → 30 Apr 2025 |
Publication series
| Name | Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, NSDI 2025 |
|---|---|
Conference
| Conference | 22nd USENIX Symposium on Networked Systems Design and Implementation, NSDI 2025 |
|---|---|
| Country/Territory | United States |
| City | Philadelphia |
| Period | 28/04/25 → 30/04/25 |
Bibliographical note
Publisher Copyright: © 2025 by The USENIX Association. All Rights Reserved.