
High-throughput Generative Inference of Large Language Models with a Single GPU

Clark Barrett, Beidi Chen, Daniel Y. Fu, Joseph E. Gonzalez, Zhuohan Li, Percy Liang, Christopher Ré, Max Ryabinin, Ying Sheng, Ion Stoica, Zhiqiang Xie, Binhang Yuan, Ce Zhang, Lianmin Zheng

Research output: Contribution to conference › Conference Paper

Original language: English
Publication status: Published - 2023
Event: 40th International Conference on Machine Learning (ICML 2023)
Duration: 1 Jan 2023 to 1 Jan 2023

Conference

Conference: 40th International Conference on Machine Learning (ICML 2023)
Period: 1/01/23 to 1/01/23
