Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

Kaifeng Gao, Jiaxin Shi, Hanwang Zhang, Chunping Wang, Jun Xiao, Long Chen*

*Corresponding author for this work

Research output: Contribution to journal › Conference article published in journal › peer-review

Abstract

With the advance of diffusion models, today's video generation has achieved impressive quality. To extend the generation length and facilitate real-world applications, a majority of video diffusion models (VDMs) generate videos in an autoregressive manner, i.e., generating subsequent clips conditioned on the last frame(s) of the previous clip. However, existing autoregressive VDMs are highly inefficient and redundant: the model must re-compute all the conditional frames that are overlapped between adjacent clips. This issue is exacerbated when the conditional frames are extended autoregressively to provide the model with long-term context. In such cases, the computational demands increase significantly (i.e., with a quadratic complexity w.r.t. the autoregression step). In this paper, we propose Ca2-VDM, an efficient autoregressive VDM with Causal generation and Cache sharing. For causal generation, it introduces unidirectional feature computation, which ensures that the cache of conditional frames can be precomputed in previous autoregression steps and reused in every subsequent step, eliminating redundant computations. For cache sharing, it shares the cache across all denoising steps to avoid the huge cache storage cost. Extensive experiments demonstrate that our Ca2-VDM achieves state-of-the-art quantitative and qualitative video generation results and significantly improves the generation speed. Code is available at https://github.com/Dawn-LX/CausalCache-VDM
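The reason unidirectional (causal) feature computation makes the conditional-frame cache reusable can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a single attention head with identity projections over one feature vector per frame, and the names `attend`, `close`, `n_ctx`, and `n_new` are hypothetical. It checks whether the features of the context frames stay the same once newer frames are appended to the sequence.

```python
import math
import random


def attend(frames, causal):
    """Toy single-head self-attention over per-frame feature vectors.

    Assumption: identity query/key/value projections, one vector per frame.
    """
    dim = len(frames[0])
    outputs = []
    for i, query in enumerate(frames):
        # Causal: frame i sees frames 0..i only. Bidirectional: it sees every frame.
        visible = frames[: i + 1] if causal else frames
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, visible)) / total
            for d in range(dim)
        ])
    return outputs


def close(a, b, tol=1e-9):
    """Element-wise comparison of two equally sized lists of vectors."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


random.seed(0)
n_ctx, n_new, dim = 4, 3, 8  # hypothetical sizes: context frames, new frames, feature width
frames = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_ctx + n_new)]
context = frames[:n_ctx]

# Are context features computed earlier still valid after new frames are appended?
causal_reusable = close(attend(context, True), attend(frames, True)[:n_ctx])
bidirectional_reusable = close(attend(context, False), attend(frames, False)[:n_ctx])
print(causal_reusable, bidirectional_reusable)
```

In this toy setup the causal check holds and the bidirectional one does not: with bidirectional attention every context frame also attends to the newly appended frames, so its features change at each autoregression step and must be recomputed. With causal attention they depend only on earlier frames, so they can be computed once and read from a cache afterwards.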

Original language: English
Pages (from-to): 18550-18565
Number of pages: 16
Journal: Proceedings of Machine Learning Research
Volume: 267
Publication status: Published - 2025
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 13 Jul 2025 → 19 Jul 2025

Bibliographical note

Publisher Copyright:
© 2025 by the author(s).
