Abstract
Recently, adapting Vision Language Models (VLMs) to zero-shot visual classification, either by tuning class embeddings with a few prompts (Test-time Prompt Tuning, TPT) or by replacing class names with generated visual samples (support-set), has shown promising results. However, TPT cannot avoid the semantic gap between modalities, while the support-set cannot be tuned. To this end, we combine the strengths of both approaches and propose a novel framework, namely TEst-time Support-set Tuning for zero-shot Video Classification (TEST-V). It first dilates the support-set with multiple prompts (Multi-prompting Support-set Dilation, MSD) and then erodes the support-set via learnable weights to mine key cues dynamically (Temporal-aware Support-set Erosion, TSE). Specifically, i) MSD expands the support samples for each class based on multiple prompts queried from LLMs to enrich the diversity of the support-set; ii) TSE tunes the support-set with factorized learnable weights according to temporal prediction consistency, in a self-supervised manner, to extract pivotal supporting cues for each class. TEST-V achieves state-of-the-art results across four benchmarks and shows good interpretability.
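The abstract gives no formulas, so the following is only a minimal sketch of what a weighted support-set and a temporal-consistency signal could look like. It is not the authors' implementation. Every name here (`class_prototype`, `predict`, `consistency_loss`, `scale`) is hypothetical. The sketch assumes that the factorized weights are one softmax over support videos times one softmax over frame positions, that query frames are mean-pooled, and that consistency is measured as a symmetric KL divergence between predictions from even and odd frames. The gradient step that would tune the weights is omitted.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def class_prototype(support, video_logits, frame_logits):
    """support[k][t] is the feature of frame t in support video k.
    Assumed factorization: weight(k, t) = softmax(video)[k] * softmax(frame)[t]."""
    a = softmax(video_logits)
    b = softmax(frame_logits)
    proto = [0.0] * len(support[0][0])
    for k, video in enumerate(support):
        for t, feat in enumerate(video):
            for d in range(len(proto)):
                proto[d] += a[k] * b[t] * feat[d]
    return normalize(proto)

def predict(frames, prototypes, scale=10.0):
    """Class probabilities from cosine similarity to each class prototype."""
    q = normalize(mean_vector(frames))
    return softmax([scale * dot(q, p) for p in prototypes])

def consistency_loss(frames, prototypes):
    """Assumed self-supervised signal: symmetric KL between the predictions
    made from even-indexed and odd-indexed frames of the same query video."""
    p = predict(frames[0::2], prototypes)
    q = predict(frames[1::2], prototypes)
    return sum(pi * math.log(pi / qi) + qi * math.log(qi / pi)
               for pi, qi in zip(p, q))

# Toy data: 2 classes, 2 support videos per class, 2 frames each, 2-D features.
support_a = [[[1.0, 0.0], [0.9, 0.1]], [[0.8, 0.2], [1.0, 0.0]]]
support_b = [[[0.0, 1.0], [0.1, 0.9]], [[0.2, 0.8], [0.0, 1.0]]]
zeros = [0.0, 0.0]  # untuned logits give uniform weights
prototypes = [class_prototype(support_a, zeros, zeros),
              class_prototype(support_b, zeros, zeros)]

query = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]
probs = predict(query, prototypes)
loss = consistency_loss(query, prototypes)
```

In a test-time tuning loop, the logits passed to `class_prototype` would be the quantities updated to reduce `loss`. Down-weighting a support video or frame position corresponds loosely to the "erosion" the abstract describes.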
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 34th International Joint Conference on Artificial Intelligence, IJCAI 2025 |
| Editors | James Kwok |
| Publisher | International Joint Conferences on Artificial Intelligence |
| Pages | 2143-2151 |
| Number of pages | 9 |
| ISBN (Electronic) | 9781956792065 |
| DOIs | |
| Publication status | Published - 2025 |
| Event | 34th International Joint Conference on Artificial Intelligence, IJCAI 2025 - Montreal, Canada. Duration: 16 Aug 2025 → 22 Aug 2025 |
Publication series
| Name | IJCAI International Joint Conference on Artificial Intelligence |
|---|---|
| ISSN (Print) | 1045-0823 |
Conference
| Conference | 34th International Joint Conference on Artificial Intelligence, IJCAI 2025 |
|---|---|
| Country/Territory | Canada |
| City | Montreal |
| Period | 16/08/25 → 22/08/25 |
Bibliographical note
Publisher Copyright: © 2025 International Joint Conferences on Artificial Intelligence. All rights reserved.
Title
TEST-V: TEst-time Support-set Tuning for Zero-shot Video Classification