SEPT: Towards Scalable and Efficient Visual Pre-training

Yiqi Lin, Huabin Zheng, Huaping Zhong, Jinjing Zhu, Weijia Li*, Conghui He, Lin Wang*

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/ReportConference Paper published in a bookpeer-review

Abstract

Recently, the self-supervised pre-training paradigm has shown great potential in leveraging large-scale unlabeled data to improve downstream task performance. However, increasing the scale of unlabeled pre-training data in real-world scenarios requires prohibitive computational costs and faces the challenge of uncurated samples. To address these issues, we build a task-specific self-supervised pre-training framework from a data selection perspective based on a simple hypothesis that pre-training on the unlabeled samples with similar distribution to the target task can bring substantial performance gains. Buttressed by the hypothesis, we propose the first yet novel framework for Scalable and Efficient visual Pre-Training (SEPT) by introducing a retrieval pipeline for data selection. SEPT first leverage a self-supervised pre-trained model to extract the features of the entire unlabeled dataset for retrieval pipeline initialization. Then, for a specific target task, SEPT retrievals the most similar samples from the unlabeled dataset based on feature similarity for each target instance for pre-training. Finally, SEPT pre-trains the target model with the selected unlabeled samples in a self-supervised manner for target data finetuning. By decoupling the scale of pre-training and available upstream data for a target task, SEPT achieves high scalability of the upstream dataset and high efficiency of pre-training, resulting in high model architecture flexibility. Results on various downstream tasks demonstrate that SEPT can achieve competitive or even better performance compared with ImageNet pretraining while reducing the size of training samples by one magnitude without resorting to any extra annotations.

Original languageEnglish
Title of host publicationAAAI-23 Technical Tracks 2
EditorsBrian Williams, Yiling Chen, Jennifer Neville
PublisherAAAI Press
Pages1622-1630
Number of pages9
ISBN (Electronic)9781577358800
DOIs
Publication statusPublished - 27 Jun 2023
Event37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration: 7 Feb 202314 Feb 2023

Publication series

NameProceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Volume37

Conference

Conference37th AAAI Conference on Artificial Intelligence, AAAI 2023
Country/TerritoryUnited States
CityWashington
Period7/02/2314/02/23

Bibliographical note

Publisher Copyright:
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org).

Fingerprint

Dive into the research topics of 'SEPT: Towards Scalable and Efficient Visual Pre-training'. Together they form a unique fingerprint.

Cite this