Abstract
Deep learning applications are rapidly gaining traction in both industry and scientific computing. Unsurprisingly, there has been significant interest in adopting deep learning at very large scale on supercomputing infrastructures for a variety of scientific applications. A key issue in this context is how to find a model architecture that is suited to the problem at hand. We call this the neural architecture search (NAS) problem. Over time, many automated approaches have been proposed that can explore a large number of candidate models. However, this remains a time-consuming and resource-intensive process: the candidates are often trained from scratch for a small number of epochs in order to obtain a set of top-K best performers, which are then fully trained in a second phase. To address this problem, we propose a novel method that leverages checkpoints of previously discovered candidates to accelerate NAS. Based on the observation that the candidates feature high structural similarity, we propose that new candidates need not be trained starting from random weights, but rather from the weights of similar layers of previously evaluated candidates. This approach significantly accelerates the convergence of the candidate models and produces candidates that are statistically better with respect to the objective metrics. Furthermore, once the top-K models are identified, our approach provides a significant speed-up (1.4× to 1.5× on average) for the full training.
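
The core idea of starting a new candidate from the weights of similar layers in an earlier checkpoint can be illustrated with a minimal sketch. The abstract does not say how layer similarity is determined, so the sketch assumes, purely for illustration, that two layers are similar when they share a name and a weight shape. The function names (`warm_start`, `init_layer`, `shape_of`) and the dictionary-based checkpoint layout are hypothetical and are not taken from the paper.

```python
import random


def init_layer(shape, rng):
    """Randomly initialize a (rows, cols) weight matrix."""
    rows, cols = shape
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def shape_of(weights):
    """Return the (rows, cols) shape of a nested-list weight matrix."""
    return (len(weights), len(weights[0]))


def warm_start(candidate_shapes, checkpoint, seed=0):
    """Build initial weights for a new candidate model.

    candidate_shapes: dict mapping layer name -> (rows, cols)
    checkpoint: dict mapping layer name -> weights of an earlier candidate
    Assumption: a layer is reused only if name and shape both match;
    every other layer falls back to random initialization.
    """
    rng = random.Random(seed)
    weights, transferred = {}, []
    for name, shape in candidate_shapes.items():
        saved = checkpoint.get(name)
        if saved is not None and shape_of(saved) == shape:
            # Copy so that training the new candidate leaves the checkpoint intact.
            weights[name] = [row[:] for row in saved]
            transferred.append(name)
        else:
            weights[name] = init_layer(shape, rng)
    return weights, transferred


# Hypothetical checkpoint of a previously evaluated candidate.
checkpoint = {
    "dense_1": [[0.5, -0.2], [0.1, 0.3]],
    "dense_2": [[0.7], [0.9]],
}
# New candidate: dense_1 is unchanged, dense_2 was widened, dense_3 is new.
candidate = {"dense_1": (2, 2), "dense_2": (2, 3), "dense_3": (3, 1)}

weights, transferred = warm_start(candidate, checkpoint)
print(transferred)
```

In this example only `dense_1` is transferred; `dense_2` has a different shape and `dense_3` has no counterpart, so both start from random weights. A real NAS framework would apply the same pattern to its own checkpoint format and similarity criterion.
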
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2021 IEEE International Conference on Cluster Computing, Cluster 2021 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 82-93 |
| Number of pages | 12 |
| ISBN (Electronic) | 9781728196664 |
| Publication status | Published - 2021 |
| Externally published | Yes |
| Event | 2021 IEEE International Conference on Cluster Computing, Cluster 2021 - Virtual, Portland, United States. Duration: 7 Sept 2021 → 10 Sept 2021 |
Publication series
| Name | Proceedings - IEEE International Conference on Cluster Computing, ICCC |
|---|---|
| Volume | 2021-September |
| ISSN (Print) | 1552-5244 |
Conference
| Conference | 2021 IEEE International Conference on Cluster Computing, Cluster 2021 |
|---|---|
| Country/Territory | United States |
| City | Virtual, Portland |
| Period | 7/09/21 → 10/09/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Keywords
- Checkpointing
- Deep Learning
- Neural Architecture Search