Video super-resolution (VSR) is the task of generating a high-resolution (HR) video from its low-resolution (LR) counterpart. Convolutional neural network (CNN) models have recently shown promise for VSR. However, previous models assume knowledge of the degradation operation that maps the HR video to the LR version of the test video, and they rely on supervised learning, i.e., training on LR/HR pairs synthesized by artificial degradation operations. When the degradation operation is unknown, their performance is limited. Moreover, previous approaches generate HR frames independently, leading to poor temporal consistency in the form of flickering artifacts. We propose VistGAN, an unsupervised VSR approach with temporal consistency based on a generative adversarial network (GAN) architecture that assumes no knowledge of the degradation operation. VistGAN adopts an encoder-decoder architecture. The encoder degrades the HR training video to an LR version in an unsupervised way using the GAN; a discriminator built on our designed metric learning ensures that the features of this LR version match those of the test video. To achieve temporal consistency in the HR domain, the decoder recovers the HR training sequence from the LR frames with a frame-recurrent scheme that uses high-resolution optical flow, the current frame, and previously generated super-resolved frames. After training, the test video is super-resolved using only the decoder. We conduct extensive experiments on benchmark datasets. Compared with state-of-the-art schemes, VistGAN achieves much better temporal consistency (reducing the warping error by about 12.6%) and PSNR (with gains of up to 1.02 dB).
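The frame-recurrent scheme described above can be sketched in a few lines: each super-resolved frame is produced from the upsampled current LR frame together with the previous SR estimate warped by optical flow. The sketch below is a minimal, hypothetical illustration in NumPy, not the thesis's actual decoder: the function names are invented, nearest-neighbour upsampling stands in for the learned decoder network, the flow is integer-valued for simplicity, and the two inputs are blended with a fixed average rather than fused by a CNN.

```python
import numpy as np

def warp(frame, flow):
    """Warp an HR frame by an integer-valued optical flow field.
    (Hypothetical simplification; practical VSR uses sub-pixel
    bilinear warping of the previous SR estimate.)"""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Pull each pixel from its flow-displaced source location.
    src_y = np.clip(ys - flow[..., 1].astype(int), 0, h - 1)
    src_x = np.clip(xs - flow[..., 0].astype(int), 0, w - 1)
    return frame[src_y, src_x]

def frame_recurrent_step(lr_frame, prev_sr, flow, scale=2):
    """One decoder step: upsample the current LR frame (nearest-neighbour
    stand-in for the learned decoder) and blend it with the flow-warped
    previous SR estimate, which is what enforces temporal consistency."""
    upsampled = np.kron(lr_frame, np.ones((scale, scale)))
    warped_prev = warp(prev_sr, flow)
    return 0.5 * upsampled + 0.5 * warped_prev
```

In an actual frame-recurrent model the blend would be replaced by a learned fusion network, but the recurrence structure (current LR frame plus warped previous output) is the mechanism the abstract credits for reducing flickering, and it is also what the warping-error metric measures.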
| Date of Award | 2019 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
Unsupervised video super-resolution with temporal consistency using GAN
WEN, S. (Author). 2019
Student thesis: Master's thesis