Efficient Video Frame Interpolation Using Generative Adversarial Networks

2020 ◽ Vol 10 (18) ◽ pp. 6245
Author(s): Quang Nhat Tran ◽ Shih-Hsuan Yang

Frame interpolation, which generates an intermediate frame from adjacent ones, finds various applications such as frame rate up-conversion, video compression, and video streaming. Instead of using the complex network models and additional data involved in state-of-the-art frame interpolation methods, this paper proposes an approach based on an end-to-end generative adversarial network. A combined loss function is employed that jointly considers the adversarial loss (difference between data models), the reconstruction loss, and motion blur degradation. The objective image quality metrics reach a PSNR of 29.22 dB and an SSIM of 0.835 on the UCF101 dataset, similar to those of the state-of-the-art approach. Notably, this visual quality is achieved in approximately one-fifth of the computational time, which makes real-time frame rate up-conversion possible. The interpolated output can be further improved by a GAN-based refinement network that better maintains motion and color through image-to-image translation.
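
As a rough illustration of the combined objective described in this abstract, the following PyTorch sketch blends an adversarial term with an L1 reconstruction term for the interpolation generator. The function name, the weighting coefficients, and the omission of the motion-blur term are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def combined_generator_loss(discriminator, fake_mid, real_mid,
                            lambda_adv=0.01, lambda_rec=1.0):
    """Hypothetical combined generator loss: adversarial + reconstruction."""
    # Adversarial term: encourage the discriminator to label the
    # synthesized intermediate frame as real.
    logits = discriminator(fake_mid)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    # Reconstruction term: L1 distance to the ground-truth intermediate frame.
    rec = F.l1_loss(fake_mid, real_mid)
    return lambda_adv * adv + lambda_rec * rec
```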

2020 ◽ Vol 34 (07) ◽ pp. 10607-10614
Author(s): Xianhang Cheng ◽ Zhenzhong Chen

Learning to synthesize non-existent frames from the original consecutive video frames is a challenging task. Recent kernel-based interpolation methods predict each pixel with a single convolution process, removing the dependency on optical flow. However, when scene motion is larger than the pre-defined kernel size, these methods yield poor results even though they take thousands of neighboring pixels into account. To solve this problem, we propose deformable separable convolution (DSepConv) to adaptively estimate kernels, offsets, and masks, allowing the network to gather information from far fewer but more relevant pixels. In addition, we show that kernel-based methods and conventional flow-based methods are specific instances of the proposed DSepConv. Experimental results demonstrate that our method significantly outperforms other kernel-based interpolation methods and performs on par with, or even better than, state-of-the-art algorithms both qualitatively and quantitatively.
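
To make the deformable sampling idea concrete, here is a simplified PyTorch sketch of how per-pixel offsets and blending weights (the product of kernels and masks) could be combined to synthesize each output pixel from a small set of adaptively chosen neighbors. The tensor layout and helper name are assumptions; the paper's actual DSepConv operator uses separable kernels and differs in detail.

```python
import torch
import torch.nn.functional as F

def deformable_adaptive_blend(frame, offsets, weights, kernel_size=3):
    """Blend K*K deformably sampled pixels per output location (sketch).

    frame:   (B, C, H, W) source frame
    offsets: (B, 2*K*K, H, W) per-pixel sampling offsets, in pixels
    weights: (B, K*K, H, W) per-pixel blending weights (kernel * mask)
    """
    B, C, H, W = frame.shape
    K2 = kernel_size * kernel_size
    # Base sampling grid in absolute pixel coordinates (x, y).
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().to(frame.device)   # (H, W, 2)
    norm = torch.tensor([W - 1, H - 1], dtype=frame.dtype, device=frame.device)
    out = torch.zeros_like(frame)
    for k in range(K2):
        off = offsets[:, 2 * k:2 * k + 2].permute(0, 2, 3, 1)       # (B, H, W, 2)
        grid = base.unsqueeze(0) + off
        # Normalize pixel coordinates to [-1, 1] for grid_sample.
        grid = 2.0 * grid / norm - 1.0
        sampled = F.grid_sample(frame, grid, align_corners=True)
        out = out + weights[:, k:k + 1] * sampled
    return out
```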


Author(s): Xiang Kong ◽ Qizhe Xie ◽ Zihang Dai ◽ Eduard Hovy

Mixture of Softmaxes (MoS) has been shown to be effective at addressing the expressiveness limitation of Softmax-based models. Despite this known advantage, MoS is held back in practice by its large memory and computational cost, which stems from the need to compute multiple Softmaxes. In this work, we set out to unleash the power of MoS in practical applications by investigating improved word coding schemes, which can effectively reduce the vocabulary size and hence relieve the memory and computation burden. We show that both BPE and our proposed Hybrid-LightRNN lead to improved encoding mechanisms that can halve the time and memory consumption of MoS without performance loss. With MoS, we achieve an improvement of 1.5 BLEU on the IWSLT 2014 German-to-English corpus and an improvement of 0.76 CIDEr on image captioning. Moreover, on the larger WMT 2014 machine translation dataset, our MoS-boosted Transformer yields a 29.6 BLEU score for English-to-German and a 42.1 BLEU score for English-to-French, outperforming the single-Softmax Transformer by 0.9 and 0.4 BLEU respectively and achieving the state-of-the-art result on the WMT 2014 English-to-German task.
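
For readers unfamiliar with the MoS output layer, the following PyTorch sketch shows the basic computation: K context-dependent Softmax distributions over the vocabulary are mixed with context-dependent prior weights. The class name, layer sizes, and number of components are illustrative assumptions; the per-component vocabulary-sized Softmaxes are the memory and time bottleneck that the word-coding schemes above address.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfSoftmaxes(nn.Module):
    """Minimal MoS output layer: K mixture components over the vocabulary."""

    def __init__(self, hidden_size, vocab_size, n_components=4):
        super().__init__()
        self.prior = nn.Linear(hidden_size, n_components)            # mixture weights
        self.latent = nn.Linear(hidden_size, n_components * hidden_size)
        self.decoder = nn.Linear(hidden_size, vocab_size)            # shared projection
        self.n_components = n_components
        self.hidden_size = hidden_size

    def forward(self, h):                                            # h: (B, hidden_size)
        pi = F.softmax(self.prior(h), dim=-1)                        # (B, K)
        z = torch.tanh(self.latent(h)).view(-1, self.n_components, self.hidden_size)
        comp = F.softmax(self.decoder(z), dim=-1)                    # (B, K, V)
        return (pi.unsqueeze(-1) * comp).sum(dim=1)                  # (B, V)
```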


Symmetry ◽ 2019 ◽ Vol 11 (10) ◽ pp. 1251
Author(s): Ahn ◽ Jeong ◽ Kim ◽ Kwon ◽ Yoo

Recently, research on video frame interpolation with convolutional neural networks has shown remarkable results. However, these methods demand huge amounts of memory and run time for high-resolution videos and are unable to process a 4K frame in a single pass. In this paper, we propose a fast 4K video frame interpolation method based on a multi-scale optical flow reconstruction scheme. The proposed method predicts low-resolution bi-directional optical flow and reconstructs it at high resolution. We also propose consistency and multi-scale smoothness losses to enhance the quality of the predicted optical flow. Furthermore, we use an adversarial loss to make the interpolated frame more seamless and natural. We demonstrate that the proposed method outperforms existing state-of-the-art methods in quantitative evaluation, while running up to 4.39× faster than those methods on 4K videos.
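
A minimal sketch of the coarse-to-fine flow handling described above: bi-directional flow estimated at a low resolution is upsampled to the target (e.g. 4K) resolution, and the flow vectors are rescaled to match the new pixel grid. The function name and bilinear interpolation are assumptions for illustration; the paper's reconstruction scheme is learned, whereas this only shows the rescaling any multi-scale flow approach needs.

```python
import torch
import torch.nn.functional as F

def upscale_flow(low_res_flow, target_size):
    """Upsample a coarse optical flow field and rescale its vectors (sketch).

    low_res_flow: (B, 2, h, w) flow in pixel units at the coarse scale.
    target_size:  (H, W) of the full-resolution frame.
    """
    H, W = target_size
    h, w = low_res_flow.shape[-2:]
    flow = F.interpolate(low_res_flow, size=(H, W),
                         mode="bilinear", align_corners=False)
    # Flow vectors are measured in pixels, so they must grow with the image.
    scale = torch.tensor([W / w, H / h], dtype=flow.dtype,
                         device=flow.device).view(1, 2, 1, 1)
    return flow * scale
```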

