scholarly journals Bidirectional Temporal-Recurrent Propagation Networks for Video Super-Resolution

Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 2085
Author(s):  
Lei Han ◽  
Cien Fan ◽  
Ye Yang ◽  
Lian Zou

Recently, convolutional neural networks have made a remarkable performance for video super-resolution. However, how to exploit the spatial and temporal information of video efficiently and effectively remains challenging. In this work, we design a bidirectional temporal-recurrent propagation unit. The bidirectional temporal-recurrent propagation unit makes it possible to flow temporal information in an RNN-like manner from frame to frame, which avoids complex motion estimation modeling and motion compensation. To better fuse the information of the two temporal-recurrent propagation units, we use channel attention mechanisms. Additionally, we recommend a progressive up-sampling method instead of one-step up-sampling. We find that progressive up-sampling gets better experimental results than one-stage up-sampling. Extensive experiments show that our algorithm outperforms several recent state-of-the-art video super-resolution (VSR) methods with a smaller model size.

2020 ◽  
Vol 34 (07) ◽  
pp. 12468-12475
Author(s):  
Jingwei Xin ◽  
Nannan Wang ◽  
Jie Li ◽  
Xinbo Gao ◽  
Zhifeng Li

Video super-resolution (VSR) methods have recently achieved a remarkable success due to the development of deep convolutional neural networks (CNN). Current state-of-the-art CNN methods usually treat the VSR problem as a large number of separate multi-frame super-resolution tasks, at which a batch of low resolution (LR) frames is utilized to generate a single high resolution (HR) frame, and running a slide window to select LR frames over the entire video would obtain a series of HR frames. However, duo to the complex temporal dependency between frames, with the number of LR input frames increase, the performance of the reconstructed HR frames become worse. The reason is in that these methods lack the ability to model complex temporal dependencies and hard to give an accurate motion estimation and compensation for VSR process. Which makes the performance degrade drastically when the motion in frames is complex. In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way. Our approach efficiently utilizes the information of the inter-frame motion, the dependence of the network on motion estimation and compensation method can be avoid. In addition, benefiting from the excellent nature of MAFC, the network can achieve better performance in the case of extremely complex motion scenarios. Extensive evaluations and comparisons validate the strengths of our approach, and the experimental results demonstrated that the proposed framework is outperform the state-of-the-art methods.


Mathematics ◽  
2021 ◽  
Vol 9 (22) ◽  
pp. 2873
Author(s):  
Anusha Khan ◽  
Allah Bux Sargano ◽  
Zulfiqar Habib

Video super-resolution (VSR) aims at generating high-resolution (HR) video frames with plausible and temporally consistent details using their low-resolution (LR) counterparts, and neighboring frames. The key challenge for VSR lies in the effective exploitation of intra-frame spatial relation and temporal dependency between consecutive frames. Many existing techniques utilize spatial and temporal information separately and compensate motion via alignment. These methods cannot fully exploit the spatio-temporal information that significantly affects the quality of resultant HR videos. In this work, a novel deformable spatio-temporal convolutional residual network (DSTnet) is proposed to overcome the issues of separate motion estimation and compensation methods for VSR. The proposed framework consists of 3D convolutional residual blocks decomposed into spatial and temporal (2+1) D streams. This decomposition can simultaneously utilize input video’s spatial and temporal features without a separate motion estimation and compensation module. Furthermore, the deformable convolution layers have been used in the proposed model that enhances its motion-awareness capability. Our contribution is twofold; firstly, the proposed approach can overcome the challenges in modeling complex motions by efficiently using spatio-temporal information. Secondly, the proposed model has fewer parameters to learn than state-of-the-art methods, making it a computationally lean and efficient framework for VSR. Experiments are conducted on a benchmark Vid4 dataset to evaluate the efficacy of the proposed approach. The results demonstrate that the proposed approach achieves superior quantitative and qualitative performance compared to the state-of-the-art methods.


Author(s):  
Jorge F. Lazo ◽  
Aldo Marzullo ◽  
Sara Moccia ◽  
Michele Catellani ◽  
Benoit Rosa ◽  
...  

Abstract Purpose Ureteroscopy is an efficient endoscopic minimally invasive technique for the diagnosis and treatment of upper tract urothelial carcinoma. During ureteroscopy, the automatic segmentation of the hollow lumen is of primary importance, since it indicates the path that the endoscope should follow. In order to obtain an accurate segmentation of the hollow lumen, this paper presents an automatic method based on convolutional neural networks (CNNs). Methods The proposed method is based on an ensemble of 4 parallel CNNs to simultaneously process single and multi-frame information. Of these, two architectures are taken as core-models, namely U-Net based in residual blocks ($$m_1$$ m 1 ) and Mask-RCNN ($$m_2$$ m 2 ), which are fed with single still-frames I(t). The other two models ($$M_1$$ M 1 , $$M_2$$ M 2 ) are modifications of the former ones consisting on the addition of a stage which makes use of 3D convolutions to process temporal information. $$M_1$$ M 1 , $$M_2$$ M 2 are fed with triplets of frames ($$I(t-1)$$ I ( t - 1 ) , I(t), $$I(t+1)$$ I ( t + 1 ) ) to produce the segmentation for I(t). Results The proposed method was evaluated using a custom dataset of 11 videos (2673 frames) which were collected and manually annotated from 6 patients. We obtain a Dice similarity coefficient of 0.80, outperforming previous state-of-the-art methods. Conclusion The obtained results show that spatial-temporal information can be effectively exploited by the ensemble model to improve hollow lumen segmentation in ureteroscopic images. The method is effective also in the presence of poor visibility, occasional bleeding, or specular reflections.


2020 ◽  
Vol 12 (15) ◽  
pp. 2366
Author(s):  
Nicolas Latte ◽  
Philippe Lejeune

Sentinel-2 (S2) imagery is used in many research areas and for diverse applications. Its spectral resolution and quality are high but its spatial resolutions, of at most 10 m, is not sufficient for fine scale analysis. A novel method was thus proposed to super-resolve S2 imagery to 2.5 m. For a given S2 tile, the 10 S2 bands (four at 10 m and six at 20 m) were fused with additional images acquired at higher spatial resolution by the PlanetScope (PS) constellation. The radiometric inconsistencies between PS microsatellites were normalized. Radiometric normalization and super-resolution were achieved simultaneously using state-of–the-art super-resolution residual convolutional neural networks adapted to the particularities of S2 and PS imageries (including masks of clouds and shadows). The method is described in detail, from image selection and downloading to neural network architecture, training, and prediction. The quality was thoroughly assessed visually (photointerpretation) and quantitatively, confirming that the proposed method is highly spatially and spectrally accurate. The method is also robust and can be applied to S2 images acquired worldwide at any date.


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5850
Author(s):  
Pablo Blanco-Medina ◽  
Eduardo Fidalgo ◽  
Enrique Alegre ◽  
Rocío Alaiz-Rodríguez ◽  
Francisco Jáñez-Martino ◽  
...  

Retrieving text embedded within images is a challenging task in real-world settings. Multiple problems such as low-resolution and the orientation of the text can hinder the extraction of information. These problems are common in environments such as Tor Darknet and Child Sexual Abuse images, where text extraction is crucial in the prevention of illegal activities. In this work, we evaluate eight text recognizers and, to increase the performance of text transcription, we combine these recognizers with rectification networks and super-resolution algorithms. We test our approach on four state-of-the-art and two custom datasets (TOICO-1K and Child Sexual Abuse (CSA)-text, based on text retrieved from Tor Darknet and Child Sexual Exploitation Material, respectively). We obtained a 0.3170 score of correctly recognized words in the TOICO-1K dataset when we combined Deep Convolutional Neural Networks (CNN) and rectification-based recognizers. For the CSA-text dataset, applying resolution enhancements achieved a final score of 0.6960. The highest performance increase was achieved on the ICDAR 2015 dataset, with an improvement of 4.83% when combining the MORAN recognizer and the Residual Dense resolution approach. We conclude that rectification outperforms super-resolution when applied separately, while their combination achieves the best average improvements in the chosen datasets.


2020 ◽  
Vol 12 (14) ◽  
pp. 2207 ◽  
Author(s):  
Francesco Salvetti ◽  
Vittorio Mazzia ◽  
Aleem Khaliq ◽  
Marcello Chiaberge

Convolutional Neural Networks (CNNs) consistently proved state-of-the-art results in image Super-resolution (SR), representing an exceptional opportunity for the remote sensing field to extract further information and knowledge from captured data. However, most of the works published in the literature focused on the Single-image Super-resolution problem so far. At present, satellite-based remote sensing platforms offer huge data availability with high temporal resolution and low spatial resolution. In this context, the presented research proposes a novel residual attention model (RAMS) that efficiently tackles the Multi-image Super-resolution task, simultaneously exploiting spatial and temporal correlations to combine multiple images. We introduce the mechanism of visual feature attention with 3D convolutions in order to obtain an aware data fusion and information extraction of the multiple low-resolution images, transcending limitations of the local region of convolutional operations. Moreover, having multiple inputs with the same scene, our representation learning network makes extensive use of nestled residual connections to let flow redundant low-frequency signals and focus the computation on more important high-frequency components. Extensive experimentation and evaluations against other available solutions, either for Single or Multi-image Super-resolution, demonstrated that the proposed deep learning-based solution can be considered state-of-the-art for Multi-image Super-resolution for remote sensing applications.


Author(s):  
Feng Li ◽  
Runmin Cong ◽  
Huihui Bai ◽  
Yifan He

Recently, Convolutional Neural Networks (CNN) based image super-resolution (SR) have shown significant success in the literature. However, these methods are implemented as single-path stream to enrich feature maps from the input for the final prediction, which fail to fully incorporate former low-level features into later high-level features. In this paper, to tackle this problem, we propose a deep interleaved network (DIN) to learn how information at different states should be combined for image SR where shallow information guides deep representative features prediction. Our DIN follows a multi-branch pattern allowing multiple interconnected branches to interleave and fuse at different states. Besides, the asymmetric co-attention (AsyCA) is proposed and attacked to the interleaved nodes to adaptively emphasize informative features from different states and improve the discriminative ability of networks. Extensive experiments demonstrate the superiority of our proposed DIN in comparison with the state-of-the-art SR methods.


2021 ◽  
Vol 40 (5) ◽  
pp. 1-18
Author(s):  
Hyeongseok Son ◽  
Junyong Lee ◽  
Jonghyeop Lee ◽  
Sunghyun Cho ◽  
Seungyong Lee

For the success of video deblurring, it is essential to utilize information from neighboring frames. Most state-of-the-art video deblurring methods adopt motion compensation between video frames to aggregate information from multiple frames that can help deblur a target frame. However, the motion compensation methods adopted by previous deblurring methods are not blur-invariant, and consequently, their accuracy is limited for blurry frames with different blur amounts. To alleviate this problem, we propose two novel approaches to deblur videos by effectively aggregating information from multiple video frames. First, we present blur-invariant motion estimation learning to improve motion estimation accuracy between blurry frames. Second, for motion compensation, instead of aligning frames by warping with estimated motions, we use a pixel volume that contains candidate sharp pixels to resolve motion estimation errors. We combine these two processes to propose an effective recurrent video deblurring network that fully exploits deblurred previous frames. Experiments show that our method achieves the state-of-the-art performance both quantitatively and qualitatively compared to recent methods that use deep learning.


Author(s):  
M. U. Müller ◽  
N. Ekhtiari ◽  
R. M. Almeida ◽  
C. Rieke

Abstract. Super-resolution aims at increasing image resolution by algorithmic means and has progressed over the recent years due to advances in the fields of computer vision and deep learning. Convolutional Neural Networks based on a variety of architectures have been applied to the problem, e.g. autoencoders and residual networks. While most research focuses on the processing of photographs consisting only of RGB color channels, little work can be found concentrating on multi-band, analytic satellite imagery. Satellite images often include a panchromatic band, which has higher spatial resolution but lower spectral resolution than the other bands. In the field of remote sensing, there is a long tradition of applying pan-sharpening to satellite images, i.e. bringing the multispectral bands to the higher spatial resolution by merging them with the panchromatic band. To our knowledge there are so far no approaches to super-resolution which take advantage of the panchromatic band. In this paper we propose a method to train state-of-the-art CNNs using pairs of lower-resolution multispectral and high-resolution pan-sharpened image tiles in order to create super-resolved analytic images. The derived quality metrics show that the method improves information content of the processed images. We compare the results created by four CNN architectures, with RedNet30 performing best.


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1398
Author(s):  
Taian Guo ◽  
Tao Dai ◽  
Ling Liu ◽  
Zexuan Zhu ◽  
Shu-Tao Xia

Convolutional Neural Networks (CNNs) have been widely used in video super-resolution (VSR). Most existing VSR methods focus on how to utilize the information of multiple frames, while neglecting the feature correlations of the intermediate features, thus limiting the feature expression of the models. To address this problem, we propose a novel SAA network, that is, Scale-and-Attention-Aware Networks, to apply different attention to different temporal-length streams, while further exploring both spatial and channel attention on separate streams with a newly proposed Criss-Cross Channel Attention Module (C3AM). Experiments on public VSR datasets demonstrate the superiority of our method over other state-of-the-art methods in terms of both quantitative and qualitative metrics.


Sign in / Sign up

Export Citation Format

Share Document