Automatic Generation of Aerial Orthoimages Using Sentinel-2 Satellite Imagery with a Context-Based Deep Learning Approach

2021 ◽  
Vol 11 (3) ◽  
pp. 1089
Author(s):  
Suhong Yoo ◽  
Jisang Lee ◽  
Junsu Bae ◽  
Hyoseon Jang ◽  
Hong-Gyoo Sohn

Aerial images are an outstanding option for observing terrain thanks to their high-resolution (HR) capability, but their high operational cost makes periodic observation of a region of interest difficult. Satellite imagery is an alternative, but its low resolution is an obstacle. In this study, we proposed a context-based approach that uses aerial orthoimages acquired over the same period to upgrade the 10 m resolution of Sentinel-2 imagery into 2.5 m and 5.0 m prediction images. The proposed model was compared with the enhanced deep super-resolution network (EDSR), which has excellent performance among existing super-resolution (SR) deep learning algorithms, using the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and root-mean-squared error (RMSE). Our context-based ResU-Net outperformed the EDSR in all three metrics. Including the 60 m resolution bands of Sentinel-2 imagery through fine-tuning improved performance further: with the 60 m images, RMSE decreased while PSNR and SSIM increased. The results also validated that the denser the neural network, the higher the quality, and accuracy was highest when both denser feature dimensions and the 60 m images were used.
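The three comparison metrics above can be sketched in NumPy. Note that the SSIM here is a simplified global version (no sliding window or Gaussian weighting, unlike the standard windowed formulation), shown for illustration only:

```python
import numpy as np

def rmse(ref, pred):
    """Root-mean-squared error between two images."""
    return float(np.sqrt(np.mean((ref - pred) ** 2)))

def psnr(ref, pred, max_val=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref - pred) ** 2)
    return float(10 * np.log10(max_val ** 2 / mse))

def ssim_global(ref, pred, max_val=255.0):
    """Simplified SSIM computed over the whole image (no sliding window)."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = ref.mean(), pred.mean()
    var_x, var_y = ref.var(), pred.var()
    cov = ((ref - mu_x) * (pred - mu_y)).mean()
    return float(((2 * mu_x * mu_y + c1) * (2 * cov + c2)) /
                 ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))

ref = np.full((4, 4), 100.0)
pred = ref + 10.0                 # constant offset of 10 gray levels
print(rmse(ref, pred))            # 10.0
print(round(psnr(ref, pred), 2))  # 28.13
```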

2021 ◽  
Vol 38 (5) ◽  
pp. 1361-1368
Author(s):  
Fatih M. Senalp ◽  
Murat Ceylan

Thermal camera systems can be used in any application that requires detecting heat change, but they are highly costly. In recent years, developments in deep learning have improved on traditional methods by producing higher-quality results. In this paper, thermal images of neonates (healthy and unhealthy) obtained from a high-resolution thermal camera were treated as high-resolution (ground truth) images. These thermal images were then downscaled at 1/2, 1/4, and 1/8 ratios, yielding three datasets of low-resolution images at different sizes. Super-resolution was then performed on each of the three datasets with a deep network model based on generative adversarial networks (GAN). Performance was evaluated with PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index measure). In addition, a healthy/unhealthy classification was carried out with a classifier network based on convolutional neural networks (CNN) to evaluate the super-resolution images obtained from the different datasets. The results show the value of combining medical thermal imaging with super-resolution methods.
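The 1/2, 1/4, and 1/8 downscaling used to build the low-resolution datasets can be sketched with simple block averaging (the paper does not specify its downsampling filter, so this is an assumption):

```python
import numpy as np

def downscale(img, factor):
    """Downscale a 2-D image by averaging over factor x factor blocks."""
    h, w = img.shape
    assert h % factor == 0 and w % factor == 0
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

hr = np.arange(64.0).reshape(8, 8)  # stand-in for a ground-truth thermal image
lr_half    = downscale(hr, 2)       # 1/2 ratio -> 4x4
lr_quarter = downscale(hr, 4)       # 1/4 ratio -> 2x2
lr_eighth  = downscale(hr, 8)       # 1/8 ratio -> 1x1
print(lr_half.shape, lr_quarter.shape, lr_eighth.shape)
```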


2020 ◽  
Vol 2 (2) ◽  
pp. 78-98 ◽  
Author(s):  
Sandra Aigner ◽  
Marco Körner

This paper analyzes in detail how different loss functions influence the generalization abilities of a deep learning-based next frame prediction model for traffic scenes. Our prediction model is a convolutional long short-term memory (ConvLSTM) network that generates the pixel values of the next frame after having observed the raw pixel values of a sequence of four past frames. We trained the model with 21 combinations of seven loss terms using the Cityscapes Sequences dataset and an identical hyper-parameter setting. The loss terms range from pixel-error based terms to adversarial terms. To assess the generalization abilities of the resulting models, we generated predictions up to 20 time-steps into the future for four datasets of increasing visual distance to the training dataset—KITTI Tracking, BDD100K, UA-DETRAC, and KIT AIS Vehicles. All predicted frames were evaluated quantitatively with both traditional pixel-based evaluation metrics, that is, mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), and recent, more advanced, feature-based evaluation metrics, that is, Fréchet inception distance (FID), and learned perceptual image patch similarity (LPIPS). The results show that solely by choosing a different combination of losses, we can boost the prediction performance on new datasets by up to 55%, and by up to 50% for long-term predictions.
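A loss combination of the kind swept in this study can be sketched as a weighted sum of pixel-error terms. The two terms and weights below (MSE plus a gradient-difference loss) are illustrative assumptions, not the paper's specific 21 combinations:

```python
import numpy as np

def mse_loss(y, t):
    """Mean squared pixel error between prediction y and target t."""
    return float(np.mean((y - t) ** 2))

def gdl_loss(y, t):
    """Gradient-difference loss: penalizes mismatched horizontal/vertical gradients."""
    gy_h, gt_h = np.diff(y, axis=1), np.diff(t, axis=1)
    gy_v, gt_v = np.diff(y, axis=0), np.diff(t, axis=0)
    return float(np.mean(np.abs(gy_h - gt_h)) + np.mean(np.abs(gy_v - gt_v)))

def combined_loss(y, t, w_mse=1.0, w_gdl=0.5):
    # hypothetical weighting; the paper evaluates 21 combinations of 7 terms
    return w_mse * mse_loss(y, t) + w_gdl * gdl_loss(y, t)

t = np.zeros((4, 4))
y = np.ones((4, 4))
print(combined_loss(y, t))  # 1.0 (MSE = 1; both images are flat, so GDL = 0)
```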


Electronics ◽  
2019 ◽  
Vol 8 (8) ◽  
pp. 892 ◽  
Author(s):  
Wazir Muhammad ◽  
Supavadee Aramvith

Single image super-resolution (SISR) aims to reconstruct a high-resolution (HR) image from a low-resolution (LR) image. To address the SISR problem, deep convolutional neural networks (CNNs) have recently achieved remarkable progress in terms of accuracy and efficiency. In this paper, an innovative technique, namely a multi-scale inception-based super-resolution (SR) using a deep learning approach, or MSISRD, was proposed for fast and accurate SISR reconstruction. The proposed network employs a deconvolution layer to upsample the LR image to the desired HR image, in contrast to existing approaches that use interpolation techniques to upscale the LR image. Interpolation techniques are not designed for this purpose, which introduces undesired noise into the model. Moreover, existing methods mainly focus on shallow networks or on stacking multiple layers to create a deeper architecture; such designs create the vanishing gradient problem during training and increase the computational cost of the model. Our proposed method does not use any hand-designed pre-processing steps, such as bicubic interpolation. Furthermore, an asymmetric convolution block is employed to reduce the number of parameters, in addition to an inception block adopted from GoogLeNet to reconstruct the multiscale information. Experimental results demonstrate that the proposed model outperforms twelve state-of-the-art methods in terms of average peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), with a reduced number of parameters, for scale factors of 2×, 4×, and 8×.
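The upsampling role of a deconvolution (transposed-convolution) layer can be sketched as zero-stuffing followed by a correlation with a kernel; in the network the kernel weights are learned, whereas the bilinear-style kernel below is a fixed illustrative assumption:

```python
import numpy as np

def transposed_conv2x(x, k):
    """Stride-2 transposed convolution: zero-stuff the input, then correlate
    with kernel k. A learned deconvolution layer does this with trainable k."""
    h, w = x.shape
    up = np.zeros((2 * h, 2 * w))
    up[::2, ::2] = x               # insert zeros between input pixels
    pad = np.pad(up, 1)
    out = np.zeros_like(up)
    kh, kw = k.shape
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * k)
    return out

# bilinear-style 3x3 kernel; the network would learn these weights instead
k = np.array([[0.25, 0.5, 0.25],
              [0.5,  1.0, 0.5 ],
              [0.25, 0.5, 0.25]])
lr = np.array([[1.0, 2.0],
               [3.0, 4.0]])
hr = transposed_conv2x(lr, k)
print(hr.shape)  # (4, 4): the 2x2 input is upsampled to 4x4
```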


2020 ◽  
Vol 25 (2) ◽  
pp. 86-97
Author(s):  
Sandy Suryo Prayogo ◽  
Tubagus Maulana Kusuma

DVB is currently the most widely used digital television transmission standard. The most important aspect of a transmission process is the picture quality of the video received after passing through that transmission. Many factors can affect image quality, one of which is the frame structure of the video. This paper tests the sensitivity of MPEG-4 video to its frame structure over DVB-T transmission. The tests were carried out with MATLAB and Simulink simulations; ffmpeg was used to prepare the video formats and settings to be simulated. The video variables varied were the bitrate and the group-of-pictures (GOP) size, while the transmission variable varied was the signal-to-noise ratio (SNR) of the AWGN channel between the transmitter (Tx) and receiver (Rx). The experiments yielded the average picture quality of the video, measured with the structural similarity index (SSIM); the bit error rate (BER) of the DVB-T bitstream was also measured. The experiments show how sensitive the video is to bitrate and GOP over DVB-T transmission, with the conclusion that the larger the bitrate, the worse the picture quality, and the smaller the GOP, the better the quality. Future work is expected to apply deep learning to find the appropriate frame structure for specific conditions in digital television transmission.
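The BER measurement on the DVB-T bitstream reduces to comparing transmitted and received bits; a minimal sketch with a hypothetical eight-bit stream:

```python
import numpy as np

def bit_error_rate(tx_bits, rx_bits):
    """Fraction of bits flipped between transmitted and received streams."""
    tx, rx = np.asarray(tx_bits), np.asarray(rx_bits)
    return float(np.mean(tx != rx))

tx = np.array([1, 0, 1, 1, 0, 0, 1, 0])
rx = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # two bits flipped by channel noise
print(bit_error_rate(tx, rx))  # 0.25
```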


Author(s):  
F. Pineda ◽  
V. Ayma ◽  
C. Beltran

Abstract. High-resolution satellite images have always been in high demand due to the greater detail and precision they offer, as well as the wide range of fields in which they can be applied. The number of operating satellites offering very high-resolution (VHR) images has increased substantially, but they remain a small proportion compared with existing lower-resolution (HR) satellites. Recent convolutional neural network (CNN) models are well suited to image-processing applications such as resolution enhancement, but to obtain an acceptable result it is important to define not only the CNN architecture but also the reference set of images used to train the model. Our work proposes an alternative for improving the spatial resolution of HR images obtained by the Sentinel-2 satellite, using VHR images from PeruSat-1, a Peruvian satellite, as the reference for a super-resolution approach based on a generative adversarial network (GAN) model. The VHR PeruSat-1 image dataset was used to train the network. The results were analyzed in terms of the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). Finally, some visual outcomes over a given testing dataset are presented so that the performance of the model can also be assessed qualitatively.


2021 ◽  
Author(s):  
Ryusei Ishii ◽  
Patrice Carbonneau ◽  
Hitoshi Miyamoto

Archival imagery dating back to the mid-twentieth century holds information that pre-dates urban expansion and the worst impacts of climate change. In this research, we examine deep learning colorization methods applied to historical aerial images in Japan. Specifically, we attempt to colorize monochrome images of river basins with Neural Style Transfer (NST). First, we created 1 m RGB orthomosaics for reaches of three Japanese rivers: the Kurobe, Ishikari, and Kinu. From the orthomosaics, we extracted 60 thousand image tiles of 100 × 100 pixels to train the CNN used in NST. The image tiles were classified into six classes: urban, river, forest, tree, grass, and paddy field. Second, we used the VGG16 model pre-trained on ImageNet in a transfer learning approach in which we froze a variable number of layers. We fine-tuned the training epochs, learning rate, and frozen layers of VGG16 to derive the optimal CNN for NST. The fine-tuning yielded F-measure accuracies of 0.961, 0.947, and 0.917 when freezing 7, 11, and 15 layers, respectively. Third, we colorized monochrome aerial images with the NST using the retrained model weights. We used RGB images of seven Japanese rivers and their corresponding grayscale versions to evaluate the NST colorization performance. The RMSE between the RGB and colorized images was best with a lower content layer (6), a shallower freeze layer (7), and a larger style/content weighting ratio (1.0 × 10⁵). The NST hyperparameter analysis indicated that colorized images became rougher when the content layer was selected deeper in the VGG model, because deeper layers extract more abstract features from the original image. The Kurobe and Ishikari rivers also showed higher colorization accuracy, likely because the fine-tuning dataset was extracted from images of these rivers. Finally, we colorized historical monochrome images of the Kurobe river with the best NST parameters, achieving quality high enough to compare with the RGB images. The results indicate that fine-tuning the NST model can achieve performance high enough to proceed to land cover classification in future work.


2020 ◽  
Vol 12 (18) ◽  
pp. 3015 ◽  
Author(s):  
Mélissande Machefer ◽  
François Lemarchand ◽  
Virginie Bonnefond ◽  
Alasdair Hitchins ◽  
Panagiotis Sidiropoulos

This work introduces a method that combines remote sensing and deep learning into a framework tailored for accurate, reliable and efficient counting and sizing of plants in aerial images. The investigated task focuses on two low-density crops, potato and lettuce. This double objective of counting and sizing is achieved through the detection and segmentation of individual plants by fine-tuning an existing deep learning architecture called Mask R-CNN. This paper includes a thorough discussion of the optimal parametrisation to adapt the Mask R-CNN architecture to this novel task. As we examine the correlation of Mask R-CNN performance with the annotation volume and granularity (coarse or refined) of remotely sensed images of plants, we conclude that transfer learning can be effectively used to reduce the required amount of labelled data: a Mask R-CNN previously trained on one low-density crop improves performance when retrained on a new crop. Once trained for a given crop, the Mask R-CNN solution is shown to outperform a manually-tuned computer vision algorithm. Model performances are assessed using intuitive metrics such as Mean Average Precision (mAP) from Intersection over Union (IoU) of the masks for individual plant segmentation and Multiple Object Tracking Accuracy (MOTA) for detection. The presented model reaches an mAP of 0.418 for potato plants and 0.660 for lettuces for the individual plant segmentation task. In detection, we obtain a MOTA of 0.781 for potato plants and 0.918 for lettuces.
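The MOTA detection metric combines misses, false positives, and identity switches over the total number of ground-truth objects. A minimal sketch, with hypothetical error counts chosen to reproduce the reported potato MOTA of 0.781:

```python
def mota(fn, fp, id_switches, gt):
    """Multiple Object Tracking Accuracy:
    1 - (misses + false positives + identity switches) / ground-truth objects."""
    return 1.0 - (fn + fp + id_switches) / gt

# hypothetical counts for illustration only (not the paper's actual tallies)
print(round(mota(fn=120, fp=99, id_switches=0, gt=1000), 3))  # 0.781
```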


2020 ◽  
Vol 10 (1) ◽  
pp. 375 ◽  
Author(s):  
Zetao Jiang ◽  
Yongsong Huang ◽  
Lirui Hu

The super-resolution generative adversarial network (SRGAN) is a seminal work capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied by unpleasant artifacts. To further enhance the visual quality, we propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low- and high-resolution images. The method is based on a depthwise separable convolution super-resolution generative adversarial network (DSCSRGAN). A new depthwise separable convolution dense block (DSC Dense Block) was designed for the generator network, which improved the ability to represent and extract image features while greatly reducing the total number of parameters. For the discriminator network, the batch normalization (BN) layer was discarded, which reduced the problem of artifacts. A frequency energy similarity loss function was designed to constrain the generator network to generate better super-resolution images. Experiments on several different datasets showed that the peak signal-to-noise ratio (PSNR) was improved by more than 3 dB, the structural similarity index (SSIM) was increased by 16%, and the total number of parameters was reduced to 42.8% of the original model. Combining various objective indicators and subjective visual evaluation, the algorithm was shown to generate richer image details, clearer texture, and lower complexity.
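The parameter saving from depthwise separable convolution follows from splitting a standard k × k convolution into a per-channel depthwise pass plus a 1 × 1 pointwise pass. A quick arithmetic sketch (bias terms ignored; the 64-channel layer is an illustrative assumption):

```python
def conv_params(k, c_in, c_out):
    """Parameters in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dsc_params(k, c_in, c_out):
    """Depthwise separable: k x k depthwise per input channel + 1x1 pointwise."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 64, 64)  # 36864
dsc = dsc_params(3, 64, 64)   # 576 + 4096 = 4672
print(std, dsc, round(dsc / std, 3))  # roughly an 8x reduction for this layer
```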


2019 ◽  
Vol 2019 ◽  
pp. 1-25
Author(s):  
Yanzhu Hu ◽  
Jiao Wang ◽  
Xinbo Ai ◽  
Xu Zhuang

To realize multithreshold segmentation of images, an improved segmentation algorithm based on graph cut theory using an artificial bee colony is proposed. A new weight function based on the gray level and location of pixels is constructed to calculate the probability that each pixel belongs to the same region. On this basis, a new cost function is constructed that works for both square and nonsquare images. The optimal threshold of the image is then obtained by searching for the minimum of the cost function with the artificial bee colony algorithm. A public segmentation dataset and several widely used test images were evaluated separately. Experimental results show that the proposed algorithm achieves larger Information Entropy (IE), higher Peak Signal to Noise Ratio (PSNR), higher Structural Similarity Index (SSIM), smaller Root Mean Squared Error (RMSE), and shorter runtime than other image segmentation algorithms.
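The overall scheme (minimize a cost function over candidate thresholds) can be sketched with two stand-ins: an Otsu-style within-class variance replaces the paper's graph-cut cost, and exhaustive search replaces the artificial bee colony optimizer; both substitutions are assumptions for illustration:

```python
import numpy as np

def within_class_variance(img, t):
    """Cost of a single threshold t (Otsu-style stand-in for the paper's cost)."""
    lo, hi = img[img <= t], img[img > t]
    cost = 0.0
    for part in (lo, hi):
        if part.size:
            cost += part.size / img.size * part.var()
    return cost

def best_threshold(img):
    # exhaustive search stands in for the artificial bee colony optimizer,
    # which would sample candidate thresholds via employed/onlooker/scout bees
    candidates = range(int(img.min()), int(img.max()))
    return min(candidates, key=lambda t: within_class_variance(img, t))

# toy "image" with two clear intensity clusters
img = np.array([10, 12, 11, 200, 198, 202, 11, 199], dtype=float)
print(best_threshold(img))  # 12: first threshold separating the two clusters
```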

