The Importance of Loss Functions for Increasing the Generalization Abilities of a Deep Learning-Based Next Frame Prediction Model for Traffic Scenes

2020 ◽  
Vol 2 (2) ◽  
pp. 78-98 ◽  
Author(s):  
Sandra Aigner ◽  
Marco Körner

This paper analyzes in detail how different loss functions influence the generalization abilities of a deep learning-based next frame prediction model for traffic scenes. Our prediction model is a convolutional long short-term memory (ConvLSTM) network that generates the pixel values of the next frame after observing the raw pixel values of a sequence of four past frames. We trained the model with 21 combinations of seven loss terms on the Cityscapes Sequences dataset, using an identical hyper-parameter setting for every run. The loss terms range from pixel-error-based terms to adversarial terms. To assess the generalization abilities of the resulting models, we generated predictions up to 20 time-steps into the future for four datasets of increasing visual distance to the training dataset: KITTI Tracking, BDD100K, UA-DETRAC, and KIT AIS Vehicles. All predicted frames were evaluated quantitatively with both traditional pixel-based evaluation metrics, that is, mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), and recent, more advanced feature-based evaluation metrics, that is, Fréchet inception distance (FID) and learned perceptual image patch similarity (LPIPS). The results show that solely by choosing a different combination of losses, we can boost the prediction performance on new datasets by up to 55%, and by up to 50% for long-term predictions.
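The abstract does not name the seven loss terms. As an assumption, the sketch below uses three common pixel-level terms (L1, L2, and a gradient difference loss, GDL) to show how a weighted combination of loss terms is formed; adversarial terms, which need a separate discriminator network, are left out. The names `LOSS_TERMS` and `combined_loss` are hypothetical, and in real training these terms would be written in an autodiff framework rather than NumPy.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error over all pixels."""
    return float(np.mean(np.abs(pred - target)))


def l2_loss(pred, target):
    """Mean squared error over all pixels."""
    return float(np.mean((pred - target) ** 2))


def gradient_difference_loss(pred, target, alpha=1.0):
    """Penalize mismatched image gradients (i.e. blurred edges) between pred and target."""
    # Absolute finite differences along rows (axis 0) and columns (axis 1).
    gy_pred, gy_true = np.abs(np.diff(pred, axis=0)), np.abs(np.diff(target, axis=0))
    gx_pred, gx_true = np.abs(np.diff(pred, axis=1)), np.abs(np.diff(target, axis=1))
    return float(np.mean(np.abs(gy_true - gy_pred) ** alpha)
                 + np.mean(np.abs(gx_true - gx_pred) ** alpha))


# Hypothetical registry of the available loss terms.
LOSS_TERMS = {"l1": l1_loss, "l2": l2_loss, "gdl": gradient_difference_loss}


def combined_loss(pred, target, weights):
    """Weighted sum of the selected terms; `weights` maps a term name to its weight."""
    return sum(w * LOSS_TERMS[name](pred, target) for name, w in weights.items())


# Two possible combinations, evaluated on a toy 4x4 "frame".
target = np.arange(16, dtype=float).reshape(4, 4) / 15.0
pred = np.clip(target + 0.1, 0.0, 1.0)
print(combined_loss(pred, target, {"l2": 1.0}))
print(combined_loss(pred, target, {"l1": 1.0, "gdl": 1.0}))
```

Keeping every term behind a name-to-weight mapping makes it easy to sweep many combinations with otherwise identical settings, which is the kind of comparison the study describes.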

2020 ◽  
Vol 25 (2) ◽  
pp. 86-97
Author(s):  
Sandy Suryo Prayogo ◽  
Tubagus Maulana Kusuma

DVB is currently the most widely used digital television transmission standard. The most important aspect of a transmission process is the picture quality of the received video after it has passed through the transmission chain. Many factors can affect picture quality, one of which is the frame structure of the video. This paper tests the sensitivity of MPEG-4 video to its frame structure in DVB-T transmission. The tests were carried out by simulation in MATLAB and Simulink. FFmpeg was also used to prepare the format and settings of the videos to be simulated. The video parameters that were varied are the bitrate and the group-of-pictures (GOP) size, while the DVB-T transmission parameter that was varied is the signal-to-noise ratio (SNR) of the AWGN channel between the transmitter (Tx) and the receiver (Rx). The experiments yield the average picture quality of the video, measured with the structural similarity index (SSIM). The bit error rate (BER) of the DVB-T bitstream was also measured. The experiments show how sensitive the video bitrate and GOP are in DVB-T transmission, leading to the conclusion that the higher the bitrate, the worse the picture quality, and the smaller the GOP, the better the quality. It is hoped that this research can be extended using deep learning to determine the appropriate frame structure for particular conditions in digital television transmission.
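The SNR/BER relationship on an AWGN channel can be illustrated with a much simpler link than DVB-T. The sketch below is a simplified stand-in, not the DVB-T chain: real DVB-T uses COFDM with QPSK, 16-QAM, or 64-QAM and channel coding, whereas this example assumes uncoded BPSK (one bit per symbol, so the SNR equals Eb/N0). The function name `bpsk_awgn_ber` is hypothetical.

```python
import numpy as np


def bpsk_awgn_ber(snr_db, n_bits=200_000, seed=0):
    """Monte Carlo estimate of the bit error rate of uncoded BPSK over AWGN.

    Simplified illustration only: no OFDM, no higher-order QAM, no channel coding.
    """
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, n_bits)
    symbols = 2.0 * bits - 1.0                     # map 0 -> -1, 1 -> +1 (unit symbol energy)
    snr_linear = 10.0 ** (snr_db / 10.0)
    noise_std = np.sqrt(1.0 / (2.0 * snr_linear))  # N0/2 per real dimension, Es = 1
    received = symbols + noise_std * rng.standard_normal(n_bits)
    decided = (received > 0).astype(int)           # hard-decision detector
    return float(np.mean(decided != bits))


for snr in (0, 4, 8):
    print(snr, "dB ->", bpsk_awgn_ber(snr))
```

At 0 dB the estimate should land near the theoretical BPSK value Q(sqrt(2)), roughly 0.079, and it falls rapidly as the SNR grows, which is the same qualitative trend an SNR sweep on the AWGN channel explores.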


2019 ◽  
Vol 2019 ◽  
pp. 1-25
Author(s):  
Yanzhu Hu ◽  
Jiao Wang ◽  
Xinbo Ai ◽  
Xu Zhuang

To achieve multithreshold image segmentation, an improved segmentation algorithm based on graph cut theory and the artificial bee colony (ABC) algorithm is proposed. A new weight function based on the gray level and location of pixels is constructed to calculate the probability that pixels belong to the same region. On this basis, a new cost function is constructed that applies to both square and nonsquare images. The optimal thresholds of the image are then obtained by searching for the minimum of the cost function with the artificial bee colony algorithm. The method was evaluated separately on a public segmentation dataset and on widely used test images. Experimental results show that the proposed algorithm achieves larger Information Entropy (IE), higher Peak Signal to Noise Ratio (PSNR), higher Structural Similarity Index (SSIM), smaller Root Mean Squared Error (RMSE), and shorter running time than other image segmentation algorithms.
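The search step (minimizing a threshold cost function with an artificial bee colony) can be sketched as follows. The abstract does not give the graph-cut cost, so this sketch swaps in an Otsu-style within-class variance as the cost; only the ABC search loop (employed, onlooker, and scout bee phases) illustrates the technique named above. All function names and parameter defaults (`n_food`, `limit`, `n_iter`) are assumptions.

```python
import numpy as np


def within_class_variance(thresholds, hist):
    """Otsu-style cost: summed within-class variance of the gray levels when the
    histogram is split at `thresholds` (a stand-in for the graph-cut cost)."""
    levels = np.arange(hist.size)
    p = hist / hist.sum()
    edges = [0] + sorted(int(round(float(t))) for t in thresholds) + [hist.size]
    cost = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        w = p[lo:hi].sum()
        if w > 0:  # skip empty classes
            mu = (levels[lo:hi] * p[lo:hi]).sum() / w
            cost += ((levels[lo:hi] - mu) ** 2 * p[lo:hi]).sum()
    return float(cost)


def artificial_bee_colony(cost, dim, lower, upper, n_food=20, limit=20, n_iter=100, seed=0):
    """Minimal ABC minimizer with employed, onlooker and scout bee phases."""
    rng = np.random.default_rng(seed)
    foods = rng.uniform(lower, upper, (n_food, dim))
    costs = np.array([cost(f) for f in foods])
    trials = np.zeros(n_food, dtype=int)

    def try_neighbor(i):
        k = rng.choice([j for j in range(n_food) if j != i])  # random partner source
        d = rng.integers(dim)                                    # one random dimension
        cand = foods[i].copy()
        cand[d] += rng.uniform(-1.0, 1.0) * (foods[i, d] - foods[k, d])
        cand = np.clip(cand, lower, upper)
        c = cost(cand)
        if c < costs[i]:  # greedy selection keeps the better source
            foods[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    best_x, best_c = foods[np.argmin(costs)].copy(), costs.min()
    for _ in range(n_iter):
        for i in range(n_food):                                # employed bees
            try_neighbor(i)
        fitness = 1.0 / (1.0 + costs)                          # assumes non-negative costs
        for i in rng.choice(n_food, n_food, p=fitness / fitness.sum()):  # onlooker bees
            try_neighbor(i)
        if costs.min() < best_c:                               # memorize the best source
            best_x, best_c = foods[np.argmin(costs)].copy(), costs.min()
        stale = int(np.argmax(trials))                         # scout bee
        if trials[stale] > limit:
            foods[stale] = rng.uniform(lower, upper, dim)
            costs[stale], trials[stale] = cost(foods[stale]), 0
    if costs.min() < best_c:
        best_x, best_c = foods[np.argmin(costs)].copy(), costs.min()
    return best_x, float(best_c)
```

For an 8-bit image `img`, the histogram would be `np.bincount(img.ravel(), minlength=256)`, and `dim` sets how many thresholds are searched. Because the best source is memorized separately, a scout abandoning a stagnant source never loses the best solution found so far.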


Author(s):  
S. Bash ◽  
B. Johnson ◽  
W. Gibbs ◽  
T. Zhang ◽  
A. Shankaranarayanan ◽  
...  

Abstract Objective This prospective multicenter multireader study evaluated the performance of 40% scan-time reduced spinal magnetic resonance imaging (MRI) reconstructed with deep learning (DL). Methods A total of 61 patients underwent standard of care (SOC) and accelerated (FAST) spine MRI. DL was used to enhance the accelerated set (FAST-DL). Three neuroradiologists were presented with paired side-by-side datasets (666 series). Datasets were blinded and randomized in sequence and left-right display order. Image features were preference rated. Structural similarity index (SSIM) and per-pixel L1 were assessed for the image sets before and after DL enhancement as a quantitative assessment of the impact on image integrity. Results FAST-DL was qualitatively better than SOC for perceived signal-to-noise ratio (SNR) and artifacts and equivalent for other features. Quantitative SSIM was high, supporting the absence of image corruption by DL processing. Conclusion DL enables 40% spine MRI scan time reduction while maintaining diagnostic integrity and image quality with perceived benefits in SNR and artifact reduction, suggesting potential for clinical practice utility.
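The two integrity measures mentioned here, SSIM and per-pixel L1, can be sketched in NumPy. This is a simplified SSIM, assuming a uniform 7x7 window and population statistics; the original SSIM formulation uses an 11x11 Gaussian window, and the study's exact settings are not stated. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim(x, y, data_range=1.0, win=7, k1=0.01, k2=0.03):
    """Mean SSIM over all valid win x win windows (uniform weighting, no padding)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    wx = sliding_window_view(x, (win, win))
    wy = sliding_window_view(y, (win, win))
    mu_x, mu_y = wx.mean(axis=(-2, -1)), wy.mean(axis=(-2, -1))
    var_x, var_y = wx.var(axis=(-2, -1)), wy.var(axis=(-2, -1))
    cov_xy = (wx * wy).mean(axis=(-2, -1)) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return float(ssim_map.mean())


def mean_abs_error(x, y):
    """Per-pixel L1 distance averaged over the image."""
    return float(np.mean(np.abs(x.astype(np.float64) - y.astype(np.float64))))
```

An SSIM close to 1 together with a small L1 between the images before and after enhancement indicates that processing changed little of the local structure, which is how the study uses these numbers as an integrity check.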


MR imaging is widely used in diagnostic applications. The echo signal received from the MR scanner is used to generate the image, so data acquisition and reconstruction are the key operations. In this paper, k-space data are acquired by compressive sampling with a radial sampling pattern, and Particle Swarm Optimization (PSO) with Total Variation (TV) is used as the reconstruction algorithm for faithful reconstruction of the MR image. The experiments are conducted on MR images of the brain, a head angiogram, and the shoulder. The performance of the proposed reconstruction method is analyzed for different k-space sampling percentages. Comparing the reconstruction results with those obtained from the standard sampling pattern used for compressive sampling demonstrates the novelty of the proposed method. The results are verified in terms of Peak Signal to Noise Ratio (PSNR), Mean Squared Error (MSE), and Structural Similarity Index (SSIM).
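A radial k-space sampling pattern and its sampling percentage can be sketched as below. This only builds the mask and the naive zero-filled reconstruction that iterative methods start from; the PSO-with-TV reconstruction itself is not reproduced. The mask construction (straight spokes through the centre, rasterized by rounding) and the function names are assumptions.

```python
import numpy as np


def radial_mask(n, n_spokes):
    """Boolean n x n k-space mask with `n_spokes` straight lines through the centre."""
    mask = np.zeros((n, n), dtype=bool)
    c = n // 2                                          # DC position after fftshift
    radius = np.linspace(-n / 2.0, n / 2.0, 2 * n)      # sample points along each spoke
    for angle in np.linspace(0.0, np.pi, n_spokes, endpoint=False):
        rows = np.clip(np.round(c + radius * np.sin(angle)).astype(int), 0, n - 1)
        cols = np.clip(np.round(c + radius * np.cos(angle)).astype(int), 0, n - 1)
        mask[rows, cols] = True
    return mask


def zero_filled_recon(image, mask):
    """Keep only the masked k-space samples of `image` and invert the FFT."""
    kspace = np.fft.fftshift(np.fft.fft2(image))
    return np.abs(np.fft.ifft2(np.fft.ifftshift(kspace * mask)))


mask = radial_mask(64, 16)
print(f"sampling percentage: {100 * mask.mean():.1f}%")
```

The percentage printed here corresponds to the "k-space sampling percentage" varied in the experiments; adding spokes raises it, and with a full mask the zero-filled reconstruction returns the original image exactly.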


2020 ◽  
Vol 20 (02) ◽  
pp. 2050008
Author(s):  
S. P. Raja

This paper presents a complete analysis of wavelet-based image compression encoding techniques. The techniques considered are embedded zerotree wavelet (EZW), set partitioning in hierarchical trees (SPIHT), wavelet difference reduction (WDR), adaptively scanned wavelet difference reduction (ASWDR), set partitioned embedded block coder (SPECK), compression with reversible embedded wavelet (CREW), and spatial orientation tree wavelet (STW). Experiments are performed by varying the decomposition level, bits per pixel, and compression ratio. The evaluation uses peak signal to noise ratio (PSNR), mean square error (MSE), image quality index (IQI), structural similarity index (SSIM), average difference (AD), normalized cross-correlation (NK), structural content (SC), maximum difference (MD), Laplacian mean squared error (LMSE), and normalized absolute error (NAE).
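Several of the listed difference-based metrics have short closed forms. The sketch below uses the definitions common in the image-quality literature (x is the original, y the reconstruction); the exact normalizations used in the paper are not stated, so treat these as assumptions. LMSE, IQI, and SSIM are omitted, and `quality_metrics` is a hypothetical name.

```python
import numpy as np


def quality_metrics(original, reconstructed):
    """Common full-reference difference metrics between an image and its reconstruction."""
    x = original.astype(np.float64)
    y = reconstructed.astype(np.float64)
    diff = x - y
    return {
        "MSE": float(np.mean(diff ** 2)),                         # mean square error
        "AD": float(np.mean(diff)),                               # average difference
        "MD": float(np.max(np.abs(diff))),                        # maximum difference
        "NAE": float(np.sum(np.abs(diff)) / np.sum(np.abs(x))),   # normalized absolute error
        "NK": float(np.sum(x * y) / np.sum(x ** 2)),              # normalized cross-correlation
        "SC": float(np.sum(x ** 2) / np.sum(y ** 2)),             # structural content
    }


print(quality_metrics(np.array([[2, 4]]), np.array([[1, 2]])))
```

For a perfect reconstruction, MSE, AD, MD, and NAE are 0 while NK and SC are 1, so the ideal value differs between the two groups. NAE and SC divide by image energy and are undefined for an all-zero image.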


Author(s):  
Kaviya K ◽  
Mridula Bala ◽  
Swathy N P ◽  
Chittam Jeevana Jyothi ◽  
S.Ewins Pon Pushpa

Today, digital and social media platforms are extremely popular, creating a demand to transmit information securely. The information exchanged daily can fall victim to hackers. Steganography and cryptography are effective solutions to this problem. In this paper, video steganography and cryptography are combined: a secret text is hidden in a selected frame of the video using Shi-Tomasi corner point detection and the Least Significant Bit (LSB) algorithm. The Shi-Tomasi algorithm is used to detect the corner points of the frame. In the proposed work, a frame with a large number of corner points is chosen from the video. The secret text is then embedded at the detected corner points using the LSB algorithm and transmitted. At the receiver end, decryption is performed in the reverse order of encryption to retrieve the secret data. As a technical contribution, the average variations of the Mean Squared Error, Peak Signal to Noise Ratio, and Structural Similarity Index between the original and embedded frames are analysed and found to be 0.002, 0.016, and 0.0018, respectively.
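The LSB embedding step can be sketched as follows. Corner detection is outside this sketch: the coordinate list stands in for the Shi-Tomasi corner points (OpenCV's `cv2.goodFeaturesToTrack` is one common Shi-Tomasi implementation), and the sketch simply assumes sender and receiver share the same list of distinct coordinates. The encryption stage is not shown, and all function names are hypothetical.

```python
import numpy as np


def text_to_bits(text):
    """UTF-8 bytes of `text` as a flat array of 0/1 values."""
    return np.unpackbits(np.frombuffer(text.encode("utf-8"), dtype=np.uint8))


def embed_lsb(frame, coords, text):
    """Write the bits of `text` into the least significant bit of the pixels at `coords`.

    `frame` is a grayscale uint8 image; `coords` is a list of distinct (row, col) pairs.
    """
    bits = text_to_bits(text)
    if len(bits) > len(coords):
        raise ValueError("not enough corner points to hold the message")
    stego = frame.copy()
    for bit, (r, c) in zip(bits, coords):
        stego[r, c] = (stego[r, c] & 0xFE) | bit   # clear the LSB, then set it to the bit
    return stego


def extract_lsb(stego, coords, n_bytes):
    """Read back `n_bytes` bytes from the LSBs at the first 8 * n_bytes coordinates."""
    bits = np.array([stego[r, c] & 1 for r, c in coords[: 8 * n_bytes]], dtype=np.uint8)
    return np.packbits(bits).tobytes().decode("utf-8")
```

Each embedded bit changes a pixel by at most one gray level, which is why MSE, PSNR, and SSIM barely move between the original and embedded frames.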


2021 ◽  
Vol 11 (3) ◽  
pp. 1089
Author(s):  
Suhong Yoo ◽  
Jisang Lee ◽  
Junsu Bae ◽  
Hyoseon Jang ◽  
Hong-Gyoo Sohn

Aerial images are an outstanding option for observing terrain because of their high resolution (HR). However, the high operational cost of aerial imaging makes periodic observation of a region of interest difficult. Satellite imagery is an alternative, but its low resolution is an obstacle. In this study, we propose a context-based approach that uses aerial orthoimages acquired over the same period to turn 10 m resolution Sentinel-2 imagery into 2.5 m and 5.0 m prediction images. The proposed model was compared with an enhanced deep super-resolution network (EDSR), which performs well among existing super-resolution (SR) deep learning algorithms, using the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and root-mean-squared error (RMSE). Our context-based ResU-Net outperformed EDSR in all three metrics. Including the 60 m resolution Sentinel-2 imagery improved performance through fine-tuning: when the 60 m images were included, RMSE decreased and PSNR and SSIM increased. The results also confirmed that the denser the neural network, the higher the quality. Moreover, the accuracy was much higher when both denser feature dimensions and the 60 m images were used.
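RMSE and PSNR move in opposite directions because PSNR is a logarithmic function of RMSE: PSNR = 20 log10(data_range / RMSE). The sketch below shows both, plus a naive nearest-neighbour upsampling baseline for the factor-4 step from 10 m to 2.5 m. The `data_range` value depends on how the imagery is scaled and is an assumption; the function names are hypothetical.

```python
import numpy as np


def upsample_nearest(image, factor):
    """Naive baseline: repeat each pixel `factor` times along both axes."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)


def rmse(reference, estimate):
    """Root-mean-squared error between two equally sized images."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


def psnr(reference, estimate, data_range):
    """PSNR in dB = 20 * log10(data_range / RMSE); infinite for identical images."""
    e = rmse(reference, estimate)
    return float("inf") if e == 0 else float(20.0 * np.log10(data_range / e))


# A 10 m grid upsampled by 4 covers the same area as a 2.5 m grid.
low_res = np.array([[10, 20], [30, 40]], dtype=np.uint8)
print(upsample_nearest(low_res, 4).shape)   # (8, 8)
```

Any SR model, whether EDSR or a context-based network, should beat such a trivial baseline in both metrics; halving the RMSE raises the PSNR by about 6 dB.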


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6311
Author(s):  
Eunjae Ha ◽  
Joongchol Shin ◽  
Joonki Paik

In a hazy environment, visibility is reduced and objects are difficult to identify. For this reason, many dehazing techniques have been proposed to remove haze. In particular, methods based on estimating the atmospheric scattering model suffer from distortion when the model is estimated inaccurately. We present a novel residual-based dehazing network model to overcome this performance limitation of atmospheric scattering model-based methods. More specifically, the proposed model adopts a gate fusion network that generates the dehazed results using a residual operator. To further reduce the divergence between the clean and dehazed images, the proposed discriminator distinguishes dehazed results from clean images, and the statistical difference between them is then reduced via adversarial learning. To verify each element of the proposed model, we hierarchically performed the haze removal process in an ablation study. Experimental results show that the proposed method outperformed state-of-the-art approaches in terms of peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), the International Commission on Illumination (CIE) color difference ΔE 2000 (CIEDE2000), and mean squared error (MSE). It also produces subjectively high-quality images without color distortion or undesired artifacts for both synthetic and real-world hazy images.
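The atmospheric scattering model referred to here is commonly written I = J·t + A·(1 - t), where I is the hazy image, J the clean scene, A the atmospheric light, and t = exp(-β·d) the transmission for scene depth d. The sketch below synthesizes haze with this model and inverts it, showing why an inaccurate estimate of t distorts the result. The airlight value, β, and the lower bound `t_min` are assumptions, and this is the model-based approach the abstract contrasts with, not the proposed network.

```python
import numpy as np


def add_haze(clean, depth, airlight=0.9, beta=1.0):
    """Synthesize haze with I = J*t + A*(1 - t), where t = exp(-beta * depth)."""
    t = np.exp(-beta * depth)[..., None]        # per-pixel transmission, broadcast over RGB
    return clean * t + airlight * (1.0 - t), t


def remove_haze(hazy, t, airlight=0.9, t_min=0.1):
    """Invert the model: J = (I - A) / max(t, t_min) + A."""
    return (hazy - airlight) / np.maximum(t, t_min) + airlight


rng = np.random.default_rng(0)
clean = rng.random((8, 8, 3))                   # toy RGB image in [0, 1]
depth = rng.uniform(0.0, 2.0, (8, 8))
hazy, t = add_haze(clean, depth)
exact = remove_haze(hazy, t)                    # recovers `clean`
biased = remove_haze(hazy, 0.8 * t)             # 20% underestimated transmission
print(np.abs(exact - clean).max(), np.abs(biased - clean).mean())
```

With the true transmission the clean image is recovered exactly, but underestimating t by 20% scales every deviation from the airlight by 1/0.8, which is the kind of distortion that motivates avoiding explicit model estimation.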


2018 ◽  
Vol 19 (2) ◽  
pp. 68-79 ◽  
Author(s):  
Khan Bahadar Khan ◽  
Muhammad Shahid ◽  
Hayat Ullah ◽  
Eid Rehman ◽  
Muhammad Mohsin Khan

A 2-D Adaptive Trimmed Mean Autoregressive (ATMAR) model is proposed for denoising medical images corrupted with Poisson noise. Unfiltered images are divided into smaller chunks, and the ATMAR model is applied to each chunk separately. In this paper, two 5x5 windows overlapping by 40% are used to predict the center pixel value of the central row. The AR coefficients are updated by sliding both windows forward with a 60% shift. The same process is repeated across the entire image to predict a new denoised image. The Adaptive Trimmed Mean Filter (ATMF) then discards the lowest and highest pixel values in each neighborhood of the ATMAR-denoised image and averages the remaining neighborhood pixel values. Finally, a power-law transformation is applied to the resulting image for contrast stretching. Image quality is assessed in terms of correlation, Mean Squared Error (MSE), Structural Similarity Index Measure (SSIM), and Peak Signal to Noise Ratio (PSNR), and compared with the latest denoising techniques. The proposed technique provides an efficient way to reduce Poisson noise in scintigraphic images on a pixel-by-pixel basis.
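The trimmed-mean and power-law stages described above can be sketched independently of the autoregressive model. This is an alpha-trimmed mean filter with a fixed 5x5 window and a fixed number of trimmed values per side (`trim=2`, an assumption; the paper's filter is adaptive), followed by the gamma transform s = c·r^γ. The ATMAR prediction step itself is not reproduced, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def trimmed_mean_filter(image, size=5, trim=2):
    """Alpha-trimmed mean: in each size x size window, drop the `trim` lowest and
    `trim` highest values and average the rest (borders handled by reflection)."""
    pad = size // 2
    padded = np.pad(image.astype(np.float64), pad, mode="reflect")
    windows = sliding_window_view(padded, (size, size)).reshape(*image.shape, size * size)
    ordered = np.sort(windows, axis=-1)
    return ordered[..., trim: size * size - trim].mean(axis=-1)


def power_law(image, gamma, c=1.0):
    """Power-law (gamma) contrast transform s = c * r**gamma for intensities in [0, 1]."""
    return c * np.clip(image, 0.0, 1.0) ** gamma
```

Trimming the extremes before averaging makes the filter robust to isolated outliers: a single bright impulse is always among the discarded values, whereas a plain mean filter would smear it over its neighbourhood.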


2021 ◽  
Author(s):  
Mayank Kumar Singh ◽  
Indu Saini ◽  
Neetu Sood

Abstract Ultrasound in diagnostic imaging is well known for its safety and accessibility. However, its diagnostic efficiency is always limited by the presence of noise. In this study, a Log-Exponential shrinkage technique is therefore presented for denoising ultrasound images. A combinational filter was designed to remove additive noise without losing detail. After homomorphic transformation, the speckle noise follows a Gaussian distribution, and the conventional median estimator has very low accuracy for this distribution. The scale parameter calculated from the sub-band coefficients after homomorphic transformation was used to design the estimator. For shrinkage of the wavelet coefficients, a multi-scale thresholding function offering greater flexibility was designed. The proposed technique was tested on both medical and standard images. A significant improvement was observed in the estimation of the speckle noise variance. For quantitative comparison of the proposed technique with existing denoising methods, Mean Squared Error (MSE), Structural Similarity Index (SSIM), and Peak Signal to Noise Ratio (PSNR) were used. At the highest noise variance, the minimum improvements achieved by the proposed technique in PSNR, SSIM, and MSE were 10.65%, 23.21%, and 30.46%, respectively.
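The conventional pipeline that this work improves on can be sketched as: take the logarithm (turning multiplicative speckle into additive noise), apply a wavelet transform, estimate the noise level with the median (MAD) estimator, shrink the detail coefficients, and invert. The sketch uses a one-level Haar transform, the MAD estimator, and soft thresholding with the universal threshold; these are standard baseline choices assumed here, not the proposed Log-Exponential shrinkage function or scale-parameter estimator. Function names are hypothetical.

```python
import numpy as np


def haar2(x):
    """One-level orthonormal 2-D Haar transform (both image sides must be even)."""
    a = (x[0::2, :] + x[1::2, :]) / np.sqrt(2)
    d = (x[0::2, :] - x[1::2, :]) / np.sqrt(2)
    ll, lh = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2), (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    hl, hh = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2), (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2)
    return ll, (lh, hl, hh)


def ihaar2(ll, details):
    """Inverse of haar2."""
    lh, hl, hh = details
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = (ll + lh) / np.sqrt(2), (ll - lh) / np.sqrt(2)
    d[:, 0::2], d[:, 1::2] = (hl + hh) / np.sqrt(2), (hl - hh) / np.sqrt(2)
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :], x[1::2, :] = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)
    return x


def soft_threshold(c, thr):
    """Shrink coefficients toward zero by `thr`; those smaller than `thr` become 0."""
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)


def homomorphic_denoise(image, eps=1e-6):
    """Baseline speckle reduction: log -> Haar -> MAD + universal soft threshold -> exp."""
    log_img = np.log(image + eps)                       # multiplicative -> additive noise
    ll, (lh, hl, hh) = haar2(log_img)
    sigma = np.median(np.abs(hh)) / 0.6745              # conventional median (MAD) estimator
    thr = sigma * np.sqrt(2.0 * np.log(log_img.size))   # universal threshold
    details = tuple(soft_threshold(c, thr) for c in (lh, hl, hh))
    return np.exp(ihaar2(ll, details)) - eps
```

The accuracy of `sigma` directly sets how much detail is removed, which is why a better noise-variance estimator, as reported in the abstract, translates into better PSNR, SSIM, and MSE.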

