Contrast invariant tuning in human perception of image content

2019 ◽  
Author(s):  
Ingo Fruend ◽  
Jaykishan Patel ◽  
Elee D. Stalker

Abstract: Higher levels of visual processing are progressively more invariant to low-level visual factors such as contrast. Although this invariance trend has been well documented for simple stimuli like gratings and lines, it is difficult to characterize such invariances in images with naturalistic complexity. Here, we use a generative image model based on a hierarchy of learned visual features (a Generative Adversarial Network) to constrain image manipulations to remain within the vicinity of the manifold of natural images. This allows us to quantitatively characterize visual discrimination behaviour for naturalistically complex, non-linear image manipulations. We find that human tuning to such manipulations has a factorial structure. The first factor governs image contrast with discrimination thresholds following a power law with an exponent between 0.5 and 0.6, similar to contrast discrimination performance for simpler stimuli. A second factor governs image content with approximately constant discrimination thresholds throughout the range of images studied. These results support the idea that human perception factors out image contrast relatively early on, allowing later stages of processing to extract higher level image features in a stable and robust way.
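As a rough illustration of the power-law relation mentioned in the abstract, the sketch below computes a discrimination threshold as a function of base contrast. It assumes the threshold has the form scale * contrast ** exponent; the function name `contrast_threshold`, the `scale` constant, and the default exponent of 0.55 (a midpoint of the reported 0.5 to 0.6 range) are illustrative choices, not values or code from the paper.

```python
def contrast_threshold(base_contrast, exponent=0.55, scale=1.0):
    """Threshold under an assumed power law: scale * base_contrast ** exponent.

    `scale` and the default `exponent` are hypothetical illustration values.
    """
    if base_contrast < 0:
        raise ValueError("base_contrast must be non-negative")
    return scale * base_contrast ** exponent


# Under a power law, doubling the base contrast multiplies the threshold
# by 2 ** exponent, regardless of the scale constant.
ratio = contrast_threshold(0.2) / contrast_threshold(0.1)
print(ratio)
```

With an exponent below 1, thresholds grow more slowly than contrast itself, which is the qualitative behaviour the abstract compares to contrast discrimination for simpler stimuli.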

Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1349
Author(s):  
Stefan Lattner ◽  
Javier Nistal

Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep-learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. Therefore, the present study may yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals with 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments utilizing objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.


2020 ◽  
Vol 10 (1) ◽  
pp. 375 ◽  
Author(s):  
Zetao Jiang ◽  
Yongsong Huang ◽  
Lirui Hu

The super-resolution generative adversarial network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied by unpleasant artifacts. To further enhance the visual quality, we propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The method is based on a depthwise separable convolution super-resolution generative adversarial network (DSCSRGAN). A new depthwise separable convolution dense block (DSC Dense Block) was designed for the generator network, which improved the ability to represent and extract image features while greatly reducing the total number of parameters. For the discriminator network, the batch normalization (BN) layer was discarded, and the problem of artifacts was reduced. A frequency energy similarity loss function was designed to constrain the generator network to generate better super-resolution images. Experiments on several different datasets showed that the peak signal-to-noise ratio (PSNR) was improved by more than 3 dB, the structural similarity index (SSIM) was increased by 16%, and the total parameter count was reduced to 42.8% compared with the original model. Combining various objective indicators and subjective visual evaluation, the algorithm was shown to generate richer image details, clearer texture, and lower complexity.
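To make the parameter saving of depthwise separable convolution concrete, the sketch below compares weight counts for a single layer. It uses the standard textbook formulas and ignores bias terms; the function names and the 3 x 3, 64-channel example are assumptions for illustration, and the result is a per-layer figure that is not meant to reproduce the 42.8% whole-model number reported in the abstract.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 64 output channels.
standard = standard_conv_params(3, 64, 64)
separable = depthwise_separable_params(3, 64, 64)
print(standard, separable, separable / standard)
```

The ratio equals 1 / c_out + 1 / kernel ** 2, so the saving grows with the number of output channels and the kernel size.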


2018 ◽  
Author(s):  
Gongbo Liang ◽  
Sajjad Fouladvand ◽  
Jie Zhang ◽  
Michael A. Brooks ◽  
Nathan Jacobs ◽  
...  

Abstract: Computed tomography (CT) is a widely used diagnostic image modality routinely used for assessing anatomical tissue characteristics. However, non-standardized imaging protocols are commonplace, which poses a fundamental challenge in large-scale cross-center CT image analysis. One approach to address the problem is to standardize CT images using generative adversarial network (GAN) models. A GAN learns the data distribution of training images and generates synthesized images under the same distribution. However, existing GAN models are not directly applicable to this task, mainly due to the lack of constraints on the mode of data to generate. Furthermore, they treat every image equally, but in real applications, some images are more difficult to standardize than others. All these may lead to the lack-of-detail problem in CT image synthesis. We present a new GAN model called GANai to mitigate the differences in radiomic features across CT images captured using non-standard imaging protocols. Given source images, GANai composes new images by specifying a high-level goal that the image features of the synthesized images should be similar to those of the standard images. GANai introduces an alternative improvement training strategy to alternately and steadily improve model performance. The new training strategy enables a series of technical improvements, including phase-specific loss functions, phase-specific training data, and the adoption of ensemble learning, leading to better model performance. The experimental results show that GANai is significantly better than the existing state-of-the-art image synthesis algorithms on CT image standardization. Also, it significantly improves the efficiency and stability of GAN model training.


2019 ◽  
Author(s):  
Tijl Grootswagers ◽  
Amanda K. Robinson ◽  
Sophia M. Shatek ◽  
Thomas A. Carlson

Abstract: How are visual inputs transformed into conceptual representations by the human visual system? The contents of human perception, such as objects presented on a visual display, can reliably be decoded from voxel activation patterns in fMRI, and in evoked sensor activations in MEG and EEG. A prevailing question is the extent to which brain activation associated with object categories is due to statistical regularities of visual features within object categories. Here, we assessed the contribution of mid-level features to conceptual category decoding using EEG and a novel fast periodic decoding paradigm. Our study used a stimulus set consisting of intact objects from the animate (e.g., fish) and inanimate categories (e.g., chair) and scrambled versions of the same objects that were unrecognizable and preserved their visual features (Long, Yu, & Konkle, 2018). By presenting the images at different periodic rates, we biased processing to different levels of the visual hierarchy. We found that scrambled objects and their intact counterparts elicited similar patterns of activation, which could be used to decode the conceptual category (animate or inanimate), even for the unrecognizable scrambled objects. Animacy decoding for the scrambled objects, however, was only possible at the slowest periodic presentation rate. Animacy decoding for intact objects was faster, more robust, and could be achieved at faster presentation rates. Our results confirm that the mid-level visual features preserved in the scrambled objects contribute to animacy decoding, but also demonstrate that the dynamics vary markedly for intact versus scrambled objects. Our findings suggest a complex interplay between visual feature coding and categorical representations that is mediated by the visual system’s capacity to use image features to resolve a recognisable object.


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Yuan Hang

Despite the large number of patients affected by thyroid nodules, detection at an early stage remains a challenging task. Thyroid ultrasonography (US) is a noninvasive, inexpensive procedure widely used to detect and evaluate thyroid nodules. Ultrasonography-based image classification is a computer-aided diagnostic technology based on image features. In this paper, we present a method that combines deep features with conventional features to form a hybrid feature space. Several image enhancement techniques, such as histogram equalization, the Laplacian operator, the logarithm transform, and gamma correction, are applied to improve the quality and characteristics of the image before feature extraction. Among these methods, histogram equalization not only improves the brightness and contrast of the image but also achieves the highest classification accuracy, at 69.8%. We extract features such as histograms of oriented gradients, local binary patterns, SIFT, and SURF and combine them with deep features from a residual generative adversarial network. We compare ResNet18, a residual convolutional neural network with 18 layers, with Res-GAN, a residual generative adversarial network. The experimental results show that Res-GAN outperforms ResNet18. In addition, we fuse SURF with deep features, using a random forest model as the classifier, which achieves 95% accuracy.
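Two of the enhancement steps named in the abstract, gamma correction and histogram equalization, can be sketched on a flat list of 8-bit grey values. This is a minimal textbook version using only the Python standard library; the helper names `gamma_correct` and `equalize_histogram` and the sample pixel values are hypothetical, and the paper's actual preprocessing pipeline may differ.

```python
def gamma_correct(pixels, gamma, max_value=255):
    """Apply out = max_value * (in / max_value) ** gamma to each pixel."""
    return [round(max_value * (p / max_value) ** gamma) for p in pixels]


def equalize_histogram(pixels, levels=256):
    """Spread grey levels by mapping each pixel through the normalized CDF."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest level present
    n = len(pixels)
    if n == cdf_min:
        return list(pixels)  # constant image: nothing to stretch
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]


# Hypothetical low-contrast patch: values crowded between 10 and 30.
patch = [10, 10, 20, 30, 30]
equalized = equalize_histogram(patch)
print(equalized)
```

After equalization the darkest value present maps to 0 and the brightest to 255, which is the contrast stretch the abstract credits with the best accuracy among the enhancement methods tried.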


2021 ◽  
Author(s):  
Dong Sui ◽  
Maozu Guo ◽  
Xiaoxuan Ma ◽  
Julian Baptiste ◽  
Lei Zhang

Abstract Background: Precision medicine, a popular treatment strategy, has become increasingly important to the development of targeted therapy. To correlate medical imaging with prognostic and genomic data, research in radiomics and radiogenomics has provided many pre-defined image features to describe image information quantitatively or qualitatively. However, previous research offers only statistical results that demonstrate high correlation among multi-source medical data; these cannot give an intuitive, visual result. Results: In this paper, a deep learning based radio-genomics framework is provided to construct the linkage from lung tumor images to genomics data and to implement the generation process in the reverse direction, forming a bidirectional framework to map multi-source medical data. The imaging features are extracted by an auto-encoder conditioned on genomics data. It can obtain much more relevant features than traditional radio-genomics methods. Finally, we use a generative adversarial network to transform genomics data into tumor images, which gives a cogent result to explain the linkage between them. Conclusions: Our proposed framework provides a deep learning method for conducting radio-genomics research more functionally and intuitively.


2021 ◽  
Vol 40 ◽  
pp. 03013
Author(s):  
Kaustubh Gayadhankar ◽  
Rishi Patel ◽  
Hrithik Lodha ◽  
Swapnil Shinde

Today, plagiarism is a very important concern because content originality is a client's foremost requirement. Many people on the internet use others' images and gain publicity while the owner of the image or data gets nothing out of it. Many users copy data or image features from other users and modify them slightly or create an artificial replica. With sufficient computational power and volume of data, GAN models are capable of producing fake images that look very similar to real images. Such images are generally not detected by modern plagiarism systems. GAN stands for generative adversarial network. It has two neural networks working inside. The first is the generator, which generates a random image, and the second is the discriminator, which identifies whether the generated image is real or fake. In this paper, we propose a system that has been trained on both fake images (GAN-generated images) and real images and helps flag whether an image is plagiarised or real.


2021 ◽  
Vol 12 (5) ◽  
pp. 1-18
Author(s):  
Min Wang ◽  
Congyan Lang ◽  
Liqian Liang ◽  
Songhe Feng ◽  
Tao Wang ◽  
...  

Semantic image synthesis is an emerging and challenging vision problem accompanied by the recent promising advances in generative adversarial networks. The existing semantic image synthesis methods only consider the global information provided by the semantic segmentation mask, such as class label, global layout, and location, so the generative models cannot capture the rich local fine-grained information of the images (e.g., object structure, contour, and texture). To address this issue, we adopt a multi-scale feature fusion algorithm to refine the generated images by learning the fine-grained information of the local objects. We propose OA-GAN, a novel object-attention generative adversarial network that allows attention-driven, multi-fusion refinement for fine-grained semantic image synthesis. Specifically, the proposed model first generates multi-scale global image features and local object features; the local object features are then fused into the global image features to improve the correlation between the local and the global. In the process of feature fusion, the global image features and the local object features are fused through the channel-spatial-wise fusion block to learn ‘what’ and ‘where’ to attend in the channel and spatial axes, respectively. The fused features are used to construct correlation filters to obtain feature response maps to determine the locations, contours, and textures of the objects. Extensive quantitative and qualitative experiments on COCO-Stuff, ADE20K and Cityscapes datasets demonstrate that our OA-GAN significantly outperforms the state-of-the-art methods.


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 220
Author(s):  
Chunxue Wu ◽  
Haiyan Du ◽  
Qunhui Wu ◽  
Sheng Zhang

In the automatic sorting process of express delivery, a three-segment code is used to represent a specific area assigned by a specific delivery person. In the process of obtaining the courier order information, the camera is affected by factors such as light, noise, and subject shake, which cause the information on the courier order to be blurred and some information to be lost. Therefore, this paper proposes an image text deblurring method based on a generative adversarial network. The model of the algorithm consists of two generative adversarial networks, combined with Wasserstein distance, using a combination of adversarial loss and perceptual loss on unpaired datasets to train the network model to restore the captured blurred images into clear and natural images. Compared with the traditional method, the advantage of this method is that the loss function between the input and output images can be calculated indirectly through the positive and negative generative adversarial networks. The Wasserstein distance can achieve a more stable training process and a more realistic generation effect. The constraints of adversarial loss and perceptual loss make the model capable of training on unpaired datasets. The experimental results on the GOPRO test dataset and the self-built unpaired dataset showed that the two indicators, peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), increased by 13.3% and 3%, respectively. The human perception test results demonstrated that the algorithm proposed in this paper achieved a better deblurring effect than the traditional deblurring algorithm.
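For readers unfamiliar with the PSNR metric cited in the results, the sketch below shows the standard definition, 10 * log10(MAX^2 / MSE), applied to two equal-length lists of 8-bit pixel values. The function name `psnr` and the sample values are illustrative assumptions; this is not the evaluation code used in the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(restored):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no error to measure
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical sharp reference and a deblurred estimate of the same patch.
sharp = [52, 60, 200, 180]
deblurred = [50, 61, 198, 183]
print(psnr(sharp, deblurred))
```

Higher values indicate a restored image closer to the reference, so a reported percentage increase in PSNR corresponds to a lower mean squared error against the sharp ground truth.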

