Image understanding systems based on the unifying representation of perceptual and conceptual information and the solution of mid-level and high-level vision problems

2001 ◽  
Author(s):  
Igor Kuvychko ◽  
Omkar Madhukar Deshmukh

Computer vision is a field of computer science that trains computers to interpret and understand the visual world. Using digital images from cameras and videos and deep learning models, machines can accurately identify and classify objects — and then react to what they "see." Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do. Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the form of decisions. Understanding in this context means the transformation of visual images (the input of the retina) into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.


Object detection in videos has been gaining more attention recently, as it underpins video analytics and facilitates image understanding. Video object detection methods can be divided into traditional and deep learning based methods. Trajectory classification, low-rank sparse matrix decomposition, background subtraction and object tracking are considered traditional object detection methods, as their primary focus is informative feature collection, region selection and classification. Deep learning methods are more popular nowadays because they provide high-level features and problem-solving capability in object detection algorithms. We discuss various object detection methods and their challenges in this paper.
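To make the traditional family concrete, below is a minimal sketch of frame-differencing background subtraction, one of the classical video object detection methods the abstract lists. Frames are modelled here as flat lists of grayscale pixel values, and the threshold and learning-rate values are illustrative assumptions; real systems operate on 2-D camera frames with tuned parameters.

```python
def background_subtract(background, frame, threshold=25):
    """Return a binary foreground mask: 1 where the frame differs
    from the background model by more than the threshold."""
    return [1 if abs(p - b) > threshold else 0
            for p, b in zip(frame, background)]

def update_background(background, frame, alpha=0.05):
    """Running-average update of the background model, so slow
    scene changes are absorbed while moving objects are not."""
    return [(1 - alpha) * b + alpha * p
            for p, b in zip(frame, background)]

# A static background with one bright moving "object" in the new frame.
background = [10] * 8
frame = [10, 10, 200, 210, 10, 10, 10, 10]

mask = background_subtract(background, frame)
print(mask)  # pixels 2 and 3 are flagged as foreground
```

Connected foreground pixels would then be grouped into candidate object regions for the classification stage described above.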


2018 ◽  
Author(s):  
Juan Linde-Domingo ◽  
Matthias S. Treder ◽  
Casper Kerren ◽  
Maria Wimber

Abstract: Remembering is a reconstructive process. Surprisingly little is known about how the reconstruction of a memory unfolds in time in the human brain. We used reaction times and EEG time-series decoding to test the hypothesis that the information flow is reversed when an event is reconstructed from memory, compared to when the same event is initially being perceived. Across three experiments, we found highly consistent evidence supporting such a reversed stream. When seeing an object, low-level perceptual features were discriminated faster behaviourally, and could be decoded from brain activity earlier, than high-level conceptual features. This pattern reversed during associative memory recall, with reaction times and brain activity patterns now indicating that conceptual information was reconstructed more rapidly than perceptual details. Our findings support a neurobiologically plausible model of human memory, suggesting that memory retrieval is a hierarchical, multi-layered process that prioritizes semantically meaningful information over perceptual detail.


Author(s):  
N. Bianchi ◽  
P. Bottoni ◽  
P. Mussio ◽  
C. Spinu ◽  
C. Garbay

The paper addresses the problem of controlling situated image understanding processes. Two complementary control styles are considered and applied cooperatively: a deliberative one and a reactive one. The role of deliberative control is to account for the unpredictability of situations by dynamically determining which strategies to pursue, based on the results obtained so far and, more generally, on the state of the understanding process. The role of reactive control is to account for the variability of local image properties by tuning operations to subimages, each of which is homogeneous with respect to a given operation. A variable organization of agents is studied to cope with this variability. The two control modes are integrated into a unified formalism describing segmentation and interpretation activities. Feedback from high-level interpretation tasks to low-level segmentation tasks thus becomes possible and is exploited to recover from wrong segmentations. Preliminary results in the field of liver biopsy image understanding demonstrate the potential of the approach.
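The feedback loop described above — high-level interpretation scoring a segmentation and deliberative control retrying the low-level step with a different strategy — can be sketched as follows. This is an illustrative toy, not the authors' system: the threshold strategies, the area-based interpretation score, and the acceptance criterion are all assumptions made for the example.

```python
def segment(image, threshold):
    """Low-level step: threshold the image into a binary mask."""
    return [1 if p > threshold else 0 for p in image]

def interpret(mask, expected_area):
    """High-level step: score how well the segmented area matches
    what the interpretation model expects (1.0 = perfect)."""
    area = sum(mask)
    return 1.0 - abs(area - expected_area) / max(expected_area, 1)

def understand(image, expected_area, thresholds=(200, 150, 100, 50)):
    """Deliberative control: try segmentation strategies until the
    interpretation score is acceptable, recovering from bad ones."""
    best = None
    for t in thresholds:
        mask = segment(image, t)
        score = interpret(mask, expected_area)
        if best is None or score > best[0]:
            best = (score, t, mask)
        if score >= 0.9:  # good enough: stop deliberating
            break
    return best

image = [30, 120, 130, 240, 250, 40, 125, 20]  # toy 8-pixel "image"
score, threshold, mask = understand(image, expected_area=5)
```

Here the first two thresholds under-segment (area 2 instead of 5), and the high-level score drives control to a lower threshold that recovers the expected region.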


2005 ◽  
Vol 14 (01n02) ◽  
pp. 233-260 ◽  
Author(s):  
ROXANNE CANOSA

Computational modeling of the human visual system is of current interest to developers of artificial vision systems, primarily because a biologically-inspired model can offer solutions to otherwise intractable image understanding problems. The purpose of this study is to present a biologically-inspired model of selective perception that augments a stimulus-driven approach with a high-level algorithm that takes into account particularly informative regions in the scene. The representation is compact and given in the form of a topographic map of relative perceptual conspicuity values. Other recent attempts at compact scene representation consider only low-level information that codes salient features such as color, edge, and luminance values. The previous attempts do not correlate well with subjects' fixation locations during viewing of complex images or natural scenes. This study uses high-level information in the form of figure/ground segmentation, potential object detection, and task-specific location bias. The results correlate well with the fixation densities of human viewers of natural scenes, and can be used as a preprocessing module for image understanding or intelligent surveillance applications.
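The map-combination idea in this abstract — fusing normalized low-level feature maps (colour, edge, luminance) with high-level maps (figure/ground, object likelihood, task bias) into one topographic conspicuity map — can be sketched as below. The weights and toy map values are illustrative assumptions, not the paper's actual parameters.

```python
def normalize(m):
    """Rescale a feature map to the [0, 1] range."""
    lo, hi = min(m), max(m)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in m]

def conspicuity_map(low_maps, high_maps, w_low=0.4, w_high=0.6):
    """Weighted per-location fusion of normalized low-level and
    high-level feature maps into a single conspicuity map."""
    lows = [normalize(m) for m in low_maps]
    highs = [normalize(m) for m in high_maps]
    fused = []
    for i in range(len(lows[0])):
        low = sum(m[i] for m in lows) / len(lows)
        high = sum(m[i] for m in highs) / len(highs)
        fused.append(w_low * low + w_high * high)
    return fused

colour    = [0, 2, 8, 1]   # toy 4-location feature maps
luminance = [1, 1, 9, 0]
task_bias = [0, 0, 5, 5]   # high-level, task-specific location bias

cmap = conspicuity_map([colour, luminance], [task_bias])
# location 2 is both salient and task-relevant, so it peaks
```

The peak of such a map can then serve as a predicted fixation location, which is how the study compares the model against human viewing behaviour.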


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5080 ◽  
Author(s):  
Baohua Qiang ◽  
Ruidong Chen ◽  
Mingliang Zhou ◽  
Yuanchao Pang ◽  
Yijie Zhai ◽  
...  

In recent years, an increasing amount of image data has come from various sensors, and object detection plays a vital role in image understanding. For object detection in complex scenes, more detailed information should be extracted from the image to improve the accuracy of the detection task. In this paper, we propose an object detection algorithm with joint semantic segmentation (SSOD) for images. First, we construct a feature extraction network that integrates an hourglass structure network with an attention mechanism layer to extract and fuse multi-scale features, generating high-level features with rich semantic information. Second, the semantic segmentation task is used as an auxiliary task so that the algorithm performs multi-task learning. Finally, multi-scale features are used to predict the location and category of the object. The experimental results show that our algorithm substantially enhances object detection performance, consistently outperforms the three comparison algorithms, and reaches real-time detection speed, making it usable for real-time detection.
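The auxiliary-task idea above amounts to optimizing a combined objective: the detection loss plus a weighted segmentation loss over shared features. A minimal sketch follows; the loss forms and the weight `lam` are assumptions for illustration, not the paper's exact formulation.

```python
import math

def cross_entropy(pred, label):
    """Binary cross-entropy for one predicted probability."""
    p = pred if label == 1 else 1.0 - pred
    return -math.log(max(p, 1e-12))

def multitask_loss(det_preds, det_labels, seg_preds, seg_labels, lam=0.5):
    """Detection loss plus a weighted auxiliary segmentation loss,
    so gradients from both tasks shape the shared features."""
    det = sum(cross_entropy(p, y)
              for p, y in zip(det_preds, det_labels)) / len(det_labels)
    seg = sum(cross_entropy(p, y)
              for p, y in zip(seg_preds, seg_labels)) / len(seg_labels)
    return det + lam * seg

# Confident detection, slightly weaker per-pixel segmentation.
loss = multitask_loss([0.9], [1], [0.8], [1])
```

During training, `lam` trades off how strongly the auxiliary segmentation signal regularizes the detector; at inference only the detection head is needed.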


2019 ◽  
Vol 11 (6) ◽  
pp. 612 ◽  
Author(s):  
Xiangrong Zhang ◽  
Xin Wang ◽  
Xu Tang ◽  
Huiyu Zhou ◽  
Chen Li

Image captioning generates a semantic description of an image. It combines image understanding and text mining, and has made great progress in recent years. However, it remains a great challenge to bridge the "semantic gap" between low-level features and high-level semantics in remote sensing images, in spite of improving image resolutions. In this paper, we present a new model with an attribute attention mechanism for generating descriptions of remote sensing images. In particular, we explore the impact of attributes extracted from remote sensing images on the attention mechanism. The results of our experiments demonstrate the validity of the proposed model. Compared against several state-of-the-art techniques, the proposed method obtains six higher scores and one slightly lower score on the Sydney Dataset and the Remote Sensing Image Caption Dataset (RSICD), and all seven higher scores on the UCM Dataset for remote sensing image captioning, indicating that the proposed framework achieves robust performance for semantic description of high-resolution remote sensing images.
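The core of an attribute attention mechanism is a softmax weighting over attribute embeddings, yielding a context vector that conditions the caption decoder. A hedged sketch of that step is below; the vectors, scores, and function names are illustrative assumptions, not the authors' architecture.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attribute_attention(attr_vectors, scores):
    """Weight attribute embeddings by softmax attention and sum them
    into a single context vector for the caption decoder."""
    weights = softmax(scores)
    dim = len(attr_vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, attr_vectors))
            for i in range(dim)]

# Two toy attribute embeddings (e.g. "airport", "runway") with equal
# attention scores: the context vector is their average.
context = attribute_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In a full captioning model the scores would be produced at each decoding step from the decoder state, so attention shifts across attributes as the sentence is generated.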


2014 ◽  
Vol 678 ◽  
pp. 147-150 ◽  
Author(s):  
Yu Liang Du ◽  
Ling Feng Yuan ◽  
Wei Bing Wan

Natural scene classification is a fundamental problem in image understanding. Humans can recognize a scene instantly after only a glance, mainly because our visual attention is easily attracted by the salient objects in a scene, and these objects are usually representative of the natural scene. It is unclear how humans achieve such rapid scene categorization, but this kind of high-level cognitive behaviour is reflected in eye movements. To investigate this ability, we propose a model guided by eye movements. It combines the bag-of-words (BOW) and spatial pyramid matching (SPM) methods, and is trained and tested with a support vector machine (SVM). Eye-movement experiments were employed to validate the model. We found that subjects could recognize scenes correctly even when shown only a few saliency patches for less than one second. These results suggest that eye-tracked saliency patches play an important role in human scene categorization.
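The bag-of-words stage in the pipeline above assigns each local descriptor to its nearest visual word and builds a histogram of word counts (SPM then repeats this over spatial subregions). A minimal sketch of that histogram step follows, with toy 2-D descriptors and codewords as assumptions; real systems use learned codebooks over SIFT-like descriptors.

```python
def nearest(codebook, desc):
    """Index of the codeword closest to a descriptor (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(codebook[k], desc)))

def bow_histogram(codebook, descriptors):
    """Bag-of-words: count how often each visual word is the nearest
    codeword across all local descriptors of an image."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1
    return hist

codebook = [[0, 0], [10, 10]]            # two toy visual words
descriptors = [[1, 1], [9, 9], [0, 2]]   # local patch descriptors
hist = bow_histogram(codebook, descriptors)
```

The resulting histograms (one per image, or one per SPM cell) are the feature vectors fed to the SVM classifier mentioned in the abstract.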

