Bimodal fusion of low-level visual features and high-level semantic features for near-duplicate video clip detection

2011 ◽  
Vol 26 (10) ◽  
pp. 612-627 ◽  
Author(s):  
Hyun-seok Min ◽  
Jae Young Choi ◽  
Wesley De Neve ◽  
Yong Man Ro
2019 ◽  
Author(s):  
Michael B. Bone ◽  
Fahad Ahmad ◽  
Bradley R. Buchsbaum

Abstract: When recalling an experience of the past, many of the component features of the original episode may be, to a greater or lesser extent, reconstructed in the mind’s eye. There is strong evidence that the pattern of neural activity that occurred during an initial perceptual experience is recreated during episodic recall (neural reactivation), and that the degree of reactivation is correlated with the subjective vividness of the memory. However, while we know that reactivation occurs during episodic recall, we have lacked a way of precisely characterizing the contents of a reactivated memory in terms of its featural constituents. Here we present a novel approach, feature-specific informational connectivity (FSIC), that leverages hierarchical representations of image stimuli derived from a deep convolutional neural network to decode neural reactivation in fMRI data collected while participants performed an episodic recall task. We show that neural reactivation associated with low-level visual features (e.g. edges), high-level visual features (e.g. facial features), and semantic features (e.g. “terrier”) occurs throughout the dorsal and ventral visual streams and extends into the frontal cortex. Moreover, we show that reactivation of both low- and high-level visual features correlates with the vividness of the memory, whereas only reactivation of low-level features correlates with recognition accuracy when the lure and target images are semantically similar. In addition to demonstrating the utility of FSIC for mapping feature-specific reactivation, these findings resolve the relative contributions of low- and high-level features to the vividness of visual memories, clarify the role of the frontal cortex during episodic recall, and challenge a strict interpretation of the posterior-to-anterior visual hierarchy.
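The core FSIC idea can be illustrated with a toy simulation: decode stimulus features from each region's voxel patterns, then correlate the trial-by-trial fluctuations in decoding fidelity across regions. The sketch below assumes made-up data, dimensions, and ridge penalty purely for illustration; the published method involves many more ingredients (feature levels from a DCNN, cross-validation structure, noise controls).

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials, n_vox, n_feat = 50, 30, 8

# Hypothetical data: a shared trial-wise "reactivation strength" drives
# feature-related signal in both regions (all names/shapes illustrative).
strength = rng.uniform(0.2, 1.0, n_trials)
features = rng.standard_normal((n_trials, n_feat))        # stand-in DCNN-layer features
W_a = rng.standard_normal((n_feat, n_vox))                # region A encoding weights
W_b = rng.standard_normal((n_feat, n_vox))                # region B encoding weights
region_a = strength[:, None] * features @ W_a + 0.5 * rng.standard_normal((n_trials, n_vox))
region_b = strength[:, None] * features @ W_b + 0.5 * rng.standard_normal((n_trials, n_vox))

def decode_fidelity(voxels, feats):
    """Trial-wise fidelity: leave-one-out ridge prediction of features
    from voxels, scored as correlation of predicted vs. true features."""
    fid = np.empty(len(feats))
    for i in range(len(feats)):
        train = np.delete(np.arange(len(feats)), i)
        X, Y = voxels[train], feats[train]
        B = np.linalg.solve(X.T @ X + 10.0 * np.eye(X.shape[1]), X.T @ Y)  # ridge solve
        pred = voxels[i] @ B
        fid[i] = np.corrcoef(pred, feats[i])[0, 1]
    return fid

fid_a = decode_fidelity(region_a, features)
fid_b = decode_fidelity(region_b, features)
# Informational connectivity: do the two regions' decoding fidelities
# rise and fall together across trials?
fsic = np.corrcoef(fid_a, fid_b)[0, 1]
print(fsic)
```

Because the simulated reactivation strength is shared across regions, their fidelity time series should covary, which is exactly the signature FSIC looks for.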


Author(s):  
Rhong Zhao ◽  
William I. Grosky

The emergence of multimedia technology and the rapidly expanding image and video collections on the Internet have attracted significant research efforts in providing tools for effective retrieval and management of visual data. Image retrieval is based on the availability of a representation scheme for image content. Image content descriptors may be visual features such as color, texture, shape, and spatial relationships, or semantic primitives. Conventional information retrieval was based solely on text, and those approaches to textual information retrieval have been transplanted into image retrieval in a variety of ways. However, “a picture is worth a thousand words.” Image content is much more versatile than text, and the amount of visual data is already enormous and still expanding very rapidly. To cope with these special characteristics of visual data, content-based image retrieval methods have been introduced. It has been widely recognized that image retrieval techniques should integrate both low-level visual features, which address the more detailed perceptual aspects, and high-level semantic features, which underlie the more general conceptual aspects of visual data. Neither type of feature alone is sufficient to retrieve or manage visual data effectively or efficiently (Smeulders et al., 2000). Although efforts have been devoted to combining these two aspects of visual data, the gap between them remains a major barrier for researchers. Intuitive and heuristic approaches have not provided satisfactory performance. Therefore, there is an urgent need to find the latent correlation between low-level features and high-level concepts and to merge them from a different perspective. How to find this new perspective and bridge the gap between visual features and semantic features has been a major challenge in this research field. Our chapter addresses these issues.


Author(s):  
Silvester Tena ◽  
Rudy Hartanto ◽  
Igi Ardiyanto

In recent years, a great deal of research has been conducted in the area of fabric image retrieval, especially the identification and classification of visual features. One of the challenges in content-based image retrieval (CBIR) is the semantic gap between low-level visual features and high-level human perceptions. Generally, CBIR comprises two main components, namely feature extraction and similarity measurement. Therefore, this research examines content-based image retrieval for fabric using feature extraction techniques grouped into traditional methods and convolutional neural networks (CNN). Traditional descriptors deal with low-level features, while CNNs address the high-level, so-called semantic, features. Traditional descriptors have the advantage of shorter computation time and reduced system requirements. CNN descriptors, which handle high-level features tailored to human perception, deal with large amounts of data and require a great deal of computation time. In general, the features of a CNN's fully connected layers are used for matching query and database images. In several studies, the extracted features of the CNN's convolutional layers were used for image retrieval. At the end of the CNN, hash codes are added to reduce search time.
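The hash-code idea, binarizing the deep descriptor so that candidate filtering runs on cheap Hamming distances before an exact comparison on a short-list, can be sketched as follows. Random vectors stand in for CNN descriptors, and the 32-bit random-hyperplane codes and short-list size are illustrative choices, not the schemes used in the surveyed papers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for fully connected layer CNN descriptors (e.g. 4096-D in
# practice); a real system would extract these with a pretrained network.
db_feats = rng.standard_normal((1000, 128))
query = db_feats[42] + 0.05 * rng.standard_normal(128)   # near-duplicate of item 42

# Hash codes: random-hyperplane LSH turns each descriptor into 32 bits,
# so candidate filtering only needs Hamming distances on bit vectors.
planes = rng.standard_normal((128, 32))
db_codes = (db_feats @ planes > 0)
q_code = (query @ planes > 0)

hamming = (db_codes != q_code).sum(axis=1)
candidates = np.argsort(hamming)[:20]                    # cheap short-list

# Exact cosine similarity only on the short-list.
cand = db_feats[candidates]
sims = cand @ query / (np.linalg.norm(cand, axis=1) * np.linalg.norm(query))
best = candidates[np.argmax(sims)]
print(best)  # → 42
```

The near-duplicate's code differs from item 42's in only a few bits, so it survives the Hamming filter and wins the exact comparison; the full database is never scanned with the expensive metric.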


2021 ◽  
Author(s):  
Maryam Nematollahi Arani

Object recognition has become a central topic in computer vision applications such as image search, robotics, and vehicle safety systems. However, it is a challenging task due to the limited discriminative power of low-level visual features in describing the considerably diverse range of high-level visual semantics of objects. The semantic gap between low-level visual features and high-level concepts is a bottleneck in most systems, and new content analysis models need to be developed to bridge it. In this thesis, algorithms based on conditional random fields (CRF), from the class of probabilistic graphical models, are developed to tackle the problem of multiclass image labeling for object recognition. Image labeling assigns a specific semantic category from a predefined set of object classes to each pixel in the image. By capturing spatial interactions of visual concepts well, CRF modeling has proved to be a successful tool for image labeling. This thesis proposes novel approaches to empowering CRF modeling for robust image labeling. Our primary contributions are twofold. First, to better represent feature distributions of CRF potentials, new feature functions based on generalized Gaussian mixture models (GGMM) are designed and their efficacy is investigated. Owing to its shape parameter, a GGMM can provide a proper fit to the multi-modal and skewed distributions of data in natural images. The new model proves more successful than Gaussian and Laplacian mixture models, and also outperforms a deep neural network model on the Corel image set by 1% in accuracy. Second, we apply scene-level contextual information to integrate the global visual semantics of the image with the pixel-wise dense inference of a fully connected CRF, both to preserve small objects of foreground classes and to make dense inference robust to initial misclassifications of the unary classifier. The proposed inference algorithm factorizes the joint probability of the labeling configuration and the image scene type to obtain prediction update equations for labeling individual image pixels as well as the overall scene type of the image. The proposed context-based dense CRF model outperforms the conventional dense CRF model by about 2% in labeling accuracy on the MSRC image set and by 4% on the SIFT Flow image set. The proposed model also obtains the highest scene classification rate of 86% on the MSRC dataset.
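The generalized Gaussian density underlying GGMM components can be written down directly; its shape parameter beta interpolates between the Laplacian (beta = 1) and Gaussian (beta = 2) families, with beta < 2 giving heavier tails. A small sketch, with illustrative parameter values, showing that beta = 2 recovers the ordinary Gaussian:

```python
import math

def gen_gaussian_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Gaussian density; beta is the shape parameter
    (beta=2: Gaussian, beta=1: Laplacian, beta<2: heavier tails)."""
    coef = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coef * math.exp(-((abs(x - mu) / alpha) ** beta))

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# beta = 2 with alpha = sigma * sqrt(2) reproduces the ordinary Gaussian...
x = 0.7
print(abs(gen_gaussian_pdf(x, alpha=math.sqrt(2)) - gaussian_pdf(x)) < 1e-12)  # True
# ...while beta = 1 gives the Laplacian: density 1/2 at the mode.
print(abs(gen_gaussian_pdf(0.0, beta=1.0) - 0.5) < 1e-12)  # True
```

A mixture of such components, with per-component weights, means, scales, and shapes, is what lets the unary potentials track skewed, multi-modal feature distributions that a plain Gaussian mixture fits poorly.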


Author(s):  
Weichun Liu ◽  
Xiaoan Tang ◽  
Chenglin Zhao

Recently, deep trackers based on siamese networks have been enjoying increasing popularity in the tracking community. Generally, those trackers learn a high-level semantic embedding space for feature representation but lose low-level fine-grained details. Moreover, the learned high-level semantic features are not updated during online tracking, which results in tracking drift in the presence of target appearance variation and similar distractors. In this paper, we present a novel end-to-end trainable Convolutional Neural Network (CNN) based on the siamese network for distractor-aware tracking. It enhances target appearance representation in both the offline training stage and the online tracking stage. In the offline training stage, the network learns both low-level fine-grained details and high-level coarse-grained semantics simultaneously in a multi-task learning framework. The low-level features, with their better resolution, are complementary to the semantic features and able to distinguish the foreground target from background distractors. In the online stage, the learned low-level features are fed into a correlation filter layer and updated in an interpolated manner to adaptively encode target appearance variation. The learned high-level features are fed into a cross-correlation layer without online updates. The proposed tracker therefore benefits from both the adaptability of the fine-grained correlation filter and the generalization capability of the semantic embedding. Extensive experiments are conducted on the public OTB100 and UAV123 benchmark datasets. Our tracker achieves state-of-the-art performance while running at a real-time frame rate.
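Conceptually, the cross-correlation layer at the heart of siamese trackers slides a template feature map over a search-region feature map and records the inner product at every offset; the peak of the resulting response map gives the target location. A toy sketch with random arrays standing in for feature maps (real trackers correlate multi-channel CNN feature tensors, not single-channel patches):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "feature maps": a 7x7 template and a 31x31 search region that
# contains the template at a known offset.
template = rng.standard_normal((7, 7))
search = 0.1 * rng.standard_normal((31, 31))
search[12:19, 9:16] += template                 # embed target at row 12, col 9

def cross_correlate(search, template):
    """Dense cross-correlation: inner product of the template with the
    search region at every spatial offset (a naive, readable version)."""
    th, tw = template.shape
    sh, sw = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(search[r:r+th, c:c+tw] * template)
    return out

response = cross_correlate(search, template)
row, col = np.unravel_index(np.argmax(response), response.shape)
print(row, col)  # peak at the embedded target location: 12 9
```

The response peak sits where the template's energy aligns with its embedded copy, which is exactly the localization signal the cross-correlation layer provides; the correlation filter branch plays the same role for the low-level features but with online updates.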


2019 ◽  
Author(s):  
Remington Mallett ◽  
Anurima Mummaneni ◽  
Jarrod Lewis-Peacock

Working memory persists in the face of distraction, yet not without consequence. Previous research has shown that memory for low-level visual features is systematically influenced by the maintenance or presentation of a similar distractor stimulus. Responses are frequently biased in stimulus space towards a perceptual distractor, though this has yet to be determined for high-level stimuli. We investigated whether these influences are shared for complex visual stimuli such as faces. To quantify response accuracies for these stimuli, we used a delayed-estimation task with a computer-generated “face space” consisting of eighty faces that varied continuously as a function of age and sex. In a set of three experiments, we found that responses for a target face held in working memory were biased towards a distractor face presented during the maintenance period. The amount of response bias did not vary as a function of distance between target and distractor. Our data suggest that, similar to low-level visual features, high-level face representations in working memory are biased by the processing of related but task-irrelevant information.


2019 ◽  
Author(s):  
Kathryn E Schertz ◽  
Omid Kardan ◽  
Marc Berman

It has recently been shown that the perception of visual features of the environment can influence thought content. Both low-level (e.g., fractalness) and high-level (e.g., presence of water) visual features can have this effect in real-world and experimental settings, where they make people more reflective and contemplative in their thoughts. It remains to be seen, however, whether these visual features retain their influence on thoughts in the absence of overt semantic content, which would indicate a more fundamental mechanism for this effect. In this study, we removed this limitation by creating scrambled-edge versions of images, which maintain the edge content of the original images but prevent scene identification. Non-straight edge density is one visual feature that has been shown to influence many judgments about objects and landscapes, and it has also been associated with thoughts of spirituality. We extend previous findings by showing that non-straight edges retain their influence on the selection of a “Spiritual &amp; Life Journey” topic after scene identification is removed. These results strengthen the case for a causal role of the perception of low-level visual features in influencing higher-order cognitive function by demonstrating that, even in the absence of overt semantic content, low-level features such as edges influence cognitive processes.
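One way to make the notion of non-straight edge content concrete is to look at the diversity of edge orientations: straight edges concentrate gradient orientations into a few bins, while curved edges spread them out. The sketch below uses an orientation-histogram entropy as a rough proxy on synthetic shapes; the measure and thresholds are illustrative and not the metric used in the study.

```python
import numpy as np

def orientation_spread(img, thresh=0.3):
    """Proxy for non-straight edge content: entropy of the gradient
    orientation histogram over strong-gradient (edge) pixels."""
    gy, gx = np.gradient(img.astype(float))     # first axis = rows (y)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)[mag > thresh]      # orientations at edge pixels
    hist, _ = np.histogram(ang, bins=16, range=(-np.pi, np.pi))
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

n = 64
yy, xx = np.mgrid[0:n, 0:n]
square = ((np.abs(xx - 32) < 15) & (np.abs(yy - 32) < 15)).astype(float)   # straight edges
disk = ((xx - 32) ** 2 + (yy - 32) ** 2 < 15 ** 2).astype(float)           # curved edge

# The square's edges occupy ~4 orientation bins; the disk's boundary
# sweeps through all of them, so its spread is higher.
print(orientation_spread(square) < orientation_spread(disk))  # True
```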


2021 ◽  
Author(s):  
Rui Zhang

This thesis is primarily focused on information combination at different levels of a statistical pattern classification framework for image annotation and retrieval. Based on previous work in image annotation and retrieval, it is well recognized that low-level visual features, such as color and texture, and high-level features, such as textual descriptions and context, are distinct yet complementary in terms of their distributions and their discriminative power for machine-based recognition and retrieval tasks. Effective feature combination for image annotation and retrieval has therefore become a desirable and promising perspective from which the semantic gap can be further bridged. Motivated by this fact, the combination of the visual and context modalities, and that of different features within the visual domain, are tackled by developing two statistical pattern classification approaches, considering that features within the visual modality and those across modalities exhibit different degrees of heterogeneity and thus should be treated differently. For cross-modality feature combination, a Bayesian framework is proposed to integrate visual content and context, and it has been applied to various image annotation and retrieval frameworks. For the combination of different low-level features in the visual domain, the problem is tackled with a novel method that combines texture and color features via a mixture model of their joint distribution. To evaluate the proposed frameworks, several datasets are employed in the experiments, including the COREL database for image retrieval and the MSRC, LabelMe, and PASCAL VOC2009 databases, together with an animal image database collected by ourselves, for image annotation. Using various evaluation criteria, the first framework is shown to be more effective than methods based purely on low-level features or high-level context. As for the second, the experimental results demonstrate not only its superior performance over other feature combination methods but also its ability to discover visual clusters using texture and color simultaneously. Moreover, a demo search engine based on the Bayesian framework has been implemented and is available online.
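In its simplest form, a cross-modality Bayesian combination of this kind can be illustrated as naive-Bayes fusion: if the visual and context modalities are assumed conditionally independent given the class, the posterior is the normalized product of the per-modality likelihoods and the prior. A toy sketch (the classes and probability values are invented for illustration; a real system would estimate the likelihoods from training data):

```python
import numpy as np

classes = ["beach", "forest", "city"]

# Hypothetical per-class likelihoods for each modality.
p_visual = np.array([0.60, 0.25, 0.15])    # P(visual features | class)
p_context = np.array([0.70, 0.10, 0.20])   # P(textual context | class)
prior = np.array([1/3, 1/3, 1/3])          # P(class)

# Naive-Bayes fusion: modalities conditionally independent given the
# class, so the posterior is the normalized elementwise product.
posterior = p_visual * p_context * prior
posterior /= posterior.sum()

print(classes[int(np.argmax(posterior))])  # → beach
```

Here the two modalities agree, so the fused posterior is sharper than either alone; when they disagree, the product form lets the more confident modality dominate, which is the practical appeal of combining them probabilistically rather than heuristically.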


2016 ◽  
Author(s):  
Seyed-Mahdi Khaligh-Razavi ◽  
Wilma A. Bainbridge ◽  
Dimitrios Pantazis ◽  
Aude Oliva

Abstract: Not all visual memories are equal; some endure in our minds, while others quickly disappear. Recent behavioral work shows that we can reliably predict which images will be remembered. This image property is called memorability. Memorability is intrinsic to an image, robust across observers, and unexplained by low-level visual features. However, its neural bases and its relation to perception and memory remain unknown. Here we characterize the representational dynamics of memorability using magnetoencephalography (MEG). We find that memorability is indexed by brain responses starting at 218 ms for faces and 371 ms for scenes: later than the classical early perceptual face/scene discrimination signals, yet earlier than the late memory encoding signal observed at ~700 ms. The results show that memorability is a high-level image property whose spatio-temporal neural dynamics differ from those of memory encoding. Together, this work brings new insights into the neural processes underlying the transformation from what we perceive to what we remember.

