Modality-Dependent Cross-Modal Retrieval Based on Graph Regularization

2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Guanhua Wang ◽  
Hua Ji ◽  
Dexin Kong ◽  
Na Zhang

Nowadays, the heterogeneity gap between different modalities is the key problem in cross-modal retrieval. To overcome this gap, the potential correlations between modalities need to be mined. At the same time, the semantic information of class labels is used to reduce the semantic gap between data of different modalities and to make heterogeneous data interdependent and interoperable. To fully exploit the potential correlation between modalities, we propose a cross-modal retrieval framework based on graph regularization and modality dependence (GRMD). First, taking both feature correlation and semantic correlation into account, different projection matrices are learned for different retrieval tasks, such as image-query-text (I2T) and text-query-image (T2I). Second, the internal structure of the original feature space is used to construct an adjacency graph with semantic-information constraints, which pulls heterogeneous data with different labels closer to their corresponding semantic information. Experimental results on three widely used datasets demonstrate the effectiveness of our method.
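The abstract does not spell out GRMD's objective. Graph regularization of this kind is commonly written as a Laplacian penalty, 0.5 * sum_ij W_ij * ||y_i - y_j||^2 = tr(Y^T L Y) with L = D - W, which is small when linked samples stay close after projection. The sketch below assumes a k-nearest-neighbour graph in the original feature space whose edges are kept only between samples sharing a class label, with 0/1 weights; these choices and all function names are illustrative assumptions, not the authors' construction. It checks that the pairwise and trace forms agree.

```python
import math

def knn_label_graph(features, labels, k):
    """Symmetric 0/1 adjacency: i and j are linked when one is among the other's
    k nearest neighbours (Euclidean) AND both carry the same class label."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for _, j in neighbours[:k]:
            if labels[i] == labels[j]:  # assumed semantic (label) constraint
                w[i][j] = w[j][i] = 1.0
    return w

def laplacian(w):
    """Unnormalised graph Laplacian L = D - W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]

def pairwise_penalty(w, y):
    """0.5 * sum_ij W_ij * ||y_i - y_j||^2 over projected samples y."""
    n = len(w)
    return 0.5 * sum(
        w[i][j] * sum((a - b) ** 2 for a, b in zip(y[i], y[j]))
        for i in range(n) for j in range(n)
    )

def trace_penalty(lap, y):
    """tr(Y^T L Y), summed one output dimension at a time."""
    n, dims = len(lap), len(y[0])
    return sum(
        y[i][d] * lap[i][j] * y[j][d]
        for d in range(dims) for i in range(n) for j in range(n)
    )
```

In a full method, Y would be the projection of the original features, and the penalty would be added to the retrieval loss for each task-specific projection matrix.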

Author(s):  
Shaily Malik ◽  
Poonam Bansal

Real-world data are multimodal, and to classify them with machine learning algorithms, the features of both modalities must be transformed into a common latent space. Transforming features into a high-dimensional common space loses their locality information and makes them susceptible to noise. This article addresses this issue with a semantic autoencoder and presents a novel algorithm that maps features into a common hidden space while keeping them distinct and preserving locality. We call it the discriminative regularized semantic autoencoder (DRSAE). It keeps the low-dimensional features on the manifold to handle both inter- and intra-modality relations in the data. The data are multi-labeled, and the labels are transformed into a feature-aware label space using Conditional Principal Label Space Transformation (CPLST). With this two-fold algorithm, we achieve a significant improvement in text retrieval from image queries and in image retrieval from text queries.


Author(s):  
Rahmi Hidayati ◽  
Agus Harjoko

Video retrieval is used to search for videos based on a query entered by the user, either text or an image. Such a system can improve video search capability and is expected to reduce video retrieval time. The aim of this research was to design and build a video retrieval application based on the text and images contained in videos. Indexing text involves tokenizing, filtering (stopword removal), and stemming; the stemming results are stored in a text index table. Indexing images involves building a color histogram and computing the mean and standard deviation of each primary color, red, green, and blue (RGB), for every image; the extracted features are stored in an image table. Video retrieval accepts a text query, an image query, or both. For a text query, the system looks up the query in the text index table and, if it is found, displays the information of the matching video. For an image query, the system extracts six feature values (mean red, mean green, mean blue, and the standard deviations of red, green, and blue); if these six values are found in the image index table, the system displays the information of the matching video. For a combined text and image query, the system displays video information only if the two queries are related, that is, if they share the same film title.
Keywords: video, index, retrieval, text, image
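The indexing and retrieval steps above can be sketched in a few functions. This is a minimal illustration under stated assumptions: the stopword list and the suffix-stripping `naive_stem` are hypothetical stand-ins for a real language-specific stemmer, each film has a single indexed image, multi-word text queries require every term to match, and the six RGB features are rounded (an assumption) so that they can be matched by exact lookup as the abstract describes.

```python
import statistics

# Assumed tiny stopword list; a real system would use a full list for its language.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to"}

def naive_stem(word):
    """Hypothetical suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def terms_of(text):
    """Tokenizing, stopword filtering, and stemming."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

def build_text_index(docs):
    """docs maps film title -> text; returns term -> set of film titles."""
    index = {}
    for title, text in docs.items():
        for term in terms_of(text):
            index.setdefault(term, set()).add(title)
    return index

def rgb_features(pixels, ndigits=4):
    """Six features: mean R, G, B, then population std. dev. of R, G, B."""
    channels = list(zip(*pixels))
    means = [statistics.fmean(c) for c in channels]
    sds = [statistics.pstdev(c) for c in channels]
    return tuple(round(v, ndigits) for v in means + sds)

def search(text_index, image_index, query_text=None, query_pixels=None):
    """Titles matching the text query, the image query, or, when both are
    given, only titles matched by both (same film title)."""
    results = []
    if query_text is not None:
        postings = [text_index.get(t, set()) for t in terms_of(query_text)]
        results.append(set.intersection(*postings) if postings else set())
    if query_pixels is not None:
        feats = rgb_features(query_pixels)
        results.append({title for title, f in image_index.items() if f == feats})
    return set.intersection(*results) if results else set()
```

Exact matching on six floating-point features is brittle; a practical variant would rank images by a distance between feature vectors instead.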


2018 ◽  
Vol 8 (9) ◽  
pp. 1621 ◽  
Author(s):  
Fan Jiang ◽  
Zhencai Zhu ◽  
Wei Li ◽  
Yong Ren ◽  
Gongbo Zhou ◽  
...  

Acceleration sensors are frequently used to collect vibration signals for bearing fault diagnosis. To make full use of the vibration signals from multiple sensors, this paper proposes a new approach that fuses multi-sensor information for bearing fault diagnosis using ensemble empirical mode decomposition (EEMD), correlation coefficient analysis, and a support vector machine (SVM). First, EEMD decomposes each vibration signal into a set of intrinsic mode functions (IMFs), and a correlation coefficient ratio factor (CCRF) is defined to select sensitive IMFs, which are used to reconstruct new vibration signals for further feature fusion analysis. Second, an original feature space is constructed from the reconstructed signals. Third, weights are assigned according to the correlation coefficients among the vibration signals of the sensors, and fused features are extracted from the original feature space using these weights. Finally, a trained SVM serves as the classifier for bearing fault diagnosis. The diagnostic accuracies obtained with the original vibration signals, the first IMF, the proposed reconstructed signal, and the full proposed method are 73.33%, 74.17%, 95.83%, and 100%, respectively. The experiments therefore show that the proposed method achieves the highest diagnostic accuracy and offers a new way to improve bearing fault diagnosis.
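The abstract says fusion weights come from correlation coefficients among the sensors' signals but does not give the formula. As a hedged illustration only, the sketch below assumes one plausible rule: each sensor's weight is the sum of its absolute Pearson correlations with the other sensors, normalised to sum to 1, and fused features are the weighted sum of per-sensor feature vectors. The CCRF-based IMF selection and the SVM stage are not shown, and all names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_weights(signals):
    """Assumed rule: weight each sensor by the sum of its absolute
    correlations with the other sensors, normalised to sum to 1."""
    m = len(signals)
    scores = [
        sum(abs(pearson(signals[s], signals[t])) for t in range(m) if t != s)
        for s in range(m)
    ]
    total = sum(scores)
    return [v / total for v in scores]

def fuse_features(feature_vectors, weights):
    """Weighted sum of the per-sensor feature vectors, element by element."""
    dim = len(feature_vectors[0])
    return [sum(w * f[k] for w, f in zip(weights, feature_vectors)) for k in range(dim)]
```

Under this rule a sensor that agrees more with the others gets a larger weight; the paper may weight sensors differently.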


2020 ◽  
Vol 34 (04) ◽  
pp. 3513-3520 ◽  
Author(s):  
Man-Sheng Chen ◽  
Ling Huang ◽  
Chang-Dong Wang ◽  
Dong Huang

Previous multi-view clustering algorithms mostly partition multi-view data in their original feature space, so their efficacy relies heavily, if implicitly, on the quality of the original feature representation. In light of this, this paper proposes a novel approach termed Multi-view Clustering in Latent Embedding Space (MCLES), which clusters multi-view data in a learned latent embedding space while simultaneously learning the global structure and the cluster indicator matrix in a unified optimization framework. Specifically, a latent embedding representation is first discovered that effectively exploits the complementary information from different views. Global structure learning is then performed on this latent embedding representation. Further, the cluster indicator matrix can be obtained directly from the learned global structure. An alternating optimization scheme is introduced to solve the resulting optimization problem. Extensive experiments on several real-world multi-view datasets demonstrate the superiority of our approach.


Author(s):  
Xiaolong Gong ◽  
Linpeng Huang ◽  
Fuwei Wang

Real web datasets are often associated with multiple views, such as long and short commentaries and user preferences. However, with the rapid growth of user-generated text, each view has a large feature space, which poses a computational challenge during matrix decomposition. In this paper, we propose a novel multi-view clustering algorithm based on non-negative matrix factorization that uses a feature sampling strategy to reduce complexity during the iteration process. In particular, our method exploits unsupervised semantic information in the learning process to capture intrinsic similarity through graph regularization. Moreover, we use the Hilbert-Schmidt Independence Criterion (HSIC) to explore unsupervised semantic diversity information among the multi-view contents of a web item. The overall objective is to minimize the loss function of multi-view non-negative matrix factorization combined with an intra-semantic similarity graph regularizer and an inter-semantic diversity term. Compared with several state-of-the-art methods, we demonstrate the effectiveness of the proposed method on a large real-world dataset, Doucom, and on three smaller datasets.
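HSIC measures statistical dependence between two representations of the same items, so a diversity term can penalise high HSIC between views. The abstract does not give the kernels or how the term enters the NMF loss; the sketch below only computes the standard biased empirical estimator HSIC = tr(K H L H) / (n - 1)^2 with H = I - (1/n) 11^T, assuming linear kernels on per-view representations. Function names are illustrative.

```python
def linear_kernel(rows):
    """Gram matrix K_ij = <x_i, x_j> (linear kernel, an assumption)."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

def center(k):
    """H K H: subtract row and column means and add back the grand mean."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

def hsic(x, y):
    """Biased empirical HSIC with linear kernels: tr(K H L H) / (n - 1)^2."""
    n = len(x)
    kc = center(linear_kernel(x))
    lc = center(linear_kernel(y))
    # H is symmetric and idempotent, so tr(K H L H) = sum_ij (HKH)_ij (HLH)_ij.
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2
```

Here x and y would be, for example, the NMF coefficient matrices of two views for the same n items; a value near zero indicates that the views carry little linearly shared structure.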


2017 ◽  
Vol 10 (1) ◽  
pp. 85-108 ◽  
Author(s):  
Khadidja Belattar ◽  
Sihem Mostefai ◽  
Amer Draa

The use of Computer-Aided Diagnosis in dermatology calls for the integration of Content-Based Image Retrieval (CBIR) technologies, which could help untrained users as a decision support system for skin lesion diagnosis. However, classical CBIR systems perform poorly because of the semantic gap. To alleviate this problem, we propose an intelligent Content-Based Dermoscopic Image Retrieval (CBDIR) system with Relevance Feedback (RF) for melanoma diagnosis that offers efficient and accurate image retrieval as well as visual feature extraction that is independent of any specific diagnostic method. After a query image is submitted, the proposed system uses a linear-kernel active SVM combined with a histogram-intersection similarity measure to retrieve the K most similar skin lesion images. The dominant class (melanoma or benign) in this set is taken as the diagnosis for the query image. Extensive experiments on a database of 1097 images show that the proposed scheme is more effective than CBDIR without RF.
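The retrieval-and-vote step described above can be sketched as follows, assuming histograms already normalised to sum to 1 and using the Swain-Ballard histogram intersection as the similarity. The active-SVM relevance-feedback loop is deliberately omitted, so this is only the final ranking and majority-vote stage, with hypothetical function names.

```python
from collections import Counter

def histogram_intersection(h1, h2):
    """Swain-Ballard histogram intersection for histograms normalised to sum to 1."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def retrieve_and_diagnose(query_hist, database, k):
    """database: list of (histogram, label). Returns the k most similar entries
    and the majority label among them (the dominant class)."""
    ranked = sorted(
        database,
        key=lambda item: histogram_intersection(query_hist, item[0]),
        reverse=True,  # most similar first; sorting stays stable for ties
    )
    top_k = ranked[:k]
    diagnosis = Counter(label for _, label in top_k).most_common(1)[0][0]
    return top_k, diagnosis
```

An odd K avoids ties when there are only two classes (melanoma, benign).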


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6846
Author(s):  
Ngoc-Thanh Dinh ◽  
Young-Han Kim

Existing caching mechanisms consider content objects individually, without considering the semantic correlation among them. We argue that this approach can be inefficient in the Internet of Things (IoT) because of the highly redundant nature of IoT device deployments and the data-accuracy tolerance of IoT applications. In many IoT applications, an approximate answer is acceptable. Therefore, a cached information object that is highly semantically correlated with the requested one can be served instead of a cached copy of the exact requested object. In this case, caching both information objects can be inefficient and redundant. This paper proposes a cache retrieval scheme that takes the semantic correlation among the information objects of nodes into account. We illustrate the benefits of considering semantic correlation by studying IoT data caching at the edge. Our experiments and analysis show that semantically correlated caching can significantly improve caching efficiency and cache hit rate and reduce the resource consumption of IoT devices.
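The idea of serving a correlated object instead of the exact one can be sketched as an edge cache with approximate lookup. The abstract does not define semantic correlation, so this sketch assumes a simple stand-in: objects are correlated when they have the same kind, were produced within `max_distance` of the requested location, and are no older than `max_age`. The class and field names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class CachedObject:
    kind: str          # assumed semantic type tag, e.g. "temperature"
    location: tuple    # (x, y) position of the producing IoT node
    value: float
    timestamp: float   # seconds

class SemanticCache:
    """Hypothetical edge cache: a request is served by the closest cached object
    of the same kind within max_distance and no older than max_age."""

    def __init__(self, max_distance, max_age):
        self.max_distance = max_distance
        self.max_age = max_age
        self.objects = []

    def lookup(self, kind, location, now):
        best = None
        for obj in self.objects:
            if obj.kind != kind or now - obj.timestamp > self.max_age:
                continue  # different semantics or too stale
            d = math.dist(obj.location, location)
            if d <= self.max_distance and (best is None or d < best[0]):
                best = (d, obj)
        return None if best is None else best[1]

    def put(self, obj):
        """Store obj unless a correlated object already answers for it,
        avoiding the redundant caching the abstract describes."""
        if self.lookup(obj.kind, obj.location, obj.timestamp) is not None:
            return False
        self.objects.append(obj)
        return True
```

The tolerance parameters encode how approximate an answer the application accepts; tighter values trade cache hits for accuracy.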


Information ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 261 ◽  
Author(s):  
Lu

An important problem in machine learning is that, when more than two labels are used, it is very difficult to construct and optimize a group of learning functions that remain useful when the prior distribution of instances changes. To resolve this problem, semantic information G theory, Logical Bayesian Inference (LBI), and a group of Channel Matching (CM) algorithms are combined to form a systematic solution. A semantic channel in G theory consists of a group of truth functions or membership functions. In comparison with the likelihood functions, Bayesian posteriors, and logistic functions typically used in popular methods, membership functions are more convenient to use, providing learning functions that do not suffer from the above problem. In LBI, every label is learned independently. For multilabel learning, we can directly obtain a group of optimized membership functions from a sufficiently large labeled sample, without preparing different samples for different labels. Furthermore, a group of CM algorithms is developed for machine learning. For Maximum Mutual Information (MMI) classification of three classes with Gaussian distributions in a two-dimensional feature space, only 2–3 iterations are required for the mutual information between three classes and three labels to surpass 99% of the MMI for most initial partitions. For mixture models, the Expectation-Maximization (EM) algorithm is improved to form the CM-EM algorithm, which can outperform the EM algorithm when the mixture ratios are imbalanced or when local convergence exists. The CM iteration algorithm needs to be combined with neural networks for MMI classification in high-dimensional feature spaces. LBI needs further investigation toward the unification of statistics and logic.
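The MMI criterion above refers to the mutual information between classes and labels, I(X;Y) = sum_xy P(x, y) log2[P(x, y) / (P(x) P(y))]. The sketch below only evaluates this quantity from a joint class-label probability table (for example, the one induced by a partition); it does not implement the CM iteration itself, and the function name is illustrative.

```python
import math

def mutual_information(joint):
    """joint[i][j] = P(class i, label j); returns I(X;Y) in bits."""
    px = [sum(row) for row in joint]        # class marginals
    py = [sum(col) for col in zip(*joint)]  # label marginals
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # 0 * log 0 is taken as 0
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi
```

For three equiprobable classes, a perfect partition gives log2(3), about 1.585 bits, while labels independent of the classes give 0; a partition reaching 99% of the MMI would score within 1% of the maximum attainable value for the given class distributions.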


Author(s):  
Yihe Liu ◽  
Huaxiang Zhang ◽  
Li Liu ◽  
Lili Meng ◽  
...  

Existing cross-media retrieval methods usually learn a single latent subspace for all retrieval tasks, which can achieve only suboptimal retrieval. In this paper, we propose a novel cross-media retrieval method based on Query Modality and Semi-supervised Regularization (QMSR). Taking cross-media retrieval between images and texts as an example, QMSR learns two pairs of mappings for the two retrieval tasks (using images to search texts (Im2Te) and using texts to search images (Te2Im)) instead of a single pair. QMSR learns the two pairs of projections by optimizing the correlation between images and texts together with the semantic information of the query modality (image or text), and it integrates semi-supervised regularization, which captures the structural information among both labeled and unlabeled data of the query modality, to transform different media objects from their original feature spaces into two different isomorphic subspaces (an Im2Te common subspace and a Te2Im common subspace). Experimental results show the effectiveness of the proposed method.


Author(s):  
NACER FARAJZADEH ◽  
GANG PAN ◽  
ZHAOHUI WU ◽  
MIN YAO

This paper proposes a new approach to improve multiclass classification performance by employing a Stacked Generalization structure and the One-Against-One decomposition strategy. The proposed approach encodes the outputs of all pairwise classifiers by implicitly embedding two-class discriminative information in a probabilistic manner. The encoded outputs, called Meta Probability Codes (MPCs), are interpreted as projections of the original features. We observe that MPCs are better suited to clustering than the original features. Based on MPC, we introduce a cluster-based multiclass classification algorithm called MPC-Clustering. The algorithm projects the original feature space to MPC using the proposed approach and then clusters the MPCs. It then trains individual multiclass classifiers on the resulting clusters to complete multiclass classifier induction. The performance of the proposed algorithm is extensively evaluated on 20 datasets from the UCI machine learning repository. The results indicate that MPC-Clustering is effective, improving the overall classification rate by 2.4% compared with state-of-the-art multiclass classifiers.
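The One-Against-One encoding step can be sketched as follows: for K classes there are K(K-1)/2 class pairs, and each sample's code is the vector of pairwise probabilities. The abstract does not say which base classifiers produce these probabilities, so `pairwise_prob` is a hypothetical stand-in (a logistic function of the difference of squared distances to two class centres). The subsequent clustering of codes and per-cluster classifier training are not shown.

```python
import math
from itertools import combinations

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def pairwise_prob(x, center_a, center_b):
    """Hypothetical binary classifier for classes a vs b: P(a | x) from the
    difference of squared distances to the two class centres."""
    da = sum((u - v) ** 2 for u, v in zip(x, center_a))
    db = sum((u - v) ** 2 for u, v in zip(x, center_b))
    return sigmoid(db - da)

def meta_probability_code(x, centers):
    """One-against-one encoding: one probability per class pair, K(K-1)/2 in total."""
    labels = sorted(centers)
    return [pairwise_prob(x, centers[a], centers[b]) for a, b in combinations(labels, 2)]
```

A sample lying exactly between two class centres gets 0.5 for that pair, which is the kind of two-class discriminative information the codes carry into the clustering stage.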

