Structural Similarity Measures for Database Searching

Author(s):  
Peter Willett


2010 ◽  
Vol 38 ◽  
pp. 1-48 ◽  
Author(s):  
S. Katrenko ◽  
P. W. Adriaans ◽  
M. Van Someren

This paper discusses the problem of combining structural similarity with semantic relatedness for Information Extraction from text. Aiming at accurate recognition of relations, we introduce local alignment kernels and explore various ways of using them for this task. We define a local alignment (LA) kernel based on the Smith-Waterman score as a sequence similarity measure and then consider a range of options for computing similarity between elements of sequences. We show how distributional similarity measures obtained from unlabeled data can be incorporated into the learning task as semantic knowledge. Our experiments suggest that the LA kernel yields promising results on various biomedical corpora, outperforming two baselines by a large margin. An additional series of experiments has been conducted on data sets of seven general relation types, where the performance of the LA kernel is comparable to current state-of-the-art results.
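As an illustration of the Smith-Waterman score mentioned in the abstract above, the following is a minimal sketch of a local alignment score over token sequences. It is not the authors' LA kernel: the linear gap penalty, the scoring values, and the names `smith_waterman` and `exact_match` are assumptions made for this example. The pluggable `sim` function marks the place where an element-level similarity (for instance a distributional one) could be substituted.

```python
from typing import Callable, Sequence


def smith_waterman(
    a: Sequence[str],
    b: Sequence[str],
    sim: Callable[[str, str], float],
    gap: float = 1.0,
) -> float:
    """Return the best local alignment score between sequences a and b.

    Assumes a linear gap penalty `gap` (an illustrative choice).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0.0,                                       # start a new alignment
                H[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),  # align two elements
                H[i - 1][j] - gap,                         # gap in b
                H[i][j - 1] - gap,                         # gap in a
            )
            best = max(best, H[i][j])
    return best


def exact_match(x: str, y: str) -> float:
    """Hypothetical element similarity: reward identical tokens, penalise others."""
    return 2.0 if x == y else -1.0


if __name__ == "__main__":
    s1 = ["protein", "A", "binds", "protein", "B"]
    s2 = ["gene", "X", "binds", "protein", "Y"]
    print(smith_waterman(s1, s2, exact_match))
```

Replacing `exact_match` with a graded similarity function changes only the element-level comparison; the dynamic programming recurrence stays the same.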


2021 ◽  
Author(s):  
Florian Huber ◽  
Sven van der Burg ◽  
Justin J.J. van der Hooft ◽  
Lars Ridder

Mass spectrometry data is one of the key sources of information in many workflows in medicine and across the life sciences. Mass fragmentation spectra are considered characteristic signatures of the chemical compound they originate from, yet the chemical structure itself usually cannot be easily deduced from the spectrum. Often, spectral similarity measures are used as a proxy for structural similarity but this approach is strongly limited by a generally poor correlation between both metrics. Here, we propose MS2DeepScore: a novel Siamese neural network to predict the structural similarity between two chemical structures solely based on their MS/MS fragmentation spectra. Using a cleaned dataset of >100,000 mass spectra of about 15,000 unique known compounds, MS2DeepScore learns to predict structural similarity scores for spectrum pairs with high accuracy. In addition, sampling different model varieties through Monte-Carlo Dropout is used to further improve the predictions and assess the model's prediction uncertainty. On 3,600 spectra of 500 unseen compounds, MS2DeepScore is able to identify highly-reliable structural matches and predicts Tanimoto scores with a root mean squared error of about 0.15. The prediction uncertainty estimate can be used to select a subset of predictions with a root mean squared error of about 0.1. We demonstrate that MS2DeepScore outperforms classical spectral similarity measures in retrieving chemically related compound pairs from large mass spectral datasets, thereby illustrating its potential for spectral library matching. Finally, MS2DeepScore can also be used to create chemically meaningful mass spectral embeddings that could be used to cluster large numbers of spectra. Added to the recently introduced unsupervised Spec2Vec metric, we believe that machine learning-supported mass spectral similarity metrics have great potential for a range of metabolomics data processing pipelines.
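The two quantities the abstract above reports, Tanimoto scores and root mean squared error, can be sketched as follows. This is only an illustration of the definitions, not the MS2DeepScore code: representing a molecular fingerprint as a set of "on" bit indices, returning 0.0 for two empty fingerprints, and the function names `tanimoto` and `rmse` are all assumptions of this example.

```python
import math
from typing import Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean squared error between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    fp1, fp2 = {1, 2, 3}, {2, 3, 4}
    print(tanimoto(fp1, fp2))             # shared bits / all bits
    print(rmse([0.5, 0.9], [0.5, 0.7]))   # error of two hypothetical predictions
```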


2019 ◽  
Vol 16 (4) ◽  
pp. 0948
Author(s):  
Raheem Abdul Sahib Ogla

Steganography is defined as hiding confidential information in some other chosen media without leaving any clear evidence of changing the media's features. Most traditional hiding methods hide the message directly in the cover media, such as text, image, audio, and video. Some hiding techniques degrade the cover image, so the change in the carrier medium can sometimes be detected by humans and machines. The purpose of the proposed information hiding approach is to make this change undetectable. The current research focuses on using a complex method, based on a spiral search method, to prevent the detection of hidden information by humans and machines. Structural Similarity Index Metric (SSIM) measures are used to assess the accuracy and quality of the retrieved image and to improve its perceived quality. The values of the information measures (perceptibility, robustness, capacity) are calculated through practical experiments using an interpolation technique and structural similarity measures. Experimental results show that the use of these measures (PSNR, MSE, and SSIM) improved the image quality by 87% and produced PSNR values of 38-41 dB, MSE = 0.6537, and SSIM = 0.8255. The results also demonstrate remarkable progress in the field of information hiding and the increasing difficulty of detecting it by humans and machines.
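For readers unfamiliar with the MSE and PSNR measures named above, a minimal sketch of their standard definitions follows. It does not reproduce the paper's experiments: the flat pixel lists, the 8-bit peak value of 255, and the function names `mse` and `psnr` are assumptions of this example.

```python
import math
from typing import Sequence


def mse(cover: Sequence[float], stego: Sequence[float]) -> float:
    """Mean squared error between two equally sized pixel sequences."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)


def psnr(cover: Sequence[float], stego: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(cover, stego)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    cover = [10, 20, 30, 40]   # hypothetical cover-image pixels
    stego = [10, 21, 30, 39]   # same pixels after embedding a message
    print(mse(cover, stego), psnr(cover, stego))
```

A higher PSNR (lower MSE) means the stego image differs less from the cover image in a pixel-wise sense.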


Author(s):  
Giovanna Guerrini ◽  
Marco Mesiti ◽  
Elisa Bertino

This chapter discusses existing approaches to evaluate and measure structural similarity in sources of XML documents. A relevant peculiarity of XML documents, indeed, is that information on the document structure is available in the document itself. In the chapter we present different approaches aiming at evaluating structural similarity at three different levels: among documents, between a document and a schema, and among schemas. The most relevant applications of such measures are for document classification and schema extraction, and for document and schema structural clustering, though other interesting applications such as document change detection and structural querying can be devised, and will be discussed throughout the chapter.


2019 ◽  
Vol 1 (1) ◽  
pp. 438-446
Author(s):  
Jacek Batóg ◽  
Barbara Batóg

A modern economy requires knowledge and skills, which future employees acquire mostly in fields of education that include science and engineering. The number of graduates in these fields can be increased in different ways, one of which is to create conditions that increase the propensity of women to obtain this type of education. The aim of the research presented in the article is to analyse long-term trends in the number of students and graduates in Poland, with particular emphasis on engineering faculties and the participation of women. Using dispersion and structural similarity measures and dynamic models, the authors showed that the total number of students and graduates and the number of students in engineering studies are characterised by different patterns. At the same time, in both cases a different structure of total students and engineers by gender was observed, as well as a growing share of women.


Author(s):  
Brenda Reyes Ayala ◽  
Jennifer McDevitt ◽  
James Sun ◽  
Xiaohui Liu

Since the practice of web archiving, or the act of preserving websites as historical, legal, and informational records, became more commonplace in the 2000s, web archives have become valuable sources for historical research. Unfortunately, many archived websites are of low quality and are missing crucial elements. In this paper, we examine the issue of quality and focus on visual correspondence, the similarity in appearance between the original website and its archived counterpart. We examine how the visual correspondence of an archived website can be measured using image similarity measures. Our results indicate that the Structural Similarity Index metric (SSIM) was able to successfully measure visual correspondence. If applied to the Quality Assurance process of an institution, this similarity metric could help web archivists quickly detect quality problems in their web archives and fix them in order to create high-quality web archives.
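To make the SSIM measure referred to above more concrete, here is a simplified sketch that evaluates the SSIM formula once over a whole image. Standard SSIM is usually computed over local windows and then averaged, so this single-window variant, the flat pixel lists, the population (divide by n) statistics, and the name `global_ssim` are assumptions of this example; the constants 0.01 and 0.03 are the commonly used defaults.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], dynamic_range: float = 255.0) -> float:
    """Single-window SSIM between two equally sized grayscale pixel sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small stabilising constants, scaled by the dynamic range of the pixels.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    original = [0, 50, 100, 150, 200, 250]   # hypothetical screenshot pixels
    archived = [250, 200, 150, 100, 50, 0]   # hypothetical archived rendering
    print(global_ssim(original, original))
    print(global_ssim(original, archived))
```

In a quality assurance setting, a score close to 1 would suggest that the archived rendering closely matches the original, while lower scores would flag pages for review.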


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Florian Huber ◽  
Sven van der Burg ◽  
Justin J. J. van der Hooft ◽  
Lars Ridder

Mass spectrometry data is one of the key sources of information in many workflows in medicine and across the life sciences. Mass fragmentation spectra are generally considered to be characteristic signatures of the chemical compound they originate from, yet the chemical structure itself usually cannot be easily deduced from the spectrum. Often, spectral similarity measures are used as a proxy for structural similarity but this approach is strongly limited by a generally poor correlation between both metrics. Here, we propose MS2DeepScore: a novel Siamese neural network to predict the structural similarity between two chemical structures solely based on their MS/MS fragmentation spectra. Using a cleaned dataset of >100,000 mass spectra of about 15,000 unique known compounds, we trained MS2DeepScore to predict structural similarity scores for spectrum pairs with high accuracy. In addition, sampling different model varieties through Monte-Carlo Dropout is used to further improve the predictions and assess the model's prediction uncertainty. On 3600 spectra of 500 unseen compounds, MS2DeepScore is able to identify highly reliable structural matches and to predict Tanimoto scores for pairs of molecules based on their fragment spectra with a root mean squared error of about 0.15. Furthermore, the prediction uncertainty estimate can be used to select a subset of predictions with a root mean squared error of about 0.1. We also demonstrate that MS2DeepScore outperforms classical spectral similarity measures in retrieving chemically related compound pairs from large mass spectral datasets, thereby illustrating its potential for spectral library matching. Finally, MS2DeepScore can also be used to create chemically meaningful mass spectral embeddings that could be used to cluster large numbers of spectra. Added to the recently introduced unsupervised Spec2Vec metric, we believe that machine learning-supported mass spectral similarity measures have great potential for a range of metabolomics data processing pipelines.


2014 ◽  
Vol 30 (3) ◽  
pp. 297-323 ◽  
Author(s):  
Maciej Piernik ◽  
Dariusz Brzezinski ◽  
Tadeusz Morzy ◽  
Anna Lesniewska

With its presence in data integration, chemistry, biological, and geographic systems, eXtensible Markup Language (XML) has become an important standard not only in computer science. A common problem among the mentioned applications involves structural clustering of XML documents, an issue that has been thoroughly studied and has led to the creation of a myriad of approaches. In this paper, we present a comprehensive review of structural XML clustering. First, we provide a basic introduction to the problem and highlight the main challenges in this research area. Subsequently, we divide the problem into three subtasks and discuss the most common document representations, structural similarity measures, and clustering algorithms. In addition, we present the most popular evaluation measures, which can be used to estimate clustering quality. Finally, we analyze and compare 23 state-of-the-art approaches and arrange them in an original taxonomy. By providing an up-to-date analysis of existing structural XML clustering algorithms, we hope to showcase methods suitable for current applications and draw lines of future research.

