Similarity Scores
Recently Published Documents

TOTAL DOCUMENTS: 82 (five years: 18)
H-INDEX: 15 (five years: 0)

Metabolites ◽ 2022 ◽ Vol 12 (1) ◽ pp. 68
Author(s): Jesi Lee, Tobias Kind, Dean Joseph Tantillo, Lee-Ping Wang, Oliver Fiehn

Mass spectrometry is the most commonly used method for compound annotation in metabolomics. However, most mass spectra in untargeted assays cannot be annotated with specific compound structures because reference mass spectral libraries are far smaller than the complement of known molecules. Theoretically predicted mass spectra might be used as a substitute for experimental spectra, especially for compounds that are not commercially available. For example, the Quantum Chemistry Electron Ionization Mass Spectra (QCEIMS) method can predict 70 eV electron ionization mass spectra from any given input molecular structure. In this work, we investigated the accuracy of QCEIMS predictions of electron ionization (EI) mass spectra for 80 purine and pyrimidine derivatives in comparison to experimental data in the NIST 17 database. Similarity scores between every pair of predicted and experimental spectra revealed that 45% of the compounds were found as the correct top hit when QCEIMS-predicted spectra were matched against the NIST 17 library of >267,000 EI spectra, and 74% of the compounds were found within the top 10 hits. We then investigated the impact of matching, missing, and additional fragment ions, as well as ion abundances, on MS similarity scores for predicted EI mass spectra. We further include detailed studies of fragmentation pathways, such as retro Diels–Alder reactions, to predict neutral losses of (iso)cyanic acid, hydrogen cyanide, or cyanamide in the mass spectra of purines and pyrimidines. We describe how trends in prediction accuracy correlate with the chemistry of the input compounds to better understand how the mechanisms of QCEIMS predictions could be improved in future developments. We conclude that QCEIMS is useful for generating large-scale predicted mass spectral libraries for identifying compounds that are absent from experimental libraries and not commercially available.
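A minimal sketch of the kind of spectral similarity score used in such library matching, assuming spectra are represented as {m/z: intensity} dictionaries; note that NIST-style search typically applies a mass-weighted dot product rather than the plain cosine shown here:

```python
import math

def spectral_cosine(spec_a, spec_b):
    """Cosine similarity between two centroided EI spectra given as {m/z: intensity} dicts."""
    peaks = set(spec_a) | set(spec_b)
    dot = sum(spec_a.get(mz, 0.0) * spec_b.get(mz, 0.0) for mz in peaks)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Ranking a predicted spectrum against every library entry by this score yields the top-hit statistics the abstract reports.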


2021
Author(s): Guru Nagaraj, Prashanth Pillai, Mandar Kulkarni

Abstract Over the years, well test analysis or pressure transient analysis (PTA) methods have progressed from straight lines via type curve analysis to pressure derivatives and deconvolution methods. Today, analysis of the log-log (pressure and its derivative) response is the most widely used method for PTA. Although these methods are widely available through commercial software, they are not fully automated, and human interaction is needed for their application. Furthermore, PTA is an inverse problem whose solution is in general non-unique: several models (well, reservoir, and boundary) can fit a similar pressure-derivative response. This non-uniqueness makes choosing the correct model with the conventional approach difficult and leads to multiple time-consuming iterations that require constant human interaction. Our approach automates the PTA process using a Siamese neural network (SNN) architecture composed of convolutional neural network (CNN) and long short-term memory (LSTM) layers. The SNN model is trained on simulated experimental data created using a design of experiments (DOE) approach covering the 14 most common interpretation scenarios across well, reservoir, and boundary model types. Across each model type, parameters such as permeability, horizontal well length, skin factor, and distance to the boundary were sampled to compute 560 different pressure-derivative responses. The SNN is trained using a self-supervised strategy in which positive and negative pairs are generated from the training data, using transformations such as compression and expansion of the well test model responses. For a given well test model response, similarity scores are computed against the candidates in each model class, and the best match from each class is identified. These matches are then ranked by similarity score to identify the optimal candidates.
Experimental analysis indicated that the true model class frequently appeared among the top-ranked classes. The model achieves an accuracy of 93% for the top-one model recommendation when tested on 70 samples from the 14 interpretation scenarios. Prior information on the top-ranked probable well test models significantly reduces the manual effort involved in the analysis. This machine learning (ML) approach can be integrated with any PTA software or function as a standalone application in the interpreter's system. The current SNN with LSTM layers can speed up the detection of the pressure-derivative response explained by a given combination of well, reservoir, and boundary models, and produce models with less user interaction. This methodology will help the interpretation engineer recognize models faster for detailed integration with additional information from sources such as geophysics, geology, petrophysics, drilling, and production logging.
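The self-supervised pair generation described above might be sketched as follows; the specific compression/expansion transforms and the negative-sampling scheme are illustrative assumptions, not the paper's exact implementation:

```python
import random

def compress(series, factor=2):
    """Positive-pair transform: keep every `factor`-th sample (time-axis compression)."""
    return series[::factor]

def expand(series, factor=2):
    """Positive-pair transform: linearly interpolate `factor - 1` points between samples."""
    out = []
    for a, b in zip(series, series[1:]):
        out.append(a)
        for k in range(1, factor):
            out.append(a + (b - a) * k / factor)
    out.append(series[-1])
    return out

def make_pairs(responses):
    """Build (anchor, other, label) training pairs for a Siamese network:
    label 1 for transformed copies of the same pressure-derivative response,
    label 0 for a response drawn from a different model."""
    pairs = []
    for i, r in enumerate(responses):
        pairs.append((r, compress(r), 1))
        pairs.append((r, expand(r), 1))
        j = (i + random.randrange(1, len(responses))) % len(responses)
        pairs.append((r, responses[j], 0))
    return pairs
```

The SNN itself (CNN and LSTM layers producing embeddings whose distance is the similarity score) would then consume these pairs.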


Author(s): Mourad Fariss, Naoufal El Allali, Hakima Asaidi, Mohamed Bellouki

Web service (WS) discovery is an essential task for implementing complex applications in a service-oriented architecture (SOA), covering the selection, composition, and provision of services. This task is semantically limited in how it matches the customer's request against the available web services. Furthermore, applying suitable similarity methods to the growing number of WSs is critical for efficient web service discovery. To overcome these limitations, we propose a new approach for web service discovery that integrates multiple similarity measures and k-means clustering. The approach retrieves services that more accurately match the customer's request by calculating different similarity scores between the request and the web services. The global semantic similarity is determined by applying k-means clustering to the obtained similarity scores. The experimental results demonstrate that the proposed semantic web service discovery approach outperforms state-of-the-art approaches in terms of precision (98%), recall (95%), and F-measure (96%). The proposed approach is designed to efficiently support and facilitate the web service selection and composition phases in complex applications.
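A rough sketch of the pipeline, assuming two stand-in lexical similarity measures (the paper's actual measures are not specified here) and a tiny k-means run over the resulting per-service score vectors:

```python
import difflib
import random

def token_jaccard(a, b):
    """Word-overlap similarity between a request and a service description."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def char_ratio(a, b):
    """Character-sequence similarity via difflib's ratio."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def kmeans(points, k=2, iters=20, seed=0):
    """Tiny k-means over similarity-score vectors; returns one cluster label per point."""
    rng = random.Random(seed)
    dims = len(points[0])
    centers = [list(c) for c in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((p[d] - centers[c][d]) ** 2 for d in range(dims)))
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(m[d] for m in members) / len(members) for d in range(dims)]
    return labels

# One score vector per candidate service, one dimension per similarity measure;
# clustering these vectors separates likely matches from unrelated services.
request = "get current weather forecast"
services = ["weather forecast service", "currency conversion service", "city weather report"]
score_vectors = [(token_jaccard(request, s), char_ratio(request, s)) for s in services]
cluster_of = kmeans(score_vectors, k=2, seed=0)
```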


i-Perception ◽ 2021 ◽ Vol 12 (6) ◽ pp. 204166952110587
Author(s): Zhaoyi Li, Xiaofang Lei, Xinze Yan, Zhiguo Hu, Hongyan Liu

The present study aims to explore the influence of masculine/feminine changes on the attractiveness evaluation of one's own face, and to examine the relationship between this attractiveness evaluation and the similarity of masculine/feminine faces to the original faces. A picture was taken of each participant and treated as his or her original self-face, and a male or female face with an average attractiveness score was adopted as the original other-face. Masculinized and feminized transformations of the original faces (self-face, male other-face, and female other-face) into 100% masculine and feminine faces were produced with morphing software in 2% steps. Thirty female and 30 male participants were asked to complete three tasks: to judge whether a given face was "like" or "not like" the original face, to choose the most attractive face from a morphed facial clip, and to subjectively rate the attractiveness and similarity of the morphed faces. The results revealed that the acceptable range of masculine/feminine transformation for self-faces was narrower than that for other-faces. Furthermore, the attractiveness ratings for masculinized or feminized self-faces were correlated with the similarity scores of those faces to the original self-faces. These findings suggest that attractiveness enhancement of the self-face through masculinity/femininity must stay within a reasonable range and take into account the similarity between the modified face and the original self-face.


2021 ◽ Vol 12
Author(s): Yuhua Yao, Binbin Ji, Yaping Lv, Ling Li, Ju Xiang, ...

Studies have found that long non-coding RNAs (lncRNAs) play important roles in many human biological processes, and it is critical to explore potential lncRNA–disease associations, especially cancer-associated lncRNAs. However, traditional biological experiments are costly and time-consuming, so it is of great significance to develop effective computational models. We developed a random walk with restart algorithm on multiplex and heterogeneous networks of lncRNAs and diseases to predict lncRNA–disease associations (MHRWRLDA). First, multiple disease similarity networks were constructed by using different approaches to calculate similarity scores between diseases, and multiple lncRNA similarity networks were likewise constructed from similarity scores between lncRNAs. Then, a multiplex and heterogeneous network was constructed by integrating the multiple disease and lncRNA similarity networks with the known lncRNA–disease associations, and a random walk with restart on this network was performed to predict lncRNA–disease associations. Leave-one-out cross-validation (LOOCV) yielded an area under the curve (AUC) of 0.68736, an improvement over recent classical algorithms. Finally, we confirmed a few novel predicted lncRNAs associated with specific diseases, such as colon cancer, by literature mining. In summary, MHRWRLDA contributes to the prediction of lncRNA–disease associations.
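A plain random walk with restart on a single network can be sketched as below; MHRWRLDA additionally handles multiplex and heterogeneous layers with inter-layer transitions, which this simplified version omits:

```python
def random_walk_with_restart(adj, seeds, restart=0.5, iters=100, tol=1e-9):
    """RWR on an undirected network given as {node: [neighbors]}.

    `seeds` are the nodes with known associations; at every step the walker
    returns to them with probability `restart`. The stationary distribution
    ranks all nodes by proximity to the seeds.
    """
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            if adj[n]:
                share = (1.0 - restart) * p[n] / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share
        delta = max(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p
```

Seeding the walk at a disease node and reading off the stationary probabilities of lncRNA nodes gives candidate association scores.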


2021
Author(s): Rahul Sharan Renu, Gregory Mocko

Abstract Many manufacturing enterprises have large collections of solid models and text-based assembly processes to support assembly operations. These data are often distributed across their extended enterprise. As these enterprises expand globally, product and process variability often increases, which can lead to challenges with training, quality control, and change management, among others. Thus, there is a desire to increase the consistency of assembly work instructions within and across assembly locations. The objective of this research is to retrieve existing 3D models of components and assemblies together with their associated assembly work instructions. This is accomplished using 3D solid model similarity and text mining of assembly work instructions. Initially, a design study was conducted in which participants authored assembly work instructions for several different solid model assemblies. Next, a geometric similarity algorithm was used to compute similarity scores between solid models, and latent semantic analysis was used to compute the similarity between text-based assembly work instructions. Finally, a correlation study between solid model-assembly instruction tuples was performed. A moderately strong positive correlation was found between solid model similarity scores and their associated assembly instruction similarity scores. This indicates that designs with a similar shape have a similar assembly process and can thus serve as the basis for authoring new assembly processes. It also aids in resolving differences between existing processes by linking three-dimensional solid models with their associated assembly work instructions.
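The text-similarity side can be illustrated with a simplified term-frequency cosine; full latent semantic analysis would first project the term vectors onto latent topics via truncated SVD, a step omitted in this sketch:

```python
import math
from collections import Counter

def tf_cosine(doc_a, doc_b):
    """Cosine similarity of term-frequency vectors for two work instructions.
    Full LSA would additionally map these vectors into a latent topic space
    (truncated SVD of the term-document matrix) before comparing them."""
    ca, cb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Correlating these instruction similarities with the geometric similarity scores of the corresponding solid models gives the tuple-level correlation the study reports.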


Author(s): Katie A. Wilson, Burkely T. Gallo, Patrick Skinner, Adam Clark, Pamela Heinselman, ...

Abstract Convection-allowing model ensemble guidance, such as that provided by the Warn-on-Forecast System (WoFS), is designed to provide predictions of individual thunderstorm hazards within the next 0–6 h. The WoFS web viewer provides a large suite of storm and environmental attribute products, but the applicability of these products to the National Weather Service forecast process has not been objectively documented. Therefore, this study describes an experimental forecasting task designed to investigate which WoFS products forecasters accessed and how they accessed them for a total of 26 cases (comprising 13 weather events, each worked by two forecasters). Analysis of web access log data revealed that in all 26 cases, product accesses were dominated by the reflectivity, rotation, hail, and surface wind categories. However, the number of different product types viewed and the number of transitions between products varied by case. Therefore, the Levenshtein (edit distance) method was used to compute similarity scores across all 26 cases, which helped establish what constituted relatively similar versus dissimilar navigation of WoFS products. Spearman's rank correlation coefficient (R) results showed that forecasters working the same weather event had higher similarity scores for events that produced more tornado reports and for events in which forecasters had higher performance scores. The findings from this study will influence subsequent efforts to further improve WoFS products and to develop an efficient and effective user interface for operational applications.
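The Levenshtein-based similarity between two forecasters' product navigation sequences might be computed as follows; the normalization into a 0 to 1 score is an assumption, since the exact scoring is not specified here:

```python
def levenshtein(a, b):
    """Edit distance between two sequences (here: ordered lists of product views)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def navigation_similarity(seq_a, seq_b):
    """Normalize the edit distance into a 0..1 similarity score."""
    longest = max(len(seq_a), len(seq_b)) or 1
    return 1.0 - levenshtein(seq_a, seq_b) / longest
```

Treating each product category viewed as one symbol lets the same routine compare any pair of the 26 cases.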


2021 ◽ Vol 7 (7) ◽ pp. 116
Author(s): Pasquale Ferrara, Rudolf Haraksim, Laurent Beslay

Performance evaluation of source camera attribution methods typically stops at the analysis of hard-to-interpret similarity scores. Standard analytic tools include Detection Error Trade-off and Receiver Operating Characteristic curves, or scalar performance metrics such as the Equal Error Rate or error rates at a specific decision threshold. However, the main drawback of similarity scores is their lack of probabilistic interpretation, and thereby their limited usability in forensic investigation when assisting the trier of fact in making sounder and better-informed decisions. The main objective of this work is to demonstrate a transition from similarity scores to likelihood ratios in the scope of digital evidence evaluation; likelihood ratios not only have probabilistic meaning but can be immediately incorporated into forensic casework and combined with the rest of the case-related forensic evidence. Likelihood ratios are calculated from Photo Response Non-Uniformity (PRNU) source attribution similarity scores. The experiments conducted compare different strategies applied to digital images and videos, taking their respective peculiarities into account. The results are presented in a format compatible with the guideline for validation of forensic likelihood ratio methods.
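One common strategy for turning similarity scores into likelihood ratios is to fit a score distribution under each hypothesis; the Gaussian model below is a hedged sketch of that idea, not necessarily the calibration used in this work:

```python
import math
import statistics

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_lr(same_source_scores, different_source_scores):
    """Fit one Gaussian per hypothesis on held-out scores and return a
    score -> likelihood ratio function:
    LR(s) = p(s | same camera) / p(s | different camera)."""
    mu_s, sd_s = statistics.mean(same_source_scores), statistics.stdev(same_source_scores)
    mu_d, sd_d = statistics.mean(different_source_scores), statistics.stdev(different_source_scores)
    def lr(score):
        return gaussian_pdf(score, mu_s, sd_s) / gaussian_pdf(score, mu_d, sd_d)
    return lr
```

An LR above 1 supports the same-source hypothesis and below 1 the different-source hypothesis, which is what makes the output directly usable in casework.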


2021 ◽ Vol 21 (S2)
Author(s): Ni Wang, Yanqun Huang, Honglei Liu, Zhiqiang Zhang, Lan Wei, ...

Abstract Background A new learning-based patient similarity measurement was proposed to measure patient similarity for heterogeneous electronic medical record (EMR) data. Methods We first calculated feature-level similarities according to the features' attributes. A domain expert provided patient similarity scores for 30 randomly selected patients. These similarity scores and the feature-level similarities for the 30 patients comprised the labeled sample set, which was used by the semi-supervised learning algorithm to learn patient-level similarities for all patients. We then used a k-nearest neighbor (kNN) classifier to predict four liver conditions. The predictive performances were compared across four different situations, and the personalized kNN models were also compared with other machine learning models. We assessed predictive performance by the area under the receiver operating characteristic curve (AUC), F1-score, and cross-entropy (CE) loss. Results As the size of the random training samples increased, the kNN models using the learned patient similarity to select near neighbors consistently outperformed those using the Euclidean distance (all P values < 0.001). The kNN models using the learned patient similarity to identify the top k nearest neighbors from the random training samples also achieved a better best performance (AUC: 0.95 vs. 0.89, F1-score: 0.84 vs. 0.67, and CE loss: 1.22 vs. 1.82) than those using the Euclidean distance. As the size of the similar training samples (composed of the most similar samples determined by the learned patient similarity) increased, the performance of kNN models using the simple Euclidean distance to select near neighbors degraded gradually. When the roles of the Euclidean distance and the learned patient similarity in selecting the near neighbors and the similar training samples were exchanged, the performance of the kNN models gradually increased. These two kinds of kNN models shared the same best performance of AUC 0.95, F1-score 0.84, and CE loss 1.22. Among the four reference models, the highest AUC and F1-score were 0.94 and 0.80, respectively, both lower than those of the simple and similarity-based kNN models. Conclusions This learning-based method opens an opportunity for similarity measurement based on heterogeneous EMR data and supports the secondary use of EMR data.
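The core idea of selecting near neighbors by a learned similarity rather than a Euclidean distance can be sketched as below; the patient IDs and liver-condition labels are hypothetical, and the semi-supervised learning of the similarity itself is not shown:

```python
from collections import Counter

def knn_predict(sim, labels, query_id, candidate_ids, k=3):
    """Majority vote among the k candidates most similar to the query,
    ranked by a precomputed (learned) similarity lookup rather than a
    Euclidean distance in raw feature space."""
    ranked = sorted(candidate_ids, key=lambda c: sim[(query_id, c)], reverse=True)
    votes = Counter(labels[c] for c in ranked[:k])
    return votes.most_common(1)[0][0]
```

Swapping the `sim` lookup between a learned similarity and a Euclidean-distance-based one reproduces the comparison the abstract describes.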

