Introducing a Rule Importance Measure

Author(s):  
Jiye Li ◽  
Nick Cercone
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sofia Kapsiani ◽  
Brendan J. Howlin

Abstract: Ageing is a major risk factor for many conditions, including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learning model based on data from the DrugAge database to predict whether a chemical compound will extend the lifespan of Caenorhabditis elegans. Five predictive models were built using the random forest algorithm with molecular fingerprints and/or molecular descriptors as features. The best-performing classifier, built using molecular descriptors, achieved an area under the curve (AUC) score of 0.815 for classifying the compounds in the test set. The features of the model were ranked using the Gini importance measure of the random forest algorithm. The top 30 features included descriptors related to atom and bond counts, topological properties and partial charge properties. The model was applied to predict the class of compounds in an external database consisting of 1738 small molecules. The compounds in the screening database with a predicted probability of ≥ 0.80 for increasing the lifespan of Caenorhabditis elegans were broadly separated into (1) flavonoids, (2) fatty acids and conjugates, and (3) organooxygen compounds.
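The Gini importance mentioned in this abstract is built from per-split decreases in Gini impurity. The sketch below is not the study's pipeline; it only illustrates, on hypothetical toy data, how one such decrease is computed. The descriptor values, labels, threshold, and function names are all assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Gini decrease obtained by splitting the samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Impurity of the two children, weighted by the share of samples in each.
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - children


# Hypothetical toy data: one descriptor value per compound,
# label 1 = extends lifespan, label 0 = does not.
descriptor = [0.10, 0.40, 0.35, 0.80, 0.90, 0.75]
extends = [0, 0, 0, 1, 1, 1]

decrease = impurity_decrease(descriptor, extends, threshold=0.5)
print(decrease)
```

In a random forest, a feature's Gini importance accumulates decreases of this kind over the nodes that split on that feature and averages them across trees.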


2019 ◽  
Vol 35 (19) ◽  
pp. 3663-3671 ◽  
Author(s):  
Stephan Seifert ◽  
Sven Gundlach ◽  
Silke Szymczak

Abstract
Motivation: It has been shown that the machine learning approach random forest can be successfully applied to omics data, such as gene expression data, for classification or regression and to select variables that are important for prediction. However, the complex relationships between predictor variables, in particular between causal predictor variables, make the interpretation of currently applied variable selection techniques difficult.
Results: Here we propose a new variable selection approach called surrogate minimal depth (SMD) that incorporates surrogate variables into the concept of minimal depth (MD) variable importance. Applying SMD, we show that simulated correlation patterns can be reconstructed and that the increased consideration of variable relationships improves variable selection. When compared with existing state-of-the-art methods and MD, SMD has higher empirical power to identify causal variables while the resulting variable lists are equally stable. In conclusion, SMD is a promising approach to get more insight into the complex interplay of predictor variables and outcome in a high-dimensional data setting.
Availability and implementation: https://github.com/StephanSeifert/SurrogateMinimalDepth.
Supplementary information: Supplementary data are available at Bioinformatics online.
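As a rough illustration of the minimal depth (MD) idea that SMD builds on, the sketch below finds, for each variable, the depth of the node closest to the root that splits on it. It covers plain MD in a single tree only; the surrogate-variable step is not reproduced, because the abstract does not describe it in enough detail. The nested-dict tree layout and the variable names are hypothetical.

```python
def minimal_depth(tree, depth=0, found=None):
    """Map each split variable to the smallest depth at which it is used.

    The root node has depth 0. A leaf is represented by None.
    """
    if found is None:
        found = {}
    if tree is None:  # leaf: nothing to record
        return found
    var = tree["var"]
    # Keep the shallowest depth seen so far for this variable.
    found[var] = min(found.get(var, depth), depth)
    minimal_depth(tree.get("left"), depth + 1, found)
    minimal_depth(tree.get("right"), depth + 1, found)
    return found


# Hypothetical toy tree: gene_b splits at depth 1 and again at depth 2.
toy_tree = {
    "var": "gene_a",
    "left": {"var": "gene_b", "left": None, "right": None},
    "right": {
        "var": "gene_c",
        "left": {"var": "gene_b", "left": None, "right": None},
        "right": None,
    },
}

depths = minimal_depth(toy_tree)
print(depths)
```

Variables that first appear near the root get small values, which is read as higher importance; in a forest, the per-tree values would typically be averaged.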


2013 ◽  
Vol 842 ◽  
pp. 746-749
Author(s):  
Bo Yang ◽  
Liang Zhang

This paper proposes a novel sparse weighted LSSVM classifier based on Suykens' weighted LSSVM. Unlike Suykens' weighted LSSVM, the proposed weighting method is better suited to classification. The distance between a sample and the classification boundary is used as the sample importance measure. Based on this importance measure, a new weight function is designed that allows the sparseness of the weights to be adjusted. To address the class imbalance problem, a normalized weight calculation method is also proposed. Finally, the proposed method is applied to digit recognition. Comparative experiments show that the proposed sparse weighted LSSVM effectively improves recognition accuracy.


2010 ◽  
Vol 42 (02) ◽  
pp. 577-604 ◽  
Author(s):  
Yana Volkovich ◽  
Nelly Litvak

PageRank with personalization is used in Web search as an importance measure for Web documents. The goal of this paper is to characterize the tail behavior of the PageRank distribution in the Web and other complex networks characterized by power laws. To this end, we model the PageRank as a solution of a stochastic equation in which the R_i are distributed as R. This equation is inspired by the original definition of the PageRank. In particular, N models the number of incoming links to a page, and B stands for the user preference. Assuming that N or B are heavy tailed, we employ the theory of regular variation to obtain the asymptotic behavior of R under quite general assumptions on the involved random variables. Our theoretical predictions show good agreement with experimental data.
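For readers unfamiliar with PageRank with personalization, the sketch below shows a standard power iteration with a preference vector on a hypothetical three-page graph. It does not reproduce the paper's stochastic equation or its tail analysis. The damping factor of 0.85, the graph, the preference values, and the handling of pages without out-links are assumptions chosen for illustration.

```python
def personalized_pagerank(links, preference, damping=0.85, iters=100):
    """Power iteration for PageRank with a personalization (preference) vector.

    links: dict mapping each page to the list of pages it links to.
    preference: dict of non-negative user-preference weights summing to 1.
    """
    nodes = list(links)
    rank = {v: preference[v] for v in nodes}
    for _ in range(iters):
        # Each page starts with its share of the preference mass.
        new = {v: (1 - damping) * preference[v] for v in nodes}
        for v in nodes:
            out = links[v]
            if out:
                # Split the damped rank of v evenly over its out-links.
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # Assumption: a page without out-links spreads its rank
                # according to the preference vector.
                for w in nodes:
                    new[w] += damping * rank[v] * preference[w]
        rank = new
    return rank


# Hypothetical toy graph and user preference.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
preference = {"a": 0.5, "b": 0.25, "c": 0.25}

ranks = personalized_pagerank(links, preference)
print(ranks)
```

In this picture, the number of in-links of a page plays the role of N in the abstract, and the preference term plays the role of B.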


Author(s):  
Tang Zhangchun ◽  
Lu Zhenzhou ◽  
Pan Wang ◽  
Zhang Feng

Based on the entropy of the uncertain variable, a novel importance measure is proposed to identify the effect of uncertain variables on a system that is subject to a combination of random variables and fuzzy variables. For a system with a mixture of random and fuzzy variables, the membership function of the failure probability is first obtained by uncertainty propagation theory. The effect of each input variable on the output response of the system is then evaluated by measuring the shift between the entropies of two membership functions of the failure probability, obtained before and after the uncertainty of that input variable is eliminated. The interaction effect of multiple input variables can be calculated by a similar measure. The mathematical properties of the proposed global sensitivity indicators are investigated and proved in detail. A simple example is first used to demonstrate the procedure for computing the proposed global sensitivity indicators, and the influential variables of four practical applications are then identified with them.

