Introducing a Rule Importance Measure

Abstract Ageing is a major risk factor for many conditions, including cancer, cardiovascular and neurodegenerative diseases. Pharmaceutical interventions that slow down ageing and delay the onset of age-related diseases are a growing research area. The aim of this study was to build a machine learning model based on data from the DrugAge database to predict whether a chemical compound will extend the lifespan of Caenorhabditis elegans. Five predictive models were built using the random forest algorithm with molecular fingerprints and/or molecular descriptors as features. The best-performing classifier, built using molecular descriptors, achieved an area under the curve (AUC) score of 0.815 for classifying the compounds in the test set. The features of the model were ranked using the Gini importance measure of the random forest algorithm. The top 30 features included descriptors related to atom and bond counts, topological properties and partial charge properties. The model was applied to predict the class of compounds in an external database consisting of 1738 small molecules. The compounds in the screening database with a predicted probability of ≥ 0.80 for increasing the lifespan of Caenorhabditis elegans were broadly separated into (1) flavonoids, (2) fatty acids and conjugates, and (3) organooxygen compounds.
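
The Gini-based feature ranking mentioned in the abstract above can be illustrated with a minimal sketch. This is not the study's pipeline: it scores each feature by the best single-split Gini impurity decrease on toy data, whereas a random forest accumulates such decreases over all splits in all trees. The feature names `atom_count` and `noise` and the data values are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(values, labels, threshold):
    """Gini decrease obtained by splitting the samples at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted

def best_split_decrease(values, labels):
    """Largest impurity decrease over all candidate thresholds of one feature."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split anything
    return max((impurity_decrease(values, labels, t) for t in candidates), default=0.0)

# Hypothetical toy data: two descriptors and a binary class label.
features = {
    "atom_count": [1, 2, 3, 10, 11, 12],
    "noise": [5, 1, 4, 2, 6, 3],
}
labels = [0, 0, 0, 1, 1, 1]

# Rank features from most to least informative under this simplified score.
ranking = sorted(
    features,
    key=lambda name: best_split_decrease(features[name], labels),
    reverse=True,
)
```

Here `atom_count` separates the two classes perfectly and is therefore ranked first; in a real forest the per-split decreases would additionally be summed over nodes and averaged over trees.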

Surrogate minimal depth as an importance measure for variables in random forests

Bioinformatics ◽ 10.1093/bioinformatics/btz149 ◽ 2019 ◽ Vol 35 (19) ◽ pp. 3663-3671 ◽ Cited By ~ 6

Author(s): Stephan Seifert ◽ Sven Gundlach ◽ Silke Szymczak

Keyword(s): Variable Selection ◽ Supplementary Information ◽ Predictor Variables ◽ Importance Measure ◽ Surrogate Variables ◽ Machine Learning Approach ◽ Complex Relationships ◽ Minimal Depth ◽ Insight Into ◽ Causal Variables

Abstract Motivation: It has been shown that the machine learning approach random forest can be successfully applied to omics data, such as gene expression data, for classification or regression and to select variables that are important for prediction. However, the complex relationships between predictor variables, in particular between causal predictor variables, make the interpretation of currently applied variable selection techniques difficult. Results: Here we propose a new variable selection approach called surrogate minimal depth (SMD) that incorporates surrogate variables into the concept of minimal depth (MD) variable importance. Applying SMD, we show that simulated correlation patterns can be reconstructed and that the increased consideration of variable relationships improves variable selection. When compared with existing state-of-the-art methods and MD, SMD has higher empirical power to identify causal variables while the resulting variable lists are equally stable. In conclusion, SMD is a promising approach to get more insight into the complex interplay of predictor variables and outcome in a high-dimensional data setting. Availability and implementation: https://github.com/StephanSeifert/SurrogateMinimalDepth. Supplementary information: Supplementary data are available at Bioinformatics online.
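
The idea of minimal depth can be sketched as follows: a variable's minimal depth in a tree is the depth of the shallowest node that splits on it. The sketch below is an illustration only and not the authors' implementation. The tree layout (nested dicts with the keys `var`, `surrogates`, `left`, `right`) and the `use_surrogates` flag are assumptions made for this example; counting surrogate variables at the depth of their node is one simplified reading of how surrogates enter the measure.

```python
from collections import deque

def minimal_depths(tree, use_surrogates=False):
    """Map each variable to the depth of its shallowest appearance in `tree`.

    A node is a dict {"var": str, "surrogates": [str, ...],
    "left": node_or_None, "right": node_or_None}; None marks a leaf.
    """
    depths = {}
    queue = deque([(tree, 0)])
    while queue:
        node, depth = queue.popleft()
        if node is None:
            continue
        names = [node["var"]]
        if use_surrogates:
            names += node.get("surrogates", [])
        for name in names:
            # Breadth-first order guarantees the first visit is the shallowest.
            depths.setdefault(name, depth)
        queue.append((node.get("left"), depth + 1))
        queue.append((node.get("right"), depth + 1))
    return depths

# Hypothetical toy tree: x1 at the root (with x3 as a surrogate), x2 and x3 below.
toy_tree = {
    "var": "x1",
    "surrogates": ["x3"],
    "left": {"var": "x2", "surrogates": [], "left": None, "right": None},
    "right": {"var": "x3", "surrogates": [], "left": None, "right": None},
}

plain = minimal_depths(toy_tree)
with_surrogates = minimal_depths(toy_tree, use_surrogates=True)
```

In this toy tree, `x3` has depth 1 when only primary splits are counted and depth 0 once the root's surrogate is taken into account, which shows how a correlated variable can move up the ranking. Across a forest, such depths would typically be averaged over trees.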

A Novel Sparse Weighted Least Squares Support Vector Classifier

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.842.746 ◽ 2013 ◽ Vol 842 ◽ pp. 746-749

Author(s): Bo Yang ◽ Liang Zhang

Keyword(s): Least Squares ◽ Weighted Least Squares ◽ Importance Measure ◽ Support Vector ◽ Comparative Experiment ◽ Digit Recognition ◽ Calculating Method ◽ Imbalance Problem ◽ Support Vector Classifier

This paper proposes a novel sparse weighted LSSVM classifier based on Suykens' weighted LSSVM. Unlike Suykens' weighted LSSVM, the proposed weighting method is better suited to classification. The distance between a sample and the classification border is used as the sample importance measure. Based on this importance measure, a new weight-calculating function is designed that can adjust the sparseness of the weights. To address the class imbalance problem, a method for calculating normalized weights is proposed. Finally, the proposed method is applied to digit recognition. Comparative experimental results show that the proposed sparse weighted LSSVM effectively improves recognition accuracy.

Development and evaluation of an uncertainty importance measure in fault tree analysis

Reliability Engineering & System Safety ◽ 10.1016/s0951-8320(97)00024-0 ◽ 1997 ◽ Vol 57 (2) ◽ pp. 143-157 ◽ Cited By ~ 10

Author(s): Jae-Gyeun Cho ◽ Bong-Jin Yum

Keyword(s): Fault Tree ◽ Fault Tree Analysis ◽ Importance Measure ◽ Tree Analysis ◽ Uncertainty Importance

A Hellinger-Based Importance Measure of Association Rules for Classification Learning

International Journal of Intelligent Systems ◽ 10.1002/int.21664 ◽ 2014 ◽ Vol 29 (9) ◽ pp. 807-822 ◽ Cited By ~ 1

Author(s): Chang-Hwan Lee

Keyword(s): Association Rules ◽ Importance Measure ◽ Classification Learning ◽ Measure Of Association

Moment-independent importance measure of correlated input variable and its state dependent parameter solution

Aerospace Science and Technology ◽ 10.1016/j.ast.2015.11.019 ◽ 2016 ◽ Vol 48 ◽ pp. 281-290 ◽ Cited By ~ 7

Author(s): Luyi Li ◽ Zhenzhou Lu ◽ Chao Chen

Keyword(s): Importance Measure ◽ State Dependent ◽ Dependent Parameter

A Moment Independent Based Importance Measure with Hybrid Uncertainty

Communications in Computer and Information Science - Modeling, Design and Simulation of Systems ◽ 10.1007/978-981-10-6463-0_19 ◽ 2017 ◽ pp. 213-224

Author(s): Xiaobing Shang ◽ Tao Chao ◽ Ping Ma

Keyword(s): Importance Measure ◽ Hybrid Uncertainty

Asymptotic analysis for personalized Web search

Advances in Applied Probability ◽ 10.1017/s0001867800004201 ◽ 2010 ◽ Vol 42 (02) ◽ pp. 577-604 ◽ Cited By ~ 13

Author(s): Yana Volkovich ◽ Nelly Litvak

Keyword(s): Web Search ◽ Power Laws ◽ User Preference ◽ Importance Measure ◽ Web Documents ◽ Original Definition ◽ Theoretical Predictions ◽ Heavy Tailed ◽ Definition Of ◽ Good Agreement

PageRank with personalization is used in Web search as an importance measure for Web documents. The goal of this paper is to characterize the tail behavior of the PageRank distribution in the Web and other complex networks characterized by power laws. To this end, we model the PageRank as a solution of a stochastic equation in which the R_i are distributed as R. This equation is inspired by the original definition of the PageRank. In particular, N models the number of incoming links to a page, and B stands for the user preference. Assuming that N or B are heavy tailed, we employ the theory of regular variation to obtain the asymptotic behavior of R under quite general assumptions on the involved random variables. Our theoretical predictions show good agreement with experimental data.
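
For readers unfamiliar with PageRank with personalization, the following minimal sketch computes it by power iteration on a tiny graph. It illustrates the deterministic ranking scheme only and is not the paper's stochastic model. The function name, the damping factor of 0.85, the iteration count, and the handling of dangling nodes (their rank is redistributed according to the preference vector) are assumptions chosen for this example.

```python
def personalized_pagerank(links, preference, damping=0.85, iterations=100):
    """Power iteration for PageRank with a personalization (preference) vector.

    links: dict mapping every node to a list of the nodes it links to.
    preference: dict mapping every node to a probability; values sum to 1.
    """
    nodes = list(links)
    rank = dict(preference)  # start from the preference distribution
    for _ in range(iterations):
        # Teleportation part: jump to a node according to the user preference.
        new_rank = {v: (1.0 - damping) * preference[v] for v in nodes}
        for u in nodes:
            out = links[u]
            if out:
                # Spread the rank of u evenly over its outgoing links.
                share = damping * rank[u] / len(out)
                for v in out:
                    new_rank[v] += share
            else:
                # Dangling node: redistribute its rank by preference.
                for v in nodes:
                    new_rank[v] += damping * rank[u] * preference[v]
        rank = new_rank
    return rank

# Hypothetical three-page graph: a and b link to each other, c links to a.
toy_links = {"a": ["b"], "b": ["a"], "c": ["a"]}
uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
scores = personalized_pagerank(toy_links, uniform)
```

Page `c` has no incoming links, so its score comes from the preference term alone, while `a` collects rank from both `b` and `c`. Every node must appear as a key in `links`, even if its list of outgoing links is empty.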

An entropy-based global sensitivity analysis for the structures with both fuzzy variables and random variables

Proceedings of the Institution of Mechanical Engineers Part C Journal of Mechanical Engineering Science ◽ 10.1177/0954406212448575 ◽ 2012 ◽ Vol 227 (2) ◽ pp. 195-212 ◽ Cited By ~ 5

Author(s): Tang Zhangchun ◽ Lu Zhenzhou ◽ Pan Wang ◽ Zhang Feng

Keyword(s): Failure Probability ◽ Uncertainty Propagation ◽ Random Variables ◽ Uncertain Variable ◽ Importance Measure ◽ Similar Measure ◽ Practical Applications ◽ Fuzzy Variables ◽ Global Sensitivity ◽ Uncertain Variables

Based on the entropy of the uncertain variable, a novel importance measure is proposed to identify the effect of uncertain variables on a system that is subject to a combination of random variables and fuzzy variables. For a system with a mixture of random and fuzzy variables, the membership function of the failure probability is first obtained by uncertainty propagation theory. The effect of each input variable on the output response of the system is then evaluated by measuring the shift between the entropies of two membership functions of the failure probability, obtained before and after the uncertainty of the input variable is eliminated. The joint effect of multiple input variables can be calculated by a similar measure. The mathematical properties of the proposed global sensitivity indicators are investigated and proved in detail. A simple example first demonstrates the procedure for computing the proposed global sensitivity indicators, and the influential variables of four practical applications are then identified with them.
