scholarly journals Selecting Genes by Test Statistics

2005 ◽  
Vol 2005 (2) ◽  
pp. 132-138 ◽  
Author(s):  
Dechang Chen ◽  
Zhenqiu Liu ◽  
Xiaobin Ma ◽  
Dong Hua

Gene selection is an important issue in analyzing multiclass microarray data. Among many proposed selection methods, the traditional ANOVA F test statistic has been employed to identify informative genes for both class prediction (classification) and discovery problems. However, the F test statistic assumes an equal variance. This assumption may not be realistic for gene expression data. This paper explores other alternative test statistics which can handle heterogeneity of the variances. We study five such test statistics, which include Brown-Forsythe test statistic and Welch test statistic. Their performance is evaluated and compared with that of F statistic over different classification methods applied to publicly available microarray datasets.

Author(s):  
Samarendra Das ◽  
Shesh N. Rai

Selection of biologically relevant genes from high dimensional expression data is a key research problem in gene expression genomics. Most of the available gene selection methods are either based on relevancy or redundancy measure, which are usually adjudged through post selection classification accuracy. Through these methods the ranking of genes was done on a single high-dimensional expression data, which leads to the selection of spuriously associated and redundant genes. Hence, we developed a statistical approach through combining Support Vector Machine with Maximum Relevance and Minimum Redundancy under a sound statistical setup for the selection of biologically relevant genes. Here, the genes are selected through statistical significance values computed using a non-parametric test statistic under a bootstrap based subject sampling model. Further, a systematic and rigorous evaluation of the proposed approach with nine existing competitive methods was carried on six different real crop gene expression datasets. This performance analysis was carried out under three comparison settings, i.e. subject classification, biological relevant criteria based on quantitative trait loci, and gene ontology. Our analytical results showed that the proposed approach selects genes that are more biologically relevant as compared to the existing methods. Moreover, the proposed approach was also found to be better with respect to the competitive existing methods. The proposed statistical approach provides a framework for combining filter, and wrapper methods of gene selection.


Entropy ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. 1205
Author(s):  
Samarendra Das ◽  
Shesh N. Rai

Selection of biologically relevant genes from high-dimensional expression data is a key research problem in gene expression genomics. Most of the available gene selection methods are either based on relevancy or redundancy measure, which are usually adjudged through post selection classification accuracy. Through these methods the ranking of genes was conducted on a single high-dimensional expression data, which led to the selection of spuriously associated and redundant genes. Hence, we developed a statistical approach through combining a support vector machine with Maximum Relevance and Minimum Redundancy under a sound statistical setup for the selection of biologically relevant genes. Here, the genes were selected through statistical significance values and computed using a nonparametric test statistic under a bootstrap-based subject sampling model. Further, a systematic and rigorous evaluation of the proposed approach with nine existing competitive methods was carried on six different real crop gene expression datasets. This performance analysis was carried out under three comparison settings, i.e., subject classification, biological relevant criteria based on quantitative trait loci and gene ontology. Our analytical results showed that the proposed approach selects genes which are more biologically relevant as compared to the existing methods. Moreover, the proposed approach was also found to be better with respect to the competitive existing methods. The proposed statistical approach provides a framework for combining filter and wrapper methods of gene selection.


Author(s):  
Anna L Tyler ◽  
Baha El Kassaby ◽  
Georgi Kolishovski ◽  
Jake Emerson ◽  
Ann E Wells ◽  
...  

Abstract It is well understood that variation in relatedness among individuals, or kinship, can lead to false genetic associations. Multiple methods have been developed to adjust for kinship while maintaining power to detect true associations. However, relatively unstudied, are the effects of kinship on genetic interaction test statistics. Here we performed a survey of kinship effects on studies of six commonly used mouse populations. We measured inflation of main effect test statistics, genetic interaction test statistics, and interaction test statistics reparametrized by the Combined Analysis of Pleiotropy and Epistasis (CAPE). We also performed linear mixed model (LMM) kinship corrections using two types of kinship matrix: an overall kinship matrix calculated from the full set of genotyped markers, and a reduced kinship matrix, which left out markers on the chromosome(s) being tested. We found that test statistic inflation varied across populations and was driven largely by linkage disequilibrium. In contrast, there was no observable inflation in the genetic interaction test statistics. CAPE statistics were inflated at a level in between that of the main effects and the interaction effects. The overall kinship matrix overcorrected the inflation of main effect statistics relative to the reduced kinship matrix. The two types of kinship matrices had similar effects on the interaction statistics and CAPE statistics, although the overall kinship matrix trended toward a more severe correction. In conclusion, we recommend using a LMM kinship correction for both main effects and genetic interactions and further recommend that the kinship matrix be calculated from a reduced set of markers in which the chromosomes being tested are omitted from the calculation. This is particularly important in populations with substantial population structure, such as recombinant inbred lines in which genomic replicates are used.


2018 ◽  
Vol 14 (6) ◽  
pp. 868-880 ◽  
Author(s):  
Shilan S. Hameed ◽  
Fahmi F. Muhammad ◽  
Rohayanti Hassan ◽  
Faisal Saeed

Author(s):  
Lingtao Kong

The exponential distribution has been widely used in engineering, social and biological sciences. In this paper, we propose a new goodness-of-fit test for fuzzy exponentiality using α-pessimistic value. The test statistics is established based on Kullback-Leibler information. By using Monte Carlo method, we obtain the empirical critical points of the test statistic at four different significant levels. To evaluate the performance of the proposed test, we compare it with four commonly used tests through some simulations. Experimental studies show that the proposed test has higher power than other tests in most cases. In particular, for the uniform and linear failure rate alternatives, our method has the best performance. A real data example is investigated to show the application of our test.


2018 ◽  
Vol 8 (9) ◽  
pp. 1569 ◽  
Author(s):  
Shengbing Wu ◽  
Hongkun Jiang ◽  
Haiwei Shen ◽  
Ziyi Yang

In recent years, gene selection for cancer classification based on the expression of a small number of gene biomarkers has been the subject of much research in genetics and molecular biology. The successful identification of gene biomarkers will help in the classification of different types of cancer and improve the prediction accuracy. Recently, regularized logistic regression using the L 1 regularization has been successfully applied in high-dimensional cancer classification to tackle both the estimation of gene coefficients and the simultaneous performance of gene selection. However, the L 1 has a biased gene selection and dose not have the oracle property. To address these problems, we investigate L 1 / 2 regularized logistic regression for gene selection in cancer classification. Experimental results on three DNA microarray datasets demonstrate that our proposed method outperforms other commonly used sparse methods ( L 1 and L E N ) in terms of classification performance.


2021 ◽  
Author(s):  
Ronald J Yurko ◽  
Kathryn Roeder ◽  
Bernie Devlin ◽  
Max G'Sell

In genome-wide association studies (GWAS), it has become commonplace to test millions of SNPs for phenotypic association. Gene-based testing can improve power to detect weak signal by reducing multiple testing and pooling signal strength. While such tests account for linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD of SNPs falling in different nearby genes, which can induce correlation of gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of others, it is assessed separately; when test statistics for nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of weakly powered GWAS for autism spectrum disorder, which is contrasted to more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive p-value thresholding (AdaPT), guided by high-dimensional metadata modeled with gradient boosted trees, highlighting when and how it can be most useful. Notably our workflow is based on summary statistics.


2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Putri W. Novianti ◽  
Victor L. Jong ◽  
Kit C. B. Roes ◽  
Marinus J. C. Eijkemans

2021 ◽  
Vol 20 (2) ◽  
pp. 51-60
Author(s):  
A.O. Abidoye ◽  
W.A. Lamidi ◽  
M.O. Alabi ◽  
J. Popoola

In this paper, we are interested in comparing the conventional t –test with the proposed t – test for testing equality of means with unequal and equal variances. Here, we proposed harmonic mean of variances as an alternative to the pooled sample variance when there is heterogeneity of variances. Two sets of secondary data were obtained from Agricultural Development Project (KWADP) and the Ministry of Agriculture in Ilorin, Kwara State to demonstrate the two test statistics used and the results show that the proposed t – test statistic is found to be appropriate than the conventional t – test statistic when we have unequal variances but the conventional t – test perform better when we have equal variances.


Sign in / Sign up

Export Citation Format

Share Document