Learning and Feature Selection Using the Set Covering Machine with Data-Dependent Rays on Gene Expression Profiles

A Robust Gene Selection Method for Microarray-based Cancer Classification

Cancer Informatics, Vol 9, pp. CIN.S3794, 2010
DOI: 10.4137/cin.s3794
Cited By ~ 21

Author(s): Xiaosheng Wang, Osamu Gotoh

Keyword(s): Gene Expression, Feature Selection, Gene Selection, Information Gain, Expression Profiles, Feature Selection Method, Gene Expression Profiles, Molecular Classification, Selection Method, Chi Square

Gene selection is of vital importance in the molecular classification of cancer using high-dimensional gene expression data. Because specific cancerous gene expression profiles have distinct inherent characteristics, developing flexible and robust feature selection methods is crucial. We investigated the properties of a feature selection approach proposed in our previous work, which generalizes the feature selection method based on the depended degree of attribute in rough sets. We compared this method with established methods (the depended degree, chi-square, information gain, Relief-F and symmetric uncertainty) and analyzed its properties through a series of classification experiments. The results revealed that our method was more robust and more widely applicable than the canonical method based on the depended degree of attribute, and that it was comparable to the other four commonly used methods. More importantly, the method can reveal the inherent classification difficulty of different gene expression datasets, which reflects the underlying biology of specific cancers.
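
The rough-set criterion described above can be sketched for a single discretized gene. This is a minimal sketch under stated assumptions, not the authors' exact definition: the canonical depended degree is taken as the fraction of samples whose equivalence class (samples sharing the same discretized gene value) is pure, i.e. contains only one class label. The generalization is assumed here, for illustration only, to relax purity to a threshold `alpha` on the majority-label fraction, in the spirit of variable-precision rough sets. The function name `alpha_depended_degree` is hypothetical.

```python
from collections import Counter, defaultdict


def alpha_depended_degree(gene_values, labels, alpha=1.0):
    """Fraction of samples lying in 'alpha-consistent' equivalence classes.

    gene_values: discretized expression levels of one gene, one per sample.
    labels: class label of each sample.
    alpha=1.0 gives the canonical rough-set depended degree: an equivalence
    class counts only if all of its samples share one label.
    alpha<1.0 is an ASSUMED relaxation: a class counts if its majority label
    covers at least an alpha fraction of its samples.
    """
    if len(gene_values) != len(labels):
        raise ValueError("gene_values and labels must have the same length")
    # Group samples into equivalence classes by their discretized gene value.
    blocks = defaultdict(list)
    for value, label in zip(gene_values, labels):
        blocks[value].append(label)
    positive = 0
    for block_labels in blocks.values():
        majority = Counter(block_labels).most_common(1)[0][1]
        if majority / len(block_labels) >= alpha:
            positive += len(block_labels)  # whole block joins the positive region
    return positive / len(labels)
```

For example, with gene values `["low", "low", "high", "high", "high", "mid"]` and labels `["A", "A", "B", "B", "A", "B"]`, the mixed `"high"` block is excluded at `alpha=1.0` (degree 0.5) but included at `alpha=0.6` (degree 1.0), which shows how a relaxed threshold keeps noisy but informative genes from scoring zero.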

Improved Feature Selection by Incorporating Gene Similarity into the LASSO

International Journal of Knowledge Discovery in Bioinformatics, Vol 3 (1), pp. 1-22, 2012
DOI: 10.4018/jkdb.2012010101
Cited By ~ 1

Author(s): Christopher E. Gillies, Xiaoli Gao, Nilesh V. Patel, Mohammad-Reza Siadat, George D. Wilson

Keyword(s): Gene Expression, Feature Selection, Personalized Medicine, Objective Function, Expression Profiles, Gene Expression Profiles, Genetic Profile, Data Set, Coordinate Descent Algorithm, Gene Similarity

Personalized medicine tailors treatment to a patient's genetic profile and has the potential to revolutionize medical practice. Gene expression profiling is an important process in personalized medicine. Analyzing gene expression profiles is difficult because there are usually few patients and thousands of genes, which leads to the curse of dimensionality. To combat this problem, researchers have suggested using prior knowledge to enhance feature selection for supervised learning algorithms. The authors propose an enhancement to the LASSO, a shrinkage and selection technique that induces parameter sparsity by penalizing a model's objective function. Their enhancement favors the selection of genes that are involved in similar biological processes: the modified LASSO selects similar genes by penalizing interaction terms between genes. They devise a coordinate descent algorithm to minimize the corresponding objective function. To evaluate the method, the authors created simulated data on which they compared their model with the standard LASSO model and an interaction LASSO model. Given a reasonable number of training samples, their model outperformed both the standard and the interaction LASSO models in detecting important genes and gene interactions. They also demonstrated the performance of their method on a real gene expression dataset from lung cancer cell lines.
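
For orientation, the coordinate descent idea can be illustrated on the standard LASSO objective (1/(2n))·||y - Xβ||² + λ·||β||₁, where each coefficient is updated in turn by soft-thresholding its partial-residual correlation. This is a minimal sketch of the plain LASSO only: it does not include the authors' gene-similarity interaction penalty, and the names `soft_threshold` and `lasso_coordinate_descent` are illustrative.

```python
import math


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return math.copysign(max(abs(z) - gamma, 0.0), z)


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: list of n rows, each a list of p feature values (e.g. gene expressions).
    y: list of n responses. Assumes no column of X is all zeros.
    This is the standard LASSO, not the similarity-penalized variant.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X beta, with beta = 0 initially
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual after adding back
            # feature j's current contribution (the partial residual).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
            beta[j] = new_bj
    return beta
```

With three orthogonal ±1 columns and y = 2·x₀ + 0.05·x₁, `lam=0.1` shrinks the first coefficient to about 1.9 and sets the weak and irrelevant coefficients exactly to zero, which is the sparsity the LASSO is used for in gene selection.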

Kidney transplant classification with gene expression profiles using L1 feature selection ensemble classifier based on data clustering

2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 2017
DOI: 10.1109/icacsis.2017.8355040

Author(s): M Octaviano Pratama

Keyword(s): Gene Expression, Feature Selection, Kidney Transplant, Data Clustering, Expression Profiles, Gene Expression Profiles, Ensemble Classifier

Correlation-based Gene Selection and Classification Using Taguchi-BPSO

Methods of Information in Medicine, Vol 49 (03), pp. 254-268, 2010
DOI: 10.3414/me09-01-0010
Cited By ~ 10

Author(s): C.-S. Yang, K.-C. Wu, C.-H. Yang, L.-Y. Chuang

Keyword(s): Gene Expression, Feature Selection, Microarray Data, Error Rate, Gene Expression Analysis, Gene Selection, Expression Profiles, Gene Expression Profiles, Classification Error, Classification Error Rate

Background: Microarray gene expression profiles have yielded valuable results on a variety of problems and contributed to advances in clinical medicine. Microarray data characteristically have high dimensionality and a small sample size, which makes it difficult for general classification methods to classify samples correctly. However, not every gene is relevant for distinguishing sample classes. Thus, to analyze gene expression profiles correctly, feature (gene) selection is crucial for the classification process, and an effective gene extraction method is necessary to eliminate irrelevant genes and decrease the classification error rate.
Objective: The purpose of gene expression analysis is to discriminate between classes of samples and to predict the relative importance of each gene for sample classification.
Method: In this paper, correlation-based feature selection (CFS) and Taguchi-binary particle swarm optimization (TBPSO) were combined into a hybrid method, and the K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) served as the classifier for ten gene expression datasets.
Results: Experimental results show that this hybrid method effectively simplifies feature selection by reducing the number of features needed. The proposed method obtained the lowest classification error rate on all ten gene expression datasets tested, and it reached an error rate of zero on six of them.
Conclusion: The introduced method outperformed five other methods from the literature in terms of classification error rate. It could thus constitute a valuable tool for gene expression analysis in future studies.
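
Two ingredients named in the Method section can be sketched concretely. The CFS merit below uses the standard correlation-based heuristic Merit = k·r_cf / sqrt(k + k(k-1)·r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature inter-correlation. The second function computes the LOOCV error of a K-NN classifier restricted to a binary feature mask, the kind of fitness a binary PSO particle could be scored with. Both are minimal sketches under these assumptions; the Taguchi-BPSO search itself is not reproduced, correlations are assumed to be precomputed absolute values, and all function names are illustrative.

```python
import math
from collections import Counter


def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit of a feature subset (higher is better).

    feature_class_corr: k absolute correlations between each selected feature and the class.
    feature_feature_corr: k x k matrix of absolute correlations among the selected features.
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    r_cf = sum(feature_class_corr) / k
    pairs = [feature_feature_corr[i][j] for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def knn_loocv_error(X, y, mask, k=1):
    """Leave-one-out error of a K-NN classifier using only features where mask[j] == 1."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        raise ValueError("mask selects no features")
    errors = 0
    for i in range(len(X)):
        # Squared Euclidean distance from held-out sample i to every other sample;
        # ties in distance are broken by sample index.
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in selected), m)
            for m in range(len(X)) if m != i
        )
        votes = Counter(y[m] for _, m in dists[:k])
        predicted = votes.most_common(1)[0][0]
        errors += predicted != y[i]
    return errors / len(X)
```

The merit rewards subsets whose features correlate with the class but not with each other: two features with class correlation 0.8 score about 1.13 when uncorrelated but only 0.8 when perfectly redundant. In a binary PSO, each particle's bit vector would be passed as `mask` to a fitness such as `knn_loocv_error`.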

A review of feature selection techniques via gene expression profiles

2008 International Symposium on Information Technology, 2008
DOI: 10.1109/itsim.2008.4631678
Cited By ~ 10

Author(s): Farzana Kabir Ahmad, Norita Md. Norwawi, Safaai Deris, Nor Hayati Othman

Keyword(s): Gene Expression, Feature Selection, Expression Profiles, Gene Expression Profiles, Feature Selection Techniques

Chaotic Harmony Search based Multi-objective Feature Selection for Classification of Gene Expression Profiles

2021 IEEE 9th International Conference on Bioinformatics and Computational Biology (ICBCB), 2021
DOI: 10.1109/icbcb52223.2021.9459222

Author(s): Aiguo Wang, Huancheng Liu, Guilin Chen

Keyword(s): Gene Expression, Feature Selection, Expression Profiles, Harmony Search, Gene Expression Profiles, Multi Objective, Selection For

Side effect prediction based on drug-induced gene expression profiles and random forest with iterative feature selection

The Pharmacogenomics Journal, 2021
DOI: 10.1038/s41397-021-00246-4

Author(s): Arzu Cakir, Melisa Tuncer, Hilal Taymaz-Nikerel, Ozlem Ulucan

Keyword(s): Gene Expression, Feature Selection, Random Forest, Side Effect, Expression Profiles, Gene Expression Profiles, Drug Induced

Machine Learning Techniques and Chi-Square Feature Selection for Cancer Classification Using SAGE Gene Expression Profiles

Lecture Notes in Computer Science - Data Mining for Biomedical Applications, pp. 106-115, 2006
DOI: 10.1007/11691730_11
Cited By ~ 58

Author(s): Xin Jin, Anbang Xu, Rongfang Bie, Ping Guo

Keyword(s): Gene Expression, Machine Learning, Feature Selection, Expression Profiles, Gene Expression Profiles, Cancer Classification, Machine Learning Techniques, Chi Square, Learning Techniques, Selection For

Identification of Pan-Cancer Biomarkers Based on the Gene Expression Profiles of Cancer Cell Lines

Frontiers in Cell and Developmental Biology, Vol 9, 2021
DOI: 10.3389/fcell.2021.781285

Author(s): ShiJian Ding, Hao Li, Yu-Hang Zhang, XianChao Zhou, KaiYan Feng, ...

Keyword(s): Gene Expression, Feature Selection, Cell Line, Cancer Patients, Cell Lines, Expression Profiles, Gene Expression Profiles, Cancer Biomarkers, Cancer Types, Pan Cancer

There are many types of cancer. Although they share some hallmarks, such as proliferation and metastasis, they differ in many respects; for example, they arise in different organs and tissues. Does each cancer type have a unique gene expression pattern that distinguishes it from other cancer types? Since The Cancer Genome Atlas (TCGA) project, pan-cancer studies have become increasingly common, and researchers aim to derive robust gene expression signatures from pan-cancer patient cohorts. However, heterogeneity causes large variance among cancer patients, so obtaining robust results would require a sample size too large to recruit. In this study, we tried another approach: using cell line data to reduce the variance and thereby obtain robust pan-cancer biomarkers. We applied several advanced computational methods to analyze the Cancer Cell Line Encyclopedia (CCLE) gene expression profiles, which included 988 cell lines from 20 cancer types. Two feature selection methods, Boruta and max-relevance and min-redundancy (mRMR), were applied to the cell line gene expression data in sequence, generating a feature list. This list was fed into the incremental feature selection (IFS) method, combined with a classification algorithm, to extract biomarkers and to construct optimal classifiers and decision rules. The optimal classifiers performed well and can be useful tools for identifying cell lines from different cancer types, while the biomarkers (e.g. NCKAP1, TNFRSF12A, LAMB2, FKBP9, PFN2, TOM1L1) and rules identified in this work may provide a meaningful and precise reference for differentiating multiple types of cancer and contribute to the personalized treatment of tumors.
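
The max-relevance and min-redundancy step can be illustrated with its common greedy difference form: start from the most relevant feature, then repeatedly add the feature whose relevance (mutual information with the class) minus its mean redundancy (mutual information with the already-selected features) is largest. The sketch below assumes these mutual-information values are precomputed; `mrmr_rank` is a hypothetical name, and the Boruta filtering that preceded mRMR in the study is not shown.

```python
def mrmr_rank(relevance, redundancy, n_select=None):
    """Greedy mRMR ranking (difference criterion) from precomputed scores.

    relevance[j]: mutual information between feature j and the class label.
    redundancy[i][j]: mutual information between features i and j (symmetric).
    Returns feature indices in the order they are selected.
    """
    p = len(relevance)
    n_select = p if n_select is None else min(n_select, p)
    if n_select == 0:
        return []
    # First pick: the single most relevant feature.
    selected = [max(range(p), key=lambda j: relevance[j])]
    remaining = set(range(p)) - set(selected)

    def score(j):
        # Relevance minus mean redundancy with the features chosen so far.
        mean_red = sum(redundancy[j][s] for s in selected) / len(selected)
        return relevance[j] - mean_red

    while len(selected) < n_select:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected
```

With relevances `[0.9, 0.8, 0.3]`, where features 0 and 1 are highly redundant (0.85) and feature 2 is nearly independent of both, the ranking is `[0, 2, 1]`: the weakly relevant but complementary feature 2 is promoted ahead of the redundant feature 1. The resulting ordered list is what an incremental feature selection step would then consume.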

Identification of the Gene Expression Rules That Define the Subtypes in Glioma

Journal of Clinical Medicine, Vol 7 (10), pp. 350, 2018
DOI: 10.3390/jcm7100350
Cited By ~ 24

Author(s): Yu-Dong Cai, Shiqi Zhang, Yu-Hang Zhang, Xiaoyong Pan, KaiYan Feng, ...

Keyword(s): Gene Expression, Feature Selection, Single Cell, Anaplastic Astrocytoma, Expression Profiles, Gene Expression Profiles, Diffuse Astrocytoma, Feature List, Cell Gene Expression, Cell Gene

Gliomas are common brain cancers derived from glial cells and include three subtypes: glioblastoma, diffuse astrocytoma, and anaplastic astrocytoma. The subtypes have distinctive clinical features but are closely related to each other: a glioblastoma can develop from an early-stage diffuse astrocytoma, which can also transform into an anaplastic astrocytoma. Because these dynamic processes are complex, single-cell gene expression profiles are extremely helpful for understanding what defines each subtype. We analyzed the single-cell gene expression profiles of 5057 cells from anaplastic astrocytoma tissues, 261 cells from diffuse astrocytoma tissues, and 1023 cells from glioblastoma tissues with advanced machine learning methods. Specifically, a powerful feature selection method, the Monte Carlo feature selection (MCFS) method, was adopted to analyze the gene expression profiles of the cells, resulting in a feature list. The incremental feature selection (IFS) method was then applied to this feature list, with the help of a support vector machine (SVM), to extract key features (genes) and construct an optimal SVM classifier. Several key biomarker genes, such as IGFBP2, IGF2BP3, PRDX1, NOV, NEFL, HOXA10, GNG12, SPRY4, and BCL11A, were identified. In addition, rules for classifying the three subtypes were produced by the Johnson reducer algorithm. We found, for example, that PRDX1 is highly expressed in diffuse astrocytoma but expressed at low levels in glioblastoma. These rules reveal the differences among the three subtypes and how they are formed and transformed. These genes are not only biomarkers for glioma subtypes but also potential drug targets that may switch the clinical features or even reverse tumor progression.
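
The incremental feature selection step described above can be sketched generically: given a ranked feature list (here, the MCFS output), evaluate a classifier on the top 1, top 2, ..., top N features and keep the prefix with the best score. In the study the score came from an SVM; in the sketch below `evaluate` is an assumed user-supplied callback (for example, cross-validated accuracy), `incremental_feature_selection` is a hypothetical name, and the toy scorer in the usage note is made up purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
    step: int = 1,
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Evaluate growing prefixes of a ranked feature list; return the best prefix.

    evaluate(features) should return a score where higher is better
    (e.g. cross-validated accuracy of a classifier trained on those features).
    Ties keep the smaller feature set, since it is found first.
    Returns (best feature subset, its score, IFS curve of (size, score) pairs).
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    curve = []  # (number of features, score) pairs, i.e. the IFS curve
    best_size, best_score = 0, float("-inf")
    for size in range(step, len(ranked_features) + 1, step):
        score = evaluate(list(ranked_features[:size]))
        curve.append((size, score))
        if score > best_score:
            best_size, best_score = size, score
    return list(ranked_features[:best_size]), best_score, curve
```

For example, with the hypothetical ranking `["g1", "g2", "g3", "g4"]` and a toy scorer that counts how many of `{"g1", "g3"}` are included minus a small penalty of 0.01 per feature, the best prefix is `["g1", "g2", "g3"]` with score 1.97. Returning the whole curve makes it easy to plot the IFS curve and inspect how performance changes as features are added.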
