Consistent gene signature of schizophrenia identified by a novel feature selection strategy from comprehensive sets of transcriptomic data

Qingxia Yang; Bo Li; Jing Tang; Xuejiao Cui; Yunxia Wang; Xiaofeng Li; Jie Hu; Yuzong Chen; Weiwei Xue; Yan Lou; Yunqing Qiu; Feng Zhu

doi:10.1093/bib/bbz049

Consistent gene signature of schizophrenia identified by a novel feature selection strategy from comprehensive sets of transcriptomic data

Briefings in Bioinformatics ◽

10.1093/bib/bbz049 ◽

2019 ◽

Vol 21 (3) ◽

pp. 1058-1068 ◽

Cited By ~ 16

Author(s):

Qingxia Yang ◽

Bo Li ◽

Jing Tang ◽

Xuejiao Cui ◽

Yunxia Wang ◽

...

Keyword(s):

Feature Selection ◽

Current Medical Research ◽

Gene Signature ◽

Selection Strategy ◽

Selection Methods ◽

Transcriptomic Data ◽

New Strategy ◽

Consensus Scoring ◽

Novel Strategy ◽

Gene Rank

Abstract The etiology of schizophrenia (SCZ) is regarded as one of the most fundamental puzzles in current medical research, and its diagnosis is limited by the lack of objective molecular criteria. Although plenty of studies were conducted, SCZ gene signatures identified by these independent studies are found highly inconsistent. As one of the most important factors contributing to this inconsistency, the feature selection methods used currently do not fully consider the reproducibility among the signatures discovered from different datasets. Therefore, it is crucial to develop new bioinformatics tools of novel strategy for ensuring a stable discovery of gene signature for SCZ. In this study, a novel feature selection strategy (1) integrating repeated random sampling with consensus scoring and (2) evaluating the consistency of gene rank among different datasets was constructed. By systematically assessing the identified SCZ signature comprising 135 differentially expressed genes, this newly constructed strategy demonstrated significantly enhanced stability and better differentiating ability compared with the feature selection methods popular in current SCZ research. Based on a first-ever assessment on methods’ reproducibility cross-validated by independent datasets from three representative studies, the new strategy stood out among the popular methods by showing superior stability and differentiating ability. Finally, 2 novel and 17 previously reported transcription factors were identified and showed great potential in revealing the etiology of SCZ. In sum, the SCZ signature identified in this study would provide valuable clues for discovering diagnostic molecules and potential targets for SCZ.

Download Full-text

Improving Urban Land Cover Mapping with the Fusion of Optical and SAR Data Based on Feature Selection Strategy

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.21-00030r2 ◽

2022 ◽

Vol 88 (1) ◽

pp. 17-28

Author(s):

Qing Ding ◽

Zhenfeng Shao ◽

Xiao Huang ◽

Orhan Altan ◽

Yewen Fan

Keyword(s):

Feature Selection ◽

Land Cover ◽

Urban Land ◽

Model Complexity ◽

Selection Strategy ◽

Support Vector ◽

Land Cover Mapping ◽

Selection Methods ◽

Urban Land Cover ◽

Sar Data

Taking the Futian District as the research area, this study proposed an effective urban land cover mapping framework fusing optical and SAR data. To simplify the model complexity and improve the mapping results, various feature selection methods were compared and evaluated. The results showed that feature selection can eliminate irrelevant features, increase the mean correlation between features slightly, and improve the classification accuracy and computational efficiency significantly. The recursive feature elimination-support vector machine (RFE-SVM) model obtained the best results, with an overall accuracy of 89.17% and a kappa coefficient of 0.8695, respectively. In addition, this study proved that the fusion of optical and SAR data can effectively improve mapping and reduce the confusion between different land covers. The novelty of this study is with the insight into the merits of multi-source data fusion and feature selection in the land cover mapping process over complex urban environments, and to evaluate the performance differences between different feature selection methods.

Download Full-text

Sentiment Analysis of Movie Reviews: A Study of Machine Learning Algorithms with Various Feature Selection Methods

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v5i9.113121 ◽

2017 ◽

Vol 5 (9) ◽

Cited By ~ 1

Author(s):

Rajwinder Kaur

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Selection Methods

Download Full-text

The Effectiveness of the Fused Weighted Filter Feature Selection Method to Improve Software Fault Prediction

Journal of Communications Technology Electronics and Computer Science ◽

10.22385/jctecs.v8i0.96 ◽

2016 ◽

Vol 8 ◽

pp. 5 ◽

Cited By ~ 1

Author(s):

Fatemeh Alighardashi ◽

Mohammad Ali Zare Chahooki

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Algorithms ◽

Fault Prediction ◽

Filter Method ◽

Selection Methods ◽

Software Projects ◽

Software Fault Prediction ◽

Software Fault

Improving the software product quality before releasing by periodic tests is one of the most expensive activities in software projects. Due to limited resources to modules test in software projects, it is important to identify fault-prone modules and use the test sources for fault prediction in these modules. Software fault predictors based on machine learning algorithms, are effective tools for identifying fault-prone modules. Extensive studies are being done in this field to find the connection between features of software modules, and their fault-prone. Some of features in predictive algorithms are ineffective and reduce the accuracy of prediction process. So, feature selection methods to increase performance of prediction models in fault-prone modules are widely used. In this study, we proposed a feature selection method for effective selection of features, by using combination of filter feature selection methods. In the proposed filter method, the combination of several filter feature selection methods presented as fused weighed filter method. Then, the proposed method caused convergence rate of feature selection as well as the accuracy improvement. The obtained results on NASA and PROMISE with ten datasets, indicates the effectiveness of proposed method in improvement of accuracy and convergence of software fault prediction.

Download Full-text

Feature Selection Methods in Sentiment Analysis

Proceedings of the 3rd International Conference on Networking, Information Systems & Security ◽

10.1145/3386723.3387840 ◽

2020 ◽

Author(s):

Nurilhami Izzatie Khairi ◽

Azlinah Mohamed ◽

Nor Nadiah Yusof

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Selection Methods

Download Full-text

A Unified View of Causal and Non-causal Feature Selection

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3436891 ◽

2021 ◽

Vol 15 (4) ◽

pp. 1-46

Author(s):

Kui Yu ◽

Lin Liu ◽

Jiuyong Li

Keyword(s):

Feature Selection ◽

Bayesian Network ◽

Synthetic Data ◽

Selection Methods ◽

Bayesian Network Model ◽

Real World Data ◽

Feature Sets ◽

Unified View ◽

Optimal Feature ◽

Different Levels

In this article, we aim to develop a unified view of causal and non-causal feature selection methods. The unified view will fill in the gap in the research of the relation between the two types of methods. Based on the Bayesian network framework and information theory, we first show that causal and non-causal feature selection methods share the same objective. That is to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We then examine the assumptions made by causal and non-causal feature selection methods when searching for the optimal feature set, and unify the assumptions by mapping them to the restrictions on the structure of the Bayesian network model of the studied problem. We further analyze in detail how the structural assumptions lead to the different levels of approximations employed by the methods in their search, which then result in the approximations in the feature sets found by the methods with respect to the optimal feature set. With the unified view, we can interpret the output of non-causal methods from a causal perspective and derive the error bounds of both types of methods. Finally, we present practical understanding of the relation between causal and non-causal methods using extensive experiments with synthetic data and various types of real-world data.

Download Full-text

A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-190134 ◽

2021 ◽

Vol 24 (4) ◽

pp. 289-301

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Classification Accuracy ◽

Performance Metrics ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Selection Methods

In Microarray Data, it is complicated to achieve more classification accuracy due to the presence of high dimensions, irrelevant and noisy data. And also It had more gene expression data and fewer samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features need to extract, this can be achieved by applying the feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, filter and wrapper phase in filter phase ensemble technique is used for aggregating the feature ranks of the Relief, minimum redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses the Fuzzy Gaussian membership function ordering for aggregating the ranks. In wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used for selecting the optimal features, and the RBF Kernel-based Support Vector Machine (SVM) classifier is used as an evaluator. The performance of the proposed model are compared with state of art feature selection methods using five benchmark datasets. For evaluation various performance metrics such as Accuracy, Recall, Precision, and F1-Score are used. Furthermore, the experimental results show that the performance of the proposed method outperforms the other feature selection methods.

Download Full-text

Lung adenocarcinoma and lung squamous cell carcinoma cancer classification, biomarker identification, and gene expression analysis using overlapping feature selection methods

Scientific Reports ◽

10.1038/s41598-021-92725-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Joe W. Chen ◽

Joseph Dhahbi

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Biomarker Discovery ◽

Lung Squamous Cell Carcinoma ◽

Cancer Classification ◽

Selection Methods ◽

Biomarker Identification ◽

Overlapping Method

AbstractLung cancer is one of the deadliest cancers in the world. Two of the most common subtypes, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), have drastically different biological signatures, yet they are often treated similarly and classified together as non-small cell lung cancer (NSCLC). LUAD and LUSC biomarkers are scarce, and their distinct biological mechanisms have yet to be elucidated. To detect biologically relevant markers, many studies have attempted to improve traditional machine learning algorithms or develop novel algorithms for biomarker discovery. However, few have used overlapping machine learning or feature selection methods for cancer classification, biomarker identification, or gene expression analysis. This study proposes to use overlapping traditional feature selection or feature reduction techniques for cancer classification and biomarker discovery. The genes selected by the overlapping method were then verified using random forest. The classification statistics of the overlapping method were compared to those of the traditional feature selection methods. The identified biomarkers were validated in an external dataset using AUC and ROC analysis. Gene expression analysis was then performed to further investigate biological differences between LUAD and LUSC. Overall, our method achieved classification results comparable to, if not better than, the traditional algorithms. It also identified multiple known biomarkers, and five potentially novel biomarkers with high discriminating values between LUAD and LUSC. Many of the biomarkers also exhibit significant prognostic potential, particularly in LUAD. Our study also unraveled distinct biological pathways between LUAD and LUSC.

Download Full-text