Comparison of Gene Expression Programming and Common Metamodeling Techniques in Engineering Design

Volume 5: 37th Design Automation Conference, Parts A and B ◽

10.1115/detc2011-47130 ◽

2011 ◽

Cited By ~ 1

Author(s):

Mi Xiao ◽

Liang Gao ◽

Xinyu Shao ◽

Haobo Qiu ◽

Li Nie

Keyword(s):

Gene Expression ◽

Engineering Design ◽

Small Sample Size ◽

Gene Expression Programming ◽

Small Sample ◽

Mathematical Functions ◽

Approximation Accuracy ◽

Design And Optimization ◽

Metamodeling Technique ◽

Approximation Models

To reduce the tremendous computational expense of complex simulation and analysis in engineering design, a growing number of researchers are turning to the construction of approximation models. Approximation models, also called surrogate models or metamodels, can replace simulation and analysis codes in design and optimization. Commonly used metamodeling techniques include response surface methodology (RSM), kriging, and radial basis functions (RBF). In this paper, the gene expression programming (GEP) algorithm from evolutionary computing is investigated as an alternative approximation technique. The performance of GEP is examined by applying it to the approximation of mathematical functions and engineering analyses. Compared with RSM, kriging, and RBF, GEP is shown to be more accurate for small sample sizes. For large sample sets, GEP also shows good approximation accuracy. Additionally, GEP offers the best transparency, since it provides explicit and compact functional relationships and clear factor contributions. Overall, as a novel metamodeling technique, GEP shows strong capability to approximate a design space accurately and is expected to find wide application in engineering design, especially when only a few sample points are selected for approximation.
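
As a rough illustration of what a metamodel does, the sketch below fits a Gaussian radial basis function interpolant to a handful of samples of a stand-in function and then queries it at new points. It is not the paper's GEP method or its RBF setup: the function `expensive_simulation`, the shape parameter `eps`, the small ridge term, and the sample layout are all assumptions made for the example.

```python
import numpy as np

def fit_rbf(X, y, eps=10.0):
    """Solve for Gaussian RBF weights so the surrogate passes through (X, y)."""
    # Pairwise squared distances between all sample points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    Phi = np.exp(-eps * d2)
    # Tiny ridge term for numerical stability (an assumption, not from the paper).
    return np.linalg.solve(Phi + 1e-10 * np.eye(len(X)), y)

def predict_rbf(X_train, weights, X_new, eps=10.0):
    """Evaluate the fitted surrogate at new design points."""
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-eps * d2) @ weights

def expensive_simulation(X):
    # Hypothetical stand-in for a costly simulation or analysis code.
    return np.sin(3.0 * X[:, 0]) + X[:, 0] ** 2

# A small sample set, as in the small-sample setting the abstract emphasizes.
X_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)
y_train = expensive_simulation(X_train)
w = fit_rbf(X_train, y_train)

# Cheap surrogate predictions at unsampled points.
X_test = np.array([[0.5], [1.3]])
y_pred = predict_rbf(X_train, w, X_test)
print(y_pred, expensive_simulation(X_test))
```

Once fitted, the surrogate costs only a kernel evaluation and a dot product per query. That low cost is what makes metamodels attractive inside an optimization loop.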

An Integrated Feature Selection Algorithm for Cancer Classification using Gene Expression Data

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220124756 ◽

2019 ◽

Vol 21 (9) ◽

pp. 631-645 ◽

Cited By ~ 5

Author(s):

Saeed Ahmed ◽

Muhammad Kabir ◽

Zakir Ali ◽

Muhammad Arif ◽

Farman Ali ◽

...

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Classification Accuracy ◽

Early Stage ◽

Small Sample Size ◽

Feature Selection Method ◽

Small Sample ◽

Expression Data ◽

Base Function

Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Diagnosis of this deadly disease at an early stage is an exceptionally new clinical application of microarray data. In DNA microarray technology, gene expression data have high dimensionality and a small sample size. Therefore, it is indispensable to develop efficient and robust feature selection methods that identify a small set of genes to achieve better classification performance. Materials and Methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) and Multi-Objective Evolutionary Algorithm (MOEA) approaches to select highly informative genes. The hybrid model with a radial basis function neural network (RBFNN) classifier was evaluated on 11 benchmark gene expression datasets using a 10-fold cross-validation test. Results: The experimental results were compared with seven conventional feature selection methods and with other methods in the literature. The comparison shows that our approach has clear advantages in classification accuracy and in the genes selected. Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy for six out of eleven datasets with a minimally sized predictive gene subset.
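
The correlation-based feature selection (CFS) step can be sketched with the standard CFS merit score, which rewards features correlated with the class and penalizes features correlated with each other. The example below is a minimal sketch, not the authors' CFS-MOEA algorithm: it uses absolute Pearson correlation and a greedy forward search, both assumptions, and the toy data and the names `cfs_merit` and `greedy_cfs` are hypothetical.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit: k * mean|r_cf| / sqrt(k + k*(k-1) * mean|r_ff|)."""
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return float(r_cf)
    # Mean absolute feature-feature correlation over all pairs in the subset.
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))

def greedy_cfs(X, y, max_features=5):
    """Forward selection: add the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break  # no candidate improves the merit any further
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

# Toy data (hypothetical): 60 samples, 20 "genes"; only gene 0 drives the class.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = (X[:, 0] > 0).astype(float)

selected, merit = greedy_cfs(X, y)
print(selected, round(merit, 3))
```

In the high-dimension, small-sample setting the abstract describes, a score of this kind is one way to shrink thousands of genes to a small candidate set before a classifier is trained.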

Abstract W P73: Feasibility and Preliminary Results of Whole Blood RNA-Sequencing Analysis in Patients With Intracranial Aneurysms

Stroke ◽

10.1161/str.45.suppl_1.wp73 ◽

2014 ◽

Vol 45 (suppl_1) ◽

Author(s):

Blake Haas ◽

Nestor R Gonzalez ◽

Elina Nikkola ◽

Mark Connolly ◽

William Hsu ◽

...

Keyword(s):

Gene Expression ◽

Network Analysis ◽

Rna Sequencing ◽

Whole Blood ◽

Intracranial Aneurysms ◽

Small Sample Size ◽

Expression Patterns ◽

Small Sample ◽

Cellular Respiration ◽

Sequencing Analysis

Introduction: Intracranial aneurysm (IA) growth and rupture have been associated with chronic remodeling of the arterial wall. However, the pathobiology of this process remains poorly understood. The objective of the present study was to evaluate the feasibility of analyzing gene expression patterns in peripheral blood of patients with ruptured and unruptured saccular IAs. Materials and Methods: We analyzed human whole blood transcriptomes by performing paired-end, 100 bp RNA-sequencing (RNAseq) using the Illumina platform. We used STAR to align reads to the genome, HTSeq to count reads, and DESeq to normalize counts across samples. Self-reported patient information was used to correct expression values for ancestry, age, and sex. We utilized weighted gene co-expression network analysis (WGCNA) to identify gene expression network modules associated with IA size and rupture. The DAVID tool was employed to search for Gene Ontology enrichment in relevant modules. Results: Samples from 12 patients (9 females, age 57.6 ± 12) with IAs were analyzed. Four had ruptured aneurysms. RNA isolation and application of the methodology described above were successful in all samples. Although the small sample size prevents us from drawing definite conclusions, we observed promising novel co-expression networks for IAs: WGCNA showed down-regulation of two transcript modules associated with ruptured IA status (r=-0.78, p=0.008 and r=-0.77, p=0.009), and up-regulation of two modules associated with aneurysm size (r=0.86, p=0.002 and r=0.9, p=4e-04), respectively. DAVID analyses showed that genes upregulated in an IA size-associated module were enriched with genes involved in cellular respiration and translation, while genes involved in transcription were down-regulated in a module associated with ruptured IAs. Conclusions: Whole blood RNAseq analysis is a feasible tool to capture transcriptome dynamics and achieve a better understanding of the pathophysiology of IAs. Further longitudinal studies of patients with IAs using network analysis are justified.

COMBINING GENERALIZED NMF AND DISCRIMINATIVE MIXTURE MODELS FOR CLASSIFICATION OF GENE EXPRESSION DATA

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001408006892 ◽

2008 ◽

Vol 22 (08) ◽

pp. 1587-1598 ◽

Cited By ~ 3

Author(s):

WEIXIANG LIU ◽

KEHONG YUAN ◽

JIAN WU ◽

DATIAN YE ◽

ZHEN JI ◽

...

Keyword(s):

Gene Expression ◽

Mixture Model ◽

Gene Expression Data ◽

Small Sample Size ◽

Data Classification ◽

Small Sample ◽

Training Data ◽

Microarray Data Analysis ◽

Expression Data

Classification of gene expression samples is a core task in microarray data analysis. How to reduce thousands of genes and how to select a suitable classifier are two key issues in gene expression data classification. This paper introduces a framework that combines feature extraction and classification. Considering the non-negativity, high dimensionality, and small sample size of the data, we apply a discriminative mixture model designed for non-negative gene expression data classification, with non-negative matrix factorization (NMF) used for dimension reduction. To enhance the sparseness of the training data for fast learning of the mixture model, a generalized NMF is also adopted. Experimental results on several real gene expression datasets show that classification accuracy, stability, and decision quality can be significantly improved by using the generalized method, and that the proposed method can outperform some previously reported results on the same datasets.
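
The dimension-reduction step can be illustrated with plain NMF using the classic multiplicative updates for the Frobenius-norm objective. This is a minimal sketch of standard NMF only, not the generalized NMF or the discriminative mixture model from the paper. The matrix orientation (samples by genes), the rank, the iteration count, and the synthetic data are assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Factor a non-negative matrix V (samples x genes) as W @ H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3   # reduced representation of each sample
    H = rng.random((rank, m)) + 1e-3   # non-negative "metagene" profiles
    tiny = 1e-12                        # guards against division by zero
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + tiny)
        W *= (V @ H.T) / (W @ H @ H.T + tiny)
    return W, H

def rel_error(V, W, H):
    """Relative Frobenius reconstruction error."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# Synthetic non-negative data (hypothetical): 20 samples, 50 genes, rank 3.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 50))

W1, H1 = nmf(V, rank=3, n_iter=1)
W, H = nmf(V, rank=3, n_iter=200)
print(rel_error(V, W1, H1), rel_error(V, W, H))
```

Each row of `W` is a low-dimensional, non-negative description of one sample. A downstream classifier can be trained on those rows instead of the full gene vector.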

Analysis of gene expression programming for approximation in engineering design

Structural and Multidisciplinary Optimization ◽

10.1007/s00158-012-0767-7 ◽

2012 ◽

Vol 46 (3) ◽

pp. 399-413 ◽

Cited By ~ 14

Author(s):

Liang Gao ◽

Mi Xiao ◽

Xinyu Shao ◽

Ping Jiang ◽

Li Nie ◽

...

Keyword(s):

Gene Expression ◽

Engineering Design ◽

Gene Expression Programming

Identification of Gene Signatures Used to Recognize Biological Characteristics of Gastric Cancer upon Gene Expression Data

Biomarker Insights ◽

10.4137/bmi.s13059 ◽

2014 ◽

Vol 9 ◽

pp. BMI.S13059 ◽

Cited By ~ 3

Author(s):

Zhi Yan ◽

Brian T. Luke ◽

Shirley X. Tsang ◽

Rui Xing ◽

Yuanming Pan ◽

...

Keyword(s):

Gene Expression ◽

Gastric Cancer ◽

Information Gain ◽

Small Sample Size ◽

High Sensitivity ◽

Small Sample ◽

Machine Learning Algorithms ◽

Biological Characteristics ◽

Gene Signatures ◽

Mining Model

High-throughput gene expression microarrays can be examined by machine-learning algorithms to identify gene signatures that recognize the biological characteristics of specific human diseases, including cancer, with high sensitivity and specificity. A previous study compared 20 gastric cancer (GC) samples against 20 normal tissue (NT) samples and identified 1,519 differentially expressed genes (DEGs). In this study, Classification Information Index (CII), Information Gain Index (IGI), and RELIEF algorithms are used to mine the previously reported gene expression profiling data. In all, 29 of these genes are identified by all three algorithms and are treated as GC candidate biomarkers. Three biomarkers, COL1A2, ATP4B, and HADHSC, are selected and further examined using quantitative real-time polymerase chain reaction (qRT-PCR) and immunohistochemistry (IHC) staining in two independent sets of GC and normal adjacent tissue (NAT) samples. Our study shows that COL1A2 and HADHSC are the two best biomarkers from the microarray data, distinguishing all GC from the NT, whereas ATP4B is diagnostically significant in lab tests because of its wider range of fold-changes in expression. Herein, a data-mining model applicable to small sample sizes is presented and discussed. Our results suggest that this mining model may be useful in small sample-size studies to identify putative biomarkers and potential biological features of GC.

IMPROVED PARAMETER ESTIMATION FOR VARIANCE-STABILIZING TRANSFORMATION OF GENE-EXPRESSION MICROARRAY DATA

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720004000806 ◽

2004 ◽

Vol 02 (04) ◽

pp. 669-679 ◽

Cited By ~ 3

Author(s):

MASATO INOUE ◽

SHIN-ICHI NISHIMURA ◽

GEN HORI ◽

HIROYUKI NAKAHARA ◽

MICHIKO SAITO ◽

...

Keyword(s):

Gene Expression ◽

Parameter Estimation ◽

Small Sample Size ◽

Estimation Method ◽

Small Sample ◽

Gene Expression Microarray ◽

Expression Microarray ◽

Gene Expression Microarray Data ◽

Log Normal ◽

Poor Management

A gene-expression microarray datum is modeled as an exponential expression signal (log-normal distribution) and additive noise. Variance-stabilizing transformation based on this model is useful for improving the uniformity of variance, which is often assumed for conventional statistical analysis methods. However, the existing method of estimating transformation parameters may not be perfect because of poor management of outliers. By employing an information normalization technique, we have developed an improved parameter estimation method, which enables statistically more straightforward outlier exclusion and works well even in the case of small sample size. Validation of this method with experimental data has suggested that it is superior to the conventional method.

A Cascade Flexible Neural Forest Model for Cancer Subtypes Classification on Gene Expression Data

Computational Intelligence and Neuroscience ◽

10.1155/2021/6480456 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Lianxin Zhong ◽

Qingfang Meng ◽

Yuehui Chen

Keyword(s):

Gene Expression ◽

Sample Size ◽

Gene Expression Data ◽

Small Sample Size ◽

Small Sample ◽

Expression Data ◽

Cancer Subtypes ◽

Subtype Classification ◽

Cancer Subtype

The correct classification of cancer subtypes is of great significance for the in-depth study of cancer pathogenesis and the realization of accurate treatment for cancer patients. In recent years, the classification of cancer subtypes using deep neural networks and gene expression data has become a hot topic. However, most classifiers may face the challenges of overfitting and low classification accuracy when dealing with small sample sizes and high-dimensional biological data. In this paper, the Cascade Flexible Neural Forest (CFNForest) model is proposed for cancer subtype classification. CFNForest extends the traditional flexible neural tree (FNT) structure to an FNT Group Forest using a bagging ensemble strategy, and it can automatically generate the model’s structure and parameters. To deepen the FNT Group Forest without introducing new hyperparameters, a multilayer cascade framework is used to design the FNT Group Forest model, which transforms features between levels and improves the performance of the model. The proposed CFNForest model also improves operational efficiency and robustness through a sample selection mechanism between layers and by setting different weights for the output of each layer. For cancer subtype classification, FNT Group Forests with different feature sets are used to enrich the structural diversity of the model, which makes it more suitable for processing small sample size datasets. Experiments on RNA-seq gene expression data show that CFNForest effectively improves the accuracy of cancer subtype classification. The classification results have good robustness.

A laminar augmented cascading flexible neural forest model for classification of cancer subtypes based on gene expression data

BMC Bioinformatics ◽

10.1186/s12859-021-04391-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Lianxin Zhong ◽

Qingfang Meng ◽

Yuehui Chen ◽

Lei Du ◽

Peng Wu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Small Sample Size ◽

Small Sample ◽

Expression Data ◽

Cancer Subtypes ◽

Cancer Pathogenesis ◽

Depth Study ◽

And Function

Background: Correctly classifying the subtypes of cancer is of great significance for the in-depth study of cancer pathogenesis and the realization of personalized treatment for cancer patients. In recent years, classification of cancer subtypes using deep neural networks and gene expression data has gradually become a research hotspot. However, most classifiers may face overfitting and low classification accuracy when dealing with small sample sizes and high-dimensional biological data. Results: In this paper, a laminar augmented cascading flexible neural forest (LACFNForest) model is proposed for the classification of cancer subtypes. This model is a cascading flexible neural forest that uses the deep flexible neural forest (DFNForest) as the base classifier. A hierarchical broadening ensemble method is proposed, which ensures the robustness of classification results and avoids wasting model structure and function as much as possible. We also introduce an output judgment mechanism in each layer of the forest to reduce the computational complexity of the model. The deep neural forest is extended to a densely connected deep neural forest to improve the prediction results. Experiments on RNA-seq gene expression data show that LACFNForest performs better in the classification of cancer subtypes than the conventional methods. Conclusion: The LACFNForest model effectively improves the accuracy of cancer subtype classification with good robustness. It provides a new approach for the ensemble learning of classifiers in terms of structural design.

Quantitative nuclease protection assay (qNPA) for gene expression analysis on breast cancer core biopsies.

Journal of Clinical Oncology ◽

10.1200/jco.2011.29.27_suppl.49 ◽

2011 ◽

Vol 29 (27_suppl) ◽

pp. 49-49 ◽

Cited By ~ 1

Author(s):

M. C. Evangelist ◽

J. Snider ◽

J. Krushkal ◽

Y. Qu ◽

A. Kulkarni ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Small Sample Size ◽

Small Sample ◽

Ffpe Tissue ◽

Cancer Tissue ◽

Breast Cancer Patients ◽

Fresh Tissue ◽

Interacting Protein ◽

Oncogenic Pathways

Background: qNPA employs in situ hybridization of detection probes to cross-linked mRNA, making it ideal for formalin-fixed paraffin-embedded (FFPE) tissue. It has been shown to measure gene expression in archived lymphoid and lung cancer tissue. We assessed the feasibility of qNPA to measure differentially expressed genes in pretreatment FFPE core breast biopsies among pathologic responders (pR) and nonresponders (pNR) to preoperative chemotherapy. Methods: We included preoperative breast cancer patients treated at our institution from 2003-09 with FFPE core biopsies. mRNA expression of 170 genes, representing oncogenic pathways or associated with anthracycline and taxane response, was measured by qNPA (HTG, Tucson, AZ). Data were normalized to 3 housekeeper genes, and the average of 3 biologic replicates was reported. Seven genes below detection in > 50% of samples were excluded. Expression values of 163 unique genes were analyzed for pR vs pNR with dChip software. Empirical FDR was estimated using 1,000 permutations of sample labels. Results: Treatment and response: Sample failure: 6/57 (10%). pR vs pNR did not separate on hierarchical clustering. FLJ12650 and IGFBP2 showed lower expression in pR vs pNR with fold changes of 4.09 and 2.40, respectively (p < 0.01; median FDR: 1/163). FLJ12650 was significant (p < 0.01, median FDR: 0/163) when patients receiving anthracycline ± taxane were analyzed (groups 1-3) and showed a trend (p < 0.05) in group 1 alone. Conclusions: qNPA for limited available FFPE tissue from core biopsies is feasible with acceptable sample failure rates. The small sample size and number of analyzed genes limited definitive conclusions about informative genes in our study. Our results for FLJ12650, a gene coding for membrane Na+/K+ ATPase interacting protein, are consistent with previous findings of overexpression in pNR to anthracyclines + taxanes (Hess et al, JCO 2006, Vol 24; 4236) using fresh tissue. Future qNPA validation of predictive markers identified by whole transcriptome analysis in a homogeneous cohort may provide more definitive results. [Table: see text]

Biomarkers in Parkinson Disease: global gene expression analysis in peripheral blood from patients with and without mutations in PARK2 and PARK8

Einstein (São Paulo) ◽

10.1590/s1679-45082010ao1674 ◽

2010 ◽

Vol 8 (3) ◽

pp. 291-297 ◽

Cited By ~ 8

Author(s):

Patricia Maria de Carvalho Aguiar ◽

Patricia Severino

Keyword(s):

Gene Expression ◽

Parkinson Disease ◽

Expression Analysis ◽

Peripheral Blood ◽

Gene Expression Analysis ◽

Small Sample Size ◽

Expression Profiles ◽

Global Gene Expression ◽

Small Sample ◽

Global Gene Expression Analysis

Objective: To evaluate the performance of gene expression analysis in the peripheral blood of Parkinson disease patients with different genetic profiles, using microarrays as a tool to identify possible disease-related biomarkers that could contribute to the elucidation of the pathological process and be useful in diagnosis. Methods: Global gene expression analysis by means of DNA microarrays was performed in peripheral blood of Parkinson disease patients with previously identified mutations in PARK2 or PARK8 genes, Parkinson disease patients without known mutations in these genes, and normal controls. Each group consisted of five individuals. Results: Global gene expression profiles were heterogeneous among patients and controls, and it was not possible to detect a consistent pattern between groups. However, by analyzing genes with differential expression of p < 0.005 and fold change ≥ 1.2, we were able to identify a small group of well-annotated genes. Conclusions: Despite the small sample size, the identification of differentially expressed genes suggests that the microarray technique may be useful in identifying potential biomarkers in the peripheral blood of Parkinson disease patients or of people at risk of developing the disease. This will be important once neuroprotective therapies become available, and may contribute to the identification of new pathways involved in the disease physiopathology. The results presented here should be further validated in larger groups of patients.
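
The selection rule quoted in the results (p < 0.005 and fold change ≥ 1.2) can be sketched as a simple filter. The abstract does not say how the p-values were computed or whether down-regulated genes were counted. The sketch therefore takes precomputed p-values and expression ratios as input and treats fold change symmetrically, so a ratio of 1/1.2 or lower also passes. Both choices are assumptions, and the function name and sample values are hypothetical.

```python
import numpy as np

def select_genes(p_values, fold_changes, p_max=0.005, fc_min=1.2):
    """Return indices of genes passing both the p-value and fold-change cutoffs."""
    p = np.asarray(p_values, dtype=float)
    fc = np.asarray(fold_changes, dtype=float)
    # Symmetric fold change: a ratio of 0.5 counts as a 2-fold change (assumption).
    magnitude = np.maximum(fc, 1.0 / fc)
    return np.flatnonzero((p < p_max) & (magnitude >= fc_min))

# Hypothetical values for four genes (expression ratio patient / control).
p_values = [0.001, 0.004, 0.02, 0.0001]
fold_changes = [1.5, 1.1, 2.0, 0.5]

hits = select_genes(p_values, fold_changes)
print(hits.tolist())
```

Here the second gene fails the fold-change cutoff and the third fails the p-value cutoff, so only the first and fourth genes are retained.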
