Component Thermodynamical Selection Based Gene Expression Programming for Function Finding

2014 ◽  
Vol 2014 ◽  
pp. 1-16 ◽  
Author(s):  
Zhaolu Guo ◽  
Zhijian Wu ◽  
Xiaojian Dong ◽  
Kejun Zhang ◽  
Shenwen Wang ◽  
...  

Gene expression programming (GEP), an improved variant of genetic programming (GP), has become a popular tool for data mining. However, like other evolutionary algorithms, it tends to suffer from premature convergence and a slow convergence rate when solving complex problems. In this paper, we propose an enhanced GEP algorithm, called CTSGEP, which is inspired by the principle of minimal free energy in thermodynamics. CTSGEP employs a component thermodynamical selection (CTS) operator to quantitatively balance selective pressure and population diversity during the evolution process. Experiments are conducted on several benchmark datasets from the UCI machine learning repository. The results show that CTSGEP performs better than conventional GEP and some GEP variants.
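The abstract does not spell out the CTS operator, so the following is only a generic sketch of free-energy-based selection in the style of thermodynamical genetic algorithms, not the authors' method. It assumes that chromosomes are equal-length strings, that `energy` maps each chromosome to an error value (lower is better), and that diversity is measured as the sum of per-position Shannon entropies. The names `entropy`, `free_energy`, and `select`, as well as the toy data and temperatures, are hypothetical.

```python
import math
from collections import Counter


def entropy(population):
    """Sum of per-position Shannon entropies over equal-length chromosomes."""
    n = len(population)
    total = 0.0
    for column in zip(*population):  # one column per chromosome position
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log(p)
    return total


def free_energy(population, energy, temperature):
    """F = <E> - T * H: mean energy minus temperature-weighted diversity."""
    mean_energy = sum(energy[c] for c in population) / len(population)
    return mean_energy - temperature * entropy(population)


def select(pool, energy, size, temperature):
    """Greedily pick `size` chromosomes, minimizing free energy at each step."""
    remaining = list(pool)
    chosen = []
    while len(chosen) < size and remaining:
        best = min(
            remaining,
            key=lambda c: free_energy(chosen + [c], energy, temperature),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy pool (assumed data): two copies of the fittest chromosome, a near
# duplicate, and a distant but poor chromosome.
pool = ["aaaa", "aaaa", "aaab", "bbbb"]
energy = {"aaaa": 1.0, "aaab": 1.2, "bbbb": 3.0}

cold = select(pool, energy, size=2, temperature=0.0)
hot = select(pool, energy, size=2, temperature=10.0)
```

In this sketch the temperature is the single knob that trades selective pressure against diversity: at temperature 0 the greedy step keeps the two lowest-energy copies (`cold` is `["aaaa", "aaaa"]`), while at temperature 10 the entropy term dominates and the distant chromosome is kept instead (`hot` is `["aaaa", "bbbb"]`).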

2011 ◽  
Vol 204-210 ◽  
pp. 288-292 ◽  
Author(s):  
Yong Qiang Zhang ◽  
Jing Xiao

Population diversity is one of the most important factors influencing the convergence speed and evolution efficiency of the gene expression programming (GEP) algorithm. In this paper, a population diversity strategy for GEP (GEP-PDS) is presented. It inherits the advantages of the superior population producing strategy and the various population strategy in order to increase the average population fitness and reduce the number of generations, to keep the population diverse throughout the evolutionary process, and to avoid premature convergence, thereby ensuring convergence ability and evolution efficiency. Simulation experiments show that GEP-PDS can increase the average population fitness by 10% in function finding and reduce the number of generations needed to converge to the optimal solution by 30% or more compared with other improved GEP algorithms.


2019 ◽  
Vol 06 (02) ◽  
pp. 163-175 ◽  
Author(s):  
Joanna Jȩdrzejowicz ◽  
Piotr Jȩdrzejowicz ◽  
Izabela Wierzbowska

The paper investigates a Gene Expression Programming (GEP)-based ensemble classifier constructed using the stacked generalization concept. The classifier has been implemented with a view to enabling parallel processing with the use of Spark and SWIM, an open source genetic programming library. The classifier has been validated in computational experiments carried out on benchmark datasets. It has also been investigated how the results are influenced by some settings. The paper is an extension of a previous paper by the authors.


2019 ◽  
Author(s):  
Pei-Yau Lung ◽  
Xiaodong Pang ◽  
Yan Li ◽  
Jinfeng Zhang

Abstract. Reusability is part of the FAIR data principle, which aims to make data Findable, Accessible, Interoperable, and Reusable. One of the current efforts to increase the reusability of public genomics data has been to focus on the inclusion of quality metadata associated with the data. When necessary metadata are missing, most researchers will consider the data useless. In this study, we develop a framework to predict the missing metadata of gene expression datasets to maximize their reusability. We propose a new metric called Proportion of Cases Accurately Predicted (PCAP), which is optimized in our specifically designed machine learning pipeline. The new approach performed better than pipelines using commonly used metrics such as F1-score in terms of maximizing the reusability of data with missing values. We also found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols. Using differential gene expression analysis as an example, we show that when missing variables are accurately predicted, the corresponding gene expression data can be reliably used in downstream analyses.


2017 ◽  
Vol 10 (9) ◽  
pp. 3519-3545 ◽  
Author(s):  
Iulia Ilie ◽  
Peter Dittrich ◽  
Nuno Carvalhais ◽  
Martin Jung ◽  
Andreas Heinemeyer ◽  
...  

Abstract. Accurate model representation of land–atmosphere carbon fluxes is essential for climate projections. However, the exact responses of carbon cycle processes to climatic drivers often remain uncertain. Presently, knowledge derived from experiments, complemented by a steadily evolving body of mechanistic theory, provides the main basis for developing such models. The strongly increasing availability of measurements may facilitate new ways of identifying suitable model structures using machine learning. Here, we explore the potential of gene expression programming (GEP) to derive relevant model formulations based solely on the signals present in data by automatically applying various mathematical transformations to potential predictors and repeatedly evolving the resulting model structures. In contrast to most other machine learning regression techniques, the GEP approach generates readable models that allow for prediction and possibly for interpretation. Our study is based on two cases: artificially generated data and real observations. Simulations based on artificial data show that GEP is successful in identifying prescribed functions, with the prediction capacity of the models comparable to four state-of-the-art machine learning methods (random forests, support vector machines, artificial neural networks, and kernel ridge regressions). Based on real observations we explore the responses of the different components of terrestrial respiration at an oak forest in south-eastern England. We find that the GEP-retrieved models are often better in prediction than some established respiration models. Based on their structures, we find previously unconsidered exponential dependencies of respiration on seasonal ecosystem carbon assimilation and water dynamics. We noticed that the GEP models are only partly portable across respiration components, the identification of a general terrestrial respiration model possibly prevented by equifinality issues. 
Overall, GEP is a promising tool for uncovering new model structures for terrestrial ecology in the data-rich era, complementing more traditional modelling approaches.
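The readable models mentioned above come from the way GEP encodes a candidate function: a linear string of symbols (a K-expression) is decoded breadth-first into an expression tree. The following is a minimal sketch of that decoding step only, not the pipeline used in the study. The function set, the protected division and square root, and the helper names `decode`, `evaluate`, and `to_infix` are assumptions made for illustration.

```python
import math
from collections import deque

# Function set: symbol -> (arity, implementation). "Q" denotes square root.
# The protected variants below are an assumption of this sketch.
FUNCTIONS = {
    "+": (2, lambda x, y: x + y),
    "-": (2, lambda x, y: x - y),
    "*": (2, lambda x, y: x * y),
    "/": (2, lambda x, y: x / y if y != 0 else 1.0),
    "Q": (1, lambda x: math.sqrt(abs(x))),
}


def decode(kexpr):
    """Build an expression tree from a K-expression, level by level."""
    symbols = iter(kexpr)
    root = [next(symbols), []]  # node = [symbol, children]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [next(symbols), []]
            node[1].append(child)
            queue.append(child)
    return root  # symbols left unread form the non-coding region


def evaluate(node, env):
    """Evaluate a decoded tree with terminal values taken from `env`."""
    symbol, children = node
    if symbol in FUNCTIONS:
        func = FUNCTIONS[symbol][1]
        return func(*(evaluate(child, env) for child in children))
    return env[symbol]


def to_infix(node):
    """Render a decoded tree as a human-readable formula."""
    symbol, children = node
    if not children:
        return symbol
    if len(children) == 1:
        return f"{symbol}({to_infix(children[0])})"
    return f"({to_infix(children[0])} {symbol} {to_infix(children[1])})"


tree = decode("Q*+-abcd")
formula = to_infix(tree)
value = evaluate(tree, {"a": 1.0, "b": 3.0, "c": 5.0, "d": 1.0})
```

Here `formula` is `Q(((a + b) * (c - d)))` and `value` is `4.0`. Because the tree is filled level by level and trailing symbols are simply ignored, a string such as `+abQ*cd` still decodes to a valid expression, `(a + b)`.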

