Bias-variance decomposition in Genetic Programming

Taras Kowaliw; René Doursat

doi:10.1515/math-2016-0005

Open Mathematics ◽

10.1515/math-2016-0005 ◽

2016 ◽

Vol 14 (1) ◽

pp. 62-80 ◽

Cited By ~ 2

Author(s):

Taras Kowaliw ◽

René Doursat

Keyword(s):

Genetic Programming ◽

Variance Decomposition ◽

Initial Population ◽

Linear Genetic Programming ◽

Improve Performance ◽

Training Samples ◽

Bias Variance ◽

Selection Of

Abstract: We study properties of Linear Genetic Programming (LGP) through several regression and classification benchmarks. In each problem, we decompose the results into bias and variance components, and explore the effect of varying certain key parameters on the overall error and its decomposed contributions. These parameters are the maximum program size, the initial population, and the function set used. We confirm and quantify several insights into the practical usage of GP, most notably that (a) the variance between runs is primarily due to initialization rather than the selection of training samples, (b) parameters can be reasonably optimized to obtain gains in efficacy, and (c) functions detrimental to evolvability are easily eliminated, while functions well-suited to the problem can greatly improve performance; therefore, larger and more diverse function sets are always preferable.
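The decomposition described in the abstract can be illustrated with a minimal sketch for squared-error regression. It is a generic textbook-style decomposition, not necessarily the authors' exact formulation: the function name `decompose`, the nested-list layout `preds[s][r][i]`, and the labels `var_init` and `var_sample` are all assumptions made for illustration. With the same number of initializations per training sample, the mean squared error splits exactly into squared bias, variance due to initialization, and variance due to the choice of training sample.

```python
from statistics import fmean


def decompose(preds, targets):
    """Split mean squared error into bias^2 and two variance terms.

    preds[s][r][i] is the prediction for test point i from the run that used
    training sample s and random initialization r (hypothetical layout).
    """
    n_samples, n_inits, n_points = len(preds), len(preds[0]), len(targets)
    bias_sq = var_init = var_sample = mse = 0.0
    for i in range(n_points):
        # Mean prediction for each training sample, averaged over initializations.
        per_sample = [
            fmean([preds[s][r][i] for r in range(n_inits)])
            for s in range(n_samples)
        ]
        grand = fmean(per_sample)
        bias_sq += (grand - targets[i]) ** 2
        # Spread across initializations while the training sample is held fixed.
        var_init += fmean([
            fmean([(preds[s][r][i] - per_sample[s]) ** 2 for r in range(n_inits)])
            for s in range(n_samples)
        ])
        # Spread of the per-sample mean predictions across training samples.
        var_sample += fmean([(m - grand) ** 2 for m in per_sample])
        mse += fmean([
            (preds[s][r][i] - targets[i]) ** 2
            for s in range(n_samples)
            for r in range(n_inits)
        ])
    return {
        "bias_sq": bias_sq / n_points,
        "var_init": var_init / n_points,
        "var_sample": var_sample / n_points,
        "mse": mse / n_points,
    }


# Two training samples, two initializations each, two test points (toy numbers).
toy_preds = [[[1.0, 2.0], [3.0, 2.0]], [[2.0, 4.0], [2.0, 4.0]]]
toy_targets = [2.0, 2.0]
parts = decompose(toy_preds, toy_targets)
```

In this toy example the three components sum to the mean squared error, which is the identity that makes it possible to attribute error to initialization versus training-sample selection.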


An Analysis of the Influence of Non-effective Instructions in Linear Genetic Programming

Evolutionary Computation ◽

10.1162/evco_a_00296 ◽

2021 ◽

pp. 1-23

Author(s):

Léo Françoso Dal Piccol Sotto ◽

Franz Rothlauf ◽

Vinícius Veloso de Melo ◽

Márcio P. Basgalupp

Keyword(s):

Genetic Programming ◽

Directed Acyclic Graph ◽

Genetic Material ◽

Population Diversity ◽

Search Performance ◽

Linear Genetic Programming ◽

Improve Performance ◽

Neutral Mutations ◽

Evolutionary Memory

Abstract: Linear Genetic Programming (LGP) represents programs as sequences of instructions and has a Directed Acyclic Graph (DAG) dataflow. The results of instructions are stored in registers that can be used as arguments by other instructions. Instructions that are disconnected from the main part of the program are called non-effective instructions, or structural introns. They also appear in other DAG-based GP approaches like Cartesian Genetic Programming (CGP). This paper studies four hypotheses on the role of structural introns: non-effective instructions (1) serve as evolutionary memory, where evolved information is stored and later used in search, (2) preserve population diversity, (3) allow neutral search, where structural introns increase the number of neutral mutations and improve performance, and (4) serve as genetic material to enable program growth. We study different variants of LGP controlling the influence of introns for symbolic regression, classification, and digital circuit problems. We find that there is (1) evolved information in the non-effective instructions that can be reactivated and that (2) structural introns can promote programs with higher effective diversity. However, both effects have no influence on LGP search performance. On the other hand, allowing mutations to be applied not only to effective but also to non-effective instructions (3) increases the rate of neutral mutations and (4) contributes to program growth by making use of the genetic material available as structural introns. This comes with a significant increase in LGP performance, which makes structural introns important for LGP.
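To make the notion of a structural intron concrete, here is a minimal sketch of the usual backward-marking pass over a register-based program. It is not taken from the paper: the tuple layout `(dest, op, src_a, src_b)`, the function name `effective_mask`, and the assumption that every operand is a register and that register 0 holds the output are all hypothetical choices for illustration.

```python
def effective_mask(program, output_register=0):
    """Mark each instruction as effective (True) or a structural intron (False).

    program is a list of (dest, op, src_a, src_b) tuples over register indices
    (hypothetical layout; all operands are assumed to be registers).
    """
    needed = {output_register}  # registers whose current value still matters
    mask = [False] * len(program)
    # Walk backward from the end: an instruction matters only if something
    # later (or the final output) reads the register it writes.
    for idx in range(len(program) - 1, -1, -1):
        dest, _op, src_a, src_b = program[idx]
        if dest in needed:
            mask[idx] = True
            # This write overwrites dest, so earlier writes to dest matter
            # only if this instruction itself reads dest.
            needed.discard(dest)
            needed.update((src_a, src_b))
    return mask


toy_program = [
    (1, "add", 2, 3),  # r1 = r2 + r3  (read by the third instruction)
    (4, "mul", 1, 1),  # r4 = r1 * r1  (r4 is never read afterwards)
    (0, "sub", 1, 2),  # r0 = r1 - r2  (writes the output register)
    (3, "add", 0, 0),  # r3 = r0 + r0  (written after its last use)
]
toy_mask = effective_mask(toy_program)
```

Under these assumptions, the second and fourth instructions are marked non-effective: they can be mutated without changing the program's output, which is what makes mutations on them neutral.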


A predictive equation for residual strength using a hybrid of Subset Selection of maximum dissimilarity method with Pareto optimal multi-gene genetic programming

Geoscience Frontiers ◽

10.1016/j.gsf.2021.101222 ◽

2021 ◽

pp. 101222

Author(s):

Hossien Riahi-Madvar ◽

Mahsa Gholami ◽

Bahram Gharabaghi ◽

Seyed Morteza Seyedian

Keyword(s):

Genetic Programming ◽

Residual Strength ◽

Subset Selection ◽

Pareto Optimal ◽

Predictive Equation ◽

Selection Of


Disaster Intensity-Based Selection of Training Samples for Remote Sensing Building Damage Classification

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2020.3046004 ◽

2021 ◽

pp. 1-17

Author(s):

Luis Moya ◽

Christian Geiss ◽

Masakazu Hashimoto ◽

Erick Mas ◽

Shunichi Koshimura ◽

...

Keyword(s):

Remote Sensing ◽

Building Damage ◽

Damage Classification ◽

Training Samples ◽

Disaster Intensity ◽

Selection Of


Bias-Variance Decomposition for Ranking

Proceedings of the 14th ACM International Conference on Web Search and Data Mining ◽

10.1145/3437963.3441772 ◽

2021 ◽

Author(s):

Pannaga Shivaswamy ◽

Ashok Chandrashekar

Keyword(s):

Variance Decomposition ◽

Bias Variance


Numerical analysis of the shear strength of circular reinforced concrete columns subjected to cyclic lateral loads using linear genetic programming

Engineering Computations ◽

10.1108/ec-10-2018-0453 ◽

2020 ◽

Vol 37 (7) ◽

pp. 2517-2537

Author(s):

Mostafa Rezvani Sharif ◽

Seyed Mohammad Reza Sadri Tabaei Zavareh

Keyword(s):

Numerical Modeling ◽

Shear Strength ◽

Reinforced Concrete ◽

Numerical Analysis ◽

Genetic Programming ◽

Linear Genetic Programming ◽

Rc Columns ◽

Content Type ◽

Alternative Approach ◽

Engineering Problems

Purpose: The shear strength of reinforced concrete (RC) columns under cyclic lateral loading is a crucial concern, particularly, in the seismic design of RC structures. Considering the costly procedure of testing methods for measuring the real value of the shear strength factor and the existence of several parameters impacting the system behavior, numerical modeling techniques have been very much appreciated by engineers and researchers. This study aims to propose a new model for estimation of the shear strength of cyclically loaded circular RC columns through a robust computational intelligence approach, namely, linear genetic programming (LGP).

Design/methodology/approach: LGP is a data-driven self-adaptive algorithm recently used for classification, pattern recognition and numerical modeling of engineering problems. A reliable database consisting of 64 experimental data is collected for the development of shear strength LGP models here. The obtained models are evaluated from both engineering and accuracy perspectives by means of several indicators and supplementary studies and the optimal model is presented for further purposes. Additionally, the capability of LGP is examined to be used as an alternative approach for the numerical analysis of engineering problems.

Findings: A new predictive model is proposed for the estimation of the shear strength of cyclically loaded circular RC columns using the LGP approach. To demonstrate the capability of the proposed model, the analysis results are compared to those obtained by some well-known models recommended in the existing literature. The results confirm the potential of the LGP approach for numerical analysis of engineering problems in addition to the fact that the obtained LGP model outperforms existing models in estimation and predictability.

Originality/value: This paper mainly represents the capability of the LGP approach as a robust alternative approach among existing analytical and numerical methods for modeling and analysis of relevant engineering approximation and estimation problems. The authors are confident that the shear strength model proposed can be used for design and pre-design aims. The authors also declare that they have no conflict of interest.


General Schema Theory for Genetic Programming with Subtree-Swapping Crossover: Part II

Evolutionary Computation ◽

10.1162/106365603766646825 ◽

2003 ◽

Vol 11 (2) ◽

pp. 169-206 ◽

Cited By ~ 51

Author(s):

Riccardo Poli ◽

Nicholas Freitag McPhee

Keyword(s):

Genetic Programming ◽

Schema Theory ◽

Exact Formulation ◽

Expected Number ◽

Size Evolution ◽

General Schema ◽

Definition Of ◽

Effective Fitness ◽

Selection Of ◽

Exact Definition

This paper is the second part of a two-part paper which introduces a general schema theory for genetic programming (GP) with subtree-swapping crossover (Part I (Poli and McPhee, 2003)). Like other recent GP schema theory results, the theory gives an exact formulation (rather than a lower bound) for the expected number of instances of a schema at the next generation. The theory is based on a Cartesian node reference system, introduced in Part I, and on the notion of a variable-arity hyperschema, introduced here, which generalises previous definitions of a schema. The theory includes two main theorems describing the propagation of GP schemata: a microscopic and a macroscopic schema theorem. The microscopic version is applicable to crossover operators which replace a subtree in one parent with a subtree from the other parent to produce the offspring. Therefore, this theorem is applicable to Koza's GP crossover with and without uniform selection of the crossover points, as well as one-point crossover, size-fair crossover, strongly-typed GP crossover, context-preserving crossover and many others. The macroscopic version is applicable to crossover operators in which the probability of selecting any two crossover points in the parents depends only on the parents' size and shape. In the paper we provide examples, we show how the theory can be specialised to specific crossover operators and we illustrate how it can be used to derive other general results. These include an exact definition of effective fitness and a size-evolution equation for GP with subtree-swapping crossover.


Optimized selection of training samples for One-Class Neural Network classifier

2014 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2014.6889429 ◽

2014 ◽

Cited By ~ 4

Author(s):

Bilal Hadjadji ◽

Youcef Chibani

Keyword(s):

Neural Network ◽

Neural Network Classifier ◽

Training Samples ◽

Selection Of


Classification of Cardiac Arrhythmia by Random Forests with Features Constructed by Kaizen Programming with Linear Genetic Programming

Proceedings of the 2016 on Genetic and Evolutionary Computation Conference - GECCO '16 ◽

10.1145/2908812.2908882 ◽

2016 ◽

Cited By ~ 1

Author(s):

Léo F.D.P. Sotto ◽

Regina C. Coelho ◽

Vinícius V. de Melo

Keyword(s):

Genetic Programming ◽

Cardiac Arrhythmia ◽

Random Forests ◽

Linear Genetic Programming


A NEW LINEAR GENETIC PROGRAMMING APPROACH BASED ON STRAIGHT LINE PROGRAMS: SOME THEORETICAL AND EXPERIMENTAL ASPECTS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213009000391 ◽

2009 ◽

Vol 18 (05) ◽

pp. 757-781 ◽

Cited By ~ 7

Author(s):

CÉSAR L. ALONSO ◽

JOSÉ LUIS MONTAÑA ◽

JORGE PUENTE ◽

CRUZ ENRIQUE BORGES

Keyword(s):

Data Structure ◽

Genetic Programming ◽

Computer Programs ◽

Symbolic Regression ◽

Programming Approach ◽

Linear Genetic Programming ◽

Straight Line ◽

Structured Representations ◽

Regression Problems ◽

Straight Line Programs

Tree encodings of programs are well known for their representative power and are used very often in Genetic Programming. In this paper we experiment with a new data structure, named straight line program (slp), to represent computer programs. The main features of this structure are described, new recombination operators for GP related to slp's are introduced and a study of the Vapnik-Chervonenkis dimension of families of slp's is done. Experiments have been performed on symbolic regression problems. Results are encouraging and suggest that the GP approach based on slp's consistently outperforms conventional GP based on tree structured representations.
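As a rough illustration of the data structure, a straight line program can be read as a sequence of instructions in which each operand is either an input variable or the result of an earlier instruction. The sketch below evaluates such a sequence; the encoding (`("x", k)` for inputs, `("u", j)` for earlier results), the operator set, and the name `eval_slp` are assumptions for illustration and may differ from the paper's formal definition.

```python
import operator

# Hypothetical function set; the paper's own set is not given in the abstract.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_slp(slp, inputs):
    """Evaluate a straight line program and return the last instruction's value.

    Each instruction is (op, a, b). An operand is ("x", k) for input k or
    ("u", j) for the result of an earlier instruction j (assumed encoding).
    """
    results = []
    for op, a, b in slp:
        args = []
        for kind, index in (a, b):
            if kind == "u" and index >= len(results):
                raise ValueError("operand refers to an instruction not yet computed")
            args.append(inputs[index] if kind == "x" else results[index])
        results.append(OPS[op](*args))
    return results[-1]


# u0 = x0 + x1; u1 = u0 * u0; u2 = u1 * x0
toy_slp = [
    ("+", ("x", 0), ("x", 1)),
    ("*", ("u", 0), ("u", 0)),
    ("*", ("u", 1), ("x", 0)),
]
toy_value = eval_slp(toy_slp, [2, 3])
```

Here `u0` is computed once and reused twice by `u1`, whereas a tree encoding of the same expression would need to duplicate the subexpression `x0 + x1`.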
