Comprehensive model optimization in pulp quality prediction: a machine learning approach
Feature selection in machine learning is of great interest because it is considered to yield more efficient predictive models in several engineering domains. It is of special importance in the pulp and paper transformation industry, where knowledge of the process is generally very limited. In this paper, we first compared the performance of a rule-based genetic algorithm with that of an adaptive neuro-fuzzy inference system (ANFIS); the latter was found to be more precise in predicting pulp quality. We then combined several data mining algorithms, such as genetic algorithm-partial least squares regression, with other statistical methods to explore the relevance of all the potential variables that could be used to predict the pulp ISO brightness, an important property that is usually linked to model performance and hence to pulp quality prediction. A few highly relevant variables were thereby determined, and the full set of 79 variables obtained from a Chip Management System was trimmed down to an optimized combination of 3 inputs according to their relevance. Peroxide charge (P), average luminance (L) and hue (H) were chosen as the optimal subset to describe the ISO brightness of the pulp, and the model was simplified without losing much of its accuracy. Finally, we derived the number of membership functions for each variable to further refine the fuzzy logic-based prediction model. The error then reached 2.18%. The loss in accuracy was compensated by selecting the best-fitting number of membership functions for each input.
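To illustrate why trimming the inputs and tuning the number of membership functions per variable both matter for a fuzzy model, the following is a minimal sketch rather than the procedure used in this work. It assumes a grid-partitioned, first-order Sugeno-type fuzzy system with Gaussian membership functions, which the abstract does not specify. The function names and the membership function counts assigned to P, L and H are hypothetical and chosen only for illustration.

```python
from math import exp, prod


def gaussian_mf(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy set (assumed shape)."""
    return exp(-0.5 * ((x - center) / sigma) ** 2)


def rule_count(mf_counts):
    """Rules in a grid-partitioned system: one rule per combination of
    membership functions, i.e. the product of the per-input counts."""
    return prod(mf_counts)


def sugeno_param_count(mf_counts, params_per_mf=2):
    """Tunable parameters of a first-order Sugeno system under the
    assumptions above: premise parameters (center and sigma for each
    membership function) plus one linear consequent per rule."""
    n_inputs = len(mf_counts)
    premise = params_per_mf * sum(mf_counts)
    consequent = rule_count(mf_counts) * (n_inputs + 1)
    return premise + consequent


# Hypothetical counts for the three selected inputs (P, L, H).
selected = [3, 2, 2]
print(rule_count(selected))          # 12 rules
print(sugeno_param_count(selected))  # 14 premise + 48 consequent = 62

# With all 79 candidate variables and only 2 membership functions each,
# a grid partition would require 2**79 rules, which is not tractable.
print(rule_count([2] * 79) == 2 ** 79)  # True
```

Under these assumptions the rule base grows multiplicatively with each added input or membership function, so a reduced input set leaves room to adjust the membership function counts per variable while keeping the model small.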