Optimum Feature Subset for Optimizing Crop Yield Prediction Using Filter and Wrapper Approaches

P. S. Maya Gopal; R Bhargavi

doi:10.13031/aea.12938

Applied Engineering in Agriculture

2019

Vol 35 (1)

pp. 9-14

Cited By ~ 3

Keyword(s):

Feature Selection

Linear Regression

Multiple Linear Regression

Crop Yield

Computational Time

Feature Subset

Yield Prediction

Selection Algorithm

Cultivable Land

Selection Algorithms

Abstract. In agriculture, crop yield prediction is critical. Crop yield depends on various features, which can be categorized as geographical, climatic, and biological. Geographical features consist of cultivable land in hectares, the canal length covering the cultivable land, and the number of tanks and tube wells available for irrigation. Climatic features consist of rainfall, temperature, and radiation. Biological features consist of seeds, minerals, and nutrients. In total, 15 features were considered in this study to understand their impact on paddy crop yield for all seasons of each year. Five filter and wrapper approaches were applied to select the vital features. A Multiple Linear Regression (MLR) model was used to assess the predictive accuracy of the feature subsets chosen by each feature selection algorithm, and the RMSE, MAE, R, and RRMSE metrics were used to evaluate their performance. The data used for the analysis were drawn from secondary sources of the state Agriculture Department, Government of Tamil Nadu, India, covering more than 30 years. Seventy-five percent of the data was used for training and 25% for testing. Low computational time was also considered in selecting the best feature subset. All feature selection algorithms gave similar RMSE, RRMSE, R, and MAE values, so the adjusted R² value was used to identify the optimum feature subset among these closely matched results. The evaluation shows that the total area of cultivation, the number of tanks and open wells used for irrigation, the length of canals used for irrigation, and the average maximum temperature during the crop season are the best features for crop yield prediction in the study area. MLR gives 85% model accuracy for the selected features with low computational time. Keywords: Feature selection algorithm, Model validation, Multiple linear regression, Performance metrics.
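For reference, the metrics named in this abstract can be computed as below. This is a minimal Python sketch under stated assumptions, not the authors' code: RRMSE is taken as RMSE divided by the mean observed value (in percent), R is the Pearson correlation between observed and predicted yields, and adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) for n test samples and p predictors. The function name `regression_metrics` and the toy numbers are hypothetical.

```python
import math


def regression_metrics(observed, predicted, n_features):
    """Return RMSE, MAE, Pearson R, RRMSE (%), R² and adjusted R² for one test set."""
    n = len(observed)
    if n != len(predicted) or n <= n_features + 1:
        raise ValueError("need equal lengths and more samples than n_features + 1")
    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Pearson correlation coefficient R between observed and predicted values
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    ssp = sum((p - mean_pred) ** 2 for p in predicted)
    r = cov / math.sqrt(sst * ssp)
    # Relative RMSE as a percentage of the observed mean (assumed definition)
    rrmse = 100.0 * rmse / mean_obs
    # Coefficient of determination, then adjusted for n_features predictors
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"rmse": rmse, "mae": mae, "r": r, "rrmse": rrmse, "r2": r2, "adj_r2": adj_r2}


metrics = regression_metrics([3.0, 4.0, 5.0, 6.0, 7.0], [3.1, 3.9, 5.2, 5.8, 7.0], n_features=2)
print(metrics)
```

Because adjusted R² drops when predictors are added without a matching gain in fit, it can separate feature subsets whose RMSE, MAE and R values are nearly identical.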

Selection of Important Features for Optimizing Crop Yield Prediction

International Journal of Agricultural and Environmental Information Systems

10.4018/ijaeis.2019070104

2019

Vol 10 (3)

pp. 54-71

Author(s):

Maya Gopal P S

Bhargavi R

Keyword(s):

Feature Selection

Crop Yield

Tamil Nadu

Feature Subset

Yield Prediction

Multilinear Regression

Secondary Sources

Backward Elimination

Tamil Nadu State

Selection Algorithms

In agriculture, crop yield prediction is critical. Crop yield depends on various features, including geographic, climatic and biological ones. This research article discusses five Feature Selection (FS) algorithms, namely Sequential Forward FS, Sequential Backward Elimination FS, Correlation-based FS, Random Forest Variable Importance, and the Variance Inflation Factor algorithm. The data used for the analysis were drawn from secondary sources of the Tamil Nadu state Agriculture Department for a period of 30 years. Seventy-five percent of the data was used for training and 25% for testing. The performance of the feature selection algorithms is evaluated with Multiple Linear Regression, for which the RMSE, MAE, R and RRMSE metrics are calculated. The adjusted R² was used to find the optimum feature subset, and the time complexity of the algorithms was also considered. The selected features are then applied to Multiple Linear Regression, an Artificial Neural Network and M5Prime. MLR gives 85% accuracy using the features selected by the SFFS algorithm.
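The wrapper idea behind Sequential Forward FS can be illustrated with a short Python sketch. It assumes plain forward selection (no floating or backtracking step), ordinary least squares with an intercept as the MLR model, adjusted R² on the training data as the score, and stopping as soon as no remaining feature raises that score; the paper's exact scoring and stopping rule may differ. The feature names and synthetic data are hypothetical.

```python
import numpy as np


def adjusted_r2(X, y):
    """Fit ordinary least squares with an intercept and return adjusted R²."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_select(X, y, names):
    """Greedily add the feature that most improves adjusted R²; stop when none does."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining:
        scores = {j: adjusted_r2(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:
            break  # no candidate improves the score any further
        best_score = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return [names[j] for j in selected], best_score


# Hypothetical data: yield depends on features 0 and 2 only
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=120)
chosen, score = forward_select(X, y, ["area", "tanks", "canal_length", "noise"])
print(chosen, round(score, 4))
```

Backward elimination follows the same pattern in reverse: start from all features and drop the one whose removal raises (or least lowers) the score.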

Feature selection for incomplete set-valued data

Journal of Intelligent & Fuzzy Systems

10.3233/jifs-210135

2021

pp. 1-19

Author(s):

Lulu Li

Keyword(s):

Information System

Feature Selection

Practical Significance

Feature Subset

Selection Algorithm

Uncertainty Measurement

Selection For

Average Size

Selection Algorithms

Information Values

Set-valued data is a significant kind of data, such as data obtained from different search engines, market data, and patients' symptoms and behaviours. An information system (IS) based on incomplete set-valued data is called an incomplete set-valued information system (ISVIS), which is a generalization of a single-valued incomplete information system. This paper studies feature selection for an ISVIS by means of uncertainty measurement. First, the similarity degree between two information values on a given feature of an ISVIS is proposed. Then, the tolerance relation on the object set with respect to a given feature subset in an ISVIS is obtained. Next, λ-reduction in an ISVIS is presented. Moreover, connections between the proposed feature selection and uncertainty measurement are exhibited. Lastly, feature selection algorithms based on the λ-discernibility matrix, λ-information granulation, λ-information entropy and λ-significance in an ISVIS are provided. To demonstrate the practical significance of the provided algorithms, a numerical experiment is carried out, and the experimental results show the number of features and the average size of features selected by each feature selection algorithm.
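To make the tolerance relation concrete, here is a small Python sketch. The paper defines its own similarity degree; purely as an assumption, this sketch uses the Jaccard index |A ∩ B| / |A ∪ B| between two set values and treats a missing value (None) as fully similar to anything, a common convention for incomplete information systems. Two objects are then λ-tolerant on a feature subset when their similarity reaches λ on every feature in it. The toy table and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical incomplete set-valued table: object -> feature -> set of values (None = missing)
table = {
    "x1": {"a": {1, 2}, "b": {"low"}},
    "x2": {"a": {2, 3}, "b": None},
    "x3": {"a": {4}, "b": {"high"}},
}


def similarity(u, v):
    """Jaccard similarity of two set values; a missing value matches anything (assumption)."""
    if u is None or v is None:
        return 1.0
    union = u | v
    return len(u & v) / len(union) if union else 1.0


def tolerance_pairs(table, features, lam):
    """Unordered pairs of distinct objects that are lam-similar on every feature in `features`."""
    return {
        (x, y)
        for x, y in combinations(sorted(table), 2)
        if all(similarity(table[x][f], table[y][f]) >= lam for f in features)
    }


print(tolerance_pairs(table, ["a", "b"], lam=0.3))  # x1 and x2 share value 2 on feature a
```

Under this construction, raising λ can only remove pairs from the relation, so the object classes it induces become finer.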

An Enhancement of Feature Selection Algorithm for EDM: A Review

International Journal of Advanced Research in Computer Science and Software Engineering

10.23956/ijarcsse.v8i5.661

2018

Vol 8 (5)

pp. 29

Author(s):

Manpreet Kaur

Chamkaur Singh

Keyword(s):

Feature Selection

Educational Data Mining

Problem Formulation

Research Area

Education Quality

Educational Institutions

Selection Algorithm

Positive Role

Data Set

Selection Algorithms

Educational Data Mining (EDM) is an emerging research area that helps educational institutions improve the performance of their students. Feature Selection (FS) algorithms remove irrelevant data from the educational dataset and hence increase the performance of the classifiers used in EDM techniques. This paper presents an analysis of the performance of feature selection algorithms on a student dataset. The different problems are defined in the problem formulation, and all of these problems are to be resolved in future work. Furthermore, the paper attempts to play a positive role in improving education quality and to guide new researchers in making academic interventions.

A novel feature selection algorithm based on damping oscillation theory

PLoS ONE

10.1371/journal.pone.0255307

2021

Vol 16 (8)

pp. e0255307

Author(s):

Fujun Wang

Xing Wang

Keyword(s):

Feature Selection

Optimization Algorithm

Euclidean Distance

Oscillation Theory

Feature Subset Selection

Support Vector

Data Sets

Feature Subset

Selection Algorithm

Filter Model

Feature selection is an important task in big data analysis and information retrieval processing. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed. This algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed, which is used to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm whose position update formula has been improved to achieve optimal results. Third, the filter model and the wrapper model are dynamically adjusted by the damping oscillation theory to find an optimal feature subset. MKMDIGWO therefore combines the efficiency of the filter model with the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets demonstrate that the MKMDIGWO algorithm achieves higher classification accuracy than four other state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than that of the other algorithms on 10 data sets.
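The wrapper side of MKMDIGWO builds on grey wolf optimization. The Python sketch below implements only the standard GWO position update introduced by Mirjalili et al., in which every wolf moves to the average of the positions suggested by the three best wolves (alpha, beta, delta) while the coefficient a decays linearly from 2 to 0. It does not reproduce the paper's improved update formula, the Kendall/Euclidean filter, or the damping-oscillation coupling; the function name, the sphere test objective and the parameter defaults are assumptions.

```python
import numpy as np


def grey_wolf_optimize(objective, dim, low, high, n_wolves=20, n_iter=200, seed=0):
    """Minimise `objective` over the box [low, high]^dim with the standard GWO update."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(low, high, size=(n_wolves, dim))
    best_pos, best_fit = None, np.inf
    for t in range(n_iter):
        fitness = np.array([objective(w) for w in wolves])
        order = np.argsort(fitness)
        if fitness[order[0]] < best_fit:  # remember the best solution seen so far
            best_fit, best_pos = float(fitness[order[0]]), wolves[order[0]].copy()
        leaders = wolves[order[:3]]  # alpha, beta and delta wolves
        a = 2.0 - 2.0 * t / n_iter  # exploration coefficient decays from 2 to 0
        candidate_sum = np.zeros_like(wolves)
        for leader in leaders:
            A = 2.0 * a * rng.random((n_wolves, dim)) - a
            C = 2.0 * rng.random((n_wolves, dim))
            D = np.abs(C * leader - wolves)  # distance of each wolf to this leader
            candidate_sum += leader - A * D
        wolves = np.clip(candidate_sum / 3.0, low, high)
    fitness = np.array([objective(w) for w in wolves])
    if fitness.min() < best_fit:
        best_fit, best_pos = float(fitness.min()), wolves[np.argmin(fitness)].copy()
    return best_pos, best_fit


def sphere(x):
    return float(np.sum(x ** 2))


position, value = grey_wolf_optimize(sphere, dim=5, low=-10.0, high=10.0)
print(value)
```

For feature selection, a continuous position is usually mapped to a binary mask (for example by thresholding a sigmoid of each coordinate at 0.5) and the objective becomes a classifier-based error; that mapping is likewise an assumption here.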

Satellite soil moisture for yield prediction in water limited regions

10.5194/egusphere-egu21-12549

2021

Author(s):

Mariette Vreugdenhil

Isabella Pfeil

Luca Brocca

Stefania Camici

Markus Enenkel

...

Keyword(s):

Soil Moisture

Linear Regression

Multiple Linear Regression

Early Warning

Growing Season

Early Warning Systems

Yield Prediction

Warning Systems

Yield Data

Disaster Risk Financing

Accurate and reliable early warning systems can support anticipatory disaster risk financing, which can be more cost-effective than post-disaster emergency response. One of the challenges in anticipatory disaster risk financing is basis risk, which arises from data and model uncertainty. The increasing availability of Earth Observation (EO) data provides the opportunity to develop shadow models or to include different variables in early warning systems and weather index insurance. Of particular interest is the early indication of climate impacts on agricultural production. Traditionally, crop and yield prediction models use meteorological data such as precipitation and temperature, or optical indicators such as the Normalized Difference Vegetation Index (NDVI). In recent years, soil moisture has gained popularity for yield prediction because it controls the water availability for plants.

Here, we will present the use of different satellite-based rainfall and soil moisture products, in combination with NDVI, to develop a yield deficiency indicator over two water-limited regions. An analysis for Senegal and Morocco is performed at the national level using yield data for four major crops from the Food and Agriculture Organization of the United Nations. Freely available EO datasets for rainfall, soil moisture, root-zone soil moisture and NDVI were used. All datasets were spatially resampled to a 0.1° grid, temporally aggregated to monthly anomalies, and finally detrended and standardized. First, regression analysis with yearly yield was performed per EO dataset for single months. For this, EO datasets were aggregated over areas where the specific crop was grown. Second, based on these results, multiple linear regression was performed using the months and variables with the highest explanatory power. The multiple linear regression was used to provide spatially varying yield predictions by trading time for space. The spatial predictions were validated using sub-national yield data from Senegal.

The analysis demonstrates the added value of satellite soil moisture for early yield prediction. In both Senegal and Morocco, rainfall and soil moisture showed high predictive skill early in the growing season: negative early-season soil moisture anomalies often lead to low yield. NDVI showed more predictive power later in the growing season. For example, in Morocco soil moisture at the start of the season can already explain 56% of the variability in yield. NDVI can explain 80% of the yield variability, but only at the end of the growing season. Combining anomalies of the optimal months from the different variables in multiple linear regression improved yield prediction. Again, including NDVI led to higher predictive power, at the cost of early warning. This analysis clearly shows that soil moisture can be a valuable tool for anticipatory drought risk financing and early warning systems.
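The preprocessing chain described here (monthly anomalies, detrending, standardization, then single-month regression against yearly yield) can be sketched in Python as follows. It assumes one value per calendar month and year for a region that has already been spatially aggregated, a linear trend fitted separately for each calendar month, and the squared Pearson correlation as the R² of a one-variable regression. The array names and synthetic data are hypothetical and not from the study.

```python
import numpy as np


def standardized_anomalies(monthly):
    """monthly: array of shape (n_years, 12). Returns detrended, standardized anomalies."""
    years = np.arange(monthly.shape[0])
    # Anomaly: subtract each calendar month's long-term mean (climatology)
    anomalies = monthly - monthly.mean(axis=0)
    out = np.empty_like(anomalies)
    for m in range(12):
        # Remove a linear trend across years for this calendar month
        slope, intercept = np.polyfit(years, anomalies[:, m], 1)
        detrended = anomalies[:, m] - (slope * years + intercept)
        # Standardize to zero mean and unit standard deviation
        out[:, m] = detrended / detrended.std()
    return out


def best_single_month(anoms, yearly_yield):
    """R² of yearly yield against each month's anomaly; return the best month (0-based)."""
    r2 = [np.corrcoef(anoms[:, m], yearly_yield)[0, 1] ** 2 for m in range(12)]
    best = int(np.argmax(r2))
    return best, r2[best]


# Hypothetical region: yield is driven by the month-3 signal; a spurious trend is added
rng = np.random.default_rng(1)
n_years = 20
signal = rng.normal(size=(n_years, 12))
soil_moisture = signal + 0.05 * np.arange(n_years)[:, None]
yearly_yield = 1.5 * signal[:, 3] + rng.normal(scale=0.2, size=n_years)
anoms = standardized_anomalies(soil_moisture)
month, month_r2 = best_single_month(anoms, yearly_yield)
print(month, round(month_r2, 3))
```

The months and variables picked this way would then enter the multiple linear regression step.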

Utilizing Feature Selection on Higher Order Neural Networks

Nature-Inspired Computing

10.4018/978-1-5225-0788-8.ch041

2016

pp. 1099-1114

Author(s):

Zongyuan Zhao

Shuxiang Xu

Byeong Ho Kang

Mir Md Jahangir Kabir

Yunling Liu

...

Keyword(s):

Neural Network

Feature Selection

Prediction Accuracy

Higher Order

Computational Time

Learning Capabilities

Higher Order Neural Networks

And Function

Selection Algorithms

Fully Connected

Artificial Neural Networks (ANNs) have shown impressive ability on many real-world problems such as pattern recognition, classification and function approximation. An extension of the ANN, the higher order neural network (HONN), improves the ANN's computational and learning capabilities. However, the large number of higher order attributes leads to long learning time and a complex network structure. Some irrelevant higher order attributes can also hinder the performance of a HONN. In this chapter, feature selection algorithms are used to simplify the HONN architecture. Comparisons of a fully connected HONN with a feature-selected HONN demonstrate that proper feature selection can be effective in decreasing the number of inputs, reducing computational time, and improving the prediction accuracy of the HONN.
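The growth in higher order attributes, and one simple way of pruning them, can be sketched in Python. The sketch assumes second-order terms are the pairwise products x_i·x_j (i ≤ j), so n inputs become n + n(n + 1)/2 attributes, and it uses absolute Pearson correlation with the target as a stand-in filter. The chapter's own feature selection algorithms are not reproduced, and the helper names are hypothetical.

```python
from itertools import combinations_with_replacement

import numpy as np


def second_order_attributes(X):
    """Append every pairwise product x_i * x_j (i <= j) to the original inputs."""
    n_inputs = X.shape[1]
    pairs = list(combinations_with_replacement(range(n_inputs), 2))
    products = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    names = [f"x{i}" for i in range(n_inputs)] + [f"x{i}*x{j}" for i, j in pairs]
    return np.hstack([X, products]), names


def top_k_by_correlation(Z, y, names, k):
    """Keep the k attributes whose absolute Pearson correlation with y is largest."""
    scores = [abs(np.corrcoef(Z[:, c], y)[0, 1]) for c in range(Z.shape[1])]
    keep = np.argsort(scores)[::-1][:k]
    return [names[c] for c in keep]


# Hypothetical target that depends only on the interaction x0 * x1
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.05, size=200)
Z, names = second_order_attributes(X)
print(Z.shape, top_k_by_correlation(Z, y, names, k=1))
```

Four inputs already yield 14 attributes, and the count grows quadratically, which is why pruning matters for HONN input layers.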

Modulation Recognition of Digital Multimedia Signal Based on Data Feature Selection

International Journal of Mobile Computing and Multimedia Communications

10.4018/ijmcmc.2017070107

2017

Vol 8 (3)

pp. 90-111

Cited By ~ 2

Author(s):

Hui Wang

Li Li Guo

Yun Lin

Keyword(s):

Feature Selection

Information Entropy

Feature Subset

Selection Algorithm

Feature Selection Algorithm

Modulation Recognition

Signal Modulation

Digital Multimedia

Optimal Feature Subset

Optimal Feature

Automatic modulation recognition is very important for receiver design in broadband multimedia communication systems, and reasonable signal feature extraction and selection algorithms are the key technology for digital multimedia signal recognition. In this paper, information entropy is used to extract individual features, namely power spectrum entropy, wavelet energy spectrum entropy, singular spectrum entropy and Rényi entropy. Then, a distance-measurement feature selection algorithm and Sequential Feature Selection (SFS) are presented to select the optimal feature subset. Finally, a BP neural network is used to classify the signal modulation. The simulation results show that the four different information entropies can be used to classify different signal modulations, and that the feature selection algorithm successfully chooses the optimal feature subset and achieves the best performance.
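Two of the entropy features can be written down directly. For a normalised distribution p, Shannon entropy is H = -Σ p_i log2 p_i and Rényi entropy of order α is H_α = log2(Σ p_i^α)/(1 - α). The Python sketch below applies them to the normalised FFT power spectrum, which is one common reading of "power spectrum entropy"; the paper's exact normalisation, window length and Rényi order are not given here, so those choices are assumptions, and the wavelet and singular spectrum entropies are omitted.

```python
import numpy as np


def power_spectrum_distribution(signal):
    """Normalise the one-sided FFT power spectrum so that it sums to 1."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    return power / power.sum()


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (alpha > 0, alpha != 1) in bits."""
    p = p[p > 0]
    return float(np.log2((p ** alpha).sum()) / (1.0 - alpha))


t = np.arange(256)
tone = np.sin(2 * np.pi * 8 * t / 256)  # energy concentrated in one frequency bin
noise = np.random.default_rng(3).normal(size=256)  # energy spread across all bins
for name, sig in [("tone", tone), ("noise", noise)]:
    p = power_spectrum_distribution(sig)
    print(name, round(shannon_entropy(p), 3), round(renyi_entropy(p, 2.0), 3))
```

A narrowband signal gives low spectral entropy and a broadband one gives high entropy, which is what makes these quantities usable as modulation features.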

Composition Analysis and Feature Selection of the Oral Microbiota Associated with Periodontal Disease

BioMed Research International

10.1155/2018/3130607

2018

Vol 2018

pp. 1-14

Cited By ~ 9

Author(s):

Wen-Pei Chen

Shih-Hao Chang

Chuan-Yi Tang

Ming-Li Liou

Suh-Jen Jane Tsai

...

Keyword(s):

Feature Selection

Periodontal Disease

Clinical Decision Making

Prediction Models

Periodontal Diseases

Computational Time

Composition Analysis

Oral Microbiota

Selection Algorithm

Feature Selection Algorithm

Periodontitis is an inflammatory disease involving complex interactions between oral microorganisms and the host immune response. Understanding the structure of the microbiota community associated with periodontitis is essential for improving the classification and diagnosis of various types of periodontal disease and will facilitate clinical decision-making. In this study, we used a 16S rRNA metagenomics approach to investigate and compare the compositions of the microbiota communities from 76 subgingival plaque samples, including 26 from healthy individuals and 50 from patients with periodontitis. Furthermore, we propose a novel feature selection algorithm for selecting more informative features from many variables; combinations of these features and machine learning methods were used to construct models for predicting the health status of patients with periodontal disease. We identified a total of 12 phyla, 124 genera, and 355 species and observed differences between health- and periodontitis-associated bacterial communities at all phylogenetic levels. We found that the genera Porphyromonas, Treponema, Tannerella, Filifactor, and Aggregatibacter were more abundant in patients with periodontal disease, whereas Streptococcus, Haemophilus, Capnocytophaga, Gemella, Campylobacter, and Granulicatella were found at higher levels in healthy controls. Using our feature selection algorithm, random forests performed better in terms of predictive power than the other methods and consumed the least computational time.

A Novel Feature Selection Algorithm with Supervised Mutual Information for Classification

International Journal of Artificial Intelligence Tools

10.1142/s0218213013500279

2013

Vol 22 (04)

pp. 1350027

Author(s):

Jaganathan Palanichamy

Kuppuchamy Ramasamy

Keyword(s):

Machine Learning

Data Mining

Feature Selection

Mutual Information

Selection Algorithm

Feature Selection Algorithm

Class A

Selection Algorithms

The Relationship

Class Variable

Feature selection is essential in data mining and pattern recognition, especially for database classification. In past years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevancy and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features on the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on the maximum relevance and minimum redundancy criterion. Mutual information is used to measure the relevancy of each feature to the class variable, and the redundancy is calculated using the relationship between candidate features, selected features and class variables. The effectiveness of the algorithm is tested on ten benchmark datasets from the UCI Machine Learning Repository. The experimental results show better performance than several existing algorithms.
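The max-relevance min-redundancy idea can be sketched for discrete features in Python. This is the classic difference form of mRMR (relevance I(f; c) minus the mean of I(f; s) over the already selected features s), with plug-in mutual information estimated from counts; the paper's own redundancy term, which also involves the class variable, is not reproduced. The feature names and data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, target, k):
    """Greedy mRMR: add the feature maximising relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        def score(name):
            redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)

        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Hypothetical data: class = a OR b, and a_dup is an exact copy of a
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
target = [int(x or y) for x, y in zip(a, b)]
features = {"a": a, "a_dup": list(a), "b": b}
print(mrmr(features, target, k=2))
```

Because `a`, `a_dup` and `b` are equally relevant, a pure relevance ranking could keep `a` together with its duplicate; the redundancy penalty makes the second pick the non-redundant feature.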

Performance Evaluation of Feature Selection Algorithms Applied to Online Learning in Concept Drift Environments

10.5753/eniac.2018.4438

2018

Author(s):

Matheus B. De Moraes

André L. S. Gradvohl

Keyword(s):

Feature Selection

Information Gain

Concept Drift

Computational Cost

Selection Algorithm

Information Need

High Speeds

Online Feature Selection

Classification Tasks

Selection Algorithms

Data streams are transmitted at high speeds and in huge volumes, and may contain critical information that needs to be processed in real time. Hence, to reduce computational cost and time, the system may apply a feature selection algorithm. However, this is not a trivial task due to concept drift. In this work, we show that two feature selection algorithms, Information Gain and Online Feature Selection, perform worse than classification without feature selection. Each algorithm produced more relevant results in one distinct scenario, with final accuracies up to 14% higher. The experiments on both real and artificial datasets suggest potential for using these methods because of their better adaptability in some concept drift situations.
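Information Gain is cheap to maintain over a stream because it only needs counts. The Python sketch below keeps cumulative class counts and per-feature (value, class) counts and recomputes IG(F) = H(C) - H(C | F) on demand. It assumes discrete feature values and cumulative counts; handling concept drift would additionally need something like a sliding window or count decay, which is not shown. The class and method names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a Counter of class frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c)


class StreamingInfoGain:
    """Keeps class and (feature value, class) counts so IG can be recomputed at any time."""

    def __init__(self):
        self.class_counts = Counter()
        # feature -> value -> Counter of class labels
        self.joint = defaultdict(lambda: defaultdict(Counter))

    def update(self, x, label):
        """Absorb one labelled example, x being a dict of feature -> discrete value."""
        self.class_counts[label] += 1
        for feature, value in x.items():
            self.joint[feature][value][label] += 1

    def information_gain(self, feature):
        n = sum(self.class_counts.values())
        conditional = sum(
            (sum(cls.values()) / n) * entropy(cls) for cls in self.joint[feature].values()
        )
        return entropy(self.class_counts) - conditional

    def top_k(self, k):
        return sorted(self.joint, key=self.information_gain, reverse=True)[:k]


ig = StreamingInfoGain()
for i in range(8):
    label = i % 2
    ig.update({"good": label, "noise": i // 4}, label)
print(ig.top_k(1), ig.information_gain("good"), ig.information_gain("noise"))
```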