The detection of Alternaria solani infection on tomatoes using ensemble learning

2020 ◽  
Vol 12 (5) ◽  
pp. 407-418
Author(s):  
Bogdan Ruszczak ◽  
Krzysztof Smykała ◽  
Karol Dziubański

This paper presents a method for detecting Alternaria solani in tomatoes. Several machine learning models, including decision trees and ensemble learning methods, were used to detect the pathogen. These methods require the acquisition of large volumes of data and adequate preprocessing of that data. The study used a dataset of hyperspectral measurements of two varieties of tomatoes. Measurements were split into two groups: one inoculated with the Alternaria solani pathogen and the other treated as the reference. Measurements were taken with a spectroradiometer in consecutive measurement series. The main part of the study was the evaluation of decision trees and popular ensemble learning algorithms to select the most accurate one. After successive iterations of the training process and adjustment of hyperparameters, satisfactory accuracy, equal to 0.987 for random forest, was obtained. This paper also examines the spectral range required for Alternaria solani identification. Of several variants, the accuracy of models based on the VIS and NIR spectral ranges was the closest to the accuracy obtained with the whole spectrum of measured absolute reflectance.

Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 325
Author(s):  
Ángel González-Prieto ◽  
Alberto Mozo ◽  
Edgar Talavera ◽  
Sandra Gómez-Canaval

Generative Adversarial Networks (GANs) are powerful machine learning models capable of generating fully synthetic samples of a desired phenomenon with a high resolution. Despite their success, the training process of a GAN is highly unstable, and typically, it is necessary to implement several accessory heuristics to the networks to reach acceptable convergence of the model. In this paper, we introduce a novel method to analyze the convergence and stability in the training of generative adversarial networks. For this purpose, we propose to decompose the objective function of the adversary min–max game defining a periodic GAN into its Fourier series. By studying the dynamics of the truncated Fourier series for the continuous alternating gradient descent algorithm, we are able to approximate the real flow and to identify the main features of the convergence of the GAN. This approach is confirmed empirically by studying the training flow in a 2-parametric GAN, aiming to generate an unknown exponential distribution. As a by-product, we show that convergent orbits in GANs are small perturbations of periodic orbits, so the Nash equilibria are spiral attractors. This theoretically justifies the slow and unstable training observed in GANs.
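The rotational behaviour described in the abstract above can be illustrated on a much simpler problem. The sketch below is not the paper's periodic GAN or its Fourier analysis; it assumes the standard toy bilinear game f(x, y) = x·y, where x minimizes and y maximizes, and compares simultaneous with alternating discrete gradient updates. The function names, the step size and the starting point are illustrative choices.

```python
import math

def simultaneous_step(x, y, lr):
    # Both players update from the same old point (x, y).
    return x - lr * y, y + lr * x

def alternating_step(x, y, lr):
    # The minimizing player moves first; the maximizing player
    # then reacts to the already-updated x.
    new_x = x - lr * y
    new_y = y + lr * new_x
    return new_x, new_y

def run(step, x=1.0, y=1.0, lr=0.1, n_steps=1000):
    # Iterate the chosen update rule and return the final distance
    # from the equilibrium at the origin.
    for _ in range(n_steps):
        x, y = step(x, y, lr)
    return math.hypot(x, y)

sim_norm = run(simultaneous_step)
alt_norm = run(alternating_step)
print(f"simultaneous: {sim_norm:.2f}, alternating: {alt_norm:.2f}")
```

In this toy game the simultaneous updates spiral away from the equilibrium, while the alternating updates stay on a closed curve around it. That matches, in a very reduced setting, the picture of near-periodic orbits around the equilibrium given in the abstract.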


2021 ◽  
Vol 11 (5) ◽  
pp. 2164
Author(s):  
Jiaxin Li ◽  
Zhaoxin Zhang ◽  
Changyong Guo

X.509 certificates play an important role in encrypting the transmission of data on both sides under HTTPS. With the popularization of X.509 certificates, more and more criminals leverage certificates to prevent their communications from being exposed by malicious traffic analysis tools. Phishing sites and malware are good examples. Those X.509 certificates found in phishing sites or malware are called malicious X.509 certificates. This paper applies different machine learning models, including classical machine learning models, ensemble learning models, and deep learning models, to distinguish between malicious certificates and benign certificates with Verification for Extraction (VFE). The VFE is a system we design and implement for obtaining plentiful characteristics of certificates. The result shows that ensemble learning models are the most stable and efficient models with an average accuracy of 95.9%, which outperforms many previous works. In addition, we obtain an SVM-based detection model with an accuracy of 98.2%, which is the highest accuracy. The outcome indicates the VFE is capable of capturing essential and crucial characteristics of malicious X.509 certificates.


2017 ◽  
Vol 14 (01) ◽  
pp. 1650031 ◽  
Author(s):  
Wenjun Ye ◽  
Zhijun Li ◽  
Chenguang Yang ◽  
Fei Chen ◽  
Chun-Yi Su

The paper studies the control design of an exoskeleton robot based on electromyography (EMG). An EMG-based motion detection method is proposed to trigger the rehabilitation assistance according to user intention. An adaptive control scheme that compensates for the exoskeleton's dynamics is employed, and it is able to provide assistance tailored to the human user, who is supposed to participate actively in the training process. The experiment results verify the effectiveness of the control method developed in this paper.


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Bin Jia ◽  
Xiaohong Huang ◽  
Rujun Liu ◽  
Yan Ma

The explosive growth of network traffic and its many types on the Internet has brought new and severe challenges to DDoS attack detection. To achieve a higher True Negative Rate (TNR), accuracy, and precision, and to guarantee the robustness, stability, and universality of the detection system, we propose a DDoS attack detection method based on hybrid heterogeneous multiclassifier ensemble learning and design a heuristic detection algorithm based on Singular Value Decomposition (SVD) to construct our detection system. Experimental results show that our detection method performs well in TNR, accuracy, and precision, so our algorithm has good detection performance for DDoS attacks. Comparisons with Random Forest, k-Nearest Neighbor (k-NN), and Bagging, the component classifiers of the ensemble, each used alone with and without SVD, show that our model is superior to state-of-the-art attack detection techniques in system generalization ability, detection stability, and overall detection performance.
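For reference, the three metrics named in the abstract above follow directly from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are made up for illustration and are not results from the paper.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute TNR, accuracy and precision from confusion-matrix counts.

    tp/fn: attack samples flagged / missed by the detector.
    tn/fp: normal samples passed / wrongly flagged by the detector.
    """
    return {
        "tnr": tn / (tn + fp),                        # share of normal traffic correctly passed
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # share of all samples classified correctly
        "precision": tp / (tp + fp),                  # share of raised alarms that are real attacks
    }

# Hypothetical counts, chosen only to exercise the formulas.
metrics = detection_metrics(tp=90, tn=95, fp=5, fn=10)
print(metrics)
```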


2021 ◽  
Author(s):  
Benjamin Evans

Ensemble learning is one of the most powerful extensions for improving upon individual machine learning models. Rather than a single model being used, several models are trained and the predictions combined to make a more informed decision. Such combinations will ideally overcome the shortcomings of any individual member of the ensemble. Most machine learning competition winners feature an ensemble of some sort, and there is also sound theoretical proof to the performance of certain ensembling schemes. The benefits of ensembling are clear in both theory and practice. Despite the great performance, ensemble learning is not a trivial task. One of the main difficulties is designing appropriate ensembles. For example, how large should an ensemble be? What members should be included in an ensemble? How should these members be weighted? Our first contribution addresses these concerns using a strongly-typed population-based search (genetic programming) to construct well-performing ensembles, where the entire ensemble (members, hyperparameters, structure) is automatically learnt. The proposed method was found, in general, to be significantly better than all base members and commonly used comparison methods trialled. With automatically designed ensembles, there is a range of applications, such as competition entries, forecasting and state-of-the-art predictions. However, often these applications also require additional preprocessing of the input data. The ensemble above considers only the original training data; however, in many machine learning scenarios a pipeline is required (for example, performing feature selection before classification). For the second contribution, a novel automated machine learning method is proposed based on ensemble learning. This method uses a random population-based search of appropriate tree structures, and as such is embarrassingly parallel, an important consideration for automated machine learning. The proposed method is able to achieve equivalent or improved results over the current state-of-the-art methods and does so in a fraction of the time (six times as fast). Finally, while complex ensembles offer great performance, one large limitation is the interpretability of such ensembles. For example, why does a forest of 500 trees predict a particular class for a given instance? In an effort to explain the behaviour of complex models (such as ensembles), several methods have been proposed. However, these approaches tend to suffer from at least one of the following limitations: overly complex in the representation, local in their application, limited to particular feature types (i.e. categorical only), or limited to particular algorithms. For our third contribution, a novel model agnostic method for interpreting complex black-box machine learning models is proposed. The method is based on strongly-typed genetic programming and overcomes the aforementioned limitations. Multi-objective optimisation is used to generate a Pareto frontier of simple and explainable models which approximate the behaviour of much more complex methods. We found the resulting representations are far simpler than existing approaches (an important consideration for interpretability) while providing equivalent reconstruction performance. Overall, this thesis addresses two of the major limitations of existing ensemble learning, i.e. the complex construction process and the black-box models that are often difficult to interpret. A novel application of ensemble learning in the field of automated machine learning is also proposed. All three methods have shown performance equivalent to or better than existing methods.


2019 ◽  
Vol 77 ◽  
pp. 188-204 ◽  
Author(s):  
Yuyan Wang ◽  
Dujuan Wang ◽  
Na Geng ◽  
Yanzhang Wang ◽  
Yunqiang Yin ◽  
...  

2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Wenjuan Lian ◽  
Guoqing Nie ◽  
Bin Jia ◽  
Dandan Shi ◽  
Qi Fan ◽  
...  

With the rapid development of the Internet, various forms of network attack have emerged, so how to detect abnormal behavior effectively and recognize attack categories accurately has become an important research subject in the field of cyberspace security. Recently, many popular machine learning-based approaches have been applied in Intrusion Detection Systems (IDS) to construct data-driven models. These methods help reduce the time and cost of manual detection. However, real-time network data contain a large number of redundant terms and noise, and some existing intrusion detection technologies have lower accuracy and inadequate feature extraction ability. To solve these problems, this paper proposes an intrusion detection method based on the Decision Tree-Recursive Feature Elimination (DT-RFE) feature in ensemble learning. We first propose a data processing method that uses the Decision Tree-Based Recursive Elimination Algorithm to select features and reduce the feature dimension. This method eliminates redundant and uncorrelated data from the dataset to achieve better resource utilization and reduce time complexity. We then use the Stacking ensemble learning algorithm, combining Decision Tree (DT) with Recursive Feature Elimination (RFE) methods. Finally, a series of comparison experiments by cross-validation on the KDD CUP 99 and NSL-KDD datasets indicates that the DT-RFE and Stacking-based approach improves the performance of the IDS, and the accuracy for all kinds of features is higher than 99%, except in the case of U2R accuracy, which is 98%.
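The general shape of the pipeline in the abstract above, decision-tree-driven recursive feature elimination followed by a stacking ensemble, can be sketched with scikit-learn. This is only an outline under several assumptions: the abstract does not specify the base learners, the meta-learner, or the number of retained features, so the choices below (a decision tree and k-NN as base learners, logistic regression as meta-learner, 8 features) are placeholders, and synthetic data stands in for KDD CUP 99 and NSL-KDD.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an intrusion detection dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# DT-RFE: repeatedly fit a decision tree and drop the least important
# feature until 8 remain (the target count is an arbitrary choice).
selector = RFE(
    estimator=DecisionTreeClassifier(random_state=0),
    n_features_to_select=8,
)

# Stacking: base learners feed their predictions to a meta-learner.
stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Putting selection inside the pipeline keeps it within each CV fold,
# so held-out samples never influence which features are kept.
model = make_pipeline(selector, stack)
scores = cross_val_score(model, X, y, cv=3)
mean_accuracy = scores.mean()
print(f"mean cross-validated accuracy: {mean_accuracy:.3f}")
```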


Author(s):  
NIUSVEL ACOSTA-MENDOZA ◽  
ALICIA MORALES-REYES ◽  
HUGO JAIR ESCALANTE ◽  
ANDRÉS GAGO-ALONSO

This paper introduces a novel approach for building heterogeneous ensembles based on genetic programming (GP). Ensemble learning is a paradigm that aims at combining individual classifiers' outputs to improve their performance. Commonly, classifiers' outputs are combined by a weighted sum or a voting strategy. However, linear fusion functions may not effectively exploit individual models' redundancy and diversity. In this research, a GP-based approach to learn fusion functions that combine classifiers' outputs is proposed. This study targets heterogeneous ensembles, whose individual classifiers are based on different principles (e.g. decision trees and similarity-based techniques). A detailed empirical assessment is carried out to validate the effectiveness of the proposed approach. Results show that the proposed method is successful at building very effective classification models, outperforming alternative ensemble methodologies. The proposed ensemble technique is also applied to fuse homogeneous models' outputs, with results also showing its effectiveness. An in-depth analysis of the proposed ensemble-building strategy from different perspectives is presented with strong experimental support.
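The two conventional fusion strategies mentioned in the abstract above, voting and a weighted sum, can give different answers on the same classifier outputs. The sketch below shows only these baselines for a binary problem, not the GP-learned fusion functions the paper proposes; the function names, threshold and example probabilities are illustrative.

```python
def majority_vote(probs, threshold=0.5):
    # Each classifier casts a hard vote; class 1 wins only with a strict majority.
    votes = sum(1 for p in probs if p >= threshold)
    return 1 if votes * 2 > len(probs) else 0

def weighted_sum(probs, weights, threshold=0.5):
    # Combine the soft outputs linearly, then threshold the fused score.
    score = sum(w * p for w, p in zip(weights, probs)) / sum(weights)
    return 1 if score >= threshold else 0

# Hypothetical class-1 probabilities from three different classifiers.
probs = [0.9, 0.4, 0.45]

vote_label = majority_vote(probs)                 # one confident "yes" is outvoted
sum_label = weighted_sum(probs, [1.0, 1.0, 1.0])  # the confident "yes" lifts the mean above 0.5
print(vote_label, sum_label)
```

Voting discards how confident each classifier is, while the weighted sum keeps it but can only combine outputs linearly. That limitation is what motivates learning a richer fusion function.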


Foods ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 809
Author(s):  
Liyang Wang ◽  
Dantong Niu ◽  
Xinjie Zhao ◽  
Xiaoya Wang ◽  
Mengzhen Hao ◽  
...  

Traditional food allergen identification mainly relies on in vivo and in vitro experiments, which often require a long time and high cost. Artificial intelligence (AI)-driven rapid food allergen identification has addressed some of the above-mentioned drawbacks and is becoming an efficient auxiliary tool. Aiming to overcome the lower accuracy of traditional machine learning models in predicting the allergenicity of food proteins, this work proposes a deep learning model, a transformer with a self-attention mechanism, together with ensemble learning models (represented by Light Gradient Boosting Machine (LightGBM) and eXtreme Gradient Boosting (XGBoost)) to solve the problem. To highlight the superiority of the proposed method, the study also selected various commonly used machine learning models as baseline classifiers. The results of 5-fold cross-validation showed that the area under the receiver operating characteristic curve (AUC) of the deep model was the highest (0.9578), better than that of the ensemble learning and baseline algorithms. However, the deep model needs to be pre-trained, and its training time is the longest. A comparison of the characteristics of the transformer model and the boosting models shows that each model has its own advantages, which provides novel clues and inspiration for the rapid prediction of food allergens in the future.


2020 ◽  
Vol 34 (03) ◽  
pp. 2451-2458
Author(s):  
Akansha Bhardwaj ◽  
Jie Yang ◽  
Philippe Cudré-Mauroux

Microblogging platforms such as Twitter are increasingly being used in event detection. Existing approaches mainly use machine learning models and rely on event-related keywords to collect the data for model training. These approaches make strong assumptions on the distribution of the relevant microposts containing the keyword – referred to as the expectation of the distribution – and use it as a posterior regularization parameter during model training. Such approaches are, however, limited as they fail to reliably estimate the informativeness of a keyword and its expectation for model training. This paper introduces a Human-AI loop approach to jointly discover informative keywords for model training while estimating their expectation. Our approach iteratively leverages the crowd to estimate both keyword-specific expectation and the disagreement between the crowd and the model in order to discover new keywords that are most beneficial for model training. These keywords and their expectation not only improve the resulting performance but also make the model training process more transparent. We empirically demonstrate the merits of our approach, both in terms of accuracy and interpretability, on multiple real-world datasets and show that our approach improves the state of the art by 24.3%.

