scholarly journals Symmetrical Uncertainty-Based Feature Subset Generation and Ensemble Learning for Electricity Customer Classification

Symmetry ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 498 ◽  
Author(s):  
Minghao Piao ◽  
Yongjun Piao ◽  
Jong Lee

The use of actual electricity consumption data provided the chance to detect the change of customer class types. This work could be done by using classification techniques. However, there are several challenges in computational techniques. The most important one is to efficiently handle a large number of dimensions to increase customer classification performance. In this paper, we proposed a symmetrical uncertainty based feature subset generation and ensemble learning method for the electricity customer classification. Redundant and significant feature sets are generated according to symmetrical uncertainty. After that, a classifier ensemble is built based on significant feature sets and the results are combined for the final decision. The results show that the proposed method can efficiently find useful feature subsets and improve classification performance.

2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
José Hernández-Torruco ◽  
Juana Canul-Reich ◽  
Juan Frausto-Solís ◽  
Juan José Méndez-Castillo

Guillain-Barré syndrome (GBS) is a neurological disorder which has not been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency filter methods to select the most relevant features from a 156-feature real dataset. This dataset contains clinical, serological, and nerve conduction tests data obtained from GBS patients. The most relevant feature subsets, determined with each filter method, were used to identify four subtypes of GBS present in the dataset. We used partitions around medoids (PAM) clustering algorithm to form four clusters, corresponding to the GBS subtypes. We applied the purity of each cluster as evaluation measure. After experimentation, symmetrical uncertainty and information gain determined a feature subset of seven variables. These variables conformed as a dataset were used as input to PAM and reached a purity of 0.7984. This result leads to a first characterization of this syndrome using computational techniques.


Author(s):  
Mahua Bhattacharya ◽  
Arpita Das

The problem of feature selection consists of finding a significant feature subset of input training as well as test patterns that enable to describe all information required to classify a particular pattern. In present paper we focus in this particular problem which plays a key role in machine learning problems. In fact, before building a model for feature selection, our goal is to identify and to reject the features that degrade the classification performance of a classifier. This is especially true when the available input feature space is very large, and need exists to develop an efficient searching algorithm to combine these features spaces to a few significant one which are capable to represent that particular class. Presently, authors have described two approaches for combining the large feature spaces to efficient numbers using Genetic Algorithm and Fuzzy Clustering techniques. Finally the classification of patterns has been achieved using adaptive neuro-fuzzy techniques. The aim of entire work is to implement the recognition scheme for classification of tumor lesions appearing in human brain as space occupying lesions identified by CT and MR images. A part of the work has been presented in this paper. The proposed model indicates a promising direction for adaptation in a changing environment.


2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Kun Zeng ◽  
Yibin Xu ◽  
Ge Lin ◽  
Likeng Liang ◽  
Tianyong Hao

Abstract Background Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text by using machine learning methods improves recruitment efficiency to reduce the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data. Methods An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal Loss is used as a loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model for feature distinguish. Soft Voting is applied to achieve final classification of the ensemble model. The dataset is from the standard evaluation task 3 of 5th China Health Information Processing Conference containing 38,341 eligibility criteria text in 44 categories. Results Our ensemble method had an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% improvement on average. In addition, the performance improvement had a p-value of 2.152e-07 with a standard t-test, indicating that our model achieved a significant improvement. Conclusions A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that the classification performance was improved by our ensemble model significantly. In addition, metric learning was able to improve word embedding representation and the focal loss reduced the impact of data imbalance to model performance.


Atmosphere ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 869
Author(s):  
Xiuguo Zou ◽  
Jiahong Wu ◽  
Zhibin Cao ◽  
Yan Qian ◽  
Shixiu Zhang ◽  
...  

In order to adequately characterize the visual characteristics of atmospheric visibility and overcome the disadvantages of the traditional atmospheric visibility measurement method with significant dependence on preset reference objects, high cost, and complicated steps, this paper proposed an ensemble learning method for atmospheric visibility grading based on deep neural network and stochastic weight averaging. An experiment was conducted using the scene of an expressway, and three visibility levels were set, i.e., Level 1, Level 2, and Level 3. Firstly, the EfficientNet was transferred to extract the abstract features of the images. Then, training and grading were performed on the feature sets through the SoftMax regression model. Subsequently, the feature sets were ensembled using the method of stochastic weight averaging to obtain the atmospheric visibility grading model. The obtained datasets were input into the grading model and tested. The grading model classified the results into three categories, with the grading accuracy being 95.00%, 89.45%, and 90.91%, respectively, and the average accuracy of 91.79%. The results obtained by the proposed method were compared with those obtained by the existing methods, and the proposed method showed better performance than those of other methods. This method can be used to classify the atmospheric visibility of traffic and reduce the incidence of traffic accidents caused by atmospheric visibility.


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1714
Author(s):  
Mohamed Marey ◽  
Hala Mostafa

In this work, we propose a general framework to design a signal classification algorithm over time selective channels for wireless communications applications. We derive an upper bound on the maximum number of observation samples over which the channel response is an essential invariant. The proposed framework relies on dividing the received signal into blocks, and each of them has a length less than the mentioned bound. Then, these blocks are fed into a number of classifiers in a parallel fashion. A final decision is made through a well-designed combiner and detector. As a case study, we employ the proposed framework on a space-time block-code classification problem by developing two combiners and detectors. Monte Carlo simulations show that the proposed framework is capable of achieving excellent classification performance over time selective channels compared to the conventional algorithms.


2021 ◽  
Author(s):  
Arjun Singh

Abstract Drug discovery is incredibly time-consuming and expensive, averaging over 10 years and $985 million per drug. Calculating the binding affinity between a target protein and a ligand is critical for discovering viable drugs. Although supervised machine learning (ML) models can predict binding affinity accurately, they suffer from lack of interpretability and inaccurate feature selection caused by multicollinear data. This study used self-supervised ML to reveal underlying protein-ligand characteristics that strongly influence binding affinity. Protein-ligand 3D models were collected from the PDBBind database and vectorized into 2422 features per complex. LASSO Regression and hierarchical clustering were utilized to minimize multicollinearity between features. Correlation analyses and Autoencoder-based latent space representations were generated to identify features significantly influencing binding affinity. A Generative Adversarial Network was used to simulate ligands with certain counts of a significant feature, and thereby determine the effect of a feature on improving binding affinity with a given target protein. It was found that the CC and CCCN fragment counts in the ligand notably influence binding affinity. Re-pairing proteins with simulated ligands that had higher CC and CCCN fragment counts could increase binding affinity by 34.99-37.62% and 36.83%-36.94%, respectively. This discovery contributes to a more accurate representation of ligand chemistry that can increase the accuracy, explainability, and generalizability of ML models so that they can more reliably identify novel drug candidates. Directions for future work include integrating knowledge on ligand fragments into supervised ML models, examining the effect of CC and CCCN fragments on fragment-based drug design, and employing computational techniques to elucidate the chemical activity of these fragments.


2021 ◽  
Vol 6 (3) ◽  
pp. 177
Author(s):  
Muhamad Arief Hidayat

In health science there is a technique to determine the level of risk of pregnancy, namely the Poedji Rochyati score technique. In this evaluation technique, the level of pregnancy risk is calculated from the values ​​of 22 parameters obtained from pregnant women. Under certain conditions, some parameter values ​​are unknown. This causes the level of risk of pregnancy can not be calculated. For that we need a way to predict pregnancy risk status in cases of incomplete attribute values. There are several studies that try to overcome this problem. The research "classification of pregnancy risk using cost sensitive learning" [3] applies cost sensitive learning to the process of classifying the level of pregnancy risk. In this study, the best classification accuracy achieved was 73% and the best value was 77.9%. To increase the accuracy and recall of predicting pregnancy risk status, in this study several improvements were proposed. 1) Using ensemble learning based on classification tree 2) using the SVMattributeEvaluator evaluator to optimize the feature subset selection stage. In the trials conducted using the classification tree-based ensemble learning method and the SVMattributeEvaluator at the feature subset selection stage, the best value for accuracy was up to 76% and the best value for recall was up to 89.5%


2017 ◽  
Vol 2017 ◽  
pp. 1-22 ◽  
Author(s):  
Jihyun Kim ◽  
Thi-Thu-Huong Le ◽  
Howon Kim

Monitoring electricity consumption in the home is an important way to help reduce energy usage. Nonintrusive Load Monitoring (NILM) is existing technique which helps us monitor electricity consumption effectively and costly. NILM is a promising approach to obtain estimates of the electrical power consumption of individual appliances from aggregate measurements of voltage and/or current in the distribution system. Among the previous studies, Hidden Markov Model (HMM) based models have been studied very much. However, increasing appliances, multistate of appliances, and similar power consumption of appliances are three big issues in NILM recently. In this paper, we address these problems through providing our contributions as follows. First, we proposed state-of-the-art energy disaggregation based on Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) model and additional advanced deep learning. Second, we proposed a novel signature to improve classification performance of the proposed model in multistate appliance case. We applied the proposed model on two datasets such as UK-DALE and REDD. Via our experimental results, we have confirmed that our model outperforms the advanced model. Thus, we show that our combination between advanced deep learning and novel signature can be a robust solution to overcome NILM’s issues and improve the performance of load identification.


2012 ◽  
Vol 2012 ◽  
pp. 1-20 ◽  
Author(s):  
Gulshan Kumar ◽  
Krishan Kumar

In supervised learning-based classification, ensembles have been successfully employed to different application domains. In the literature, many researchers have proposed different ensembles by considering different combination methods, training datasets, base classifiers, and many other factors. Artificial-intelligence-(AI-) based techniques play prominent role in development of ensemble for intrusion detection (ID) and have many benefits over other techniques. However, there is no comprehensive review of ensembles in general and AI-based ensembles for ID to examine and understand their current research status to solve the ID problem. Here, an updated review of ensembles and their taxonomies has been presented in general. The paper also presents the updated review of various AI-based ensembles for ID (in particular) during last decade. The related studies of AI-based ensembles are compared by set of evaluation metrics driven from (1) architecture & approach followed; (2) different methods utilized in different phases of ensemble learning; (3) other measures used to evaluate classification performance of the ensembles. The paper also provides the future directions of the research in this area. The paper will help the better understanding of different directions in which research of ensembles has been done in general and specifically: field of intrusion detection systems (IDSs).


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Arijit Dey ◽  
Soham Chattopadhyay ◽  
Pawan Kumar Singh ◽  
Ali Ahmadian ◽  
Massimiliano Ferrara ◽  
...  

AbstractCOVID-19 is a respiratory disease that causes infection in both lungs and the upper respiratory tract. The World Health Organization (WHO) has declared it a global pandemic because of its rapid spread across the globe. The most common way for COVID-19 diagnosis is real-time reverse transcription-polymerase chain reaction (RT-PCR) which takes a significant amount of time to get the result. Computer based medical image analysis is more beneficial for the diagnosis of such disease as it can give better results in less time. Computed Tomography (CT) scans are used to monitor lung diseases including COVID-19. In this work, a hybrid model for COVID-19 detection has developed which has two key stages. In the first stage, we have fine-tuned the parameters of the pre-trained convolutional neural networks (CNNs) to extract some features from the COVID-19 affected lungs. As pre-trained CNNs, we have used two standard CNNs namely, GoogleNet and ResNet18. Then, we have proposed a hybrid meta-heuristic feature selection (FS) algorithm, named as Manta Ray Foraging based Golden Ratio Optimizer (MRFGRO) to select the most significant feature subset. The proposed model is implemented over three publicly available datasets, namely, COVID-CT dataset, SARS-COV-2 dataset, and MOSMED dataset, and attains state-of-the-art classification accuracies of 99.15%, 99.42% and 95.57% respectively. Obtained results confirm that the proposed approach is quite efficient when compared to the local texture descriptors used for COVID-19 detection from chest CT-scan images.


Sign in / Sign up

Export Citation Format

Share Document