Application of an Interpretable Classification Model on Early Folding Residues during Protein Folding

Mapping Intimacies ◽

10.1101/381483 ◽

2018 ◽

Author(s):

Sebastian Bittrich ◽

Marika Kaden ◽

Christoph Leberecht ◽

Florian Kaiser ◽

Thomas Villmann ◽

...

Keyword(s):

Machine Learning ◽

Protein Folding ◽

Learning Strategies ◽

Life Sciences ◽

Classification Model ◽

Classification Problems ◽

Hydrophobic Residues ◽

Imbalanced Classification ◽

Fine Grained ◽

Generalized Matrix

Abstract

Background: Machine learning strategies are prominent tools for data analysis. Especially in the life sciences, they have become increasingly important for handling the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance but also grow in complexity, and tend to neglect the interpretability and comprehensibility of the resulting models.

Results: Generalized Matrix Learning Vector Quantization (GMLVQ) is a supervised, prototype-based machine learning method that provides extensive visualization capabilities, not present in other classifiers, which allow a fine-grained interpretation of the data. In contrast to commonly used machine learning strategies, GMLVQ is well suited to imbalanced classification problems, which are frequent in the life sciences. We present a Weka plug-in implementing GMLVQ. The feasibility of GMLVQ is demonstrated on a dataset of Early Folding Residues (EFR), which have been shown to initiate and guide the protein folding process. Using 27 features, an area under the receiver operating characteristic curve (AUC) of 76.6% was achieved, which is comparable to other state-of-the-art classifiers.

Conclusions: The application to EFR prediction demonstrates how easily interpretable classification models can promote the comprehension of biological mechanisms. The results shed light on the distinctive features of EFR that were reported as most influential for the classification: EFR are embedded in ordered secondary structure elements and participate in networks of hydrophobic residues. We present the visualization capabilities of GMLVQ and demonstrate how to interpret the results.
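
The abstract describes GMLVQ only at a high level. As GMLVQ is usually formulated, each class is represented by labelled prototypes, and a sample x is compared with a prototype w through an adaptive squared distance (x - w)^T Λ (x - w) with Λ = Ω^T Ω; the diagonal of Λ is read as per-feature relevance, which underlies the interpretability the authors emphasise. The sketch below is a minimal illustration under those assumptions: it shows only the distance and nearest-prototype classification for a fixed Ω, omits the gradient-based training of prototypes and Ω, and does not use the authors' Weka plug-in. All names, labels, and values are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = List[List[float]]


def lambda_matrix(omega: Matrix) -> Matrix:
    """Lambda = Omega^T Omega, positive semi-definite by construction."""
    rows, cols = len(omega), len(omega[0])
    return [[sum(omega[k][i] * omega[k][j] for k in range(rows))
             for j in range(cols)] for i in range(cols)]


def gmlvq_distance(x: Vector, w: Vector, lam: Matrix) -> float:
    """Adaptive squared distance (x - w)^T Lambda (x - w)."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    n = len(diff)
    return sum(diff[i] * lam[i][j] * diff[j] for i in range(n) for j in range(n))


def classify(x: Vector, prototypes: List[Tuple[Vector, str]], omega: Matrix) -> str:
    """Winner-takes-all: return the label of the closest prototype."""
    lam = lambda_matrix(omega)
    closest = min(prototypes, key=lambda p: gmlvq_distance(x, p[0], lam))
    return closest[1]


def feature_relevances(omega: Matrix) -> List[float]:
    """Diagonal of Lambda normalised to sum to 1 (assumes a non-zero Omega)."""
    lam = lambda_matrix(omega)
    diag = [lam[i][i] for i in range(len(lam))]
    total = sum(diag)
    return [d / total for d in diag]


# Hypothetical two-feature example: this Omega ignores the second feature.
omega = [[1.0, 0.0], [0.0, 0.0]]
prototypes = [([0.0, 0.0], "non-EFR"), ([1.0, 0.0], "EFR")]
print(classify([0.9, 5.0], prototypes, omega))  # EFR
print(feature_relevances(omega))                 # [1.0, 0.0]
```

Because the learned Ω zeroes out the second feature, the large value 5.0 in that feature has no influence on the decision, and the relevance profile reports this directly.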

Exploiting Scalable Machine-Learning Distributed Frameworks to Forecast Power Consumption of Buildings

Energies ◽

10.3390/en12152933 ◽

2019 ◽

Vol 12 (15) ◽

pp. 2933 ◽

Cited By ~ 1

Author(s):

Tania Cerquitelli ◽

Giovanni Malnati ◽

Daniele Apiletti

Keyword(s):

Machine Learning ◽

Random Forest ◽

Power Consumption ◽

Classification Model ◽

Specific Power ◽

Research Issues ◽

Energy Data ◽

Fine Grained ◽

Research Activities ◽

Computer Science Research

The pervasive and increasing deployment of smart meters allows a huge amount of fine-grained energy data to be collected in different urban scenarios. The analysis of such data is challenging and opens up a variety of interesting new research issues across the energy and computer science research areas. The key role of computer scientists is to provide energy researchers and practitioners with cutting-edge, scalable analytics engines that effectively support their daily research activities, thereby fostering data-driven approaches. This paper presents SPEC, a scalable and distributed engine to predict building-specific power consumption. SPEC addresses the full analytic stack and exploits a data stream approach over sliding time windows to train a prediction model tailored to each building. The model allows us to predict the upcoming power consumption at a time instant in the near future. SPEC integrates different machine learning approaches, specifically ridge regression, artificial neural networks, and random forest regression, to predict fine-grained values of power consumption, and a classification model, the random forest classifier, to forecast a coarse consumption level. SPEC exploits state-of-the-art distributed computing frameworks to address the big data challenges in harvesting energy data: the current implementation runs on Apache Spark, the most widespread high-performance data-processing platform, and can natively scale to huge datasets. As a case study, SPEC has been tested on real data from a heating distribution network and on power consumption data collected in a major Italian city. Experimental results demonstrate the effectiveness of SPEC in forecasting both fine-grained values and coarse levels of power consumption of buildings.
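
SPEC itself runs on Apache Spark and combines several models. As a hedged, single-machine illustration of one of them, the sketch below trains ridge regression on sliding time windows of past readings and forecasts the next reading. The window width, the regularisation strength `alpha`, the unpenalised bias term, and all helper names are hypothetical choices, not taken from SPEC, and no Spark API is used.

```python
from typing import List, Sequence, Tuple


def sliding_windows(series: Sequence[float], width: int) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of `width` past values with the value that follows it."""
    X = [list(series[i:i + width]) for i in range(len(series) - width)]
    y = [series[i + width] for i in range(len(series) - width)]
    return X, y


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(X: List[List[float]], y: List[float], alpha: float) -> List[float]:
    """Closed-form ridge: (X^T X + alpha I) beta = X^T y, bias left unpenalised."""
    Xb = [row + [1.0] for row in X]  # append a constant bias column
    d = len(Xb[0])
    xtx = [[sum(r[i] * r[j] for r in Xb) for j in range(d)] for i in range(d)]
    for i in range(d - 1):           # penalise the weights, not the bias
        xtx[i][i] += alpha
    xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(d)]
    return solve(xtx, xty)


def predict_next(series: Sequence[float], width: int, alpha: float = 1e-3) -> float:
    """Fit on all windows of the current history, then forecast the next value."""
    X, y = sliding_windows(series, width)
    beta = fit_ridge(X, y, alpha)
    latest = list(series[-width:]) + [1.0]
    return sum(b * v for b, v in zip(beta, latest))


# Hypothetical, perfectly linear consumption readings.
readings = [float(v) for v in range(10)]
print(round(predict_next(readings, width=2), 3))  # 10.0
```

In a streaming setting the same fit would simply be repeated each time the window slides forward, so the model keeps tracking the most recent behaviour of each building.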

Human Activity Recognition using Smartphone

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d4521.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 10159-10163

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Mobile Phone ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

Classification Problems ◽

The People ◽

Accuracy And Precision ◽

Multi Classification

This paper covers actions performed by people using a mobile phone, which we used to record those actions. The main aim of this paper is to construct a classification model that identifies people's actions, that is, to solve a multi-class classification problem. A theoretical treatment helps us understand the problem, but it can only be solved mathematically, and solving it mathematically yields very accurate results. The actions are derived from the sensors present in the phone; the sensors used in this paper are the accelerometer and the gyroscope, which are required for determining a person's actions. The results are compared in terms of accuracy and precision. A three-axis accelerometer is used to collect the values, and we determined that it yielded 31 values. The actions are derived using three machine learning algorithms: Naïve Bayes classifiers, support vector machines, and neural networks. The results of action recognition on the dataset are used to determine the reduction in labeling work needed to achieve similar performance with machine learning.
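
Naïve Bayes is one of the three classifiers named above. A minimal Gaussian naïve Bayes sketch follows, together with the accuracy and precision measures used for comparison. The Gaussian likelihood, the two features (imagined here as summary statistics of accelerometer readings), the activity labels, and all numbers are assumptions for illustration; they are not the paper's 31 values or its data.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Model = Dict[str, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[str]) -> Model:
    """Estimate a class prior and per-feature mean/variance for every activity."""
    groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    model: Model = {}
    for label, rows in groups.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps variances positive when a feature is constant.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model: Model, x: Sequence[float]) -> str:
    """Return the activity with the highest log posterior (features assumed independent)."""
    def log_posterior(label: str) -> float:
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return score
    return max(model, key=log_posterior)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """Fraction of predictions of `label` that are correct."""
    hits = [t for t, p in zip(y_true, y_pred) if p == label]
    return sum(t == label for t in hits) / len(hits) if hits else 0.0


# Hypothetical features: (mean acceleration magnitude, its standard deviation).
X = [[1.2, 0.5], [1.3, 0.6], [1.1, 0.4], [0.1, 0.02], [0.05, 0.01], [0.15, 0.03]]
y = ["walking"] * 3 + ["sitting"] * 3
model = fit_gaussian_nb(X, y)
print(predict(model, [1.25, 0.55]))  # walking
```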

Comparison of the Different Sampling Techniques for Imbalanced Classification Problems in Machine Learning

2019 11th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA) ◽

10.1109/icmtma.2019.00101 ◽

2019 ◽

Author(s):

Peng Zhihao ◽

Yan Fenglong ◽

Li Xucheng

Keyword(s):

Machine Learning ◽

Sampling Techniques ◽

Classification Problems ◽

Imbalanced Classification

Cost-Sensitive Learning of Fuzzy Rules for Imbalanced Classification Problems Using FURIA

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488514500330 ◽

2014 ◽

Vol 22 (05) ◽

pp. 643-675 ◽

Cited By ~ 6

Author(s):

Ana Palacios ◽

Krzysztof Trawiński ◽

Oscar Cordón ◽

Luciano Sánchez

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Numerical Study ◽

Fuzzy Rules ◽

Classification Algorithms ◽

Classification Problems ◽

Cost Sensitive Learning ◽

Imbalanced Classification ◽

Machine Learning Methods ◽

Data Level

This paper aims to verify that cost-sensitive learning is a competitive approach for learning fuzzy rules in certain imbalanced classification problems. It will be shown that there exist cost matrices whose use, in combination with a suitable classifier, improves on the results of some popular data-level techniques. The well-known FURIA algorithm is extended to take advantage of such cost matrices. A numerical study on 64 different imbalanced datasets compares the proposed cost-sensitive FURIA with other state-of-the-art classification algorithms, both those based on fuzzy rules and other classical machine learning methods.
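
The abstract relies on cost matrices. The general decision rule behind cost-sensitive classification picks the class with the lowest expected cost given the class probabilities, rather than the most probable class. The sketch shows only that rule with a hypothetical two-class cost matrix; it is not the cost-sensitive FURIA extension, whose way of incorporating costs into rule learning is described in the paper.

```python
from typing import Dict

Costs = Dict[str, Dict[str, float]]


def expected_cost(probs: Dict[str, float], cost: Costs, pred: str) -> float:
    """Expected cost of predicting `pred`; cost[true][pred] is the misclassification cost."""
    return sum(p * cost[true][pred] for true, p in probs.items())


def min_cost_decision(probs: Dict[str, float], cost: Costs) -> str:
    """Choose the class with the lowest expected cost instead of the most probable one."""
    return min(probs, key=lambda pred: expected_cost(probs, cost, pred))


probs = {"minority": 0.3, "majority": 0.7}
zero_one = {"minority": {"minority": 0.0, "majority": 1.0},
            "majority": {"minority": 1.0, "majority": 0.0}}
# Hypothetical costs: missing a minority example is five times worse than a false alarm.
skewed = {"minority": {"minority": 0.0, "majority": 5.0},
          "majority": {"minority": 1.0, "majority": 0.0}}
print(min_cost_decision(probs, zero_one))  # majority
print(min_cost_decision(probs, skewed))    # minority
```

With equal costs the rule reduces to picking the most probable class; the skewed matrix shifts the decision toward the minority class, which is the effect data-level resampling techniques try to achieve by altering the training data instead.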

A review and experimental analysis of active learning over crowdsourced data

Artificial Intelligence Review ◽

10.1007/s10462-021-10021-3 ◽

2021 ◽

Author(s):

Burcu Sayin ◽

Evgeny Krivosheev ◽

Jie Yang ◽

Andrea Passerini ◽

Fabio Casati

Keyword(s):

Machine Learning ◽

Active Learning ◽

Learning Strategies ◽

Effective Means ◽

Training Data ◽

Learning Approaches ◽

Data Sampling ◽

Classification Problems ◽

Crowdsourced Data ◽

Hybrid Classification

Abstract

Training data creation is increasingly a key bottleneck in developing machine learning systems, especially deep learning systems. Active learning provides a cost-effective means of creating training data by selecting the most informative instances for labeling. Labels in real applications are often collected through crowdsourcing, which engages online crowds for data labeling at scale. Despite the importance of using crowdsourced data in the active learning process, an analysis of how existing active learning approaches behave on crowdsourced data is currently missing. This paper aims to fill this gap by reviewing existing active learning approaches and then testing a set of benchmark approaches on crowdsourced datasets. We provide a comprehensive and systematic survey of recent research on active learning in the hybrid human–machine classification setting, where crowd workers contribute labels (often noisy) either to directly classify data instances or to train machine learning models. We identify three categories of state-of-the-art active learning methods according to whether and how predefined queries are employed for data sampling: fixed-strategy approaches, dynamic-strategy approaches, and strategy-free approaches. We then conduct an empirical study of their cost-effectiveness, showing that the performance of existing active learning approaches in hybrid classification contexts is affected by many factors, such as the noise level of the data, the label fusion technique used, and the specific characteristics of the task. Finally, we discuss challenges and identify potential directions for designing active learning strategies for hybrid classification problems.
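
The abstract names two building blocks of active learning over crowdsourced data: a query strategy that selects instances for labeling, and a label fusion technique that combines noisy crowd labels. The sketch below illustrates one common choice for each, uncertainty sampling and majority voting; both choices, the item names, and the scores are assumptions for illustration and are not the specific methods benchmarked in the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(crowd_labels: Sequence[str]) -> str:
    """Fuse the (possibly noisy) labels that several workers gave one item."""
    return Counter(crowd_labels).most_common(1)[0][0]


def uncertainty_sample(p_positive: Dict[str, float], k: int) -> List[str]:
    """Query the k unlabeled items whose predicted probability is closest to 0.5."""
    return sorted(p_positive, key=lambda item: abs(p_positive[item] - 0.5))[:k]


# Hypothetical model scores for four unlabeled items.
scores = {"a": 0.9, "b": 0.52, "c": 0.1, "d": 0.45}
to_label = uncertainty_sample(scores, 2)            # ['b', 'd']
votes = {"b": ["pos", "neg", "pos"], "d": ["neg", "neg", "pos"]}
fused = {item: majority_vote(votes[item]) for item in to_label}
print(fused)  # {'b': 'pos', 'd': 'neg'}
```

In a full loop, the fused labels would be added to the training set, the model retrained, and the scores recomputed before the next query; the noise level of the crowd determines how much majority voting can recover.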

A Comparative Study of Different Machine Learning Algorithms for Disease Prediction

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i7/0177 ◽

2017 ◽

Vol 7 (7) ◽

pp. 172

Author(s):

Anantvir Singh Romana

Keyword(s):

Machine Learning ◽

Subsequent Treatment ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Disease Prediction ◽

Classification Problems ◽

Learning Techniques ◽

Neural Network Classifiers ◽

Diagnostic Detection

Accurate diagnostic detection of a disease in a patient is critical, as it may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently used in various classification problems because of their accurate prediction performance. Different techniques may achieve different accuracies, so it is important to use the method best suited to the problem at hand. This research provides a comparative analysis of support vector machine, Naïve Bayes, J48 decision tree, and neural network classifiers on breast cancer and diabetes datasets.

Supervised Classifier Approach for Intrusion Detection on KDD with Optimal MapReduce Framework Model in Cloud Computing

Recent Patents on Computer Science ◽

10.2174/1573401315666190619113510 ◽

2019 ◽

Vol 12 ◽

Author(s):

M. Ilayaraja ◽

S. Hemalatha ◽

P. Manickam ◽

K. Sathesh Kumar ◽

K. Shankar

Keyword(s):

Machine Learning ◽

Cloud Computing ◽

Intrusion Detection ◽

Decision Tree ◽

Learning Strategies ◽

Nearest Neighbor ◽

Detection System ◽

K Nearest Neighbor ◽

Mapreduce Model ◽

The Web

Cloud computing is characterized as the provision, by cloud providers, of resources or services that clients can access over the web on demand. It delivers everything as a service over the web according to client requests, for example, operating systems, network hardware, storage, resources, and software. Nowadays, Intrusion Detection Systems (IDS) play an important role, enabling experts to take action when a system is compromised by intrusions. Most intrusion detection frameworks are built on machine learning strategies. The dataset used for intrusion detection in this work is the Knowledge Discovery in Databases (KDD) dataset. This paper detects or classifies intruded data using Machine Learning (ML) with the MapReduce model. The first phase uses the Hadoop MapReduce model to reduce the size of the database, with an optimal weight determined for the reducer model, and the second phase uses a Decision Tree (DT) classifier to detect intrusions in the data. This DT classifier uses an appropriate classifier to decide the class labels for the non-homogeneous leaf nodes. The decision tree part gives a coarse segment profile, while the leaf-level classifier provides information about the attributes that influence the label within a segment. The proposed approach achieves a detection accuracy of 96.21%, compared with existing classifiers such as Neural Network (NN), Naive Bayes (NB), and K Nearest Neighbor (KNN).
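
The first phase of the proposed approach relies on Hadoop MapReduce. As a hedged illustration of the programming model only, the in-process sketch below runs the map, shuffle, and reduce steps over a handful of records, here counting records per protocol and label, the kind of aggregation that can shrink a dataset before a classifier is trained. The record fields and values are hypothetical; this is not the Hadoop API, and the paper's weight optimisation for the reducer is not reproduced.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple


def map_reduce(records: Iterable[Any],
               mapper: Callable[[Any], Iterable[Tuple[Hashable, Any]]],
               reducer: Callable[[Hashable, List[Any]], Any]) -> Dict[Hashable, Any]:
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for record in records:                   # map phase
        for key, value in mapper(record):
            groups[key].append(value)        # shuffle: collect values per key
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce phase


# Hypothetical connection records with a protocol and a normal/attack label.
records = [
    {"protocol": "tcp", "label": "normal"},
    {"protocol": "tcp", "label": "attack"},
    {"protocol": "udp", "label": "normal"},
    {"protocol": "tcp", "label": "attack"},
]
counts = map_reduce(records,
                    mapper=lambda r: [((r["protocol"], r["label"]), 1)],
                    reducer=lambda key, ones: sum(ones))
print(counts)  # {('tcp', 'normal'): 1, ('tcp', 'attack'): 2, ('udp', 'normal'): 1}
```

On a cluster, the map and reduce calls would run on different nodes and the shuffle would move data between them; the logic of each phase stays the same.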

Transformer Oil Quality Assessment Using Random Forest with Feature Engineering

Energies ◽

10.3390/en14071809 ◽

2021 ◽

Vol 14 (7) ◽

pp. 1809

Author(s):

Mohammed El Amine Senoussaoui ◽

Mostefa Brahami ◽

Issouf Fofana

Keyword(s):

Machine Learning ◽

Random Forest ◽

Oil Quality ◽

Principal Component ◽

Condition Assessment ◽

Classification Performance ◽

Transformer Oil ◽

Classification Model ◽

Insulation Degradation ◽

Transformer Oils

Machine learning is widely used as a panacea in many engineering applications, including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine learning approach was proposed to monitor the condition of transformer oils based on aging indicators. The proposed approach was used to compare the performance of two machine learning classifiers: the J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: oils that can be kept in service, oils that should be reconditioned or filtered, oils that should be reclaimed, and oils that must be discarded. Of the two algorithms, random forest exhibited better performance and high accuracy with only a small amount of data. This performance was achieved not only through the proposed algorithm but also through the data preprocessing approach. Before being fed to the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were then transformed again by principal component analysis and passed through the CFsSubset filter once more. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.
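
The preprocessing above applies correlation-based feature selection (CFsSubset) twice. In correlation-based feature selection as commonly formulated, a subset S of k features is scored by Merit_S = k·r_cf / sqrt(k + k(k − 1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation, so relevant but mutually redundant features are penalised. The sketch computes that merit using absolute Pearson correlation against a numeric target; the Pearson choice, the numeric target (the paper's four oil groups are categorical), and the toy data are assumptions, and the subset search that a real CFS implementation performs is omitted.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumes neither input is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cfs_merit(features: List[Sequence[float]], target: Sequence[float]) -> float:
    """Merit_S = k * mean|r_cf| / sqrt(k + k (k - 1) * mean|r_ff|)."""
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(f, g)) for f, g in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


target = [2.0, 4.0, 6.0, 8.0]
relevant = [1.0, 2.0, 3.0, 4.0]      # perfectly correlated with the target
irrelevant = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with the target and with `relevant`
print(cfs_merit([relevant], target))              # 1.0
print(cfs_merit([relevant, relevant], target))    # 1.0: a redundant copy adds nothing
print(cfs_merit([relevant, irrelevant], target))  # about 0.707: an irrelevant feature lowers merit
```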

Using Machine Learning for Quantum Annealing Accuracy Prediction

Algorithms ◽

10.3390/a14060187 ◽

2021 ◽

Vol 14 (6) ◽

pp. 187

Author(s):

Aaron Barbosa ◽

Elijah Pelofske ◽

Georg Hahn ◽

Hristo N. Djidjev

Keyword(s):

Machine Learning ◽

Maximum Clique ◽

Classification Model ◽

Maximum Clique Problem ◽

Problem Instance ◽

Np Hard ◽

Machine Learning Classification ◽

Hard Problems ◽

Problem Instances ◽

D Wave

Quantum annealers, such as the device built by D-Wave Systems, Inc., offer a way to compute solutions of NP-hard problems that can be expressed in Ising or quadratic unconstrained binary optimization (QUBO) form. Although such solutions are typically of very high quality, problem instances are usually not solved to optimality because of imperfections in the current generation of quantum annealers. In this contribution, we aim to understand some of the factors contributing to the hardness of a problem instance, and to use machine learning models to predict the accuracy of the D-Wave 2000Q annealer in solving specific problems. We focus on the maximum clique problem, a classic NP-hard problem with important applications in network analysis, bioinformatics, and computational chemistry. By training a machine learning classification model on basic problem characteristics, such as the number of edges in the graph, or on annealing parameters, such as D-Wave's chain strength, we are able to rank certain features in the order of their contribution to solution hardness, and we present a simple decision tree that allows one to predict whether a problem will be solvable to optimality with the D-Wave 2000Q. We extend these results by training a machine learning regression model that predicts the clique size found by D-Wave.
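
The abstract notes that the annealer solves problems written in QUBO form. One standard way to express maximum clique as a QUBO is to minimise −Σ x_i + P·Σ x_i x_j over all non-adjacent vertex pairs (i, j), with binary x_i and a penalty P larger than 1, so that any selection containing a non-edge can be improved by dropping a vertex and the minimum is attained at a maximum clique. The sketch below builds that QUBO and uses exhaustive search in place of the annealer, so it only works for tiny graphs. The penalty value and the example graph are assumptions; nothing here uses D-Wave's software or reproduces the paper's exact formulation.

```python
from itertools import combinations, product
from typing import Dict, Iterable, Sequence, Tuple

QUBO = Dict[Tuple[int, int], float]


def max_clique_qubo(n: int, edges: Iterable[Tuple[int, int]], penalty: float = 2.0) -> QUBO:
    """Minimise -sum_i x_i + penalty * sum over non-adjacent pairs of x_i x_j."""
    edge_set = {frozenset(e) for e in edges}
    q: QUBO = {(i, i): -1.0 for i in range(n)}   # reward each selected vertex
    for i, j in combinations(range(n), 2):
        if frozenset((i, j)) not in edge_set:
            q[(i, j)] = penalty                    # punish selecting a non-adjacent pair
    return q


def energy(q: QUBO, x: Sequence[int]) -> float:
    """QUBO objective x^T Q x for a binary vector (x_i * x_i == x_i)."""
    return sum(c * x[i] * x[j] for (i, j), c in q.items())


def brute_force_minimum(q: QUBO, n: int) -> Tuple[int, ...]:
    """Exhaustively find the lowest-energy assignment (tiny graphs only)."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(q, x))


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
q = max_clique_qubo(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
best = brute_force_minimum(q, 4)
print(best, energy(q, best))  # (1, 1, 1, 0) -3.0
```

The negative of the minimum energy equals the clique size, which is the quantity the paper's regression model predicts for the solutions found by D-Wave.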

A Leaf Disease Classification Model in Betel Vine Using Machine Learning Techniques

2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST) ◽

10.1109/icrest51555.2021.9331142 ◽

2021 ◽

Author(s):

Md Zahid Hasan ◽

Nahid Zeba ◽

Md. Abdul Malek ◽

Sanjida Sultana Reya

Keyword(s):

Machine Learning ◽

Disease Classification ◽

Classification Model ◽

Machine Learning Techniques ◽

Leaf Disease ◽

Learning Techniques
