Discriminatory Target Learning: Mining Significant Dependence Relationships from Labeled and Unlabeled Data

Zhi-Yi Duan; Li-Min Wang; Musa Mammadov; Hua Lou; Ming-Hui Sun

doi:10.3390/e21050537

Entropy ◽

10.3390/e21050537 ◽

2019 ◽

Vol 21 (5) ◽

pp. 537

Author(s):

Zhi-Yi Duan ◽

Li-Min Wang ◽

Musa Mammadov ◽

Hua Lou ◽

Ming-Hui Sun

Keyword(s):

Classification Performance ◽

Training Data ◽

Machine Learning Techniques ◽

Experimental Comparison ◽

Final Model ◽

Conditional Dependence ◽

Bayesian Network Classifiers ◽

Learning Techniques ◽

Class Labels ◽

Unlabeled Instance

Machine learning techniques have shown superior predictive power, and among them Bayesian network classifiers (BNCs) have remained of great interest due to their capacity to represent complex dependence relationships. Most traditional BNCs build only one model to fit the training instances, analyzing the independence between attributes using conditional mutual information. However, the conditional dependence relationships may differ across class labels rather than stay invariant when attributes take different values, which may result in classification bias. To address this issue, we propose a novel framework, called discriminatory target learning, which can be regarded as a tradeoff between the probabilistic model learned from an unlabeled instance at the uncertain end and the one learned from labeled training data at the certain end. The final model can discriminately represent the dependence relationships hidden in the unlabeled instance with respect to different possible class labels. Taking the k-dependence Bayesian classifier as an example, an experimental comparison on 42 publicly available datasets indicated that the final model achieved competitive classification performance compared to state-of-the-art learners such as random forest and averaged one-dependence estimators.
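The conditional mutual information mentioned in this abstract can be illustrated with a minimal sketch. It estimates I(Xi; Xj | C) from raw counts over discrete values, which is the usual quantity used to score attribute dependence in BNC structure learning. The function name and the plain-count (unsmoothed) estimator are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xi, xj, c):
    """Estimate I(Xi; Xj | C) in bits from three equal-length sequences
    of discrete values (hypothetical helper, plain frequency estimates)."""
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))  # joint counts of (xi, xj, class)
    n_ic = Counter(zip(xi, c))       # counts of (xi, class)
    n_jc = Counter(zip(xj, c))       # counts of (xj, class)
    n_c = Counter(c)                 # class counts
    total = 0.0
    for (a, b, y), count in n_ijc.items():
        # P(a,b,y) * log2( P(a,b|y) / (P(a|y) * P(b|y)) ),
        # with every probability replaced by its frequency estimate.
        ratio = count * n_c[y] / (n_ic[(a, y)] * n_jc[(b, y)])
        total += (count / n) * log2(ratio)
    return total
```

A value near zero suggests the two attributes are close to conditionally independent given the class; larger values suggest a dependence that an augmenting edge could capture.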

A Comparison of Resampling Techniques for Medical Data Using Machine Learning

Journal of Information & Knowledge Management ◽

10.1142/s021964922040016x ◽

2020 ◽

Vol 19 (01) ◽

pp. 2040016

Author(s):

Fahad Alahmari

Keyword(s):

Machine Learning ◽

Class Imbalance ◽

Medical Application ◽

Autism Spectrum ◽

Classification Performance ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Under Sampling ◽

Class Labels

Data imbalance with respect to the class labels has been recognised as a challenging problem for machine learning techniques, as it has a direct impact on the classification model’s performance. In an imbalanced dataset, most of the instances belong to one class, while far fewer instances are associated with the remaining classes. Most machine learning algorithms tend to favour the majority class and ignore the minority classes, which produces classification models that cannot be generalised. This paper investigates the problem of class imbalance for a medical application related to autism spectrum disorder (ASD) screening, in order to identify the ideal data resampling method for stabilising classification performance. To achieve this aim, experimental analyses measuring the performance of different oversampling and under-sampling techniques were conducted on a real imbalanced ASD dataset related to adults. The results produced by multiple classifiers on the considered datasets showed that adopting oversampling techniques in the pre-processing phase was superior in terms of specificity, sensitivity, and precision, among other measures.
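As a rough illustration of what oversampling does, the sketch below duplicates randomly chosen minority-class rows until every class matches the majority count. The abstract does not name the specific resampling methods compared, so plain random oversampling is an assumption here, and `random_oversample` is a hypothetical helper rather than anything taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Return copies of rows/labels in which every class has as many
    examples as the largest class, by resampling with replacement."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    counts = Counter(labels)
    target = max(counts.values())      # size of the majority class
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # All original rows carrying this label.
        pool = [r for r, y in zip(rows, labels) if y == label]
        # Draw the missing number of rows; zero draws for the majority class.
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels
```

In practice, resampling of this kind is normally applied to the training split only, so that duplicated rows cannot leak into the evaluation data.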

Predicting Facial Biotypes Using Continuous Bayesian Network Classifiers

Complexity ◽

10.1155/2018/4075656 ◽

2018 ◽

Vol 2018 ◽

pp. 1-14 ◽

Cited By ~ 1

Author(s):

Gonzalo A. Ruz ◽

Pamela Araya-Díaz

Keyword(s):

Bayesian Networks ◽

Bayesian Network ◽

Classification Problem ◽

Classification Performance ◽

Machine Learning Techniques ◽

Support Vector ◽

Bayesian Network Classifiers ◽

Tree Construction ◽

Learning Techniques ◽

Vector Machines

Bayesian networks are useful machine learning techniques that combine quantitative modeling, through probability theory, with qualitative modeling, through graph theory for visualization. We apply Bayesian network classifiers to the facial biotype classification problem, an important stage in orthodontic treatment planning. For this, we present adaptations of classical Bayesian network classifiers to handle continuous attributes; we also propose an incremental tree construction procedure for tree-like Bayesian network classifiers. We evaluate the performance of the proposed adaptations and compare them with other continuous Bayesian network classifier approaches as well as support vector machines. The results under the classification performance measures, accuracy and kappa, showed the effectiveness of the continuous Bayesian network classifiers, especially when a reduced number of attributes was used. Additionally, the resulting networks allowed the probability relations amongst the attributes in this classification problem to be visualized, a useful decision-making tool for orthodontists.

Building Bayesian network classifiers through a Bayesian complexity monitoring system

Proceedings of the Institution of Mechanical Engineers Part C Journal of Mechanical Engineering Science ◽

10.1243/09544062jmes1243 ◽

2008 ◽

Vol 223 (3) ◽

pp. 743-755 ◽

Cited By ~ 10

Author(s):

G A Ruz ◽

D T Pham

Keyword(s):

Bayesian Network ◽

Training Data ◽

Machine Learning Techniques ◽

Data Sets ◽

Learning Method ◽

Benchmark Data ◽

Bayesian Network Classifiers ◽

Tree Construction ◽

Learning Techniques ◽

Bayesian Network Classifier

Practical yet efficient machine learning techniques for engineering applications are now in high demand. A new learning method for building Bayesian network classifiers is presented in this article. The proposed method augments the naive Bayesian (NB) classifier using the Chow and Liu tree construction method, but introduces a Bayesian approach to control the accuracy and complexity of the resulting network, which yields simple structures that are not necessarily a spanning tree. Experiments on benchmark data sets show that the number of augmenting edges produced by the proposed learning method depends on the amount of training data used. The classification accuracy was better than, or at least equal to, that of the NB and the tree augmented NB models when tested on 10 benchmark data sets. The evaluation on a real industrial application showed that the simple Bayesian network classifier outperformed the C4.5 and the random forest algorithms and achieved competitive results against C5.0 and a neural network.

Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200904164539 ◽

2020 ◽

Vol 13 ◽

Author(s):

Ritu Khandelwal ◽

Hemlata Goyal ◽

Rajveer Singh Shekhawat

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Data Science ◽

Training Data ◽

Machine Learning Techniques ◽

Future Trends ◽

Data Set ◽

Learning Stage ◽

Learning Techniques ◽

Different Types

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal is to gain valuable insights from the available data. A large part of Indian cinema is Bollywood, which is a multi-million dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average or Flop. For this, machine learning techniques (classification and prediction) are applied. The first step in building a classifier or prediction model is the learning stage, in which a training data set is supplied and the model is trained by applying some technique or algorithm; the rules generated in this stage are then used to build a model and predict future trends in different types of organizations. Methods: Classification and prediction techniques such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, Adaboost, and KNN are applied in order to find efficient and effective results. All of these can be applied through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: The first step in building a classifier or prediction model is the learning stage, in which a training data set is supplied and the model is trained by applying some technique or algorithm; the rules generated in this stage are then used to build a model and predict future trends in different types of organizations. Conclusion: This paper focuses on a comparative analysis based on different parameters, such as accuracy and the confusion matrix, to identify the best possible model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to the predicted success rate in order to gain higher benefits. Discussion: Data mining is the process of discovering patterns in large data sets; the relationships discovered in this way help to solve various business problems and to predict forthcoming trends. This prediction can help production houses with advertisement propaganda and with planning their costs, and by attending to these factors they can make a movie more profitable.

Application of deep and machine learning techniques for multi-label classification performance on psychotic disorder diseases

Informatics in Medicine Unlocked ◽

10.1016/j.imu.2021.100545 ◽

2021 ◽

Vol 23 ◽

pp. 100545

Author(s):

Israel Elujide ◽

Stephen G. Fashoto ◽

Bunmi Fashoto ◽

Elliot Mbunge ◽

Sakinat O. Folorunso ◽

...

Keyword(s):

Machine Learning ◽

Psychotic Disorder ◽

Classification Performance ◽

Machine Learning Techniques ◽

Learning Techniques

Structure Extension of Tree-Augmented Naive Bayes

Entropy ◽

10.3390/e21080721 ◽

2019 ◽

Vol 21 (8) ◽

pp. 721 ◽

Cited By ~ 1

Author(s):

YuGuang Long ◽

LiMin Wang ◽

MingHui Sun

Keyword(s):

Naive Bayes ◽

Classification Performance ◽

Naïve Bayes ◽

Training Data ◽

Conditional Probability Distribution ◽

Independence Assumption ◽

Bayesian Network Classifiers ◽

Leibler Divergence ◽

The Difference ◽

Structure Extension

Due to the simplicity and competitive classification performance of naive Bayes (NB), researchers have proposed many approaches to improve NB by weakening its attribute independence assumption. A theoretical analysis based on Kullback–Leibler divergence shows that the difference between NB and its variations lies in the different orders of conditional mutual information represented by the augmenting edges in the tree-shaped network structure. In this paper, we propose to relax the independence assumption by further generalizing tree-augmented naive Bayes (TAN) from 1-dependence Bayesian network classifiers (BNCs) to arbitrary k-dependence. Sub-models of TAN, each built to represent specific conditional dependence relationships, may “best match” the conditional probability distribution over the training data. Extensive experimental results reveal that the proposed algorithm achieves a bias-variance trade-off and substantially better generalization performance than state-of-the-art classifiers such as logistic regression.
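For readers unfamiliar with the Kullback–Leibler divergence referred to in this abstract, a minimal sketch for two discrete distributions follows. The function name and the list-of-probabilities input format are assumptions for illustration; the paper's own analysis is theoretical and is not reproduced here.

```python
from math import log2


def kl_divergence(p, q):
    """Return D_KL(P || Q) in bits for two discrete distributions given
    as equal-length sequences of probabilities (hypothetical helper)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue             # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return float("inf")  # P puts mass where Q has none
        total += pi * log2(pi / qi)
    return total
```

The divergence is zero when the two distributions coincide and grows as the approximating distribution Q fits P less well, which is why it is a natural measure of how closely a simplified network structure matches the data distribution.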

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations in the text in order to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets, and SVM performed best in comparison to the other classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising and can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, the proposed system can be used for other cultural/social benefits such as predicting/fighting human riots.

Artificially Generated Training Data-sets for Supervised Machine Learning Techniques in Magnetic Resonance Imaging: An Example in Myocardial Segmentation

2019 Computing in Cardiology Conference (CinC) ◽

10.22489/cinc.2019.220 ◽

2019 ◽

Author(s):

Christos Xanthis ◽

Kostas Haris ◽

Dimitrios Filos ◽

Anthony Aletras

Keyword(s):

Magnetic Resonance Imaging ◽

Machine Learning ◽

Magnetic Resonance ◽

Training Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Sets ◽

Resonance Imaging ◽

Learning Techniques ◽

Myocardial Segmentation

Training a deep learning model for single-cell segmentation without manual annotation

Scientific Reports ◽

10.1038/s41598-021-03299-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Nizam Ud Din ◽

Ji Yu

Keyword(s):

Machine Learning ◽

Training Data ◽

Machine Learning Techniques ◽

Cell Segmentation ◽

Learning Techniques ◽

Signal Characteristics ◽

Human Operators ◽

Deep Learning Model ◽

Microscopy Images ◽

Segmentation Models

Advances in artificial neural networks have made machine learning techniques increasingly important in image analysis tasks. Recently, convolutional neural networks (CNNs) have been applied to the problem of cell segmentation from microscopy images. However, previous methods used a supervised training paradigm in order to create an accurate segmentation model. This strategy requires a large number of manually labeled cellular images, in which accurate pixel-level segmentations are produced by human operators. Generating training data is expensive and a major hindrance to the wider adoption of machine learning based methods for cell segmentation. Here we present an alternative strategy that trains CNNs without any human-labeled data. We show that our method is able to produce accurate segmentation models, is applicable to both fluorescence and bright-field images, and requires little to no prior knowledge of the signal characteristics.

TEJU: Fraud Detection and Improving Classification Performance for Bankruptcy Datasets Using Machine Learning Techniques

SSRN Electronic Journal ◽

10.2139/ssrn.3356511 ◽

2019 ◽

Author(s):

Srikanth Panigrahi ◽

Kolla Saitejaswi ◽

Dharmaiah Devarapalli

Keyword(s):

Machine Learning ◽

Fraud Detection ◽

Classification Performance ◽

Machine Learning Techniques ◽

Learning Techniques
