scholarly journals Discriminatory Target Learning: Mining Significant Dependence Relationships from Labeled and Unlabeled Data

Entropy ◽  
2019 ◽  
Vol 21 (5) ◽  
pp. 537
Author(s):  
Zhi-Yi Duan ◽  
Li-Min Wang ◽  
Musa Mammadov ◽  
Hua Lou ◽  
Ming-Hui Sun

Machine learning techniques have shown superior predictive power, among which Bayesian network classifiers (BNCs) have remained of great interest due to its capacity to demonstrate complex dependence relationships. Most traditional BNCs tend to build only one model to fit training instances by analyzing independence between attributes using conditional mutual information. However, for different class labels, the conditional dependence relationships may be different rather than invariant when attributes take different values, which may result in classification bias. To address this issue, we propose a novel framework, called discriminatory target learning, which can be regarded as a tradeoff between probabilistic model learned from unlabeled instance at the uncertain end and that learned from labeled training data at the certain end. The final model can discriminately represent the dependence relationships hidden in unlabeled instance with respect to different possible class labels. Taking k-dependence Bayesian classifier as an example, experimental comparison on 42 publicly available datasets indicated that the final model achieved competitive classification performance compared to state-of-the-art learners such as Random forest and averaged one-dependence estimators.

2020 ◽  
Vol 19 (01) ◽  
pp. 2040016
Author(s):  
Fahad Alahmari

Data imbalance with respect to the class labels has been recognised as a challenging problem for machine learning techniques as it has a direct impact on the classification model’s performance. In an imbalanced dataset, most of the instances belong to one class, while far fewer instances are associated with the remaining classes. Most of the machine learning algorithms tend to favour the majority class and ignore the minority classes leading to classification models being generated that cannot be generalised. This paper investigates the problem of class imbalance for a medical application related to autism spectrum disorder (ASD) screening to identify the ideal data resampling method that can stabilise classification performance. To achieve the aim, experimental analyses to measure the performance of different oversampling and under-sampling techniques have been conducted on a real imbalanced ASD dataset related to adults. The results produced by multiple classifiers on the considered datasets showed superiority in terms of specificity, sensitivity, and precision, among others, when adopting oversampling techniques in the pre-processing phase.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-14 ◽  
Author(s):  
Gonzalo A. Ruz ◽  
Pamela Araya-Díaz

Bayesian networks are useful machine learning techniques that are able to combine quantitative modeling, through probability theory, with qualitative modeling, through graph theory for visualization. We apply Bayesian network classifiers to the facial biotype classification problem, an important stage during orthodontic treatment planning. For this, we present adaptations of classical Bayesian networks classifiers to handle continuous attributes; also, we propose an incremental tree construction procedure for tree like Bayesian network classifiers. We evaluate the performance of the proposed adaptations and compare them with other continuous Bayesian network classifiers approaches as well as support vector machines. The results under the classification performance measures, accuracy and kappa, showed the effectiveness of the continuous Bayesian network classifiers, especially for the case when a reduced number of attributes were used. Additionally, the resulting networks allowed visualizing the probability relations amongst the attributes under this classification problem, a useful tool for decision-making for orthodontists.


Author(s):  
G A Ruz ◽  
D T Pham

Nowadays, the need for practical yet efficient machine learning techniques for engineering applications are highly in demand. A new learning method for building Bayesian network classifiers is presented in this article. The proposed method augments the naive Bayesian (NB) classifier by using the Chow and Liu tree construction method, but introducing a Bayesian approach to control the accuracy and complexity of the resulting network, which yields simple structures that are not necessarily a spanning tree. Experiments by using benchmark data sets show that the number of augmenting edges by using the proposed learning method depends on the number of training data used. The classification accuracy was better, or at least equal, to the NB and the tree augmented NB models when tested on 10 benchmark data sets. The evaluation on a real industrial application showed that the simple Bayesian network classifier outperformed the C4.5 and the random forest algorithms and achieved competitive results against C5.0 and a neural network.


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on findings to get valuable insights on available data. The large part of Indian Cinema is Bollywood which is a multi-million dollar industry. This paper attempts to predict whether the upcoming Bollywood Movie would be Blockbuster, Superhit, Hit, Average or Flop. For this Machine Learning techniques (classification and prediction) will be applied. To make classifier or prediction model first step is the learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations. Methods: All the techniques related to classification and Prediction such as Support Vector Machine(SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, Adaboost, and KNN will be applied and try to find out efficient and effective results. All these functionalities can be applied with GUI Based workflows available with various categories such as data, Visualize, Model, and Evaluate. Result: To make classifier or prediction model first step is learning stage in which we need to give the training data set to train the model by applying some technique or algorithm and after that different rules are generated which helps to make a model and predict future trends in different types of organizations Conclusion: This paper focuses on Comparative Analysis that would be performed based on different parameters such as Accuracy, Confusion Matrix to identify the best possible model for predicting the movie Success. By using Advertisement Propaganda, they can plan for the best time to release the movie according to the predicted success rate to gain higher benefits. Discussion: Data Mining is the process of discovering different patterns from large data sets and from that various relationships are also discovered to solve various problems that come in business and helps to predict the forthcoming trends. This Prediction can help Production Houses for Advertisement Propaganda and also they can plan their costs and by assuring these factors they can make the movie more profitable.


2021 ◽  
Vol 23 ◽  
pp. 100545
Author(s):  
Israel Elujide ◽  
Stephen G. Fashoto ◽  
Bunmi Fashoto ◽  
Elliot Mbunge ◽  
Sakinat O. Folorunso ◽  
...  

Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 721 ◽  
Author(s):  
YuGuang Long ◽  
LiMin Wang ◽  
MingHui Sun

Due to the simplicity and competitive classification performance of the naive Bayes (NB), researchers have proposed many approaches to improve NB by weakening its attribute independence assumption. Through the theoretical analysis of Kullback–Leibler divergence, the difference between NB and its variations lies in different orders of conditional mutual information represented by these augmenting edges in the tree-shaped network structure. In this paper, we propose to relax the independence assumption by further generalizing tree-augmented naive Bayes (TAN) from 1-dependence Bayesian network classifiers (BNC) to arbitrary k-dependence. Sub-models of TAN that are built to respectively represent specific conditional dependence relationships may “best match” the conditional probability distribution over the training data. Extensive experimental results reveal that the proposed algorithm achieves bias-variance trade-off and substantially better generalization performance than state-of-the-art classifiers such as logistic regression.


2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

Abstract In this article, an innovative approach to perform the sentiment analysis (SA) has been presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and it has been observed that SVM has performed best in comparison to other classifiers, and it has an accuracy of 68% for movie reviews and 82% in case of tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used in other cultural/social benefits like predicting/fighting human riots.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Nizam Ud Din ◽  
Ji Yu

AbstractAdvances in the artificial neural network have made machine learning techniques increasingly more important in image analysis tasks. Recently, convolutional neural networks (CNN) have been applied to the problem of cell segmentation from microscopy images. However, previous methods used a supervised training paradigm in order to create an accurate segmentation model. This strategy requires a large amount of manually labeled cellular images, in which accurate segmentations at pixel level were produced by human operators. Generating training data is expensive and a major hindrance in the wider adoption of machine learning based methods for cell segmentation. Here we present an alternative strategy that trains CNNs without any human-labeled data. We show that our method is able to produce accurate segmentation models, and is applicable to both fluorescence and bright-field images, and requires little to no prior knowledge of the signal characteristics.


Sign in / Sign up

Export Citation Format

Share Document