A Framework for Supervised Classification Performance Analysis with Information-Theoretic Methods

2020 ◽  
Vol 32 (11) ◽  
pp. 2075-2087
Author(s):  
Francisco J. Valverde-Albacete ◽  
Carmen Pelaez-Moreno
2021 ◽  
Vol 33 (1) ◽  
pp. 244-268
Author(s):  
Tomoya Sakai ◽  
Gang Niu ◽  
Masashi Sugiyama

Recent advances in weakly supervised classification allow us to train a classifier from only positive and unlabeled (PU) data. However, existing PU classification methods typically require an accurate estimate of the class-prior probability, a critical bottleneck particularly for high-dimensional data. This problem has been commonly addressed by applying principal component analysis in advance, but such unsupervised dimension reduction can collapse the underlying class structure. In this letter, we propose a novel representation learning method from PU data based on the information-maximization principle. Our method does not require class-prior estimation and thus can be used as a preprocessing method for PU classification. Through experiments, we demonstrate that our method, combined with deep neural networks, substantially improves the accuracy of PU class-prior estimation, leading to state-of-the-art PU classification performance.


2013 ◽  
Vol 427-429 ◽  
pp. 2309-2312
Author(s):  
Hai Bin Mei ◽  
Ming Hua Zhang

Alert classifiers built with supervised classification techniques require large amounts of labeled training alerts. Preparing such training data is very difficult and expensive, which greatly restricts the accuracy and feasibility of current classifiers. This paper employs semi-supervised learning to build an alert classification model that reduces the number of labeled training alerts needed. Alert context properties are also introduced to improve classification performance. Experiments demonstrate the accuracy and feasibility of our approach.


2010 ◽  
Vol 23 (6) ◽  
pp. 713-727
Author(s):  
Robin A.A. Ince ◽  
Riccardo Senatore ◽  
Ehsan Arabzadeh ◽  
Fernando Montani ◽  
Mathew E. Diamond ◽  
...  

Author(s):  
Joanne Lee ◽  
Wendy K. Tam Cho ◽  
George Judge

This chapter searches for evidence of fraud in two clinical data sets from a highly publicized case of scientific misconduct. In this case, data were falsified by Eric Poehlman, a faculty member at the University of Vermont, who pleaded guilty to fabricating more than a decade of data, some connected to federal grants from the National Institutes of Health. Poehlman had authored influential studies on many topics, including obesity, menopause, lipids, and aging. The chapter's classical Benford analysis, along with a presentation of a more general class of Benford-like distributions, offers insights into this and similar cases. In addition, this chapter demonstrates how information-theoretic methods and other data-adaptive methods are promising tools for generating benchmark distributions of first significant digits (FSDs) and examining data sets for departures from expectations.
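As a rough illustration of the classical Benford benchmark mentioned in the abstract above, the sketch below computes the expected FSD probabilities P(d) = log10(1 + 1/d) and compares them with the digit frequencies observed in a data set. The function names are hypothetical, and the Pearson chi-square statistic is one common way to quantify departure from the benchmark, assumed here for illustration; it is not taken from the chapter, and the Benford-like and information-theoretic variants the chapter describes are not reproduced.

```python
import math
from collections import Counter


def benford_expected():
    """Classical Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_significant_digit(x):
    """Return the first nonzero digit of a nonzero number (hypothetical helper)."""
    # Scientific notation places the leading significant digit first,
    # e.g. 0.0042 -> '4.2000000000000000e-03'.
    return int(f"{abs(x):.16e}"[0])


def fsd_frequencies(values):
    """Observed relative frequency of each first significant digit 1..9."""
    digits = [first_significant_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}


def chi_square_vs_benford(values):
    """Pearson chi-square statistic of observed FSDs against Benford (assumed test)."""
    expected = benford_expected()
    observed = fsd_frequencies(values)
    n = sum(1 for v in values if v != 0)
    return sum(
        n * (observed[d] - expected[d]) ** 2 / expected[d] for d in range(1, 10)
    )
```

A larger statistic indicates a larger departure from the Benford benchmark; interpreting it as evidence of fabrication requires care, since many legitimate data sets do not follow Benford's law.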

