Inferring the outcomes of rejected loans: an application of semisupervised clustering

Zhiyong Li; Xinyi Hu; Ke Li; Fanyin Zhou; Feng Shen

doi:10.1111/rssa.12534

On the Use of Fuzzy Constraints in Semisupervised Clustering

IEEE Transactions on Fuzzy Systems, Vol 24 (4), pp. 992-999, 2016. doi:10.1109/tfuzz.2015.2466085. Cited By ~ 9

Author(s): Irene Diaz-Valenzuela; M. Amparo Vila; Maria J. Martin-Bautista

Keyword(s): Fuzzy Constraints; Semisupervised Clustering

A Fuzzy Semisupervised Clustering Method: Application to the Classification of Scientific Publications

Information Processing and Management of Uncertainty in Knowledge-Based Systems - Communications in Computer and Information Science, pp. 179-188, 2014. doi:10.1007/978-3-319-08795-5_19. Cited By ~ 5

Author(s): Irene Diaz-Valenzuela; Maria J. Martin-Bautista; Maria-Amparo Vila

Keyword(s): Clustering Method; Scientific Publications; Semisupervised Clustering

Learning Choquet-Integral-Based Metrics for Semisupervised Clustering

IEEE Transactions on Fuzzy Systems, Vol 19 (3), pp. 562-574, 2011. doi:10.1109/tfuzz.2011.2123899. Cited By ~ 49

Author(s): Gleb Beliakov; Simon James; Gang Li

Keyword(s): Choquet Integral; Semisupervised Clustering

Active Semisupervised Clustering Algorithm with Label Propagation for Imbalanced and Multidensity Datasets

Mathematical Problems in Engineering, Vol 2013, pp. 1-10, 2013. doi:10.1155/2013/641927. Cited By ~ 3

Author(s): Mingwei Leng; Jianjun Cheng; Jinjin Wang; Zhengquan Zhang; Hanhai Zhou; ...

Keyword(s): Clustering Algorithm; Clustering Algorithms; Label Propagation; Data Selection; Imbalanced Datasets; Active Mechanism; Real World Applications; Stable Performance; Active Data; Semisupervised Clustering

Most existing semisupervised clustering algorithms that rely on a small labeled dataset achieve low accuracy on multidensity and imbalanced datasets, and labeling data is expensive and time-consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering for multidensity and imbalanced datasets and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active mechanism for data selection to minimize the amount of labeled data, and it applies multiple thresholds to expand the labeled dataset on multidensity and imbalanced datasets. Three standard datasets and one synthetic dataset are used to evaluate the proposed algorithm, and the experimental results show that it achieves higher accuracy and more stable performance than other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.
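
The abstract combines two ingredients: actively choosing which points to label, and expanding those labels with several thresholds so that dense and sparse regions are treated differently. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' algorithm. Farthest-first traversal stands in for the active selection mechanism, and each point's expansion radius is assumed to be a multiple of its distance to its k-th nearest neighbor. The names `active_select`, `knn_radius`, `propagate_labels`, `k` and `scale` are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def active_select(points: Sequence[Point], budget: int) -> List[int]:
    """Hypothetical active selection by farthest-first traversal.

    Each query is the not-yet-chosen point farthest from every chosen point,
    so a small labeling budget still reaches well-separated regions.
    """
    if not points or budget <= 0:
        return []
    chosen = [0]
    while len(chosen) < min(budget, len(points)):
        candidates = [i for i in range(len(points)) if i not in chosen]
        best = max(candidates,
                   key=lambda i: min(math.dist(points[i], points[c]) for c in chosen))
        chosen.append(best)
    return chosen


def knn_radius(points: Sequence[Point], i: int, k: int) -> float:
    """Distance from point i to its k-th nearest neighbor (a local density proxy)."""
    d = sorted(math.dist(points[i], points[j]) for j in range(len(points)) if j != i)
    return d[min(k, len(d)) - 1] if d else 0.0


def propagate_labels(points: Sequence[Point], seeds: Dict[int, int],
                     k: int = 3, scale: float = 1.5) -> List[int]:
    """Expand seed labels with a per-point threshold instead of one global radius.

    Assumption: a point's threshold is scale * (distance to its k-th neighbor),
    so it is small in dense regions and large in sparse ones.
    """
    labels = [-1] * len(points)            # -1 marks "not yet labeled"
    for i, y in seeds.items():
        labels[i] = y
    queue = deque(seeds)
    while queue:
        i = queue.popleft()
        radius = scale * knn_radius(points, i, k)
        for j in range(len(points)):
            if labels[j] == -1 and math.dist(points[i], points[j]) <= radius:
                labels[j] = labels[i]
                queue.append(j)            # newly labeled points expand in turn
    # Points never reached take the label of their nearest labeled point.
    labeled = [i for i, y in enumerate(labels) if y != -1]
    for j in range(len(points)):
        if labels[j] == -1 and labeled:
            nearest = min(labeled, key=lambda i: math.dist(points[i], points[j]))
            labels[j] = labels[nearest]
    return labels


# A tight cluster and a looser one; `truth` plays the role of the human oracle.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
sparse = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0), (10.5, 10.5), (12.0, 10.0)]
points = dense + sparse
truth = [0] * len(dense) + [1] * len(sparse)
queried = active_select(points, budget=2)
labels = propagate_labels(points, {i: truth[i] for i in queried})
```

With a budget of two queries, farthest-first traversal picks one point from each cluster, and the radius computed for the sparse cluster is roughly ten times larger than the one for the dense cluster, which is the kind of density-dependent threshold the abstract motivates. Uncertainty- or density-based query strategies are common alternatives to farthest-first traversal.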

Software Quality Analysis of Unlabeled Program Modules With Semisupervised Clustering

IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, Vol 37 (2), pp. 201-211, 2007. doi:10.1109/tsmca.2006.889473. Cited By ~ 46

Author(s): Naeem Seliya; Taghi M. Khoshgoftaar

Keyword(s): Software Quality; Quality Analysis; Semisupervised Clustering; Program Modules

Semisupervised Clustering with Metric Learning using Relative Comparisons

IEEE Transactions on Knowledge and Data Engineering, Vol 20 (4), pp. 496-503, 2008. doi:10.1109/tkde.2007.190715. Cited By ~ 42

Author(s): N. Kumar; K. Kummamuru

Keyword(s): Metric Learning; Semisupervised Clustering; Relative Comparisons

A large margin nearest cluster metric based semisupervised clustering algorithm for brain fibers

The 2014 5th International Conference on Game Theory for Networks, 2014. doi:10.1109/gamenets.2014.7043717

Author(s): Meiyu Huang; Yiqiang Chen; Junfa Liu; Wen Ji

Keyword(s): Clustering Algorithm; Large Margin; Semisupervised Clustering

Preprocessing Method for Encrypted Traffic Based on Semisupervised Clustering

Security and Communication Networks, Vol 2020, pp. 1-13, 2020. doi:10.1155/2020/8824659

Author(s): Rongfeng Zheng; Jiayong Liu; Weina Niu; Liang Liu; Kai Li; ...

Keyword(s): Network Traffic; Clustering Algorithm; Network Flows; Spatial Clustering; Clustering Algorithms; Communication Channels; Transport Layer; Clustering Model; Network Intrusion; Semisupervised Clustering

The explosive growth in network traffic in recent years has increased the processing pressure on network intrusion detection systems. In addition, reliable methods are lacking for preprocessing network traffic generated by benign applications that do not steal users’ data from their devices. To alleviate these problems, this study analyzed the differences between benign traffic produced by benign applications and malicious traffic produced by malware. To fully express these differences, this study proposed a new set of statistical features for training a clustering model. Furthermore, to mine the communication channels generated by benign applications in batches, a semisupervised clustering method was adopted. Using a small number of labeled samples, our method aggregated historical network traffic into two types of clusters; the cluster that did not contain labeled malicious samples was regarded as a benign traffic cluster. Four clustering algorithms were compared experimentally, and the density-based spatial clustering of applications with noise (DBSCAN) algorithm was selected to mine benign communication channels. We also compared our method with two other methods, and the results demonstrated that the benign channels mined through our method were more reliable. Finally, using our method, 1,811 benign transport layer security (TLS) channels were mined from 18,357 TLS communication channels. The flows carried by these benign channels accounted for 65.37% of all network flows, and no malicious flow was included in our results, which demonstrates the effectiveness of our method.
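
The labeling rule described above is compact: cluster the traffic, then treat any cluster that contains no labeled malicious sample as benign. The Python sketch below pairs a minimal DBSCAN (the algorithm the authors selected) with that rule. The two-dimensional feature vectors, the `eps` and `min_pts` values, and the helper names `region_query`, `dbscan` and `benign_clusters` are hypothetical; treating DBSCAN noise points as not benign is an assumption of this sketch, not something the abstract states.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]
NOISE = -1


def region_query(X: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of X[i], including i itself."""
    return [j for j in range(len(X)) if math.dist(X[i], X[j]) <= eps]


def dbscan(X: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Plain O(n^2) DBSCAN: returns a cluster id per point, or NOISE."""
    labels: List[Optional[int]] = [None] * len(X)   # None = not visited yet
    cluster = -1
    for i in range(len(X)):
        if labels[i] is not None:
            continue
        neighbors = region_query(X, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE                        # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = deque(neighbors)
        while frontier:
            j = frontier.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster                  # border point: join but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(X, j, eps)
            if len(j_neighbors) >= min_pts:          # core point: keep expanding
                frontier.extend(j_neighbors)
    return [c if c is not None else NOISE for c in labels]


def benign_clusters(labels: Sequence[int], malicious: Sequence[int]) -> Set[int]:
    """Clusters that contain no labeled malicious sample are treated as benign."""
    clusters = {c for c in labels if c != NOISE}
    tainted = {labels[i] for i in malicious}
    return clusters - tainted


# Hypothetical, pre-scaled statistical features for nine flows.
flows = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
         (20.0, 20.0)]
malicious_idx = [4]                                  # the only labeled sample
labels = dbscan(flows, eps=0.5, min_pts=3)
benign = benign_clusters(labels, malicious_idx)
benign_flows = [i for i, c in enumerate(labels) if c in benign]
```

Here the first cluster contains no labeled malicious flow and is reported as benign, the second is excluded because flow 4 is labeled malicious, and the isolated flow is left unclassified as noise. A real pipeline would need its own policy for noise points.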

Use of Semisupervised Clustering and Feature-Selection Techniques for Identification of Co-expressed Genes

IEEE Journal of Biomedical and Health Informatics, Vol 20 (4), pp. 1171-1177, 2016. doi:10.1109/jbhi.2015.2451735. Cited By ~ 8

Author(s): Sriparna Saha; Abhay Kumar Alok; Asif Ekbal

Keyword(s): Feature Selection; Semisupervised Clustering; Feature Selection Techniques

Mixture Modeling with Pairwise, Instance-Level Class Constraints

Neural Computation, Vol 17 (11), pp. 2482-2507, 2005. doi:10.1162/0899766054796914. Cited By ~ 14

Author(s): Qi Zhao; David J. Miller

Keyword(s): Synthetic Data; Ground Truth; Mixture Modeling; Data Set; Ground Truth Data; New Class; Recent Approach; Semisupervised Clustering; The One; Number Of Classes

The goal of semisupervised clustering/mixture modeling is to learn the underlying groups comprising a given data set when there is also some form of instance-level supervision available, usually in the form of labels or pairwise sample constraints. Most prior work with constraints assumes the number of classes is known, with each learned cluster assumed to be a class and, hence, subject to the given class constraints. When the number of classes is unknown or when the one-cluster-per-class assumption is not valid, the use of constraints may actually be deleterious to learning the ground-truth data groups. We address this by (1) allowing allocation of multiple mixture components to individual classes and (2) estimating both the number of components and the number of classes. We also address new class discovery, with components void of constraints treated as putative unknown classes. For both real-world and synthetic data, our method is shown to accurately estimate the number of classes and to give favorable comparison with the recent approach of Shental, Bar-Hillel, Hertz, and Weinshall (2003).
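
Points (1) and (2) of the abstract, several mixture components per class and unsupervised components flagged as candidate new classes, can be illustrated with the post hoc mapping below. This is a simplified sketch, not the authors' constrained mixture model: it assumes component assignments have already been obtained from some fitted mixture, uses class labels on a few samples rather than pairwise constraints, and breaks voting ties in favor of the first class counted. The function `map_components_to_classes` and the `new_` naming scheme are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def map_components_to_classes(assign: Sequence[int],
                              labeled: Dict[int, Hashable]) -> Dict[int, Hashable]:
    """Map each mixture component to a class; several components may share one class.

    assign:  component index for every sample (from an already fitted mixture).
    labeled: sample index -> known class for the few supervised samples.
    Components containing no labeled sample are reported as putative new classes.
    """
    votes: Dict[int, Counter] = defaultdict(Counter)
    for i, y in labeled.items():
        votes[assign[i]][y] += 1
    mapping: Dict[int, Hashable] = {}
    n_new = 0
    for comp in sorted(set(assign)):
        if votes[comp]:
            mapping[comp] = votes[comp].most_common(1)[0][0]  # majority label
        else:
            mapping[comp] = f"new_{n_new}"                     # no supervision: candidate new class
            n_new += 1
    return mapping


# Four components, with class labels on three samples.
assign = [0, 0, 1, 1, 2, 2, 3, 3]
labeled = {0: "A", 2: "A", 4: "B"}
mapping = map_components_to_classes(assign, labeled)
n_classes = len(set(mapping.values()))
```

In this toy case components 0 and 1 both map to class A, component 2 maps to B, and component 3 surfaces as the putative unknown class `new_0`, so four components yield an estimate of three classes.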
