semisupervised clustering
Recently Published Documents


Total documents: 35 (five years: 8)

H-index: 9 (five years: 0)

2021 ◽  
Vol 2021 ◽  
pp. 1-6
Author(s):  
Wenliang Gao ◽  
Yuanyuan Li ◽  
Chujie Fang ◽  
Wei Fan ◽  
Haonan Peng

Clustering analysis is one of the most important techniques for single-cell data mining. It is widely used to separate different gene sequences, identify functional genes, and detect new cell types. Although traditional unsupervised clustering methods do not require labeled data, their effectiveness depends on the distribution of the original data, the setting of hyperparameters, and other factors. In some cases the types of some cells are known, and making full use of this prior information is expected to yield higher accuracy. In this study, we propose SCMAG (a semisupervised single-cell clustering method based on a matrix aggregation graph convolutional neural network), which takes full account of the prior information available for single-cell data. To evaluate the proposed semisupervised clustering method, we test it on different single-cell datasets and compare it with the current semisupervised clustering algorithm at recognizing cell types in various real scRNA-seq data; the results show that the proposed model is more accurate.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Yajing Wang ◽  
Juan Ma ◽  
Ashutosh Sharma ◽  
Pradeep Kumar Singh ◽  
Gurjot Singh Gaba ◽  
...  

Intrusion detection is crucial to computer network security; this work therefore aims to improve network security protection by proposing preventive techniques. Outlier detection and semisupervised clustering algorithms based on shared nearest neighbors are proposed to address intrusion detection by converting it into a problem of mining outliers from a network behavior dataset. The algorithm uses shared nearest neighbors as the similarity measure, judges whether a data point is an outlier according to its number of nearest neighbors, and performs semisupervised clustering on the dataset after the outliers are deleted. During semisupervised clustering, extensive prior knowledge is added, and the dataset is clustered according to the principle of graph segmentation. The novelty of the proposed algorithm lies in detecting outliers while effectively avoiding dependence on parameters, thus eliminating the influence of outliers on clustering. This article uses two real datasets, lymphography and glass, for simulation. The simulation results show that the proposed algorithm can effectively detect outliers and has a good clustering effect. Furthermore, the experiments reveal that the outlier detection-based SCA-SNN algorithm has the best practical effect on the dataset without outliers, which validates its clustering performance. Compared with other state-of-the-art anomaly detection methods, anomaly detection based on outlier mining does not require a training process and thus overcomes the current anomaly detection problems caused by incomplete normal patterns in training samples.
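
The shared-nearest-neighbor idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's SCA-SNN algorithm: the function names, the mutual-neighbor rule (borrowed from Jarvis-Patrick clustering), and the thresholds `min_shared` and `min_links` are all assumptions made for illustration.

```python
from math import dist


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Shared-nearest-neighbor similarity: overlap of the two k-NN sets.

    Assumption (Jarvis-Patrick style): the similarity is nonzero only when
    i and j appear in each other's k-NN sets.
    """
    if j not in neighbors[i] or i not in neighbors[j]:
        return 0
    return len(neighbors[i] & neighbors[j])


def flag_outliers(points, k, min_shared, min_links):
    """Flag points with fewer than min_links neighbors of similarity >= min_shared."""
    neighbors = knn_sets(points, k)
    outliers = []
    for i in range(len(points)):
        links = sum(
            1 for j in neighbors[i]
            if snn_similarity(neighbors, i, j) >= min_shared
        )
        if links < min_links:
            outliers.append(i)
    return outliers


# Toy data (hypothetical): five nearby points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
outliers = flag_outliers(points, k=3, min_shared=1, min_links=1)
```

In this toy example the distant point is in nobody's neighbor list, so it has no qualifying links and is the only point flagged; the remaining points could then be passed to a clustering step.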


Author(s):  
Luca Frigau ◽  
Giulia Contu ◽  
Francesco Mola ◽  
Claudio Conversano

2021 ◽  
pp. ASN.2020101418
Author(s):  
Thibaut Vaulet ◽  
Gillian Divard ◽  
Olivier Thaunat ◽  
Evelyne Lerut ◽  
Aleksandar Senev ◽  
...  

Background: Over the past decades, an international group of experts iteratively developed a consensus classification of kidney transplant rejection phenotypes, known as the Banff classification. Data-driven clustering of kidney transplant histologic data could simplify the complex and discretionary rules of the Banff classification, while improving the association with graft failure. Methods: The data consisted of a training set of 3510 kidney-transplant biopsies from an observational cohort of 936 recipients. Independent validation of the results was performed on an external set of 3835 biopsies from 1989 patients. On the basis of acute histologic lesion scores and the presence of donor-specific HLA antibodies, stable clustering was achieved on the basis of a consensus of 400 different clustering partitions. Additional information on kidney-transplant failure was introduced with a weighted Euclidean distance. Results: Based on the proportion of ambiguous clustering, six clinically meaningful cluster phenotypes were identified. There was significant overlap with the existing Banff classification (adjusted Rand index, 0.48). However, the data-driven approach eliminated intermediate and mixed phenotypes and created acute rejection clusters that are each significantly associated with graft failure. Finally, a novel visualization tool presents disease phenotypes and severity in a continuous manner, as a complement to the discrete clusters. Conclusions: A semisupervised clustering approach for the identification of clinically meaningful novel phenotypes of kidney transplant rejection has been developed and validated. The approach has the potential to offer a more quantitative evaluation of rejection subtypes and severity, especially in situations in which the current histologic categorization is ambiguous.
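
The overlap between two partitions, such as the data-driven clusters and the Banff categories above, is commonly summarized with the adjusted Rand index. The following is a minimal sketch of the standard pair-counting formula; the function name and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")

    # Contingency counts: items in cluster i of A and cluster j of B.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Numbers of item pairs grouped together in both, in A, and in B.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one big cluster
    return (sum_pairs - expected) / (max_index - expected)


# Toy labels (hypothetical): the same split with renamed clusters scores 1.0.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two partitions agree up to relabeling, values near 0 indicate chance-level agreement, and negative values indicate less agreement than chance.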


2021 ◽  
pp. 1-35
Author(s):  
Takuya Shimada ◽  
Han Bao ◽  
Issei Sato ◽  
Masashi Sugiyama

Pairwise similarities and dissimilarities between data points are often obtained more easily than full labels of data in real-world classification problems. To make use of such pairwise information, an empirical risk minimization approach has been proposed, where an unbiased estimator of the classification risk is computed from only pairwise similarities and unlabeled data. However, this approach has not yet been able to handle pairwise dissimilarities. Semisupervised clustering methods can incorporate both similarities and dissimilarities into their framework; however, they typically require strong geometrical assumptions on the data distribution, such as the manifold assumption, which may cause severe performance deterioration. In this letter, we derive an unbiased estimator of the classification risk based on similarities, dissimilarities, and unlabeled data together. We theoretically establish an estimation error bound and experimentally demonstrate the practical usefulness of our empirical risk minimization method.


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Zhenlun Yang

The aim of this work is to develop a general automatic computer method to distinguish individuals with abnormal gait patterns from those with normal gait patterns. As long as silhouette gait images of the subjects are obtainable, the proposed method can provide online gait anomaly detection results without any prior analysis of the gait features of the target subjects. Moreover, the proposed method does not need any parameter settings by users and can start producing detection results after collecting only a very small number of gait samples, even if none of those samples is abnormal. Therefore, the proposed method can be deployed quickly and simply in various gait anomaly detection scenarios. The proposed method is composed of two main modules: (1) feature extraction from gait images and (2) anomaly detection via binary classification. In the first module, a new representation of the most frequently involved area of the silhouette gait images, called the full gait energy image (F-GEI), is proposed. Furthermore, based on the F-GEI, a novel and simple method characterizing individual walking properties is developed to extract gait features from individual subjects. In the second module, based on the very limited prior knowledge of the target dataset, a semisupervised clustering algorithm is proposed to perform the binary classification for detecting the gait anomaly of each subject. The performance of the proposed gait anomaly detection method was evaluated on the human gaits dataset in comparison with three state-of-the-art methods. The experimental results show that the proposed method is effective and efficient in terms of accuracy, robustness, and computational efficiency.
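
The abstract above does not specify how the F-GEI is built, so the sketch below shows only the conventional gait energy image it extends, under the assumption that it is the pixelwise mean of aligned binary silhouette frames. The function name and the tiny 2x2 frames are hypothetical.

```python
def gait_energy_image(frames):
    """Average aligned binary silhouettes (lists of rows of 0/1) pixel by pixel."""
    if not frames:
        raise ValueError("need at least one silhouette frame")
    height, width = len(frames[0]), len(frames[0][0])
    totals = [[0.0] * width for _ in range(height)]
    for frame in frames:
        for r in range(height):
            for c in range(width):
                totals[r][c] += frame[r][c]
    count = len(frames)
    # Each pixel ends up as the fraction of frames in which it was foreground.
    return [[value / count for value in row] for row in totals]


# Two toy 2x2 silhouettes (hypothetical data).
frames = [
    [[1, 0], [1, 1]],
    [[1, 1], [0, 1]],
]
gei = gait_energy_image(frames)
```

Pixels that are foreground in every frame get the value 1.0, and pixels covered only part of the time get intermediate values.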


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Zhikui Chen ◽  
Chaojie Li ◽  
Jing Gao ◽  
Jianing Zhang ◽  
Peng Li

Deep embedding clustering (DEC) has attracted much attention because of its strong performance, which is attributed to end-to-end clustering. However, DEC cannot make use of the small amount of a priori knowledge contained in data of increasing volume. To tackle this challenge, a semisupervised deep embedded clustering algorithm with adaptive labels is proposed to cluster such data in a semisupervised end-to-end manner on the basis of a small amount of a priori knowledge. Specifically, a deep semisupervised clustering network is designed based on the autoencoder paradigm and deep clustering; it mines the clustering representation and clustering assignment well by preventing the shift of labels in DEC. Then, to train the parameters of the deep semisupervised clustering network, a back-propagation-based algorithm with adaptive labels is introduced based on the pretrain and fine-tune strategies. Finally, extensive experiments on representative datasets are conducted to evaluate the performance of the proposed method in terms of clustering accuracy and normalized mutual information. Results show that the proposed method outperforms state-of-the-art DEC methods.
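
For context, the clustering assignment in the original, unsupervised DEC formulation can be sketched as follows: a Student's t kernel gives soft assignments, a sharpened target distribution is derived from them, and training minimizes the KL divergence between the two. This sketch does not include the semisupervised adaptive-label mechanism described above; the function names and toy embeddings are hypothetical.

```python
from math import log


def soft_assignments(embeddings, centroids, alpha=1.0):
    """Soft assignment q[i][j] of point i to centroid j via a Student's t kernel."""
    q = []
    for z in embeddings:
        kernel = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            kernel.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(kernel)
        q.append([value / total for value in kernel])
    return q


def target_distribution(q):
    """Sharpened targets p[i][j], proportional to q[i][j]^2 / sum_i q[i][j]."""
    cluster_freq = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        weights = [row[j] ** 2 / cluster_freq[j] for j in range(len(row))]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over all points and clusters."""
    return sum(
        pi * log(pi / qi)
        for p_row, q_row in zip(p, q)
        for pi, qi in zip(p_row, q_row)
        if pi > 0
    )


# Toy embeddings around two centroids (hypothetical data).
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
q = soft_assignments(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

The target distribution puts more weight on confident assignments, so minimizing the loss pulls each embedding toward its most likely centroid.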


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Rongfeng Zheng ◽  
Jiayong Liu ◽  
Weina Niu ◽  
Liang Liu ◽  
Kai Li ◽  
...  

The explosive growth in network traffic in recent times has resulted in increased processing pressure on network intrusion detection systems. In addition, there is a lack of reliable methods for preprocessing network traffic generated by benign applications that do not steal users’ data from their devices. To alleviate these problems, this study analyzed the differences between benign and malicious traffic produced by benign applications and malware, respectively. To fully express these differences, this study proposed a new set of statistical features for training a clustering model. Furthermore, to mine the communication channels generated by benign applications in batches, a semisupervised clustering method was adopted. Using a small number of labeled samples, our method aggregated historical network traffic into two types of clusters. The cluster that did not contain labeled malicious samples was regarded as a benign traffic cluster. The experimental results were compared using four types of clustering algorithms. The density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm was selected to mine benign communication channels. We also compared our method with two other methods, and the results demonstrated that the benign channels mined through our method were more reliable. Finally, using our method, 1,811 benign transport layer security (TLS) channels were mined from 18,357 TLS communication channels. The number of flows carried by these benign channels comprised 65.37% of the entire network flows, and no malicious flow was included in our results, which proves the effectiveness of our method.
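
The labeling rule described above, in which a cluster counts as benign when it contains no labeled malicious sample, can be sketched in a few lines. The function names and toy labels are hypothetical, and treating the label -1 as noise to be excluded is an assumption borrowed from the convention used by scikit-learn's DBSCAN, not something the abstract states.

```python
def benign_clusters(cluster_labels, malicious_indices, noise_label=-1):
    """Return the cluster ids that contain no labeled malicious sample."""
    tainted = {cluster_labels[i] for i in malicious_indices}
    return sorted(
        {
            label
            for label in cluster_labels
            if label != noise_label and label not in tainted
        }
    )


def benign_members(cluster_labels, malicious_indices, noise_label=-1):
    """Return the indices of samples that fall in benign clusters."""
    keep = set(benign_clusters(cluster_labels, malicious_indices, noise_label))
    return [i for i, label in enumerate(cluster_labels) if label in keep]


# Toy clustering output (hypothetical): sample 2 is a labeled malicious channel.
labels = [0, 0, 1, 1, 2, -1, 2]
clusters = benign_clusters(labels, malicious_indices=[2])
members = benign_members(labels, malicious_indices=[2])
```

Here cluster 1 is discarded because it holds the labeled malicious sample, the noise point is ignored, and the members of clusters 0 and 2 are kept as benign channels.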

