Uncertain Interval Data EFCM-ID Clustering Algorithm Based on Machine Learning

2019 ◽  
Vol 31 (2) ◽  
pp. 339-347
Author(s):  
Yimin Mao ◽  
Yinping Liu ◽  
Muhammad Asim Khan ◽  
Jiawei Wang ◽  
Dinghui Mao ◽  
...  

In clustering problems based on fuzzy c-means (FCM) for uncertain interval data, points within the interval are usually assumed to be uniformly distributed, which makes it difficult to describe the interval accurately. Furthermore, the clustering results are considerably affected by the initial clustering centers, and the membership degree is updated slowly. To address these problems, a new clustering algorithm called uncertain FCM for interval data (EFCM-ID) is presented. On the basis of quartiles, a median quartile-spacing distance measurement for generally distributed interval data, based on machine learning, is designed to describe these data precisely. In addition, we sample the whole dataset and take the density centers as the initial clustering centers to increase accuracy. We call this method sampling-based density-center selection (SDCS). To reduce the running time, a new measurement based on competitive-learning theory is developed to update the membership. It accelerates the update by different degrees according to the value of the membership degree. Experiments conducted on synthetic interval datasets show the feasibility of EFCM-ID.

Algorithms ◽  
2021 ◽  
Vol 14 (9) ◽  
pp. 258
Author(s):  
Tran Dinh Khang ◽  
Manh-Kien Tran ◽  
Michael Fowler

Clustering is an unsupervised machine learning method with many practical applications that has attracted extensive research interest. It is a technique for dividing data elements into clusters such that elements in the same cluster are similar. Clustering belongs to the group of unsupervised machine learning techniques, meaning that there is no information about the labels of the elements. However, when some information about the data points is known in advance, it is beneficial to use a semi-supervised algorithm. Among the many clustering techniques available, fuzzy C-means clustering (FCM) is a common one. To make the FCM algorithm a semi-supervised method, it was proposed in the literature to use an auxiliary matrix to adjust the membership grade of the elements and force them into certain clusters during the computation. In this study, instead of using the auxiliary matrix, we propose using multiple fuzzification coefficients to implement the semi-supervision component. After deriving the proposed semi-supervised fuzzy C-means clustering algorithm with multiple fuzzification coefficients (sSMC-FCM), we demonstrate the convergence of the algorithm and validate the efficiency of the method through a numerical example.
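For orientation, the baseline that this abstract builds on can be sketched as follows. This is a minimal sketch of the standard FCM update with a single fuzzification coefficient `m`, not the sSMC-FCM algorithm itself, whose per-element coefficients are not specified in the abstract. The function names `fcm_memberships` and `fcm_centers` are illustrative assumptions.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update with one fuzzifier m (assumed m > 1).

    u[i][k] = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with a center: share full membership
            # among the zero-distance centers to avoid dividing by zero.
            zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(zeros)
            memberships.append([z / total for z in zeros])
            continue
        memberships.append(
            [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]
        )
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Standard FCM center update: mean of the points weighted by u ** m."""
    dim = len(points[0])
    centers = []
    for k in range(len(memberships[0])):
        weights = [row[k] ** m for row in memberships]
        total = sum(weights)
        centers.append(
            tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                  for d in range(dim))
        )
    return centers
```

Alternating the two functions until the centers stop moving gives the usual FCM iteration. A semi-supervised variant along the lines described above would presumably replace the single `m` with element-specific values.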


Author(s):  
Anshumala Jaiswal

In the marketing world, rapidly increasing competition makes it difficult for organizations to sustain themselves, so marketers have to make decisions that satisfy their customers. The growth of an organization depends heavily on the organization making the right decisions. To do so, it has to gather deep knowledge about its customers' needs. A substantial amount of customer data is collected daily, and managing such a large volume of data is not easy. One idea is to segment customers into different groups, examine each group, and find the potential group among the pool of customers. Done manually, this requires a lot of human effort and time. Machine learning plays an important role in reducing this effort. Machine learning algorithms can find various patterns that are used to analyze customer databases. Using a clustering technique, customers can be segmented on the basis of similarities. One of the best procedures for clustering is the K-means algorithm. The k-means clustering algorithm is one of the most widely used data clustering methods, in which a dataset of “n” data points is partitioned into “k” groups or clusters [1]. In this paper, K is the number of clusters, groups, or segments, and the elbow method is used to determine the value of K.
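The elbow method mentioned above can be illustrated with a small sketch. It assumes plain Lloyd-style k-means with a deterministic initialisation and picks the elbow as the k with the largest second difference of the inertia curve. Both choices, and the names `kmeans` and `elbow_k`, are assumptions made for illustration; the paper does not state how it locates the elbow.

```python
import math

def kmeans(points, k, iterations=50):
    """Plain k-means; returns (centers, inertia)."""
    # Deterministic start: k points spread evenly through the sorted data.
    ordered = sorted(points)
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                # Coordinate-wise mean of the cluster members.
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    # Inertia: sum of squared distances to the nearest center.
    inertia = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    return centers, inertia

def elbow_k(inertias):
    """inertias[i] is the inertia for k = i + 1; return the k at the sharpest bend."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k

# Hypothetical customer features, e.g. (spend score, visit score).
customers = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
             (5.0, 5.0), (5.2, 4.8), (4.8, 5.2),
             (9.0, 1.0), (9.2, 0.8), (8.8, 1.2)]
inertias = [kmeans(customers, k)[1] for k in range(1, 6)]
best = elbow_k(inertias)
```

In practice the inertia curve is usually also plotted and inspected by eye, since the second-difference rule is only one of several heuristics for finding the bend.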


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 474
Author(s):  
Bowen Liu ◽  
Zhaoying Liu ◽  
Yujian Li ◽  
Ting Zhang ◽  
Zhilin Zhang

Clustering nonlinearly separable datasets is an important problem in unsupervised machine learning. Graph cut models provide good clustering results for nonlinearly separable datasets, but solving graph cut models is an NP-hard problem. A novel graph-based clustering algorithm is proposed for nonlinearly separable datasets. The proposed method solves the min cut model by iteratively computing only one simple formula. Experimental results on synthetic and benchmark datasets indicate the potential of the proposed method, which is able to cluster nonlinearly separable datasets with less running time.


2019 ◽  
Vol 17 (02) ◽  
pp. 1950016 ◽  
Author(s):  
Sanjay Chakraborty ◽  
Soharab Hossain Shaikh ◽  
Sudhindu Bikash Mandal ◽  
Ranjan Ghosh ◽  
Amlan Chakrabarti

Traditional machine learning shares several benefits with the field of quantum information processing. The study of machine learning with quantum mechanics is called quantum machine learning. Data clustering is an important tool for machine learning, where quantum computing plays a vital role through its inherent speedup capability. In this paper, a hybrid quantum algorithm for data clustering (quantum walk-based hybrid clustering (QWBHC)) is introduced, in which one-dimensional discrete-time quantum walks (DTQW) play the central role in updating the positions of data points according to their probability distributions. A quantum oracle is also designed, and it is mainly implemented on a finite [Formula: see text]-regular bipartite graph where data points are initially distributed as a predefined set of clusters. An overview of a quantum walk (QW) based clustering algorithm on a 1D lattice structure is also introduced and described in this paper. In the search for nearest neighbors, a unitary and reversible DTQW gives a quadratic speedup over the traditional classical random walk. This paper also compares the proposed hybrid quantum clustering algorithm with some state-of-the-art clustering algorithms in terms of clustering accuracy and time complexity analysis. The proposed quantum oracle needs [Formula: see text] queries to mark the nearest data points among clusters and modify the existing clusters. Finally, the proposed QWBHC algorithm achieves [Formula: see text] performance.
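To make the DTQW ingredient concrete, here is a minimal classical simulation of a one-dimensional discrete-time quantum walk. It assumes a Hadamard coin and a symmetric initial coin state, which are common textbook choices and are not confirmed by the abstract; the QWBHC oracle and clustering steps are not reproduced. The function name `hadamard_walk` is illustrative.

```python
import math

def hadamard_walk(steps):
    """Return {position: probability} after `steps` of a 1D Hadamard walk."""
    s = 1.0 / math.sqrt(2.0)
    # Amplitudes keyed by (position, coin).
    # Symmetric start at the origin: (|0> + i|1>) / sqrt(2).
    state = {(0, 0): complex(s, 0.0), (0, 1): complex(0.0, s)}
    for _ in range(steps):
        new_state = {}
        for x in {pos for pos, coin in state}:
            a0 = state.get((x, 0), 0j)
            a1 = state.get((x, 1), 0j)
            # Hadamard coin flip on the coin register.
            c0 = s * (a0 + a1)
            c1 = s * (a0 - a1)
            # Conditional shift: coin 0 steps left, coin 1 steps right.
            new_state[(x - 1, 0)] = new_state.get((x - 1, 0), 0j) + c0
            new_state[(x + 1, 1)] = new_state.get((x + 1, 1), 0j) + c1
        state = new_state
    probs = {}
    for (x, coin), amp in state.items():
        probs[x] = probs.get(x, 0.0) + abs(amp) ** 2
    return probs

def variance(probs):
    """Variance of the position distribution."""
    mean = sum(x * p for x, p in probs.items())
    return sum(x * x * p for x, p in probs.items()) - mean ** 2
```

An unbiased classical random walk with unit steps has position variance equal to the number of steps, so comparing `variance(hadamard_walk(t))` against `t` shows the faster spreading that underlies the quadratic speedup mentioned above.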


2019 ◽  
Vol 219 (2) ◽  
pp. 1163-1180
Author(s):  
Weilin Huang

SUMMARY Seismic signal recognition can serve as a powerful auxiliary tool for analysing and processing ever-larger volumes of seismic data. It can facilitate many subsequent procedures such as first-break picking, statics correction, denoising, signal detection, event tracking, structural interpretation, inversion and imaging. In this study, I propose an automatic technique for seismic signal recognition that takes advantage of unsupervised machine learning. In the proposed technique, seismic signal recognition is treated as a problem of clustering data points. All the seismic sampling points in the time domain are clustered into two clusters, that is, signal or non-signal. The hierarchical clustering algorithm is used to group these sampling points. Four attributes, that is, two short-term-average-to-long-term-average ratios, variance and envelope, are investigated in the clustering process. In addition, to quantitatively evaluate the performance of seismic signal recognition, I propose two new statistical indicators, namely, the rate between the total energies of original and recognized signals (RTE), and the rate between the average energies of original and recognized signals (RAE). A large number of numerical experiments show that when the signal is slightly corrupted by noise, the proposed technique performs very well, with recognition accuracy, precision and RTE of nearly 1 (i.e. 100 per cent), recall greater than 0.8 and RAE of about 1–1.3. When the signal is moderately corrupted by noise, the proposed technique maintains a recognition accuracy of about 0.9, precision of nearly 1, RTE of about 0.9, recall of around 0.6 and RAE of about 1.5. Applications of the proposed technique to real microseismic data induced by hydraulic fracturing and to reflection seismic data demonstrate its feasibility and encouraging prospects.
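One of the attributes named above, the short-term-average-to-long-term-average (STA/LTA) ratio, can be sketched as follows. This assumes trailing windows over the squared amplitude, which is one common convention. The paper's exact window definitions, its two window settings, and the hierarchical clustering step are not reproduced here, and the name `sta_lta` is illustrative.

```python
def sta_lta(trace, short_len, long_len):
    """Per-sample ratio of short-window to long-window mean energy.

    Both windows are trailing (they end at the current sample) and are
    truncated at the start of the trace.
    """
    energy = [x * x for x in trace]
    ratios = []
    for i in range(len(trace)):
        short = energy[max(0, i - short_len + 1): i + 1]
        long_ = energy[max(0, i - long_len + 1): i + 1]
        sta = sum(short) / len(short)
        lta = sum(long_) / len(long_)
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios

# Synthetic trace: constant low-amplitude noise with a short burst in the middle.
trace = [0.1] * 20 + [1.0] * 5 + [0.1] * 20
ratios = sta_lta(trace, short_len=3, long_len=15)
```

On stationary noise the ratio stays near 1, and it rises sharply at the onset of the burst, which is why it is a useful feature for separating signal from non-signal samples.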


2020 ◽  
Vol 15 ◽  
Author(s):  
Shuwen Zhang ◽  
Qiang Su ◽  
Qin Chen

Abstract: Major animal diseases pose a great threat to animal husbandry and human beings. With the deepening of globalization and the abundance of data resources, the prediction and analysis of animal diseases using big data are becoming more and more important. The focus of machine learning is to make computers learn from data and use the learned experience to analyze and predict. This paper first introduces the animal epidemic situation and machine learning. It then briefly introduces the application of machine learning in animal disease analysis and prediction. Machine learning is mainly divided into supervised learning and unsupervised learning. Supervised learning includes support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. Unsupervised learning includes the expectation-maximization algorithm, principal component analysis, the hierarchical clustering algorithm, and maxent. Through the discussion in this paper, readers can gain a clearer concept of machine learning and understand its application prospects in animal diseases.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Zhikuan Zhao ◽  
Jack K. Fitzsimons ◽  
Patrick Rebentrost ◽  
Vedran Dunjko ◽  
Joseph F. Fitzsimons

Abstract: Machine learning has recently emerged as a fruitful area for finding potential quantum computational advantage. Many of the quantum-enhanced machine learning algorithms critically hinge upon the ability to efficiently produce states proportional to high-dimensional data points stored in a quantum accessible memory. Even given query access to exponentially many entries stored in a database, the construction of which is considered a one-off overhead, it has been argued that the cost of preparing such amplitude-encoded states may offset any exponential quantum advantage. Here we prove using smoothed analysis that if the data analysis algorithm is robust against small entry-wise input perturbation, state preparation can always be achieved with constant queries. This criterion is typically satisfied in realistic machine learning applications, where input data is subject to moderate noise. Our results are equally applicable to the recent seminal progress in quantum-inspired algorithms, where specially constructed databases suffice for polylogarithmic classical algorithms in low-rank cases. The consequence of our finding is that for the purpose of practical machine learning, polylogarithmic processing time is possible under a general and flexible input model with quantum algorithms or quantum-inspired classical algorithms in the low-rank cases.


2021 ◽  
Author(s):  
Olusegun Peter Awe ◽  
Daniel Adebowale Babatunde ◽  
Sangarapillai Lambotharan ◽  
Basil AsSadhan

Abstract: We address the problem of spectrum sensing in decentralized cognitive radio networks using a parametric machine learning method. In particular, to mitigate sensing performance degradation due to the mobility of the secondary users (SUs) in the presence of scatterers, we propose and investigate a classifier that uses a pilot-based second-order Kalman filter tracker to estimate the slowly varying channel gain between the primary user (PU) transmitter and the mobile SUs. Using the energy measurements at the SU terminals as feature vectors, the algorithm is initialized by a K-means clustering algorithm with two centroids corresponding to the active and inactive status of the PU transmitter. Under mobility, the centroid corresponding to the active PU status is adapted according to the channel estimates given by the Kalman filter, and an adaptive K-means clustering technique is used to make classification decisions on the PU activity. Furthermore, to address the possibility that the SU receiver might experience location-dependent co-channel interference, we propose a quadratic polynomial regression algorithm for estimating the noise-plus-interference power in the presence of mobility, which can be used to adapt the centroid corresponding to the inactive PU status. Simulation results demonstrate the efficacy of the proposed algorithm.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

Abstract: With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty in selecting the judgment indicators of cluster numbers. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and can reduce the adjusted parameters to a minimum. On the basis of the robustness of the cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.
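The notion of local density with a cutoff distance can be sketched as follows. This assumes the simple neighbour-count definition (the number of other points closer than the cutoff), which is a common choice in density-based clustering. The abstract does not give BCALoD's actual density formula or its bidirectional linking rule, so this is only an illustration, and the name `local_density` is hypothetical.

```python
import math

def local_density(points, cutoff):
    """For each point, count the other points strictly closer than `cutoff`."""
    densities = []
    for i, p in enumerate(points):
        count = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) < cutoff
        )
        densities.append(count)
    return densities

# Three tightly grouped points and one isolated point.
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
densities = local_density(sample, cutoff=0.5)
```

Points with a density below a chosen cutoff density, such as the isolated point here, are natural candidates for noise, which is the kind of decision a per-cluster cutoff distance and cutoff density would refine.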

