Iterative Min Cut Clustering Based on Graph Cuts

Bowen Liu; Zhaoying Liu; Yujian Li; Ting Zhang; Zhilin Zhang

doi:10.3390/s21020474

Iterative Min Cut Clustering Based on Graph Cuts

Sensors ◽

10.3390/s21020474 ◽

2021 ◽

Vol 21 (2) ◽

pp. 474

Author(s):

Bowen Liu ◽

Zhaoying Liu ◽

Yujian Li ◽

Ting Zhang ◽

Zhilin Zhang

Keyword(s):

Machine Learning ◽

Simple Formula ◽

Clustering Algorithm ◽

Graph Cut ◽

Hard Problem ◽

Running Time ◽

Benchmark Datasets ◽

Np Hard Problem ◽

Graph Based Clustering ◽

Min Cut

Clustering nonlinearly separable datasets is a long-standing problem in unsupervised machine learning. Graph cut models provide good clustering results for nonlinearly separable datasets, but solving graph cut models is an NP-hard problem. A novel graph-based clustering algorithm is proposed for nonlinearly separable datasets. The proposed method solves the min cut model by iteratively computing only one simple formula. Experimental results on synthetic and benchmark datasets indicate the potential of the proposed method, which is able to cluster nonlinearly separable datasets with less running time.


SIZES OF ORDERED DECISION TREES

International Journal of Foundations of Computer Science ◽

10.1142/s0129054102001205 ◽

2002 ◽

Vol 13 (03) ◽

pp. 445-458 ◽

Cited By ~ 1

Author(s):

HANS ZANTEMA ◽

HANS L. BODLAENDER

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Decision Trees ◽

Blow Up ◽

Inductive Inference ◽

Hard Problem ◽

Minimal Size ◽

Knowledge Based ◽

Np Hard Problem ◽

Decision Tables

Decision tables provide a natural framework for knowledge acquisition and representation in the area of knowledge based information systems. Decision trees provide a standard method for inductive inference in the area of machine learning. In this paper we show how decision tables can be considered as ordered decision trees: decision trees satisfying an ordering restriction on the nodes. Every decision tree can be represented by an equivalent ordered decision tree, but we show that doing so may exponentially blow up sizes, even if the choice of the order is left free. Our main result states that finding an ordered decision tree of minimal size that represents the same function as a given ordered decision tree is an NP-hard problem; in earlier work we obtained a similar result for unordered decision trees.


Board Games AI

Encyclopedia of Information Science and Technology, Fourth Edition ◽

10.4018/978-1-5225-2255-3.ch013 ◽

2018 ◽

pp. 144-155

Author(s):

Tad Gonsalves

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Hard Problem ◽

Game Playing ◽

Evaluation Function ◽

Board Game ◽

Game Tree ◽

Board Games ◽

Np Hard Problem ◽

Alpha Beta

Board games are the classical application area of AI. This chapter introduces the two most prominent AI approaches used in developing board game agents, the MinMax algorithm and machine learning, and explains their usage in playing games like tic-tac-toe, checkers, othello, chess, go, etc., against human opponents. The game tree is essentially a directed graph, where the nodes represent the positions in the game and the edges the moves. Even a simple board game like tic-tac-toe (noughts and crosses) has as many as 255,168 leaf nodes in the game tree. Traversing the complete game tree becomes an NP-hard problem. Alpha-beta pruning is used to estimate the short-cuts through the game tree. The board game strategy depends on the evaluation function, which is a heuristic indicating how good the player's current move is in winning the game. Machine learning algorithms try to evolve or learn the agent's game playing strategy based on the evaluation function.
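The leaf count quoted in the abstract can be checked by brute force. The following is a minimal sketch, not code from the chapter: it assumes the game tree is expanded without merging repeated positions and that a game stops as soon as one side completes a line. The names `WIN_LINES`, `winner`, and `count_leaves` are illustrative.

```python
# Eight winning lines on a 3x3 board whose cells are indexed 0..8 row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return "X" or "O" if that player has completed a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_leaves(board, player):
    """Count terminal positions reachable from `board` with `player` to move."""
    if winner(board) is not None or all(cell is not None for cell in board):
        return 1  # a win or a full board ends the game: this node is a leaf
    total = 0
    for i in range(9):
        if board[i] is None:
            board[i] = player  # try the move
            total += count_leaves(board, "O" if player == "X" else "X")
            board[i] = None    # undo the move
    return total


leaf_count = count_leaves([None] * 9, "X")
print(leaf_count)  # 255168
```

Under these assumptions the enumeration reproduces the 255,168 figure, and it finishes in a few seconds because the tic-tac-toe tree is small.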


Free versus bound entanglement, a NP-hard problem tackled by machine learning

Scientific Reports ◽

10.1038/s41598-021-98523-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Beatrix C. Hiesmayr

Keyword(s):

Machine Learning ◽

Large Family ◽

Machine Learning Algorithms ◽

Entangled States ◽

Hard Problem ◽

Np Hard ◽

Entanglement Witnesses ◽

Np Hard Problem ◽

Bound Entanglement ◽

Ppt Criterion

Entanglement detection in high-dimensional systems is an NP-hard problem, since an efficient method is lacking. Given a bipartite quantum state of interest, free entanglement can be detected efficiently by the PPT criterion (Peres-Horodecki criterion), in contrast to detecting bound entanglement, i.e., a curious form of entanglement that cannot be distilled into maximally (free) entangled states. Only a few bound entangled states have been found, typically by constructing dedicated entanglement witnesses, so naturally the question arises of how large the volume of those states is. We define a large family of magically symmetric states of bipartite qutrits for which we find 82% to be free entangled, 2% to be certainly separable and as much as 10% to be bound entangled, which shows that this kind of entanglement is not rare. Via various machine learning algorithms we can confirm that the remaining 6% of states are more likely to belong to the set of separable states than to the bound entangled states. Most importantly, we find via dimension reduction algorithms that there is a strong two-dimensional (linear) sub-structure in the set of bound entangled states. This revealed structure opens a novel path to finding and characterizing bound entanglement, towards solving the long-standing problem of what the existence of bound entanglement implies.


Uncertain Interval Data EFCM-ID Clustering Algorithm Based on Machine Learning

Journal of Robotics and Mechatronics ◽

10.20965/jrm.2019.p0339 ◽

2019 ◽

Vol 31 (2) ◽

pp. 339-347

Author(s):

Yimin Mao ◽

Yinping Liu ◽

Muhammad Asim Khan ◽

Jiawei Wang ◽

Dinghui Mao ◽

...

Keyword(s):

Machine Learning ◽

Uniform Distribution ◽

Learning Theory ◽

Clustering Algorithm ◽

Interval Data ◽

Membership Degree ◽

Running Time ◽

Spacing Distance ◽

Data Points ◽

Clustering Problems

In clustering problems based on fuzzy c-means (FCM) for uncertain interval data, points within the interval are usually assumed to have uniform distribution, resulting in the difficulty of accurately describing the interval. Furthermore, the clustering results are considerably affected by the initial clustering centers, and the update speed of the membership degree is slow. To address these problems, a new clustering algorithm called uncertain FCM for interval data (EFCM-ID) is presented. On the basis of a quartile, a median quartile-spacing distance measurement for generally distributed interval data based on machine learning is designed to precisely determine these data. Simultaneously, we sample the whole dataset and consider the density centers as the initial clustering centers to increase accuracy. We call this method sampling-based density-center selection (SDCS). To reduce the running time, a new measurement based on competitive-learning theory to update the membership is developed. It accelerates the update speed by different degrees according to the value of the membership degree. Experiments conducted on synthetic interval datasets show the feasibility of EFCM-ID.


Board Games AI

Advanced Methodologies and Technologies in Artificial Intelligence, Computer Simulation, and Human-Computer Interaction - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-5225-7368-5.ch006 ◽

2019 ◽

pp. 68-80 ◽

Cited By ~ 1

Author(s):

Tad Gonsalves

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Hard Problem ◽

Game Playing ◽

Evaluation Function ◽

Board Game ◽

Game Tree ◽

Board Games ◽

Np Hard Problem ◽

Alpha Beta

Board games are the classical application area of AI. This chapter introduces the two most prominent AI approaches used in developing board game agents, the MinMax algorithm and machine learning, and explains their usage in playing games like Tic-Tac-Toe, Checkers, Othello, Chess, Go, etc. against human opponents. The game tree is essentially a directed graph, where the nodes represent the positions in the game and the edges the moves. Even a simple board game like Tic-Tac-Toe (noughts and crosses) has as many as 255,168 leaf nodes in the game tree. Traversing the complete game tree becomes an NP-hard problem. Alpha-beta pruning is used to estimate the short-cuts through the game tree. The board game strategy depends on the evaluation function, which is a heuristic indicating how good the player's current move is in winning the game. Machine learning algorithms try to evolve or learn the agent's game playing strategy based on the evaluation function.
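To illustrate how alpha-beta pruning shortens a game-tree search, here is a minimal sketch on Tic-Tac-Toe. It is not taken from the chapter: the function `search`, its `prune` flag, and the node counter are illustrative choices, and terminal positions are scored exactly (+1, 0, -1) rather than with a heuristic evaluation function.

```python
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board):
    """Return "X" or "O" if that player has completed a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def search(board, player, alpha, beta, prune, counter):
    """Minimax value for X (+1 win, 0 draw, -1 loss); counter[0] counts nodes."""
    counter[0] += 1
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if all(cell is not None for cell in board):
        return 0
    maximizing = player == "X"
    best = -2 if maximizing else 2
    for i in range(9):
        if board[i] is not None:
            continue
        board[i] = player
        value = search(board, "O" if maximizing else "X",
                       alpha, beta, prune, counter)
        board[i] = None
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if prune and alpha >= beta:
            break  # the remaining siblings cannot change the result
    return best


full_nodes, pruned_nodes = [0], [0]
full_value = search([None] * 9, "X", -2, 2, False, full_nodes)
pruned_value = search([None] * 9, "X", -2, 2, True, pruned_nodes)
print(full_value, pruned_value)          # both searches agree on the value
print(full_nodes[0], pruned_nodes[0])    # pruning visits far fewer nodes
```

In this sketch both searches return the same value for the empty board (a draw under perfect play), while the pruned search visits only a fraction of the nodes. For larger games, where terminal positions are out of reach, the exact scores would be replaced by a depth limit and an evaluation function.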


The Parallel Seeding Algorithm for k-Means Problem with Penalties

Asia Pacific Journal of Operational Research ◽

10.1142/s0217595920400059 ◽

2020 ◽

Vol 37 (04) ◽

pp. 2040005

Author(s):

Min Li ◽

Dachuan Xu ◽

Jun Yue ◽

Dongmei Zhang

Keyword(s):

Machine Learning ◽

Computational Geometry ◽

Theoretical Analysis ◽

Hard Problem ◽

Np Hard ◽

Np Hard Problem ◽

Point Set ◽

Data Point

As a classic NP-hard problem in machine learning and computational geometry, the k-means problem aims to partition a data point set into k clusters such that the sum of the squared distance from each point to its nearest center is minimized. The k-means problem with penalties, denoted by k-MPWP, generalizes the k-means problem by allowing some points to pay a penalty instead of being clustered. In this paper, we study the seeding algorithm of k-MPWP and propose a parallel seeding algorithm for k-MPWP along with the corresponding theoretical analysis.
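The objective described above can be made concrete with a small sketch. This shows only the cost function for a fixed set of centers, under the usual reading that each point pays the smaller of its squared distance to the nearest center and its penalty. It is not the seeding algorithm from the paper, and the function names and toy data are made up for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def penalized_cost(points, penalties, centers):
    """Return (total cost, indices of penalized points) for fixed centers.

    Each point either joins its nearest center and pays the squared
    distance, or stays unclustered and pays its own penalty.
    """
    total = 0.0
    penalized = []
    for index, (point, penalty) in enumerate(zip(points, penalties)):
        nearest = min(squared_distance(point, c) for c in centers)
        if penalty < nearest:
            total += penalty
            penalized.append(index)
        else:
            total += nearest
    return total, penalized


# Toy data (hypothetical): two tight groups plus one far outlier.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0), (50.0, 50.0)]
penalties = [5.0] * len(points)
centers = [(0.5, 0.0), (10.5, 10.0)]

cost, penalized = penalized_cost(points, penalties, centers)
print(cost, penalized)  # 6.0 [4]
```

Here the four grouped points each pay 0.25 and the outlier pays its penalty of 5.0 instead of a squared distance above 3000, which shows how penalties keep outliers from dominating the objective.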


Application of Machine Learning in Animal Disease Analysis and Prediction

Current Bioinformatics ◽

10.2174/1574893615999200728195613 ◽

2020 ◽

Vol 15 ◽

Author(s):

Shuwen Zhang ◽

Qiang Su ◽

Qin Chen

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Clustering Algorithm ◽

Principal Component ◽

Support Vector ◽

Animal Disease ◽

Human Beings ◽

Animal Diseases ◽

Disease Analysis

Major animal diseases pose a great threat to animal husbandry and human beings. With the deepening of globalization and the abundance of data resources, the prediction and analysis of animal diseases by using big data are becoming more and more important. The focus of machine learning is to make computers learn from data and use the learned experience to analyze and predict. Firstly, this paper introduces the animal epidemic situation and machine learning. Then it briefly introduces the application of machine learning in animal disease analysis and prediction. Machine learning is mainly divided into supervised learning and unsupervised learning. Supervised learning includes support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. Unsupervised learning includes the expectation-maximization algorithm, principal component analysis, hierarchical clustering algorithms, and maxent. Through the discussion in this paper, readers can gain a clearer concept of machine learning and understand its application prospects in animal diseases.


An IoT-Focused Intrusion Detection System Approach Based on Preprocessing Characterization for Cybersecurity Datasets

Sensors ◽

10.3390/s21020656 ◽

2021 ◽

Vol 21 (2) ◽

pp. 656

Author(s):

Xavier Larriva-Novo ◽

Víctor A. Villagrá ◽

Mario Vega-Barbas ◽

Diego Rivera ◽

Mario Sanz Rodrigo

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

High Performance ◽

Learning Algorithm ◽

Detection System ◽

Machine Learning Algorithms ◽

Statistical Characteristics ◽

Detection Techniques ◽

Traffic Characteristics ◽

Benchmark Datasets

Security in IoT networks is currently mandatory, due to the large amount of data that has to be handled. These systems are vulnerable to several cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques that are as accurate as possible for these scenarios have to be developed. Intrusion detection systems based on machine learning algorithms have already shown high performance in terms of accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. For its evaluation, this research uses two benchmark datasets, namely UGR16 and UNSW-NB15, and one of the most used datasets, KDD99. The preprocessing techniques were evaluated in accordance with scalar and normalization functions. All of these preprocessing models were applied through different sets of characteristics based on a categorization composed of four groups of features: basic connection features, content characteristics, statistical characteristics and, finally, a group composed of traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, the accuracy can be enhanced by up to 45%. The preprocessing of a specific group of characteristics allows for greater accuracy, allowing the machine learning algorithm to correctly classify these parameters related to possible attacks.


Second order Kalman filtering channel estimation and machine learning methods for spectrum sensing in cognitive radio networks

Wireless Networks ◽

10.1007/s11276-021-02627-w ◽

2021 ◽

Author(s):

Olusegun Peter Awe ◽

Daniel Adebowale Babatunde ◽

Sangarapillai Lambotharan ◽

Basil AsSadhan

Keyword(s):

Machine Learning ◽

Kalman Filter ◽

Cognitive Radio ◽

Spectrum Sensing ◽

Cognitive Radio Networks ◽

Clustering Algorithm ◽

Polynomial Regression ◽

Primary User ◽

Radio Networks ◽

Second Order

We address the problem of spectrum sensing in decentralized cognitive radio networks using a parametric machine learning method. In particular, to mitigate sensing performance degradation due to the mobility of the secondary users (SUs) in the presence of scatterers, we propose and investigate a classifier that uses a pilot based second order Kalman filter tracker for estimating the slowly varying channel gain between the primary user (PU) transmitter and the mobile SUs. Using the energy measurements at SU terminals as feature vectors, the algorithm is initialized by a K-means clustering algorithm with two centroids corresponding to the active and inactive status of PU transmitter. Under mobility, the centroid corresponding to the active PU status is adapted according to the estimates of the channels given by the Kalman filter and an adaptive K-means clustering technique is used to make classification decisions on the PU activity. Furthermore, to address the possibility that the SU receiver might experience location dependent co-channel interference, we have proposed a quadratic polynomial regression algorithm for estimating the noise plus interference power in the presence of mobility which can be used for adapting the centroid corresponding to inactive PU status. Simulation results demonstrate the efficacy of the proposed algorithm.
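The initialization step mentioned in the abstract, K-means with two centroids on energy measurements, can be sketched as follows. This is a simplified one-dimensional illustration under stated assumptions, not the authors' method: the energy values are made up, each measurement is a single scalar rather than a feature vector, and the Kalman-filter centroid adaptation and the regression step are omitted. The names `two_means_1d` and `classify` are hypothetical.

```python
def two_means_1d(values, iterations=20):
    """Plain two-centroid K-means on scalars; returns (low, high) centroids."""
    low, high = min(values), max(values)  # start from the extreme values
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_low or not near_high:
            break  # degenerate split: keep the current centroids
        new_low = sum(near_low) / len(near_low)
        new_high = sum(near_high) / len(near_high)
        if (new_low, new_high) == (low, high):
            break  # assignments no longer change
        low, high = new_low, new_high
    return low, high


def classify(value, low, high):
    """Label a measurement by its nearest centroid."""
    return "active" if abs(value - high) < abs(value - low) else "inactive"


# Made-up energy readings: low values for noise only, high for signal plus noise.
energies = [1.0, 1.2, 0.9, 1.1, 4.8, 5.1, 5.0, 4.9]
inactive_centroid, active_centroid = two_means_1d(energies)

print(inactive_centroid, active_centroid)
print(classify(4.0, inactive_centroid, active_centroid))
print(classify(1.5, inactive_centroid, active_centroid))
```

On this toy data the centroids settle near 1.05 and 4.95, so a reading of 4.0 is labeled active and 1.5 inactive. In the setting the abstract describes, the active centroid would then be shifted as the channel estimate changes.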


A Survey on Causal Inference

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3444944 ◽

2021 ◽

Vol 15 (5) ◽

pp. 1-46

Author(s):

Liuyi Yao ◽

Zhixuan Chu ◽

Sheng Li ◽

Yaliang Li ◽

Jing Gao ◽

...

Keyword(s):

Machine Learning ◽

Causal Inference ◽

Observational Data ◽

Causal Effect ◽

Research Direction ◽

Estimation Methods ◽

Potential Outcome ◽

Outcome Framework ◽

Benchmark Datasets ◽

Inference Methods

Causal inference has been a critical research topic across many domains, such as statistics, computer science, education, public policy, and economics, for decades. Nowadays, estimating causal effect from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. With the rapid development of machine learning, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including applications in advertising, recommendation, medicine, and so on. Moreover, the commonly used benchmark datasets and the open-source codes are summarized, which helps researchers and practitioners explore, evaluate and apply the causal inference methods.
