Identifying strong lenses with unsupervised machine learning using convolutional autoencoder

Ting-Yun Cheng; Nan Li; Christopher J Conselice; Alfonso Aragón-Salamanca; Simon Dye; Robert B Metcalf

doi:10.1093/mnras/staa1015

Monthly Notices of the Royal Astronomical Society ◽

2020 ◽

Vol 494 (3) ◽

pp. 3750-3765 ◽

Cited By ~ 4

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Binary Classification ◽

Gaussian Mixture ◽

Gravitational Lenses ◽

Imaging Data ◽

Training Set ◽

Unsupervised Machine Learning ◽

Starting Point ◽

Convolutional Autoencoder

ABSTRACT In this paper, we develop a new unsupervised machine learning technique comprising a feature extractor (a convolutional autoencoder) and a clustering algorithm (a Bayesian Gaussian mixture model). We apply this technique to simulated space-based visual-band imaging data for the Euclid Space Telescope, taken from the strong gravitational lens finding challenge. Our technique captures a variety of lensing features, such as Einstein rings with different radii and distorted arc structures, without using predefined labels. After the clustering process, we obtain several classification clusters separated by the different visual features seen in the images. Our method successfully picks up ∼63 per cent of the lensing images among all lenses in the training set. With the assumed probability proposed in this study, this technique reaches an accuracy of 77.25 ± 0.48 per cent in binary classification using the training set. Additionally, our unsupervised clustering process can be used as a preliminary classification for future lens surveys, to select targets efficiently and to speed up the labelling process. As a starting point for astronomical applications of this technique, we not only explore its application to gravitationally lensed systems, but also discuss its limitations and potential future uses.
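
The abstract does not spell out how the clusters are turned into a binary lens/non-lens decision, so the following is only a minimal sketch of one plausible approach, not the paper's procedure. It assumes each cluster is given a lens probability equal to the fraction of known lenses it contains, and that every image inherits the decision of its cluster. The function names, the 0.5 threshold and the toy labels are all hypothetical.

```python
from collections import Counter

def cluster_lens_fractions(cluster_ids, is_lens):
    """Fraction of known lenses in each cluster (assumed stand-in for a lens probability)."""
    totals = Counter(cluster_ids)
    lenses = Counter(c for c, flag in zip(cluster_ids, is_lens) if flag)
    return {c: lenses[c] / totals[c] for c in totals}

def classify(cluster_ids, fractions, threshold=0.5):
    """Label an image as a lens when its cluster's lens fraction reaches the threshold."""
    return [fractions[c] >= threshold for c in cluster_ids]

def accuracy(predicted, truth):
    """Share of images whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Toy data (hypothetical): cluster assignment and true label for eight images.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
is_lens = [True, True, False, False, False, False, True, False]

fractions = cluster_lens_fractions(cluster_ids, is_lens)
predicted = classify(cluster_ids, fractions)
print(fractions)
print(accuracy(predicted, is_lens))
```

In this toy case cluster 1 contains no lenses and is rejected outright, while the mixed clusters 0 and 2 are accepted as a whole, which is where the misclassified images come from.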

A Knowledge Extraction Pipeline between Supervised and Unsupervised Machine Learning Using Gaussian Mixture Models for Anomaly Detection

Journal of Computing Science and Engineering ◽

10.5626/jcse.2021.15.1.1 ◽

2021 ◽

Vol 15 (1) ◽

pp. 1-17

Author(s):

Reda Chefira ◽

Said Rakrak

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Mixture Models ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Knowledge Extraction ◽

Unsupervised Machine Learning

A Novel Semi-Supervised Fuzzy C-Means Clustering Algorithm Using Multiple Fuzzification Coefficients

Algorithms ◽

10.3390/a14090258 ◽

2021 ◽

Vol 14 (9) ◽

pp. 258

Author(s):

Tran Dinh Khang ◽

Manh-Kien Tran ◽

Michael Fowler

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Machine Learning Techniques ◽

Unsupervised Machine Learning ◽

Practical Applications ◽

Fuzzy C Means ◽

Learning Techniques ◽

Fuzzy C Means Clustering ◽

Data Points ◽

Data Elements

Clustering is an unsupervised machine learning method with many practical applications that has attracted extensive research interest. It divides data elements into clusters such that elements in the same cluster are similar. Clustering belongs to the group of unsupervised machine learning techniques, meaning that no information about the labels of the elements is available. However, when some information about the data points is known in advance, it is beneficial to use a semi-supervised algorithm. Among the many clustering techniques available, fuzzy C-means clustering (FCM) is a common one. To make the FCM algorithm a semi-supervised method, it was proposed in the literature to use an auxiliary matrix to adjust the membership grades of the elements, forcing them into certain clusters during the computation. In this study, instead of using the auxiliary matrix, we propose to use multiple fuzzification coefficients to implement the semi-supervision component. After deriving the proposed semi-supervised fuzzy C-means clustering algorithm with multiple fuzzification coefficients (sSMC-FCM), we demonstrate the convergence of the algorithm and validate the efficiency of the method through a numerical example.
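
For orientation, here is a minimal sketch of the standard, fully unsupervised FCM iteration with a single fuzzification coefficient `m`. It is the baseline that the abstract builds on, not the proposed sSMC-FCM algorithm, whose update rules are not given here. The one-dimensional data, the starting centers and the function names are hypothetical.

```python
def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the membership of point i in cluster j (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if min(dists) == 0.0:
            # The point sits exactly on one or more centers: share membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            row = [h / sum(hits) for h in hits]
        else:
            # Standard FCM rule: u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)).
            row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships

def fcm_centers(points, memberships, m=2.0):
    """Update each center as the mean of the points weighted by u^m."""
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centers

def run_fcm(points, centers, m=2.0, n_iter=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(n_iter):
        memberships = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, memberships, m)
    return centers, fcm_memberships(points, centers, m)

# Toy one-dimensional data with two obvious groups (hypothetical values).
points = [0.0, 1.0, 9.0, 10.0]
centers, memberships = run_fcm(points, centers=[2.0, 7.0])
print(centers)
print(memberships)
```

A larger `m` makes the memberships softer. A semi-supervised variant would additionally use whatever prior knowledge is available about individual points, which this sketch does not attempt.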

Building an Effective Intrusion Detection System Using the Modified Density Peak Clustering Algorithm and Deep Belief Networks

Applied Sciences ◽

10.3390/app9020238 ◽

2019 ◽

Vol 9 (2) ◽

pp. 238 ◽

Cited By ~ 21

Author(s):

Yanqing Yang ◽

Kangfeng Zheng ◽

Chunhua Wu ◽

Xinxin Niu ◽

Yixian Yang

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Clustering Algorithm ◽

Fuzzy Membership ◽

Belief Networks ◽

Deep Belief Networks ◽

Training Set ◽

Density Peak ◽

Proposed Model ◽

Density Peak Clustering

Machine learning plays an important role in building intrusion detection systems. However, as data volume and dimensionality increase, the ability of shallow machine learning is becoming more limited. In this paper, we propose a fuzzy aggregation approach using the modified density peak clustering algorithm (MDPCA) and deep belief networks (DBNs). To reduce the size of the training set and the imbalance of the samples, MDPCA is used to divide the training set into several subsets with similar sets of attributes. Each subset is used to train its own sub-DBN classifier. These sub-DBN classifiers can learn and explore high-level abstract features, automatically reduce data dimensions, and perform classification well. According to the nearest neighbor criterion, the fuzzy membership weights of each test sample in each sub-DBN classifier are calculated. The outputs of all sub-DBN classifiers are aggregated based on the fuzzy membership weights. Experimental results on the NSL-KDD and UNSW-NB15 datasets show that our proposed model has higher overall accuracy, recall, precision and F1-score than other well-known classification methods. Furthermore, the proposed model achieves better performance in terms of accuracy, detection rate and false positive rate than the state-of-the-art intrusion detection methods.

GMMR: A Gaussian mixture model based unsupervised machine learning approach for optimal routing in opportunistic IoT networks

Computer Communications ◽

10.1016/j.comcom.2018.12.001 ◽

2019 ◽

Vol 134 ◽

pp. 138-148 ◽

Cited By ~ 15

Author(s):

Vidushi Vashishth ◽

Anshuman Chhabra ◽

Deepak Kumar Sharma

Keyword(s):

Machine Learning ◽

Gaussian Mixture Model ◽

Mixture Model ◽

Gaussian Mixture ◽

Optimal Routing ◽

Learning Approach ◽

Unsupervised Machine Learning ◽

Model Based ◽

Machine Learning Approach

Automatic microseismic event picking via unsupervised machine learning

Geophysical Journal International ◽

10.1093/gji/ggaa186 ◽

2020 ◽

Vol 222 (3) ◽

pp. 1750-1764 ◽

Cited By ~ 1

Author(s):

Yangkang Chen

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Learning Algorithm ◽

State Of The Art ◽

The State ◽

Training Data ◽

Supervised Machine Learning ◽

Machine Learning Algorithm ◽

Unsupervised Machine Learning ◽

Earthquake Data

SUMMARY Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used arrival picking algorithms based on the short-term-average long-term-average ratio (STA/LTA) are sensitive to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example by removing a sufficient amount of noise, and then analysed by arrival pickers. To overcome the noise issue in arrival picking for weak microseismic or earthquake events, I leverage machine learning techniques to help recognize seismic waveforms in microseismic or earthquake data. Because supervised machine learning algorithms depend on large volumes of well-designed training data, I utilize an unsupervised machine learning algorithm to cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for this purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity shows that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.
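
As context for the comparison, the STA/LTA baseline mentioned in the summary can be sketched as below. This is a simplified textbook form that uses squared amplitude as the characteristic function and trailing windows for both averages. It is not the author's clustering-based picker, and the window lengths, threshold and synthetic trace are hypothetical.

```python
def sta_lta(trace, n_sta, n_lta):
    """Ratio of short-term to long-term average energy, using windows that end at sample i."""
    energy = [x * x for x in trace]
    ratios = [0.0] * len(trace)
    # The ratio is left at 0 until a full long window is available.
    for i in range(n_lta - 1, len(trace)):
        sta = sum(energy[i - n_sta + 1 : i + 1]) / n_sta
        lta = sum(energy[i - n_lta + 1 : i + 1]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios

def first_pick(ratios, threshold):
    """Index of the first sample whose ratio exceeds the threshold, or None."""
    for i, r in enumerate(ratios):
        if r > threshold:
            return i
    return None

# Synthetic trace (hypothetical): 20 samples of weak noise, then a stronger arrival.
trace = [0.1, -0.1] * 10 + [1.0, -1.0, 0.8, -0.8, 0.5, -0.5]
ratios = sta_lta(trace, n_sta=2, n_lta=10)
pick = first_pick(ratios, threshold=3.0)
print(pick)
```

Here the arrival is ten times larger than the noise, so the ratio jumps cleanly at the onset. As the noise level approaches the signal level the ratio flattens, which is the sensitivity the summary refers to.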

Using k-means clustering in international location decision

Journal of Global Operations and Strategic Sourcing ◽

10.1108/jgoss-11-2017-0056 ◽

2018 ◽

Vol 11 (3) ◽

pp. 274-300

Author(s):

Waqas Khalid ◽

Zaza Nadja Lee Herbert-Hansen

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Decision Makers ◽

Location Decision ◽

Unsupervised Machine Learning ◽

Content Type ◽

Facility Locations ◽

Decision Support Framework ◽

Dynamic Location ◽

New Framework

Purpose: This paper investigates the application of unsupervised machine learning to the international location decision (ILD). It addresses the need for a fast, quantitative and dynamic location decision framework.

Design/methodology/approach: An unsupervised machine learning technique, k-means clustering, is used to carry out the analysis. In total, 24 different indicators for 94 countries, categorized into five groups, are used in the analysis. After the clustering, the clusters are compared and scored to select the feasible countries.

Findings: A new framework based on k-means clustering is developed for use in ILD. This method provides a quantitative output without personal subjectivity. Indicators can easily be added or removed according to the preferences of the decision-makers. Unsupervised machine learning, in the form of k-means clustering, is therefore found to be a fast and flexible decision support framework for ILD.

Research limitations/implications: Limitations include the generality of the selected indicators and of the clustering algorithm used. The use of other methods and parameters may lead to different results.

Originality/value: The framework developed through the research is intended to assist decision-makers in deciding on facility locations. The framework can be used in international and national domains. It provides a quantitative, fast and flexible way to shortlist potential locations. Other methods can then be used to decide on the specific location.
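
The clustering step can be sketched with a plain k-means (Lloyd's algorithm) implementation, shown below under stated assumptions. The two indicator values per country are invented, the first `k` rows are used as starting centers purely to keep the example deterministic, and the scoring of clusters described in the abstract is not reproduced.

```python
import math

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: assign each point to its nearest center, then recompute centers."""
    centers = [list(p) for p in points[:k]]  # deterministic start (an assumption)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the old center if a cluster empties
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return labels, centers

# Hypothetical countries described by two indicators already scaled to [0, 1].
indicators = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
labels, centers = kmeans(indicators, k=2)
print(labels)
print(centers)
```

Because k-means relies on Euclidean distance, indicators measured on very different scales would normally be standardized first. Whether and how the paper does this is not stated in the abstract.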

Unsupervised Machine Learning on Domes in the Lunar Gardner Region: Implications for Dome Classification and Local Magmatic Activities on the Moon

Remote Sensing ◽

10.3390/rs13050845 ◽

2021 ◽

Vol 13 (5) ◽

pp. 845

Author(s):

Yuchao Chen ◽

Qian Huang ◽

Jiannan Zhao ◽

Xiangyun Hu

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Formation Processes ◽

The Moon ◽

Unsupervised Machine Learning ◽

Formation Stage ◽

Fuzzy C Means Clustering ◽

Effusion Rates ◽

The Relationship

Lunar volcanic domes are essential windows into the local magmatic activities on the Moon. Classification of domes is a useful way to figure out the relationship between dome appearances and formation processes. Previous studies of dome classification were carried out manually or semi-automatically, either qualitatively or quantitatively. We applied an unsupervised machine-learning method to domes that are annularly or radially distributed around Gardner, a unique central-vent volcano located in the northern part of the Mare Tranquillitatis. High-resolution lunar imaging and spectral data were used to extract morphometric and spectral properties of domes in both the Gardner volcano and its surrounding region in the Mare Tranquillitatis. An integrated robust Fuzzy C-Means clustering algorithm was performed on 120 combinations of five morphometric (diameter, area, height, surface volume, and slope) and two elemental features (FeO and TiO2 contents) to find the optimum combination. Rheological features of domes and their dike formation parameters were calculated for dome-forming lava explanations. Results show that diameter, area, surface volume, and slope are the selected optimum features for dome clustering. The 54 studied domes can be grouped into four dome clusters (DC1 to DC4). DC1 domes are relatively small, steep, and close to the Gardner volcano, with forming lavas of high viscosities and low effusion rates, representing the latest Eratosthenian dome formation stage of the Gardner volcano. Domes of DC2 to DC4 are relatively large, smooth, and widely distributed, with forming lavas of low viscosities and high effusion rates, representing magmatic activities varying from Imbrian to Eratosthenian in the northern Mare Tranquillitatis. The integrated algorithm provides a new and independent way to figure out the representative properties of lunar domes and helps us further clarify the relationship between dome clusters and local magma activities of the Moon.
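
The abstract does not say how the 120 feature combinations are formed. One reading that reproduces the number, offered here only as an assumption, is every subset of at least two of the seven listed features. The short sketch below enumerates those subsets.

```python
from itertools import combinations

# The seven features named in the abstract.
features = ["diameter", "area", "height", "surface volume", "slope", "FeO", "TiO2"]

# Assumed rule: every subset containing at least two features.
subsets = [
    combo
    for size in range(2, len(features) + 1)
    for combo in combinations(features, size)
]

print(len(subsets))  # 120 = 2**7 - 1 (empty set) - 7 (single features)
```

Under this reading, the reported optimum (diameter, area, surface volume and slope) is one of the 35 four-feature subsets.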

Lack of Global Convergence and the Formation of Multiple Welfare Clubs across Countries: An Unsupervised Machine Learning Approach

Economies ◽

10.3390/economies7030074 ◽

2019 ◽

Vol 7 (3) ◽

pp. 74

Author(s):

Carlos Mendez

Keyword(s):

Machine Learning ◽

Local Convergence ◽

Clustering Algorithm ◽

Unsupervised Machine Learning ◽

Long Run ◽

Convergence Clubs ◽

Sigma Convergence ◽

Strong Notion ◽

Beta Convergence ◽

Beyond Gdp

The cross-country convergence hypothesis is one of the central topics of long-run macroeconomics. This paper revisits this hypothesis in a context beyond GDP. It uses a novel welfare index that incorporates measures of consumption, leisure, life expectancy, and inequality. Based on a sample of 128 countries over the 1980–2007 period, the lack of global sigma and beta convergence is first documented. Next, the paper incorporates some recent developments from the unsupervised machine learning literature to evaluate the existence of local convergence. In particular, the application of a distribution-based clustering algorithm suggests the formation of three local convergence clubs. Under this classification, beta convergence is recovered for each club. However, only the core members of the richest club appear to be reducing their welfare differences in a way that is consistent with the strong notion of sigma convergence. Overall, these results re-emphasize the finding that beta convergence is necessary, but not sufficient for sigma convergence, even within convergence clubs and in a context beyond GDP.
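
The two notions of convergence can be illustrated with a small sketch. It assumes the common textbook definitions: beta convergence as a negative slope when regressing average log growth on the initial log level, and sigma convergence as a fall in the cross-country dispersion of log levels. The welfare values are invented and the paper's index and club-detection method are not reproduced.

```python
import math
import statistics

def beta_slope(initial, final, years):
    """OLS slope of average annual log growth on the initial log level."""
    log_init = [math.log(v) for v in initial]
    growth = [(math.log(f) - math.log(i)) / years for i, f in zip(initial, final)]
    mean_x = statistics.fmean(log_init)
    mean_y = statistics.fmean(growth)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_init, growth))
    var = sum((x - mean_x) ** 2 for x in log_init)
    return cov / var

def sigma_dispersion(values):
    """Cross-sectional standard deviation of log levels."""
    return statistics.pstdev([math.log(v) for v in values])

YEARS = 27  # length of the 1980-2007 sample period

# Case 1 (hypothetical): poorer countries grow faster and the gaps narrow.
start_a, end_a = [1.0, 2.0, 4.0], [2.0, 3.0, 5.0]
print(beta_slope(start_a, end_a, YEARS) < 0)                      # beta convergence
print(sigma_dispersion(end_a) < sigma_dispersion(start_a))        # sigma convergence

# Case 2 (hypothetical): the poorer country overtakes so strongly that dispersion rises.
start_b, end_b = [1.0, 4.0], [8.0, 1.0]
print(beta_slope(start_b, end_b, YEARS) < 0)                      # beta convergence holds
print(sigma_dispersion(end_b) > sigma_dispersion(start_b))        # sigma convergence fails
```

The second case shows why a negative beta slope alone does not guarantee shrinking dispersion.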

Contextual classification for smart machining based on unsupervised machine learning by Gaussian mixture model

International Journal of Computer Integrated Manufacturing ◽

10.1080/0951192x.2020.1775302 ◽

2020 ◽

Vol 33 (10-11) ◽

pp. 1042-1054 ◽

Cited By ~ 1

Author(s):

Zhiqiang Wang ◽

Mathieu Ritou ◽

Catherine Da Cunha ◽

Benoît Furet

Keyword(s):

Machine Learning ◽

Gaussian Mixture Model ◽

Mixture Model ◽

Gaussian Mixture ◽

Unsupervised Machine Learning ◽

Smart Machining

Comparison of Multi-class and Binary Classification Machine Learning Models in Identifying Strong Gravitational Lenses

Publications of the Astronomical Society of the Pacific ◽

10.1088/1538-3873/ab747b ◽

2020 ◽

Vol 132 (1010) ◽

pp. 044501 ◽

Cited By ~ 1

Author(s):

Hossen Teimoorinia ◽

Robert D. Toyonaga ◽

Sebastien Fabbro ◽

Connor Bottrell

Keyword(s):

Machine Learning ◽

Binary Classification ◽

Gravitational Lenses ◽

Learning Models ◽

Machine Learning Models
