An investigation on the factors affecting machine learning classifications in gamma-ray astronomy

Shengda Luo; Alex P Leung; C Y Hui; K L Li

doi:10.1093/mnras/staa166


Monthly Notices of the Royal Astronomical Society; doi:10.1093/mnras/staa166; 2020; Vol 492 (4); pp. 5377-5390

Cited By ~ 3

Author(s): Shengda Luo; Alex P Leung; C Y Hui; K L Li

Keyword(s): Machine Learning; Gamma Ray; Small Sample Size; Classification Performance; Small Sample; Classification Model; Machine Learning Techniques; Large Area; Actual Performance; Statistical Fluctuations

ABSTRACT We have investigated a number of factors that can have significant impacts on the classification performance of gamma-ray sources detected by the Fermi Large Area Telescope (LAT) with machine learning techniques. We show that a framework of automatic feature selection can construct a simple model with a small set of features that yields better performance than previous results. Secondly, because of the small sample size of the training/test sets of certain classes in gamma-ray astronomy, nested re-sampling and cross-validations are suggested for quantifying the statistical fluctuations of the quoted accuracy. We have also constructed a test set by cross-matching the identified active galactic nuclei (AGNs) and the pulsars (PSRs) in the Fermi-LAT 8-yr point source catalogue (4FGL) with those unidentified sources in the previous 3rd Fermi-LAT Source Catalog (3FGL). Using this cross-matched set, we show that some features used for building classification models with the identified sources can suffer from the problem of covariate shift, which can be a result of various observational effects. This can hamper the actual performance when one applies such a model in classifying unidentified sources. Using our framework, both AGN/PSR and young pulsar (YNG)/millisecond pulsar (MSP) classifiers are automatically updated with the new features and the enlarged training samples in the 4FGL catalogue incorporated. Using a two-layer model with these updated classifiers, we have selected 20 promising MSP candidates with confidence scores > 98 per cent from the unidentified sources in the 4FGL catalogue that can provide inputs for a multiwavelength identification campaign.
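The covariate-shift concern in this abstract can be probed one feature at a time by comparing its distribution in the labelled training set with its distribution in the sources to be classified. The following is a minimal sketch, assuming a two-sample Kolmogorov-Smirnov statistic as the comparison; the abstract does not say which diagnostic the authors used, and the function name `ks_statistic` and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(train_values, target_values):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of one feature in two samples
    (0 = identical distributions, 1 = completely separated)."""
    a = sorted(train_values)
    b = sorted(target_values)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical values of one feature in the training set and in the
# set of sources the classifier will be applied to (assumed data).
train_feature = [0.1, 0.2, 0.3, 0.4, 0.5]
shifted_feature = [0.6, 0.7, 0.8, 0.9, 1.0]
print(ks_statistic(train_feature, train_feature))
print(ks_statistic(train_feature, shifted_feature))
```

A feature whose statistic is close to 1 is distributed very differently in the two sets, so a classifier that relies on it may not transfer well to the unidentified sources.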


Optimizing neural network techniques in classifying Fermi-LAT gamma-ray sources

Monthly Notices of the Royal Astronomical Society; doi:10.1093/mnras/stz2920; 2019; Vol 490 (4); pp. 4770-4777

Cited By ~ 3

Author(s): M Kovačević; G Chiaro; S Cutini; G Tosti

Keyword(s): Neural Network; Machine Learning; Gamma Ray; Classification Performance; Previous Method; Machine Learning Method; Learning Method; Large Area; γ Ray; Automatic Technique

ABSTRACT Machine learning is an automatic technique that is revolutionizing scientific research, with innovative applications and wide use in astrophysics. The aim of this study was to develop an optimized version of an Artificial Neural Network machine learning method for classifying blazar candidates of uncertain type detected by the Fermi Large Area Telescope γ-ray instrument. The final result of this study increased the classification performance by about 80 per cent with respect to the previous method, leaving only 15 unclassified blazars out of 573 blazar candidates of uncertain type listed in the LAT 4-year Source Catalog.


A Novel Virtual Sample Generation Method to Overcome the Small Sample Size Problem in Computer Aided Medical Diagnosing

Algorithms; doi:10.3390/a12080160; 2019; Vol 12 (8); pp. 160

Cited By ~ 3

Author(s): Mohammad Wedyan; Alessandro Crippa; Adel Al-Jumaily

Keyword(s): Sample Size; Small Sample Size; Classification Performance; Small Sample; Classification Model; Clinical Samples; Small Sample Size Problem; Virtual Sample; Size Problem; Virtual Samples

Deep neural networks are successful learning tools for building nonlinear models. However, a robust deep learning-based classification model needs a large dataset. Indeed, these models are often unstable when they use small datasets. To solve this issue, which is particularly critical in light of the possible clinical applications of these predictive models, researchers have developed approaches such as virtual sample generation. Virtual sample generation significantly improves learning and classification performance when working with small samples. The main objective of this study is to evaluate the ability of the proposed virtual sample generation to overcome the small sample size problem, which is a feature of the automated detection of a neurodevelopmental disorder, namely autism spectrum disorder. Results show that our method enhances diagnostic accuracy from 84% to 95% using virtual samples generated on the basis of five actual clinical samples. The present findings show the feasibility of using the proposed technique to improve classification performance even in cases of clinical samples of limited size. Accounting for concerns in relation to small sample sizes, our technique represents a meaningful step forward in terms of pattern recognition methodology, particularly when it is applied to diagnostic classifications of neurodevelopmental disorders. In addition, the proposed technique has been tested with other available benchmark datasets. The experimental outcomes showed that the accuracy of the classification that used virtual samples was superior to that of the classification that used the original training data without virtual samples.


Binary Spectrum Feature for Improved Classifier Performance

doi:10.36227/techrxiv.12993122; 2020

Author(s): Nalika Ulapane; Karthick Thiyagarajan; Sarath Kodagoda

Keyword(s): Machine Learning; Classification Performance; Feature Reduction; Sensor Data; Machine Learning Techniques; Support Vector; SVM Classifier; Monitoring Task; Classifier Performance; Spectrum Feature

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.


Transformer Oil Quality Assessment Using Random Forest with Feature Engineering

Energies; doi:10.3390/en14071809; 2021; Vol 14 (7); pp. 1809

Author(s): Mohammed El Amine Senoussaoui; Mostefa Brahami; Issouf Fofana

Keyword(s): Machine Learning; Random Forest; Oil Quality; Principal Component; Condition Assessment; Classification Performance; Transformer Oil; Classification Model; Insulation Degradation; Transformer Oils

Machine learning is widely used as a panacea in many engineering applications, including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: the oils that can be maintained in service, the oils that should be reconditioned or filtered, the oils that should be reclaimed, and the oils that must be discarded. Of the two algorithms, random forest exhibited better performance and high accuracy with only a small amount of data. Good performance was achieved through not only the application of the proposed algorithm but also the approach to data preprocessing. Before being fed to the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were then transformed again by principal component analysis and passed through the CFsSubset filter once more. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the reduction in the number of datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.
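Correlation-based feature selection, as used in the filtering step above, scores a candidate feature subset by rewarding correlation with the class and penalising correlation among the features themselves. The following is a minimal sketch of that merit score, assuming absolute Pearson correlation as the correlation measure and numeric class labels; the measure the authors' tool actually used is not stated in the abstract, and the names `pearson` and `cfs_merit` and the data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / sqrt(var_x * var_y)


def cfs_merit(features, target):
    """CFS merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff),
    where r_cf is the mean feature-class correlation and r_ff is the
    mean feature-feature correlation (absolute values, an assumption)."""
    k = len(features)
    if k == 0:
        raise ValueError("need at least one feature column")
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(f, g)) for f, g in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical data: one informative column and an exact copy of it.
labels = [0, 0, 1, 1]
informative = [1.0, 1.2, 3.0, 3.1]
duplicate = list(informative)
print(cfs_merit([informative], labels))
print(cfs_merit([informative, duplicate], labels))
```

Adding the duplicate column leaves the merit unchanged, which illustrates why a redundant feature gains nothing under this criterion.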


A Leaf Disease Classification Model in Betel Vine Using Machine Learning Techniques

2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST); doi:10.1109/icrest51555.2021.9331142; 2021

Author(s): Md Zahid Hasan; Nahid Zeba; Md. Abdul Malek; Sanjida Sultana Reya

Keyword(s): Machine Learning; Disease Classification; Classification Model; Machine Learning Techniques; Leaf Disease; Learning Techniques


Model-independent calibrations of gamma-ray bursts using machine learning

Monthly Notices of the Royal Astronomical Society; doi:10.1093/mnras/stab795; 2021; Vol 503 (3); pp. 4581-4600

Author(s): Orlando Luongo; Marco Muccino

Keyword(s): Machine Learning; Dark Energy; Gamma Ray; Gamma Ray Bursts; Bayesian Regression; Machine Learning Techniques; Acoustic Oscillations; Chi Square; Model Independent; Chi Square Analysis

ABSTRACT We alleviate the circularity problem, whereby gamma-ray bursts are not perfect distance indicators, by means of a new model-independent technique based on Bézier polynomials. We use the well-consolidated Amati and Combo correlations. We consider improved calibrated catalogues of mock data from differential Hubble rate points. To get our mock data, we use those machine learning scenarios that adapt well to gamma-ray bursts, discussing in detail how we handle small amounts of data from our machine learning techniques. We explore only three machine learning treatments, i.e. linear regression, neural network, and random forest, emphasizing quantitative statistical motivations behind these choices. Our calibration strategy consists in taking Hubble’s data, creating the mock compilation using machine learning and calibrating the aforementioned correlations through Bézier polynomials with a standard chi-square analysis first and then by means of a hierarchical Bayesian regression procedure. The corresponding catalogues, built up from the two correlations, have been used to constrain dark energy scenarios. We thus employ Markov chain Monte Carlo numerical analyses based on the most recent Pantheon supernova data, baryonic acoustic oscillations, and our gamma-ray burst data. We test the standard ΛCDM model and the Chevallier–Polarski–Linder parametrization. We discuss the recent H0 tension in view of our results. Moreover, we highlight a further severe tension over Ωm and we conclude that a slightly evolving dark energy model is possible.
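A Bézier polynomial of order n is a weighted sum of Bernstein basis functions, which makes it a flexible, model-independent curve whose shape is fixed entirely by its coefficients. The following is a minimal sketch of evaluating such a curve, assuming the argument has already been rescaled to the interval [0, 1]; the function name `bezier` and the coefficient values are made up for illustration and are not taken from the paper.

```python
from math import comb


def bezier(coeffs, x):
    """Evaluate sum_d coeffs[d] * C(n, d) * x**d * (1 - x)**(n - d)
    with n = len(coeffs) - 1, for x in [0, 1]."""
    if not coeffs:
        raise ValueError("need at least one coefficient")
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be rescaled to the interval [0, 1]")
    n = len(coeffs) - 1
    return sum(
        c * comb(n, d) * x ** d * (1 - x) ** (n - d)
        for d, c in enumerate(coeffs)
    )


# Hypothetical order-2 coefficients; in a calibration they would be
# the free parameters fitted to the data.
coefficients = [70.0, 100.0, 150.0]
for x in (0.0, 0.5, 1.0):
    print(x, bezier(coefficients, x))
```

The curve passes through the first coefficient at x = 0 and the last at x = 1, while the intermediate coefficients pull the curve towards them without being interpolated.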


Application of deep and machine learning techniques for multi-label classification performance on psychotic disorder diseases

Informatics in Medicine Unlocked; doi:10.1016/j.imu.2021.100545; 2021; Vol 23; pp. 100545

Author(s): Israel Elujide; Stephen G. Fashoto; Bunmi Fashoto; Elliot Mbunge; Sakinat O. Folorunso; ...

Keyword(s): Machine Learning; Psychotic Disorder; Classification Performance; Machine Learning Techniques; Learning Techniques


Classification of multiwavelength transients with Machine Learning

Monthly Notices of the Royal Astronomical Society; doi:10.1093/mnras/staa3873; 2020

Author(s): K Sooknunan; M Lochner; Bruce A Bassett; H V Peiris; R Fender; ...

Keyword(s): Machine Learning; Small Sample; Light Curves; Machine Learning Techniques; Optical Data; Test Time; Test Accuracy; Training Set; The Impact

ABSTRACT With the advent of powerful telescopes such as the Square Kilometer Array and the Vera C. Rubin Observatory, we are entering an era of multiwavelength transient astronomy that will lead to a dramatic increase in data volume. Machine learning techniques are well suited to address this data challenge and rapidly classify newly detected transients. We present a multiwavelength classification algorithm consisting of three steps: (1) interpolation and augmentation of the data using Gaussian processes; (2) feature extraction using wavelets; (3) classification with random forests. Augmentation provides improved performance at test time by balancing the classes and adding diversity into the training set. In the first application of machine learning to the classification of real radio transient data, we apply our technique to the Green Bank Interferometer and other radio light curves. We find we are able to accurately classify most of the eleven classes of radio variables and transients after just eight hours of observations, achieving an overall test accuracy of 78%. We fully investigate the impact of the small sample size of 82 publicly available light curves and use data augmentation techniques to mitigate the effect. We also show that, on a significantly larger simulated representative training set, the algorithm achieves an overall accuracy of 97%, illustrating that the method is likely to provide excellent performance on future surveys. Finally, we demonstrate the effectiveness of simultaneous multiwavelength observations by showing how incorporating just one optical data point into the analysis improves the accuracy of the worst performing class by 19%.
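Step (2) of the algorithm above turns a regularly sampled light curve into wavelet coefficients that a classifier can use as features. The following is a minimal sketch of that idea, assuming the Haar wavelet and an input length that is a power of two; the abstract does not name the wavelet family the authors used, Haar is chosen here only because it is the simplest, and `haar_features` and the flux values are hypothetical.

```python
from math import sqrt


def haar_features(signal):
    """Full orthonormal Haar decomposition of a signal whose length is
    a power of two. Returns the final approximation coefficient
    followed by the detail coefficients, coarsest level first."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Differences capture variability at this scale, sums carry
        # the smoothed curve on to the next, coarser level.
        level_detail = [(a - b) / sqrt(2) for a, b in pairs]
        approx = [(a + b) / sqrt(2) for a, b in pairs]
        details = level_detail + details
    return approx + details


# Hypothetical flux values on a regular time grid, e.g. after the
# light curve has been interpolated (assumed data).
light_curve = [4.0, 2.0, 5.0, 5.0]
print(haar_features(light_curve))
```

The resulting vector has the same length as the input and could be passed, one row per light curve, to a random forest or any other classifier.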


Identification with machine learning techniques of a classification model for the degree of damage to rubber-textile conveyor belts with the aim to achieve sustainability

Engineering Failure Analysis; doi:10.1016/j.engfailanal.2021.105564; 2021; pp. 105564

Author(s): Andrejiova Miriam; Anna Grincova; Daniela Marasova

Keyword(s): Machine Learning; Classification Model; Machine Learning Techniques; Learning Techniques; Conveyor Belts; Degree Of Damage


A Macrocause Classification Model for Violent Crime Analysis in the Field of Public Safety Based on Machine Learning Techniques

doi:10.1109/isc253183.2021.9562842; 2021

Author(s): Ramiro de Vasconcelos dos Santos Junior; Joao Vitor Venceslau Coelho; Nelio Alessandro Azevedo Cacho

Keyword(s): Machine Learning; Violent Crime; Public Safety; Classification Model; Machine Learning Techniques; Crime Analysis; Learning Techniques
