An investigation on the factors affecting machine learning classifications in gamma-ray astronomy

2020 ◽  
Vol 492 (4) ◽  
pp. 5377-5390 ◽  
Author(s):  
Shengda Luo ◽  
Alex P Leung ◽  
C Y Hui ◽  
K L Li

ABSTRACT We have investigated a number of factors that can significantly affect the performance of machine learning techniques in classifying gamma-ray sources detected by the Fermi Large Area Telescope (LAT). First, we show that a framework of automatic feature selection can construct a simple model with a small set of features that performs better than previous results. Secondly, because the training/test sets of certain gamma-ray source classes are small, nested re-sampling and cross-validation are suggested for quantifying the statistical fluctuations of the quoted accuracy. We have also constructed a test set by cross-matching the identified active galactic nuclei (AGNs) and pulsars (PSRs) in the Fermi-LAT 8-yr point source catalogue (4FGL) with the unidentified sources in the previous 3rd Fermi-LAT Source Catalog (3FGL). Using this cross-matched set, we show that some features used for building classification models from the identified sources can suffer from covariate shift, which can result from various observational effects. This can hamper the actual performance when such a model is applied to classify unidentified sources. Using our framework, both the AGN/PSR and the young pulsar (YNG)/millisecond pulsar (MSP) classifiers are automatically updated to incorporate the new features and the enlarged training samples in the 4FGL catalogue. Using a two-layer model with these updated classifiers, we have selected 20 promising MSP candidates with confidence scores > 98 per cent from the unidentified sources in the 4FGL catalogue, which can provide inputs for a multiwavelength identification campaign.
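The nested re-sampling idea mentioned in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy two-class data, the k-nearest-neighbour classifier, the grid of k values, and the fold counts are hypothetical and are not taken from the paper. The inner loop selects a hyperparameter using training data only; the repeated outer loop then gives a spread of accuracies rather than a single quoted number.

```python
import random
import statistics


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def cv_splits(data, n_folds, rng):
    """Shuffle `data`, then yield one (train, test) pair per fold."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(parts):
        train = [row for j, part in enumerate(parts) if j != i for row in part]
        yield train, test


def nested_cv(data, k_grid, n_outer=5, n_inner=3, n_repeats=10, seed=0):
    """Repeated nested cross-validation; returns mean and std of outer accuracy."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        for train, test in cv_splits(data, n_outer, rng):
            # Inner loop: choose k from the outer-training data only, so the
            # outer test fold never influences model selection.
            inner = list(cv_splits(train, n_inner, rng))
            best_k = max(
                k_grid,
                key=lambda k: statistics.mean(
                    accuracy(tr, va, k) for tr, va in inner
                ),
            )
            scores.append(accuracy(train, test, best_k))
    return statistics.mean(scores), statistics.stdev(scores)


def toy_data(n_per_class=30, seed=1):
    """Hypothetical small sample: two overlapping 2-D Gaussian classes."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data


mean_acc, std_acc = nested_cv(toy_data(), k_grid=(1, 3, 5))
print(f"accuracy = {mean_acc:.3f} +/- {std_acc:.3f}")
```

The standard deviation printed alongside the mean is the quantity of interest: with only a few dozen sources per class, it shows how much a single train/test split could shift the reported accuracy.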

2019 ◽  
Vol 490 (4) ◽  
pp. 4770-4777 ◽  
Author(s):  
M Kovačević ◽  
G Chiaro ◽  
S Cutini ◽  
G Tosti

ABSTRACT Machine learning is an automatic technique that is revolutionizing scientific research, with innovative applications and wide use in astrophysics. The aim of this study was to develop an optimized version of an Artificial Neural Network machine learning method for classifying blazar candidates of uncertain type detected by the Fermi Large Area Telescope γ-ray instrument. The final result of this study increased the classification performance by about 80 per cent with respect to the previous method, leaving only 15 unclassified blazars out of 573 blazar candidates of uncertain type listed in the LAT 4-year Source Catalog.


Algorithms ◽  
2019 ◽  
Vol 12 (8) ◽  
pp. 160 ◽  
Author(s):  
Mohammad Wedyan ◽  
Alessandro Crippa ◽  
Adel Al-Jumaily

Deep neural networks are successful learning tools for building nonlinear models. However, a robust deep learning-based classification model needs a large dataset. Indeed, these models are often unstable when they use small datasets. To solve this issue, which is particularly critical in light of the possible clinical applications of these predictive models, researchers have developed approaches such as virtual sample generation. Virtual sample generation significantly improves learning and classification performance when working with small samples. The main objective of this study is to evaluate the ability of the proposed virtual sample generation to overcome the small sample size problem, which is a feature of the automated detection of a neurodevelopmental disorder, namely autism spectrum disorder. Results show that our method enhances diagnostic accuracy from 84% to 95% using virtual samples generated on the basis of five actual clinical samples. The present findings show the feasibility of using the proposed technique to improve classification performance even in cases of clinical samples of limited size. Accounting for concerns in relation to small sample sizes, our technique represents a meaningful step forward in terms of pattern recognition methodology, particularly when it is applied to diagnostic classifications of neurodevelopmental disorders. In addition, the proposed technique has been tested with other available benchmark datasets. The experimental outcomes showed that the accuracy of the classification that used virtual samples was superior to the one that used original training data without virtual samples.


2020 ◽  
Author(s):  
Nalika Ulapane ◽  
Karthick Thiyagarajan ◽  
Sarath Kodagoda

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.


Energies ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 1809
Author(s):  
Mohammed El Amine Senoussaoui ◽  
Mostefa Brahami ◽  
Issouf Fofana

Machine learning is widely used as a panacea in many engineering applications, including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: the oils that can be maintained in service, the oils that should be reconditioned or filtered, the oils that should be reclaimed, and the oils that must be discarded. Of the two algorithms, random forest performed better, achieving high accuracy with only a small amount of data. Good performance was achieved through not only the application of the proposed algorithm but also the approach of data preprocessing. Before being fed to the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were then retransformed by principal component analysis and passed through the CFsSubset filter again. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.
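The correlation-based filtering step described above can be sketched with the merit heuristic usually associated with correlation-based feature selection: a subset scores highly when its features correlate with the class but not with one another. This is only an illustrative sketch. It uses absolute Pearson correlation as a stand-in, which may differ from the measure the authors' tool uses, and the feature names, values, and two-class target are hypothetical (the paper itself uses four oil classes).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def cfs_merit(columns, target, subset):
    """Merit = k * r_cf / sqrt(k + k*(k-1)*r_ff) for a feature subset of size k."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(columns, target):
    """Greedy forward search: add features while the merit keeps improving."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        score, name = max(
            (cfs_merit(columns, target, selected + [f]), f) for f in remaining
        )
        if score <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected, best


# Hypothetical toy data: 0 = serviceable oil, 1 = degraded oil.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "acidity": [0.1, 0.2, 0.15, 0.3, 0.8, 0.9, 0.85, 0.7],
    "acidity_proxy": [0.2, 0.3, 0.1, 0.5, 0.6, 0.9, 0.7, 0.8],  # largely redundant
    "noise": [0.5, 0.1, 0.9, 0.3, 0.4, 0.8, 0.2, 0.6],  # nearly uninformative
}

selected, merit = forward_select(columns, target)
print(selected, round(merit, 3))
```

In this toy case the redundant and uninformative columns both lower the merit when added, so the search stops after the first feature. That is the behaviour that lets such a filter shrink the feature set before classification.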


2021 ◽  
Vol 503 (3) ◽  
pp. 4581-4600
Author(s):  
Orlando Luongo ◽  
Marco Muccino

ABSTRACT We alleviate the circularity problem, whereby gamma-ray bursts are not perfect distance indicators, by means of a new model-independent technique based on Bézier polynomials. We use the well-consolidated Amati and Combo correlations. We consider improved calibrated catalogues of mock data from differential Hubble rate points. To get our mock data, we use those machine learning scenarios that adapt well to gamma-ray bursts, discussing in detail how we handle small amounts of data from our machine learning techniques. We explore only three machine learning treatments, i.e. linear regression, neural network, and random forest, emphasizing the quantitative statistical motivations behind these choices. Our calibration strategy consists in taking Hubble’s data, creating the mock compilation using machine learning, and calibrating the aforementioned correlations through Bézier polynomials, first with a standard chi-square analysis and then by means of a hierarchical Bayesian regression procedure. The corresponding catalogues, built up from the two correlations, have been used to constrain dark energy scenarios. We thus employ Markov chain Monte Carlo numerical analyses based on the most recent Pantheon supernova data, baryonic acoustic oscillations, and our gamma-ray burst data. We test the standard ΛCDM model and the Chevallier–Polarski–Linder parametrization. We discuss the recent H0 tension in view of our results. Moreover, we highlight a further severe tension over Ωm and we conclude that a slightly evolving dark energy model is possible.
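For readers unfamiliar with Bézier polynomials, the sketch below evaluates one in the standard Bernstein form, which is the kind of smooth, model-independent interpolating curve the abstract above refers to. The polynomial order, the coefficient values, and the maximum redshift `z_max` are hypothetical placeholders chosen for illustration. They are not the fitted values from the paper.

```python
from math import comb


def bezier(z, coeffs, z_max):
    """Evaluate a Bezier polynomial of order n = len(coeffs) - 1 at z.

    B(z) = sum_d coeffs[d] * C(n, d) * t**d * (1 - t)**(n - d), with t = z / z_max.
    """
    n = len(coeffs) - 1
    t = z / z_max
    return sum(
        c * comb(n, d) * t ** d * (1 - t) ** (n - d) for d, c in enumerate(coeffs)
    )


# Hypothetical coefficients (think of them as Hubble-rate values in km/s/Mpc).
coeffs = (70.0, 100.0, 200.0)
z_max = 2.0

for z in (0.0, 1.0, 2.0):
    print(z, bezier(z, coeffs, z_max))
```

A useful property of this form is that the curve passes through the first coefficient at z = 0 and through the last at z = z_max, so the first coefficient can be read directly as the value of the fitted function at zero redshift.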


2021 ◽  
Vol 23 ◽  
pp. 100545
Author(s):  
Israel Elujide ◽  
Stephen G. Fashoto ◽  
Bunmi Fashoto ◽  
Elliot Mbunge ◽  
Sakinat O. Folorunso ◽  
...  

Author(s):  
K Sooknunan ◽  
M Lochner ◽  
Bruce A Bassett ◽  
H V Peiris ◽  
R Fender ◽  
...  

Abstract With the advent of powerful telescopes such as the Square Kilometer Array and the Vera C. Rubin Observatory, we are entering an era of multiwavelength transient astronomy that will lead to a dramatic increase in data volume. Machine learning techniques are well suited to address this data challenge and rapidly classify newly detected transients. We present a multiwavelength classification algorithm consisting of three steps: (1) interpolation and augmentation of the data using Gaussian processes; (2) feature extraction using wavelets; (3) classification with random forests. Augmentation provides improved performance at test time by balancing the classes and adding diversity into the training set. In the first application of machine learning to the classification of real radio transient data, we apply our technique to the Green Bank Interferometer and other radio light curves. We find we are able to accurately classify most of the eleven classes of radio variables and transients after just eight hours of observations, achieving an overall test accuracy of 78%. We fully investigate the impact of the small sample size of 82 publicly available light curves and use data augmentation techniques to mitigate the effect. We also show that, on a significantly larger simulated representative training set, the algorithm achieves an overall accuracy of 97%, illustrating that the method is likely to provide excellent performance on future surveys. Finally, we demonstrate the effectiveness of simultaneous multiwavelength observations by showing how incorporating just one optical data point into the analysis improves the accuracy of the worst performing class by 19%.
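Step (2) of the pipeline above, feature extraction using wavelets, can be illustrated with a minimal Haar decomposition of an evenly sampled light curve. The abstract does not name the wavelet family, so Haar is an assumption chosen here for brevity, and the flux values are hypothetical. In the pipeline described, even sampling would come from the interpolation in step (1).

```python
import math


def haar_step(signal):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx, detail


def haar_features(signal):
    """Full Haar decomposition of a signal whose length is a power of two.

    Returns the final approximation coefficient followed by the detail
    coefficients ordered from the coarsest level to the finest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, coeffs = list(signal), []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        coeffs = detail + coeffs
    return approx + coeffs


# Hypothetical light curve: a step-like brightening halfway through.
flux = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
features = haar_features(flux)
print([round(c, 3) for c in features])
```

Because the transform is orthonormal, the coefficients carry the same total energy as the input, and a sharp change such as the step in this toy curve concentrates in a few coefficients. Such a vector could then serve as the input row for a classifier like the random forest in step (3).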

