QSO photometric redshifts using machine learning and neural networks

S J Curran; J P Moss; Y C Perrott

doi:10.1093/mnras/stab485

QSO photometric redshifts using machine learning and neural networks

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/stab485 ◽

2021 ◽

Vol 503 (2) ◽

pp. 2639-2650

Author(s):

S J Curran ◽

J P Moss ◽

Y C Perrott

Keyword(s):

Machine Learning ◽

Near Infrared ◽

Sloan Digital Sky Survey ◽

Machine Learning Techniques ◽

Data Sets ◽

Photometric Redshifts ◽

Stellar Objects ◽

Wide Range ◽

Large Continuum ◽

Scientific Value

ABSTRACT The scientific value of the next generation of large continuum surveys would be greatly increased if the redshifts of the newly detected sources could be rapidly and reliably estimated. Given the observational expense of obtaining spectroscopic redshifts for the large number of new detections expected, there has been substantial recent work on using machine learning techniques to obtain photometric redshifts. Here, we compare the accuracy of the predicted photometric redshifts obtained from deep learning (DL) with the k-nearest neighbour (kNN) and the decision tree regression (DTR) algorithms. We find using a combination of near-infrared, visible, and ultraviolet magnitudes, trained upon a sample of Sloan Digital Sky Survey quasi-stellar objects, that the kNN and DL algorithms produce the best self-validation result with a standard deviation of σΔz = 0.24 (σΔz(norm) = 0.11). Testing on various subsamples, we find that the DL algorithm generally has lower values of σΔz, in addition to exhibiting a better performance in other measures. Our DL method, which uses an easy to implement off-the-shelf algorithm with neither filtering nor removal of outliers, performs similarly to other, more complex, algorithms, resulting in an accuracy of Δz < 0.1 up to z ∼ 2.5. Applying the DL algorithm trained on our 70 000 strong sample to other independent (radio-selected) data sets, we find σΔz ≤ 0.36 (σΔz(norm) ≤ 0.17) over a wide range of radio flux densities. This indicates much potential in using this method to determine photometric redshifts of quasars detected with the Square Kilometre Array.

Download Full-text

Proximal Methods for Plant Stress Detection Using Optical Sensors and Machine Learning

Biosensors ◽

10.3390/bios10120193 ◽

2020 ◽

Vol 10 (12) ◽

pp. 193

Author(s):

Alanna V. Zubler ◽

Jeong-Yeol Yoon

Keyword(s):

Machine Learning ◽

Optical Sensors ◽

Near Infrared ◽

Plant Stress ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Detection Methods ◽

Stress Detection ◽

Recent Advances ◽

Wide Range

Plant stresses have been monitored using the imaging or spectrometry of plant leaves in the visible (red-green-blue or RGB), near-infrared (NIR), infrared (IR), and ultraviolet (UV) wavebands, often augmented by fluorescence imaging or fluorescence spectrometry. Imaging at multiple specific wavelengths (multi-spectral imaging) or across a wide range of wavelengths (hyperspectral imaging) can provide exceptional information on plant stress and subsequent diseases. Digital cameras, thermal cameras, and optical filters have become available at a low cost in recent years, while hyperspectral cameras have become increasingly more compact and portable. Furthermore, smartphone cameras have dramatically improved in quality, making them a viable option for rapid, on-site stress detection. Due to these developments in imaging technology, plant stresses can be monitored more easily using handheld and field-deployable methods. Recent advances in machine learning algorithms have allowed for images and spectra to be analyzed and classified in a fully automated and reproducible manner, without the need for complicated image or spectrum analysis methods. This review will highlight recent advances in portable (including smartphone-based) detection methods for biotic and abiotic stresses, discuss data processing and machine learning techniques that can produce results for stress identification and classification, and suggest future directions towards the successful translation of these methods into practical use.

Download Full-text

QSO photometric redshifts from SDSS, WISE, and GALEX colours

Monthly Notices of the Royal Astronomical Society Letters ◽

10.1093/mnrasl/slaa012 ◽

2020 ◽

Vol 493 (1) ◽

pp. L70-L75

Author(s):

S J Curran

Keyword(s):

Near Infrared ◽

Empirical Method ◽

Machine Learning Techniques ◽

Radio Sources ◽

Photometric Redshifts ◽

Stellar Objects ◽

Vast Number ◽

Quasi Stellar Objects ◽

Continuum Radio ◽

Non Gaussian

ABSTRACT Machine learning techniques, specifically the k-nearest neighbour algorithm applied to optical band colours, have had some success in predicting photometric redshifts of quasi-stellar objects (QSOs): Although the mean of differences between the spectroscopic and photometric redshifts, Δ$z$, is close to zero, the distribution of these differences remains wide and distinctly non-Gaussian. As per our previous empirical estimate of photometric redshifts, we find that the predictions can be significantly improved by adding colours from other wavebands, namely the near-infrared and ultraviolet. Self-testing this, by using half of the 33 643 strong QSO sample to train the algorithm, results in a significantly narrower spread in Δ$z$ for the remaining half of the sample. Using the whole QSO sample to train the algorithm, the same set of magnitudes return a similar spread in Δ$z$ for a sample of radio sources (quasars). Although the matching coincidence is relatively low (739 of the 3663 sources having photometry in the relevant bands), this is still significantly larger than from the empirical method (2 per cent) and thus may provide a method with which to obtain redshifts for the vast number of continuum radio sources expected to be detected with the next generation of large radio telescopes.

Download Full-text

Spectroscopic observations of the machine-learning selected anomaly catalogue from the AllWISE Sky Survey

Astronomy and Astrophysics ◽

10.1051/0004-6361/202038439 ◽

2020 ◽

Vol 642 ◽

pp. A103

Author(s):

A. Solarz ◽

R. Thomas ◽

F. M. Montenegro-Montes ◽

M. Gromadzki ◽

E. Donoso ◽

...

Keyword(s):

Machine Learning ◽

Near Infrared ◽

Learning Algorithm ◽

Current Data ◽

Sloan Digital Sky Survey ◽

Support Vector ◽

Wide Field ◽

Training Set ◽

Stellar Objects ◽

Sky Survey

We present the results of a programme to search and identify the nature of unusual sources within the All-sky Wide-field Infrared Survey Explorer (WISE) that is based on a machine-learning algorithm for anomaly detection, namely one-class support vector machines (OCSVM). Designed to detect sources deviating from a training set composed of known classes, this algorithm was used to create a model for the expected data based on WISE objects with spectroscopic identifications in the Sloan Digital Sky Survey. Subsequently, it marked as anomalous those sources whose WISE photometry was shown to be inconsistent with this model. We report the results from optical and near-infrared spectroscopy follow-up observations of a subset of 36 bright (gAB < 19.5) objects marked as “anomalous” by the OCSVM code to verify its performance. Among the observed objects, we identified three main types of sources: (i) low redshift (z ∼ 0.03 − 0.15) galaxies containing large amounts of hot dust (53%), including three Wolf-Rayet galaxies; (ii) broad-line quasi-stellar objects (QSOs) (33%) including low-ionisation broad absorption line (LoBAL) quasars and a rare QSO with strong and narrow ultraviolet iron emission; (iii) Galactic objects in dusty phases of their evolution (3%). The nature of four of these objects (11%) remains undetermined due to low signal-to-noise or featureless spectra. The current data show that the algorithm works well at detecting rare but not necessarily unknown objects among the brightest candidates. They mostly represent peculiar sub-types of otherwise well-known sources. To search for even more unusual sources, a more complete and balanced training set should be created after including these rare sub-species of otherwise abundant source classes, such as LoBALs. Such an iterative approach will ideally bring us closer to improving the strategy design for the detection of rarer sources contained within the vast data store of the AllWISE survey.

Download Full-text

Efficient Prediction of Structural and Electronic Properties of Hybrid 2D Materials Using DFT and Machine Learning

10.26434/chemrxiv.6254756.v1 ◽

2018 ◽

Author(s):

Sherif Tawfik ◽

Olexandr Isayev ◽

Catherine Stampfl ◽

Joseph Shapter ◽

David Winkler ◽

...

Keyword(s):

Machine Learning ◽

Band Gap ◽

Density Functional ◽

2D Materials ◽

Van Der Waals ◽

Building Blocks ◽

Machine Learning Techniques ◽

Interlayer Distance ◽

Computational Screening ◽

Wide Range

Materials constructed from different van der Waals two-dimensional (2D) heterostructures offer a wide range of benefits, but these systems have been little studied because of their experimental and computational complextiy, and because of the very large number of possible combinations of 2D building blocks. The simulation of the interface between two different 2D materials is computationally challenging due to the lattice mismatch problem, which sometimes necessitates the creation of very large simulation cells for performing density-functional theory (DFT) calculations. Here we use a combination of DFT, linear regression and machine learning techniques in order to rapidly determine the interlayer distance between two different 2D heterostructures that are stacked in a bilayer heterostructure, as well as the band gap of the bilayer. Our work provides an excellent proof of concept by quickly and accurately predicting a structural property (the interlayer distance) and an electronic property (the band gap) for a large number of hybrid 2D materials. This work paves the way for rapid computational screening of the vast parameter space of van der Waals heterostructures to identify new hybrid materials with useful and interesting properties.

Download Full-text

Hierarchical Matching and Regression with Application to Photometric Redshift Estimation

Proceedings of the International Astronomical Union ◽

10.1017/s1743921317001569 ◽

2016 ◽

Vol 12 (S325) ◽

pp. 145-155

Author(s):

Fionn Murtagh

Keyword(s):

Information Structure ◽

Data Analytics ◽

A Priori ◽

Sloan Digital Sky Survey ◽

Photometric Redshifts ◽

Merging Galaxies ◽

Photometric Redshift ◽

Sky Survey ◽

Wide Range ◽

Regression Problems

AbstractThis work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data is to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and similarity-based analytics that takes account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree where the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. If used for regression, our approach is a method of cluster-wise regression, generalizing nearest neighbour regression. Both to exemplify this analytics approach, and to demonstrate computational benefits, we address the well-known photometric redshift or ‘photo-z’ problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.

Download Full-text

Improving Reliability Estimation for Individual Numeric Predictions: A Machine Learning Approach

INFORMS Journal on Computing ◽

10.1287/ijoc.2020.1019 ◽

2021 ◽

Author(s):

Gediminas Adomavicius ◽

Yaqiong Wang

Keyword(s):

Machine Learning ◽

General Purpose ◽

Reliability Estimation ◽

Machine Learning Techniques ◽

Data Sets ◽

Real World Data ◽

Learning Techniques ◽

Reliability Indicator ◽

Machine Learning Approach ◽

Prediction Reliability

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.

Download Full-text

Integration of image segmentation and fuzzy theory to improve the accuracy of damage detection areas in traffic accidents

Journal Of Big Data ◽

10.1186/s40537-021-00539-2 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Majid Amirfakhrian ◽

Mahboub Parhizkar

Keyword(s):

Machine Learning ◽

Image Processing ◽

Machine Vision ◽

Traffic Accidents ◽

Fuzzy Theory ◽

Machine Learning Techniques ◽

Industrial Setting ◽

Technological Advances ◽

Learning Techniques ◽

Wide Range

AbstractIn the next decade, machine vision technology will have an enormous impact on industrial works because of the latest technological advances in this field. These advances are so significant that the use of this technology is now essential. Machine vision is the process of using a wide range of technologies and methods in providing automated inspections in an industrial setting based on imaging, process control, and robot guidance. One of the applications of machine vision is to diagnose traffic accidents. Moreover, car vision is utilized for detecting the amount of damage to vehicles during traffic accidents. In this article, using image processing and machine learning techniques, a new method is presented to improve the accuracy of detecting damaged areas in traffic accidents. Evaluating the proposed method and comparing it with previous works showed that the proposed method is more accurate in identifying damaged areas and it has a shorter execution time.

Download Full-text

Decision Tree: A Machine Learning for Intrusion Detection

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f1234.0486s419 ◽

2019 ◽

Vol 8 (6S4) ◽

pp. 1126-1130

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Detection System ◽

Research Work ◽

Machine Learning Techniques ◽

Data Sets ◽

Legitimate User ◽

Learning Techniques ◽

Three Stages

The Intrusion is a major threat to unauthorized data or legal network using the legitimate user identity or any of the back doors and vulnerabilities in the network. IDS mechanisms are developed to detect the intrusions at various levels. The objective of the research work is to improve the Intrusion Detection System performance by applying machine learning techniques based on decision trees for detection and classification of attacks. The methodology adapted will process the datasets in three stages. The experimentation is conducted on KDDCUP99 data sets based on number of features. The Bayesian three modes are analyzed for different sized data sets based upon total number of attacks. The time consumed by the classifier to build the model is analyzed and the accuracy is done.

Download Full-text

Channel-independent recreation of artefactual signals in chronically recorded local field potentials using machine learning

Brain Informatics ◽

10.1186/s40708-021-00149-x ◽

2022 ◽

Vol 9 (1) ◽

Author(s):

Marcos Fabietti ◽

Mufti Mahmud ◽

Ahmad Lotfi

Keyword(s):

Machine Learning ◽

Open Access ◽

Short Term Memory ◽

The Body ◽

Data Sets ◽

Additional Information ◽

Machine Learning Model ◽

Signal Characteristics ◽

Wide Range ◽

Memory Network

AbstractAcquisition of neuronal signals involves a wide range of devices with specific electrical properties. Combined with other physiological sources within the body, the signals sensed by the devices are often distorted. Sometimes these distortions are visually identifiable, other times, they overlay with the signal characteristics making them very difficult to detect. To remove these distortions, the recordings are visually inspected and manually processed. However, this manual annotation process is time-consuming and automatic computational methods are needed to identify and remove these artefacts. Most of the existing artefact removal approaches rely on additional information from other recorded channels and fail when global artefacts are present or the affected channels constitute the majority of the recording system. Addressing this issue, this paper reports a novel channel-independent machine learning model to accurately identify and replace the artefactual segments present in the signals. Discarding these artifactual segments by the existing approaches causes discontinuities in the reproduced signals which may introduce errors in subsequent analyses. To avoid this, the proposed method predicts multiple values of the artefactual region using long–short term memory network to recreate the temporal and spectral properties of the recorded signal. The method has been tested on two open-access data sets and incorporated into the open-access SANTIA (SigMate Advanced: a Novel Tool for Identification of Artefacts in Neuronal Signals) toolbox for community use.

Download Full-text

Artificially Generated Training Data-sets for Supervised Machine Learning Techniques in Magnetic Resonance Imaging: An Example in Myocardial Segmentation

2019 Computing in Cardiology Conference (CinC) ◽

10.22489/cinc.2019.220 ◽

2019 ◽

Author(s):

Christos Xanthis ◽

Kostas Haris ◽

Dimitrios Filos ◽

Anthony Aletras

Keyword(s):

Magnetic Resonance Imaging ◽

Machine Learning ◽

Magnetic Resonance ◽

Training Data ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Sets ◽

Resonance Imaging ◽

Learning Techniques ◽

Myocardial Segmentation

Download Full-text