Identifying new X-ray binary candidates in M31 using random forest classification

R M Arnason; P Barmby; N Vulic

doi:10.1093/mnras/staa207

Monthly Notices of the Royal Astronomical Society, 2020, Vol 492 (4), pp. 5075-5088

Cited By ~ 1

Keyword(s): Random Forest; High Probability; Binary Classification; Machine Learning Algorithms; Supervised Machine Learning; Photon Flux; X-Ray; Random Forest Classification; Forest Classification; Cent Level

Abstract. Identifying X-ray binary (XRB) candidates in nearby galaxies requires distinguishing them from possible contaminants, including foreground stars and background active galactic nuclei. This work investigates the use of supervised machine learning algorithms to identify high-probability XRB candidates. Using a catalogue of 943 Chandra X-ray sources in the Andromeda galaxy, we trained and tested several classification algorithms on the X-ray properties of 163 sources with previously known types. Amongst the algorithms tested, we find that random forest classifiers give the best performance and work better in a binary (XRB/non-XRB) classification than with multiple classes. Evaluating our method by comparison with classifications from visible-light and hard X-ray observations as part of the Panchromatic Hubble Andromeda Treasury, we find compatibility at the 90 per cent level, although we caution that the number of sources in common is rather small. The estimated probability that an object is an XRB agrees well between the random forest binary and multiclass approaches, and the classifications with the highest confidence are in the XRB class. The most discriminating X-ray bands for classification are the 1.7–2.8, 0.5–1.0, 2.0–4.0, and 2.0–7.0 keV photon flux ratios. Of the 780 unclassified sources in the Andromeda catalogue, we identify 16 new high-probability XRB candidates and tabulate their properties for follow-up.
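The abstract does not spell out how a random forest yields a per-source XRB probability. As an illustration only, here is a minimal stdlib-Python sketch of a binary (XRB = 1, non-XRB = 0) random forest built from depth-one trees (decision stumps), each fitted to a bootstrap sample using a random subset of features; the XRB probability is taken as the fraction of trees voting XRB. The two toy features stand in for photon flux ratios, and the data, the 0.9 "high-probability" cut, and all function names are hypothetical. This is not the authors' pipeline, and practical random forests usually grow much deeper trees.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label; defaults to 0 (non-XRB) for an empty side of a split."""
    return Counter(labels).most_common(1)[0][0] if labels else 0


def fit_stump(rows, labels, feature_ids):
    """Pick the single-feature threshold split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Bagged stumps: a bootstrap sample of rows and a random feature subset per tree."""
    rng = random.Random(seed)
    n_total = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        feats = rng.sample(range(n_total), n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def p_xrb(forest, row):
    """Fraction of stumps voting XRB (label 1)."""
    votes = sum(left if row[f] <= t else right for f, t, left, right in forest)
    return votes / len(forest)


# Hypothetical training set: (feature_1, feature_2) per source, 1 = XRB.
train_rows = [(0.10, 0.20), (0.20, 0.10), (0.30, 0.25), (0.15, 0.30),
              (0.80, 0.70), (0.70, 0.90), (0.90, 0.80), (0.75, 0.85)]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(train_rows, train_labels)

# Hypothetical unclassified sources; keep only high-probability XRB candidates.
unclassified = {"src_a": (0.85, 0.75), "src_b": (0.05, 0.05)}
probs = {name: p_xrb(forest, row) for name, row in unclassified.items()}
candidates = [name for name, p in probs.items() if p >= 0.9]
```

Voting over bootstrapped, feature-subsampled trees is what makes the output a probability-like score rather than a hard label, which is what allows a threshold for "high-probability" candidates.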

Ranking near-native candidate protein structures via random forest classification

BMC Bioinformatics, 2019, Vol 20 (S25), doi:10.1186/s12859-019-3257-8

Cited By ~ 1

Author(s): Hongjie Wu; Hongmei Huang; Weizhong Lu; Qiming Fu; Yijie Ding; ...

Keyword(s): Random Forest; Binary Classification; Protein Structures; Three Dimensional; Exact Order; Large Set; Clustering Methods; Native Structure; Random Forest Classification; Forest Classification

Abstract. Background: In ab initio protein-structure prediction, a large set of structural decoys is often generated, from which the best five or three candidates must be selected. The central structures of the clusters with the most neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and in three-dimensional structural-distance assessments make it difficult to identify the exact order of the best five or three near-native candidate structures. Results: To address this issue, we propose a method that re-ranks the candidate structures via random forest classification using intra- and inter-cluster features derived from the clustering results. Comparative analysis indicated that our method was better able to identify the order of the candidate structures than the current methods SPICKER, Calibur, and Durandal. The results confirmed that the first model identified was closer to the native structure in 12 of 43 cases versus four for SPICKER, and the same as the native structure in up to 27 of 43 cases versus 14 for Calibur and in up to eight of 43 cases versus two for Durandal. Conclusions: In this study, we presented an improved method based on random forest classification that transforms the problem of re-ranking the candidate structures into a binary classification. Our results indicate that this method is effective for the problem and outperforms the other methods compared.
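To make the idea of turning re-ranking into binary classification concrete, here is a minimal Python sketch under stated assumptions: a classifier estimates the probability that each candidate is near-native from clustering-derived features, and candidates are then sorted by that probability. The feature names and the logistic scoring function with made-up weights are hypothetical stand-ins for a trained random forest's probability output; they are not the authors' features or model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    cluster_size: int              # hypothetical intra-cluster feature
    dist_to_centroid: float        # hypothetical intra-cluster feature
    dist_to_other_clusters: float  # hypothetical inter-cluster feature


def p_near_native(c: Candidate) -> float:
    """Stand-in for a trained classifier's P(near-native); the weights are invented."""
    z = 0.05 * c.cluster_size - 1.5 * c.dist_to_centroid + 0.5 * c.dist_to_other_clusters
    return 1.0 / (1.0 + math.exp(-z))


def rerank(candidates: List[Candidate], k: int = 5) -> List[Candidate]:
    """Order candidates by predicted near-native probability and keep the top k."""
    return sorted(candidates, key=p_near_native, reverse=True)[:k]


decoys = [
    Candidate("decoy_a", 120, 1.0, 4.0),
    Candidate("decoy_b", 40, 3.0, 2.0),
    Candidate("decoy_c", 80, 0.5, 3.0),
]
top = rerank(decoys, k=2)
```

The classifier only has to separate near-native from non-native candidates; the ordering then falls out of its confidence scores, so no explicit ranking model is needed.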

Performance Analysis of Different Machine Learning Algorithms for Identifying and Classifying the Failures of Traction Motors

Journal of Physics: Conference Series, 2021, Vol 2095 (1), pp. 012058, doi:10.1088/1742-6596/2095/1/012058

Author(s): Xiaoyu Xian; Haichuan Tang; Yin Tian; Qi Liu; Yuming Fan

Keyword(s): Machine Learning; Learning Algorithms; Machine Learning Algorithms; Supervised Machine Learning; Support Vector; Random Forest Classification; Machine Learning Classification; Fault Type; Traction Motors; Forest Classification

Abstract. This paper addresses electric motor fault diagnosis using supervised machine learning classification. A total of 15 distinct fault types are classified, and multi-label strategies are used to classify concurrent faults. We explored, developed, and compared the performance of binary (fault/non-fault), multi-class (fault type), and multi-label (single fault versus combination fault) classifiers. To evaluate the effectiveness of fault identification and classification, we used several supervised machine learning methods, including random forest, support vector machine, and neural network classifiers. Through experiments, we compared these methods over four classification regimes and summarize the most suitable machine learning algorithms for different aspects of health diagnosis of traction motors.
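The binary, multi-class, and multi-label regimes differ mainly in how the target is encoded. A minimal Python sketch, assuming each motor sample is recorded as the set of faults present (empty for a healthy motor); the three fault names are hypothetical placeholders for the paper's 15 fault types.

```python
from typing import FrozenSet, List

# Hypothetical fault vocabulary (placeholders for the paper's 15 fault types).
FAULT_TYPES = ["bearing", "rotor_bar", "stator_winding"]

# Hypothetical samples: the set of faults present in each recording.
samples: List[FrozenSet[str]] = [
    frozenset(),
    frozenset({"bearing"}),
    frozenset({"rotor_bar"}),
    frozenset({"bearing", "stator_winding"}),
]


def binary_target(faults: FrozenSet[str]) -> int:
    """Fault/non-fault: 1 if any fault is present."""
    return int(bool(faults))


def multiclass_target(faults: FrozenSet[str]) -> str:
    """Fault type: exactly one label per sample, so concurrent faults cannot be encoded."""
    if not faults:
        return "healthy"
    if len(faults) > 1:
        raise ValueError(f"concurrent faults {sorted(faults)} need a multi-label target")
    return next(iter(faults))


def multilabel_target(faults: FrozenSet[str]) -> List[int]:
    """One 0/1 indicator per fault type, so combinations are allowed."""
    return [int(f in faults) for f in FAULT_TYPES]


binary = [binary_target(s) for s in samples]
multilabel = [multilabel_target(s) for s in samples]
```

The last sample shows why concurrent faults push toward a multi-label strategy: it has a valid binary and multi-label target, but no single multi-class label.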

The Impact of Simulated Spectral Noise on Random Forest and Oblique Random Forest Classification Performance

Journal of Spectroscopy, 2018, Vol 2018, pp. 1-8, doi:10.1155/2018/8316918

Cited By ~ 2

Author(s): Na’eem Hoosen Agjee; Onisimo Mutanga; Kabir Peerbhay; Riyad Ismail

Keyword(s): Random Forest; Classification Performance; Machine Learning Algorithms; Support Vector; Random Forest Classification; Node Splitting; Forest Classification; Vector Machines; Classifier Performance; The Impact

Hyperspectral datasets contain spectral noise, which adversely affects a classifier's ability to generalize accurately. Although machine learning algorithms are regarded as robust classifiers that generalize well under unfavourable noisy conditions, the extent of this robustness is poorly understood. This study aimed to evaluate the influence of simulated spectral noise (10%, 20%, and 30%) on random forest (RF) and oblique random forest (oRF) classification performance, using two node-splitting models (ridge regression (RR) and support vector machines (SVM)), to discriminate between healthy and low-infested water hyacinth plants. Results from this study showed that RF was slightly influenced by simulated noise, with classification accuracies decreasing for week one and week two when 30% noise was added. Compared with RF, oRF-RR and oRF-SVM yielded higher test accuracies (oRF-RR: 5.36%–7.15%; oRF-SVM: 3.58%–5.36%) and test kappa coefficients (oRF-RR: 10.72%–14.29%; oRF-SVM: 7.15%–10.72%). Notably, oRF-RR test accuracies and kappa coefficients remained consistent irrespective of simulated noise level for week one and week two, while similar results were achieved for week three using oRF-SVM. Overall, this study has demonstrated that oRF-RR can be regarded as a robust classification algorithm that is not influenced by noisy spectral conditions.
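The abstract reports accuracy and kappa coefficients under 10%, 20%, and 30% simulated noise but does not say how the noise was generated. The Python sketch below assumes one plausible scheme, zero-mean Gaussian noise whose standard deviation is the noise level times each band's value (an assumption, not necessarily the authors' method). It also computes Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement implied by the marginal class frequencies. The spectrum and labels are invented.

```python
import random
from collections import Counter
from typing import List, Sequence


def add_spectral_noise(spectrum: Sequence[float], level: float, rng: random.Random) -> List[float]:
    """Assumed noise model: each band gets Gaussian noise with std = level * |value|."""
    return [v + rng.gauss(0.0, level * abs(v)) for v in spectrum]


def cohens_kappa(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Agreement beyond chance between reference and predicted labels."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both sequences use one identical class, so agreement is perfect
    return (p_o - p_e) / (1.0 - p_e)


rng = random.Random(42)
reflectance = [0.05, 0.08, 0.35, 0.40]  # hypothetical 4-band spectrum
noisy = {level: add_spectral_noise(reflectance, level, rng) for level in (0.1, 0.2, 0.3)}

y_true = ["healthy", "healthy", "infested", "infested"]
y_pred = ["healthy", "infested", "infested", "infested"]
kappa = cohens_kappa(y_true, y_pred)  # p_o = 0.75, p_e = 0.5, so kappa = 0.5
```

Kappa is stricter than raw accuracy: here accuracy is 75%, but half of that agreement is expected by chance, so kappa is only 0.5.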

Classification of Sentiment on Business Data for Decision Making using Supervised Machine Learning Methods

International Journal of Engineering and Advanced Technology - Regular Issue, 2020, Vol 9 (3), pp. 3595-3600, doi:10.35940/ijeat.c6086.029320

Keyword(s): Machine Learning; Random Forest; Sentiment Analysis; Machine Learning Algorithms; Supervised Machine Learning; Support Vector; Product Review; Data Set; Random Forest Classification

Sentiment analysis deals with the classification of sentiments expressed in a particular document. Analyzing user-generated data with sentiment analysis is very useful for learning the opinion of a crowd. This paper mainly aims to tackle the problem of polarity categorization in sentiment analysis, and a detailed description of the sentiment analysis process is also given. A product review data set from the UCI repository is used for the analysis. The paper gives a comparative analysis of four supervised machine learning algorithms, namely Naive Bayes, Support Vector Machine, Decision Tree, and Random Forest, applied to product review analysis. The results show that the Random Forest classification algorithm provides better accuracy than the other three algorithms.
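Of the four algorithms compared, Naive Bayes is the simplest to write out in full. A minimal stdlib-Python sketch of a multinomial Naive Bayes polarity classifier with add-one (Laplace) smoothing and plain whitespace tokenization; the tiny review set is invented for illustration and is not the UCI data set, and the paper's actual preprocessing and features may differ.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)  # log prior
        for tok in doc.lower().split():
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy reviews (not the UCI product-review data).
docs = ["great product works well", "love it great value",
        "terrible quality broke fast", "waste of money terrible"]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the +1 smoothing keeps a single unseen class-word pair from zeroing out a class.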

SYBA: Bayesian estimation of synthetic accessibility of organic compounds

2020, doi:10.21203/rs.2.22597/v1

Author(s): Milan Voršilák; Michal Kolář; Ivan Čmelo; Daniel Svozil

Keyword(s): Random Forest; Organic Compounds; Binary Classification; ROC Curves; Classification Problem; Biological Properties; Random Forest Classification; Forest Classification; Physico-Chemical; Synthetic Accessibility

Abstract. SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy- (ES) or hard-to-synthesize (HS). SYBA is based on a Bayesian analysis of the frequency of molecular fragments in a database of ES and HS molecules. It was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, used as a baseline method, and with two other methods for synthetic accessibility assessment: SAScore and SCScore. When used with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, with thresholds optimized by ROC curve analysis, SAScore improves considerably and yields results similar to those of SYBA. Because SYBA is based merely on fragment contributions, it can be used to analyze the contribution of individual molecular parts to compound synthetic accessibility. Although SYBA was developed to quickly assess compound synthetic accessibility, its underlying Bayesian framework is a general approach that can be applied to any binary classification problem. Therefore, SYBA can easily be re-trained to classify compounds by other physico-chemical or biological properties. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
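The abstract describes a score built from per-fragment contributions and thresholds tuned on ROC curves. The Python sketch below illustrates both ideas generically: a naive-Bayes-style smoothed log-odds weight per fragment, summed over a molecule's fragments, and a threshold chosen by maximizing Youden's J (TPR - FPR). It is an assumption-laden illustration, not SYBA's exact formula or code; the fragment IDs and molecules are placeholders.

```python
import math
from collections import Counter


def fragment_weights(es_mols, hs_mols):
    """Smoothed log-odds per fragment: log P(f | ES) - log P(f | HS)."""
    es_counts = Counter(f for mol in es_mols for f in set(mol))
    hs_counts = Counter(f for mol in hs_mols for f in set(mol))
    n_es, n_hs = len(es_mols), len(hs_mols)
    return {
        f: math.log((es_counts[f] + 1) / (n_es + 2)) - math.log((hs_counts[f] + 1) / (n_hs + 2))
        for f in set(es_counts) | set(hs_counts)
    }


def score(mol, weights):
    """Sum of fragment contributions; unseen fragments add 0. Positive leans ES."""
    return sum(weights.get(f, 0.0) for f in set(mol))


def youden_threshold(scores, labels):
    """Threshold maximizing TPR - FPR over observed scores; label 1 = ES (positive)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -math.inf
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Hypothetical molecules as lists of placeholder fragment IDs.
es_mols = [["A", "B"], ["A", "C"], ["B", "C"]]
hs_mols = [["X", "Y"], ["X", "C"], ["Y", "Z"]]
weights = fragment_weights(es_mols, hs_mols)
scores = [score(m, weights) for m in es_mols + hs_mols]
threshold = youden_threshold(scores, [1, 1, 1, 0, 0, 0])
```

Because the score is additive, each fragment's weight can be inspected directly, which mirrors the abstract's point that fragment contributions make individual molecular parts interpretable.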

Epilepsy Detection Using Random Forest Classification Based on Locally Linear Embedding Algorithm

2020 5th International Conference on Control, Robotics and Cybernetics (CRC), 2020, doi:10.1109/crc51253.2020.9253455

Author(s): Qing Hou; Yang Liu; Jun Liu; Siqi Sun

Keyword(s): Random Forest; Locally Linear Embedding; Random Forest Classification; Forest Classification; Linear Embedding; Locally Linear

Quantitative measurement of retinal ganglion cell populations via histology-based random forest classification

Experimental Eye Research, 2016, Vol 146, pp. 370-385, doi:10.1016/j.exer.2015.09.011

Cited By ~ 10

Author(s): Adam Hedberg-Buenz; Mark A. Christopher; Carly J. Lewis; Kimberly A. Fernandes; Laura M. Dutca; ...

Keyword(s): Random Forest; Ganglion Cell; Retinal Ganglion Cell; Quantitative Measurement; Cell Populations; Retinal Ganglion; Random Forest Classification; Forest Classification

Chronic Kidney Disease for Collaborative Healthcare Data Analytics using Random Forest Classification Algorithms

2021 International Conference on Computer Communication and Informatics (ICCCI), 2021, doi:10.1109/iccci50826.2021.9402574

Author(s): V. Shanmugarajeshwari; M. Ilayaraja

Keyword(s): Chronic Kidney Disease; Random Forest; Kidney Disease; Data Analytics; Classification Algorithms; Random Forest Classification; Healthcare Data; Forest Classification

Spatial sampling effect on data structure and Random Forest classification of tissue types in High Definition and Standard Definition FT-IR imaging

Chemometrics and Intelligent Laboratory Systems, 2021, pp. 104407, doi:10.1016/j.chemolab.2021.104407

Author(s): Danuta Liberda; Karolina Kosowska; Paulina Koziol; Tomasz P. Wrobel

Keyword(s): Data Structure; Random Forest; Spatial Sampling; High Definition; Standard Definition; Sampling Effect; Random Forest Classification; Forest Classification; FT-IR

Random forest classification of salt marsh vegetation habitats using quad-polarimetric airborne SAR, elevation and optical RS data

Remote Sensing of Environment, 2014, Vol 149, pp. 118-129, doi:10.1016/j.rse.2014.04.010

Cited By ~ 108

Author(s): Sybrand van Beijma; Alexis Comber; Alistair Lamb

Keyword(s): Salt Marsh; Random Forest; Salt Marsh Vegetation; Marsh Vegetation; Random Forest Classification; Forest Classification; Airborne SAR