Identifying new X-ray binary candidates in M31 using random forest classification

2020 ◽  
Vol 492 (4) ◽  
pp. 5075-5088 ◽  
Author(s):  
R M Arnason ◽  
P Barmby ◽  
N Vulic

ABSTRACT Identifying X-ray binary (XRB) candidates in nearby galaxies requires distinguishing them from possible contaminants including foreground stars and background active galactic nuclei. This work investigates the use of supervised machine learning algorithms to identify high-probability XRB candidates. Using a catalogue of 943 Chandra X-ray sources in the Andromeda galaxy, we trained and tested several classification algorithms using the X-ray properties of 163 sources with previously known types. Amongst the algorithms tested, we find that random forest classifiers give the best performance and work better in a binary classification (XRB/non-XRB) context than with multiple classes. Evaluating our method by comparing with classifications from visible-light and hard X-ray observations as part of the Panchromatic Hubble Andromeda Treasury, we find compatibility at the 90 per cent level, although we caution that the number of sources in common is rather small. The estimated probability that an object is an XRB agrees well between the random forest binary and multiclass approaches, and we find that the classifications with the highest confidence are in the XRB class. The most discriminating X-ray bands for classification are the 1.7–2.8, 0.5–1.0, 2.0–4.0, and 2.0–7.0 keV photon flux ratios. Of the 780 unclassified sources in the Andromeda catalogue, we identify 16 new high-probability XRB candidates and tabulate their properties for follow-up.
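
As a rough illustration of how a random forest can yield a per-source XRB probability and a list of high-probability candidates, the sketch below treats the probability as the fraction of trees voting for the XRB class. The source IDs, vote counts, class labels, and the 0.9 threshold are all hypothetical and are not taken from the paper; some libraries average per-tree probabilities rather than counting hard votes.

```python
from collections import Counter


def class_probabilities(tree_votes):
    """Estimate class probabilities as the fraction of trees voting for each class."""
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {label: n / total for label, n in counts.items()}


def high_probability_xrbs(votes_by_source, threshold=0.9):
    """Return (source_id, P(XRB)) pairs at or above the threshold, highest first."""
    selected = []
    for source_id, votes in votes_by_source.items():
        p_xrb = class_probabilities(votes).get("XRB", 0.0)
        if p_xrb >= threshold:
            selected.append((source_id, p_xrb))
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical votes from a 20-tree forest for three unclassified sources.
votes_by_source = {
    "src_001": ["XRB"] * 19 + ["AGN"] * 1,
    "src_002": ["XRB"] * 8 + ["AGN"] * 10 + ["star"] * 2,
    "src_003": ["XRB"] * 20,
}

candidates = high_probability_xrbs(votes_by_source)
for source_id, p_xrb in candidates:
    print(source_id, round(p_xrb, 2))
```

The same selection step works for a binary or a multiclass forest, since only the XRB share of the votes is used.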

2019 ◽  
Vol 20 (S25) ◽  
Author(s):  
Hongjie Wu ◽  
Hongmei Huang ◽  
Weizhong Lu ◽  
Qiming Fu ◽  
Yijie Ding ◽  
...  

Abstract Background In ab initio protein-structure predictions, a large set of structural decoys is often generated, with the requirement to select the best five or three candidates from the decoys. The clustered central structures with the largest number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and three-dimensional structural-distance assessments make identifying the exact order of the best five or three near-native candidate structures difficult. Results To address this issue, we propose a method that re-ranks the candidate structures via random forest classification using intra- and inter-cluster features from the results of the clustering. Comparative analysis indicated that our method was better able to identify the order of the candidate structures than the current methods SPICKER, Calibur, and Durandal. The results confirmed that the first model identified was closer to the native structure in 12 of 43 cases versus four for SPICKER, and the same as the native structure in up to 27 of 43 cases versus 14 for Calibur and up to eight of 43 cases versus two for Durandal. Conclusions In this study, we presented an improved method based on random forest classification that transforms the problem of re-ranking the candidate structures into a binary classification. Our results indicate that this method is effective for the problem and performs better than the other methods.
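
One way to picture re-ranking as a binary classification problem is to score each cluster centre with a classifier's probability of being near-native and then sort by that score. The sketch below assumes such probabilities are already available; the `Candidate` fields, the example values, and the tie-breaking rule are hypothetical and do not reproduce the authors' features or model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cluster_size: int      # number of neighbours in the cluster (clustering-based rank)
    p_near_native: float   # hypothetical binary-classifier probability


def rerank(candidates, top_n=5):
    """Order candidates by classifier probability; break ties by cluster size."""
    ordered = sorted(candidates, key=lambda c: (-c.p_near_native, -c.cluster_size))
    return [c.name for c in ordered[:top_n]]


# Hypothetical cluster centres, listed in their original clustering order.
decoys = [
    Candidate("model_A", cluster_size=120, p_near_native=0.41),
    Candidate("model_B", cluster_size=95, p_near_native=0.78),
    Candidate("model_C", cluster_size=60, p_near_native=0.78),
    Candidate("model_D", cluster_size=30, p_near_native=0.12),
]

top_three = rerank(decoys, top_n=3)
print(top_three)
```

Here the largest cluster (`model_A`) drops to third place because two smaller clusters receive higher classifier scores.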


2021 ◽  
Vol 2095 (1) ◽  
pp. 012058
Author(s):  
Xiaoyu Xian ◽  
Haichuan Tang ◽  
Yin Tian ◽  
Qi Liu ◽  
Yuming Fan

Abstract This paper addresses electric motor fault diagnosis using supervised machine learning classification. A total of 15 distinct fault types are classified, and multi-label strategies are used to classify concurrent faults. We explored, developed, and compared the performance of different types of binary (fault/non-fault), multi-class (fault type) and multi-label (single fault versus combination fault) classifiers. To evaluate the effectiveness of fault identification and classification, we used different supervised machine learning methods, including random forest classification, support vector machine and neural network classification. Through experiments, we compared these methods over four classification regimes and summarize the most suitable machine learning algorithms for different aspects of health diagnosis in the traction motor domain.
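
The difference between the binary, multi-class, and multi-label regimes is largely a difference in how the target is encoded. The sketch below shows one possible encoding for each; the three fault names are hypothetical placeholders (the paper classifies 15 fault types that are not listed in the abstract).

```python
# Hypothetical fault names used only for illustration.
FAULT_TYPES = ["bearing", "rotor_bar", "stator_winding"]


def binary_label(faults):
    """Binary regime: 1 if any fault is present, 0 for a healthy motor."""
    return int(len(faults) > 0)


def multiclass_label(faults):
    """Multi-class regime: exactly one class per sample, so concurrent faults are rejected."""
    if len(faults) > 1:
        raise ValueError("multi-class encoding cannot represent concurrent faults")
    return next(iter(faults), "healthy")


def multilabel_vector(faults):
    """Multi-label regime: one 0/1 indicator per fault type, so faults can co-occur."""
    return [int(fault in faults) for fault in FAULT_TYPES]


single = {"bearing"}
combined = {"bearing", "stator_winding"}

print(binary_label(single), multiclass_label(single), multilabel_vector(single))
print(binary_label(combined), multilabel_vector(combined))
```

Only the multi-label vector can describe the combined-fault sample without collapsing it into a single class.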


2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Na’eem Hoosen Agjee ◽  
Onisimo Mutanga ◽  
Kabir Peerbhay ◽  
Riyad Ismail

Hyperspectral datasets contain spectral noise, the presence of which adversely affects a classifier's ability to generalize accurately. Although machine learning algorithms are regarded as robust classifiers that generalize well under unfavourable noisy conditions, the extent of this robustness is poorly understood. This study aimed to evaluate the influence of simulated spectral noise (10%, 20%, and 30%) on random forest (RF) and oblique random forest (oRF) classification performance using two node-splitting models (ridge regression (RR) and support vector machines (SVM)) to discriminate healthy and low infested water hyacinth plants. Results from this study showed that RF was slightly influenced by simulated noise, with classification accuracies decreasing for week one and week two with the addition of 30% noise. In comparison to RF, oRF-RR and oRF-SVM yielded higher test accuracies (oRF-RR: 5.36%–7.15%; oRF-SVM: 3.58%–5.36%) and test kappa coefficients (oRF-RR: 10.72%–14.29%; oRF-SVM: 7.15%–10.72%). Notably, oRF-RR test accuracies and kappa coefficients remained consistent irrespective of simulated noise level for week one and week two, while similar results were achieved for week three using oRF-SVM. Overall, this study has demonstrated that oRF-RR can be regarded as a robust classification algorithm that is not influenced by noisy spectral conditions.
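
The kappa coefficient reported alongside accuracy corrects the observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of Cohen's kappa follows; the labels and predictions are invented for illustration and are not the study's data.

```python
import math
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both label sources pick the same class at random.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when only one class ever appears
    return (observed - expected) / (1.0 - expected)


# Hypothetical test set: 5 healthy and 5 infested plants, 8 of 10 classified correctly.
y_true = ["healthy"] * 5 + ["infested"] * 5
y_pred = ["healthy"] * 4 + ["infested"] + ["infested"] * 4 + ["healthy"]

kappa = cohen_kappa(y_true, y_pred)
print(round(kappa, 3))
```

With balanced classes, 80% accuracy corresponds to a kappa of about 0.6, which is why kappa differences in the abstract are larger than the matching accuracy differences.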


Sentiment analysis deals with the classification of sentiments expressed in a particular document. Analysing user-generated data with sentiment analysis is very useful for gauging the opinion of a crowd. This paper mainly aims to tackle the problem of polarity categorization in sentiment analysis. A detailed description of the sentiment analysis process is also given. A product review data set from the UCI repository is used for analysis. This paper gives a comparative analysis of four supervised machine learning algorithms, namely Naive Bayes, Support Vector Machine, Decision Tree and Random Forest, which are used for product review analysis. The results show that the Random Forest classification algorithm provides better accuracy than the other three algorithms.
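
To make polarity categorization concrete, the sketch below implements a small multinomial Naive Bayes classifier, one of the four algorithms compared, with add-one (Laplace) smoothing. The four training reviews are invented for illustration and are not from the UCI data set; whitespace tokenization is an assumption.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """Count class frequencies and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical product reviews.
reviews = [
    ("great product works well", "positive"),
    ("love it excellent quality", "positive"),
    ("terrible waste of money", "negative"),
    ("broke quickly poor quality", "negative"),
]

model = train_naive_bayes(reviews)
print(predict(model, "excellent product"))
print(predict(model, "poor waste"))
```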


2020 ◽  
Author(s):  
Milan Voršilák ◽  
Michal Kolář ◽  
Ivan Čmelo ◽  
Daniel Svozil

Abstract SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy- (ES) or hard-to-synthesize (HS). SYBA is based on the Bayesian analysis of the frequency of molecular fragments in the database of ES and HS molecules. It was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, which was used as a baseline method, as well as with two other methods for synthetic accessibility assessment: SAScore and SCScore. When used with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, with thresholds optimized by the analysis of ROC curves, SAScore improves considerably and yields results similar to those of SYBA. Because SYBA is based merely on fragment contributions, it can be used for the analysis of the contribution of individual molecular parts to compound synthetic accessibility. Though SYBA was developed to quickly assess compound synthetic accessibility, its underlying Bayesian framework is a general approach that can be applied to any binary classification problem. Therefore, SYBA can be easily re-trained to classify compounds by other physico-chemical or biological properties. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
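
The abstract does not say which criterion was used to optimize thresholds from the ROC curves. One common choice, assumed here purely for illustration, is Youden's J statistic (true positive rate minus false positive rate). The scores and labels below are hypothetical and do not come from SYBA, SAScore, or SCScore.

```python
def best_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    A sample is predicted positive when its score is >= threshold.
    Labels are 1 (positive, e.g. easy to synthesize) or 0 (negative).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best = (None, -1.0)
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        j = tp / positives - fp / negatives
        if j > best[1]:
            best = (threshold, j)
    return best


# Hypothetical accessibility scores (higher = easier) with known ES (1) / HS (0) labels.
scores = [-4.0, -3.0, -2.5, -2.0, -1.5, -1.0, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]

threshold, j_stat = best_threshold(scores, labels)
print(threshold, round(j_stat, 2))
```

In this toy data a default cut-off of 0 would miss the positive sample scored at -1.5, whereas the optimized threshold recovers it at the cost of one false positive.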


2016 ◽  
Vol 146 ◽  
pp. 370-385 ◽  
Author(s):  
Adam Hedberg-Buenz ◽  
Mark A. Christopher ◽  
Carly J. Lewis ◽  
Kimberly A. Fernandes ◽  
Laura M. Dutca ◽  
...  
