Photometric classification of Hyper Suprime-Cam transients using machine learning

2020 ◽  
Vol 72 (5) ◽  
Author(s):  
Ichiro Takahashi ◽  
Nao Suzuki ◽  
Naoki Yasuda ◽  
Akisato Kimura ◽  
Naonori Ueda ◽  
...  

Abstract The advancement of technology has resulted in a rapid increase in supernova (SN) discoveries. The Subaru/Hyper Suprime-Cam (HSC) transient survey, conducted from fall 2016 through spring 2017, yielded 1824 SN candidates. This gave rise to the need for fast type classification for spectroscopic follow-up and prompted us to develop a machine learning algorithm using a deep neural network with highway layers. This algorithm is trained by actual observed cadence and filter combinations such that we can directly input the observed data array without any interpretation. We tested our model with a dataset from the LSST classification challenge (Deep Drilling Field). Our classifier scores an area under the curve (AUC) of 0.996 for binary classification (SN Ia or non-SN Ia) and 95.3% accuracy for three-class classification (SN Ia, SN Ibc, or SN II). Application of our binary classification to HSC transient data yields an AUC score of 0.925. With two weeks of HSC data since the first detection, this classifier achieves 78.1% accuracy for binary classification, and the accuracy increases to 84.2% with the full dataset. This paper discusses the potential use of machine learning for SN type classification purposes.
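As a rough illustration of the "highway layers" the abstract mentions, the sketch below implements the standard highway formulation y = T(x)·H(x) + (1 − T(x))·x in plain Python. The abstract does not give the layer sizes, the activation, or the initialisation, so the `tanh` transform, the function names, and the toy weights are all assumptions made for illustration, not the authors' architecture.

```python
import math


def sigmoid(z):
    """Logistic function used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    H(x) is a candidate transform (tanh is an assumption here) and
    T(x) is a sigmoid gate deciding how much of H(x) replaces x.
    Input and output must have the same dimension.
    """
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Toy example: identity transform weights, zero gate weights.
x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# A strongly negative gate bias closes the gate, so the input is carried through.
carried = highway_layer(x, identity, [0.0, 0.0], zeros, [-50.0, -50.0])

# A strongly positive gate bias opens the gate, so the output is H(x).
transformed = highway_layer(x, identity, [0.0, 0.0], zeros, [50.0, 50.0])
```

The gate lets each unit interpolate between passing its input through unchanged and applying the nonlinear transform, which is what makes deep stacks of such layers easier to train.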

2021 ◽  
Vol 18 (1) ◽  
pp. 3-8 ◽  
Author(s):  
Malik Yousef ◽  
Louise C. Showe ◽  
Izhar Ben Shlomo

Abstract The COVID-19 pandemic has flooded all triage stations, making it difficult to carefully select those most likely to be infected. Data on total patients tested, infected, and hospitalized are fragmentary, which adds to this difficulty. The Israeli Ministry of Health made public its registry of immediate clinical data and the respective infected/not infected status for all viral DNA tests performed up to Apr. 18th, 2020, comprising almost 120,000 tests. We used a machine-learning algorithm to find out which immediate clinical elements, including age and gender, mattered the most in identifying the true status of the tested persons, to enable better future allocation of surveillance to those belonging to high-risk groups. In addition to the analyses applied to the first batch of the available data (Apr. 11th), we further tested the algorithm on the independent second batch (Apr. 12th to 18th). Fever, cough and headache were the most diagnostic, differing in degree of importance in different subgroups. A higher percentage of men were found positive (9.3 vs. 7.3%), but gender did not matter for the clinical presentation. The prediction power of the model was high, with an accuracy of 0.84 and an area under the curve of 0.92. We provide a hand-held short checklist with a verbal description of importance for the leading symptoms, which should expedite the triage and enable proper selection of people for further follow-up.


2021 ◽  
Vol 15 (1) ◽  
pp. 26-43
Author(s):  
Sikha Bagui ◽  
Keenal M. Shah ◽  
Yizhi Hu ◽  
Subhash Bagui

This study proposes a model for building intrusion detection systems. The dataset used, CICIDS 2017, contains 14 different attacks with 85 features for each attack. This high dimensionality of the data is a major challenge when building efficient intrusion detection systems, especially in today's big data environment, since a lot of the features are redundant. The main goal in this paper was to reduce the number of features and present a detailed discussion of the important features. For feature selection, information gain was used in an iterative way, and for classification, a machine learning algorithm, the J48 decision tree algorithm, was used. The important features for the classification of each attack were identified, and the features that were important for classifying multiple attacks were also identified and discussed.
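The information-gain ranking described in this abstract can be sketched as follows. This is a minimal single-pass illustration for discrete feature values. The abstract does not spell out the iterative procedure, and continuous CICIDS 2017 features would first need discretising, so the function names, feature names, and toy rows here are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after
    splitting the samples by the (discrete) feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Rank features by information gain, highest first.

    rows is a list of dicts mapping feature name -> discrete value.
    """
    gains = {name: information_gain([row[name] for row in rows], labels)
             for name in rows[0]}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy data with hypothetical feature names: "flag" separates the classes
# perfectly, while "port_parity" carries no information about the label.
rows = [
    {"flag": 1, "port_parity": 0},
    {"flag": 1, "port_parity": 1},
    {"flag": 0, "port_parity": 0},
    {"flag": 0, "port_parity": 1},
]
labels = ["attack", "attack", "benign", "benign"]
ranking = rank_features(rows, labels)
```

Features at the bottom of such a ranking are candidates for removal before training the classifier. A decision tree learner like J48 uses a related entropy-based criterion when choosing its splits.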


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gabriel A. Colozza-Gama ◽  
Fabiano Callegari ◽  
Nikola Bešič ◽  
Ana C. de J. Paviza ◽  
Janete M. Cerutti

Abstract Somatic mutations in cancer driver genes can help diagnosis, prognosis and treatment decisions. Formalin-fixed paraffin-embedded (FFPE) specimens are the main source of DNA for somatic mutation detection. To overcome constraints of DNA isolated from FFPE, we compared pyrosequencing and ddPCR analysis for absolute quantification of the BRAF V600E mutation in DNA extracted from FFPE specimens and compared the results to the qualitative detection information obtained by Sanger sequencing. Sanger sequencing was able to detect the BRAF V600E mutation only when it was present in more than 15% of total alleles. Although the sensitivity of ddPCR was higher than that observed for Sanger sequencing, it was less consistent than pyrosequencing, likely due to droplet classification bias of FFPE-derived DNA. To address the droplet allocation bias in ddPCR analysis, we compared different algorithms for automated droplet classification and then correlated these findings with those obtained from pyrosequencing. By examining the addition of non-classifiable droplets (rain) in ddPCR, it was possible to obtain better qualitative classification of droplets and better quantitative classification compared to no rain droplets, when considering pyrosequencing results. Notably, only the machine learning k-NN algorithm was able to automatically classify the samples, surpassing manual classification based on no-template controls, which shows promise in clinical practice.


2021 ◽  
Vol 10 (5) ◽  
pp. 992
Author(s):  
Martina Barchitta ◽  
Andrea Maugeri ◽  
Giuliana Favara ◽  
Paolo Marco Riela ◽  
Giovanni Gallo ◽  
...  

Patients in intensive care units (ICUs) are at higher risk of worse prognosis and mortality. Here, we aimed to evaluate the ability of the Simplified Acute Physiology Score (SAPS II) to predict the risk of 7-day mortality, and to test a machine learning algorithm which combines the SAPS II with additional patients’ characteristics at ICU admission. We used data from the “Italian Nosocomial Infections Surveillance in Intensive Care Units” network. A Support Vector Machines (SVM) algorithm was used to classify 3782 patients according to sex, patient’s origin, type of ICU admission, non-surgical treatment for acute coronary disease, surgical intervention, SAPS II, presence of invasive devices, trauma, impaired immunity, antibiotic therapy and onset of HAI. The accuracy of SAPS II for distinguishing patients who died from those who did not was 69.3%, with an Area Under the Curve (AUC) of 0.678. Using the SVM algorithm, instead, we achieved an accuracy of 83.5% and an AUC of 0.896. Notably, SAPS II was the variable that weighed most in the model, and its removal resulted in an AUC of 0.653 and an accuracy of 68.4%. Overall, these findings suggest the present SVM model as a useful tool for early prediction of patients at higher risk of death at ICU admission.
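Several abstracts in this listing report both accuracy and AUC. The sketch below shows one common way to compute the two from risk scores and binary outcomes. The rank-based (Mann-Whitney) AUC formulation, the 0.5 threshold, and the toy data are assumptions for illustration, and the abstract does not state how the authors computed their figures.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half. Quadratic in the sample size,
    which is acceptable for a small illustration.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def accuracy(scores, labels, threshold):
    """Fraction of cases classified correctly when scores at or
    above the threshold are predicted as positive."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy data: 1 = died, 0 = survived (values invented for illustration).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]

auc = roc_auc(scores, labels)             # 3 of 4 positive/negative pairs ordered correctly
acc = accuracy(scores, labels, 0.5)       # 2 of 4 cases classified correctly
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the two figures can diverge.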


2021 ◽  
Vol 11 (3) ◽  
pp. 92
Author(s):  
Mehdi Berriri ◽  
Sofiane Djema ◽  
Gaëtan Rey ◽  
Christel Dartigues-Pallez

Today, many students are moving towards higher education courses that do not suit them and end up failing. The purpose of this study is to help provide counselors with better knowledge so that they can offer future students courses corresponding to their profile. The second objective is to allow the teaching staff to propose training courses adapted to students by anticipating their possible difficulties. This is possible thanks to a machine learning algorithm called Random Forest, allowing for the classification of the students depending on their results. We had to process data, generate models using our algorithm, and cross the results obtained to have a better final prediction. We tested our method on different use cases, from two classes to five classes. These sets of classes represent the different intervals with an average ranging from 0 to 20. Thus, an accuracy of 75% was achieved with a set of five classes and up to 85% for sets of two and three classes.


2021 ◽  
Vol 11 (9) ◽  
pp. 3836
Author(s):  
Valeri Gitis ◽  
Alexander Derendyaev ◽  
Konstantin Petrov ◽  
Eugene Yurkov ◽  
Sergey Pirogov ◽  
...  

Prostate cancer (PCa) is the second most frequent malignancy (after lung cancer). Preoperative staging of PCa is the basis for the selection of adequate treatment tactics. In particular, an urgent problem is distinguishing indolent from aggressive forms of PCa in patients at the initial stages of the tumor process. To solve this problem, we propose a new binary classification machine-learning method. The proposed method of monotonic functions uses a model in which the disease’s form is determined by the severity of the patient’s condition. It is assumed that the smaller the deviation of the indicators from the normal values found in healthy people, the milder the patient’s condition. This assumption means that the severity (form) of the disease can be represented by monotonic functions of the deviation of the patient’s indicators beyond the normal range. The method is used to solve the problem of classifying patients with indolent and aggressive forms of prostate cancer according to pretreatment data. The learning algorithm is nonparametric. At the same time, it allows an explanation of the classification results in the form of a logical function. To obtain it, the user specifies either the threshold value of the probability of successful classification of patients with an indolent form of PCa, or the threshold value of the probability of misclassification of patients with an aggressive form of PCa. The examples of logical rules given in the article show that they are quite simple and can be easily interpreted in terms of preoperative indicators of the form of the disease.


Hypertension ◽  
2021 ◽  
Vol 78 (5) ◽  
pp. 1595-1604
Author(s):  
Fabrizio Buffolo ◽  
Jacopo Burrello ◽  
Alessio Burrello ◽  
Daniel Heinrich ◽  
Christian Adolf ◽  
...  

Primary aldosteronism (PA) is the cause of arterial hypertension in 4% to 6% of patients, and 30% of patients with PA are affected by unilateral and surgically curable forms. Current guidelines recommend screening ≈50% of patients with hypertension for PA on the basis of individual factors, while some experts suggest screening all patients with hypertension. To define the risk of PA and tailor the diagnostic workup to the individual risk of each patient, we developed a conventional scoring system and supervised machine learning algorithms using a retrospective cohort of 4059 patients with hypertension. On the basis of 6 widely available parameters, we developed a numerical score and 308 machine learning-based models, selecting the one with the highest diagnostic performance. After validation, we obtained high predictive performance with our score (optimized sensitivity of 90.7% for PA and 92.3% for unilateral PA [UPA]). The machine learning-based model provided the highest performance, with an area under the curve of 0.834 for PA and 0.905 for diagnosis of UPA, with optimized sensitivity of 96.6% for PA, and 100.0% for UPA, at validation. The application of the predicting tools allowed the identification of a subgroup of patients with very low risk of PA (0.6% for both models) and null probability of having UPA. In conclusion, this score and the machine learning algorithm can accurately predict the individual pretest probability of PA in patients with hypertension and circumvent screening in up to 32.7% of patients using a machine learning-based model, without omitting patients with surgically curable UPA.

