Photometric classification of Hyper Suprime-Cam transients using machine learning

Ichiro Takahashi; Nao Suzuki; Naoki Yasuda; Akisato Kimura; Naonori Ueda; Masaomi Tanaka; Nozomu Tominaga; Naoki Yoshida

doi:10.1093/pasj/psaa082

Publications of the Astronomical Society of Japan, 2020, Vol 72 (5)

Keyword(s): Machine Learning, Learning Algorithm, Binary Classification, Area Under The Curve, Full Dataset, Transient Data, Type Classification, Potential Use

Abstract: The advancement of technology has resulted in a rapid increase in supernova (SN) discoveries. The Subaru/Hyper Suprime-Cam (HSC) transient survey, conducted from fall 2016 through spring 2017, yielded 1824 SN candidates. This created a need for fast type classification for spectroscopic follow-up and prompted us to develop a machine learning algorithm based on a deep neural network with highway layers. The algorithm is trained on the actual observed cadence and filter combinations, so the observed data array can be input directly without any interpretation. We tested our model with a dataset from the LSST classification challenge (Deep Drilling Field). Our classifier achieves an area under the curve (AUC) of 0.996 for binary classification (SN Ia or non-SN Ia) and 95.3% accuracy for three-class classification (SN Ia, SN Ibc, or SN II). Applying our binary classifier to HSC transient data yields an AUC of 0.925. Using two weeks of HSC data after the first detection, the classifier achieves 78.1% accuracy for binary classification, and the accuracy increases to 84.2% with the full dataset. This paper discusses the potential use of machine learning for SN type classification.
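The abstract names a deep neural network with highway layers but gives no layer details. A highway layer lets each unit choose between transforming its input and carrying it through unchanged, via a learned gate T(x): y = T(x)·H(x) + (1 - T(x))·x. Below is a minimal pure-Python forward pass, assuming a ReLU transform H and a sigmoid gate. The function names and toy weights are illustrative assumptions and are not taken from the authors' network.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute w @ x + b with plain Python lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def highway_layer(x: Vector, w_h: Matrix, b_h: Vector, w_t: Matrix, b_t: Vector) -> Vector:
    """Forward pass of one highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    H is assumed to be a ReLU transform and T a sigmoid "transform gate".
    Input and output widths must match so the carry path (1 - T) * x is defined.
    """
    h = [max(0.0, z) for z in affine(w_h, x, b_h)]  # candidate transform H(x)
    t = [sigmoid(z) for z in affine(w_t, x, b_t)]   # gate values in (0, 1)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Usage: a strongly negative gate bias makes the layer pass its input through.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [0.5, -1.0]
print(highway_layer(x, identity, [0.0, 0.0], zeros, [-50.0, -50.0]))  # close to x
```

The carry path is what makes very deep stacks trainable: when the gate is near zero, a layer behaves like the identity, so stacking layers does not immediately degrade the signal.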

Likelihood contrasts: a machine learning algorithm for binary classification of longitudinal data

Scientific Reports, 2020, Vol 10 (1)

doi:10.1038/s41598-020-57924-9

Author(s): Riku Klén, Markku Karhunen, Laura L. Elo

Keyword(s): Machine Learning, Longitudinal Data, Learning Algorithm, Binary Classification, Machine Learning Algorithm

Clinical presentation of COVID-19 – a model derived by a machine learning algorithm

Journal of Integrative Bioinformatics, 2021, Vol 18 (1), pp. 3-8

doi:10.1515/jib-2020-0050

Cited By ~ 1

Author(s): Malik Yousef, Louise C. Showe, Izhar Ben Shlomo

Keyword(s): Machine Learning, Clinical Presentation, Learning Algorithm, Area Under The Curve, Risk Groups, Verbal Description, Machine Learning Algorithm, Proper Selection, Selection Of

Abstract: The COVID-19 pandemic has flooded triage stations, making it difficult to carefully select those most likely to be infected. Data on the total numbers of patients tested, infected, and hospitalized are fragmentary, which makes this selection harder still. The Israeli Ministry of Health made public its registry of immediate clinical data and the respective infected/not-infected status for all viral RNA tests performed up to Apr. 18th, 2020, comprising almost 120,000 tests. We used a machine learning algorithm to determine which immediate clinical elements, including age and gender, mattered most in identifying the true status of the tested persons, to enable better future allocation of surveillance policy to those belonging to high-risk groups. In addition to the analyses applied to the first batch of available data (Apr. 11th), we further tested the algorithm on an independent second batch (Apr. 12th to 18th). Fever, cough, and headache were the most diagnostic symptoms, with their degree of importance differing among subgroups. A higher percentage of men than women tested positive (9.3% vs. 7.3%), but gender did not affect the clinical presentation. The predictive power of the model was high, with an accuracy of 0.84 and an area under the curve of 0.92. We provide a short hand-held checklist with a verbal description of the importance of the leading symptoms, which should expedite triage and enable proper selection of people for further follow-up.
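The abstract reports both an accuracy (0.84) and an area under the curve (0.92), and the two measure different things. Accuracy depends on a chosen decision threshold. AUC is threshold-free: it equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counting one half. A minimal sketch computing both from predicted scores follows. The 0.5 threshold and the toy labels and scores are assumptions, not the Israeli registry data.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases where the prediction (score >= threshold) matches the 0/1 label."""
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties count 1/2.

    This pairwise version is O(n_pos * n_neg); a rank-based version scales better.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test results (1 = infected) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.8, 0.6]
print(accuracy(labels, scores), roc_auc(labels, scores))  # 4/6 correct at 0.5; AUC = 8/9
```

The toy case shows why the two numbers can diverge: the ranking is nearly perfect (AUC 8/9), yet two cases fall on the wrong side of the fixed 0.5 threshold.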

Binary Classification of Network-Generated Flow Data Using a Machine Learning Algorithm

International Journal of Information Security and Privacy, 2021, Vol 15 (1), pp. 26-43

doi:10.4018/ijisp.2021010102

Author(s): Sikha Bagui, Keenal M. Shah, Yizhi Hu, Subhash Bagui

Keyword(s): Machine Learning, Intrusion Detection, Information Gain, Learning Algorithm, Binary Classification, Intrusion Detection Systems, Machine Learning Algorithm, Detection Systems, Data Environment

This study proposes a model for building intrusion detection systems. The dataset used, CICIDS 2017, contains 14 different attacks with 85 features for each attack. This high dimensionality is a major challenge when building efficient intrusion detection systems, especially in today's big data environment, since many of the features are redundant. The main goal of this paper was to reduce the number of features and present a detailed discussion of the important ones. Information gain was applied iteratively for feature selection, and a machine learning algorithm, the J48 decision tree, was used for classification. The features important for classifying each attack were identified, as were the features important for classifying multiple attacks, and both sets are discussed.
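Information gain scores a discrete feature X by how much knowing it reduces the entropy of the class label Y: IG(Y; X) = H(Y) - Σ_v P(X = v) H(Y | X = v). The abstract applies it iteratively for feature selection but does not spell out the iteration, so the sketch below shows only the scoring-and-ranking step. The feature names and toy flows are hypothetical, and continuous flow features would need to be discretized before using this discrete version. For context, Weka's J48 is an implementation of C4.5, which splits on the gain ratio, a normalized form of this quantity.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns: Dict[str, Sequence[Hashable]],
                  labels: Sequence[Hashable]) -> List[Tuple[str, float]]:
    """Return (feature name, IG) pairs sorted from most to least informative."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy flows: "syn_flag" separates the classes, "port_bucket" does not.
labels = ["attack", "attack", "benign", "benign"]
columns = {"syn_flag": [1, 1, 0, 0], "port_bucket": ["hi", "lo", "hi", "lo"]}
print(rank_features(columns, labels))  # syn_flag first (1.0 bit), port_bucket last (0.0)
```

Redundant features show up here as near-zero gain once a more informative feature is known, which is the motivation for pruning them before training the tree.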

Machine learning algorithm improved automated droplet classification of ddPCR for detection of BRAF V600E in paraffin-embedded samples

Scientific Reports, 2021, Vol 11 (1)

doi:10.1038/s41598-021-92014-4

Author(s): Gabriel A. Colozza-Gama, Fabiano Callegari, Nikola Bešič, Ana C. de J. Paviza, Janete M. Cerutti

Keyword(s): Machine Learning, Sanger Sequencing, Learning Algorithm, Absolute Quantification, BRAF V600E Mutation, BRAF V600E, Driver Genes, Quantitative Classification, Cancer Driver

Abstract: Somatic mutations in cancer driver genes can inform diagnosis, prognosis, and treatment decisions. Formalin-fixed paraffin-embedded (FFPE) specimens are the main source of DNA for somatic mutation detection. To overcome the constraints of DNA isolated from FFPE, we compared pyrosequencing and ddPCR analysis for absolute quantification of the BRAF V600E mutation in DNA extracted from FFPE specimens, and compared the results with the qualitative detection obtained by Sanger sequencing. Sanger sequencing detected the BRAF V600E mutation only when it was present in more than 15% of total alleles. Although ddPCR is more sensitive than Sanger sequencing, it was less consistent than pyrosequencing, likely due to droplet classification bias with FFPE-derived DNA. To address this droplet allocation bias in ddPCR analysis, we compared different algorithms for automated droplet classification and then correlated these findings with those obtained from pyrosequencing. By including the non-classifiable droplets (rain) in the ddPCR analysis, we obtained better qualitative classification of droplets and better quantitative classification than without the rain droplets, when considering the pyrosequencing results. Notably, only the machine learning k-NN algorithm was able to classify the samples automatically, surpassing manual classification based on no-template controls, which shows promise for clinical practice.
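The abstract singles out k-NN for automated droplet classification without giving its settings. As a hedged sketch: each ddPCR droplet is treated as a point of two fluorescence amplitudes, and an unlabeled droplet takes the majority label of its k nearest labeled droplets. The cluster names ("neg", "wt", "mut"), the amplitude values, and k = 3 are all hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Droplet = Tuple[float, float]  # (channel-1 amplitude, channel-2 amplitude), arbitrary units


def knn_classify(train: Sequence[Tuple[Droplet, str]], query: Droplet, k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training droplets (Euclidean)."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training droplets")
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common keeps first-encountered order among equal counts, so a tie
    # goes to the label of the closest neighbour.
    return votes.most_common(1)[0][0]


# Hypothetical labelled droplets forming three clusters.
train = [
    ((1000.0, 1000.0), "neg"), ((1100.0, 950.0), "neg"), ((950.0, 1050.0), "neg"),
    ((6000.0, 1000.0), "mut"), ((6200.0, 1100.0), "mut"), ((5900.0, 900.0), "mut"),
    ((1000.0, 5000.0), "wt"), ((1050.0, 5200.0), "wt"), ((950.0, 4900.0), "wt"),
]
print(knn_classify(train, (5800.0, 1100.0)))  # "mut"
```

Rain droplets, which sit between clusters, are still assigned to whichever labeled droplets are closest rather than being discarded, which is one way a nearest-neighbour rule can include them in the count.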

Early Prediction of Seven-Day Mortality in Intensive Care Unit Using a Machine Learning Model: Results from the SPIN-UTI Project

Journal of Clinical Medicine, 2021, Vol 10 (5), pp. 992

doi:10.3390/jcm10050992

Author(s): Martina Barchitta, Andrea Maugeri, Giuliana Favara, Paolo Marco Riela, Giovanni Gallo, ...

Keyword(s): Machine Learning, Intensive Care, Intensive Care Units, Learning Algorithm, Area Under The Curve, Support Vector, ICU Admission, Risk Of Death, SAPS II, SVM Algorithm

Patients in intensive care units (ICUs) are at higher risk of poor prognosis and mortality. Here, we aimed to evaluate the ability of the Simplified Acute Physiology Score (SAPS II) to predict the risk of 7-day mortality, and to test a machine learning algorithm that combines SAPS II with additional patient characteristics at ICU admission. We used data from the “Italian Nosocomial Infections Surveillance in Intensive Care Units” network. A Support Vector Machine (SVM) algorithm was used to classify 3782 patients according to sex, patient’s origin, type of ICU admission, non-surgical treatment for acute coronary disease, surgical intervention, SAPS II, presence of invasive devices, trauma, impaired immunity, antibiotic therapy, and onset of healthcare-associated infection (HAI). SAPS II distinguished patients who died from those who did not with an accuracy of 69.3% and an Area Under the Curve (AUC) of 0.678. Using the SVM algorithm instead, we achieved an accuracy of 83.5% and an AUC of 0.896. Notably, SAPS II was the most heavily weighted variable in the model, and its removal resulted in an AUC of 0.653 and an accuracy of 68.4%. Overall, these findings suggest that the present SVM model is a useful tool for the early identification, at ICU admission, of patients at higher risk of death.
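The abstract does not state the SVM's kernel or training procedure. As a hedged stand-in, the sketch below trains a linear SVM by minimizing the regularized hinge loss λ/2·‖w‖² + mean(max(0, 1 - y·⟨w, x⟩)) with Pegasos-style subgradient steps. The two features, their values, and λ = 0.01 are hypothetical and are not SPIN-UTI data. A real model would use all listed admission variables, likely a tuned kernel, and a library solver.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label in {+1, -1})


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(data: Sequence[Example], lam: float = 0.01, epochs: int = 50) -> List[float]:
    """Train a linear SVM with Pegasos-style stochastic subgradient steps.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)). A constant 1.0 is
    appended to each feature vector, so the last weight is a (regularized) bias.
    Examples are visited in a fixed order for reproducibility; Pegasos proper
    samples them at random.
    """
    w = [0.0] * (len(data[0][0]) + 1)
    t = 0
    for _ in range(epochs):
        for features, y in data:
            t += 1
            x = list(features) + [1.0]
            eta = 1.0 / (lam * t)                     # step size 1/(lam * t)
            hinge_active = y * dot(w, x) < 1.0        # margin violated by current w
            w = [(1.0 - eta * lam) * wi for wi in w]  # subgradient step for the L2 term
            if hinge_active:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]  # hinge-loss step
    return w


def predict(w: Sequence[float], features: Sequence[float]) -> int:
    """Return +1 or -1 from the sign of the decision function."""
    return 1 if dot(w, list(features) + [1.0]) >= 0.0 else -1


# Hypothetical standardized features (e.g. a SAPS II z-score and a second z-score);
# +1 = died within 7 days, -1 = survived.
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((2.0, 3.0), 1),
        ((-2.0, -2.0), -1), ((-3.0, -3.0), -1), ((-2.0, -3.0), -1)]
w = train_linear_svm(data)
print([predict(w, x) for x, _ in data])  # [1, 1, 1, -1, -1, -1]
```

Because the decision function is linear, the magnitude of each learned weight on standardized inputs gives a rough sense of which variable the model leans on most, which relates to the abstract's observation that SAPS II dominated.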

Multi-Class Assessment Based on Random Forests

Education Sciences, 2021, Vol 11 (3), pp. 92

doi:10.3390/educsci11030092

Author(s): Mehdi Berriri, Sofiane Djema, Gaëtan Rey, Christel Dartigues-Pallez

Keyword(s): Higher Education, Machine Learning, Random Forests, Learning Algorithm, Teaching Staff, Machine Learning Algorithm, Process Data, Training Courses, Education Courses

Today, many students enroll in higher education courses that do not suit them and end up failing. The purpose of this study is to give counselors better knowledge so that they can offer future students courses that match their profile. The second objective is to allow teaching staff to propose training courses adapted to students by anticipating their possible difficulties. This is made possible by Random Forest, a machine learning algorithm that classifies students according to their results. We processed the data, generated models using our algorithm, and combined the results obtained to produce a better final prediction. We tested our method on different use cases, from two classes to five classes. These classes represent different intervals of the average grade, which ranges from 0 to 20. An accuracy of 75% was achieved with a set of five classes, and up to 85% with sets of two and three classes.
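The abstract defines the 2- to 5-class tasks as intervals of the 0 to 20 average grade but does not give the interval boundaries. The minimal sketch below shows how such class labels could be derived before training a classifier, assuming equal-width intervals. Both that choice and the function names are hypothetical.

```python
from typing import List, Tuple


def grade_to_class(average: float, n_classes: int, low: float = 0.0, high: float = 20.0) -> int:
    """Map an average grade in [low, high] to a class index 0..n_classes-1.

    Equal-width intervals are an assumption; the maximum grade falls in the top class.
    """
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    if not low <= average <= high:
        raise ValueError(f"average must lie in [{low}, {high}]")
    width = (high - low) / n_classes
    return min(int((average - low) // width), n_classes - 1)


def class_bounds(n_classes: int, low: float = 0.0, high: float = 20.0) -> List[Tuple[float, float]]:
    """Lower and upper grade bound of each class under the equal-width assumption."""
    width = (high - low) / n_classes
    return [(low + i * width, low + (i + 1) * width) for i in range(n_classes)]


print(class_bounds(5))          # five 4-point bands from 0 to 20
print(grade_to_class(12.5, 5))  # 3, i.e. the band [12, 16)
```

Fewer, wider classes make the task easier, which is consistent with the higher accuracy the abstract reports for two and three classes than for five.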

Binary Classification Model Based on Machine Learning Algorithm for the Short-Circuit Detection in Power System

Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, 2019

doi:10.1145/3377713.3377753

Author(s): Qiwei Lu, Jinpei Cheng, Dianlin Guo, Mengmeng Su, Xuewei Wu, ...

Keyword(s): Machine Learning, Power System, Learning Algorithm, Binary Classification, Short Circuit, Classification Model, Machine Learning Algorithm, Model Based

Monotonic Functions Method and Its Application to Staging of Patients with Prostate Cancer According to Pretreatment Data

Applied Sciences, 2021, Vol 11 (9), pp. 3836

doi:10.3390/app11093836

Author(s): Valeri Gitis, Alexander Derendyaev, Konstantin Petrov, Eugene Yurkov, Sergey Pirogov, ...

Keyword(s): Prostate Cancer, Learning Algorithm, Binary Classification, Preoperative Staging, Threshold Value, Logical Function, Adequate Treatment, Monotonic Functions, Selection Of

Prostate cancer (PCa) is the second most frequent malignancy (after lung cancer). Preoperative staging of PCa is the basis for selecting an adequate treatment strategy. In particular, an urgent problem is distinguishing indolent from aggressive forms of PCa in patients at the initial stages of the tumor process. To solve this problem, we propose a new binary classification machine learning method. The proposed method of monotonic functions uses a model in which the form of the disease is determined by the severity of the patient’s condition. It is assumed that the smaller the deviation of the indicators from the normal values of healthy people, the milder the patient’s condition. This assumption means that the severity (form) of the disease can be represented by monotonic functions of the deviations of the patient’s indicators beyond the normal range. The method is used to classify patients with indolent and aggressive forms of prostate cancer according to pretreatment data. The learning algorithm is nonparametric, yet it can explain the classification results in the form of a logical function. To obtain such an explanation, the user specifies either the threshold probability of correctly classifying patients with indolent PCa or the threshold probability of misclassifying patients with aggressive PCa. The example logical rules given in the article are quite simple and can be easily interpreted in terms of preoperative indicators of the form of the disease.
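The method rests on two ingredients the abstract describes: deviations of indicators beyond their normal range, and a classification that is monotonic in those deviations, so a more abnormal indicator never makes the predicted form less severe. The minimal sketch below illustrates both with a hypothetical logical rule of the kind the article says its algorithm outputs. The indicator names, normal ranges, and rule cut-offs are invented. The cut-offs apply to deviations and are distinct from the probability thresholds the user supplies, and this is not the authors' learning algorithm.

```python
from typing import Dict, Tuple


def deviations(patient: Dict[str, float], normal: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """How far each indicator lies outside its normal [lo, hi] range (0 if inside)."""
    return {name: max(0.0, patient[name] - hi) + max(0.0, lo - patient[name])
            for name, (lo, hi) in normal.items()}


def is_aggressive(dev: Dict[str, float], cutoffs: Dict[str, float]) -> bool:
    """Hypothetical logical rule: aggressive if ANY deviation exceeds its cut-off.

    An OR of 'deviation > cut-off' clauses is monotone non-decreasing in every
    deviation: making an indicator more abnormal can turn 'indolent' into
    'aggressive' but never the reverse.
    """
    return any(dev[name] > c for name, c in cutoffs.items())


# Invented indicators, normal ranges and cut-offs, for illustration only.
normal = {"marker_a": (0.0, 4.0), "marker_b": (20.0, 30.0)}
cutoffs = {"marker_a": 6.0, "marker_b": 8.0}
mild = deviations({"marker_a": 5.0, "marker_b": 25.0}, normal)
severe = deviations({"marker_a": 12.0, "marker_b": 25.0}, normal)
print(mild, is_aggressive(mild, cutoffs))      # marker_a deviates by 1.0 -> indolent
print(severe, is_aggressive(severe, cutoffs))  # marker_a deviates by 8.0 -> aggressive
```

A rule of this shape reads directly as a clinical statement ("aggressive if marker A exceeds its normal range by more than 6 units, or marker B by more than 8"), which is the interpretability the abstract emphasizes.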

Automatic Classification of Sub-Techniques in Classical Cross-Country Skiing Using a Machine Learning Algorithm on Micro-Sensor Data

Sensors, 2017, Vol 18 (2), pp. 75

doi:10.3390/s18010075

Cited By ~ 10

Author(s): Ole Rindal, Trine Seeberg, Johannes Tjønnås, Pål Haugnes, Øyvind Sandbakk

Keyword(s): Machine Learning, Learning Algorithm, Automatic Classification, Sensor Data, Machine Learning Algorithm, Micro Sensor, Cross Country Skiing, Cross Country, Classical Cross

Clinical Score and Machine Learning-Based Model to Predict Diagnosis of Primary Aldosteronism in Arterial Hypertension

Hypertension, 2021, Vol 78 (5), pp. 1595-1604

doi:10.1161/hypertensionaha.121.17444

Author(s): Fabrizio Buffolo, Jacopo Burrello, Alessio Burrello, Daniel Heinrich, Christian Adolf, ...

Keyword(s): Machine Learning, Arterial Hypertension, Primary Aldosteronism, Learning Algorithm, Area Under The Curve, Clinical Score, Machine Learning Algorithms, Supervised Machine Learning, Individual Risk, The Individual

Primary aldosteronism (PA) is the cause of arterial hypertension in 4% to 6% of patients, and 30% of patients with PA are affected by unilateral, surgically curable forms. Current guidelines recommend screening approximately 50% of patients with hypertension for PA on the basis of individual factors, while some experts suggest screening all patients with hypertension. To define the risk of PA and tailor the diagnostic workup to each patient's individual risk, we developed a conventional scoring system and supervised machine learning algorithms using a retrospective cohort of 4059 patients with hypertension. On the basis of 6 widely available parameters, we developed a numerical score and 308 machine learning-based models, selecting the one with the highest diagnostic performance. After validation, our score achieved high predictive performance (optimized sensitivity of 90.7% for PA and 92.3% for unilateral PA [UPA]). The machine learning-based model provided the highest performance, with an area under the curve of 0.834 for PA and 0.905 for diagnosis of UPA, and optimized sensitivity of 96.6% for PA and 100.0% for UPA at validation. Applying the predictive tools identified a subgroup of patients with a very low risk of PA (0.6% for both models) and a null probability of having UPA. In conclusion, this score and the machine learning algorithm can accurately predict the individual pretest probability of PA in patients with hypertension and, with the machine learning-based model, avoid screening in up to 32.7% of patients without omitting patients with surgically curable UPA.
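Both the "optimized sensitivity" figures and the share of patients who could skip screening come from choosing a cut-off on a risk score. The minimal sketch below assumes one picks the highest cut-off whose sensitivity meets a target, then counts patients scoring below it as not referred for screening. The scores, labels, and targets are hypothetical and are not the study's data or its exact optimization procedure.

```python
from typing import Sequence, Tuple


def sensitivity_at(labels: Sequence[int], scores: Sequence[float], cutoff: float) -> float:
    """Share of positive cases (label 1) whose score is at or above the cut-off."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    return sum(s >= cutoff for s in positives) / len(positives)


def cutoff_for_sensitivity(labels: Sequence[int], scores: Sequence[float],
                           target: float) -> Tuple[float, float, float]:
    """Highest observed score usable as a cut-off with sensitivity >= target.

    Returns (cutoff, sensitivity, fraction of all patients scoring below the
    cut-off, i.e. those a rule-out strategy would not refer for screening).
    Sensitivity can only grow as the cut-off is lowered, so scanning from the
    top returns the highest qualifying cut-off.
    """
    if 1 not in labels:
        raise ValueError("need at least one positive case")
    for cutoff in sorted(set(scores), reverse=True):
        sens = sensitivity_at(labels, scores, cutoff)
        if sens >= target:
            below = sum(s < cutoff for s in scores) / len(scores)
            return cutoff, sens, below
    raise ValueError("no cut-off reaches the target sensitivity")


# Hypothetical risk scores; label 1 = confirmed PA.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.5, 0.2, 0.1, 0.15, 0.25, 0.05]
print(cutoff_for_sensitivity(labels, scores, 1.0))  # (0.3, 1.0, 0.5)
```

Demanding 100% sensitivity, as for surgically curable UPA, pushes the cut-off down and shrinks the group that can safely skip screening. The trade-off is visible even in this toy: lowering the target to 0.75 would raise the cut-off to 0.6 and spare 70% of the toy patients.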
