DAuGAN: An Approach for Augmenting Time Series Imbalanced Datasets via Latent Space Sampling Using Adversarial Techniques

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Andrei Bratu ◽  
Gabriela Czibula

Data augmentation is a commonly used technique in data science for improving the robustness and performance of machine learning models. The purpose of this paper is to study the feasibility of generating synthetic data points of a temporal nature towards this end. A general approach named DAuGAN (Data Augmentation using Generative Adversarial Networks) is presented for identifying poorly represented sections of a time series, synthesizing and integrating new data points, and measuring the resulting performance improvement on a benchmark machine learning model. The problem is studied and applied in the domain of algorithmic trading, whose constraints are presented and taken into consideration. The experimental results highlight an improvement in the performance of a benchmark reinforcement learning agent trained to trade a financial instrument on a dataset enhanced with DAuGAN.
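The pipeline the abstract describes (flag poorly represented sections, sample the GAN's latent space, decode new points) can be sketched in outline. Everything below is a hypothetical illustration, not the authors' DAuGAN code: the segment-frequency heuristic, the latent dimension, and especially the stand-in linear "generator" (which in DAuGAN would be a trained GAN generator network) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def underrepresented_segments(labels, n_segments=10, threshold=0.5):
    """Split a label sequence into segments and flag those whose
    positive-class frequency falls below `threshold` of the global rate."""
    segments = np.array_split(labels, n_segments)
    global_rate = labels.mean()
    return [i for i, seg in enumerate(segments)
            if seg.mean() < threshold * global_rate]

def synthesize(generator, n_samples, latent_dim=8):
    """Sample latent vectors and decode them with a (trained) generator."""
    z = rng.standard_normal((n_samples, latent_dim))
    return generator(z)

# Stand-in "generator": a fixed linear decoder mapping latent vectors to
# 4-dimensional data points; a real GAN generator would replace this.
W = rng.standard_normal((8, 4))
generator = lambda z: z @ W

labels = rng.binomial(1, 0.3, size=1000)
labels[100:200] = 0  # artificially deplete one region of the series
sparse = underrepresented_segments(labels)
new_points = synthesize(generator, n_samples=50)
print(sparse, new_points.shape)
```

The flagged segment indices would then drive how many synthetic points to inject into each sparse region before retraining the downstream model.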

Author(s):  
Md. Mehedi Hasan Shawon ◽  
Sumaiya Akter ◽  
Md. Kamrul Islam ◽  
Sabbir Ahmed ◽  
Md. Mosaddequr Rahman

Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3726
Author(s):  
Ivan Vaccari ◽  
Vanessa Orani ◽  
Alessia Paglialonga ◽  
Enrico Cambiaso ◽  
Maurizio Mongelli

The application of machine learning and artificial intelligence techniques in the medical world is growing, with a range of purposes: from the identification and prediction of possible diseases to patient monitoring and clinical decision support systems. Furthermore, the widespread use of remote monitoring medical devices, under the umbrella of the “Internet of Medical Things” (IoMT), has simplified the retrieval of patient information as they allow continuous monitoring and direct access to data by healthcare providers. However, due to possible issues in real-world settings, such as loss of connectivity, irregular use, misuse, or poor adherence to a monitoring program, the data collected might not be sufficient to implement accurate algorithms. For this reason, data augmentation techniques can be used to create synthetic datasets sufficiently large to train machine learning models. In this work, we apply the concept of generative adversarial networks (GANs) to perform data augmentation on patient data obtained through IoMT sensors for Chronic Obstructive Pulmonary Disease (COPD) monitoring. We also apply an explainable AI algorithm to demonstrate the accuracy of the synthetic data by comparing it to the real data recorded by the sensors. The results obtained demonstrate how synthetic datasets created through a well-structured GAN are comparable with a real dataset, as validated by a novel approach based on machine learning.
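One generic way to check that synthetic sensor data is "comparable with a real dataset", as the abstract claims, is a classifier two-sample test: train a classifier to tell real from synthetic, and read an in-sample AUC near 0.5 as indistinguishability. The sketch below illustrates that idea only; the Gaussian stand-in "sensor" data and the hand-rolled logistic regression are assumptions, not the paper's validation approach.

```python
import numpy as np

rng = np.random.default_rng(1)

def c2st_auc(real, synth, epochs=200, lr=0.1):
    """Classifier two-sample test: fit a logistic regression to tell real
    from synthetic; an AUC near 0.5 suggests the sets are indistinguishable."""
    X = np.vstack([real, synth])
    y = np.r_[np.zeros(len(real)), np.ones(len(synth))]
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):  # plain gradient descent on cross-entropy
        p = 1 / (1 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    scores = X @ w + b
    # AUC via the rank-sum identity
    order = scores.argsort()
    ranks = np.empty(len(y))
    ranks[order] = np.arange(1, len(y) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

real = rng.normal(0, 1, (300, 4))
good_synth = rng.normal(0, 1, (300, 4))  # well-matched generator
bad_synth = rng.normal(2, 1, (300, 4))   # mismatched generator
print(c2st_auc(real, good_synth), c2st_auc(real, bad_synth))
```

A badly mismatched generator is easy to detect (AUC near 1), while a well-matched one hovers near 0.5.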


2021 ◽  
pp. 190-200
Author(s):  
Lesia Mochurad ◽  
Yaroslav Hladun

The paper considers a method for analyzing a person's psychophysical state from a psychomotor indicator, the finger tapping test. A mobile phone app that generalizes the classic tapping test was developed for the experiments. The tool allows collecting samples and analyzing them both as individual experiments and as a dataset as a whole. Using statistical methods and hyperparameter optimization, the data were examined for anomalies, and an algorithm for reducing their number was developed. A machine learning model is used to predict different features of the dataset. These experiments reveal the structure of the data obtained with the finger tapping test and show how to conduct experiments for better generalization of the model in the future. The anomaly-removal method can be used in further research to increase the model's accuracy. The developed model is a multilayer recurrent neural network, which works well for time-series classification. The model's error is 1.5% on a synthetic dataset and 5% on real data from a similar distribution.
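An anomaly-reduction step of the kind the abstract mentions might look like the following robust outlier filter on inter-tap intervals. The median/MAD rule, the cutoff `k`, and the sample values are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def remove_anomalies(intervals, k=3.0):
    """Drop inter-tap intervals more than k median-absolute-deviations from
    the median -- a robust stand-in for a statistical anomaly filter."""
    x = np.asarray(intervals, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med)) or 1e-9  # guard against zero MAD
    keep = np.abs(x - med) / mad <= k
    return x[keep]

# Hypothetical inter-tap intervals in seconds: one long pause (distraction)
# and one impossibly short double-registration.
taps = [0.21, 0.19, 0.22, 0.20, 1.80, 0.21, 0.18, 0.02, 0.20]
clean = remove_anomalies(taps)
print(clean)
```

Filtering like this before training keeps the recurrent network from fitting sensor glitches and lapses of attention rather than the motor signal itself.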


2017 ◽  
Author(s):  
Aymen A. Elfiky ◽  
Maximilian J. Pany ◽  
Ravi B. Parikh ◽  
Ziad Obermeyer

ABSTRACT
Background: Cancer patients who die soon after starting chemotherapy incur costs of treatment without benefits. Accurately predicting mortality risk from chemotherapy is important, but few patient data-driven tools exist. We sought to create and validate a machine learning model predicting mortality for patients starting new chemotherapy.
Methods: We obtained electronic health records for patients treated at a large cancer center (26,946 patients; 51,774 new regimens) over 2004-14, linked to Social Security data for date of death. The model was derived using 2004-11 data, and performance measured on non-overlapping 2012-14 data.
Findings: 30-day mortality from chemotherapy start was 2.1%. Common cancers included breast (21.1%), colorectal (19.3%), and lung (18.0%). Model predictions were accurate for all patients (AUC 0.94). Predictions for patients starting palliative chemotherapy (46.6% of regimens), for whom prognosis is particularly important, remained highly accurate (AUC 0.92). To illustrate model discrimination, we ranked patients initiating palliative chemotherapy by model-predicted mortality risk, and calculated observed mortality by risk decile. 30-day mortality in the highest-risk decile was 22.6%; in the lowest-risk decile, no patients died. Predictions remained accurate across all primary cancers, stages, and chemotherapies—even for clinical trial regimens that first appeared in years after the model was trained (AUC 0.94). The model also performed well for prediction of 180-day mortality (AUC 0.87; mortality 74.8% in the highest risk decile vs. 0.2% in the lowest). Predictions were more accurate than data from randomized trials of individual chemotherapies, or SEER estimates.
Interpretation: A machine learning algorithm accurately predicted short-term mortality in patients starting chemotherapy using EHR data. Further research is necessary to determine generalizability and the feasibility of applying this algorithm in clinical settings.
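The decile analysis the Findings section describes — rank patients by model-predicted risk, split into tenths, and tabulate observed mortality in each — is straightforward to reproduce schematically. The simulated risks and outcomes below are synthetic stand-ins for the EHR data, not the study's cohort.

```python
import numpy as np

def mortality_by_decile(risk, died):
    """Rank patients by predicted risk (highest first), split into deciles,
    and report observed mortality in each decile."""
    order = np.argsort(risk)[::-1]
    deciles = np.array_split(np.asarray(died)[order], 10)
    return [d.mean() for d in deciles]

rng = np.random.default_rng(2)
risk = rng.uniform(0, 1, 1000)          # hypothetical model scores
died = rng.binomial(1, risk * 0.25)     # toy outcome: probability tracks risk
rates = mortality_by_decile(risk, died)
print([round(r, 3) for r in rates])
```

A well-discriminating model produces a steep gradient across deciles, as in the paper's 22.6% top-decile vs. 0% bottom-decile 30-day mortality.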


2020 ◽  
Vol 8 (7_suppl6) ◽  
pp. 2325967120S0036
Author(s):  
Audrey Wright ◽  
Jaret Karnuta ◽  
Bryan Luu ◽  
Heather Haeberle ◽  
Eric Makhni ◽  
...  

Objectives: With the accumulation of big data surrounding the National Hockey League (NHL) and the advent of advanced computational processors, machine learning (ML) is ideally suited to develop a predictive algorithm capable of imbibing historical data to accurately project a future player’s availability to play based on prior injury and performance. To the end of leveraging available analytics to permit data-driven injury prevention strategies and informed decisions for NHL franchises beyond static logistic regression (LR) analysis, the objective of this study of NHL players was to (1) characterize the epidemiology of publicly reported NHL injuries from 2007-17, (2) determine the validity of a machine learning model in predicting next-season injury risk for both goalies and non-goalies, and (3) compare the performance of modern ML algorithms versus LR analyses. Methods: Hockey player data were compiled for the years 2007 to 2017 from two publicly reported databases in the absence of an official NHL-approved database. Attributes acquired from each NHL player for each professional year included: age, 85 player metrics, and injury history. A total of 5 ML algorithms were created for both non-goalie and goalie data: Random Forest, K-Nearest Neighbors, Naive Bayes, XGBoost, and Top 3 Ensemble. Logistic regression was also performed for both non-goalie and goalie data. Area under the receiver operating characteristic curve (AUC) primarily determined validation. Results: Player data were generated from 2,109 non-goalies and 213 goalies with an average follow-up of 4.5 years. The results are shown below in Table 1. For models predicting following-season injury risk for non-goalies, XGBoost performed the best with an AUC of 0.948, compared to an AUC of 0.937 for logistic regression. For models predicting following-season injury risk for goalies, XGBoost had the highest AUC with 0.956, compared to an AUC of 0.947 for LR.
Conclusion: Advanced ML models such as XGBoost outperformed LR and demonstrated good to excellent capability of predicting whether a publicly reportable injury is likely to occur the next season. As more player-specific data become available, algorithm refinement may be possible to strengthen predictive insights and allow ML to offer quantitative risk management for franchises, present opportunity for targeted preventative intervention by medical personnel, and replace regression analysis as the new gold standard for predictive modeling. [Figure: see text]
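Since the comparison between XGBoost and logistic regression rests entirely on AUC, it may help to recall what that number means: the probability that a randomly chosen positive case (injured next season) outranks a randomly chosen negative one. A minimal pure-Python computation of that rank statistic (not the study's code) is:

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
    the probability a random positive outranks a random negative,
    with ties counted as half-wins."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy check: a model that scores all injured players above all healthy
# players achieves perfect discrimination.
print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # → 1.0
```

On this scale, the reported XGBoost figures (0.948 and 0.956) indicate that an injured player outranks a healthy one roughly 95% of the time.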


Author(s):  
Akshata Kulkarni

Abstract: Officials around the world are using several COVID-19 outbreak prediction models to make educated decisions and enact necessary control measures. In this study, we developed a machine learning model which predicts and forecasts the COVID-19 outbreak in India, with the goal of determining the best regression model for an in-depth examination of the novel coronavirus. Based on data available from January 31 to October 31, 2020, collected from Kaggle, the model predicts the number of confirmed cases in Maharashtra. A machine learning model is used to forecast the future trend of cases. The project has the potential to demonstrate the importance of information dissemination in improving response time and planning ahead to help reduce risk.
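A minimal version of this kind of case-count forecasting is a linear regression on log-transformed counts, i.e., an exponential growth fit. The synthetic daily series below stands in for the Kaggle data; nothing here is the study's actual model or its comparison of regression families.

```python
import numpy as np

# Hypothetical daily confirmed-case counts following clean exponential
# growth; the study's real Maharashtra data is not reproduced here.
days = np.arange(30)
cases = 50 * np.exp(0.1 * days)

# Fit a line to log-cases (an exponential growth model), then forecast
# the next week by extrapolating and exponentiating back.
slope, intercept = np.polyfit(days, np.log(cases), 1)
forecast = np.exp(intercept + slope * np.arange(30, 37))
print(forecast.round())
```

In practice one would compare such a fit against polynomial and other regression models on held-out days, which is the model-selection question the study examines.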


2018 ◽  
Author(s):  
Elijah Bogart ◽  
Richard Creswell ◽  
Georg K. Gerber

Abstract: Longitudinal studies are crucial for discovering causal relationships between the microbiome and human disease. We present the Microbiome Interpretable Temporal Rule Engine (MITRE), the first machine learning method specifically designed for predicting host status from microbiome time-series data. Our method maintains interpretability by learning predictive rules over automatically inferred time periods and phylogenetically related microbes. We validate MITRE’s performance on semi-synthetic data and five real datasets measuring microbiome composition over time in infant and adult cohorts. Our results demonstrate that MITRE performs on par with or outperforms “black box” machine learning approaches, providing a powerful new tool enabling discovery of biologically interpretable relationships between the microbiome and the human host.
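The kind of interpretable rule the abstract describes — predict host status when a microbe group's average abundance over an inferred time window crosses a threshold — can be written down directly. The window, threshold, and abundance values below are illustrative assumptions, not MITRE's learned rules or its Bayesian inference machinery.

```python
def rule_predict(series, window, threshold):
    """Schematic MITRE-style rule: 'if the average abundance of a microbe
    group over time window [lo, hi) exceeds a threshold, predict the
    outcome'. The rule is fully human-readable, unlike a black-box model."""
    lo, hi = window
    return sum(series[lo:hi]) / (hi - lo) > threshold

# Hypothetical per-timepoint relative abundances for one subject.
subject = [0.01, 0.02, 0.15, 0.20, 0.18, 0.03]
print(rule_predict(subject, window=(2, 5), threshold=0.1))  # True
```

MITRE's contribution is learning which windows, microbe groups, and thresholds to use from data; the payoff is that the final classifier remains a short list of statements like this one.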


Author(s):  
Md Golam Moula Mehedi Hasan ◽  
Douglas A. Talbert

Counterfactual explanations are gaining popularity as a way of explaining machine learning models. Counterfactual examples are generally created to help interpret the decision of a model: if a model makes a certain decision for an instance, the counterfactual examples of that instance reverse the decision of the model. Counterfactual examples can be created by carefully changing particular feature values of the instance. Though counterfactual examples are generated to explain the decisions of machine learning models, in this work we explore another potential application area: whether counterfactual examples are useful for data augmentation. We demonstrate the efficacy of this approach on the widely used “Adult-Income” dataset. We consider several scenarios where we do not have enough data and use counterfactual examples to augment the dataset. We compare our approach with a Generative Adversarial Network (GAN) approach for dataset augmentation. The experimental results show that our proposed approach can be an effective way to augment a dataset.
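The augmentation idea can be sketched as: perturb a feature until the model's decision flips, then add the flipped point to the training set with the flipped label. The toy "income" model and the single-feature line search below are assumptions for illustration; the paper works with the Adult-Income dataset and more general counterfactual generation.

```python
def make_counterfactual(x, predict, feature, step, max_iter=100):
    """Nudge `feature` in steps until the model's decision flips; the
    flipped point and its new label form a synthetic training example
    (a schematic search, not a production counterfactual generator)."""
    x = dict(x)  # leave the caller's instance untouched
    original = predict(x)
    for _ in range(max_iter):
        x[feature] += step
        if predict(x) != original:
            return x, 1 - original  # counterfactual gets the flipped label
    return None  # no flip found within the search budget

# Toy income model: approve (1) when weekly hours exceed 40.
predict = lambda x: int(x["hours"] > 40)
cf = make_counterfactual({"hours": 35}, predict, "hours", step=1)
print(cf)  # ({'hours': 41}, 1)
```

Because each counterfactual lies just across the decision boundary, augmenting with such points concentrates new training data exactly where the model is least certain.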

