Towards Generative Design of Computationally Efficient Mathematical Models with Evolutionary Learning

Anna V. Kalyuzhnaya; Nikolay O. Nikitin; Alexander Hvatov; Mikhail Maslyaev; Mikhail Yachmenkov; Alexander Boukhanovsky

doi:10.3390/e23010028

Entropy ◽

10.3390/e23010028 ◽

2020 ◽

Vol 23 (1) ◽

pp. 28

Author(s):

Anna V. Kalyuzhnaya ◽

Nikolay O. Nikitin ◽

Alexander Hvatov ◽

Mikhail Maslyaev ◽

Mikhail Yachmenkov ◽

...

Keyword(s):

Mathematical Models ◽

Learning Approach ◽

Model Structure ◽

Evolutionary Learning ◽

Learning Models ◽

Computationally Efficient ◽

Performance Models ◽

Generative Design ◽

Computational Resources ◽

Machine Learning Models

In this paper, we describe the concept of a generative design approach applied to the automated evolutionary learning of mathematical models in a computationally efficient way. To formalize the problems of model design and co-design, a generalized formulation of the modeling workflow is proposed. A parallelized evolutionary learning approach for the identification of model structure is described for equation-based models and composite machine learning models. Moreover, the involvement of performance models in the design process is analyzed. A set of experiments with various models and computational resources is conducted to verify different aspects of the proposed approach.

A Deep Learning Approach for Detection of Application Layer Attacks in Internet

Handling Priority Inversion in Time-Constrained Distributed Databases - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-2491-6.ch010 ◽

2020 ◽

pp. 175-188

Author(s):

V. Punitha ◽

C. Mala

Keyword(s):

Deep Learning ◽

Transport Layer ◽

Classification Model ◽

Learning Approach ◽

DDoS Attacks ◽

Application Layer ◽

Learning Models ◽

Technological Transformation ◽

Application Deployment ◽

Machine Learning Models

The recent technological transformation in application deployment, together with the growing availability of applications, has led attackers to shift their target to the services provided by the application layer. Application layer DoS or DDoS attacks are launched only after a connection to the server has been established, which makes them stealthier than network or transport layer attacks. Existing defence mechanisms are ineffective at detecting application layer DoS or DDoS attacks. Hence, this chapter proposes a novel deep learning classification model using an autoencoder to detect application layer DDoS attacks by measuring the deviations in the incoming network traffic. The experimental results show that the proposed deep autoencoder model detects application layer attacks in HTTP traffic more effectively than existing machine learning models.

Telugu News Data Classification Using Machine Learning Approach

10.4018/978-1-7998-7685-4.ch014 ◽

2022 ◽

pp. 181-194

Author(s):

Bala Krishna Priya G. ◽

Jabeen Sultana ◽

Usha Rani M.

Keyword(s):

Machine Learning ◽

Social Media ◽

Research Work ◽

Learning Approach ◽

Fake News ◽

Learning Models ◽

Machine Learning Classifiers ◽

Proposed Model ◽

Machine Learning Approach ◽

Machine Learning Models

Mining Telugu news data and categorizing it by public sentiment is important because a great deal of fake news has emerged with the rise of social media. This research work covers identifying whether a news text is positive, negative, or neutral and then classifying it by area, such as business, editorial, entertainment, nation, and sports. It proposes an efficient model that adopts machine learning classifiers to classify Telugu news data. The results obtained by various machine learning models are compared, and the proposed model is observed to perform best in terms of accuracy, precision, recall, and F1-score.

ML-CB: Machine Learning Canvas Block

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2021-0056 ◽

2021 ◽

Vol 2021 (3) ◽

pp. 453-473

Author(s):

Nathan Reitinger ◽

Michelle L. Mazurek

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Semantic Representation ◽

Source Code ◽

Online Privacy ◽

Learning Approach ◽

Learning Models ◽

One Step ◽

The Web ◽

Machine Learning Models

With the aim of increasing online privacy, we present a novel machine-learning-based approach to blocking one of the three main ways website visitors are tracked online: canvas fingerprinting. Because the act of canvas fingerprinting uses, at its core, a JavaScript program, and because many of these programs are reused across the web, we are able to fit several machine learning models around a semantic representation of a potentially offending program, achieving accurate and robust classifiers. Our supervised learning approach is trained on a dataset we created by scraping roughly half a million websites using a custom Google Chrome extension storing information related to the canvas. Classification leverages our key insight that the images drawn by canvas fingerprinting programs have a facially distinct appearance, allowing us to manually classify files based on the images drawn; we take this approach one step further and train our classifiers not on the malleable images themselves, but on the more-difficult-to-change, underlying source code generating the images. As a result, ML-CB allows for more accurate tracker blocking.

Confederated learning in healthcare: training machine learning models using disconnected data separated by individual, data type and identity for Large-Scale Health System Intelligence (Preprint)

10.2196/preprints.24951 ◽

2020 ◽

Author(s):

Dianbo Liu ◽

Kathe Fox ◽

Griffin Weber ◽

Tim Miller

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Ischemic Heart Disease ◽

Psychological Disorders ◽

Data Type ◽

Learning Approach ◽

Learning Method ◽

Learning Models ◽

Data Types ◽

Machine Learning Models

BACKGROUND A patient’s health information is generally fragmented across silos because it follows how care is delivered: multiple providers in multiple settings. Though it is technically feasible to reunite data for analysis in a manner that underpins a rapid learning healthcare system, privacy concerns and regulatory barriers limit data centralization for this purpose. OBJECTIVE Machine learning can be conducted in a federated manner on patient datasets that share the same set of variables but are separated across storage. However, federated learning cannot handle the situation where different data types for a given patient are separated vertically across different organizations and where patient ID matching across different institutions is difficult. We call methods that enable machine learning model training on data separated by two or more dimensions “confederated machine learning.” We propose and evaluate confederated learning for training machine learning models to stratify the risk of several diseases among silos when data are horizontally separated by individual, vertically separated by data type, and separated by identity without patient ID matching. METHODS The confederated learning method can be intuitively understood as a distributed learning method with representation learning, generative model, imputation method and data augmentation elements. The confederated learning method we developed consists of three steps: Step 1) Conditional generative adversarial networks with matching loss (cGAN) were trained using data from the central analyzer to infer one data type from another, for example, inferring medications using diagnoses. Generative (cGAN) models were used in this study because a considerable percentage of individuals do not have paired data types. For instance, a patient may only have his or her diagnoses in the database but not medication information due to insurance enrolment. cGAN can utilize data with paired information by minimizing matching loss and data without paired information by minimizing adversarial loss. Step 2) Missing data types from each silo were inferred using the model trained in step 1. Step 3) Task-specific models, such as a model to predict diagnoses of diabetes, were trained in a federated manner across all silos simultaneously. RESULTS We conducted experiments to train disease prediction models using confederated learning on a large nationwide health insurance dataset from the U.S. that is split into 99 silos. The models stratify individuals by their risk of diabetes, psychological disorders or ischemic heart disease in the next two years, using diagnoses, medication claims and clinical lab test records of patients (see Methods section for details). The goal of these experiments is to test whether a confederated learning approach can simultaneously address the two types of separation mentioned above. CONCLUSIONS We demonstrated that health data distributed across silos separated by individual and data type can be used to train machine learning models without moving or aggregating data. Our method obtains predictive accuracy competitive with a centralized upper bound in predicting risks of diabetes, psychological disorders or ischemic heart disease using previous diagnoses, medications and lab tests as inputs. We compared the performance of a confederated learning approach with models trained on centralized data, on only the data held by the central analyzer, or on a single data type across silos. The experimental results suggested that confederated learning trained predictive models efficiently across disconnected silos. CLINICALTRIAL NA

Development and Validation of a Machine Learning Approach for Automated Severity Assessment of COVID-19 Based on Clinical and Imaging Data: Retrospective Study

JMIR Medical Informatics ◽

10.2196/24572 ◽

2021 ◽

Vol 9 (2) ◽

pp. e24572

Author(s):

Juan Carlos Quiroz ◽

You-Zhen Feng ◽

Zhong-Yuan Cheng ◽

Dana Rezazadegan ◽

Ping-Kang Chen ◽

...

Keyword(s):

Machine Learning ◽

Predictive Power ◽

Care Delivery ◽

Learning Approach ◽

Imaging Features ◽

Severity Assessment ◽

Imaging Data ◽

Learning Models ◽

Machine Learning Approach ◽

Machine Learning Models

Background COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible, such that resources can be mobilized and treatment can be escalated. Objective This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data. Methods Clinical data—including demographics, signs, symptoms, comorbidities, and blood test results—and chest computed tomography scans of 346 patients from 2 hospitals in the Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data from multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. Features with the highest predictive power were identified using the Shapley Additive Explanations framework. Results Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929). 
Conclusions Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease.
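
The sensitivity and specificity figures quoted in the abstract above are standard ratios over a binary confusion matrix. As a minimal sketch of how such values are computed, the Python snippet below uses invented counts (`tp`, `fn`, `tn`, `fp` are hypothetical and are not taken from the study), with "severe" assumed to be the positive class.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly positive (e.g. severe) cases that the model flags."""
    if tp + fn == 0:
        raise ValueError("no positive cases to evaluate")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of truly negative (e.g. mild) cases that the model clears."""
    if tn + fp == 0:
        raise ValueError("no negative cases to evaluate")
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, fn = 8, 2    # severe cases: correctly flagged / missed
tn, fp = 45, 5   # mild cases: correctly cleared / falsely flagged

print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
```

With imbalanced classes, as in the study's setting, specificity can stay high while sensitivity is low, which is one reason both are reported alongside AUC.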

Development and Validation of a Machine Learning Approach for Automated Severity Assessment of COVID-19 Based on Clinical and Imaging Data: Retrospective Study (Preprint)

10.2196/preprints.24572 ◽

2020 ◽

Author(s):

Juan Carlos Quiroz ◽

You-Zhen Feng ◽

Zhong-Yuan Cheng ◽

Dana Rezazadegan ◽

Ping-Kang Chen ◽

...

Keyword(s):

Machine Learning ◽

Predictive Power ◽

Care Delivery ◽

Learning Approach ◽

Imaging Features ◽

Severity Assessment ◽

Imaging Data ◽

Learning Models ◽

Machine Learning Approach ◽

Machine Learning Models

BACKGROUND COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible, such that resources can be mobilized and treatment can be escalated. OBJECTIVE This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data. METHODS Clinical data—including demographics, signs, symptoms, comorbidities, and blood test results—and chest computed tomography scans of 346 patients from 2 hospitals in the Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data from multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. Features with the highest predictive power were identified using the Shapley Additive Explanations framework. RESULTS Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929). 
CONCLUSIONS Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease.

A Radiogenomics Ensemble to Predict EGFR and KRAS Mutations in NSCLC

Tomography ◽

10.3390/tomography7020014 ◽

2021 ◽

Vol 7 (2) ◽

pp. 154-168

Author(s):

Silvia Moreno ◽

Mario Bonfante ◽

Eduardo Zurek ◽

Dmitry Cherezov ◽

Dmitry Goldgof ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

KRAS Mutation ◽

Learning Approach ◽

Learning Models ◽

KRAS Mutations ◽

Machine Learning Approach ◽

Class Average ◽

Public Datasets ◽

Machine Learning Models

Lung cancer causes more deaths globally than any other type of cancer. To determine the best treatment, detecting EGFR and KRAS mutations is of interest. However, non-invasive ways to obtain this information are not available. Furthermore, sufficiently large, relevant public datasets are often lacking, so the performance of single classifiers is not outstanding. In this paper, an ensemble approach is applied to increase the performance of EGFR and KRAS mutation prediction using a small dataset. A new voting scheme, Selective Class Average Voting (SCAV), is proposed and its performance is assessed both for machine learning models and CNNs. For the EGFR mutation, in the machine learning approach, there was an increase in the sensitivity from 0.66 to 0.75, and an increase in AUC from 0.68 to 0.70. With the deep learning approach, an AUC of 0.846 was obtained, and with SCAV, the accuracy of the model was increased from 0.80 to 0.857. For the KRAS mutation, both in the machine learning models (0.65 to 0.71 AUC) and the deep learning models (0.739 to 0.778 AUC), a significant increase in performance was found. The results obtained in this work show how to effectively learn from small image datasets to predict EGFR and KRAS mutations, and that using ensembles with SCAV increases the performance of machine learning classifiers and CNNs. The results provide confidence that as large datasets become available, tools to augment clinical capabilities can be fielded.

Machine learning models to predict onset of dementia: A label learning approach

Alzheimer's & Dementia: Translational Research & Clinical Interventions ◽

10.1016/j.trci.2019.10.006 ◽

2019 ◽

Vol 5 (1) ◽

pp. 918-925 ◽

Cited By ~ 6

Author(s):

Vijay S. Nori ◽

Christopher A. Hane ◽

William H. Crown ◽

Rhoda Au ◽

William J. Burke ◽

...

Keyword(s):

Machine Learning ◽

Learning Approach ◽

Learning Models ◽

Machine Learning Models

A Comparative Study on the Privacy Risks of Face Recognition Libraries

Acta Cybernetica ◽

10.14232/actacyb.289662 ◽

2021 ◽

Author(s):

István Fábián ◽

Gábor György Gulyás

Keyword(s):

Machine Learning ◽

Face Recognition ◽

Rapid Development ◽

Face Image ◽

Human Face ◽

Learning Models ◽

Demographic Attributes ◽

Computational Resources ◽

Privacy Risks ◽

Machine Learning Models

The rapid development of machine learning and the decreasing costs of computational resources have led to widespread use of face recognition. While this technology offers numerous benefits, it also poses new risks. We consider risks related to the processing of face embeddings, which are floating point vectors representing the human face in an identifying way. Previously, we showed that even simple machine learning models are capable of inferring demographic attributes from embeddings, leading to the possibility of re-identification attacks. This paper examines three popular Python libraries for face recognition, comparing their face detection performance and inspecting how much risk each library's embeddings pose regarding the aforementioned data leakage. Our experiments were conducted on a balanced face image dataset of different sexes and races, allowing us to discover biases in our results.

A Deep Learning Approach with Feature Derivation and Selection for Overdue Repayment Forecasting

Applied Sciences ◽

10.3390/app10238491 ◽

2020 ◽

Vol 10 (23) ◽

pp. 8491

Author(s):

Bin Liu ◽

Zhexi Zhang ◽

Junchi Yan ◽

Ning Zhang ◽

Hongyuan Zha ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Short Term Memory ◽

Critical Time ◽

Learning Approach ◽

Learning Models ◽

Comparison Results ◽

Online Lending ◽

Machine Learning Models

Risk control has always been a major challenge in finance. Overdue repayment is a frequently encountered discreditable behavior in online lending. Motivated by the powerful capabilities of deep neural networks, we propose a fusion deep learning approach, namely AD-MBLSTM, based on the deep neural network (DNN), multi-layer bi-directional long short-term memory (LSTM) (BiLSTM) and the attention mechanism for overdue repayment behavior forecasting according to historical repayment records. Furthermore, we present a novel feature derivation and selection method for the procedure of data preprocessing. Visualization and interpretability improvement work is also implemented to explore the critical time points and causes of overdue repayment behavior. In addition, we present a new dataset originating from a practical application scenario in online lending. We evaluate our proposed framework on the dataset and compare the performance with various general machine learning models and neural network models. Comparison results and the ablation study demonstrate that our proposed model outperforms many effective general machine learning models by a large margin, and each indispensable sub-component takes an active role.
