Applying DevOps Practices of Continuous Automation for Machine Learning

Information ◽

10.3390/info11070363 ◽

2020 ◽

Vol 11 (7) ◽

pp. 363

Author(s):

Ioannis Karamitsos ◽

Saeed Albarhami ◽

Charalampos Apostolopoulos

Keyword(s):

Machine Learning ◽

Real World ◽

Feedback Loops ◽

Learning Processes ◽

Technical Debt ◽

Complex Time ◽

Continuous Integration ◽

Continuous Delivery ◽

Machine Learning Applications ◽

Operation Environment

This paper proposes DevOps practices for machine learning applications, integrating the development and operation environments seamlessly. The machine learning processes of development and deployment may seem easy during the experimentation phase. However, if not carefully designed, deploying and using such models may lead to complex, time-consuming approaches that require significant and costly effort for maintenance, improvement, and monitoring. This paper presents how to apply continuous integration (CI) and continuous delivery (CD) principles, practices, and tools so as to minimize waste, support rapid feedback loops, explore the hidden technical debt, improve value delivery and maintenance, and improve operational functions for real-world machine learning applications.

RON-Gauss: Enhancing Utility in Non-Interactive Private Data Release

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2019-0003 ◽

2019 ◽

Vol 2019 (1) ◽

pp. 26-46 ◽

Cited By ~ 2

Author(s):

Thee Chanyaswad ◽

Changchang Liu ◽

Prateek Mittal

Keyword(s):

Machine Learning ◽

Real World ◽

Differential Privacy ◽

Real Data ◽

The Novel ◽

Private Data ◽

Data Release ◽

Machine Learning Applications ◽

Order Of Magnitude ◽

Real World Datasets

A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ε-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
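
The two ingredients named in the abstract above, a random orthonormal projection followed by a Gaussian generative model, can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits the data preprocessing and, crucially, the noise that the paper adds to obtain differential privacy, so its output is not private. All function names, the toy data, and the dimensions (20 features projected to 3) are assumptions chosen for illustration.

```python
import math
import random


def random_orthonormal_rows(d, p, rng):
    """Draw p orthonormal vectors in R^d (p <= d) by Gram-Schmidt on Gaussian draws."""
    rows = []
    while len(rows) < p:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:  # remove the components along vectors already accepted
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip a degenerate draw
            rows.append([a / norm for a in v])
    return rows


def project(data, rows):
    """Map each d-dimensional sample to its p coordinates along the rows."""
    return [[sum(a * b for a, b in zip(x, u)) for u in rows] for x in data]


def fit_gaussian(points):
    """Return the sample mean and the (population) covariance of the points."""
    n, p = len(points), len(points[0])
    mean = [sum(x[j] for x in points) / n for j in range(p)]
    cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in points) / n
            for j in range(p)] for i in range(p)]
    return mean, cov


def cholesky(cov, ridge=1e-9):
    """Lower-triangular factor L with L L^T = cov + ridge * I."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] + ridge - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def sample_gaussian(mean, cov, count, rng):
    """Draw synthetic points from N(mean, cov) using the Cholesky factor."""
    low = cholesky(cov)
    p = len(mean)
    samples = []
    for _ in range(count):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        samples.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                        for i in range(p)])
    return samples


# Toy data (an assumption): 200 samples with 20 uniform features each.
rng = random.Random(0)
data = [[rng.uniform(-1.0, 1.0) for _ in range(20)] for _ in range(200)]
rows = random_orthonormal_rows(20, 3, rng)
mean, cov = fit_gaussian(project(data, rows))
synthetic = sample_gaussian(mean, cov, 5, rng)  # NOT private: no noise was added
```

In a private variant, calibrated noise would be added to the fitted statistics before sampling; how that noise is calibrated is specified in the paper and is deliberately left out here.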

Tutorial on Software Testing & Quality Assurance for Machine Learning Applications from research bench to real world

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD ◽

10.1145/3371158.3371233 ◽

2020 ◽

Author(s):

Sandya Mannarswamy ◽

Shourya Roy ◽

Saravanan Chidambaram

Keyword(s):

Machine Learning ◽

Quality Assurance ◽

Software Testing ◽

Real World ◽

Machine Learning Applications

On the characterization of the deterministic/stochastic and linear/nonlinear nature of time series

Proceedings of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rspa.2007.0154 ◽

2008 ◽

Vol 464 (2093) ◽

pp. 1141-1160 ◽

Cited By ~ 35

Author(s):

D.P Mandic ◽

M Chen ◽

T Gautama ◽

M.M Van Hulle ◽

A Constantinides

Keyword(s):

Machine Learning ◽

Real World ◽

Heterogeneous Data ◽

Characterization Methods ◽

Machine Learning Applications ◽

Nature Of Time ◽

Qualitative Performance ◽

New Criterion ◽

Simultaneous Assessment

The need for the characterization of real-world signals in terms of their linear, nonlinear, deterministic and stochastic nature is highlighted and a novel framework for signal modality characterization is presented. A comprehensive analysis of signal nonlinearity characterization methods is provided, and based upon local predictability in phase space, a new criterion for qualitative performance assessment in machine learning is introduced. This is achieved based on a simultaneous assessment of nonlinearity and uncertainty within a real-world signal. Next, for a given embedding dimension, based on the target variance of delay vectors, a novel framework for heterogeneous data fusion is introduced. The proposed signal modality characterization framework is verified by comprehensive simulations and comparison against other established methods. Case studies covering a range of machine learning applications support the analysis.

Machine Learning Applications in Real World

International Journal of Hybrid Information Technology ◽

10.21742/ijhit.2019.12.1.04 ◽

2019 ◽

Vol 12 (1) ◽

Keyword(s):

Machine Learning ◽

Real World ◽

Machine Learning Applications

Pharmacoepidemiology and Big Data Analytics: Challenges and Opportunities when Moving towards Precision Medicine

CHIMIA International Journal for Chemistry ◽

10.2533/chimia.2019.1012 ◽

2019 ◽

Vol 73 (12) ◽

pp. 1012-1017

Author(s):

Andrea M. Burden

Keyword(s):

Machine Learning ◽

Language Processing ◽

Real World ◽

Big Data Analytics ◽

Large Size ◽

Challenges And Opportunities ◽

Long Term Follow Up ◽

Machine Learning Applications

Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval. The increased availability and size of healthcare utilization databases allow for the study of rare adverse events, sub-group analyses, and long-term follow-up. These datasets are large, including thousands of patient records spanning multiple years of observation, and representative of real-world clinical practice. Thus, one of their main advantages is the possibility to study the real-world safety and effectiveness of medications in uncontrolled environments. Due to the large size (volume), structure (variety), and availability (velocity) of observational healthcare databases, there is great interest in the application of natural language processing and machine learning, including the development of novel models to detect drug–drug interactions, identify patient phenotypes, and predict outcomes. This report will provide an overview of the current challenges in pharmacoepidemiology and where machine learning applications may be useful for filling the gap.

Evaluation of machine learning applications using real-world EHR data for predicting diabetes-related long-term complications

Journal of Business Analytics ◽

10.1080/2573234x.2021.1979901 ◽

2021 ◽

pp. 1-11

Author(s):

Abu Saleh Mohammad Mosa ◽

Chalermpon Thongmotai ◽

Humayera Islam ◽

Tanmoy Paul ◽

K. S. M. Tozammel Hossain ◽

...

Keyword(s):

Machine Learning ◽

Real World ◽

Machine Learning Applications

Machine learning applications for shock train diagnostics

AIAA Scitech 2021 Forum ◽

10.2514/6.2021-1878 ◽

2021 ◽

Author(s):

Jared Chin ◽

Mirko Gamba

Keyword(s):

Machine Learning ◽

Shock Train ◽

Machine Learning Applications

Exploring the Applications of Machine Learning in Healthcare

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910666191220103417 ◽

2020 ◽

Vol 10 (4) ◽

pp. 458-472

Author(s):

Tausifa Jan Saleem ◽

Mohammad Ahsan Chishti

Keyword(s):

Machine Learning ◽

Disease Risk ◽

Disease Diagnosis ◽

Machine Intelligence ◽

Healthcare Applications ◽

Comprehensive Overview ◽

Machine Learning Applications ◽

Remote Healthcare ◽

Healthcare Monitoring ◽

Applications Of Machine Learning

The rapid progress in domains like machine learning and big data has created plenty of opportunities in data-driven applications, particularly in healthcare. Incorporating machine intelligence in healthcare can result in breakthroughs like precise disease diagnosis, novel methods of treatment, remote healthcare monitoring, drug discovery, and curtailment of healthcare costs. The implementation of machine intelligence algorithms on massive healthcare datasets is computationally expensive. However, consequential progress in computational power during recent years has facilitated the deployment of machine intelligence algorithms in healthcare applications. Motivated to explore these applications, this paper presents a review of research works dedicated to the implementation of machine learning on healthcare datasets. The studies have been categorized into the following groups: (a) disease diagnosis and detection, (b) disease risk prediction, (c) health monitoring, (d) healthcare-related discoveries, and (e) epidemic outbreak prediction. The objective of the research is to help researchers in this field get a comprehensive overview of machine learning applications in healthcare. Apart from revealing the potential of machine learning in healthcare, this paper will serve as a motivation to foster advanced research in the domain of machine intelligence-driven healthcare.

Learning and control

10.1093/oso/9780199674923.003.0026 ◽

2018 ◽

Author(s):

Ivan Herreros

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Brain Function ◽

Control Strategies ◽

Learning Problems ◽

Animal Learning ◽

Feed Forward Control ◽

Machine Learning Applications ◽

And Control

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and later introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the domain of the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, between reinforcement learning and operant conditioning, and between unsupervised and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback vs anticipatory and adaptive control. Finally, it argues that this framework of translating knowledge between formal and biological disciplines can serve not only to structure and advance our understanding of brain function but also to enrich engineering solutions at the level of robot learning and control with insights coming from biology.

Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology

Machine Learning and Knowledge Extraction ◽

10.3390/make3020020 ◽

2021 ◽

Vol 3 (2) ◽

pp. 392-413

Author(s):

Stefan Studer ◽

Thanh Binh Bui ◽

Christian Drescher ◽

Alexander Hanuschkin ◽

Ludwig Winkler ◽

...

Keyword(s):

Machine Learning ◽

Quality Assurance ◽

Process Model ◽

Practical Experience ◽

Special Focus ◽

Close Monitoring ◽

Machine Learning Applications ◽

Project Organizations ◽

Considerable Impact ◽

Learning Development

Machine learning is an established and frequently used technique in industry and academia, but a standard process model to improve success and efficiency of machine learning applications is still missing. Project organizations and machine learning practitioners face manifold challenges and risks when developing machine learning applications and need guidance to meet business expectations. This paper therefore proposes a process model for the development of machine learning applications, covering six phases from defining the scope to maintaining the deployed machine learning application. Business and data understanding are executed simultaneously in the first phase, as both have considerable impact on the feasibility of the project. The next phases comprise data preparation, modeling, evaluation, and deployment. Special focus is applied to the last phase, as a model running in changing real-time environments requires close monitoring and maintenance to reduce the risk of performance degradation over time. With each task of the process, this work proposes a quality assurance methodology that is suitable to address challenges in machine learning development that are identified in the form of risks. The methodology is drawn from practical experience and scientific literature, and has proven to be general and stable. The process model expands on CRISP-DM, a data mining process model that enjoys strong industry support but fails to address machine-learning-specific tasks. The presented work proposes an industry- and application-neutral process model tailored for machine learning applications with a focus on technical tasks for quality assurance.
