Assuring the Machine Learning Lifecycle

Rob Ashmore; Radu Calinescu; Colin Paterson

doi:10.1145/3453444

Assuring the Machine Learning Lifecycle

ACM Computing Surveys ◽

10.1145/3453444 ◽

2021 ◽

Vol 54 (5) ◽

pp. 1-39

Author(s):

Rob Ashmore ◽

Radu Calinescu ◽

Colin Paterson

Keyword(s):

Machine Learning ◽

Iterative Process ◽

State Of The Art ◽

The State ◽

Enabling Technology ◽

Safety Critical ◽

Wide Range ◽

Comprehensive Survey ◽

Intended Use

Machine learning has evolved into an enabling technology for a wide range of highly successful applications. The potential for this success to continue and accelerate has placed machine learning (ML) at the top of research, economic, and political agendas. Such unprecedented interest is fuelled by a vision of ML applicability extending to healthcare, transportation, defence, and other domains of great societal importance. Achieving this vision requires the use of ML in safety-critical applications that demand levels of assurance beyond those needed for current ML applications. Our article provides a comprehensive survey of the state of the art in the assurance of ML, i.e., in the generation of evidence that ML is sufficiently safe for its intended use. The survey covers the methods capable of providing such evidence at different stages of the machine learning lifecycle, i.e., of the complex, iterative process that starts with the collection of the data used to train an ML component for a system, and ends with the deployment of that component within the system. The article begins with a systematic presentation of the ML lifecycle and its stages. We then define assurance desiderata for each stage, review existing methods that contribute to achieving these desiderata, and identify open challenges that require further research.

A Survey of On-Device Machine Learning

ACM Transactions on Internet of Things ◽

10.1145/3450494 ◽

2021 ◽

Vol 2 (3) ◽

pp. 1-49

Author(s):

Sauptik Dhar ◽

Junyao Guo ◽

Jiayi (Jason) Liu ◽

Samarth Tripathi ◽

Unmesh Kurup ◽

...

Keyword(s):

Machine Learning ◽

State Of The Art ◽

The State ◽

Smart Devices ◽

Model Adaptation ◽

Middle Ground ◽

Research Areas ◽

Comprehensive Survey ◽

Constrained Learning ◽

Model Training

The predominant paradigm for using machine learning models on a device is to train a model in the cloud and perform inference using the trained model on the device. However, with increasing numbers of smart devices and improved hardware, there is interest in performing model training on the device. Given this surge in interest, a comprehensive survey of the field from a device-agnostic perspective sets the stage both for understanding the state of the art and for identifying open challenges and future avenues of research. However, on-device learning is an expansive field with connections to a large number of related topics in AI and machine learning (including online learning, model adaptation, one/few-shot learning, etc.). Hence, covering such a large number of topics in a single survey is impractical. This survey finds a middle ground by reformulating the problem of on-device learning as resource-constrained learning, where the resources are compute and memory. This reformulation allows tools, techniques, and algorithms from a wide variety of research areas to be compared equitably. In addition to summarizing the state of the art, the survey also identifies a number of challenges and next steps for both the algorithmic and theoretical aspects of on-device learning.

Density Guarantee on Finding Multiple Subgraphs and Subtensors

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3446668 ◽

2021 ◽

Vol 15 (5) ◽

pp. 1-32

Author(s):

Quang-huy Duong ◽

Heri Ramampiaro ◽

Kjetil Nørvåg ◽

Thu-lan Dam

Keyword(s):

Lower Bound ◽

State Of The Art ◽

The State ◽

The Other ◽

Exact Methods ◽

Practical Solution ◽

Novel Approach ◽

Wide Range ◽

Real World Datasets ◽

Tensor Data

Dense subregion (subgraph and subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions due to the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection, and can perform well in many applications. However, most existing works rely on the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can, on the other hand, estimate multiple subtensors, they can give a guarantee on the density with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both theoretical and practical solutions for estimating multiple dense subtensors in tensor data and giving a higher lower bound on the density. In particular, we guarantee and prove a higher bound of the lower-bound density of the estimated subgraph and subtensors. We also propose a novel approach to show that there are multiple dense subtensors with a guarantee on their density that is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.
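
For readers unfamiliar with the greedy 2-approximation mentioned in this abstract, the following is a minimal sketch of the classic peeling procedure for a single dense subgraph, assuming density is defined as the number of edges divided by the number of vertices. It is not the authors' multiple-subtensor method; the function name `greedy_densest_subgraph` and the edge-list input format are illustrative assumptions.

```python
import heapq
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Greedy peeling: repeatedly delete a minimum-degree vertex and keep
    the densest intermediate vertex set (density = |E| / |V|).

    Assumes an undirected simple graph given as (u, v) pairs whose vertex
    labels are mutually comparable (e.g. all ints), since they sit in a heap.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(ns) for v, ns in adj.items()}
    num_edges = sum(degree.values()) // 2
    alive = set(adj)
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)

    removal_order = []
    best_density = num_edges / len(alive) if alive else 0.0
    best_removed = 0  # how many vertices had been peeled at the best snapshot

    while alive:
        d, v = heapq.heappop(heap)
        if v not in alive or d != degree[v]:
            continue  # stale heap entry left over from an earlier update
        alive.remove(v)
        removal_order.append(v)
        num_edges -= degree[v]  # degree[v] counts only still-alive neighbours
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
        if alive:
            density = num_edges / len(alive)
            if density > best_density:
                best_density = density
                best_removed = len(removal_order)

    best_vertices = set(adj) - set(removal_order[:best_removed])
    return best_vertices, best_density
```

On a 4-clique with one pendant vertex attached, the sketch peels the pendant vertex first and returns the clique. The density it returns is at least half of the optimum, which is the loose guarantee the abstract refers to.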

A Systematic Review of Recommender Systems and Their Applications in Cybersecurity

Sensors ◽

10.3390/s21155248 ◽

2021 ◽

Vol 21 (15) ◽

pp. 5248

Author(s):

Aleksandra Pawlicka ◽

Marek Pawlicki ◽

Rafał Kozik ◽

Ryszard S. Choraś

Keyword(s):

Systematic Review ◽

Recommender Systems ◽

Recommender System ◽

State Of The Art ◽

The State ◽

Advantages And Disadvantages ◽

Comprehensive Survey ◽

Security Concerns ◽

Valuable Role

This paper discusses the valuable role recommender systems may play in cybersecurity. First, a comprehensive overview of recommender system types is given, together with their advantages and disadvantages, possible applications and security concerns. Then, the paper collects and presents the state of the art concerning the use of recommender systems in cybersecurity; both the existing solutions and future ideas are presented. The contribution of this paper is two-fold: to date, to the best of our knowledge, there has been no work collecting the applications of recommenders for cybersecurity. Moreover, this paper attempts to complete a comprehensive survey of recommender types, after noticing that other works usually mention two or three types at once and neglect the others.

Artificial intelligence and machine learning in design of mechanical materials

Materials Horizons ◽

10.1039/d0mh01451f ◽

2021 ◽

Author(s):

Kai Guo ◽

Zhenze Yang ◽

Chi-Hua Yu ◽

Markus J. Buehler

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

State Of The Art ◽

The State

This review revisits the state of the art of research efforts on the design of mechanical materials using machine learning.

The state of the art on buffer allocation problem: a comprehensive survey

Journal of Intelligent Manufacturing ◽

10.1007/s10845-012-0687-9 ◽

2012 ◽

Vol 25 (3) ◽

pp. 371-392 ◽

Cited By ~ 78

Author(s):

Leyla Demir ◽

Semra Tunali ◽

Deniz Tursel Eliiyi

Keyword(s):

State Of The Art ◽

The State ◽

Allocation Problem ◽

Buffer Allocation ◽

Comprehensive Survey

The State of the Art of Rubber-Seal Technology

Rubber Chemistry and Technology ◽

10.5254/1.3536136 ◽

1987 ◽

Vol 60 (3) ◽

pp. 381-416 ◽

Cited By ~ 20

Author(s):

B. S. Nau

Keyword(s):

State Of The Art ◽

Test Procedure ◽

Working Party ◽

The State ◽

Supporting Evidence ◽

Wide Range ◽

Lip Seals ◽

Alternative Hypotheses ◽

Series Of Experiments ◽

Range Of Values

The understanding of the engineering fundamentals of rubber seals of all the various types has been developing gradually over the past two or three decades, but there is still much to understand; Tables V–VII summarize the state of the art. In the case of rubber-based gaskets, the field of high-temperature applications has scarcely been touched, although there are plans to initiate work in this area both in the U.S.A., at PVRC, and in the U.K., at BHRA. In the case of reciprocating rubber seals, a broad basis of theory and experiment has been developed, yet it still is not possible to design such a seal from first principles. Indeed, in a comparative series of experiments run recently on seals from a single batch, tested in different laboratories round the world to the same test procedure, under the aegis of an ISO working party, a very wide range of values was reported for leakage and friction. The explanation for this has still to be ascertained. In the case of rotary lip seals, theories and supporting evidence have been brought forward to support alternative hypotheses for lubrication and sealing mechanisms. None can be said to have become generally accepted, and it remains to crystallize a unified theory.

On the state of the art in machine learning: A personal review

Artificial Intelligence ◽

10.1016/s0004-3702(01)00125-4 ◽

2001 ◽

Vol 131 (1-2) ◽

pp. 199-222 ◽

Cited By ~ 27

Author(s):

Peter A. Flach

Keyword(s):

Machine Learning ◽

State Of The Art ◽

The State

Adversarial Machine Learning: A Multi-Layer Review of the State-of-the-Art and Challenges for Wireless and Mobile Systems

IEEE Communications Surveys & Tutorials ◽

10.1109/comst.2021.3136132 ◽

2021 ◽

pp. 1-1

Author(s):

Jinxin Liu ◽

Michele Nogueira ◽

Johan Fernandes ◽

Burak Kantarci

Keyword(s):

Machine Learning ◽

State Of The Art ◽

The State ◽

Mobile Systems

A Graph-based Evolutionary Algorithm for Automated Machine Learning

10.37686/ser.v1i2.77 ◽

2020 ◽

Author(s):

Fei Qi ◽

Zhaohui Xia ◽

Gaoyang Tang ◽

Hang Yang ◽

Yu Song ◽

...

Keyword(s):

Machine Learning ◽

Evolutionary Algorithm ◽

Parameter Optimization ◽

State Of The Art ◽

The State ◽

Complex Structures ◽

Architecture Evolution ◽

Automated Machine Learning ◽

Art Performance

As an emerging field, Automated Machine Learning (AutoML) aims to reduce or eliminate manual operations that require expertise in machine learning. In this paper, a graph-based architecture is employed to represent flexible combinations of ML models, which provides a larger search space than tree-based and stacking-based architectures. Based on this, an evolutionary algorithm is proposed to search for the best architecture, where the mutation and heredity operators are the key to architecture evolution. With Bayesian hyper-parameter optimization, the proposed approach can automate the workflow of machine learning. On the PMLB dataset, the proposed approach achieves state-of-the-art performance compared with TPOT, Autostacker, and auto-sklearn. Some of the optimized models have complex structures that are difficult to obtain by manual design.

A Network Parameter Database False Data Injection Correction Physics-Based Model: A Machine Learning Synthetic Measurement-Based Approach

Applied Sciences ◽

10.3390/app11178074 ◽

2021 ◽

Vol 11 (17) ◽

pp. 8074

Author(s):

Tierui Zou ◽

Nader Aljohani ◽

Keerthiraj Nagaraj ◽

Sheng Zou ◽

Cody Ruben ◽

...

Keyword(s):

Machine Learning ◽

Power Systems ◽

State Of The Art ◽

Real Life ◽

The State ◽

Wide Area ◽

Network Parameter ◽

False Data Injection ◽

Network Parameters ◽

Injection Attacks

In real-time monitoring of the cyber–physical security of power systems, false data injection attacks on wide-area measurements are of major concern. However, the database of the network parameters is just as crucial to the state estimation process. Maintaining the accuracy of the system model is the other part of the equation, since almost all applications in power systems heavily depend on the state estimator outputs. While much effort has been devoted to false data injection attacks on measurements, little work has been reported on the broad theme of false data injection on the database of network parameters. State-of-the-art physics-based model solutions correct false data injection on the network parameter database considering only available wide-area measurements. In addition, deterministic models are used for correction. In this paper, an overdetermined physics-based parameter false data injection correction model is presented. The overdetermined model uses a parameter database correction Jacobian matrix and a Taylor series expansion approximation. The method further applies the concept of synthetic measurements, which refers to measurements that do not exist in the real-life system. A machine learning linear regression-based model for measurement prediction is integrated in the framework through deriving weights for the creation of synthetic measurements. Validation of the presented model is performed on the IEEE 118-bus system. Numerical results show that the approximation error is lower than that of the state of the art, while providing robustness to the correction process. The model is easy to implement on top of the classical weighted-least-squares solution, which highlights its potential for real-life implementation.
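
As background for the classical weighted-least-squares solution this abstract builds on, here is a minimal sketch of an overdetermined weighted-least-squares estimate. It assumes a linear measurement model z ≈ Hx, whereas power-system state estimation is generally nonlinear and solved iteratively. It is not the authors' Jacobian-based parameter correction model, and the names `solve_linear` and `weighted_least_squares` are illustrative assumptions.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("gain matrix is singular; the state is not observable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def weighted_least_squares(h, z, weights):
    """Return x minimising sum_k weights[k] * (z[k] - h[k] . x) ** 2.

    h is a list of measurement rows, z the measured values, and weights the
    per-measurement weights (typically inverse error variances).
    Solves the normal equations (H^T W H) x = H^T W z.
    """
    n = len(h[0])
    gain = [
        [sum(w * row[i] * row[j] for row, w in zip(h, weights)) for j in range(n)]
        for i in range(n)
    ]
    rhs = [sum(w * row[i] * zk for row, zk, w in zip(h, z, weights)) for i in range(n)]
    return solve_linear(gain, rhs)
```

With more measurements than unknowns, the weights decide how much each measurement pulls on the estimate. In this picture, a synthetic measurement of the kind the abstract describes would enter as an extra row of `h` with its own weight.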
