Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit

GigaScience ◽  
2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Ben Blamey ◽  
Salman Toor ◽  
Martin Dahlö ◽  
Håkan Wieslander ◽  
Philip J Harrison ◽  
...  

Abstract
Background: Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport, and store. We propose a pipeline model, a design pattern for scientific pipelines, in which an incoming stream of scientific data is organized into a tiered or ordered “data hierarchy”. We introduce the HASTE Toolkit, a proof-of-concept cloud-native software toolkit based on this pipeline model, to partition and prioritize data streams so as to optimize the use of limited computing resources.
Findings: In our pipeline model, an “interestingness function” assigns an interestingness score to each data object in the stream, inducing a data hierarchy. From this score, a “policy” guides decisions on how to prioritize computational resource use for a given object. The HASTE Toolkit is a collection of tools for adopting this approach. We evaluate it with 2 microscopy imaging case studies. The first is a high-content screening experiment, where images are analyzed in an on-premise container cloud to prioritize storage and subsequent computation. The second considers edge processing of images for upload into the public cloud for real-time control of a transmission electron microscope.
Conclusions: Through our evaluation, we created smart data pipelines capable of effective use of storage, compute, and network resources, enabling more efficient data-intensive experiments. We note a beneficial separation between the scientific concern of data priority and the implementation of this behaviour for different resources in different deployment contexts. The toolkit allows intelligent prioritization to be “bolted on” to new and existing systems, and is intended for use with a range of technologies in different deployment scenarios.
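The interestingness-function-plus-policy pattern described in the abstract can be sketched in a few lines. This is an illustrative toy, not the HASTE Toolkit's actual API; the function names, the focus-based score, and the tier thresholds are all assumptions made for the example.

```python
# Hypothetical sketch of the pipeline model: an "interestingness function"
# scores each incoming data object, and a "policy" maps that score to a
# tier in the data hierarchy. Names and thresholds are illustrative only.

def interestingness(image_stats: dict) -> float:
    """Toy interestingness function: score in [0, 1] from an image focus metric.
    Assumes a precomputed 'focus' value; higher means sharper, more useful."""
    return min(1.0, max(0.0, image_stats.get("focus", 0.0)))

def policy(score: float) -> str:
    """Map an interestingness score to a storage/compute tier."""
    if score >= 0.8:
        return "tier-1: store full resolution, analyze immediately"
    if score >= 0.4:
        return "tier-2: store compressed, analyze when resources are idle"
    return "tier-3: discard or archive cheaply"

# A toy stream of image metadata objects, partitioned online by content.
stream = [{"focus": 0.95}, {"focus": 0.55}, {"focus": 0.1}]
tiers = [policy(interestingness(obj)) for obj in stream]
for obj, tier in zip(stream, tiers):
    print(obj, "->", tier)
```

The key property the abstract emphasizes is that the partitioning is driven by data content at stream time, not by metadata known in advance, so the same policy can be reused across deployments while only the interestingness function changes.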

2020 ◽  
Author(s):  
Ben Blamey ◽  
Salman Toor ◽  
Martin Dahlö ◽  
Håkan Wieslander ◽  
Philip J Harrison ◽  
...  

Abstract: This paper introduces the HASTE Toolkit, a cloud-native software toolkit capable of partitioning data streams in order to prioritize usage of limited resources. This in turn enables more efficient data-intensive experiments. We propose a model that introduces automated, autonomous decision making in data pipelines, such that a stream of data can be partitioned into a tiered or ordered data hierarchy. Importantly, the partitioning is online and based on data content rather than a priori metadata. At the core of the model are interestingness functions and policies. Interestingness functions assign a quantitative measure of interestingness to a single data object in the stream, an interestingness score. Based on this score, a policy guides decisions on how to prioritize computational resource usage for a given object. The HASTE Toolkit is a collection of tools to adapt data stream processing to this pipeline model. The result is smart data pipelines capable of effective, or even optimal, use of, e.g., storage, compute, and network bandwidth, to support experiments involving rapid processing of scientific data characterized by large individual data object sizes. We demonstrate the proposed model and our toolkit through two microscopy imaging case studies, each with its own interestingness functions, policies, and data hierarchies. The first deals with a high-content screening experiment, where images are analyzed in an on-premise container cloud with the goal of prioritizing the images for storage and subsequent computation. The second considers edge processing of images for upload into the public cloud for a real-time control loop for a transmission electron microscope.
Key Points
- We propose a pipeline model for building intelligent pipelines for streams, accounting for the actual information content of data rather than a priori metadata, and present the HASTE Toolkit, a cloud-native software toolkit supporting rapid development according to the proposed model.
- We demonstrate how the HASTE Toolkit enables intelligent resource optimization in two image analysis case studies based on (a) high-content imaging and (b) transmission electron microscopy.
- We highlight the challenges of storage, processing, and transfer of streamed high-volume, high-velocity scientific data for both cloud and cloud-edge use cases.


1995 ◽  
Vol 34 (05) ◽  
pp. 475-488
Author(s):  
B. Seroussi ◽  
J. F. Boisvieux ◽  
V. Morice

Abstract: The monitoring and treatment of patients in a care unit is a complex task in which even the most experienced clinicians can make errors. A hemato-oncology department in which patients undergo chemotherapy asked for a computerized system able to provide intelligent and continuous support in this task. One issue in building such a system is the definition of a control architecture able to manage, in real time, a treatment plan containing prescriptions and protocols in which temporal constraints are expressed in various ways, that is, one which supervises the treatment, including controlling the timely execution of prescriptions and suggesting modifications to the plan according to the patient's evolving condition. The system built to address these issues, called SEPIA, has to manage the dynamic processes involved in patient care. Its role is to generate, in real time, commands for the patient's care (execution of tests, administration of drugs) from a plan, and to monitor the patient's state so that it may propose actions updating the plan. The necessity of an explicit time representation is shown. We propose using a linear time structure, precise and absolute towards the past, and open towards the future with imprecise and relative dates. Temporal relative scales are introduced to facilitate knowledge representation and access.
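The time structure described in the abstract, a precise, absolute past and an open future with imprecise, relative dates, can be sketched as two event types. This is an illustrative assumption about how such a representation might look, not SEPIA's actual implementation; all class and field names are hypothetical.

```python
# Illustrative sketch of a linear time structure: past events have precise
# absolute dates; future events are anchored relative to other events, with
# an imprecise admissible window rather than a fixed date.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class PastEvent:
    label: str
    at: datetime            # precise, absolute date

@dataclass
class FutureEvent:
    label: str
    after: str              # label of the reference event (relative anchoring)
    earliest: timedelta     # imprecise window: earliest admissible offset
    latest: timedelta       # imprecise window: latest admissible offset

# Executed part of the plan: precise history.
history = [PastEvent("drug_A_given", datetime(1995, 3, 2, 10, 30))]
# Pending part of the plan: a blood test 4 to 8 hours after the drug.
plan = [FutureEvent("blood_test", after="drug_A_given",
                    earliest=timedelta(hours=4), latest=timedelta(hours=8))]

def admissible_window(ev: FutureEvent, history: list) -> tuple:
    """Resolve a relative future event against the precise past."""
    anchor = next(p for p in history if p.label == ev.after)
    return (anchor.at + ev.earliest, anchor.at + ev.latest)

print(admissible_window(plan[0], history))
```

Once an event is executed it moves from the plan into the history with a precise date, and the windows of the events anchored on it can then be resolved, which matches the asymmetry the abstract describes.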


2007 ◽  
Vol 73 (12) ◽  
pp. 1369-1374
Author(s):  
Hiromi SATO ◽  
Yuichiro MORIKUNI ◽  
Kiyotaka KATO

Author(s):  
Vladimir V. NEKRASOV

Developing a microcontroller-based system for controlling the flywheel motor of a high-dynamics spacecraft using Russian-made parts and components made it possible to state the problem of finding a control function for a preset rotation rate of the flywheel rotor. This paper discusses one possible approach to the mathematical study of the stated problem, namely, structural analysis based on graph theory. Within the framework of the stated problem, a graph was constructed for generating the new required rate, and, to cover the stochastic case, the incidence and adjacency matrices were constructed. The stated problem was solved using a power matrix that transforms the set of adjacency matrices of the graph of admissible solution edge sequences, and the real-time control function was found. Based on the results of this work, operational trials were run for the developed control function of the flywheel motor rotor rotation rate, a mathematical model was constructed for the real-time control function, and conclusions were drawn about the feasibility of implementing the results of this study. Key words: control function, graph, incidence matrix, adjacency matrix, power matrix, microcontroller control of the flywheel motor, highly dynamic spacecraft.
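The matrix machinery mentioned in the abstract can be illustrated on a small directed graph: the incidence and adjacency matrices encode its structure, and the k-th power of the adjacency matrix counts the edge sequences (walks) of length k between node pairs, a standard graph-theoretic fact. The graph below is a toy example, not the paper's control-function graph.

```python
# Toy sketch: build adjacency and incidence matrices for a small directed
# graph, then use a power of the adjacency matrix, whose (i, j) entry
# counts the length-k edge sequences (walks) from node i to node j.

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_pow(a, k):
    """k-th power of a square matrix, k >= 1."""
    result = a
    for _ in range(k - 1):
        result = mat_mul(result, a)
    return result

# Illustrative directed graph on 3 nodes.
edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
n = 3

adjacency = [[0] * n for _ in range(n)]
for u, v in edges:
    adjacency[u][v] = 1

# Incidence matrix: rows = nodes, columns = edges;
# -1 where the edge leaves the node, +1 where it enters.
incidence = [[0] * len(edges) for _ in range(n)]
for j, (u, v) in enumerate(edges):
    incidence[u][j] = -1
    incidence[v][j] = 1

walks_len2 = mat_pow(adjacency, 2)
print(walks_len2)  # entry [i][j] counts length-2 walks from i to j
```

For this graph, entry [0][0] of the squared adjacency matrix is 1, corresponding to the single length-2 walk 0 -> 2 -> 0; enumerating admissible edge sequences in this way is the role the abstract attributes to powers of the graph's matrices.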

