Towards a scalable, open-standards service for brokering cross-protocol data transfers across multiple sources and sinks

Author(s):  
David Meredith ◽  
Stephen Crouch ◽  
Gerson Galang ◽  
Ming Jiang ◽  
Hung Nguyen ◽  
...  

Data Transfer Service (DTS) is an open-source project that is developing a document-centric message model for describing a bulk data transfer activity, with an accompanying set of loosely coupled and platform-independent components for brokering the transfer of data between a wide range of (potentially incompatible) storage resources as scheduled, fault-tolerant batch jobs. The architecture scales from small embedded deployments on a single computer to large distributed deployments through an expandable ‘worker-node pool’ controlled through message-orientated middleware. Data access and transfer efficiency are maximized through the strategic placement of worker nodes at or between particular data sources/sinks. The design is inherently asynchronous, and, when third-party transfer is not available, it side-steps the bandwidth, concurrency and scalability limitations associated with buffering bytes directly through intermediary client applications. It aims to address geographical–topological deployment concerns by allowing service hosting to be either centralized (as part of a shared service) or confined to a single institution or domain. Established design patterns and open-source components are coupled with a proposal for a document-centric and open-standards-based messaging protocol. As part of the development of the message protocol, a bulk data copy activity document is proposed for the first time.
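The worker-node pool idea in the abstract above can be illustrated with a minimal sketch. It is not the DTS implementation: an in-process `queue.Queue` stands in for the message-orientated middleware, and the job document fields (`id`, `source`, `sink`), the `run_worker_pool` function and the `copy_fn` callback are all hypothetical names chosen for illustration.

```python
import queue
import threading


def run_worker_pool(jobs, copy_fn, n_workers=3):
    """Dispatch copy-job documents to a pool of workers through a shared queue."""
    job_queue = queue.Queue()   # stand-in for the message broker (assumption)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()
            if job is None:                 # sentinel: no more work for this worker
                job_queue.task_done()
                return
            try:
                copy_fn(job["source"], job["sink"])
                status = "done"
            except Exception as exc:        # a failed job is recorded; the pool keeps running
                status = f"failed: {exc}"
            with lock:
                results[job["id"]] = status
            job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        job_queue.put(job)
    for _ in threads:                       # one sentinel per worker
        job_queue.put(None)
    job_queue.join()
    for t in threads:
        t.join()
    return results
```

Because the submitting client only enqueues job documents, the bytes themselves never pass through it; in a real deployment the pool would be grown by attaching more workers to the broker rather than by raising `n_workers`.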

2021 ◽  
Vol 12 ◽  
Author(s):  
Rudolf N. Cardinal ◽  
Martin Burchell

CamCOPS is a free, open-source client–server system for secure data capture in the domain of psychiatry, psychology, and the clinical neurosciences. The client is a cross-platform C++ application, suitable for mobile and offline (disconnected) use. It allows touchscreen data entry by subjects/patients, researchers/clinicians, or both together. It implements a large and extensible range of tasks, from simple questionnaires to complex animated tasks. The client uses encrypted data storage and sends data via an encrypted network connection to a CamCOPS server. Individual institutional users set up and run their own CamCOPS server, so no data is transferred outside the hosting institution's control. The server, written in Python, provides clinically oriented and research-oriented views of tasks, including the tracking of changes over time. It provides an audit trail, export facilities (such as to an institution's primary electronic health record system), and full structured data access subject to authorization. A single CamCOPS server can support multiple research/clinical groups, each having its own identity policy (e.g., fully identifiable for clinical use; de-identified/pseudonymised for research use). Intellectual property rules regarding third-party tasks vary and CamCOPS has several mechanisms to support compliance, including for tasks that may be permitted to some institutions but not others. CamCOPS supports task scheduling and home testing via a simplified user interface. We describe the software, report local information governance approvals within part of the UK National Health Service, and describe illustrative clinical and research uses.


2016 ◽  
Vol 24 (2) ◽  
pp. 398-402 ◽  
Author(s):  
Kavishwar B Wagholikar ◽  
Joshua C Mandel ◽  
Jeffery G Klann ◽  
Nich Wattanasin ◽  
Michael Mendis ◽  
...  

We have developed an interface to serve patient data from Informatics for Integrating Biology and the Bedside (i2b2) repositories in the Fast Healthcare Interoperability Resources (FHIR) format, referred to as a SMART-on-FHIR cell. The cell serves FHIR resources on a per-patient basis, and supports the “substitutable” modular third-party applications (SMART) OAuth2 specification for authorization of client applications. It is implemented as an i2b2 server plug-in, consisting of 6 modules: authentication, REST, i2b2-to-FHIR converter, resource enrichment, query engine, and cache. The source code is freely available as open source. We tested the cell by accessing resources from a test i2b2 installation, demonstrating that a SMART app can be launched from the cell that accesses patient data stored in i2b2. We successfully retrieved demographics, medications, labs, and diagnoses for test patients. The SMART-on-FHIR cell will enable i2b2 sites to provide simplified but secure data access in FHIR format, and will spur innovation and interoperability. Further, it transforms i2b2 into an apps platform.
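As a rough illustration of what an i2b2-to-FHIR converter step might do for demographics, the sketch below maps one patient row to a FHIR Patient resource. It is not the cell's actual code: the row is assumed to be a plain dict keyed by the i2b2 column names `patient_num`, `sex_cd` and `birth_date`, and the function name and gender mapping are assumptions.

```python
from datetime import date

# Assumed mapping from i2b2 sex_cd values to FHIR administrative-gender codes.
_GENDER = {"M": "male", "F": "female"}


def patient_row_to_fhir(row):
    """Convert one (hypothetical) i2b2 patient row into a FHIR Patient resource dict."""
    resource = {
        "resourceType": "Patient",
        "id": str(row["patient_num"]),
        # Any code not in the mapping falls back to the FHIR value "unknown".
        "gender": _GENDER.get(row.get("sex_cd"), "unknown"),
    }
    birth = row.get("birth_date")
    if birth is not None:
        resource["birthDate"] = birth.isoformat()   # FHIR dates use YYYY-MM-DD
    return resource
```

For example, `patient_row_to_fhir({"patient_num": 42, "sex_cd": "F", "birth_date": date(1980, 5, 17)})` yields a resource with `id` "42", `gender` "female" and `birthDate` "1980-05-17".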


Author(s):  
Yatharth Ranjan ◽  
Zulqarnain Rashid ◽  
Callum Stewart ◽  
Maximilian Kerz ◽  
Mark Begale ◽  
...  

BACKGROUND: With a wide range of use cases in both research and clinical domains, collecting continuous mobile health (mHealth) streaming data from multiple sources in a secure, highly scalable and extensible platform is of high interest to the open source mHealth community. The EU IMI RADAR-CNS program is an exemplar project with the requirements to support collection of high resolution data at scale; as such, the RADAR-base platform is designed to meet these needs and additionally facilitate a new generation of mHealth projects in this nascent field. OBJECTIVE: Wide-bandwidth networks, smartphone penetrance and wearable sensors offer new possibilities for collecting (near) real-time high resolution datasets from large numbers of participants. We aimed to build a platform that would cater for large scale data collection for remote monitoring initiatives. Key criteria are scalability, extensibility, security and privacy. METHODS: RADAR-base is developed as a modular application; the backend is built on a backbone of the highly successful Confluent/Apache Kafka framework for streaming data. To facilitate scaling and ease of deployment, we use Docker containers to package the components of the platform. RADAR-base provides two main mobile apps for data collection, a Passive App and an Active App. Other third-party apps and sensors are easily integrated into the platform. Management user interfaces to support data collection and enrolment are also provided. RESULTS: General principles of the platform components and design of RADAR-base are presented here, with examples of the types of data currently being collected from devices used in RADAR-CNS projects: Multiple Sclerosis, Epilepsy and Depression cohorts. CONCLUSIONS: RADAR-base is a fully functional, remote data collection platform built around Confluent/Apache Kafka and provides off-the-shelf components for projects interested in collecting mHealth datasets at scale.


Author(s):  
Shalin Hai-Jew

As a public and open-source resource, Wikipedia is used by many in the public as a quick reference; academic researchers have tapped Wikipedia for human- and machine-based insights. The MediaWiki understructure enables a wide range of transparency, enabling users to very easily search and access template-structured text and image contents, source citations, contributors, page histories, and others. Transparency has been hard-wired into the platform technology. Developers have been building tools to extend the transparency of data built on a MediaWiki understructure. Network Overview, Discovery and Exploration for Excel (NodeXL) features a third-party graph data importer that enables the extraction of MediaWiki article (and user) networks, which include all of the languages of Wikipedias. This chapter highlights the uses of graphs from two main types of Wikipedia pages for increased knowledge transparency: (1) topical article pages and (2) contributor user pages (whether human or robot).


2018 ◽  
Vol 7 (4) ◽  
pp. 47 ◽  
Author(s):  
Daniel Fisher ◽  
Lisa Woodruff ◽  
Saseendran Anapalli ◽  
Srinavasa Pinnamaneni

Agricultural research involves study of the complex soil–plant–atmosphere–water system, and data relating to this system must be collected under often-harsh outdoor conditions in agricultural environments. Rapid advancements in electronic technologies in the last few decades, as well as more recent widespread proliferation and adoption of electronic sensing and communications, have created many options to address the needs of professional, as well as amateur, researchers. In this study, an agricultural research project was undertaken to collect data and examine the effects of different agronomic practices on yield, with the objectives being to develop a monitoring system to measure soil moisture and temperature conditions in field plots and to upload the data to an internet website. The developed system included sensor nodes consisting of sensors and electronic circuitry to read and transmit sensor data via radio and a cellular gateway to receive node data and forward the data to an internet website via cellular infrastructure. Microcontroller programs were written to control the nodes and gateway, and an internet website was configured to receive and display sensor data. The battery-powered sensor nodes cost $170 each, including electronic circuitry and sensors, and they were operated throughout the cropping season with little maintenance on a single set of batteries. The solar-powered gateway cost $163 to fabricate, plus an additional cost of $2 per month for cellular network access. Wireless and cellular data transmissions were reliable, successfully transferring 95% of sensor data to the internet website. Application of open-source hardware, wireless data transfer, and internet-based data access therefore offers many options and advantages for agricultural sensing and monitoring efforts.


GigaScience ◽  
2021 ◽  
Vol 10 (7) ◽  
Author(s):  
Jonas Cordes ◽  
Thomas Enzlein ◽  
Christian Marsching ◽  
Marven Hinze ◽  
Sandy Engelhardt ◽  
...  

Background: Mass spectrometry imaging (MSI) is a label-free analysis method for resolving bio-molecules or pharmaceuticals in the spatial domain. It offers unique perspectives for the examination of entire organs or other tissue specimens. Owing to increasing capabilities of modern MSI devices, the use of 3D and multi-modal MSI becomes feasible in routine applications, resulting in hundreds of gigabytes of data. To fully leverage such MSI acquisitions, interactive tools for 3D image reconstruction, visualization, and analysis are required, which preferably should be open-source to allow scientists to develop custom extensions. Findings: We introduce M2aia (MSI applications for interactive analysis in MITK), a software tool providing interactive and memory-efficient data access and signal processing of multiple large MSI datasets stored in imzML format. M2aia extends MITK, a popular open-source tool in medical image processing. Besides the steps of a typical signal processing workflow, M2aia offers fast visual interaction, image segmentation, deformable 3D image reconstruction, and multi-modal registration. A unique feature is that fused data with individual mass axes can be visualized in a shared coordinate system. We demonstrate features of M2aia by reanalyzing an N-glycan mouse kidney dataset and 3D reconstruction and multi-modal image registration of a lipid and peptide dataset of a mouse brain, which we make publicly available. Conclusions: To our knowledge, M2aia is the first extensible open-source application that enables a fast, user-friendly, and interactive exploration of large datasets. M2aia is applicable to a wide range of MSI analysis tasks.


Author(s):  
Jessica Galo ◽  
Tim Choi ◽  
Maria Kim-Bautista

Introduction: Research increasingly involves linking data from multiple sources, including data collected by researchers. This creates complexity because data providers often have differing policies and requirements for data access. Harmonization of processes requires resources, especially as new data providers are added, and needs to be prioritized appropriately. Objectives: Our objectives were to: 1. understand the challenges encountered by researchers interested in collecting data and/or linking multiple data sets; and 2. outline and evaluate Population Data (PopData) BC’s efforts to harmonize documentation and processes to address these challenges. With this information, we aim to better support research and streamline the data access request process. Approach: We compared data access timelines of projects that did and did not utilize harmonized templates, including consent forms, data access request forms, and research agreements. We then identified the challenges arising from non-harmonized requirements, including their number and complexity, and developed priorities for action. Results: While existing consent form templates provided the ethics board-required language to support the collection of researcher-collected data, they lacked the text requirements of the administrative data stewards/providers. These text deficiencies slow down the data access request process, affect data provider workflow, and can be associated with researcher costs to re-consent. To address these gaps, harmonized consent templates were developed and finalized in November 2017. These templates included the data steward text requirements on governance, data sets, data transfer, data storage, and withdrawal. Non-harmonized data access request forms and research agreements varied in format and detail and resulted in coordination challenges and delays. A harmonized form was developed to capture key information required by all stakeholders. Research agreement harmonization discussions are underway. Impact evaluation is ongoing. Conclusion/Implications: The complexity of multi-stakeholder dataset research need not extend to the data access process. Coordinated requirements and harmonized documentation reduce the burden on all stakeholders, including researchers, ethics boards, and data stewards, and improve project timelines.


Author(s):  
Zhengchun Liu ◽  
Rajkumar Kettimuthu ◽  
Joaquin Chung ◽  
Rachana Ananthakrishnan ◽  
Michael Link ◽  
...  

Modern science and engineering computing environments often feature storage systems of different types, from parallel file systems in high-performance computing centers to object stores operated by cloud providers. To enable easy, reliable, secure, and performant data exchange among these different systems, we propose Connector, a pluggable data access architecture for diverse, distributed storage. By hiding low-level storage system details, this abstraction permits a managed data transfer service (Globus, in our case) to interact with a large and easily extended set of storage systems. Equally important, it supports third-party transfers: that is, direct data transfers from source to destination that are initiated by a third-party client but do not engage that third party in the data path. The abstraction also enables management of transfers for performance optimization, error handling, and end-to-end integrity. We present the Connector design, describe implementations for different storage services, evaluate tradeoffs inherent in managed vs. direct transfers, motivate recommended deployment options, and propose a model-based method that allows for easy characterization of performance in different contexts without exhaustive benchmarking.
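A storage abstraction of the kind described above might look like the following sketch. It does not reproduce the Globus Connector interface: the `StorageConnector` base class, the `InMemoryConnector` backend and the `transfer` function are hypothetical, and SHA-256 is simply one plausible choice for the end-to-end integrity check.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Iterator


class StorageConnector(ABC):
    """Hypothetical uniform interface that hides storage-specific details."""

    @abstractmethod
    def read(self, path: str) -> Iterator[bytes]: ...

    @abstractmethod
    def write(self, path: str, chunks: Iterator[bytes]) -> None: ...


class InMemoryConnector(StorageConnector):
    """Toy backend standing in for a file system or object store."""

    def __init__(self, chunk_size: int = 4):
        self.objects = {}
        self.chunk_size = chunk_size

    def read(self, path):
        data = self.objects[path]
        for i in range(0, len(data), self.chunk_size):
            yield data[i:i + self.chunk_size]

    def write(self, path, chunks):
        self.objects[path] = b"".join(chunks)


def transfer(src, src_path, dst, dst_path):
    """Stream src -> dst, then re-read the destination to verify integrity."""
    sent = hashlib.sha256()

    def tee():
        for chunk in src.read(src_path):
            sent.update(chunk)      # checksum computed on the way out
            yield chunk

    dst.write(dst_path, tee())
    stored = hashlib.sha256()
    for chunk in dst.read(dst_path):
        stored.update(chunk)
    if stored.hexdigest() != sent.hexdigest():
        raise IOError("checksum mismatch after transfer")
    return sent.hexdigest()
```

The transfer logic only sees the two abstract methods, so supporting another storage system in this sketch means adding one subclass and leaving `transfer` untouched.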


Fault Tolerant Reliable Protocol (FTRP) is proposed as a novel routing protocol designed for Wireless Sensor Networks (WSNs). FTRP offers fault-tolerant, reliable packet exchange and support for dynamic network changes. The key concept is logical clustering of nodes. The protocol delegates routing ownership to the cluster heads, where the fault tolerance functionality is implemented. FTRP uses cluster head nodes along with cluster head groups to store packets in transit. In addition, FTRP uses broadcast, which reduces the message overhead compared with classical flooding mechanisms. FTRP manipulates Time to Live values for the various routing messages to control message broadcast. FTRP applies jitter to message transmission to reduce the effect of synchronized node states, which in turn reduces collisions. FTRP performance has been extensively evaluated through simulations against the Ad-hoc On-demand Distance Vector (AODV) and Optimized Link State Routing (OLSR) protocols. Packet Delivery Ratio (PDR), Aggregate Throughput and End-to-End delay (E-2-E) were used as performance metrics. In terms of PDR and aggregate throughput, FTRP is an excellent performer in all mobility scenarios, whether the network is sparse or dense. In stationary scenarios, FTRP performed well in sparse networks; in dense networks, however, its performance degraded, yet remained within an acceptable range. This degradation is attributed to synchronized node states. Reliable message delivery comes at a cost in terms of E-2-E delay. Results show that FTRP is a good performer in all mobility scenarios where the network is sparse. In sparse stationary scenarios FTRP is also a good performer; in dense stationary scenarios, however, FTRP’s E-2-E delay is not acceptable. There are times when receiving a network message is more important than other costs such as energy or delay. That makes FTRP suitable for a wide range of WSN applications, such as military applications that monitor soldiers’ biological data and supplies in the battlefield and support battle damage assessment. FTRP can also be used in health applications, in addition to a wide range of geo-fencing, environmental monitoring, resource monitoring, production line monitoring, agriculture and animal tracking applications. FTRP should be avoided in dense stationary deployments, including but not limited to scenarios where fast application response is critical and lives are at risk, such as biohazard detection or intensive care units.
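The two broadcast controls mentioned in the abstract, Time to Live limits and transmission jitter, can be sketched generically as follows. The abstract does not give FTRP's actual rules, so the message layout (a dict with a `ttl` field), the `schedule_rebroadcast` function and the 50 ms jitter bound are all assumptions made for illustration.

```python
import random


def schedule_rebroadcast(message, now, max_jitter=0.05, rng=random):
    """Return (send_time, forwarded_message), or None once the TTL is exhausted.

    The TTL is decremented on every hop so a broadcast cannot spread forever;
    a random delay in [0, max_jitter] seconds keeps neighbouring nodes from
    retransmitting at the same instant.
    """
    if message["ttl"] <= 1:
        return None                              # drop: hop budget used up
    forwarded = dict(message, ttl=message["ttl"] - 1)
    send_time = now + rng.uniform(0.0, max_jitter)
    return send_time, forwarded
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run repeatable, which helps when comparing collision counts with and without jitter.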

