Dynamic Data Citation Service—Subset Tool for Operational Data Management

Data ◽  
2019 ◽  
Vol 4 (3) ◽  
pp. 115 ◽  
Author(s):  
Schubert ◽  
Seyerl ◽  
Sack

In earth observation and the climatological sciences, data and the services built on them grow daily and across large spatial extents, driven by the high coverage rates of satellite sensors and model calculations as well as by continuous meteorological in situ observations. In order to reuse such data, and especially data fragments and their data services, in a collaborative and reproducible manner by citing the original source, data analysts, e.g., researchers or impact modelers, need a way to identify the exact version, precise time information, parameters, and names of the dataset used. A manual process would make citing data fragments as subsets of an entire dataset complex and imprecise. Data in climate research are in most cases multidimensional, structured grid data that can change partially over time. Citing such evolving content requires the approach of “dynamic data citation”. The applied approach is based on associating queries with persistent identifiers. These queries contain the subsetting parameters, e.g., the spatial coordinates of the desired study area or the time frame with a start and end date, which are automatically included in the metadata of the newly generated subset and thus capture the data history, i.e., the data provenance, which has to be established in data repository ecosystems. The Research Data Alliance Data Citation Working Group (RDA Data Citation WG) summarized the scientific status quo as well as the state of the art of existing citation and data management concepts and developed a scalable dynamic data citation methodology for evolving data. The Data Centre at the Climate Change Centre Austria (CCCA) has implemented the given recommendations and has offered an operational dynamic data citation service for climate scenario data since 2017.
Aware that this objective depends on bibliographic citation research that is still under discussion, the CCCA Dynamic Data Citation service focuses on climate-domain-specific issues such as data characteristics, formats, software environments, and usage behavior. Beyond sharing the experience gained so far, the current effort targets the scalability of the implementation, e.g., towards the potential of an Open Data Cube solution.
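The core mechanism described above, assigning a persistent identifier (PID) to a normalized, timestamped subsetting query, can be sketched in a few lines. The function and field names below are illustrative assumptions, not the CCCA implementation:

```python
import hashlib
import json
from datetime import datetime, timezone

def register_subset_query(dataset_id, bbox, start, end, store):
    """Associate a subsetting query with a persistent identifier (PID).

    The query parameters (study-area bounding box and time frame) are
    serialized in a canonical form and hashed, so re-executing the same
    query always resolves to the same PID, in the spirit of the RDA
    Data Citation WG recommendations.
    """
    query = {
        "dataset": dataset_id,
        "bbox": list(bbox),    # (min_lon, min_lat, max_lon, max_lat)
        "time": [start, end],  # ISO 8601 start and end dates
    }
    # Sorted keys give a stable serialization for hashing.
    canonical = json.dumps(query, sort_keys=True)
    query_hash = hashlib.sha256(canonical.encode()).hexdigest()[:16]
    pid = f"pid:{dataset_id}:{query_hash}"
    # The stored record is the provenance metadata that is embedded
    # in the metadata of the newly generated subset.
    store[pid] = {"query": query,
                  "issued": datetime.now(timezone.utc).isoformat()}
    return pid
```

Because the identifier is derived from the canonical query rather than from the extracted bytes, re-running the same subsetting request resolves to the same citation target.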

GigaScience ◽  
2020 ◽  
Vol 9 (10) ◽  
Author(s):  
Daniel Arend ◽  
Patrick König ◽  
Astrid Junker ◽  
Uwe Scholz ◽  
Matthias Lange

Abstract Background The FAIR data principle as a commitment to support long-term research data management is widely accepted in the scientific community. Although the ELIXIR Core Data Resources and other established infrastructures provide comprehensive and long-term stable services and platforms for FAIR data management, a large quantity of research data is still hidden or at risk of getting lost. Currently, high-throughput plant genomics and phenomics technologies are producing research data in abundance, the storage of which is not covered by established core databases. This concerns the data volume, e.g., time series of images or high-resolution hyperspectral data; the quality of data formatting and annotation, e.g., with regard to the structure and annotation specifications of core databases; uncovered data domains; or organizational constraints prohibiting primary data storage outside institutional boundaries. Results To share these potentially dark data in a FAIR way and to master these challenges, the ELIXIR Germany/de.NBI service Plant Genomics and Phenomics Research Data Repository (PGP) implements a “bring the infrastructure to the data” approach, which allows research data to be kept in place and wrapped in a FAIR-aware software infrastructure. This article presents new features of the e!DAL infrastructure software and the PGP repository as a best practice on how to easily set up FAIR-compliant and intuitive research data services. Furthermore, the integration of the ELIXIR Authentication and Authorization Infrastructure (AAI) and data discovery services are introduced as means to lower technical barriers and to increase the visibility of research data. Conclusion The e!DAL software has matured into a powerful and FAIR-compliant infrastructure while keeping the focus on flexible setup and integration into existing infrastructures and into the daily research process.


Author(s):  
A. V. Vo ◽  
D. F. Laefer ◽  
M. Trifkovic ◽  
C. N. L. Hewage ◽  
M. Bertolotto ◽  
...  

Abstract. The massive amounts of spatio-temporal information often present in LiDAR data sets make their storage, processing, and visualisation computationally demanding. There is an increasing need for systems and tools that support the spatial and temporal components and the three-dimensional nature of these datasets for effortless retrieval and visualisation. In response to these needs, this paper presents a scalable, distributed database system designed explicitly for retrieving and viewing large LiDAR datasets on the web. The ultimate goal of the system is to provide rapid and convenient access to a large repository of LiDAR data hosted on a distributed computing platform. The system is composed of multiple shared-nothing nodes operating in parallel: each node is autonomous, with a dedicated set of processors and memory, and the nodes communicate with each other via an interconnected network. The data management system presented in this paper is implemented on top of Apache HBase, a distributed key-value datastore within the Hadoop ecosystem. HBase is extended with new data encoding and indexing mechanisms to accommodate both the point cloud and the full-waveform components of LiDAR data. The data can be consumed by any desktop or web application that communicates with the data repository over HTTP; the communication is enabled by a web servlet. In addition to the command line tool used for administration tasks, two web applications are presented to illustrate the types of user-facing applications that can be coupled with the data system.
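The kind of row-key encoding such a key-value extension needs can be illustrated with a Morton (Z-order) code, which interleaves quantized coordinates so that lexicographically sorted keys preserve spatial locality across HBase regions. This is a generic sketch of the technique, not the paper's actual encoding scheme:

```python
def morton3d(x, y, z, bits=21):
    """Interleave the bits of quantized x, y, z coordinates into a
    single Morton (Z-order) code; 3 * 21 bits fit in a 64-bit key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)      # x bits at positions 0, 3, 6, ...
        code |= ((y >> i) & 1) << (3 * i + 1)  # y bits at positions 1, 4, 7, ...
        code |= ((z >> i) & 1) << (3 * i + 2)  # z bits at positions 2, 5, 8, ...
    return code

def row_key(x, y, z, t):
    """Illustrative row key: the Morton-code prefix keeps spatially
    nearby points in nearby regions, and a time suffix lets a spatial
    range scan further filter by acquisition time."""
    return morton3d(x, y, z).to_bytes(8, "big") + t.to_bytes(4, "big")
```

A range scan over a bounding box then becomes a small number of contiguous key ranges rather than a full-table scan.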


2020 ◽  
Vol 12 (1) ◽  
pp. 611-628 ◽  
Author(s):  
Michel M. Verstraete ◽  
Linda A. Hunt ◽  
Hugo De Lemos ◽  
Larry Di Girolamo

Abstract. The Multi-angle Imaging SpectroRadiometer (MISR) is one of the five instruments hosted on board the NASA Terra platform, launched on 18 December 1999. This instrument has been operational since 24 February 2000 and is still acquiring Earth observation data as of this writing. The primary mission of the MISR is to document the state and properties of the atmosphere, in particular the clouds and aerosols it contains, as well as the planetary surface, on the basis of 36 data channels collectively gathered by its nine cameras (pointing in different directions along the orbital track) in four spectral bands (blue, green, red and near-infrared). The radiometric camera-by-camera cloud mask (RCCM) is derived from the calibrated measurements at the nominal top of the atmosphere and is provided separately for each of the nine cameras. This RCCM data product is permanently archived at the NASA Atmospheric Science Data Center (ASDC) in Hampton, VA, USA, and is openly accessible (Diner et al., 1999b, and https://doi.org/10.5067/Terra/MISR/MIRCCM_L2.004). For various technical reasons described in this paper, this RCCM product exhibits missing data, even though an estimate of the clear or cloudy status of the environment at each individual observed location can be deduced from the available measurements. The aims of this paper are (1) to describe how to replace over 99 % of the missing values by estimates and (2) to briefly describe the software to replace missing RCCM values, which is openly available to the community from the GitHub website, https://github.com/mmverstraete/MISR_RCCM/ (last access: 12 March 2020), or https://doi.org/10.5281/ZENODO.3240017 (Verstraete, 2019e). Two additional sets of resources are also made available on the research data repository of GFZ Data Services in conjunction with this paper. 
The first set (A; Verstraete et al., 2020; https://doi.org/10.5880/fidgeo.2020.004) includes three items: (A1) a compressed archive, RCCM_Out.zip, containing all intermediary, final and ancillary outputs created while generating the figures of this paper; (A2) a user manual, RCCM_Out.pdf, describing how to install, uncompress and explore those files; and (A3) a separate input MISR data archive, RCCM_input_68050.zip, for Path 168, Orbit 68050. This latter archive is usable with (B), the second set (Verstraete and Vogt, 2020; https://doi.org/10.5880/fidgeo.2020.008), which includes (B1), a stand-alone, self-contained, executable version of the RCCM correction codes, RCCM_Soft_Win.zip, using the IDL Virtual Machine technology that does not require a paid IDL license, as well as (B2), a user manual, RCCM_Soft_Win.pdf, to explain how to install, uncompress and use this software.
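As a rough illustration of the idea of estimating missing mask values from the available measurements, the sketch below fills missing cells with the most frequent valid value among the surrounding pixels. This is a generic neighborhood fill under stated assumptions, not the algorithm of Verstraete et al.; the mask codes and the missing-value flag are placeholders:

```python
import numpy as np

def fill_missing_rccm(mask, missing=0, passes=3):
    """Replace cells flagged as missing with the most frequent valid
    value among the up-to-8 surrounding pixels, iterating a few passes
    so larger holes fill inward from their edges."""
    filled = mask.copy()
    for _ in range(passes):
        todo = np.argwhere(filled == missing)
        if todo.size == 0:
            break
        for r, c in todo:
            # Clip the 3x3 window at the array edges.
            window = filled[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            valid = window[window != missing]
            if valid.size:
                values, counts = np.unique(valid, return_counts=True)
                filled[r, c] = values[np.argmax(counts)]
    return filled
```

A majority vote over neighbors is only one plausible estimator; the paper's own method draws on the underlying radiometric measurements rather than on the mask alone.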


2018 ◽  
Author(s):  
Kevin R Glover ◽  
Rachel R Vitoux ◽  
Catherine Schuster ◽  
Christopher R Curtin

BACKGROUND The number of distinct alarms produced by medical devices has increased from 6 to 40 over the last three decades, with today’s most critically ill patients experiencing as many as 45 alarms per hour. Alarm fatigue has been identified as a critical safety issue for clinical staff that can lead to potentially dangerous delays or nonresponse to actionable alarms, resulting in serious patient injury and death. To date, most research on medical device alarms has focused on the nonactionable alarms of physiological monitoring devices. While there have been some reports in the literature related to drug library alerts during the infusion pump programming sequence, research related to the types and frequencies of actionable infusion pump alarms remains largely unexplored. OBJECTIVE The objective of this study protocol is to establish baseline data on the types and frequency of infusion pump alarms from the B. Braun Outlook 400ES Safety Infusion System with the accompanying DoseTrac Infusion Management Software. METHODS The most recent consecutive 60-day period of backup hospital data received between April 2014 and February 2017 from 32 United States-based hospitals will be selected for analysis. Microsoft SQL Server (2012 - 11.0.5343.0 X64) will be used to manage the data, with unique code written to sort the data and perform descriptive analyses. A validated data management methodology will be used to clean and analyze the data. Data management procedures will include blinding, cleaning, and review of existing infusion data within the DoseTrac Infusion Management Software databases at each hospital. Patient-identifying data will be removed prior to merging into a dedicated and secure data repository. The pooled data will then be analyzed. 
RESULTS This exploratory study will analyze the aggregate alarm data for each hospital by care area, drug infused, time of day, and day of week, including: overall infusion pump alarm frequency (number of alarms per active infusion), duration of alarms (average, range, median), and type and frequency of alarms distributed by care area. CONCLUSIONS Infusion pump alarm data collected and analyzed in this study will be used to help establish a baseline of infusion pump alarm types and relative frequencies. Understanding the incidences and characteristics of infusion pump alarms will result in more informed quality improvement recommendations to decrease and/or modify infusion pump alarms, and potentially reduce clinical staff alarm fatigue and improve patient safety.  REGISTERED REPORT IDENTIFIER RR1-10.2196/10446
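The descriptive analyses described in the protocol amount to grouped aggregations over the pooled alarm records. The sketch below shows the shape of such a query, using SQLite for self-containment; the study itself uses Microsoft SQL Server, and the table and column names are illustrative, not the DoseTrac schema:

```python
import sqlite3

# Build a tiny in-memory stand-in for the pooled, de-identified
# alarm repository. The rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE alarms (care_area TEXT, alarm_type TEXT,
                     duration_sec REAL, occurred_at TEXT);
INSERT INTO alarms VALUES
  ('ICU',     'occlusion',   45.0, '2016-05-01 03:12:00'),
  ('ICU',     'air-in-line', 12.0, '2016-05-01 04:40:00'),
  ('MedSurg', 'occlusion',   30.0, '2016-05-01 10:05:00');
""")

# Alarm count and mean duration per care area, the kind of
# descriptive breakdown the study plans to report.
rows = conn.execute("""
SELECT care_area,
       COUNT(*)          AS n_alarms,
       AVG(duration_sec) AS mean_duration
FROM alarms
GROUP BY care_area
ORDER BY n_alarms DESC
""").fetchall()
```

Breakdowns by drug infused, time of day, or day of week follow the same pattern with different `GROUP BY` expressions.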


2012 ◽  
Vol 7 (1) ◽  
pp. 107-113 ◽  
Author(s):  
Sarah Callaghan ◽  
Steve Donegan ◽  
Sam Pepler ◽  
Mark Thorley ◽  
Nathan Cunningham ◽  
...  

The NERC Science Information Strategy Data Citation and Publication project aims to develop and formalise a method for citing and publishing the datasets stored in its environmental data centres. It is believed that this will act as an incentive for scientists, who often invest a great deal of effort in creating datasets, to submit their data to a suitable data repository where they can be properly archived and curated. Data citation and publication will also provide a mechanism for data producers to receive credit for their work, thereby encouraging them to share their data more freely.


Author(s):  
Denise D. Krause

Background: There are a variety of challenges to health workforce planning, but access to data is critical for effective evidence-based decision-making. Many agencies and organizations throughout Mississippi have been collecting quality health data for many years. Those data have historically resided in data silos and have not been readily shared. A strategy was developed to build and coordinate infrastructure, capacity, tools, and resources to facilitate health workforce and population health planning throughout the state.
Objective: Recognizing data as the foundation upon which to build, the primary objective was to develop the capacity to collect, store, maintain, visualize, and analyze data from a variety of disparate sources, with the ultimate goal of improving access to health care. Specific aims were to: 1) build a centralized data repository and scalable informatics platform; 2) develop a data management solution for this platform; and 3) derive value from this platform by facilitating data visualization and analysis.
Methods: We designed and constructed a managed data lake for health data from disparate sources throughout the state of Mississippi. A data management application was developed to log and track all data sources, maps and geographies, and data marts. With this informatics platform as a foundation, we use a variety of tools to visualize and analyze data.
Results: Samples of data visualizations that aim to inform health planners and policymakers are presented. Many agencies and organizations throughout the state benefit from this platform.
Conclusion: The overarching goal is that by providing timely, reliable information to stakeholders, Mississippians in general will experience improved access to quality care.
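The source-tracking role of the data management application can be illustrated with a minimal registry that logs each dataset's origin, geography, and refresh date, and flags stale sources. All class and field names here are illustrative assumptions, not the platform's actual schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DataSource:
    """One logged entry in the managed data lake's source catalog."""
    name: str
    provider: str
    geography: str      # e.g. county, ZIP code, census tract
    last_refreshed: date

class SourceRegistry:
    def __init__(self):
        self._sources = {}

    def register(self, src: DataSource):
        # Registering under the source name makes re-registration an
        # update, so the catalog always reflects the latest refresh.
        self._sources[src.name] = src

    def stale(self, as_of: date, max_age_days: int = 365):
        """Sources not refreshed within max_age_days of as_of, i.e.
        candidates for follow-up with the providing agency."""
        return [s for s in self._sources.values()
                if (as_of - s.last_refreshed).days > max_age_days]
```

Tracking freshness per source is one way a central platform can surface which of the contributing agencies' feeds need attention before analysis.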


Author(s):  
Adolphe Ayissi Eteme ◽  
Justin Moskolai Ngossaha

The use of information technology in council management has resulted in the generation of a large amount of data through various autonomous urban bodies. The relevant bodies barely or never reuse such locally-generated data. This may be due particularly to managers', policy makers' and users' lack of awareness of existing information. The Platform for the Integration and Interoperability of the Yaounde Urban Information Systems (YUSIIP) project seeks to reduce this deficit by establishing a federated operational platform of heterogeneous and distributed data systems based on a distributed data repository. The position developed in this paper is that Master Data Management (MDM) will contribute to achieving this objective in a context marked by the dispersion and duplication of data and diversity of information systems.


Author(s):  
Arun Jagatheesan ◽  
Reagan Moore ◽  
Norman W. Paton ◽  
Paul Watson
