Toward more user-centric data access solutions: Producing synthetic data of high analytical value by data synthesis

2020 ◽  
Vol 36 (4) ◽  
pp. 1059-1066
Author(s):  
Kenza Sallier

Under the modernization programme that Statistics Canada has recently undertaken, the Agency is to put forward data access solutions that offer greater analytical value to Canadians while maintaining its core value of protecting the confidentiality of respondents' information. One avenue currently being explored is data synthesis as a means of delivering synthetic data with high analytical value to users. At the time of writing, Statistics Canada has publicly released synthetic versions of two different datasets related to census, mortality, and cancer information. In both cases, the synthetic data were generated using the R package synthpop. This paper describes the use of data synthesis as a proof of concept for modernizing Statistics Canada's data access solutions.
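The sequential approach implemented by packages such as synthpop can be pictured with a toy sketch (in Python rather than R, and with invented variable names, not the actual Statistics Canada files): each column is synthesized in turn, drawn from its empirical distribution conditional on the columns already synthesized.

```python
import random

def synthesize(rows, columns, seed=42):
    """Sequentially synthesize a dataset: the first column is sampled from
    its marginal distribution, each later column from its empirical
    distribution conditional on the values synthesized so far."""
    rng = random.Random(seed)
    synthetic = []
    for _ in rows:
        record = {}
        for i, col in enumerate(columns):
            key = tuple(record[c] for c in columns[:i])
            # empirical conditional distribution observed in the real data
            pool = [r[col] for r in rows
                    if tuple(r[c] for c in columns[:i]) == key]
            if not pool:                       # unseen combination:
                pool = [r[col] for r in rows]  # fall back to the marginal
            record[col] = rng.choice(pool)
        synthetic.append(record)
    return synthetic

# toy "real" microdata (hypothetical variables)
real = [{"region": "east", "income": "low"},
        {"region": "east", "income": "mid"},
        {"region": "west", "income": "high"},
        {"region": "west", "income": "high"}]
fake = synthesize(real, ["region", "income"])
```

Real synthesizers fit proper models (synthpop defaults to CART) rather than resampling exact conditional matches, but the column-by-column structure is the same.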

2021 ◽  
Vol 13 (2) ◽  
pp. 24
Author(s):  
Mohammed Amine Bouras ◽  
Qinghua Lu ◽  
Sahraoui Dhelim ◽  
Huansheng Ning

Identity management is a fundamental feature of the Internet of Things (IoT) ecosystem, particularly for IoT data access control. However, most existing work adopts centralized approaches, which can lead to a single point of failure and to the privacy issues tied to the use of a trusted third party. A consortium blockchain is an emerging technology that provides a neutral, trustable computation and storage platform, suitable for building identity management solutions for IoT. This paper proposes a lightweight architecture, and the associated protocols, for consortium blockchain-based identity management that addresses the privacy, security, and scalability issues of centralized systems for IoT. In addition, we implement a proof-of-concept prototype and evaluate our approach by measuring the latency and throughput of transactions under different query actions and payload sizes, and we compare the results with similar works. The results show that the approach is suitable for business adoption.
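The tamper-evident, append-only property that a blockchain ledger brings to identity records can be illustrated with a toy hash chain (this is only a sketch of the general idea, not the paper's protocol, and the field names are invented):

```python
import hashlib
import json

def block_hash(block):
    """Deterministic hash of a block's contents (excluding its own hash)."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()).hexdigest()

def append(chain, record):
    """Append an identity record, linking it to the previous block's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"prev": prev, "record": record}
    block["hash"] = block_hash(block)
    chain.append(block)

def valid(chain):
    """A chain is valid if every block's hash matches its contents and
    every block points at its predecessor's hash."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        prev = chain[i - 1]["hash"] if i else "0" * 64
        if block["prev"] != prev:
            return False
    return True

ledger = []
append(ledger, {"device_id": "sensor-17", "owner": "org-A", "op": "register"})
append(ledger, {"device_id": "sensor-17", "owner": "org-B", "op": "transfer"})
```

Altering any earlier record invalidates every later link, which is what makes a shared ledger auditable by all consortium members without a trusted third party.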


2020 ◽  
Vol 10 (2) ◽  
Author(s):  
Chien-Hung Chien ◽  
Alan Hepburn Welsh ◽  
John D Moore

Enhancing microdata access is one of the strategic priorities for the Australian Bureau of Statistics (ABS) in its transformation program. However, balancing the trade-off between enhancing data access and protecting confidentiality is a delicate act. The ABS could use synthetic data to make its business microdata more accessible for researchers to inform decision making while maintaining confidentiality. This study explores the synthetic data approach for the release and analysis of business data. Australian businesses in some industries are characterised by oligopoly or duopoly. This means that existing microdata protection techniques, such as information reduction or perturbation, may not be as effective as they are for household microdata. The research focuses on the following questions: Can a synthetic data approach enhance microdata access for longitudinal business data? What is the utility-protection trade-off of the synthetic data approach? The study compares confidentialised input and output approaches for protecting confidentiality and analysing Australian microdata from business survey or administrative data sources.


Geophysics ◽  
2017 ◽  
Vol 82 (5) ◽  
pp. W31-W45 ◽  
Author(s):  
Necati Gülünay

The old technology known as f-x deconvolution stands for f-x domain prediction filtering. Early versions of it are known to create signal leakage during their application. There have been recent papers in geophysical publications comparing f-x deconvolution results with newly proposed technologies. These comparisons will be most effective if the best existing f-x deconvolution algorithms are used. This paper describes common f-x deconvolution algorithms and studies the signal leakage occurring during their application on simple models, which will hopefully provide a benchmark for readers in choosing f-x algorithms for comparison. The f-x deconvolution algorithms can be classified by their use of the data, which leads to transient or transient-free matrices and hence windowed or nonwindowed autocorrelations, respectively. They can also be classified by the direction in which they predict: forward design and apply; forward design and apply followed by backward design and apply; forward design and apply followed by application of a conjugated forward filter in the backward direction; and simultaneous forward and backward design and apply, which is known as noncausal filter design. All of the algorithm types mentioned above are tested, and the results of their analysis are provided in this paper on noise-free and noisy synthetic data sets: a single dipping event, a single dipping event with a simple amplitude variation with offset, and three dipping events. Finally, the results of applying the selected algorithms to field data are provided.
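The core idea behind f-x prediction filtering is that, for a single dipping event, each temporal-frequency slice of the data is a complex sinusoid across traces, so a short forward prediction filter captures it almost exactly and the unpredictable residual is noise. The sketch below (an illustrative one-tap forward filter on a synthetic frequency slice, not one of the paper's benchmark algorithms) shows this predictability:

```python
import cmath
import random

def fx_one_tap(slice_):
    """Least-squares one-tap forward prediction filter for one frequency
    slice: minimizes sum over traces of |x[n] - a * x[n-1]|^2."""
    num = sum(x * prev.conjugate() for prev, x in zip(slice_, slice_[1:]))
    den = sum(abs(prev) ** 2 for prev in slice_[:-1])
    return num / den

# One frequency slice of a single dipping event: a constant complex
# phase shift per trace (linear moveout), amplitude 2, plus weak noise.
true_shift = cmath.exp(2j * cmath.pi * 0.08)
clean = [2.0 * true_shift ** n for n in range(40)]
rng = random.Random(0)
noisy = [x + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05)) for x in clean]

a = fx_one_tap(noisy)                            # recovered phase shift
predicted = [a * x for x in noisy[:-1]]          # forward prediction
residual = [x - p for x, p in zip(noisy[1:], predicted)]
```

The residual energy is a small fraction of the signal energy; the signal-leakage question the paper studies is precisely how much coherent energy ends up in that residual for the various filter designs.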


2020 ◽  
Vol 114 (1) ◽  
pp. 124-128

On October 3, 2019, the United States and the United Kingdom reached a bilateral agreement to facilitate more efficient data access between the two countries for law enforcement purposes. The Agreement on Access to Electronic Data for the Purpose of Countering Serious Crime (U.S.-UK Data Access Agreement) was signed by U.S. Attorney General William Barr and UK Home Secretary Priti Patel. This is the first such agreement made by the United States after the passage of the 2018 Clarifying Lawful Overseas Use of Data (CLOUD) Act, which authorizes and structures future bilateral agreements on data sharing. Pursuant to the CLOUD Act, Congress has 180 days following receipt of a notification regarding the U.S.-UK Data Access Agreement to block its entry into force via a joint resolution, which would require a majority vote in both houses of Congress and either presidential signature or a subsequent congressional override of a presidential veto.


2020 ◽  
Author(s):  
Mathieu Turlure ◽  
Marc Schaming ◽  
Alice Fremand ◽  
Marc Grunberg ◽  
Jean Schmittbuhl

The CDGP Repository for Geothermal Data

The Data Center for Deep Geothermal Energy (CDGP – Centre de Données de Géothermie Profonde, https://cdgp.u-strasbg.fr) was launched in 2016 by the LabEx G-EAU-THERMIE PROFONDE (http://labex-geothermie.unistra.fr) to preserve, archive, and distribute data acquired at geothermal sites in Alsace. Since the beginning of the project, specific procedures have been followed to meet international requirements for data management. In particular, the FAIR recommendations are applied so that the distributed data are Findable, Accessible, Interoperable, and Reusable.

Data currently available on the CDGP mainly consist of seismological and hydraulic data acquired at the Soultz-sous-Forêts geothermal pilot plant. Data on the website are gathered in episodes. Episodes 1994, 1995, 1996, and 2010 from Soultz-sous-Forêts have recently been added to those already available on the CDGP (1988, 1991, 1993, 2000, 2003, 2004, and 2005). All data are described with metadata, and interoperability is promoted through the use of open or community-shared data formats: SEED, CSV, PDF, etc. Episodes have DOIs.

To secure the Intellectual Property Rights (IPR) set by the data providers, some of whom come from industry, an Authentication, Authorization and Accounting Infrastructure (AAAI) grants data access depending on the distribution rules and the user's affiliation (academic, industrial, etc.).

The CDGP is also a local node of the European Plate Observing System (EPOS) Anthropogenic Hazards platform (https://tcs.ah-epos.eu). The platform provides an environment and facilities (data, services, software) for research on anthropogenic hazards, especially those related to the exploration and exploitation of geo-resources. Some episodes from Soultz-sous-Forêts are already available there, and the missing ones will soon follow.

The next steps for the CDGP are, first, to complete the data from Soultz-sous-Forêts: some data are still missing and must be recovered from the industrial partners. Then, data from the other geothermal sites in Alsace (Rittershoffen, Illkirch, Vendenheim) need to be collected for distribution. Finally, together with other French data centers, we are on track to apply for CoreTrustSeal certification (ANR Cedre).

The preservation of data can be very challenging and time-consuming. We have had to deal with obsolete tapes and formats, and even incomplete data. Old data are frequently poorly documented, and identifying the owner is sometimes difficult. However, the hard work of retrieving and collecting old geothermal data and making them FAIR is necessary for new analyses and for the valorization of these patrimonial data. The re-use of the data (e.g., Cauchie et al., 2020) demonstrates the importance of the CDGP.
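The affiliation-based gating that such an AAAI performs can be pictured as a simple rule check. The sketch below is a toy model only; the rule structure and field names are invented, not the CDGP's actual policy:

```python
def may_access(dataset_rules, user):
    """Grant access if the user's affiliation appears in the dataset's
    allowed list, or if the dataset is open to everyone ("any")."""
    allowed = dataset_rules.get("allowed_affiliations", [])
    return "any" in allowed or user.get("affiliation") in allowed

# hypothetical rule sets: one IPR-restricted episode, one open episode
restricted_episode = {"allowed_affiliations": ["academic"]}
open_episode = {"allowed_affiliations": ["any"]}
```

In a real AAAI the affiliation claim would come from an authenticated identity provider rather than the request itself, and the decision would be logged for accounting.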


2020 ◽  
Vol 1 (1) ◽  
pp. 18-21
Author(s):  
Sri Handriana Dewi Hastuti

Government work units are to use a one-data policy approach. The purposes of using these data include the use of data for schools, handling data licensing, and managing social assistance; in all cases the data must match the sources held by the Population and Civil Registration Office, so that people no longer hold differing identities. Based on Minister of Home Affairs Regulation No. 61 of 2015, concerning the Requirements, Scope and Procedures for Granting Access Rights to and Utilization of the Population Identification Number, Population Data, and Electronic Resident Identity Cards, permits for data utilization access are granted by the Regent/Mayor. After the permit application is submitted to the Regent/Mayor, a Cooperation Agreement (PKS) is signed. The Regional Apparatus Organization or public service agency then forms a technical team to implement the cooperation, and data access is granted according to need and usage. The user access institution is monitored by the Regent/Mayor through the Department of Population and Civil Registration, and periodic control, supervision, and evaluation are conducted.


Author(s):  
Catherine Bromley

Background with rationale
The Office for Statistics Regulation is the UK's independent regulator of official statistics produced by public sector bodies. The Code of Practice for Statistics sets out our expectations for statistics to be produced in a trustworthy way, be of high quality, and serve the public good by informing answers to society's important questions. We now live in a world of increasingly abundant data. Statistics producers need to adapt to this environment, and so do we as regulators.

Approach
The Code of Practice was updated in 2018 with new provisions to maximise the potential use of data for both citizens and organisations, and to make data available for wider reuse with appropriate safeguards. We have supplemented our commitment to these provisions with a review of data sharing and linking in government, new regulatory guidance on data governance, an increased focus on data access challenges (particularly for users of English health data), and by putting data at the heart of our regulatory vision (published in summer 2019). These steps build on our existing work around admin data quality.

Overview
The National Statistician's response to our data sharing and linkage review included many welcome commitments, and a major review of data linkage methodology is now underway. A data linkage community is developing across government. However, we have raised concerns about ongoing difficulties with admin data sharing between departments, resource constraints, and the limited extent of public engagement about data sharing and use.

Conclusions
Our regulatory approach to data is evolving, and we are building new relationships with organisations with an interest in data beyond the statistics world. Our work to support users in accessing admin data may yet require more direct interventions to bring about the outcomes we desire. We are keen to share our experiences with admin data users.


Author(s):  
Jack Teng ◽  
Kim McGrail ◽  
Colene Bentley ◽  
Michael Burgess ◽  
Kieran O'Doherty

Introduction
The use of linked data for research is increasing, as is the complexity of requests. Rules around access to and use of data necessarily trade off risks to privacy against social benefits. Including informed and civic-minded public recommendations that consider different perspectives on privacy and benefit will improve related policy.

Objectives and Approach
Population Data BC is conducting a deliberative public engagement regarding the use of complex linked data for research. Members of the public will be provided with written materials and hear speakers outlining considerations from multiple perspectives on data access and use, including benefits for health research, risks to privacy, and implications for disability and minority groups. Participants in the deliberation will then discuss questions about the use of linked data, and ideas around principles for that use, in small and large groups, and develop recommendations for data sharing policies.

Results
We will be sharing our preliminary analysis of the public deliberation results at the conference. The public deliberation encourages the participants to develop policy recommendations that respect a diversity of perspectives while negotiating constructive advice. It asks the group to make recommendations and to identify and explore issues on which the group has persistent disagreement. We will discuss insights into how the public values the use of data linkage and under what conditions such use becomes problematic. For example, we hope to gain insight into how publics determine whether a project is in the public interest or, conversely, how a project may pose unacceptable harm.

Conclusion/Implications
Changes in available data and the increasing ability to link data make it essential to include public views in systems of data access governance. Understanding the hopes and concerns of the public regarding the use of linked data for research will help develop data access regulations that reflect wide public interests.


2020 ◽  
Author(s):  
Daniel Nüst ◽  
Eike H. Jürrens ◽  
Benedikt Gräler ◽  
Simon Jirka

Time series data from in-situ measurements are key to many environmental studies. The first challenge in any analysis typically arises when the data need to be imported into the analysis framework. Standardisation is one way to lower this burden. Unfortunately, the relevant interoperability standards can be challenging for non-IT experts unless they are dealt with behind the scenes of a client application. One standard providing access to environmental time series data is the Sensor Observation Service (SOS) specification published by the Open Geospatial Consortium (OGC). SOS instances are currently used in a broad range of applications, such as hydrology, air quality monitoring, and ocean sciences. Data sets provided via an SOS interface can be found around the globe, from Europe to New Zealand.

The R package sos4R (Nüst et al., 2011) is an extension package for the R environment for statistical computing and visualization, which has been demonstrated to be a powerful tool for conducting and communicating geospatial research (cf. Pebesma et al., 2012). sos4R comprises a client that can connect to an SOS server; the user can query data from SOS instances using simple R function calls. It provides a convenience layer that lets R users integrate observation data from data access servers compliant with the SOS standard without any knowledge of the underlying technical standards. To further improve usability for non-SOS experts, a recent update to sos4R includes a set of wrapper functions that remove the complexity and technical language specific to the OGC specifications. This update also specifically supports the OGC SOS 2.0 Hydrology Profile and thereby opens up a new scientific domain.

In our presentation we illustrate use cases and examples built upon sos4R that ease access to time series data in an R and Shiny context. We demonstrate how the abstraction provided by the client library makes sensor observation data more accessible, and we further show how sos4R allows the seamless integration of distributed observation data, i.e., across organisational boundaries, into transparent and reproducible data analysis workflows.

References

Nüst, D., Stasch, C., & Pebesma, E. (2011). Connecting R to the Sensor Web. In Geertman, S., Reinhardt, W., & Toppen, F. (Eds.), Advancing Geoinformation Science for a Changing World. Lecture Notes in Geoinformation and Cartography. Springer.

Pebesma, E., Nüst, D., & Bivand, R. (2012). The R software environment in reproducible geoscientific research. Eos, Transactions American Geophysical Union, 93(16), 163.
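Behind convenience functions like those of sos4R sit plain OGC requests: in the KVP (key-value pair) binding, a GetObservation call is just an HTTP GET with well-known parameters. A rough Python sketch of building such a request (the endpoint, offering, and observed-property identifiers are placeholders, not a real service):

```python
from urllib.parse import urlencode

def get_observation_url(endpoint, offering, observed_property,
                        t_begin, t_end):
    """Build a KVP GetObservation request URL for an SOS 2.0 service."""
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        # filter observations by their phenomenon time interval
        "temporalFilter": f"om:phenomenonTime,{t_begin}/{t_end}",
    }
    return endpoint + "?" + urlencode(params)

url = get_observation_url(
    "https://example.org/sos",   # placeholder endpoint
    "water_level_station_1",     # placeholder offering id
    "WaterLevel",                # placeholder observed property
    "2020-01-01T00:00:00Z", "2020-01-31T23:59:59Z")
```

A client library like sos4R issues a request of this shape (or its XML/SOAP equivalent) and parses the returned observations into data frames, which is exactly the complexity the wrapper functions hide.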

