Toward more user-centric data access solutions: Producing synthetic data of high analytical value by data synthesis

2020 ◽  
Vol 36 (4) ◽  
pp. 1059-1066
Author(s):  
Kenza Sallier

Under the modernization programme that Statistics Canada has recently undertaken, the Agency is to put forward data access solutions that offer greater analytical value to Canadians while maintaining its core value of protecting the confidentiality of respondents' information. One avenue currently being explored is data synthesis as a means of delivering synthetic data with high analytical value to users. At the time of writing, Statistics Canada has publicly released synthetic versions of two different datasets related to census, mortality, and cancer information. In both cases, the synthetic data were generated using the R package synthpop. This paper describes the use of data synthesis as a proof of concept for modernizing Statistics Canada's data access solutions.
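The sequential approach implemented by packages such as synthpop can be pictured with a toy sketch (in Python rather than R, and with invented variable names, not the actual Statistics Canada files): each column is synthesized in turn, drawn from its empirical distribution conditional on the columns already synthesized.

```python
import random

def synthesize(rows, columns, seed=42):
    """Sequentially synthesize a dataset: the first column is sampled from
    its marginal distribution, each later column from its empirical
    distribution conditional on the values synthesized so far."""
    rng = random.Random(seed)
    synthetic = []
    for _ in rows:
        record = {}
        for i, col in enumerate(columns):
            key = tuple(record[c] for c in columns[:i])
            # empirical conditional distribution observed in the real data
            pool = [r[col] for r in rows
                    if tuple(r[c] for c in columns[:i]) == key]
            if not pool:                       # unseen combination:
                pool = [r[col] for r in rows]  # fall back to the marginal
            record[col] = rng.choice(pool)
        synthetic.append(record)
    return synthetic

# toy "real" microdata (hypothetical variables)
real = [{"region": "east", "income": "low"},
        {"region": "east", "income": "mid"},
        {"region": "west", "income": "high"},
        {"region": "west", "income": "high"}]
fake = synthesize(real, ["region", "income"])
```

Real synthesizers fit proper models (synthpop defaults to CART) rather than resampling exact conditional matches, but the column-by-column structure is the same.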

2021 ◽  
Vol 13 (2) ◽  
pp. 24
Author(s):  
Mohammed Amine Bouras ◽  
Qinghua Lu ◽  
Sahraoui Dhelim ◽  
Huansheng Ning

Identity management is a fundamental feature of the Internet of Things (IoT) ecosystem, particularly for IoT data access control. However, most existing work adopts centralized approaches, which can lead to a single point of failure and to the privacy issues tied to the use of a trusted third party. A consortium blockchain is an emerging technology that provides a neutral, trustable computation and storage platform, suitable for building identity management solutions for IoT. This paper proposes a lightweight architecture, and the associated protocols, for consortium blockchain-based identity management that addresses the privacy, security, and scalability issues of centralized systems for IoT. In addition, we implement a proof-of-concept prototype and evaluate our approach by measuring the latency and throughput of transactions under different query actions and payload sizes, and we compare the results with similar works. The results show that the approach is suitable for business adoption.
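The tamper-evident, append-only property that a blockchain ledger brings to identity records can be illustrated with a toy hash chain (this is only a sketch of the general idea, not the paper's protocol, and the field names are invented):

```python
import hashlib
import json

def block_hash(block):
    """Deterministic hash of a block's contents (excluding its own hash)."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()).hexdigest()

def append(chain, record):
    """Append an identity record, linking it to the previous block's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"prev": prev, "record": record}
    block["hash"] = block_hash(block)
    chain.append(block)

def valid(chain):
    """A chain is valid if every block's hash matches its contents and
    every block points at its predecessor's hash."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        prev = chain[i - 1]["hash"] if i else "0" * 64
        if block["prev"] != prev:
            return False
    return True

ledger = []
append(ledger, {"device_id": "sensor-17", "owner": "org-A", "op": "register"})
append(ledger, {"device_id": "sensor-17", "owner": "org-B", "op": "transfer"})
```

Altering any earlier record invalidates every later link, which is what makes a shared ledger auditable by all consortium members without a trusted third party.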


2020 ◽  
Vol 10 (2) ◽  
Author(s):  
Chien-Hung Chien ◽  
Alan Hepburn Welsh ◽  
John D Moore

Enhancing microdata access is one of the strategic priorities for the Australian Bureau of Statistics (ABS) in its transformation program. However, balancing the trade-off between enhancing data access and protecting confidentiality is a delicate act. The ABS could use synthetic data to make its business microdata more accessible for researchers to inform decision making while maintaining confidentiality. This study explores the synthetic data approach for the release and analysis of business data. Australian businesses in some industries are characterised by oligopoly or duopoly. This means that existing microdata protection techniques, such as information reduction or perturbation, may not be as effective as they are for household microdata. The research focuses on the following questions: Can a synthetic data approach enhance microdata access for longitudinal business data? What is the utility-protection trade-off of the synthetic data approach? The study compares confidentialised input and output approaches for protecting confidentiality and analysing Australian microdata from business survey or administrative data sources.


Geophysics ◽  
2017 ◽  
Vol 82 (5) ◽  
pp. W31-W45 ◽  
Author(s):  
Necati Gülünay

The old technology known as f-x deconvolution stands for f-x domain prediction filtering. Early versions of it are known to create signal leakage during their application. There have been recent papers in geophysical publications comparing f-x deconvolution results with newly proposed technologies. These comparisons will be most effective if the best existing f-x deconvolution algorithms are used. This paper describes common f-x deconvolution algorithms and studies the signal leakage occurring during their application on simple models, which will hopefully provide a benchmark for readers in choosing f-x algorithms for comparison. The f-x deconvolution algorithms can be classified by their use of the data, which leads to transient or transient-free matrices and hence windowed or nonwindowed autocorrelations, respectively. They can also be classified by the direction in which they predict: forward design and apply; forward design and apply followed by backward design and apply; forward design and apply followed by application of a conjugated forward filter in the backward direction; and simultaneous forward and backward design and apply, which is known as noncausal filter design. All of the algorithm types mentioned above are tested, and the results of their analysis are provided in this paper on noise-free and noisy synthetic data sets: a single dipping event, a single dipping event with a simple amplitude variation with offset, and three dipping events. Finally, the results of applying the selected algorithms to field data are provided.
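The core idea behind f-x prediction filtering is that, for a single dipping event, each temporal-frequency slice of the data is a complex sinusoid across traces, so a short forward prediction filter captures it almost exactly and the unpredictable residual is noise. The sketch below (an illustrative one-tap forward filter on a synthetic frequency slice, not one of the paper's benchmark algorithms) shows this predictability:

```python
import cmath
import random

def fx_one_tap(slice_):
    """Least-squares one-tap forward prediction filter for one frequency
    slice: minimizes sum over traces of |x[n] - a * x[n-1]|^2."""
    num = sum(x * prev.conjugate() for prev, x in zip(slice_, slice_[1:]))
    den = sum(abs(prev) ** 2 for prev in slice_[:-1])
    return num / den

# One frequency slice of a single dipping event: a constant complex
# phase shift per trace (linear moveout), amplitude 2, plus weak noise.
true_shift = cmath.exp(2j * cmath.pi * 0.08)
clean = [2.0 * true_shift ** n for n in range(40)]
rng = random.Random(0)
noisy = [x + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05)) for x in clean]

a = fx_one_tap(noisy)                            # recovered phase shift
predicted = [a * x for x in noisy[:-1]]          # forward prediction
residual = [x - p for x, p in zip(noisy[1:], predicted)]
```

The residual energy is a small fraction of the signal energy; the signal-leakage question the paper studies is precisely how much coherent energy ends up in that residual for the various filter designs.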


2020 ◽  
Vol 114 (1) ◽  
pp. 124-128

On October 3, 2019, the United States and the United Kingdom reached a bilateral agreement to facilitate more efficient data access between the two countries for law enforcement purposes. The Agreement on Access to Electronic Data for the Purpose of Countering Serious Crime (U.S.-UK Data Access Agreement) was signed by U.S. Attorney General William Barr and UK Home Secretary Priti Patel. This is the first such agreement made by the United States after the passage of the 2018 Clarifying Lawful Overseas Use of Data (CLOUD) Act, which authorizes and structures future bilateral agreements on data sharing. Pursuant to the CLOUD Act, Congress has 180 days following receipt of a notification regarding the U.S.-UK Data Access Agreement to block its entry into force via a joint resolution, which would require a majority vote in both houses of Congress and either presidential signature or a subsequent congressional override of a presidential veto.


2020 ◽  
Author(s):  
Mathieu Turlure ◽  
Marc Schaming ◽  
Alice Fremand ◽  
Marc Grunberg ◽  
Jean Schmittbuhl

The CDGP Repository for Geothermal Data

The Data Center for Deep Geothermal Energy (CDGP – Centre de Données de Géothermie Profonde, https://cdgp.u-strasbg.fr) was launched in 2016 by the LabEx G-EAU-THERMIE PROFONDE (http://labex-geothermie.unistra.fr) to preserve, archive, and distribute data acquired at geothermal sites in Alsace. Since the beginning of the project, specific procedures have been followed to meet international requirements for data management. In particular, the FAIR recommendations are applied so that the distributed data are Findable, Accessible, Interoperable, and Reusable.

Data currently available on the CDGP mainly consist of seismological and hydraulic data acquired at the Soultz-sous-Forêts geothermal pilot plant. Data on the website are gathered in episodes. Episodes 1994, 1995, 1996, and 2010 from Soultz-sous-Forêts have recently been added to those already available on the CDGP (1988, 1991, 1993, 2000, 2003, 2004, and 2005). All data are described with metadata, and interoperability is promoted through the use of open or community-shared data formats: SEED, CSV, PDF, etc. Episodes have DOIs.

To secure the Intellectual Property Rights (IPR) set by the data providers, some of whom come from industry, an Authentication, Authorization and Accounting Infrastructure (AAAI) grants data access depending on the distribution rules and the user's affiliation (academic, industrial, etc.).

The CDGP is also a local node of the European Plate Observing System (EPOS) Anthropogenic Hazards platform (https://tcs.ah-epos.eu). The platform provides an environment and facilities (data, services, software) for research on anthropogenic hazards, especially those related to the exploration and exploitation of geo-resources. Some episodes from Soultz-sous-Forêts are already available there, and the missing ones will soon follow.

The next steps for the CDGP are, first, to complete the data from Soultz-sous-Forêts: some data are still missing and must be recovered from the industrial partners. Then, data from the other geothermal sites in Alsace (Rittershoffen, Illkirch, Vendenheim) need to be collected for distribution. Finally, together with other French data centers, we are on track to apply for CoreTrustSeal certification (ANR Cedre).

The preservation of data can be very challenging and time-consuming. We have had to deal with obsolete tapes and formats, and even incomplete data. Old data are frequently poorly documented, and identifying the owner is sometimes difficult. However, the hard work of retrieving and collecting old geothermal data and making them FAIR is necessary for new analyses and for the valorization of these patrimonial data. The re-use of the data (e.g., Cauchie et al., 2020) demonstrates the importance of the CDGP.
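The affiliation-based gating that such an AAAI performs can be pictured as a simple rule check. The sketch below is a toy model only; the rule structure and field names are invented, not the CDGP's actual policy:

```python
def may_access(dataset_rules, user):
    """Grant access if the user's affiliation appears in the dataset's
    allowed list, or if the dataset is open to everyone ("any")."""
    allowed = dataset_rules.get("allowed_affiliations", [])
    return "any" in allowed or user.get("affiliation") in allowed

# hypothetical rule sets: one IPR-restricted episode, one open episode
restricted_episode = {"allowed_affiliations": ["academic"]}
open_episode = {"allowed_affiliations": ["any"]}
```

In a real AAAI the affiliation claim would come from an authenticated identity provider rather than the request itself, and the decision would be logged for accounting.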


2020 ◽  
Vol 1 (1) ◽  
pp. 18-21
Author(s):  
Sri Handriana Dewi Hastuti

Government work units are to use a one-data policy approach. The purposes of using these data include the use of data for schools, handling data licensing, and managing social assistance; in all cases the data must match the sources held by the Population and Civil Registration Office, so that people no longer hold differing identities. Based on Minister of Home Affairs Regulation No. 61 of 2015, concerning the Requirements, Scope and Procedures for Granting Access Rights to and Utilization of the Population Identification Number, Population Data, and Electronic Resident Identity Cards, permits for data utilization access are granted by the Regent/Mayor. After the permit application is submitted to the Regent/Mayor, a Cooperation Agreement (PKS) is signed. The Regional Apparatus Organization or public service agency then forms a technical team to implement the cooperation, and data access is granted according to need and usage. The user access institution is monitored by the Regent/Mayor through the Department of Population and Civil Registration, and periodic control, supervision, and evaluation are conducted.


Author(s):  
Catherine Bromley

Background with rationale
The Office for Statistics Regulation is the UK's independent regulator of official statistics produced by public sector bodies. The Code of Practice for Statistics sets out our expectations for statistics to be produced in a trustworthy way, be of high quality, and serve the public good by informing answers to society's important questions. We now live in a world of increasingly abundant data. Statistics producers need to adapt to this environment, and so do we as regulators.

Approach
The Code of Practice was updated in 2018 with new provisions to maximise the potential use of data for both citizens and organisations, and to make data available for wider reuse with appropriate safeguards. We have supplemented our commitment to these provisions with a review of data sharing and linking in government, new regulatory guidance on data governance, an increased focus on data access challenges (particularly for users of English health data), and by putting data at the heart of our regulatory vision (published in summer 2019). These steps build on our existing work around admin data quality.

Overview
The National Statistician's response to our data sharing and linkage review included many welcome commitments, and a major review of data linkage methodology is now underway. A data linkage community is developing across government. However, we have raised concerns about ongoing difficulties with admin data sharing between departments, resource constraints, and the limited extent of public engagement about data sharing and use.

Conclusions
Our regulatory approach to data is evolving, and we are building new relationships with organisations with an interest in data beyond the statistics world. Our work to support users in accessing admin data may yet require more direct interventions to bring about the outcomes we desire. We are keen to share our experiences with admin data users.


Author(s):  
Jack Teng ◽  
Kim McGrail ◽  
Colene Bentley ◽  
Michael Burgess ◽  
Kieran O'Doherty

Introduction
The use of linked data for research is increasing, as is the complexity of requests. Rules around access to and use of data necessarily trade off risks to privacy against social benefits. Including informed and civic-minded public recommendations that consider different perspectives on privacy and benefit will improve related policy.

Objectives and Approach
Population Data BC is conducting a deliberative public engagement regarding the use of complex linked data for research. Members of the public will be provided with written materials and hear speakers outlining considerations from multiple perspectives on data access and use, including benefits for health research, risks to privacy, and implications for disability and minority groups. Participants in the deliberation will then discuss questions about the use of linked data, and ideas around principles for that use, in small and large groups, and develop recommendations for data sharing policies.

Results
We will be sharing our preliminary analysis of the public deliberation results at the conference. The public deliberation encourages the participants to develop policy recommendations that respect a diversity of perspectives while negotiating constructive advice. It asks the group to make recommendations and to identify and explore issues on which the group has persistent disagreement. We will discuss insights into how the public values the use of data linkage and under what conditions such use becomes problematic. For example, we hope to gain insight into how publics determine whether a project is in the public interest or, conversely, how a project may pose unacceptable harm.

Conclusion/Implications
Changes in available data and the increasing ability to link data make it essential to include public views in systems of data access governance. Understanding the hopes and concerns of the public regarding the use of linked data for research will help develop data access regulations that reflect wide public interests.


2020 ◽  
Author(s):  
Daniel Nüst ◽  
Eike H. Jürrens ◽  
Benedikt Gräler ◽  
Simon Jirka

Time series data from in-situ measurements are key to many environmental studies. The first challenge in any analysis typically arises when the data need to be imported into the analysis framework. Standardisation is one way to lower this burden. Unfortunately, the relevant interoperability standards can be challenging for non-IT experts unless they are dealt with behind the scenes of a client application. One standard providing access to environmental time series data is the Sensor Observation Service (SOS) specification published by the Open Geospatial Consortium (OGC). SOS instances are currently used in a broad range of applications, such as hydrology, air quality monitoring, and ocean sciences. Data sets provided via an SOS interface can be found around the globe, from Europe to New Zealand.

The R package sos4R (Nüst et al., 2011) is an extension package for the R environment for statistical computing and visualization, which has been demonstrated to be a powerful tool for conducting and communicating geospatial research (cf. Pebesma et al., 2012). sos4R comprises a client that can connect to an SOS server; the user can query data from SOS instances using simple R function calls. It provides a convenience layer that lets R users integrate observation data from data access servers compliant with the SOS standard without any knowledge of the underlying technical standards. To further improve usability for non-SOS experts, a recent update to sos4R includes a set of wrapper functions that remove the complexity and technical language specific to the OGC specifications. This update also specifically supports the OGC SOS 2.0 Hydrology Profile and thereby opens up a new scientific domain.

In our presentation we illustrate use cases and examples built upon sos4R that ease access to time series data in an R and Shiny context. We demonstrate how the abstraction provided by the client library makes sensor observation data more accessible, and we further show how sos4R allows the seamless integration of distributed observation data, i.e., across organisational boundaries, into transparent and reproducible data analysis workflows.

References

Nüst, D., Stasch, C., & Pebesma, E. (2011). Connecting R to the Sensor Web. In Geertman, S., Reinhardt, W., & Toppen, F. (Eds.), Advancing Geoinformation Science for a Changing World. Lecture Notes in Geoinformation and Cartography. Springer.

Pebesma, E., Nüst, D., & Bivand, R. (2012). The R software environment in reproducible geoscientific research. Eos, Transactions American Geophysical Union, 93(16), 163.
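Behind convenience functions like those of sos4R sit plain OGC requests: in the KVP (key-value pair) binding, a GetObservation call is just an HTTP GET with well-known parameters. A rough Python sketch of building such a request (the endpoint, offering, and observed-property identifiers are placeholders, not a real service):

```python
from urllib.parse import urlencode

def get_observation_url(endpoint, offering, observed_property,
                        t_begin, t_end):
    """Build a KVP GetObservation request URL for an SOS 2.0 service."""
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        # filter observations by their phenomenon time interval
        "temporalFilter": f"om:phenomenonTime,{t_begin}/{t_end}",
    }
    return endpoint + "?" + urlencode(params)

url = get_observation_url(
    "https://example.org/sos",   # placeholder endpoint
    "water_level_station_1",     # placeholder offering id
    "WaterLevel",                # placeholder observed property
    "2020-01-01T00:00:00Z", "2020-01-31T23:59:59Z")
```

A client library like sos4R issues a request of this shape (or its XML/SOAP equivalent) and parses the returned observations into data frames, which is exactly the complexity the wrapper functions hide.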

