Updating the paradigm of official statistics: New quality criteria for integrating new data and methods in official statistics

Statistical Journal of the IAOS ◽

10.3233/sji-200711 ◽

2020 ◽

pp. 1-18

Author(s):

Sofie De Broe ◽

Peter Struijs ◽

Piet Daas ◽

Arnout van Delden ◽

Joep Burger ◽

...

Keyword(s):

Paradigm Shift ◽

Data Science ◽

Quality Criteria ◽

Unstructured Data ◽

Data Sources ◽

Cultural Barriers ◽

Official Statistics ◽

Knowledge And Skills ◽

Production Processes

This paper aims to elicit a discussion of the existence of a paradigm shift in official statistics through the emergence of new (unstructured) data sources and methods that may not adhere to established and existing statistical practices and quality frameworks. The paper discusses strengths and weaknesses of several data sources. Furthermore, it discusses methodological, technical and cultural barriers in dealing with new data and methods in data science; cultural as in the culture that reigns in an area of expertise or approach. The paper concludes with suggestions of updating the existing quality frameworks. We take the position that there is no paradigm shift but that the existing production processes should be adjusted and that existing quality frameworks should be updated in order for official statistics to benefit from the fusion of data, knowledge and skills among survey methodologists and data scientists.

Download Full-text

Trusted Smart Statistics: How new data will change official statistics

Data & Policy ◽

10.1017/dap.2020.7 ◽

2020 ◽

Vol 2 ◽

Cited By ~ 2

Author(s):

Fabio Ricciato ◽

Albrecht Wirthmann ◽

Martina Hahn

Keyword(s):

Paradigm Shift ◽

New Technologies ◽

Data Sources ◽

Digital Data ◽

System Level ◽

Statistical System ◽

Official Statistics ◽

Discussion Paper ◽

Current State ◽

Data Collection And Analysis

Abstract In this discussion paper, we outline the motivations and the main principles of the Trusted Smart Statistics (TSS) concept that is under development in the European Statistical System. TSS represents the evolution of official statistics in response to the challenges posed by the new datafied society. Taking stock from the availability of new digital data sources, new technologies, and new behaviors, statistical offices are called nowadays to rethink the way they operate in order to reassert their role in modern democratic society. The issue at stake is considerably broader and deeper than merely adapting existing processes to embrace so-called Big Data. In several aspects, such evolution entails a fundamental paradigm shift with respect to the legacy model of official statistics production based on traditional data sources, for example, in the relation between data and computation, between data collection and analysis, between methodological development and statistical production, and of course in the roles of the various stakeholders and their mutual relationships. Such complex evolution must be guided by a comprehensive system-level view based on clearly spelled design principles. In this paper, we aim at providing a general account of the TSS concept reflecting the current state of the discussion within the European Statistical System.

Download Full-text

Tracking Phantastic Objects: A Computer Algorithmic Investigation of Narrative Evolution in Unstructured Data Sources

SSRN Electronic Journal ◽

10.2139/ssrn.2405447 ◽

2014 ◽

Cited By ~ 1

Author(s):

David Tuckett ◽

Robert Elliot Smith ◽

Rickard Nyman

Keyword(s):

Unstructured Data ◽

Data Sources

Download Full-text

Set up and management of a Data Science faculty at Shiga University

Impact ◽

10.21820/23987073.2019.10.18 ◽

2019 ◽

Vol 2019 (10) ◽

pp. 18-20

Author(s):

Akimichi Takemura

Keyword(s):

Data Science ◽

Science Faculty ◽

Knowledge And Skills ◽

International Trends ◽

Leading Position ◽

Global Understanding ◽

Set Up ◽

The University ◽

Do So ◽

Undergraduate Class

Shiga University opened the first data science faculty in Japan in April 2017. Beginning with an undergraduate class of 100 students, the Department has since established a Master's degree programme with 20 students in each annual intake. This is the first data science faculty in Japan and the University intends to retain this leading position, the Department is well-placed to do so. The faculty closely monitors international trends concerning data science and Artificial Intelligence (AI) and adapt its education and research accordingly. The genesis of this department marks a change in Japan's attitudes towards dealing with information and reflects a wider, global understanding of the need for further research in this area. Shiga University's Data Science department seeks to produce well-trained data scientists who demonstrate a good balance of knowledge and skills in each of the three key areas of data science.

Download Full-text

Big Data Driven Clinical Informatics & Surveillance (BDD_CIS) – A Multimodal Database Focused Clinical, Community, and Multi-Omics Surveillance Plan for COVID-19: A study Protocol (Preprint)

10.2196/preprints.24504 ◽

2020 ◽

Author(s):

Bankole Olatosi ◽

Jiajia Zhang ◽

Sharon Weissman ◽

Zhenlong Li ◽

Jianjun Hu ◽

...

Keyword(s):

Big Data ◽

South Carolina ◽

Data Science ◽

Age Groups ◽

The Elderly ◽

The United States ◽

Data Sources ◽

Patient Registries ◽

Multiple Partner ◽

Multimodal Data

BACKGROUND The Coronavirus Disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus (SARS-CoV-2) remains a serious global pandemic. Currently, all age groups are at risk for infection but the elderly and persons with underlying health conditions are at higher risk of severe complications. In the United States (US), the pandemic curve is rapidly changing with over 6,786,352 cases and 199,024 deaths reported. South Carolina (SC) as of 9/21/2020 reported 138,624 cases and 3,212 deaths across the state. OBJECTIVE The growing availability of COVID-19 data provides a basis for deploying Big Data science to leverage multitudinal and multimodal data sources for incremental learning. Doing this requires the acquisition and collation of multiple data sources at the individual and county level. METHODS The population for the comprehensive database comes from statewide COVID-19 testing surveillance data (March 2020- till present) for all SC COVID-19 patients (N≈140,000). This project will 1) connect multiple partner data sources for prediction and intelligence gathering, 2) build a REDCap database that links de-identified multitudinal and multimodal data sources useful for machine learning and deep learning algorithms to enable further studies. Additional data will include hospital based COVID-19 patient registries, Health Sciences South Carolina (HSSC) data, data from the office of Revenue and Fiscal Affairs (RFA), and Area Health Resource Files (AHRF). RESULTS The project was funded as of June 2020 by the National Institutes for Health. CONCLUSIONS The development of such a linked and integrated database will allow for the identification of important predictors of short- and long-term clinical outcomes for SC COVID-19 patients using data science.

Download Full-text

Mapping the United Nations Fundamental Principles of Official Statistics against new and big data sources

Statistical Journal of the IAOS ◽

10.3233/sji-210789 ◽

2021 ◽

Vol 37 (1) ◽

pp. 161-169

Author(s):

Dominik Rozkrut ◽

Olga Świerkot-Strużewska ◽

Gemma Van Halderen

Keyword(s):

Big Data ◽

Public Information ◽

Fundamental Principle ◽

Data Sources ◽

Official Statistics ◽

Development Agenda ◽

Data Gaps ◽

Data Source ◽

Exciting Time ◽

Statistical Systems

Never has there been a more exciting time to be an official statistician. The data revolution is responding to the demands of the CoVID-19 pandemic and a complex sustainable development agenda to improve how data is produced and used, to close data gaps to prevent discrimination, to build capacity and data literacy, to modernize data collection systems and to liberate data to promote transparency and accountability. But can all data be liberated in the production and communication of official statistics? This paper explores the UN Fundamental Principles of Official Statistics in the context of eight new and big data sources. The paper concludes each data source can be used for the production of official statistics in adherence with the Fundamental Principles and argues these data sources should be used if National Statistical Systems are to adhere to the first Fundamental Principle of compiling and making available official statistics that honor citizen’s entitlement to public information.

Download Full-text

How to be Real and Conventional: A Discussion of the Quality Criteria of Official Statistics

Minerva ◽

10.1007/s11024-009-9125-3 ◽

2009 ◽

Vol 47 (3) ◽

pp. 307-322 ◽

Cited By ~ 20

Author(s):

Alain Desrosières

Keyword(s):

Quality Criteria ◽

Official Statistics

Download Full-text

Tracking phantastic objects: A computer algorithmic investigation of narrative evolution in unstructured data sources

Social Networks ◽

10.1016/j.socnet.2014.03.001 ◽

2014 ◽

Vol 38 ◽

pp. 121-133 ◽

Cited By ~ 12

Author(s):

David Tuckett ◽

Robert Elliot Smith ◽

Rickard Nyman

Keyword(s):

Unstructured Data ◽

Data Sources

Download Full-text

Towards integrated Data Analysis Quality: Criteria for the application of Industrial Data Science

10.1109/iri51335.2021.00024 ◽

2021 ◽

Author(s):

Nikolai West ◽

Jonas Gries ◽

Carina Brockmeier ◽

Jens C. Gobel ◽

Jochen Deuse

Keyword(s):

Data Analysis ◽

Data Science ◽

Quality Criteria ◽

Industrial Data ◽

Analysis Quality ◽

Integrated Data Analysis

Download Full-text

Early Prediction of Sepsis in the ICU using Machine Learning: A Systematic Review.

10.1101/2020.08.31.20185207 ◽

2020 ◽

Author(s):

Michael Moor ◽

Bastian Rieck ◽

Max Horn ◽

Catherine Jutzeler ◽

Karsten Borgwardt

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Quality Assessment ◽

Biomarker Discovery ◽

Learning Algorithms ◽

Quality Criteria ◽

Machine Learning Algorithms ◽

Data Sources ◽

Synthesis Methods ◽

Early Prediction

Background: Sepsis is among the leading causes of death in intensive care units (ICU) worldwide and its recognition, particularly in the early stages of the disease, remains a medical challenge. The advent of an affluence of available digital health data has created a setting in which machine learning can be used for digital biomarker discovery, with the ultimate goal to advance the early recognition of sepsis. Objective: To systematically review and evaluate studies employing machine learning for the prediction of sepsis in the ICU. Data sources: Using Embase, Google Scholar, PubMed/Medline, Scopus, and Web of Science, we systematically searched the existing literature for machine learning-driven sepsis onset prediction for patients in the ICU. Study eligibility criteria: All peer-reviewed articles using machine learning for the prediction of sepsis onset in adult ICU patients were included. Studies focusing on patient populations outside the ICU were excluded. Study appraisal and synthesis methods: A systematic review was performed according to the PRISMA guidelines. Moreover, a quality assessment of all eligible studies was performed. Results: Out of 974 identified articles, 22 and 21 met the criteria to be included in the systematic review and quality assessment, respectively. A multitude of machine learning algorithms were applied to refine the early prediction of sepsis. The quality of the studies ranged from "poor" (satisfying less than 40% of the quality criteria) to "very good" (satisfying more than 90% of the quality criteria). The majority of the studies (n= 19, 86.4%) employed an offline training scenario combined with a horizon evaluation, while two studies implemented an online scenario (n= 2,9.1%). The massive inter-study heterogeneity in terms of model development, sepsis definition, prediction time windows, and outcomes precluded a meta-analysis. Last, only 2 studies provided publicly-accessible source code and data sources fostering reproducibility. Limitations: Articles were only eligible for inclusion when employing machine learning algorithms for the prediction of sepsis onset in the ICU. This restriction led to the exclusion of studies focusing on the prediction of septic shock, sepsis-related mortality, and patient populations outside the ICU. Conclusions and key findings: A growing number of studies employs machine learning to31optimise the early prediction of sepsis through digital biomarker discovery. This review, however, highlights several shortcomings of the current approaches, including low comparability and reproducibility. Finally, we gather recommendations how these challenges can be addressed before deploying these models in prospective analyses. Systematic review registration number: CRD42020200133

Download Full-text

Harnessing Administrative Records for Official Statistics on People and Households

International Journal for Population Data Science ◽

10.23889/ijpds.v3i5.1056 ◽

2018 ◽

Vol 3 (5) ◽

Author(s):

Misty L Heggeness

Keyword(s):

Survey Data ◽

Census Bureau ◽

Data Sources ◽

Official Statistics ◽

Administrative Records ◽

The Great Depression ◽

Federal Statistical ◽

Fiction Writers ◽

Working Age ◽

The Impact

The availability and excessiveness of alternative (non-survey) data sources, collected on a daily, hourly, and sometimes second-by-second basis, has challenged the federal statistical system to update existing protocol for developing official statistics. Federal statistical agencies collect data primarily through survey methodologies built on frames constructed from administrative records. They compute survey weights to adjust for non-response and unequal sampling probabilities, impute answers for nonresponse, and report official statistics via tabulations from these survey. The U.S. federal government has rigorously developed these methodologies since the advent of surveys -- an innovation produced by the urgent desire of Congress and the President to estimate annual unemployment rates of working age men during the Great Depression. In the 1930s, Twitter did not exist; high-scale computing facilities were not abundant let alone cheap, and the ease of the ether was just a storyline from the imagination of fiction writers. Today we do have the technology, and an abundance of data, record markers, and alternative sources, which, if curated and examined properly, can help enhance official statistics. Researchers at the Census Bureau have been experimenting with administrative records in an effort to understand how these alternative data sources can improve our understanding of official statistics. Innovative projects like these have advanced our knowledge of the limitations of survey data in estimating official statistics. This paper will discuss advances made in linking administrative records to survey data to-date and will summarize the research on the impact of administrative records on official statistics.

Download Full-text