Methodological issues to analyse real-world longitudinal occupational health data (Preprint)

Mapping Intimacies

10.2196/preprints.35068

2021

Author(s):

Rémi Colin Chevalier

Frédéric Dutheil

Samuel Dewavrin

Thomas Cornet

Julien S Baker

...

Keyword(s):

Data Collection

Occupational Health

Longitudinal Data

Real World

Health Data

Correlated Data

Structure Identification

Data Sets

Methodological Issues

Imputation Methods

UNSTRUCTURED Ever greater technological advances and the democratization of digital tools such as computers and smartphones offer researchers new possibilities to collect large amounts of health data for clinical research. Such data, called real-world data (RWD), appear to be a perfect complement to traditional randomized clinical trials (RCTs) and have become more important in health decisions. Because of their longitudinal nature, RWD are subject to well-known methodological issues that arise when collecting this type of data. In this article, we present the three main methodological problems that researchers encounter: the longitudinal nature of the data itself, missing data (not available, NA), and cluster-correlated data. These concepts have been widely discussed in the literature, and many methods have been proposed to cope with them. For example, mixed and trajectory models have been developed to explore longitudinal data sets, imputation methods can resolve NA issues, and multilevel models facilitate the treatment of cluster-correlated data. This article reviews the various solutions proposed and analyzes all three issues in detail. Although solutions exist to meet these challenges, they are not always correctly exploited, especially when the issues overlap. To address this problem, we have devised a process that considers all three issues simultaneously. The process has two parts. The first part, data management, comprises several phases: definition of the data structure, identification of suspect data, and application of imputation methods. The second part, analysis, applies different models for repeated data to the modified data set. As a result, it should be easier to work with such data sets and to provide results with higher confidence levels.
To support our proposal, we use results from the “Wittyfit” database, an epidemiological database of occupational health data.
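The data-management part of the process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the long-format `(subject_id, visit_time, score)` records, the plausible range of 0 to 100 used to flag suspect values, and last-observation-carried-forward (LOCF) imputation are all hypothetical choices. Imputing only within each subject's own series keeps the longitudinal and cluster structure intact; in practice, multiple imputation is often preferred to LOCF.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical long-format record: (subject_id, visit_time, score); None marks NA.
Record = tuple[str, int, Optional[float]]
Series = dict[str, list[tuple[int, Optional[float]]]]


def structure_by_subject(records: list[Record]) -> Series:
    """Phase 1: define the data structure, one time-ordered series per subject."""
    series: Series = defaultdict(list)
    for subject, time, value in records:
        series[subject].append((time, value))
    for visits in series.values():
        visits.sort(key=lambda visit: visit[0])
    return dict(series)


def flag_suspect(series: Series, low: float = 0.0, high: float = 100.0) -> Series:
    """Phase 2: treat values outside an assumed plausible range [low, high] as NA."""
    return {
        subject: [(t, v if v is None or low <= v <= high else None) for t, v in visits]
        for subject, visits in series.items()
    }


def impute_locf(series: Series) -> Series:
    """Phase 3: carry the last observation forward, within each subject only."""
    imputed: Series = {}
    for subject, visits in series.items():
        last: Optional[float] = None
        filled = []
        for t, v in visits:
            if v is None:
                v = last  # stays None when no earlier observation exists
            else:
                last = v
            filled.append((t, v))
        imputed[subject] = filled
    return imputed


if __name__ == "__main__":
    records = [("a", 2, 40.0), ("a", 1, 35.0), ("a", 3, None),
               ("b", 1, 250.0), ("b", 2, 60.0)]
    print(impute_locf(flag_suspect(structure_by_subject(records))))
```

In the demo, subject `b`'s out-of-range first value is flagged as NA and stays NA because there is no earlier observation to carry forward. The second part of the process (mixed or multilevel models for repeated data) would then be fitted to the imputed series with a dedicated statistics package.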

Structure Identification-Based Clustering According to Density Consistency

Mathematical Problems in Engineering

10.1155/2011/890901

2011

Vol 2011

pp. 1-14

Cited By ~ 1

Author(s):

Chunzhong Li

Zongben Xu

Keyword(s):

High Dimension

Real World

Clustering Algorithm

Density Difference

Structure Identification

Data Sets

Critical Importance

Real World Data

Data Set

High Dimension Data

The structure of a data set, especially its density-difference features, is of critical importance in identifying clusters. In this paper, we present a clustering algorithm based on density consistency: a filtering process that identifies points sharing the same structural features and assigns them to the same cluster. The method is not restricted by cluster shape or by high data dimensionality, and it is robust to noise and outliers. Extensive experiments on synthetic and real-world data sets validate the proposed clustering algorithm.
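The density-consistency algorithm itself is not specified in this abstract, so the sketch below swaps in a different, well-known density-based method, DBSCAN, purely to illustrate the general idea: points in sufficiently dense neighbourhoods are grouped together regardless of cluster shape, while isolated points are labelled as noise. The radius `eps` and threshold `min_pts` (which here counts the point itself) are DBSCAN parameters chosen for illustration, not parameters of Li and Xu's method.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

On the demo points, the two tight triples form clusters 0 and 1 and the isolated point is labelled -1 (noise). The brute-force neighbour search is O(n²); a spatial index would be needed for large or high-dimensional data sets.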

Activities for Students: As the Ball Rolls: A Quadratic Investigation Using Multiple Representations

Mathematics Teacher

10.5951/mt.103.1.0062

2009

Vol 103 (1)

pp. 62-68

Author(s):

Kathleen Cage Mittag

Sharon Taylor

Keyword(s):

Data Collection

Real Time

Physical Model

Real World

Graphing Calculator

Data Sets

Real World Data

World Data

Hands On

Modeling Data

Using activities to create and collect data is not a new idea. Teachers have been incorporating real-world data into their classes since at least the advent of the graphing calculator. Plenty of data collection activities and data sets exist, and the graphing calculator has made modeling data much easier. However, the authors were in search of a better physical model for a quadratic. We wanted students to see an actual parabola take shape in real time and then explore its characteristics, but we could not find such a hands-on model.
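The quadratic regression a graphing calculator performs is an ordinary least-squares fit of y = at² + bt + c. The Python sketch below is an illustration rather than part of the authors' activity: it solves the 3×3 normal equations directly, and the timing data are hypothetical values lying exactly on a parabola, so the fit should recover their coefficients. The vertex time -b/(2a) then locates the turning point of the parabola.

```python
def fit_quadratic(ts: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of y = a*t**2 + b*t + c via the normal equations."""
    s = [sum(t**k for t in ts) for k in range(5)]                 # s[k] = sum of t^k
    r = [sum(y * t**k for t, y in zip(ts, ys)) for k in range(3)]  # sum of y*t^k
    # Augmented matrix for the unknowns (c, b, a).
    m = [[s[0], s[1], s[2], r[0]],
         [s[1], s[2], s[3], r[1]],
         [s[2], s[3], s[4], r[2]]]
    n = 3
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    coef = [0.0] * n
    for row in reversed(range(n)):
        known = sum(m[row][k] * coef[k] for k in range(row + 1, n))
        coef[row] = (m[row][n] - known) / m[row][row]
    c, b, a = coef
    return a, b, c


if __name__ == "__main__":
    # Hypothetical positions of a ball along a ramp, one reading per second.
    times = [0, 1, 2, 3, 4, 5]
    positions = [1.0, 7.0, 9.0, 7.0, 1.0, -9.0]
    a, b, c = fit_quadratic(times, positions)
    print(a, b, c, "vertex at t =", -b / (2 * a))
```

For these readings the fit recovers a = -2, b = 8, c = 1 up to floating-point rounding, with the vertex at t = 2. With real measurements the points scatter around the parabola and the same code returns the best-fitting coefficients.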

O2-5.1 International data sets on health: data collection and sharing for policy design

Journal of Epidemiology & Community Health

10.1136/jech.2011.142976a.70

2011

Vol 65 (Suppl 1)

pp. A27-A27

Author(s):

J. Lee

Keyword(s):

Data Collection

Policy Design

Health Data

Data Sets

International Data

Analysis of Latina/o Sociodemographic and Health Data Sets in the United States From 1960 to 2019: Findings Suggest Improvements to Future Data Collection Efforts

Health Education & Behavior

10.1177/10901981211011047

2021

Vol 48 (3)

pp. 320-331

Author(s):

Ruth Enid Zambrana

Gabriel Amaro

Courtney Butler

Melissa DuPont-Reyes

Deborah Parra-Medina

Keyword(s):

United States

Health Disparities

Data Collection

Critical Analysis

Population Group

Population Data

The United States

Health Data

Data Sets

Population Demographic

Introduction. Prior to 1980, U.S. national demographic and health data collection did not identify individuals of Hispanic/Latina/o heritage as a population group. Post-1990, robust immigration from Latin America (e.g., South America, Central America, Mexico) and subsequent growth in U.S. births dynamically reconstructed the ethnoracial lines among Latinos from about 20 countries, increasing racial admixture and modifying patterns of health disparities. The increasing racial and class heterogeneity of U.S. Latina/os demands a critical analysis of sociodemographic factors associated with population health disparities. Purposes. To determine the state of available Latina/o population demographic and health data in the United States, assess demographic and health variables and trends from 1960 to the present, and identify current strengths, gaps, and areas of improvement. Method. Analysis of 101 existing data sets that included demographic, socioeconomic, and health characteristics of the U.S. Latina/o population, grouped into three 20-year intervals: 1960–1979, 1980–1999, and 2000–2019. Results. Increased Latina/o immigration and U.S. births between 1960 and 2019 were associated with increases in Latino population samples in data collection. Findings indicate major gaps in four areas: children and youth younger than 18 years, gender and sexual identity, race and mixed-race measures, and immigration factors including nativity and generational status. Conclusions. The analysis of existing ethnoracial Latina/o population data collection efforts provides an opportunity for critical analysis of past trends, future directions in data collection efforts, and an equity lens to guide appropriate community health interventions and policies that will contribute to decreasing health disparities in Latina/o populations.

Innovation trajectories for information visualizations: Comparing treemaps, cone trees, and hyperbolic trees

Information Visualization

10.1177/1473871611424815

2011

Vol 11 (2)

pp. 87-105

Cited By ~ 13

Author(s):

Ben Shneiderman

Cody Dunne

Puneet Sharma

Ping Wang

Keyword(s):

Data Collection

Longitudinal Data

Information Visualization

Science Policy

Early History

Data Sets

Venture Capitalists

Academic Publications

History Of

Policy Analysts

This paper reviews the trajectory of three information visualization innovations: treemaps, cone trees, and hyperbolic trees. These three ideas were first published around the same time in the early 1990s, so we are able to track academic publications, patents, and trade press articles over almost two decades. We describe the early history of each approach, problems with data collection from differing sources, appropriate metrics, and strategies for visualizing these longitudinal data sets. This paper makes two contributions: (1) it offers the information visualization community a history of how certain ideas evolved, influenced others, and were adopted for widespread use and (2) it provides an example of how such scientometric trajectories of innovations can be gathered and visualized. Guidance for designers is offered, but these conjectures may also be useful to researchers, research managers, science policy analysts, and venture capitalists.

Data Sets on Pensions and Health: Data Collection and Sharing for Policy Design

10.7249/wr814

2010

Author(s):

Jinkook Lee

Keyword(s):

Data Collection

Policy Design

Health Data

Data Sets

Data Sets on Pensions and Health: Data Collection and Sharing for Policy Design

SSRN Electronic Journal

10.2139/ssrn.1722801

2010

Author(s):

Jinkook Lee

Keyword(s):

Data Collection

Policy Design

Health Data

Data Sets

The Finnish Twin Cohort Study: An Update

Twin Research and Human Genetics

10.1017/thg.2012.142

2013

Vol 16 (1)

pp. 157-162

Cited By ~ 92

Author(s):

Jaakko Kaprio

Keyword(s):

Cohort Study

Data Collection

Longitudinal Data

Population Based

Data Sets

Omics Data

International Collaborations

Data Collections

Adult Twins

Review papers published in 2002 and 2006 described the Finnish Twin Cohort and the studies conducted on these population-based, longitudinal data sets with extensive follow-up data. Three cohorts have been established: the older twin cohort in the 1970s, and the Finntwin12 and Finntwin16 studies initiated in the 1990s. The present review provides an update on the latest data collections conducted since the previous review. These cover the fourth waves of data collection in the older cohort (twins born before 1958) and in Finntwin12 (twins born 1983–1987). The fifth wave of data collection in Finntwin16 (twins born 1975–1979) also included assessments of the twins' spouses/partners. An analysis of mortality in the older cohort from 1975 to 2009 indicates that the mortality of adult twins (as individuals) does not differ from that of the population at large. Based on the cohorts, many sub-studies with more detailed phenotyping and collection of omics data have been conducted or are in progress. We also contribute to numerous national and international collaborations.
