Methodological issues to analyse real-world longitudinal occupational health data (Preprint)

2021 ◽  
Author(s):  
Rémi Colin Chevalier ◽  
Frédéric Dutheil ◽  
Samuel Dewavrin ◽  
Thomas Cornet ◽  
Julien S Baker ◽  
...  

Ever greater technological advances and the democratization of digital tools such as computers and smartphones offer researchers new possibilities to collect large amounts of health data for clinical research. Such data, called real-world data (RWD), appears to be a perfect complement to traditional randomized clinical trials (RCTs) and has become more important in health decisions. Due to its longitudinal nature, RWD is subject to well-known methodological issues that can occur when collecting this type of data. In this article, we present the three main methodological problems encountered by researchers: the longitudinal data itself, missing data (not available, NA), and cluster-correlated data. These concepts have been widely discussed in the literature, and many methods and solutions have been proposed to cope with these issues. For example, mixed and trajectory models have been developed to explore longitudinal data sets, imputation methods can resolve NA issues, and multilevel models facilitate treating cluster-correlated data. This article reviews the various solutions proposed and attempts to analyze all three issues in detail. Although solutions exist to meet these data collection challenges, they are not always correctly exploited, especially in cases where data collection issues overlap. In an attempt to solve this problem, we have conceived a process that considers all three issues simultaneously. This process can be divided into two parts. The first part, data management, comprises several phases such as definition of the data structure, identification of suspect data, and application of imputation methods. The second part, analysis, applies different models for repeated data to the modified data set. As a result, it should be possible to facilitate work with data sets and provide results with higher confidence levels.
To support our proposal, we have used results from the “Wittyfit” database, which is an epidemiological database of occupational health data.
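
The two-part process described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the records, the 0-100 valid range, and the function names are hypothetical, person-mean imputation stands in for the imputation methods the article reviews, and a two-level average (worker, then company) stands in for the mixed or multilevel models it mentions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: (company, worker, wave, score on a 0-100 scale).
# Companies are the clusters, workers are the repeatedly measured subjects.
RECORDS = [
    ("A", "w1", 1, 40.0), ("A", "w1", 2, None), ("A", "w1", 3, 50.0),
    ("A", "w2", 1, 60.0), ("A", "w2", 2, 999.0), ("A", "w2", 3, 70.0),
    ("B", "w3", 1, None), ("B", "w3", 2, 20.0), ("B", "w3", 3, 30.0),
]
VALID_RANGE = (0.0, 100.0)  # assumed plausibility bounds, not from the article


def flag_suspect(score):
    """Return None for missing or out-of-range scores, else the score."""
    low, high = VALID_RANGE
    if score is None or not (low <= score <= high):
        return None
    return score


def clean(records):
    """Part 1 (data management): structure by worker, flag suspect data, impute."""
    by_worker = defaultdict(list)
    for company, worker, wave, score in records:
        by_worker[(company, worker)].append((wave, flag_suspect(score)))
    cleaned = {}
    for key, series in by_worker.items():
        observed = [s for _, s in series if s is not None]
        fill = mean(observed)  # raises if a worker has no valid observation
        ordered = sorted(series, key=lambda pair: pair[0])
        cleaned[key] = [(w, fill if s is None else s) for w, s in ordered]
    return cleaned


def company_means(cleaned):
    """Part 2 (analysis): average within workers first, then within companies."""
    worker_means = defaultdict(list)
    for (company, _worker), series in cleaned.items():
        worker_means[company].append(mean(s for _, s in series))
    return {company: mean(values) for company, values in worker_means.items()}


cleaned = clean(RECORDS)
print(company_means(cleaned))
```

Averaging within workers before averaging within companies keeps a worker with many waves from dominating the company estimate, which is the simplest way to respect the clustering without fitting a model.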

2011 ◽  
Vol 2011 ◽  
pp. 1-14 ◽  
Author(s):  
Chunzhong Li ◽  
Zongben Xu

The structure of a data set is of critical importance in identifying clusters, especially the density difference feature. In this paper, we present a clustering algorithm based on density consistency, a filtering process that identifies points sharing the same structure feature and assigns them to the same cluster. The method is not restricted by cluster shape or by high-dimensional data sets, and it is robust to noise and outliers. Extensive experiments on synthetic and real-world data sets validate the proposed clustering algorithm.
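
The abstract does not specify the density-consistency algorithm, so it cannot be reproduced here. As a rough illustration of the general idea of density-based clustering (clusters of any shape, with sparse points treated as noise), the following sketch implements a plain DBSCAN-style procedure instead. The parameter names `eps` and `min_pts` and the sample points are assumptions made for the example.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """DBSCAN-style clustering: returns one label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Two dense groups and one isolated point (hypothetical data).
POINTS = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (50, 50),
]
labels = dbscan(POINTS, eps=1.5, min_pts=3)
print(labels)
```

With these settings the two squares form separate clusters and the isolated point is labelled as noise. The brute-force neighbor search is quadratic in the number of points, which is acceptable for a sketch but not for large data sets.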


2009 ◽  
Vol 103 (1) ◽  
pp. 62-68
Author(s):  
Kathleen Cage Mittag ◽  
Sharon Taylor

Using activities to create and collect data is not a new idea. Teachers have been incorporating real-world data into their classes since at least the advent of the graphing calculator. Plenty of data collection activities and data sets exist, and the graphing calculator has made modeling data much easier. However, the authors were in search of a better physical model for a quadratic. We wanted students to see an actual parabola take shape in real time and then explore its characteristics, but we could not find such a hands-on model.


2021 ◽  
Vol 48 (3) ◽  
pp. 320-331
Author(s):  
Ruth Enid Zambrana ◽  
Gabriel Amaro ◽  
Courtney Butler ◽  
Melissa DuPont-Reyes ◽  
Deborah Parra-Medina

Introduction. Prior to 1980, U.S. national demographic and health data collection did not identify individuals of Hispanic/Latina/o heritage as a population group. Post-1990, robust immigration from Latin America (e.g., South America, Central America, Mexico) and subsequent growth in U.S. births dynamically reconstructed the ethnoracial lines among Latinos from about 20 countries, increasing racial admixture and modifying patterns of health disparities. The increasing racial and class heterogeneity of U.S. Latina/os demands a critical analysis of sociodemographic factors associated with population health disparities. Purposes. To determine the state of available Latina/o population demographic and health data in the United States, assess demographic and health variables and trends from 1960 to the present, and identify current strengths, gaps, and areas of improvement. Method. Analysis of 101 existing data sets that included demographic, socioeconomic, and health characteristics of the U.S. Latina/o population, grouped into three 20-year intervals: 1960–1979, 1980–1999, and 2000–2019. Results. Increased Latina/o immigration and U.S. births between 1960 and 2019 were associated with increases of Latino population samples in data collection. Findings indicate major gaps in the following four areas: children and youth younger than 18 years, gender and sexual identity, race and mixed-race measures, and immigration factors including nativity and generational status. Conclusions. The analysis of existing ethnoracial Latina/o population data collection efforts provides an opportunity for critical analysis of past trends, future directions in data collection efforts, and an equity lens to guide appropriate community health interventions and policies that will contribute to decreasing health disparities in Latina/o populations.


2011 ◽  
Vol 11 (2) ◽  
pp. 87-105 ◽  
Author(s):  
Ben Shneiderman ◽  
Cody Dunne ◽  
Puneet Sharma ◽  
Ping Wang

This paper reviews the trajectory of three information visualization innovations: treemaps, cone trees, and hyperbolic trees. These three ideas were first published around the same time in the early 1990s, so we are able to track academic publications, patents, and trade press articles over almost two decades. We describe the early history of each approach, problems with data collection from differing sources, appropriate metrics, and strategies for visualizing these longitudinal data sets. This paper makes two contributions: (1) it offers the information visualization community a history of how certain ideas evolved, influenced others, and were adopted for widespread use and (2) it provides an example of how such scientometric trajectories of innovations can be gathered and visualized. Guidance for designers is offered, but these conjectures may also be useful to researchers, research managers, science policy analysts, and venture capitalists.


2013 ◽  
Vol 16 (1) ◽  
pp. 157-162 ◽  
Author(s):  
Jaakko Kaprio

In 2002 and 2006, review papers described the Finnish Twin Cohort and studies conducted on these population-based, longitudinal data sets with extensive follow-up data. Three cohorts have been established: the older twin cohort in the 1970s, and the Finntwin12 and Finntwin16 studies initiated in the 1990s. The present review provides an update on the latest data collections conducted since the previous review. These cover the fourth waves of data collection in the older cohort (twins born before 1958) and Finntwin12 (twins born 1983–1987). The fifth wave of data collection in Finntwin16 (twins born 1975–1979) also included assessments of their spouses/partners. An analysis of mortality in the older cohort from 1975 to 2009 indicates that the mortality of adult twins (as individuals) does not differ from that of the population at large. Based on the cohorts, many sub-studies with more detailed phenotyping and collection of omics data have been conducted or are in progress. We also contribute to numerous national and international collaborations.

