Data set representativeness during data collection in three UK social surveys: generalizability and the effects of auxiliary covariate choice

Jamie C. Moore; Gabriele B. Durrant; Peter W. F. Smith

doi:10.1111/rssa.12256

Getting Started Creating Data Dictionaries: How to Create a Shareable Data Set

Advances in Methods and Practices in Psychological Science, Vol 4 (1), pp. 251524592092800, 2021. doi:10.1177/2515245920928007

Author(s): Erin M. Buchanan; Sarah E. Crain; Ari L. Cunningham; Hannah R. Johnson; Hannah Stash; ...

Keyword(s): Data Collection; Data Sharing; Search Engine; Web Applications; Data Sets; Data Dictionary; Data Set; Entire Process; Shared Data; Source Data

As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand their data sets’ contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a data set. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search-engine indexing to reach a broader audience of interested parties. This Tutorial first explains terminology and standards relevant to data dictionaries and codebooks. Accompanying information on OSF presents a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared data set accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we discuss freely available Web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable.
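The Tutorial's own guided workflow lives on OSF and is not reproduced here. The Python below is a minimal sketch of the core idea under stated assumptions: the source data have already been exported as a list of records, and each dictionary entry records a variable's name, description, type, observed values and missing count. The column layout and the field names (`id`, `age`, `condition`) are hypothetical and do not follow any particular metadata standard.

```python
import csv
import io

# Columns of the data dictionary; this layout is an illustrative assumption.
FIELDS = ["variable", "description", "type", "values", "n_missing"]


def build_data_dictionary(rows, descriptions):
    """Return one dictionary entry per column of `rows` (a list of dicts)."""
    entries = []
    for name in (rows[0].keys() if rows else []):
        values = [row.get(name) for row in rows]
        present = [v for v in values if v not in (None, "")]
        if present and all(isinstance(v, (int, float)) for v in present):
            vtype, allowed = "numeric", f"{min(present)}-{max(present)}"
        else:
            # Treat everything else as categorical text and list the observed levels.
            vtype, allowed = "character", "; ".join(sorted({str(v) for v in present}))
        entries.append({
            "variable": name,
            "description": descriptions.get(name, ""),
            "type": vtype,
            "values": allowed,
            "n_missing": len(values) - len(present),
        })
    return entries


def to_csv(entries):
    """Serialize the entries as CSV text that can be shared next to the data file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


# Hypothetical survey export with three respondents.
responses = [
    {"id": 1, "age": 23, "condition": "control"},
    {"id": 2, "age": 31, "condition": "treatment"},
    {"id": 3, "age": None, "condition": "control"},
]
dictionary = build_data_dictionary(
    responses, {"id": "Participant ID", "age": "Age in years", "condition": "Assigned group"}
)
print(to_csv(dictionary))
```

Inferring types from the data only gives a starting point; the human-written descriptions carry most of the value for later readers and search engines.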

Design of a novel Peltier-based cooling device and its use in neutron diffraction data collection of perdeuterated yeast pyrophosphatase

Journal of Applied Crystallography, Vol 43 (5), pp. 1113-1120, 2010. doi:10.1107/s0021889810027111. Cited by ~1

Author(s): Esko Oksanen; François Dauvergne; Adrian Goldman; Monika Budayova-Spano

Keyword(s): Data Collection; Neutron Diffraction; Diffraction Data; Catalytic Mechanism; Inorganic Pyrophosphatase; Neutron Diffraction Data; Cooling Device; Data Set; X Ray; X Ray Crystallography

H atoms play a central role in enzymatic mechanisms, but H-atom positions cannot generally be determined by X-ray crystallography. Neutron crystallography, on the other hand, can be used to determine H-atom positions, but it is experimentally very challenging. Yeast inorganic pyrophosphatase (PPase) is an essential enzyme that has been studied extensively by X-ray crystallography, yet the details of its catalytic mechanism remain incompletely understood. The temperature instability of PPase crystals has in the past prevented the collection of a neutron diffraction data set. This paper reports how crystal growth has been optimized under temperature-controlled conditions. To stabilize the crystals during neutron data collection, a Peltier cooling device that minimizes the temperature gradient along the capillary has been developed. This device allowed the collection of a full neutron diffraction data set.

From pole to pole: 33 years of physical oceanography onboard R/V Polarstern

Earth System Science Data, Vol 9 (1), pp. 211-220, 2017. doi:10.5194/essd-9-211-2017. Cited by ~7

Author(s): Amelie Driemel; Eberhard Fahrbach; Gerd Rohardt; Agnieszka Beszczynska-Möller; Antje Boetius; ...

Keyword(s): Data Collection; Water Cycle; Heat Budget; Ocean Dynamics; The Arctic; Sensor Calibration; Data Sets; Calibration Data; Data Set; Different Characteristics

Measuring temperature and salinity profiles in the world's oceans is crucial to understanding ocean dynamics and its influence on the heat budget, the water cycle, the marine environment and our climate. Since 1983 the German research vessel and icebreaker Polarstern has been the platform for numerous CTD (conductivity, temperature, depth instrument) deployments in the Arctic and the Antarctic. We report on a unique data collection spanning 33 years of polar CTD data. In total, 131 data sets (one data set per cruise leg) containing data from 10,063 CTD casts are now freely available at doi:10.1594/PANGAEA.860066. During this long period, five CTD types with different characteristics and accuracies have been used. Therefore, the instruments and processing procedures (sensor calibration, data validation, etc.) are described in detail. This compilation is special not only in the quantity but also in the quality of the data, the latter indicated for each data set by defined quality codes. The complete data collection includes a number of repeated sections for which the quality codes can be used to investigate and evaluate long-term changes. Beginning in 2010, the salinity measurements presented here are of the highest quality possible in this field owing to the introduction of the OPTIMARE Precision Salinometer.

Combining Hungarian Administrative Data with Google Search Trends to Predict Tendencies in Local Public Health and Consumer Behaviour

International Journal for Population Data Science, Vol 4 (3), 2019. doi:10.23889/ijpds.v4i3.1206

Author(s): Dorottya Molnár-Kovács

Keyword(s): Consumer Behaviour; Local Population; Google Trends; Data Set; Local Public Health; Social Surveys; Internet Users; Behavioural Patterns; Public Data; Google Search

Google Trends is a free, publicly available tool that provides minute-by-minute regional statistics on the popularity of the keywords users type into the search engine. Although this massive and exponentially growing data set has its limitations, it provides a unique look into the minds of Internet users. Unlike respondents to direct questionnaires, people using Google have no incentive to lie or to hide their true interests, and they are motivated to phrase their queries as honestly and precisely as they can. Combined with official data from government institutions, this source of information can help researchers understand behavioural patterns and, over time, can assist in predicting epidemics or anticipating cultural trends. This presentation focuses on research that used Hungarian administrative health data on influenza epidemics from 2004 to 2018 to examine how frequent certain flu-related searches were in the periods preceding the cold and flu season and during the season itself. After determining which Google keywords work best as indicators of a flu outbreak, we used Google Trends alone to measure how accurately we could have predicted forthcoming flu epidemics in the past few years. The goal of our research was to experiment with ways in which researchers studying local phenomena could take advantage of the massive amounts of free, public data in the digital footprints of the local population. We also experimented with using Google Trends to mirror, and possibly predict, consumer behaviour and as an alternative to conducting social surveys.

Mapping Film Programming across Post-War Europe (1952)

Research Data Journal for the Humanities and Social Sciences, Vol 5 (2), pp. 109-125, 2020. doi:10.1163/24523666-00502009

Author(s): Thunnis van Oort; Åsa Jernudd; Kathleen Lotze; Clara Pafort-Overduin; Daniël Biltereyst; ...

Keyword(s): United Kingdom; Data Collection; European Countries; Global Level; Research Projects; Data Set; Aggregated Data; Feature Films; Post War

This data paper and the data collection from which it emerges aim to present a fully harmonized data set originating in several research projects on post-war cinema programming. The paper reflects on the collection and structure of this aggregated data set, which consists of the titles of feature films screened for public viewing in cinemas in the cities of Bari (Italy), Antwerp and Ghent (Belgium), Gothenburg (Sweden), Leicester (United Kingdom) and Rotterdam (Netherlands) in 1952. As comparisons of movie-going patterns between European countries are still rare, this paper offers a model for constructing a data set that can be replicated, scaled up and used to compare, contextualize and eventually theorize cinema-going practices across countries at a global level.

Estimation of Band-Tailed Pigeon Band Recovery and Population Vital Rates in Colorado, 1969–1981

Journal of Fish and Wildlife Management, Vol 7 (2), pp. 369-376, 2016. doi:10.3996/112015-jfwm-110. Cited by ~1

Author(s): Mark E. Seamans; Clait E. Braun

Keyword(s): New Mexico; Data Collection; Standard Error; Additional Data; Vital Rates; Data Set; Sierra Madre Occidental; Annual Survival; Sierra Madre; Additional Data Collection

Data to inform population assessment of the Interior subspecies of band-tailed pigeon, Patagioenas fasciata fasciata (breeding range from Colorado and Utah south into the Sierra Madre Occidental of Mexico), have been lacking despite substantial past banding efforts. We used a data set of more than 26,000 bandings from Colorado, with 3,500 live recaptures and 780 recoveries from the harvest of banded individuals, to estimate annual survival, fidelity, and harvest rates. Most recovered birds were harvested in Colorado (62%), followed by Mexico (18%), New Mexico (16%), Arizona (3%), and 1% or less each in California, Washington, and Utah. On average, 15% (range 0–30%) of surviving band-tailed pigeons did not return to Colorado each year. From 1969 to 1981, mean annual survival was 0.633 (standard error [SE] = 0.031) for hatch-year and 0.719 (SE = 0.016) for after-hatch-year birds, with mean annual recovery rates of 0.015 (SE = 0.002) for hatch-year and 0.011 (SE = 0.001) for after-hatch-year birds. From 1970 to 1974, the mean annual abundance of band-tailed pigeons in Colorado on 1 September was 59,911–88,290. These data provide a baseline for additional data collection on band-tailed pigeons in the range of the Interior subspecies.

Developing a predictive model of construction industry-university research collaboration

Construction Innovation, ahead of print, 2021. doi:10.1108/ci-11-2019-0129

Author(s): Monty Sutrisna; Dewi Tjia; Peng Wu

Keyword(s): Data Collection; Predictive Model; Construction Industry; Research Collaboration; ANN Model; Data Set; Content Type; Construction Companies; Likelihood Model; Artificial Neural Network (ANN)

Purpose: This paper aims to identify and examine the factors that influence construction industry-university (IU) collaboration and to develop a model of the likelihood that a potential industry partner within the construction industry will collaborate with universities.

Design/methodology/approach: Mixed-method data collection, comprising a questionnaire survey and focus groups, was used. The collected data were analysed using descriptive and inferential statistical methods to identify and examine the factors. These findings were then used to develop a predictive model of the likelihood of IU collaboration. A well-known artificial neural network (ANN) model was trained and cross-validated to develop the predictive model.

Findings: Company size (number of employees and approximate annual turnover), length of experience in the construction industry, previous IU collaboration, the importance of innovation and short-term motivation for innovation showed a statistically significant influence on the likelihood of collaboration. The study also revealed growing interest among companies in engaging universities in collaborative research. The ANN model predicted the likelihood of a potential construction partner collaborating with universities with an accuracy of 85.5%, which was considered reasonably good.

Originality/value: The study investigated the nature of IU collaboration and the factors that can affect potential IU collaborations and, on that basis, introduced a machine learning approach to examine the likelihood of IU collaboration. Although the model was derived from a data set from the Western Australian construction industry, the proposed methodology can serve as the basis for developing predictive models for the construction industry elsewhere, helping universities assess the likelihood of collaborating and partnering with targeted construction companies.
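The abstract reports that the ANN was trained and cross-validated but gives no architecture or fold count. The Python below is a hedged sketch of k-fold cross-validated accuracy only. To keep it dependency-free, a nearest-centroid classifier stands in for the ANN (a deliberately swapped-in, much simpler model), and the two features (employee count, years in industry) and all company records are hypothetical.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_nearest_centroid(X, y):
    """Stand-in classifier, NOT an ANN: predicts the class whose mean feature vector is closest."""
    centroids = {
        label: [mean(col) for col in zip(*[x for x, lab in zip(X, y) if lab == label])]
        for label in set(y)
    }

    def predict(x):
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

    return predict


def cross_validated_accuracy(X, y, fit, k=5):
    """Mean held-out accuracy over k folds; `fit(X, y)` must return a predict function."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [lab for i, lab in enumerate(y) if i not in held_out]
        predict = fit(train_X, train_y)
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)


# Hypothetical companies: [employees, years in industry]; label 1 = collaborated with a university.
features = [[5, 2], [200, 30], [8, 3], [150, 25], [10, 1], [300, 40], [6, 4], [250, 35], [12, 2], [180, 20]]
collaborated = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(cross_validated_accuracy(features, collaborated, fit_nearest_centroid, k=5))
```

On this toy, perfectly separable data every held-out company is classified correctly, so the score is 1.0; it says nothing about the 85.5% reported above. Because the folds are contiguous, real records should be shuffled (or stratified by class) first, and features on very different scales would normally be standardized.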

OP117 Digital Real-World Evidence In Times Of General Data Protection Regulation

International Journal of Technology Assessment in Health Care, Vol 37 (S1), p. 1, 2021. doi:10.1017/s0266462321000660

Author(s): Rhodri Saunders; Rafael Torrejon Torres; Maximilian Blüher

Keyword(s): Data Collection; Real World; Data Protection; Safety Data; Design Approach; Data Set; General Data Protection Regulation; Real World Evidence; Quality Markers; General Data

Introduction: Real-world evidence (RWE) is a useful supplement to a product's evidence base, especially for medical devices, which are often unsuitable for randomized controlled trials. Generally, RWE is analyzed retrospectively from sources such as healthcare records, which lack the granularity needed for health-economic analysis. Prospective collection of RWE in hospitals can promote device-specific endpoint assessment. The advent of the General Data Protection Regulation (GDPR) requires a privacy-by-design approach. This work describes a workflow for GDPR-compliant, device-specific RWE collection as part of quality improvement initiatives (QII).

Methods: A literature review identifies relevant clinical and quality markers as endpoints for the investigated technology. A panel of experts grades these endpoints on their clinical significance, privacy sensitivity, analytic value and feasibility of collection. Endpoints meeting a predefined cut-off are considered quality markers for the QII. Finally, an RWE data collection app is designed to collect the quality markers using either longitudinal, pseudonymized data or single time-point anonymized data to ensure data protection by design.

Results: Using this approach, relevant clinical markers were identified in a GDPR-compliant manner. The design of the data collection app ensured that patient data were protected while maintaining minimum requirements for patient information and consent. The pilot QII collected data on over 5,000 procedures, which represents the largest single data set available for the tested technology. Owing to its prospective nature, this programme was the first to collect patient outcomes in sufficient quantity for analysis, whereas previous studies only recorded adverse events.

Conclusions: GDPR and RWE can co-exist in harmony. A design approach that has data protection in mind from the start can combine high-quality RWE collection of efficacy and safety data with maximum patient privacy.
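The abstract names two storage options, longitudinal pseudonymized data and single time-point anonymized data, without saying how the app implements them. The Python below is a minimal sketch assuming a keyed hash (HMAC-SHA256) as the pseudonymization technique; the key, the record fields and the marker names are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256): the same ID and key always give the same token."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def longitudinal_record(patient_id: str, secret_key: bytes, visit: int, markers: dict) -> dict:
    """Pseudonymized record: the token lets follow-up visits be linked to one patient."""
    return {"pseudonym": pseudonymize(patient_id, secret_key), "visit": visit, **markers}


def single_timepoint_record(markers: dict) -> dict:
    """Record with no identifier or token, so it cannot be linked to other records."""
    return dict(markers)


# Hypothetical key and quality markers; in practice the key is stored apart from the data.
key = b"site-held-secret"
baseline = longitudinal_record("MRN-0042", key, visit=0, markers={"complication": False})
follow_up = longitudinal_record("MRN-0042", key, visit=1, markers={"readmitted": False})
snapshot = single_timepoint_record({"procedure_minutes": 48, "complication": False})
print(baseline["pseudonym"] == follow_up["pseudonym"])  # True: the two visits can be linked
```

The trade-off mirrors the abstract: the keyed token enables follow-up, but whoever holds the key can re-link records, so under the GDPR pseudonymized data remain personal data. Dropping the identifier avoids that, although a record is only truly anonymous if the remaining markers cannot single out a patient either.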

An evaluation of patient perceptions to the value of the gait laboratory as part of the rehabilitation of primary lower limb amputees

Prosthetics and Orthotics International, Vol 32 (1), pp. 12-22, 2008. doi:10.1080/03093640701554045. Cited by ~8

Author(s): Mary Jane Cole; Sally Durham; David Ewins

Keyword(s): Data Collection; Gait Analysis; Data Set; Junior Staff; Data Collection And Analysis; Rehabilitation Period; Development Evaluation; The Cost; Lower Limb Amputees; Time Issue

The value of gait analysis for research and product development/evaluation in prosthetics is well understood. However, in the rehabilitation of amputees, gait analysis beyond observation is not routinely applied in the clinical setting, largely because of the cost and time needed to complete a fully instrumented data collection and analysis. To address the time issue, a weekly clinical gait service for primary amputees has been developed, based around a reduced data set from video and video vector technology. In this clinic, primary amputees are seen twice in the Gait Laboratory during their rehabilitation period, and the clinic time for each patient, covering both data collection and review, is no more than 15 minutes. A questionnaire was developed to evaluate the patients' perceptions of the clinic and was completed anonymously by 48 primary amputees over a 6-month period. The questionnaire results show that patients perceived the clinic positively, and they suggest areas for future development, e.g. the education of junior staff and the need to include the patient's physiotherapist in the data collection and review process.

Exploration of Tourist Activities in Urban Destination Using Venue Check-In Data

Journal of Hospitality & Tourism Research, Vol 44 (3), pp. 472-498, 2019. doi:10.1177/1096348019889121

Author(s): Huy Quan Vu; Jian Ming Luo; Gang Li; Rob Law

Keyword(s): Data Collection; Large Scale; Traditional Approach; Urban Tourism; Data Set; Tourism Marketing; Large Scale Data; New Type; Scale Data

Understanding the differences and similarities in the activities of tourists from various cultures is important for tourism managers developing plans and strategies to support urban tourism marketing and management. However, tourism managers still struggle to obtain such understanding because the traditional approach to data collection, which relies on surveys and questionnaires, cannot capture tourist activities at a large scale. In this article, we present a method for studying tourist activities based on a new type of data: venue check-ins. The effectiveness of the approach is demonstrated through a case study of a major tourism country, France. Analysis of a large-scale data set from 19 tourism cities in France reveals differences and similarities in the activities of tourists from 14 markets (countries). The findings offer insights for various urban tourism applications.
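The abstract does not describe how markets were compared. One simple, hedged way to turn venue check-ins into comparable activity profiles is to convert each market's check-ins into shares per venue category and measure how far two profiles differ with the total variation distance. The Python below sketches that idea; the market labels, venue categories and check-ins are hypothetical.

```python
from collections import Counter, defaultdict


def activity_profiles(checkins):
    """Map each tourist market to the share of its check-ins in each venue category."""
    counts = defaultdict(Counter)
    for market, category in checkins:
        counts[market][category] += 1
    return {
        market: {cat: n / sum(c.values()) for cat, n in c.items()}
        for market, c in counts.items()
    }


def profile_distance(p, q):
    """Total variation distance: 0 for identical profiles, 1 for no shared categories."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(cat, 0.0) - q.get(cat, 0.0)) for cat in categories)


# Hypothetical (market, venue category) pairs for check-ins in one city.
checkins = [
    ("UK", "Museum"), ("UK", "Museum"), ("UK", "Restaurant"), ("UK", "Bar"),
    ("Japan", "Museum"), ("Japan", "Shop"), ("Japan", "Shop"), ("Japan", "Restaurant"),
]
profiles = activity_profiles(checkins)
print(profile_distance(profiles["UK"], profiles["Japan"]))  # 0.5
```

Normalizing to shares keeps large markets from dominating simply because they produce more check-ins; pairwise distances over many markets could then feed a clustering or ranking step.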
