An analysis of data paper templates and guidelines: types of contextual information described by data journals

2020 ◽  
Vol 7 (1) ◽  
pp. 16-23 ◽  
Author(s):  
Jihyun Kim

Purpose: Data papers are a promising genre of scholarly communication in which research data are described, shared, and published. Rich documentation of data, including adequate contextual information, enhances the potential for data reuse. This study investigated the extent to which the components of data papers specified by journals represented the types of contextual information necessary for data reuse.

Methods: A content analysis of 15 data paper templates/guidelines from 24 data journals indexed by the Web of Science was performed. A coding scheme was developed based on previous studies, consisting of four categories: general data set properties, data production information, repository information, and reuse information.

Results: Only a few types of contextual information were commonly requested by the journals. Except for data format information and file names, general data set properties were specified less often than the other categories of contextual information. Researchers were frequently asked to provide data production information, such as information on the data collection, data producer, and related project. Repository information focused on data identifiers, while information about repository reputation and curation practices was rarely requested. Reuse information mostly involved advice on the reuse of data and terms of use.

Conclusion: These findings imply that data journals should provide a more standardized set of data paper components to inform reusers of relevant contextual information in a consistent manner. Information about repository reputation and curation could also be provided by data journals to complement the repository information supplied by the authors of data papers and to help researchers evaluate the reusability of data.
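A coding exercise like the one described reduces, in the end, to tallying which contextual-information types each template requests. The sketch below illustrates that tally with entirely hypothetical journal names and coding results (the study's actual data are not reproduced here):

```python
from collections import Counter

# Hypothetical coding results: for each data journal's template, the list
# of contextual-information types it explicitly requests from authors.
# Journal names and codes are illustrative, not the study's data.
coded_templates = {
    "Journal A": ["data format", "file names", "data collection", "data identifier"],
    "Journal B": ["data collection", "data producer", "related project", "terms of use"],
    "Journal C": ["data format", "data collection", "data identifier", "reuse advice"],
}

# Tally how many templates request each type of contextual information.
tally = Counter(t for types in coded_templates.values() for t in types)

for info_type, count in tally.most_common():
    print(f"{info_type}: requested by {count} of {len(coded_templates)} templates")
```

Sorting by frequency immediately surfaces which types are commonly requested and which are rare, mirroring the study's category-level comparison.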

2021 ◽  
Vol 14 (3) ◽  
pp. 99
Author(s):  
Marc Peter Radke ◽  
Manuel Rupprecht

In this paper, we present a newly generated data set on real returns of households’ aggregated asset holdings, which adds additional and more sophisticated information to existing relevant datasets in the literature. To do this, we draw on various datasets from public and private sources and then transform and combine them in a consistent manner that allows for international comparative and intertemporal analyses. Based on this, we address two current debates on the development of household wealth in the euro area that have been triggered by the low-interest environment. The first debate refers to the development of real yields on household wealth from 2000 to 2018, whereas the second debate deals with the mean-variance efficiency of household portfolios. Contrary to widespread belief, we find that yields on total wealth, which were largely dominated by non-financial assets’ yields, were mostly positive, although they exhibit a declining trend. Moreover, on average, overall real yields were significantly lower after 2008. Referring to portfolio efficiency, we find that current portfolios seem to be comparatively close to mean-variance efficiency. If households were to optimize their portfolios despite limited room for improvement, holdings of equity and investment fund shares should be reduced, contradicting common recommendations of financial advisors.
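The mean-variance comparison described above can be illustrated with the textbook two-asset case, for which the minimum-variance weight has a closed form. All numbers below are illustrative assumptions, not the paper's data set:

```python
# Two-asset mean-variance sketch with illustrative numbers (not the
# paper's data): compare a household's hypothetical equity share with the
# minimum-variance weight.

def min_variance_weight(var_a: float, var_b: float, cov_ab: float) -> float:
    """Closed-form weight on asset A that minimizes portfolio variance
    in a two-asset portfolio (the remainder goes to asset B)."""
    return (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)

def portfolio_stats(w_a, mean_a, mean_b, var_a, var_b, cov_ab):
    """Portfolio mean return and variance for weight w_a on asset A."""
    mean = w_a * mean_a + (1 - w_a) * mean_b
    var = w_a**2 * var_a + (1 - w_a)**2 * var_b + 2 * w_a * (1 - w_a) * cov_ab
    return mean, var

# Asset A: equity and investment fund shares; asset B: non-financial assets.
mean_a, mean_b = 0.05, 0.03   # illustrative real yields
var_a, var_b = 0.04, 0.01     # illustrative return variances
cov_ab = 0.005

w_star = min_variance_weight(var_a, var_b, cov_ab)
actual_w = 0.30               # hypothetical current equity share

print(f"minimum-variance equity weight: {w_star:.3f}")
print(f"hypothetical actual weight:     {actual_w:.3f}")
```

With these illustrative inputs the minimum-variance equity weight (0.125) lies below the assumed actual share, consistent in spirit with the paper's finding that optimization would reduce holdings of equity and investment fund shares.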


Author(s):  
Valentin Raileanu

The article briefly describes the history and fields of application of the theory of extreme values, including climatology. The data format, the Generalized Extreme Value (GEV) probability distribution with Block Maxima, the Generalized Pareto (GP) distribution with Peaks Over Threshold (POT), and the analysis methods are presented. Estimation of the distribution parameters is done using the Maximum Likelihood Estimation (MLE) method. Installation of the free R software, the minimum set of required commands, and the in2extRemes graphical GUI package are described. As an example, the results of a GEV analysis of a simulated data set in in2extRemes are presented.
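The article's workflow runs in R with in2extRemes; as a language-agnostic illustration of the MLE step, the sketch below fits the Gumbel case (GEV with shape ξ = 0) to simulated block maxima using the standard fixed-point iteration for the maximum-likelihood scale parameter. This is a simplified stand-in, not the article's code, and it does not estimate the shape parameter:

```python
import math
import random

def gumbel_mle(maxima, iters=200):
    """Fit a Gumbel distribution (GEV with shape 0) to block maxima by
    maximum likelihood: iterate the standard fixed point for the scale
    beta, then solve for the location mu in closed form."""
    n = len(maxima)
    xbar = sum(maxima) / n
    beta = (max(maxima) - min(maxima)) / 4  # crude starting value
    for _ in range(iters):
        w = [math.exp(-x / beta) for x in maxima]
        beta = xbar - sum(x * wi for x, wi in zip(maxima, w)) / sum(w)
    w = [math.exp(-x / beta) for x in maxima]
    mu = -beta * math.log(sum(w) / n)
    return mu, beta

# Simulate annual maxima from a known Gumbel(mu=30, beta=5) by inverse CDF.
random.seed(0)
true_mu, true_beta = 30.0, 5.0
maxima = [true_mu - true_beta * math.log(-math.log(random.random()))
          for _ in range(2000)]

mu_hat, beta_hat = gumbel_mle(maxima)
print(f"mu_hat = {mu_hat:.2f}, beta_hat = {beta_hat:.2f}")
```

With 2,000 simulated maxima the estimates land close to the true parameters; a full GEV fit (shape included), as in2extRemes performs, requires numerical optimization of the three-parameter likelihood.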


Author(s):  
Rhodri Saunders ◽  
Rafael Torrejon Torres ◽  
Maximilian Blüher

Introduction: Real-world evidence (RWE) is a useful supplement to a product's evidence base, especially for medical devices, which are often unsuitable for randomized controlled trials. Generally, RWE is analyzed retrospectively (for example, from healthcare records), and such sources lack the granularity needed for health-economic analysis. Prospective collection of RWE in hospitals can support device-specific endpoint assessment. The advent of the General Data Protection Regulation (GDPR) requires a privacy-by-design approach. This work describes a workflow for GDPR-compliant, device-specific RWE collection as part of quality improvement initiatives (QIIs).

Methods: A literature review identifies relevant clinical and quality markers as endpoints for the investigated technology. A panel of experts grades these endpoints on their clinical significance, privacy sensitivity, analytic value, and feasibility of collection. Endpoints meeting a predefined cut-off are considered quality markers for the QII. Finally, an RWE data collection app is designed to collect the quality markers using either longitudinal, pseudonymized data or single time-point anonymized data to ensure data protection by design.

Results: Using this approach, relevant clinical markers were identified in a GDPR-compliant manner. The data collection app design ensured that patient data were protected while maintaining minimum requirements on patient information and consent. The pilot QII collected data on over 5,000 procedures, which represents the largest single data set available for the tested technology. Due to its prospective nature, this programme was the first to collect patient outcomes in sufficient quantity for analysis, whereas previous studies only recorded adverse events.

Conclusions: GDPR and RWE can co-exist in harmony. A design approach that has data protection in mind from the start can combine high-quality RWE collection of efficacy and safety data with maximum patient privacy.
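The split between longitudinal pseudonymized records and single time-point anonymized records might look as follows in code. The field names, key handling, and record shapes are illustrative assumptions, not the programme's implementation:

```python
import hashlib
import hmac

# Illustrative sketch of the two collection modes described above:
# longitudinal records carry a keyed pseudonym so visits can be linked,
# while single time-point records drop identifiers entirely.
# Field names and the key value are hypothetical.

SECRET_KEY = b"site-held-secret"  # would be held by the hospital, not the analyst

def pseudonymize(patient_id: str) -> str:
    """Derive a stable pseudonym with HMAC-SHA256 so records from the
    same patient can be linked without storing the raw identifier."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:16]

def longitudinal_record(patient_id: str, procedure: str, outcome: str) -> dict:
    return {"pseudonym": pseudonymize(patient_id),
            "procedure": procedure, "outcome": outcome}

def anonymized_record(procedure: str, outcome: str) -> dict:
    # Single time-point mode: no identifier at all, not even a pseudonym.
    return {"procedure": procedure, "outcome": outcome}

r1 = longitudinal_record("MRN-1234", "device insertion", "no adverse event")
r2 = longitudinal_record("MRN-1234", "device follow-up", "healed")
assert r1["pseudonym"] == r2["pseudonym"]  # linkable across visits
assert "patient_id" not in r1              # raw identifier never stored
```

Keeping the HMAC key with the data controller means the pseudonyms cannot be reversed by downstream analysts, which is the essence of the privacy-by-design split the abstract describes.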


2021 ◽  
Author(s):  
Ernesto Gomez ◽  
Ebikebena Ombe ◽  
Brennan Goodkey ◽  
Rafael Carvalho

Abstract In the current oil and gas drilling industry, the modernization of rig fleets has been shifting toward high mobility, artificial intelligence, and computerized systems. Part of this shift includes a move toward automation. This paper summarizes the successful application of a fully automated workflow to drill a stand, from slips out to slips back in, in a complex onshore gas drilling environment. Repeatable processes with adherence to plans and operating practices are a key requirement in the implementation of drilling procedures and vital for optimizing operations in a systematic way. A drilling automation solution has been deployed on two rigs, enabling the automation of both pre-connection and post-connection activities as well as rotary drilling of an interval equivalent to a typical drillpipe stand (approximately 90 ft), while optimizing the rate of penetration (ROP) and managing drilling dysfunctions, such as stick-slip and drillstring vibrations, in a consistent manner. So far, a total of nine wells have been drilled using this solution. The automation system is configured with the outputs of the drilling program, including the drilling parameters roadmap, bottomhole assembly tools, and subsurface constraints. Before drilling each stand, the driller is presented with the planned configuration and can adjust settings whenever necessary. Once a goal is specified, the system directs the rig control system to command the surface equipment (drawworks, auto-driller, top drive, and pumps). Everything is undertaken in the context of a workflow that reflects standard operating procedures. This solution runs with minimal intervention from the driller, and contextual information for each workflow is continuously displayed, giving the driller the best capacity to monitor and supervise the operational sequence.
If drilling conditions change, the system responds by automatically changing the sequence of activities to execute mitigation procedures and achieve the desired goal. At all times, the driller has the option to override the automation system and assume control with a simple touch on the rig controls. Prior to deployment, key performance indicators (KPIs), including automated rig-state-based measures, were selected. These KPIs were then monitored while drilling each well with the automation system to compare performance against a pre-deployment baseline. The solution was used to drill almost 60,000 ft of hole section with the system in control, and the results showed a 20% improvement in ROP with increased adherence to pre-connection and post-connection operations. Additionally, many lessons learned from the use and observation of the automation workflow were used to drive continuous improvement in efficiency and performance over the course of the project. This deployment was the first in the region, and the system is part of a comprehensive digital well construction solution that is continuously enriched with new capabilities. This adaptive automated drilling solution delivered a step change in performance, safety, and consistency in drilling operations.
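The ROP comparison described is, in essence, footage drilled per on-bottom hour, compared between automated stands and a baseline. The sketch below uses made-up per-stand records chosen to illustrate a 20% improvement; they are not the deployment's data:

```python
# KPI sketch with illustrative numbers (not the deployment's data):
# rate of penetration (ROP) as footage drilled per on-bottom hour, and
# the percentage improvement of automated stands over a manual baseline.

def avg_rop(stands):
    """Aggregate ROP over a list of (footage_ft, on_bottom_hours) stands."""
    total_ft = sum(ft for ft, _ in stands)
    total_hr = sum(hr for _, hr in stands)
    return total_ft / total_hr

# Hypothetical per-stand records: (footage in ft, hours on bottom).
baseline_stands = [(90, 1.2), (90, 1.1), (90, 1.3)]
automated_stands = [(90, 1.0), (90, 0.95), (90, 1.05)]

baseline = avg_rop(baseline_stands)
automated = avg_rop(automated_stands)
improvement = 100 * (automated - baseline) / baseline

print(f"baseline ROP {baseline:.1f} ft/h, automated {automated:.1f} ft/h, "
      f"improvement {improvement:.0f}%")
```

Aggregating total footage over total on-bottom time, rather than averaging per-stand ROP values, weights each stand by its drilling time and avoids the bias of a simple mean of ratios.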


Author(s):  
Saulius Daukantas ◽  
Vaidotas Marozas ◽  
George Drosatos ◽  
Eleni Kaldoudi ◽  
Arunas Lukosevicius

2018 ◽  
Vol 7 (04) ◽  
pp. 871-888 ◽  
Author(s):  
Sophie J. Lee ◽  
Howard Liu ◽  
Michael D. Ward

Improving geolocation accuracy in text data has long been a goal of automated text processing. We depart from the conventional method and introduce a two-stage supervised machine-learning algorithm that evaluates each location mention to be either correct or incorrect. We extract contextual information from texts, i.e., N-gram patterns for location words, mention frequency, and the context of sentences containing location words. We then estimate model parameters using a training data set and use this model to predict whether a location word in the test data set accurately represents the location of an event. We demonstrate these steps by constructing customized geolocation event data at the subnational level using news articles collected from around the world. The results show that the proposed algorithm outperforms existing geocoders even in a case added post hoc to test the generality of the developed algorithm.
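The per-mention classification idea can be sketched with a tiny Naive Bayes model over contextual features of each location word. The feature design, training examples, and model are simplified illustrations, not the paper's algorithm or data:

```python
import math
from collections import defaultdict

# Sketch (not the paper's code): each location mention is turned into
# contextual features (neighboring words, mention length) and scored by
# a small Naive Bayes classifier as correct (1) or incorrect (0).

def features(sentence: str, mention: str) -> list:
    words = sentence.lower().split()
    m = mention.lower()
    feats = ["len=" + str(len(m))]
    if m in words:
        i = words.index(m)
        if i > 0:
            feats.append("prev=" + words[i - 1])   # left-context word
        if i + 1 < len(words):
            feats.append("next=" + words[i + 1])   # right-context word
    return feats

def train(examples):
    """examples: list of (sentence, mention, label) with label in {0, 1}."""
    counts = {0: defaultdict(int), 1: defaultdict(int)}
    totals = {0: 0, 1: 0}
    for sent, mention, label in examples:
        for f in features(sent, mention):
            counts[label][f] += 1
        totals[label] += 1
    return counts, totals

def predict(model, sentence, mention):
    counts, totals = model
    n = totals[0] + totals[1]
    scores = {}
    for label in (0, 1):
        score = math.log(totals[label] / n)  # class prior
        for f in features(sentence, mention):
            # add-one smoothing over feature counts
            score += math.log((counts[label][f] + 1) / (totals[label] + 2))
        scores[label] = score
    return max(scores, key=scores.get)

train_set = [
    ("protest erupted in cairo yesterday", "cairo", 1),
    ("clashes reported in aleppo today", "aleppo", 1),
    ("the reporter paris smith filed the story", "paris", 0),
    ("analyst berlin jones commented on it", "berlin", 0),
]
model = train(train_set)
print(predict(model, "fighting broke out in cairo overnight", "cairo"))
```

Even this toy model picks up that a preceding "in" signals a genuine event location, while a mention embedded in a person's name does not, which is the intuition behind the paper's contextual features.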


2019 ◽  
Vol 2 (2) ◽  
pp. 169-187 ◽  
Author(s):  
Ruben C. Arslan

Data documentation in psychology lags behind not only many other disciplines, but also basic standards of usefulness. Psychological scientists often prefer to invest the time and effort that would be necessary to document existing data well in other duties, such as writing and collecting more data. Codebooks therefore tend to be unstandardized and stored in proprietary formats, and they are rarely properly indexed in search engines. This means that rich data sets are sometimes used only once—by their creators—and left to disappear into oblivion. Even if they can find an existing data set, researchers are unlikely to publish analyses based on it if they cannot be confident that they understand it well enough. My codebook package makes it easier to generate rich metadata in human- and machine-readable codebooks. It uses metadata from existing sources and automates some tedious tasks, such as documenting psychological scales and reliabilities, summarizing descriptive statistics, and identifying patterns of missingness. The codebook R package and Web app make it possible to generate a rich codebook in a few minutes and just three clicks. Over time, its use could lead to psychological data becoming findable, accessible, interoperable, and reusable, thereby reducing research waste and benefiting both its users and the scientific community as a whole.
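At its core, a generated codebook records per-variable descriptive statistics and missingness. The sketch below shows that minimal core with hypothetical variables; the actual codebook R package does far more (scales, reliabilities, machine-readable metadata for search engines):

```python
import statistics

# Minimal codebook-style summary (hypothetical data): per-variable
# sample size, missingness, and descriptive statistics.

data = {
    "age": [25, 31, None, 44, 29, None, 38],
    "extraversion": [3.5, 4.0, 2.5, None, 3.0, 4.5, 3.5],
}

def describe(name, values):
    present = [v for v in values if v is not None]
    return {
        "variable": name,
        "n": len(present),
        "n_missing": len(values) - len(present),
        "mean": round(statistics.mean(present), 2),
        "sd": round(statistics.stdev(present), 2),
    }

codebook = [describe(name, vals) for name, vals in data.items()]
for row in codebook:
    print(row)
```

Emitting this table in a standardized, machine-readable format (rather than a proprietary document) is what makes the resulting data set findable and reusable by others.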


Author(s):  
JIANLONG ZHOU ◽  
ZHIYAN WANG ◽  
KLAUS D. TÖNNIES

In this paper, a new approach named focal region-based volume rendering for visualizing internal structures of volumetric data is presented. This approach integrates contextual information, derived from a structure analysis of the data set, with a lens-like focal region rendering that shows more detailed information. The feature-based approach contains three main components: (i) a feature extraction model using 3D image processing techniques to explore the structure of objects and provide contextual information; (ii) an efficient ray-bounded volume ray casting renderer to provide detailed information about the volume of interest in the focal region; and (iii) tools for manipulating focal regions to make the approach more flexible. The approach provides a powerful framework for producing detailed information from volumetric data. Presenting contextual information and focal region renditions at the same time makes the volume information easier for scientists to understand and comprehend. The interaction techniques provided make focal region-based volume rendering more flexible and easier to use.
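The focal-region idea can be reduced to a toy ray caster: sample finely inside a spherical focal region and coarsely outside it, compositing opacity front-to-back. The synthetic field, step sizes, and geometry below are illustrative assumptions, not the paper's renderer:

```python
import math

# Toy sketch of focal-region sampling (not the paper's implementation):
# a ray through a synthetic scalar field takes fine steps inside a
# spherical focal region and coarse steps outside it.

def field(x, y, z):
    """Synthetic density: a soft blob centred at the origin."""
    return math.exp(-(x * x + y * y + z * z))

def cast_ray(origin, direction, focal_centre, focal_radius,
             length=4.0, fine=0.02, coarse=0.2):
    t, alpha = 0.0, 0.0
    n_fine = n_coarse = 0
    while t < length and alpha < 0.99:      # early ray termination
        p = [o + t * d for o, d in zip(origin, direction)]
        if math.dist(p, focal_centre) < focal_radius:
            step = fine                      # detail only inside the focus
            n_fine += 1
        else:
            step = coarse
            n_coarse += 1
        a = min(1.0, field(*p) * step)       # opacity from density
        alpha += (1 - alpha) * a             # front-to-back compositing
        t += step
    return alpha, n_fine, n_coarse

# Ray through the blob, with the focal region centred on the blob.
alpha, n_fine, n_coarse = cast_ray((-2, 0, 0), (1, 0, 0), (0, 0, 0), 0.5)
print(f"alpha={alpha:.3f}, fine samples={n_fine}, coarse samples={n_coarse}")
```

Scaling opacity by the step size keeps the composited result consistent across the two sampling rates, so the focal region gains detail without visibly changing overall density.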


2015 ◽  
Vol 10 (1) ◽  
pp. 82-94 ◽  
Author(s):  
Tiffany Chao

Understanding the methods and processes implemented by data producers to generate research data is essential for fostering data reuse. Yet, producing the metadata that describes these methods remains a time-intensive activity that data producers do not readily undertake. In particular, researchers in the long tail of science often lack the financial support or tools for metadata generation, thereby limiting future access and reuse of data produced. The present study investigates research journal publications as a potential source for identifying descriptive metadata about methods for research data. Initial results indicate that journal articles provide rich descriptive content that can be sufficiently mapped to existing metadata standards with methods-related elements, resulting in a mapping of the data production process for a study. This research has implications for enhancing the generation of robust metadata to support the curation of research data for new inquiry and innovation.
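The mapping from article text to methods-related metadata elements can be sketched as cue-word matching over methods-section sentences. The element names, cue lists, and sample text below are illustrative, not drawn from any particular metadata standard or study:

```python
import re

# Toy sketch of mapping methods-section sentences to methods-related
# metadata elements via cue words. Element names and cues are
# illustrative, not a published standard's terms.

ELEMENT_CUES = {
    "instrument": ["spectrometer", "sensor", "questionnaire", "camera"],
    "sampling": ["sampled", "collected", "recruited"],
    "processing": ["normalized", "filtered", "calibrated"],
}

def map_sentences(text: str) -> dict:
    mapping = {element: [] for element in ELEMENT_CUES}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for element, cues in ELEMENT_CUES.items():
            if any(cue in lowered for cue in cues):
                mapping[element].append(sentence.strip())
    return mapping

methods_text = ("Water samples were collected weekly at three sites. "
                "Each sample was filtered and normalized against a blank. "
                "Absorbance was measured with a UV spectrometer.")

for element, sentences in map_sentences(methods_text).items():
    print(element, "->", sentences)
```

Real mappings would need richer NLP than keyword matching, but the output shape (metadata element to supporting sentences) mirrors the kind of methods description the study extracts from journal articles.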

