It’s all in the timing: calibrating temporal penalties for biomedical data sharing

2017 ◽  
Vol 25 (1) ◽  
pp. 25-31 ◽  
Author(s):  
Weiyi Xia ◽  
Zhiyu Wan ◽  
Zhijun Yin ◽  
James Gaupp ◽  
Yongtai Liu ◽  
...  

Abstract
Objective: Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. There are various initiatives to make these datasets more widely available to recipients who sign Data Use Certificate agreements, whereby penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient’s data usage rights. This policy is based on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether this assumption holds true and its data science policy implications.
Methods: This study tests the hypothesis that the value of data for scientific investigators, in terms of the impact of the publications based on the data, decreases over time. The hypothesis is tested formally through a mixed linear effects model using approximately 1200 publications between 2007 and 2013 that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health.
Results: The analysis shows that the impact factors for publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, we further discover that the depreciation rate is slow, only ∼10% per year, on average.
Conclusion: The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements and that alternative penalty mechanisms may need to be invoked.
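The depreciation estimate above can be illustrated with a minimal sketch: fitting a log-linear trend of publication impact against dataset age on synthetic data. The study itself used a mixed linear effects model on ~1200 dbGaP publications; all values and variable names here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: publication impact depreciating ~10% per year
# (values are illustrative, not from the dbGaP study).
years = rng.integers(0, 7, size=500)          # years since dataset release
impact = 5.0 * 0.9 ** years * rng.lognormal(0.0, 0.1, size=500)

# Fit log(impact) ~ a + b * years; exp(b) is the annual retention factor.
b, a = np.polyfit(years, np.log(impact), 1)
annual_retention = np.exp(b)
annual_depreciation = 1.0 - annual_retention  # recovers roughly 10% per year
print(round(annual_depreciation, 3))
```

A mixed-effects variant would additionally include a random intercept per dataset, so that datasets with systematically higher-impact publications do not bias the time trend.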

Author(s):  
Eleonora Sottile ◽  
Francesco Piras ◽  
Italo Meloni

There is ample consensus that, besides objective characteristics, psycho-attitudinal factors play a key role in influencing people’s mode choice. Hybrid choice models build on these theoretical frameworks by including latent constructs that capture the impact of subjective factors on mode choice. However, recent work in transportation research has questioned the ability of hybrid choice models, given their focus on cross-sectional data, to derive policy implications aimed at changing travel behavior. To address this problem, we designed a survey for collecting longitudinal data (socio-economic and psycho-attitudinal) to evaluate, on the one hand, the long-term effects on travel mode choice of the implementation of a new light rail line in the metropolitan area of Cagliari (Italy) and, on the other, to detect any changes in the psycho-attitudinal factors and socio-economic characteristics after implementation of those measures. In particular, the objective of the study is to analyze, through the specification and estimation of hybrid models, whether these changes in individual characteristics are able to affect mode choice. Our results show that the latent variables were not significantly different across waves, indicating that the impact of the psychological constructs remained stable over time, even after the introduction of the new light rail. Additionally, we found some evidence that the variables that explain the latent variables could change over time.


2020 ◽  
Author(s):  
Sadat Reza ◽  
Hillbun Ho ◽  
Rich Ling ◽  
Hongyan Shi

Although the use of free samples is extensive across industries, the effects of free samples across individuals with varying levels of usage have yet to be systematically examined. The models discussed in the literature consider targeting only the current nonusers of a product. In this research, we examine the question of targeting the current users both analytically and empirically for an experience good. Our analytical discussions highlight the reasons why some current users may be effective targets for free-sample promotions. We then conduct an empirical analysis using a data set on pre- and post-free-sample promotion mobile data usage provided by a telecom firm. The empirical findings are consistent with our analytical results. Specifically, we find the initial usage level to be a key determinant of both the redemption rate of a free-sample offer and the subsequent change in usage owing to free-sample redemption. In our context, the redemption rate increased from the low-percentile users to the high-percentile users. We also find that the change in usage was (weakly) monotonically increasing up to the [Formula: see text] percentile of usage distribution. Beyond the [Formula: see text] percentile, the effect was generally not significant. We discuss the managerial and policy implications of our findings. This paper was accepted by Juanjuan Zhang, marketing.


2009 ◽  
Vol 38 (3) ◽  
pp. 439-456 ◽  
Author(s):  
DONALD FORRESTER ◽  
KEITH GOODMAN ◽  
CHRISTINE COCKER ◽  
CHARLOTTE BINNIE ◽  
GRAHAM JENSCH

Abstract The outcomes for children in public care are generally considered to be poor. This has contributed to a focus on reducing the number of children in care: a goal that is made explicit in the provisions of the current Children and Young Persons Bill. Yet while children in care do less well than most children on a range of measures, such comparisons do not disentangle the extent to which these difficulties pre-dated care and the specific impact of care on child welfare. This article explores the specific impact of care through a review of British research since 1991 that provides data on changes in child welfare over time for children in care. Only 12 studies were identified, indicating a lack of research in this important area. The studies consistently found that children entering care tended to have serious problems but that in general their welfare improved over time. This finding is consistent with the international literature. It has important policy implications. Most significantly it suggests that attempts to reduce the use of public care are misguided, and may place more children at risk of serious harm. Instead, it is argued that England and Wales should move toward a Scandinavian system of public care, in which care is seen as a form of family support and is provided for more rather than fewer children and families.


2020 ◽  
Vol 18 (1) ◽  
pp. 70-81 ◽  
Author(s):  
Andrii Roskladka ◽  
Nataliia Roskladka ◽  
Anatolii Karpuk ◽  
Andriy Stavytskyy ◽  
Ganna Kharlamova

The process of world globalization, labor and academic mobility, and the visa-free regime with the EU countries have caused a significant revival of migration processes in Ukraine. However, there is still a research gap concerning the most informative and, at the same time, accurate method for assessing and forecasting migration flows. Thus, the object of research is migration processes (mostly emphasizing emigration flows). The motives and causes of emigration processes, and their relationship with the economic state, were analyzed. The impact factors of external labor migration on the economy of the host countries were revealed; in particular, the negative and positive impacts of emigration on the socio-economic situation in Ukraine and the migration attitude of Ukrainians were assessed. The main result of the study is the further development of an econometric model for forecasting the number of emigrants from Ukraine to other countries in the near future. The model considers the minimum wage level in Ukraine, the number of open vacancies in the countries of Eastern Europe, and the level of competition for jobs. According to the results of forecasting based on the Maple computer algebra system and the Microsoft Power BI analytical platform, by the end of 2019 the number of emigrants from Ukraine was expected to be the largest in the last four years and to reach estimates in the range of 2.444 to 2.550 million people, which may indicate a new, third wave of emigration processes.
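A model of the kind described, regressing emigrant numbers on wage, vacancy, and competition factors and then forecasting one step ahead, can be sketched as follows. The paper's actual data and functional form are not reproduced here; all series, coefficients, and the forecast inputs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical yearly factor series (illustrative, not the paper's data):
# minimum wage in Ukraine, open vacancies in Eastern Europe, job competition.
n = 12
wage = rng.uniform(100, 200, n)
vacancies = rng.uniform(50, 150, n)
competition = rng.uniform(1, 5, n)

# Synthetic target: emigrants (thousands), falling with wage, rising with
# vacancies, falling with competition for jobs.
emigrants = (2000 - 3.0 * wage + 4.0 * vacancies - 50.0 * competition
             + rng.normal(0, 10, n))

# Least-squares fit of the linear model, then a one-step-ahead forecast
# for assumed factor values next year.
X = np.column_stack([np.ones(n), wage, vacancies, competition])
coef, *_ = np.linalg.lstsq(X, emigrants, rcond=None)
forecast = float(coef @ [1.0, 180.0, 120.0, 2.0])
print(round(forecast, 1))
```

The fitted signs (negative for wage and competition, positive for vacancies) mirror the qualitative relationships the abstract describes.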


2018 ◽  
Vol 21 (1) ◽  
pp. 182-197 ◽  
Author(s):  
Chang Su ◽  
Jie Tong ◽  
Yongjun Zhu ◽  
Peng Cui ◽  
Fei Wang

Abstract Owing to the rapid development of computer technologies, an increasing amount of relational data has been emerging in modern biomedical research. Many network-based learning methods have been proposed to analyze such data; they provide a deep understanding of the topology and knowledge behind biomedical networks and benefit many applications in human healthcare. However, most network-based methods suffer from high computational and space costs, and challenges remain in handling the high dimensionality and sparsity of biomedical networks. The latest advances in network embedding technologies provide new, effective paradigms for network analysis: a network is converted into a low-dimensional space while its structural properties are maximally preserved, so that downstream tasks such as link prediction and node classification can be handled by traditional machine learning methods. In this survey, we conduct a comprehensive review of the literature on applying network embedding to advance the biomedical domain. We first briefly introduce the widely used network embedding models. After that, we carefully discuss how network embedding approaches have been applied to biomedical networks and how they accelerate downstream tasks in biomedical science. Finally, we discuss the challenges faced by existing network embedding applications in biomedical domains and suggest several promising future directions for improving human healthcare.
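The embed-then-predict pattern the survey describes can be sketched on a toy graph. A spectral embedding of the adjacency matrix stands in here for the embedding step; methods the survey covers (e.g. node2vec, DeepWalk) follow the same pattern with different embedding machinery. The graph and node roles are invented for illustration.

```python
import numpy as np

# Toy biomedical-style graph: 6 nodes in two clusters (say, two pathway
# modules), joined by a single bridge edge between nodes 2 and 3.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

# Embed: keep the d dominant eigenvectors of the adjacency matrix,
# scaled by their eigenvalue magnitudes (a simple spectral embedding).
d = 2
vals, vecs = np.linalg.eigh(A)
idx = np.argsort(-np.abs(vals))[:d]
emb = vecs[:, idx] * np.abs(vals[idx])

# Link prediction: score a candidate edge by the embedding dot product.
def score(u, v):
    return float(emb[u] @ emb[v])

# A within-cluster pair should outscore a cross-cluster pair.
print(score(0, 1) > score(0, 5))
```

Downstream, these scores (or the embedding vectors themselves) feed an ordinary classifier, which is exactly the simplification the survey attributes to network embedding.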


2003 ◽  
Vol 67 (2) ◽  
pp. 123-139 ◽  
Author(s):  
Hans Baumgartner ◽  
Rik Pieters

The authors investigate the overall and subarea influence of a comprehensive set of marketing and marketing-related journals at three points in time during a 30-year period using a citation-based measure of structural influence. The results show that a few journals wield a disproportionate amount of influence in the marketing journal network as a whole and that influential journals tend to derive their influence from many different journals. Different journals are most influential in different subareas of marketing; general business and managerially oriented journals have lost influence, whereas more specialized marketing journals have gained in influence over time. The Journal of Marketing emerges as the most influential marketing journal in the final period (1996–97) and as the journal with the broadest span of influence across all subareas. Yet the Journal of Marketing is notably influential among applied marketing journals, which themselves are of lesser influence. The index of structural influence is significantly correlated with other objective and subjective measures of influence but least so with the impact factors reported in the Social Sciences Citation Index. Overall, the findings demonstrate the rapid maturation of the marketing discipline and the changing role of key journals in the process.
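A citation-based structural-influence score of the general kind described can be sketched with eigenvector centrality on a small journal citation network: a journal is influential if it is cited by influential journals. The paper's own index may be defined differently; the journal names and citation counts below are illustrative only.

```python
import numpy as np

# Hypothetical journal-to-journal citation counts (row cites column);
# names and numbers are illustrative, not the paper's data.
journals = ["JM", "JMR", "JCR", "MgtSci"]
C = np.array([
    [0, 30, 20, 10],
    [40, 0, 15, 10],
    [35, 25, 0, 5],
    [20, 10, 5, 0],
], dtype=float)

# Row-normalize citing behaviour, then take the dominant left eigenvector:
# influence flows from citing journals to cited journals.
M = C / C.sum(axis=1, keepdims=True)
vals, vecs = np.linalg.eig(M.T)
v = np.real(vecs[:, np.argmax(np.real(vals))])
influence = np.abs(v) / np.abs(v).sum()

ranking = sorted(zip(journals, influence), key=lambda t: -t[1])
print(ranking)
```

With these toy counts, the journal receiving the largest share of every other journal's citations ends up with the highest score, matching the intuition of "deriving influence from many different journals."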


2016 ◽  
Vol 55 (04) ◽  
pp. 347-355 ◽  
Author(s):  
Klaus Kuhn ◽  
Fabian Prasser ◽  
Florian Kohlmayer

Summary
Background: Data sharing is a central aspect of modern biomedical research. It is accompanied by significant privacy concerns, and often data needs to be protected from re-identification. With methods of de-identification, datasets can be transformed in such a way that it becomes extremely difficult to link their records to identified individuals. The most important challenge in this process is to find an adequate balance between an increase in privacy and a decrease in data quality.
Objectives: Accurately measuring the risk of re-identification in a specific data sharing scenario is an important aspect of data de-identification. Overestimation of risks will significantly deteriorate data quality, while underestimation will leave data prone to attacks on privacy. Several models have been proposed for measuring risks, but there is a lack of generic methods for risk-based data de-identification. The aim of the work described in this article was to bridge this gap and to show how the quality of de-identified datasets can be improved by using risk models to tailor the process of de-identification to a concrete context.
Methods: We implemented a generic de-identification process and several models for measuring re-identification risks in the ARX de-identification tool for biomedical data. By integrating the methods into an existing framework, we were able to automatically transform datasets in such a way that information loss is minimized while re-identification risks are guaranteed to meet a user-defined threshold. We performed an extensive experimental evaluation to analyze how different risk models, and different assumptions about the goals and background knowledge of an attacker, affect the quality of de-identified data.
Results: The results of our experiments show that data quality can be improved significantly by using risk models for data de-identification. On a scale where 100% represents the original input dataset and 0% represents a dataset from which all information has been removed, the loss of information content could be reduced by up to 10% when protecting datasets against strong adversaries and by up to 24% when protecting datasets against weaker adversaries.
Conclusions: The methods studied in this article are well suited for protecting sensitive biomedical data, and our implementation is available as open-source software. Our results can be used by data custodians to increase the information content of de-identified data by tailoring the process to a specific data sharing scenario. Improving data quality is important for fostering the adoption of de-identification methods in biomedical research.
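The core loop of risk-based de-identification, measuring a re-identification risk and generalizing quasi-identifiers until a threshold is met, can be sketched with the prosecutor risk model (risk = 1 / size of the smallest equivalence class). ARX's actual models and search are far richer; the records and the single generalization step below are illustrative.

```python
from collections import Counter

# Toy records with two quasi-identifiers: (age, ZIP code). Illustrative only.
records = [
    (34, "37203"), (34, "37203"), (35, "37212"),
    (35, "37212"), (38, "37215"), (39, "37215"),
]

def max_reidentification_risk(rows):
    """Prosecutor-style risk: 1 / size of the smallest equivalence class,
    where an equivalence class groups records with identical quasi-identifiers."""
    classes = Counter(rows)
    return 1.0 / min(classes.values())

print(max_reidentification_risk(records))      # 1.0: some records are unique

# One generalization step (age decade, 3-digit ZIP prefix), as a risk-based
# de-identifier might apply until a user-defined threshold (say 0.5) is met.
generalized = [(a // 10 * 10, z[:3]) for a, z in records]
print(max_reidentification_risk(generalized))  # all six records now match
```

The trade-off the abstract describes is visible even here: the generalized data meets the risk threshold but has lost the exact ages and full ZIP codes, which is the information loss a risk-tailored process tries to minimize.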


Author(s):  
Jeremias Lachman ◽  
Andrés López

Purpose: The purpose of this paper is to study the factors that act as innovation obstacles for precision agriculture (PA) technologies in Argentina, one of the world’s leading exporters of cereals and oilseeds. The focus of this study is on the supply side, i.e. the factors that are perceived by PA firms as obstacles to the expansion of their market.
Design/methodology/approach: Based on a survey of 67 firms that develop PA technologies in Argentina, this study examines the impact of different types of obstacles on firms’ growth and innovation activities. This analysis is complemented with the results that emerge from a series of interviews with different stakeholders (such as firms’ managers, policymakers, and experts).
Findings: The study determined that market and cost factors negatively affect firms’ growth, while institutional obstacles reduce the amount of innovation effort. In turn, knowledge barriers positively affect the relevance firms assign to R&D activities. The study helps identify the strategies firms have put in place to overcome the barriers they face. Finally, policy implications of the results are discussed.
Originality/value: PA technologies may contribute to greening agricultural production and offer an opportunity for the emergence of domestic suppliers of innovative equipment and services based on data science, artificial intelligence, and the Internet of Things. To the best of the authors’ knowledge, this is the first study to explore the obstacles that prevent growth and affect the innovation activities of PA firms. The insights are valuable for both researchers and policymakers aiming to foster the emergence of high-tech clusters in developing countries.


2021 ◽  
Author(s):  
Samantha M. Baxter ◽  
Jennifer E. Posey ◽  
Nicole J. Lake ◽  
Nara Lygia Sobreira ◽  
Jessica X. Chong ◽  
...  

Mendelian disease genomic research has undergone a massive transformation over the last decade. With increasing availability of exome and genome sequencing, the role of Mendelian research has expanded beyond data collection, sequencing, and analysis to worldwide data sharing and collaboration. Over the last 10 years, the NIH-supported Centers for Mendelian Genomics (CMGs) have played a major role in this research and clinical evolution. We highlight the cumulative gene discoveries facilitated by the program, biomedical research leveraged by the approach, and the larger impact on the research community. Mendelian genomic research extends beyond generating lists of gene-phenotype relationships; it includes developing tools, training the larger community to use these tools and approaches, and facilitating collaboration through data sharing. Thus, the CMGs have also focused on creating resources, tools, and training for the larger community to foster the understanding of genes and genome variation. The CMGs have participated in a wide range of data sharing activities, including deposition of all eligible CMG data into AnVIL (NHGRI's Genomic Data Science Analysis, Visualization, and Informatics Lab-Space), sharing candidate genes through Matchmaker Exchange (MME) and the CMG website, and sharing variants in Geno2MP and VariantMatcher. The research genomics output remains exploratory, with evidence that thousands of disease genes, in which variant alleles contribute to disease, remain undiscovered, and many patients with rare disease remain molecularly undiagnosed. Strengthening communication between research and clinical labs, continued development and sharing of knowledge and tools required for solving previously unsolved cases, and improving access to data sets, including high-quality metadata, are all required to continue to advance Mendelian genomics research and continue to leverage the Human Genome Project for basic biomedical science research and clinical utility.


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Xu Sun ◽  
Zixiu Bai ◽  
Kun Lin ◽  
Pengpeng Jiao ◽  
HuaPu Lu

To improve the accuracy, reliability, and economy of urban traffic information collection, an optimization model of traffic sensor layout is proposed in this paper. Considering the impact of traffic big data, a set of impact factors for traffic sensor layout is established, including system cost, multisource data sharing, data demand, sensor failures, road infrastructure, and sensor type. These influential factors are taken into account in the traffic sensor layout optimization problem, which is formulated as a multiobjective programming model that includes minimum system cost, maximum truncation flow, minimum path coverage, and an origin-destination (OD) coverage constraint. The model is solved by the tolerant lexicographic method based on a genetic algorithm. A case study shows that the model reflects the influence of multisource data sharing and fault conditions and satisfies the OD coverage constraint, achieving multiobjective optimization of the traffic sensor layout.
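A genetic algorithm for a constrained sensor-layout problem of this general shape can be sketched as follows. The paper's tolerant lexicographic method is simplified here to a single penalty that enforces the OD coverage constraint before cost is compared; the network, costs, and coverage sets are invented for illustration.

```python
import random

random.seed(0)

# Toy network: 5 candidate sensor sites, each covering a set of OD paths.
# Sites, costs, and coverage sets are illustrative, not the paper's case study.
coverage = [{0, 1}, {1, 2}, {2, 3}, {0, 3}, {1, 3}]
cost = [4, 3, 5, 2, 3]
all_paths = {0, 1, 2, 3}

def fitness(layout):
    # Heavy penalty for uncovered OD paths makes coverage dominate cost,
    # a crude stand-in for a lexicographic objective ordering.
    covered = set().union(*[coverage[i] for i, on in enumerate(layout) if on])
    total_cost = sum(c for c, on in zip(cost, layout) if on)
    penalty = 100 * len(all_paths - covered)
    return -(penalty + total_cost)

def evolve(pop_size=30, generations=40):
    pop = [[random.randint(0, 1) for _ in range(5)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, 5)        # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:           # bit-flip mutation
                j = random.randrange(5)
                child[j] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best, sum(c for c, on in zip(cost, best) if on))
```

On this tiny instance the GA settles on a cheap layout that still covers every OD path; the real model adds the truncation-flow and data-sharing objectives on top of the same evolutionary machinery.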

