Economic Predictions With Big Data: The Illusion of Sparsity

Econometrica ◽  
2021 ◽  
Vol 89 (5) ◽  
pp. 2409-2437 ◽  
Author(s):  
Domenico Giannone ◽  
Michele Lenza ◽  
Giorgio E. Primiceri

We compare sparse and dense representations of predictive models in macroeconomics, microeconomics, and finance. To deal with a large number of possible predictors, we specify a prior that allows for both variable selection and shrinkage. The posterior distribution does not typically concentrate on a single sparse model, but on a wide set of models that often include many predictors.
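
One common way to build a prior that supports both variable selection and shrinkage is a spike-and-slab form: each coefficient is exactly zero with some probability and is otherwise drawn from a Gaussian. The sketch below only illustrates that general idea. The function name, the parameters `q` and `slab_sd`, and the values used are hypothetical and are not taken from the paper.

```python
import numpy as np

def draw_spike_and_slab(n_predictors, q, slab_sd, rng):
    """Draw coefficients that are 0 with probability 1 - q, else N(0, slab_sd**2).

    q       : prior inclusion probability (controls sparsity)   [illustrative]
    slab_sd : std. dev. of included coefficients (controls shrinkage) [illustrative]
    """
    included = rng.random(n_predictors) < q
    slab = rng.normal(0.0, slab_sd, size=n_predictors)
    return np.where(included, slab, 0.0), included

rng = np.random.default_rng(0)
beta, included = draw_spike_and_slab(1000, q=0.1, slab_sd=0.5, rng=rng)
print("share of non-zero coefficients:", included.mean())
```

A small `q` with a large `slab_sd` describes a sparse model with a few sizeable effects, while a `q` near one with a small `slab_sd` describes a dense model with many small effects. Placing priors on both quantities lets the data inform which description fits better.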

2018 ◽  
Author(s):  
Gao Wang ◽  
Abhishek Sarkar ◽  
Peter Carbonetto ◽  
Matthew Stephens

We introduce a simple new approach to variable selection in linear regression, with a particular focus on quantifying uncertainty in which variables should be selected. The approach is based on a new model, the “Sum of Single Effects” (SuSiE) model, which comes from writing the sparse vector of regression coefficients as a sum of “single-effect” vectors, each with one non-zero element. We also introduce a corresponding new fitting procedure, Iterative Bayesian Stepwise Selection (IBSS), which is a Bayesian analogue of stepwise selection methods. IBSS shares the computational simplicity and speed of traditional stepwise methods, but instead of selecting a single variable at each step, IBSS computes a distribution on variables that captures uncertainty in which variable to select. We provide a formal justification of this intuitive algorithm by showing that it optimizes a variational approximation to the posterior distribution under the SuSiE model. Further, this approximate posterior distribution naturally yields convenient novel summaries of uncertainty in variable selection, providing a Credible Set of variables for each selection. Our methods are particularly well-suited to settings where variables are highly correlated and detectable effects are sparse, both of which are characteristics of genetic fine-mapping applications. We demonstrate through numerical experiments that our methods outperform existing methods for this task, and illustrate their application to fine-mapping genetic variants influencing alternative splicing in human cell lines. We also discuss the potential and challenges for applying these methods to generic variable selection problems.
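
The fitting idea described in this abstract can be sketched roughly as follows. This is a simplified illustration and not the authors' implementation. It assumes the residual variance and the prior effect variance are known and held fixed, puts a uniform prior over which variable carries each single effect, runs a fixed number of sweeps with no convergence check, and does not compute Credible Sets. The function names, parameters, and simulated data are all hypothetical.

```python
import numpy as np

def single_effect_regression(X, r, resid_var, prior_var):
    """Fit a regression in which exactly one variable has a non-zero effect.

    Returns alpha (posterior probability that each variable is the one with
    the effect) and post_mean (posterior mean effect given that variable).
    """
    xtx = np.einsum("ij,ij->j", X, X)        # x_j' x_j for each column
    bhat = X.T @ r / xtx                     # univariate least-squares estimates
    s2 = resid_var / xtx                     # their sampling variances
    # Log Bayes factor of "b_j ~ N(0, prior_var)" against "b_j = 0".
    log_bf = (0.5 * np.log(s2 / (s2 + prior_var))
              + 0.5 * bhat**2 / s2 * prior_var / (s2 + prior_var))
    w = np.exp(log_bf - log_bf.max())        # subtract max for numerical stability
    alpha = w / w.sum()                      # uniform prior over variables (assumed)
    post_var = 1.0 / (1.0 / s2 + 1.0 / prior_var)
    post_mean = post_var / s2 * bhat
    return alpha, post_mean

def ibss(X, y, n_effects=2, resid_var=1.0, prior_var=1.0, n_iter=50):
    """Sweep over the single effects, refitting each to the residual left by the others."""
    n, p = X.shape
    alpha = np.full((n_effects, p), 1.0 / p)
    mu = np.zeros((n_effects, p))
    for _ in range(n_iter):
        for l in range(n_effects):
            # Expected contribution of every effect except the l-th.
            others = (alpha * mu).sum(axis=0) - alpha[l] * mu[l]
            r = y - X @ others
            alpha[l], mu[l] = single_effect_regression(X, r, resid_var, prior_var)
    return alpha, mu

# Simulated example (hypothetical data): two true effects among 20 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
b_true = np.zeros(20)
b_true[[3, 11]] = [2.0, -2.0]
y = X @ b_true + rng.normal(size=200)

alpha, mu = ibss(X, y, n_effects=2)
pip = 1.0 - np.prod(1.0 - alpha, axis=0)     # posterior inclusion probabilities
print(np.round(pip, 3))
```

Each row of `alpha` is a distribution over variables rather than a single chosen variable, which is the feature that separates this procedure from classical stepwise selection.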


Web Services ◽  
2019 ◽  
pp. 2230-2254
Author(s):  
Amandeep Kaur Kahlon ◽  
Ashok Sharma

This chapter examines the need for systems biology in prediction models for studying tuberculosis infection in the big data era. The overall complexity of biological phenomena, such as biochemical, biophysical, and other molecular processes within the pathogen as well as their interaction with the host, is studied through systems biology approaches. Consideration is given first to the necessity of prediction models that integrate systems biology approaches, and then to their replacement and refinement using high-throughput data. Various ongoing projects, consortia, databases, and research groups involved in tuberculosis eradication are also discussed. The chapter provides a brief account of TB predictive models and their importance in systems biology for studying tuberculosis and host-pathogen interactions. It also addresses big data resources and applications, data management, limitations, challenges, solutions, and future directions.


Web Services ◽  
2019 ◽  
pp. 618-638
Author(s):  
Goran Klepac ◽  
Kristi L. Berg

This chapter proposes a new analytical approach that consolidates traditional analytical approaches to problems such as churn detection, fraud detection, predictive modeling, and segmentation modeling with data sources and analytical techniques from the big data area. It presents solutions that offer a structured approach for integrating these different concepts into one, helping analysts and managers draw on the potential of different areas in a systematic way. Using this concept, companies can bring big data potential into everyday data mining projects. As the chapter shows, neglecting big data potential often leads to incomplete analytical results, which mean incomplete information for business decisions and can lead to bad business decisions. The chapter also provides suggestions on how to recognize useful data sources from the big data area and how to analyze them alongside traditional data sources to obtain higher-quality information for business decisions.


2017 ◽  
Vol 35 (31_suppl) ◽  
pp. 59-59
Author(s):  
Marie C. Haverfield ◽  
Adam Singer ◽  
Karl Lorenz

Background: The development of “big data” methods offers an opportunity to more precisely predict patient outcomes. We explored physicians’, patients’, and caregivers’ perspectives about the use of predictive models in oncology practice. Methods: We conducted 12 patient, 12 provider, and 12 caregiver interviews (N = 36) from Stanford University outpatient oncology clinics. We queried participants about patient and family-centered applications of predictive models for prognosis, cost, and novel patient and family-centered outcomes. Two trained coders iteratively examined transcripts for consistent topics and used the constant comparative method to establish themes and sub-themes. Results: Several overlapping themes emerged: 1) Outcomes of Interest, [provider] “Predictive information about side effects or adverse effects of treatment would be helpful”; 2) Barriers to Using Predictions, [patient] “If it seems too sort of set in stone, without…you know, everything has grey areas”; 3) Benefits to Using Predictions, [provider] “Some people…their cancer may be cured, but they live with these really horrible chronic illnesses and some people would say, ‘I would have rather have just died from my disease than be in this shape’”; and 4) Communication Strategy, [provider] “I’m not even sure if I would bring up the models…I would kind of fall back on what I normally discuss with patients”. A theme specific to the provider group was 5) Accuracy of Model Information, [provider] “It’s hard to know whether to use in the clinical setting just the results of the model or whether you would really want to go down to the root level and actually access the raw data”. A theme specific to the patient and caregiver groups was 6) Privacy, [caregiver] “I would want to be able to have the patient authorize that”.
Conclusions: There is consistency between provider strategies to communicate prognostic information and patients’ perceptions of how they would like prognostic information to be communicated to them. While providers are concerned with accuracy of predictive models, patients and caregivers are more concerned with privacy.


Author(s):  
Lili Aunimo ◽  
Ari V. Alamäki ◽  
Harri Ketamo

Constructing a big data governance framework is important when a company performs data-driven software development. The most important aspects of big data governance are data privacy, security, availability, usability, and integrity. In this chapter, the authors present a business case in which a framework for big data governance has been built. The business case concerns the development and continuous improvement of a new mobile application targeted at consumers. In this context, big data is used in product development, in building predictive models related to the users, and for personalization of the product. The main contributions of the study are a novel big data governance framework and the finding that a proper framework for big data governance is useful when building and maintaining trustworthy and value-adding big data-driven predictive models in an authentic business environment.

