Using Aggregated Relational Data to Feasibly Identify Network Structure without Network Data

2017 ◽  
Author(s):  
Emily Breza ◽  
Arun Chandrasekhar ◽  
Tyler McCormick ◽  
Mengjie Pan
2020 ◽  
Vol 110 (8) ◽  
pp. 2454-2484 ◽  
Author(s):  
Emily Breza ◽  
Arun G. Chandrasekhar ◽  
Tyler H. McCormick ◽  
Mengjie Pan

Social network data are often prohibitively expensive to collect, limiting empirical network research. We propose an inexpensive and feasible strategy for network elicitation using Aggregated Relational Data (ARD): responses to questions of the form “how many of your links have trait k?” Our method uses ARD to recover parameters of a network formation model, which permits sampling from a distribution over node- or graph-level statistics. We replicate the results of two field experiments that used network data and draw similar conclusions with ARD alone. (JEL C81, C93, D85, Z13)
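As a rough illustration of what an ARD response looks like (this is not the authors' estimation procedure), the sketch below computes each node's answer to “how many of your links have trait k?” on a small graph. The graph, the trait labels, and the function name `ard_responses` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def ard_responses(edges, traits):
    """Return {node: {trait: number of neighbors holding that trait}}.

    edges  -- iterable of undirected (u, v) pairs
    traits -- dict mapping each node to the set of traits it holds
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    responses = {}
    for node in traits:
        counts = defaultdict(int)
        for nb in neighbors[node]:
            for trait in traits.get(nb, ()):
                counts[trait] += 1
        responses[node] = dict(counts)
    return responses


# Hypothetical toy data: four respondents and two traits.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
traits = {
    "a": {"farmer"},
    "b": {"farmer", "migrant"},
    "c": set(),
    "d": {"migrant"},
}
ard = ard_responses(edges, traits)
```

In a real survey only these counts would be observed, not the edge list; the toy example merely shows which quantity each respondent is asked to report.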


2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i464-i473
Author(s):  
Kapil Devkota ◽  
James M Murphy ◽  
Lenore J Cowen

Motivation: One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interaction networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein–protein interaction (PPI) network data that might mean that alternate methods may outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross validation experiments. Results: We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE’s global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn’s disease, to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature. Availability and implementation: GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide. Supplementary information: Supplementary data are available at Bioinformatics online.
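To make the idea of a simple local network measure concrete, here is a minimal sketch of common-neighbors scoring for link prediction. It is a generic textbook measure, not GLIDE and not necessarily one of the local measures the authors use; the function name `common_neighbor_scores` and the node labels are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score every non-adjacent node pair by its number of shared neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:  # only score pairs with no observed link
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores


# Hypothetical toy interaction network.
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p4"), ("p3", "p4"), ("p4", "p5")]
scores = common_neighbor_scores(edges)

# Candidate links, highest score first.
ranked = sorted(scores.items(), key=lambda kv: -kv[1])
```

A measure like this looks only at immediate neighborhoods, which is why it can be complemented by a global, embedding-based score in sparser parts of a network.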


Author(s):  
Mark Newman

This chapter introduces the mathematics of network statistics, the quantification of errors in network data, and the estimation of network structure in the presence of error. The discussion starts with a summary of the types of error that can occur in network data and the empirical sources of those errors. The remainder of the chapter is given over to a discussion of the theory of network statistics, beginning with a review of the theory for ordinary real-valued (non-network) data, then developing the expectation-maximization (EM) algorithm for estimating network structure and error levels in the presence of error, with example applications. The chapter ends with a discussion of error correction methods such as link prediction and node disambiguation.


2021 ◽  
Author(s):  
Abigail Z. Jacobs ◽  
Duncan J. Watts

Theories of organizations are sympathetic to long-standing ideas from network science that organizational networks should be regarded as multiscale and capable of displaying emergent properties. However, the historical difficulty of collecting individual-level network data for many (N ≫ 1) organizations, each of which comprises many (n ≫ 1) individuals, has hobbled efforts to develop specific, theoretically motivated hypotheses connecting micro- (i.e., individual-level) network structure with macro-organizational properties. In this paper we seek to stimulate such efforts with an exploratory analysis of a unique data set of aggregated, anonymized email data from an enterprise email system that includes 1.8 billion messages sent by 1.4 million users from 65 publicly traded U.S. firms spanning a wide range of sizes and 7 industrial sectors. We uncover wide heterogeneity among firms with respect to all measured network characteristics, and we find robust network and organizational variation as a result of size. Interestingly, we find no clear associations between organizational network structure and firm age, industry, or performance; however, we do find that centralization increases with geographical dispersion—a result that is not explained by network size. Although preliminary, these results raise new questions for organizational theory as well as new issues for collecting, processing, and interpreting digital network data. This paper was accepted by David Simchi-Levi, Special Issue of Management Science: 65th Anniversary.


2021 ◽  
Vol 7 (23) ◽  
pp. eabb8762
Author(s):  
Antonia Godoy-Lorite ◽  
Nick S. Jones

Population behavior, like voting and vaccination, depends on the structure of social networks. This structure can differ depending on behavior type and is typically hidden. However, we do often have behavioral data, albeit only snapshots taken at one time point. We present a method jointly inferring a model for both network structure and human behavior using only snapshot population-level behavioral data. This exploits the simplicity of a few-parameter, geometric sociodemographic network model and a spin-based model of behavior. We illustrate, for the European Union referendum and two London mayoral elections, how the model offers both prediction and the interpretation of the homophilic inclinations of the population. Beyond extracting behavior-specific network structure from behavioral datasets, our approach yields a framework linking inequalities and social preferences to behavioral outcomes. We illustrate potential network-sensitive policies: How changes to income inequality, social temperature, and homophilic preferences might have reduced polarization in a recent election.


2017 ◽  
Vol 10 (13) ◽  
pp. 112
Author(s):  
Sharath Kumar J ◽  
Maheswari N

In the 21st century, online social networks such as Facebook and Twitter play a very important role in everyone’s life. Social network data about any individual or organization can be published online at any time, which creates a risk of leaking personal information. Preserving the privacy of individuals, organizations, and companies is therefore needed before data is published online. Research in this area has been carried out for many years and is still ongoing. Various existing techniques provide privacy-preserving solutions for tabular data, called relational data, and for social network data represented as graphs. Different techniques exist for tabular data, but they cannot be applied directly to complex structured graph data, which consists of vertices representing individuals and edges representing some kind of connection or relationship between the nodes. Techniques such as K-anonymity, L-diversity, and T-closeness exist to provide privacy to nodes, and techniques such as edge perturbation and edge randomization provide privacy to edges in social graphs. New techniques that integrate existing ones, such as K-anonymity, edge perturbation, edge randomization, and L-diversity, are still being developed to provide more privacy to relational data and social network data.
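For tabular data, the standard definition of K-anonymity can be checked in a few lines: every combination of quasi-identifier values must occur in at least k rows. The sketch below assumes that definition; the column names, the generalized values, and the function name `is_k_anonymous` are hypothetical, and nothing here is taken from the paper itself.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k rows."""
    groups = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized records.
rows = [
    {"age": "20-29", "zip": "100**", "disease": "flu"},
    {"age": "20-29", "zip": "100**", "disease": "asthma"},
    {"age": "30-39", "zip": "101**", "disease": "flu"},
    {"age": "30-39", "zip": "101**", "disease": "diabetes"},
]

two_anonymous = is_k_anonymous(rows, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(rows, ["age", "zip"], 3)
```

A check like this covers only relational tables; graph data needs structural notions of anonymity, which is the gap the abstract points to.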

