Efficiently summarizing relationships in large samples: a general duality between statistics of genealogies and genomes

2019 ◽  
Author(s):  
Peter Ralph ◽  
Kevin Thornton ◽  
Jerome Kelleher

As a genetic mutation is passed down across generations, it distinguishes those genomes that have inherited it from those that have not, providing a glimpse of the genealogical tree relating the genomes to each other at that site. Statistical summaries of genetic variation therefore also describe the underlying genealogies. We use this correspondence to define a general framework that efficiently computes single-site population genetic statistics using the succinct tree sequence encoding of genealogies and genome sequence. The general approach accumulates “sample weights” within the genealogical tree at each position on the genome, which are then combined using a “summary function”; different statistics result from different choices of weight and function. Results can be reported in three ways: by site, which corresponds to statistics calculated as usual from genome sequence; by branch, which gives the expected value of the dual site statistic under the infinite-sites model of mutation, and by node, which summarizes the contribution of each ancestor to these statistics. We use the framework to implement many currently-defined statistics of genome sequence (making the statistics’ relationship to the underlying genealogical trees concrete and explicit), as well as the corresponding “branch” statistics of tree shape. We evaluate computational performance using simulated data, and show that calculating statistics from tree sequences using this general framework is several orders of magnitude more efficient than optimized matrix-based methods in terms of both run time and memory requirements. We also explore how well the duality between site and branch statistics holds in practice on trees inferred from the 1000 Genomes Project dataset, and discuss ways in which deviations may encode interesting biological signals.

Genetics ◽  
2020 ◽  
Vol 215 (3) ◽  
pp. 779-797 ◽  
Author(s):  
Peter Ralph ◽  
Kevin Thornton ◽  
Jerome Kelleher

As a genetic mutation is passed down across generations, it distinguishes those genomes that have inherited it from those that have not, providing a glimpse of the genealogical tree relating the genomes to each other at that site. Statistical summaries of genetic variation therefore also describe the underlying genealogies. We use this correspondence to define a general framework that efficiently computes single-site population genetic statistics using the succinct tree sequence encoding of genealogies and genome sequence. The general approach accumulates sample weights within the genealogical tree at each position on the genome, which are then combined using a summary function; different statistics result from different choices of weight and function. Results can be reported in three ways: by site, which corresponds to statistics calculated as usual from genome sequence; by branch, which gives the expected value of the dual site statistic under the infinite sites model of mutation, and by node, which summarizes the contribution of each ancestor to these statistics. We use the framework to implement many currently defined statistics of genome sequence (making the statistics’ relationship to the underlying genealogical trees concrete and explicit), as well as the corresponding branch statistics of tree shape. We evaluate computational performance using simulated data, and show that calculating statistics from tree sequences using this general framework is several orders of magnitude more efficient than optimized matrix-based methods in terms of both run time and memory requirements. We also explore how well the duality between site and branch statistics holds in practice on trees inferred from the 1000 Genomes Project data set, and discuss ways in which deviations may encode interesting biological signals.
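The weight-and-summary-function idea in this abstract can be illustrated on a single toy tree. The sketch below is not the authors' implementation: the parent-array encoding, the function names, and the choice of summary function (mean pairwise divergence, f(x) = 2x(n - x) / (n(n - 1))) are assumptions made for illustration. It sums each sample's weight up the tree, then adds branch length times f(weight below the branch) over all branches, giving a "branch" version of the statistic.

```python
def subtree_weights(parent, sample_weights):
    """Sum sample weights over the subtree below each node.

    Assumes (for this sketch) that every child has a smaller index
    than its parent and that the root's parent is -1.
    """
    total = list(sample_weights)
    for node, p in enumerate(parent):
        if p != -1:
            total[p] += total[node]
    return total


def branch_statistic(parent, time, sample_weights, summary):
    """Sum of branch length * summary(weight below the branch)."""
    total = subtree_weights(parent, sample_weights)
    stat = 0.0
    for node, p in enumerate(parent):
        if p != -1:
            stat += (time[p] - time[node]) * summary(total[node])
    return stat


# Hypothetical toy tree: samples 0, 1, 2; node 3 joins 0 and 1 at time 1;
# node 4 (the root) joins 3 and 2 at time 2.
parent = [3, 3, 4, 4, -1]
time = [0.0, 0.0, 0.0, 1.0, 2.0]
weights = [1, 1, 1, 0, 0]  # weight 1 on each sample, 0 on internal nodes
n = 3


def pairwise_divergence(x):
    # Fraction of sample pairs separated by a branch with x samples below it.
    return 2 * x * (n - x) / (n * (n - 1))


mean_distance = branch_statistic(parent, time, weights, pairwise_divergence)
```

On this tree the three pairwise path lengths are 2, 4, and 4, so the result should equal their mean, 10/3. Swapping in a different weight vector or summary function gives a different statistic without changing the traversal, which is the design point the abstract describes.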


2021 ◽  
Vol 7 (29) ◽  
pp. eabc0776
Author(s):  
Nathan K. Schaefer ◽  
Beth Shapiro ◽  
Richard E. Green

Many humans carry genes from Neanderthals, a legacy of past admixture. Existing methods detect this archaic hominin ancestry within human genomes using patterns of linkage disequilibrium or direct comparison to Neanderthal genomes. Each of these methods is limited in sensitivity and scalability. We describe a new ancestral recombination graph inference algorithm that scales to large genome-wide datasets and demonstrate its accuracy on real and simulated data. We then generate a genome-wide ancestral recombination graph including human and archaic hominin genomes. From this, we generate a map within human genomes of archaic ancestry and of genomic regions not shared with archaic hominins either by admixture or incomplete lineage sorting. We find that only 1.5 to 7% of the modern human genome is uniquely human. We also find evidence of multiple bursts of adaptive changes specific to modern humans within the past 600,000 years involving genes related to brain development and function.


2020 ◽  
Vol 21 (3) ◽  
pp. 711
Author(s):  
Asami Ueda ◽  
Mitsuo Umetsu ◽  
Takeshi Nakanishi ◽  
Kentaro Hashikami ◽  
Hikaru Nakazawa ◽  
...  

Antibodies are composed of structurally and functionally independent domains that can be used as building blocks to construct different types of chimeric protein-format molecules. However, the generally used genetic fusion and chemical approaches restrict the types of structures that can be formed and do not give an ideal degree of homogeneity. In this study, we combined mutation techniques with chemical conjugation to construct a variety of homogeneous bivalent and bispecific antibodies. First, building modules without lysine residues—which can be chemical conjugation sites—were generated by means of genetic mutation. Specific mutated residues in the lysine-free modules were then re-mutated to lysine residues. Chemical conjugation at the recovered lysine sites enabled the construction of homogeneous bivalent and bispecific antibodies from block modules that could not have been so arranged by genetic fusion approaches. Molecular evolution and bioinformatics techniques assisted in finding viable alternatives to the lysine residues that did not deactivate the block modules. Multiple candidates for re-mutation positions offer a wide variety of possible steric arrangements of block modules, and appropriate linkages between block modules can generate highly bioactive bispecific antibodies. Here, we propose the effectiveness of the lysine-free block module design for site-specific chemical conjugation to form a variety of types of homogeneous chimeric protein-format molecule with a finely tuned structure and function.


2019 ◽  
Vol 116 (31) ◽  
pp. 15407-15413 ◽  
Author(s):  
Mincheng Wu ◽  
Shibo He ◽  
Yongtao Zhang ◽  
Jiming Chen ◽  
Youxian Sun ◽  
...  

Centrality is widely recognized as one of the most critical measures to provide insight into the structure and function of complex networks. While various centrality measures have been proposed for single-layer networks, a general framework for studying centrality in multilayer networks (i.e., multicentrality) is still lacking. In this study, a tensor-based framework is introduced to study eigenvector multicentrality, which enables the quantification of the impact of interlayer influence on multicentrality, providing a systematic way to describe how multicentrality propagates across different layers. This framework can leverage prior knowledge about the interplay among layers to better characterize multicentrality for varying scenarios. Two interesting cases are presented to illustrate how to model multilayer influence by choosing appropriate functions of interlayer influence and design algorithms to calculate eigenvector multicentrality. This framework is applied to analyze several empirical multilayer networks, and the results corroborate that it can quantify the influence among layers and multicentrality of nodes effectively.


Radiocarbon ◽  
2007 ◽  
Vol 49 (2) ◽  
pp. 565-578 ◽  
Author(s):  
Adam Michczyński ◽  
Peter Eeckhout ◽  
Anna Pazdur ◽  
Jacek Pawlyta

The ongoing Ychsma Project aims to shed light on the chronology and function of the late Prehispanic period at the well-known archaeological site of Pachacamac, Peru, through extensive archaeological research. The Temple of the Monkey is a special building that has been cleared, mapped, and excavated within the general framework of the study of “pyramids with ramps,” the most common form of monumental architecture at the site. Through the application of radiocarbon measurements, it can be shown that the temple has been used for around 150 yr and therefore is quite different from other pyramids with ramps previously studied (see Michczyński et al. 2003). Details of the temple, 14C sample selection, and methodology, as well as results, are discussed in this paper. The research has allowed us to make significant advances in the current understanding of pyramids with ramps and the function of the site of Pachacamac as a whole.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Maria Cartolano ◽  
Nima Abedpour ◽  
Viktor Achter ◽  
Tsun-Po Yang ◽  
Sandra Ackermann ◽  
...  

The identification of the mutational processes operating in tumour cells has implications for cancer diagnosis and therapy. These processes leave mutational patterns on the cancer genomes, which are referred to as mutational signatures. Recently, 81 mutational signatures have been inferred using computational algorithms on sequencing data of 23,879 samples. However, these published signatures may not always offer a comprehensive view on the biological processes underlying tumour types that are not included or underrepresented in the reference studies. To circumvent this problem, we designed CaMuS (Cancer Mutational Signatures) to construct de novo signatures while simultaneously fitting publicly available mutational signatures. Furthermore, we propose to estimate signature similarity by comparing probability distributions using the Hellinger distance. We applied CaMuS to infer signatures of mutational processes in poorly studied cancer types. We used whole genome sequencing data of 56 neuroblastomas, thus providing evidence for the versatility of CaMuS. Using simulated data, we compared the performance of CaMuS to sigfit, a recently developed algorithm with comparable inference functionalities. CaMuS and sigfit reconstructed the simulated datasets with similar accuracy; however, two main features may argue for CaMuS over sigfit: (i) superior computational performance and (ii) a reliable parameter selection method to avoid spurious signatures.
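For readers unfamiliar with the Hellinger distance mentioned in this abstract, a minimal sketch follows. It uses the standard definition for discrete distributions, H(P, Q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which ranges from 0 (identical) to 1 (disjoint support). Whether CaMuS uses exactly this normalization is an assumption here, and the function name and example vectors are hypothetical.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    Inputs are non-negative vectors of equal length; each is rescaled
    to sum to 1 so that raw mutation counts can be passed directly.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    p_sum, q_sum = sum(p), sum(q)
    if p_sum <= 0 or q_sum <= 0:
        raise ValueError("each distribution must have positive total mass")
    squared = sum(
        (math.sqrt(a / p_sum) - math.sqrt(b / q_sum)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(0.5 * squared)


# Hypothetical signatures over four mutation categories.
sig_a = [0.4, 0.3, 0.2, 0.1]
sig_b = [0.1, 0.2, 0.3, 0.4]
d_ab = hellinger_distance(sig_a, sig_b)
```

Because the distance is symmetric and bounded, it can be compared across pairs of signatures without further scaling.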


Author(s):  
Brodie Mather ◽  
Bonnie J Dorr ◽  
Owen Rambow ◽  
Tomek Strzalkowski

We present a generalized framework for domain-specialized stance detection, focusing on Covid-19 as a use case. We define a stance as a predicate-argument structure (combination of an action and its participants) in a simplified one-argument format, e.g., wear(a mask), coupled with a task-specific belief category representing the purpose (e.g., protection) of an argument (e.g., mask) in the context of its predicate (e.g., wear), as constrained by the domain (e.g., Covid-19). A belief category PROTECT captures a belief such as “masks provide protection,” whereas RESTRICT captures a belief such as “mask mandates limit freedom.” A stance combines a belief proposition, e.g., PROTECT(wear(a mask)), with a sentiment toward this proposition. From this, an overall positive attitude toward mask wearing is extracted. The notions purpose and function serve as natural constraints on the choice of belief categories during resource building which, in turn, constrains stance detection. We demonstrate that linguistic constraints (e.g., light verb processing) further refine the choice of predicate-argument pairings for belief and sentiment assignments, yielding significant increases in F1 score for stance detection over a strong baseline.


2017 ◽  
Author(s):  
Diptavo Dutta ◽  
Laura Scott ◽  
Michael Boehnke ◽  
Seunggeun Lee

In genetic association analysis, a joint test of multiple distinct phenotypes can increase power to identify sets of trait-associated variants within genes or regions of interest. Existing multi-phenotype tests for rare variants make specific assumptions about the patterns of association of underlying causal variants, and the violation of these assumptions can reduce power to detect association. Here we develop a general framework for testing pleiotropic effects of rare variants based on multivariate kernel regression (Multi-SKAT). Multi-SKAT models effect sizes of variants on the phenotypes through a kernel matrix and performs a variance component test of association. We show that many existing tests are equivalent to specific choices of kernel matrices with the Multi-SKAT framework. To increase power to detect association across tests with different kernel matrices, we developed a fast and accurate approximation of the significance of the minimum observed p-value across tests. To account for related individuals, our framework uses a random effect for the kinship matrix. Using simulated data and amino acid and exome-array data from the METSIM study, we show that Multi-SKAT can improve power over the single-phenotype SKAT-O test and existing multiple-phenotype tests, while maintaining the type I error rate.


2010 ◽  
Vol 192 (24) ◽  
pp. 6497-6498 ◽  
Author(s):  
Lisa Y. Stein ◽  
Sukhwan Yoon ◽  
Jeremy D. Semrau ◽  
Alan A. DiSpirito ◽  
Andrew Crombie ◽  
...  

Methylosinus trichosporium OB3b (for “oddball” strain 3b) is an obligate aerobic methane-oxidizing alphaproteobacterium that was originally isolated in 1970 by Roger Whittenbury and colleagues. This strain has since been used extensively to elucidate the structure and function of several key enzymes of methane oxidation, including both particulate and soluble methane monooxygenase (sMMO) and the extracellular copper chelator methanobactin. In particular, the catalytic properties of soluble methane monooxygenase from M. trichosporium OB3b have been well characterized in context with biodegradation of recalcitrant hydrocarbons, such as trichloroethylene. The sequence of the M. trichosporium OB3b genome is the first reported from a member of the Methylocystaceae family in the order Rhizobiales.


2020 ◽  
Vol 29 (15) ◽  
pp. 2508-2522
Author(s):  
Hervé Husson ◽  
Nikolay O Bukanov ◽  
Sarah Moreno ◽  
Mandy M Smith ◽  
Brenda Richards ◽  
...  

Bardet–Biedl syndrome (BBS) is a pleiotropic autosomal recessive ciliopathy affecting multiple organs. The development of potential disease-modifying therapy for BBS will require concurrent targeting of multi-systemic manifestations. Here, we show for the first time that monosialodihexosylganglioside accumulates in Bbs2−/− cilia, indicating impairment of glycosphingolipid (GSL) metabolism in BBS. Consequently, we tested whether BBS pathology in Bbs2−/− mice can be reversed by targeting the underlying ciliary defect via reduction of GSL metabolism. Inhibition of GSL synthesis with the glucosylceramide synthase inhibitor Genz-667161 decreases the obesity, liver disease, retinal degeneration and olfaction defect in Bbs2−/− mice. These effects are secondary to preservation of ciliary structure and signaling, and stimulation of cellular differentiation. In conclusion, reduction of GSL metabolism resolves the multi-organ pathology of Bbs2−/− mice by directly preserving ciliary structure and function towards a normal phenotype. Since this approach does not rely on the correction of the underlying genetic mutation, it might translate successfully as a treatment for other ciliopathies.

