Efficiently summarizing relationships in large samples: a general duality between statistics of genealogies and genomes

bioRxiv ◽

10.1101/779132 ◽

2019 ◽

Cited By ~ 3

Author(s):

Peter Ralph ◽

Kevin Thornton ◽

Jerome Kelleher

Keyword(s):

Genome Sequence ◽

General Framework ◽

Simulated Data ◽

Genetic Mutation ◽

Genealogical Tree ◽

Computational Performance ◽

Infinite Sites Model ◽

And Function ◽

General Duality ◽

Dual Site

As a genetic mutation is passed down across generations, it distinguishes those genomes that have inherited it from those that have not, providing a glimpse of the genealogical tree relating the genomes to each other at that site. Statistical summaries of genetic variation therefore also describe the underlying genealogies. We use this correspondence to define a general framework that efficiently computes single-site population genetic statistics using the succinct tree sequence encoding of genealogies and genome sequence. The general approach accumulates “sample weights” within the genealogical tree at each position on the genome, which are then combined using a “summary function”; different statistics result from different choices of weight and function. Results can be reported in three ways: by site, which corresponds to statistics calculated as usual from genome sequence; by branch, which gives the expected value of the dual site statistic under the infinite-sites model of mutation; and by node, which summarizes the contribution of each ancestor to these statistics. We use the framework to implement many currently-defined statistics of genome sequence (making the statistics’ relationship to the underlying genealogical trees concrete and explicit), as well as the corresponding “branch” statistics of tree shape. We evaluate computational performance using simulated data, and show that calculating statistics from tree sequences using this general framework is several orders of magnitude more efficient than optimized matrix-based methods in terms of both run time and memory requirements. We also explore how well the duality between site and branch statistics holds in practice on trees inferred from the 1000 Genomes Project dataset, and discuss ways in which deviations may encode interesting biological signals.

Efficiently Summarizing Relationships in Large Samples: A General Duality Between Statistics of Genealogies and Genomes

Genetics ◽

10.1534/genetics.120.303253 ◽

2020 ◽

Vol 215 (3) ◽

pp. 779-797 ◽

Cited By ~ 3

Author(s):

Peter Ralph ◽

Kevin Thornton ◽

Jerome Kelleher

Keyword(s):

Genome Sequence ◽

General Framework ◽

Simulated Data ◽

Genetic Mutation ◽

Data Set ◽

Genealogical Tree ◽

Computational Performance ◽

Infinite Sites Model ◽

Project Data ◽

And Function

As a genetic mutation is passed down across generations, it distinguishes those genomes that have inherited it from those that have not, providing a glimpse of the genealogical tree relating the genomes to each other at that site. Statistical summaries of genetic variation therefore also describe the underlying genealogies. We use this correspondence to define a general framework that efficiently computes single-site population genetic statistics using the succinct tree sequence encoding of genealogies and genome sequence. The general approach accumulates sample weights within the genealogical tree at each position on the genome, which are then combined using a summary function; different statistics result from different choices of weight and function. Results can be reported in three ways: by site, which corresponds to statistics calculated as usual from genome sequence; by branch, which gives the expected value of the dual site statistic under the infinite sites model of mutation; and by node, which summarizes the contribution of each ancestor to these statistics. We use the framework to implement many currently defined statistics of genome sequence (making the statistics’ relationship to the underlying genealogical trees concrete and explicit), as well as the corresponding branch statistics of tree shape. We evaluate computational performance using simulated data, and show that calculating statistics from tree sequences using this general framework is several orders of magnitude more efficient than optimized matrix-based methods in terms of both run time and memory requirements. We also explore how well the duality between site and branch statistics holds in practice on trees inferred from the 1000 Genomes Project data set, and discuss ways in which deviations may encode interesting biological signals.
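
The weight-and-summary idea in this abstract can be illustrated with a minimal sketch on a single toy tree. Everything here is an assumption made for illustration: the tree, the names `TREE`, `accumulate_weights`, `diversity_summary`, `branch_statistic` and `site_statistic`, and the choice of a pairwise-diversity summary function. It is not the authors' implementation and does not use the succinct tree sequence encoding.

```python
# Toy genealogy (hypothetical): child -> (parent, branch length).
# Nodes 0, 1, 2 are samples; 3 is an internal node; 4 is the root.
TREE = {0: (3, 1.0), 1: (3, 1.0), 2: (4, 3.0), 3: (4, 2.0)}
SAMPLES = [0, 1, 2]


def accumulate_weights(tree, sample_weights):
    """Add each sample's weight to itself and to every ancestor up to the root."""
    totals = {}
    for sample, weight in sample_weights.items():
        node = sample
        while node is not None:
            totals[node] = totals.get(node, 0.0) + weight
            node = tree[node][0] if node in tree else None
    return totals


def diversity_summary(x, n):
    """Fraction of the n*(n-1)/2 sample pairs separated by a branch above x samples."""
    return x * (n - x) / (n * (n - 1) / 2)


def branch_statistic(tree, samples, summary):
    """Sum over branches of branch length times the summary of the weight below."""
    n = len(samples)
    weights = accumulate_weights(tree, {s: 1.0 for s in samples})
    return sum(
        length * summary(weights.get(child, 0.0), n)
        for child, (_, length) in tree.items()
    )


def site_statistic(tree, samples, summary, mutation_nodes):
    """Sum of the summary over mutations, each identified by the node below it."""
    n = len(samples)
    weights = accumulate_weights(tree, {s: 1.0 for s in samples})
    return sum(summary(weights.get(node, 0.0), n) for node in mutation_nodes)


if __name__ == "__main__":
    print(branch_statistic(TREE, SAMPLES, diversity_summary))
    print(site_statistic(TREE, SAMPLES, diversity_summary, [3, 2]))
```

With unit weights on every sample and this summary function, the branch statistic of the toy tree equals the mean path length between pairs of samples (14/3 here), which is the branch-mode counterpart of the mean number of pairwise differences computed from mutations.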

An ancestral recombination graph of human, Neanderthal, and Denisovan genomes

Science Advances ◽

10.1126/sciadv.abc0776 ◽

2021 ◽

Vol 7 (29) ◽

pp. eabc0776

Author(s):

Nathan K. Schaefer ◽

Beth Shapiro ◽

Richard E. Green

Keyword(s):

Incomplete Lineage Sorting ◽

Simulated Data ◽

Modern Human ◽

Ancestral Recombination Graph ◽

Lineage Sorting ◽

Human Genomes ◽

Genome Wide ◽

A Genome ◽

Graph Inference ◽

And Function

Many humans carry genes from Neanderthals, a legacy of past admixture. Existing methods detect this archaic hominin ancestry within human genomes using patterns of linkage disequilibrium or direct comparison to Neanderthal genomes. Each of these methods is limited in sensitivity and scalability. We describe a new ancestral recombination graph inference algorithm that scales to large genome-wide datasets and demonstrate its accuracy on real and simulated data. We then generate a genome-wide ancestral recombination graph including human and archaic hominin genomes. From this, we generate a map within human genomes of archaic ancestry and of genomic regions not shared with archaic hominins either by admixture or incomplete lineage sorting. We find that only 1.5 to 7% of the modern human genome is uniquely human. We also find evidence of multiple bursts of adaptive changes specific to modern humans within the past 600,000 years involving genes related to brain development and function.

Chemically Crosslinked Bispecific Antibodies for Cancer Therapy: Breaking from the Structural Restrictions of the Genetic Fusion Approach

International Journal of Molecular Sciences ◽

10.3390/ijms21030711 ◽

2020 ◽

Vol 21 (3) ◽

pp. 711

Author(s):

Asami Ueda ◽

Mitsuo Umetsu ◽

Takeshi Nakanishi ◽

Kentaro Hashikami ◽

Hikaru Nakazawa ◽

...

Keyword(s):

Chimeric Protein ◽

Building Blocks ◽

Genetic Mutation ◽

Bispecific Antibodies ◽

Specific Chemical ◽

Module Design ◽

Chemical Conjugation ◽

Lysine Residues ◽

Genetic Fusion ◽

And Function

Antibodies are composed of structurally and functionally independent domains that can be used as building blocks to construct different types of chimeric protein-format molecules. However, the generally used genetic fusion and chemical approaches restrict the types of structures that can be formed and do not give an ideal degree of homogeneity. In this study, we combined mutation techniques with chemical conjugation to construct a variety of homogeneous bivalent and bispecific antibodies. First, building modules without lysine residues, which can be chemical conjugation sites, were generated by means of genetic mutation. Specific mutated residues in the lysine-free modules were then re-mutated to lysine residues. Chemical conjugation at the recovered lysine sites enabled the construction of homogeneous bivalent and bispecific antibodies from block modules that could not have been so arranged by genetic fusion approaches. Molecular evolution and bioinformatics techniques assisted in finding viable alternatives to the lysine residues that did not deactivate the block modules. Multiple candidates for re-mutation positions offer a wide variety of possible steric arrangements of block modules, and appropriate linkages between block modules can generate highly bioactive bispecific antibodies. Here, we propose that the lysine-free block module design is effective for site-specific chemical conjugation to form a variety of homogeneous chimeric protein-format molecules with finely tuned structure and function.

A tensor-based framework for studying eigenvector multicentrality in multilayer networks

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1801378116 ◽

2019 ◽

Vol 116 (31) ◽

pp. 15407-15413 ◽

Cited By ~ 11

Author(s):

Mincheng Wu ◽

Shibo He ◽

Yongtao Zhang ◽

Jiming Chen ◽

Youxian Sun ◽

...

Keyword(s):

Complex Networks ◽

Prior Knowledge ◽

General Framework ◽

Single Layer ◽

Structure And Function ◽

Centrality Measures ◽

Multilayer Networks ◽

And Function ◽

The Impact ◽

Insight Into

Centrality is widely recognized as one of the most critical measures to provide insight into the structure and function of complex networks. While various centrality measures have been proposed for single-layer networks, a general framework for studying centrality in multilayer networks (i.e., multicentrality) is still lacking. In this study, a tensor-based framework is introduced to study eigenvector multicentrality, which enables the quantification of the impact of interlayer influence on multicentrality, providing a systematic way to describe how multicentrality propagates across different layers. This framework can leverage prior knowledge about the interplay among layers to better characterize multicentrality for varying scenarios. Two interesting cases are presented to illustrate how to model multilayer influence by choosing appropriate functions of interlayer influence and design algorithms to calculate eigenvector multicentrality. This framework is applied to analyze several empirical multilayer networks, and the results corroborate that it can quantify the influence among layers and multicentrality of nodes effectively.
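
For orientation, the single-layer quantity that this framework generalizes can be sketched with plain power iteration. This is only the classical single-layer eigenvector centrality, not the tensor-based multilayer method of the paper; the function name `eigenvector_centrality`, the example graph and the stopping parameters are assumptions chosen for illustration.

```python
def eigenvector_centrality(adjacency, iterations=200, tol=1e-12):
    """Power iteration on a symmetric 0/1 adjacency matrix (list of lists).

    Returns centralities normalized to sum to 1. Convergence is assumed,
    which holds for connected, non-bipartite graphs.
    """
    n = len(adjacency)
    x = [1.0 / n] * n
    for _ in range(iterations):
        # One multiplication by the adjacency matrix.
        y = [sum(adjacency[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        if total == 0:
            raise ValueError("graph has no edges")
        y = [v / total for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Hypothetical example: a triangle (nodes 0, 1, 2) with a pendant node 3 attached to node 0.
GRAPH = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

if __name__ == "__main__":
    print(eigenvector_centrality(GRAPH))
```

In this example node 0 receives the largest score, nodes 1 and 2 tie, and the pendant node 3 receives the smallest, because a node's score is proportional to the sum of its neighbours' scores.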

Radiocarbon Dating of the Temple of the Monkey—The Next Step Towards a Comprehensive Absolute Chronology of Pachacamac, Peru

Radiocarbon ◽

10.1017/s0033822200042478 ◽

2007 ◽

Vol 49 (2) ◽

pp. 565-578 ◽

Cited By ~ 3

Author(s):

Adam Michczyński ◽

Peter Eeckhout ◽

Anna Pazdur ◽

Jacek Pawlyta

Keyword(s):

General Framework ◽

Radiocarbon Dating ◽

Sample Selection ◽

Archaeological Site ◽

Current Understanding ◽

Archaeological Research ◽

Monumental Architecture ◽

And Function ◽

The Temple ◽

Shed Light

The ongoing Ychsma Project aims to shed light on the chronology and function of the late Prehispanic period at the well-known archaeological site of Pachacamac, Peru, through extensive archaeological research. The Temple of the Monkey is a special building that has been cleared, mapped, and excavated within the general framework of the study of “pyramids with ramps,” the most common form of monumental architecture at the site. Through the application of radiocarbon measurements, it can be shown that the temple was used for around 150 yr and therefore is quite different from other pyramids with ramps previously studied (see Michczyński et al. 2003). Details of the temple, ¹⁴C sample selection, and methodology, as well as results, are discussed in this paper. The research has allowed us to make significant advances in the current understanding of pyramids with ramps and the function of the site of Pachacamac as a whole.

CaMuS: simultaneous fitting and de novo imputation of cancer mutational signature

Scientific Reports ◽

10.1038/s41598-020-75753-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Maria Cartolano ◽

Nima Abedpour ◽

Viktor Achter ◽

Tsun-Po Yang ◽

Sandra Ackermann ◽

...

Keyword(s):

De Novo ◽

Probability Distributions ◽

Simulated Data ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Mutational Signatures ◽

Computational Performance ◽

Reliable Parameter ◽

Similar Accuracy ◽

Mutational Processes

The identification of the mutational processes operating in tumour cells has implications for cancer diagnosis and therapy. These processes leave mutational patterns on the cancer genomes, which are referred to as mutational signatures. Recently, 81 mutational signatures have been inferred using computational algorithms on sequencing data of 23,879 samples. However, these published signatures may not always offer a comprehensive view on the biological processes underlying tumour types that are not included or underrepresented in the reference studies. To circumvent this problem, we designed CaMuS (Cancer Mutational Signatures) to construct de novo signatures while simultaneously fitting publicly available mutational signatures. Furthermore, we propose to estimate signature similarity by comparing probability distributions using the Hellinger distance. We applied CaMuS to infer signatures of mutational processes in poorly studied cancer types. We used whole genome sequencing data of 56 neuroblastomas, thus providing evidence for the versatility of CaMuS. Using simulated data, we compared the performance of CaMuS to sigfit, a recently developed algorithm with comparable inference functionalities. CaMuS and sigfit reconstructed the simulated datasets with similar accuracy; however, two main features may argue for CaMuS over sigfit: (i) superior computational performance and (ii) a reliable parameter selection method to avoid spurious signatures.
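
The Hellinger distance mentioned in this abstract has a standard definition for discrete distributions, sketched below. The function name `hellinger_distance` and the normalization of the inputs are assumptions for illustration; the abstract does not say how CaMuS applies the distance internally.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    Inputs are non-negative weights over the same categories (for example,
    mutation-type counts); each is normalized to sum to 1 before comparison.
    The result lies in [0, 1]: 0 for identical distributions, 1 for
    distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("each distribution needs positive total weight")
    squared = sum(
        (math.sqrt(a / sum_p) - math.sqrt(b / sum_q)) ** 2 for a, b in zip(p, q)
    )
    return math.sqrt(squared / 2.0)


if __name__ == "__main__":
    print(hellinger_distance([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))
```

Because the distance is bounded between 0 and 1, it is convenient for comparing signatures that are themselves probability distributions over mutation types.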

A General Framework for Domain-Specialization of Stance Detection

The International FLAIRS Conference Proceedings ◽

10.32473/flairs.v34i1.128457 ◽

2021 ◽

Vol 34 (1) ◽

Author(s):

Brodie Mather ◽

Bonnie J Dorr ◽

Owen Rambow ◽

Tomek Strzalkowski

Keyword(s):

Argument Structure ◽

General Framework ◽

Positive Attitude ◽

Use Case ◽

Light Verb ◽

Generalized Framework ◽

Verb Processing ◽

Linguistic Constraints ◽

Predicate Argument Structure ◽

And Function

We present a generalized framework for domain-specialized stance detection, focusing on Covid-19 as a use case. We define a stance as a predicate-argument structure (combination of an action and its participants) in a simplified one-argument format, e.g., wear(a mask), coupled with a task-specific belief category representing the purpose (e.g., protection) of an argument (e.g., mask) in the context of its predicate (e.g., wear), as constrained by the domain (e.g., Covid-19). A belief category PROTECT captures a belief such as “masks provide protection,” whereas RESTRICT captures a belief such as “mask mandates limit freedom.” A stance combines a belief proposition, e.g., PROTECT(wear(a mask)), with a sentiment toward this proposition. From this, an overall positive attitude toward mask wearing is extracted. The notions of purpose and function serve as natural constraints on the choice of belief categories during resource building which, in turn, constrains stance detection. We demonstrate that linguistic constraints (e.g., light verb processing) further refine the choice of predicate-argument pairings for belief and sentiment assignments, yielding significant increases in F1 score for stance detection over a strong baseline.
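
The stance representation described here (a belief category wrapped around a one-argument predicate, plus a sentiment) can be sketched as a small data structure. The class name `Stance`, its field names and the numeric sentiment encoding are hypothetical; the abstract specifies only the components, not how they are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stance:
    belief: str       # task-specific belief category, e.g. "PROTECT" or "RESTRICT"
    predicate: str    # the action, e.g. "wear"
    argument: str     # the single participant, e.g. "a mask"
    sentiment: float  # hypothetical encoding: > 0 favourable, < 0 unfavourable

    def proposition(self) -> str:
        """Render the belief proposition in the nested notation of the abstract."""
        return f"{self.belief}({self.predicate}({self.argument}))"


if __name__ == "__main__":
    stance = Stance("PROTECT", "wear", "a mask", 1.0)
    print(stance.proposition())
```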

Multi-SKAT: General framework to test multiple phenotype associations of rare variants

10.1101/229583 ◽

2017 ◽

Cited By ~ 2

Author(s):

Diptavo Dutta ◽

Laura Scott ◽

Michael Boehnke ◽

Seunggeun Lee

Keyword(s):

General Framework ◽

Type I Error ◽

Rare Variants ◽

Simulated Data ◽

P Value ◽

Type I ◽

Component Test ◽

Multiple Phenotype ◽

Causal Variants ◽

Multivariate Kernel Regression

In genetic association analysis, a joint test of multiple distinct phenotypes can increase power to identify sets of trait-associated variants within genes or regions of interest. Existing multi-phenotype tests for rare variants make specific assumptions about the patterns of association of underlying causal variants, and the violation of these assumptions can reduce power to detect association. Here we develop a general framework for testing pleiotropic effects of rare variants based on multivariate kernel regression (Multi-SKAT). Multi-SKAT models effect sizes of variants on the phenotypes through a kernel matrix and performs a variance component test of association. We show that many existing tests are equivalent to specific choices of kernel matrices within the Multi-SKAT framework. To increase power to detect association across tests with different kernel matrices, we developed a fast and accurate approximation of the significance of the minimum observed p-value across tests. To account for related individuals, our framework uses a random effect for the kinship matrix. Using simulated data and amino acid and exome-array data from the METSIM study, we show that Multi-SKAT can improve power over the single-phenotype SKAT-O test and existing multiple-phenotype tests, while maintaining the type I error rate.

Genome Sequence of the Obligate Methanotroph Methylosinus trichosporium Strain OB3b

Journal of Bacteriology ◽

10.1128/jb.01144-10 ◽

2010 ◽

Vol 192 (24) ◽

pp. 6497-6498 ◽

Cited By ~ 59

Author(s):

Lisa Y. Stein ◽

Sukhwan Yoon ◽

Jeremy D. Semrau ◽

Alan A. DiSpirito ◽

Andrew Crombie ◽

...

Keyword(s):

Methane Oxidation ◽

Genome Sequence ◽

Catalytic Properties ◽

Methane Monooxygenase ◽

Structure And Function ◽

Methylosinus Trichosporium ◽

Soluble Methane Monooxygenase ◽

Obligate Aerobic ◽

Copper Chelator ◽

And Function

Methylosinus trichosporium OB3b (for “oddball” strain 3b) is an obligate aerobic methane-oxidizing alphaproteobacterium that was originally isolated in 1970 by Roger Whittenbury and colleagues. This strain has since been used extensively to elucidate the structure and function of several key enzymes of methane oxidation, including both particulate and soluble methane monooxygenase (sMMO) and the extracellular copper chelator methanobactin. In particular, the catalytic properties of soluble methane monooxygenase from M. trichosporium OB3b have been well characterized in the context of biodegradation of recalcitrant hydrocarbons, such as trichloroethylene. The sequence of the M. trichosporium OB3b genome is the first reported from a member of the Methylocystaceae family in the order Rhizobiales.

Correction of cilia structure and function alleviates multi-organ pathology in Bardet–Biedl syndrome mice

Human Molecular Genetics ◽

10.1093/hmg/ddaa138 ◽

2020 ◽

Vol 29 (15) ◽

pp. 2508-2522

Author(s):

Hervé Husson ◽

Nikolay O Bukanov ◽

Sarah Moreno ◽

Mandy M Smith ◽

Brenda Richards ◽

...

Keyword(s):

Autosomal Recessive ◽

Genetic Mutation ◽

Structure And Function ◽

Bardet Biedl Syndrome ◽

Multiple Organs ◽

Disease Modifying Therapy ◽

Systemic Manifestations ◽

Synthase Inhibitor ◽

And Function ◽

First Time

Bardet–Biedl syndrome (BBS) is a pleiotropic autosomal recessive ciliopathy affecting multiple organs. The development of potential disease-modifying therapy for BBS will require concurrent targeting of multi-systemic manifestations. Here, we show for the first time that monosialodihexosylganglioside accumulates in Bbs2−/− cilia, indicating impairment of glycosphingolipid (GSL) metabolism in BBS. Consequently, we tested whether BBS pathology in Bbs2−/− mice can be reversed by targeting the underlying ciliary defect via reduction of GSL metabolism. Inhibition of GSL synthesis with the glucosylceramide synthase inhibitor Genz-667161 decreases the obesity, liver disease, retinal degeneration and olfaction defect in Bbs2−/− mice. These effects are secondary to preservation of ciliary structure and signaling, and stimulation of cellular differentiation. In conclusion, reduction of GSL metabolism resolves the multi-organ pathology of Bbs2−/− mice by directly preserving ciliary structure and function towards a normal phenotype. Since this approach does not rely on the correction of the underlying genetic mutation, it might translate successfully as a treatment for other ciliopathies.
