Applicability of the Mutation–Selection Balance Model to Population Genetics of Heterozygous Protein-Truncating Variants in Humans

Donate Weghorn; Daniel J Balick; Christopher Cassa; Jack A Kosmicki; Mark J Daly; David R Beier; Shamil R Sunyaev

doi:10.1093/molbev/msz092

Molecular Biology and Evolution ◽

10.1093/molbev/msz092 ◽

2019 ◽

Vol 36 (8) ◽

pp. 1701-1710 ◽

Cited By ~ 1

Author(s):

Donate Weghorn ◽

Daniel J Balick ◽

Christopher Cassa ◽

Jack A Kosmicki ◽

Mark J Daly ◽

...

Keyword(s):

Population Genetics ◽

De Novo ◽

Demographic History ◽

Purifying Selection ◽

Population History ◽

Balance Model ◽

Data Set ◽

Stochastic Force ◽

Drift Estimation ◽

Human Genes

Abstract The fate of alleles in the human population is believed to be highly affected by the stochastic force of genetic drift. Estimation of the strength of natural selection in humans generally necessitates a careful modeling of drift including complex effects of the population history and structure. Protein-truncating variants (PTVs) are expected to evolve under strong purifying selection and to have a relatively high per-gene mutation rate. Thus, it is appealing to model the population genetics of PTVs under a simple deterministic mutation–selection balance, as has been proposed earlier (Cassa et al. 2017). Here, we investigated the limits of this approximation using both computer simulations and data-driven approaches. Our simulations rely on a model of demographic history estimated from 33,370 individual exomes of the Non-Finnish European subset of the ExAC data set (Lek et al. 2016). Additionally, we compared the African and European subset of the ExAC study and analyzed de novo PTVs. We show that the mutation–selection balance model is applicable to the majority of human genes, but not to genes under the weakest selection.
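As a rough illustration of the deterministic approximation the abstract refers to, the sketch below uses the textbook form of mutation–selection balance for a rare allele selected against in heterozygotes: the frequency follows q' = q(1 - s_het) + U and settles at q* = U / s_het. The function names, the symbol names `U` and `s_het`, and the numeric values are assumptions made for illustration. They are not taken from the paper, and the sketch deliberately ignores genetic drift and demography, which is the simplification whose limits the study investigates.

```python
def equilibrium_frequency(mutation_rate: float, s_het: float) -> float:
    """Deterministic mutation-selection balance for a rare allele: q* = U / s_het.

    mutation_rate (U) is a hypothetical per-gene, per-generation PTV mutation rate;
    s_het is a hypothetical selection coefficient against heterozygous carriers.
    """
    if mutation_rate < 0:
        raise ValueError("mutation_rate must be non-negative")
    if not 0 < s_het <= 1:
        raise ValueError("s_het must lie in (0, 1]")
    return mutation_rate / s_het


def iterate_frequency(mutation_rate: float, s_het: float,
                      generations: int, start: float = 0.0) -> float:
    """Iterate q' = q * (1 - s_het) + U for a number of generations.

    This recursion is the rare-allele approximation: homozygotes and
    back mutation are neglected, and there is no drift term.
    """
    q = start
    for _ in range(generations):
        q = q * (1.0 - s_het) + mutation_rate
    return q


if __name__ == "__main__":
    U = 1e-6      # illustrative value only, not an estimate from the paper
    s_het = 0.1   # illustrative value only
    print("equilibrium:", equilibrium_frequency(U, s_het))
    print("after 1000 generations:", iterate_frequency(U, s_het, 1000))
```

In this recursion the distance from equilibrium shrinks by a factor of (1 - s_het) each generation, so strongly selected alleles reach the balance quickly while weakly selected ones approach it slowly. That is consistent with the abstract's conclusion that the approximation fails for genes under the weakest selection.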


Applicability of the mutation-selection balance model to population genetics of heterozygous protein-truncating variants in humans

10.1101/433961 ◽

2018 ◽

Cited By ~ 3

Author(s):

Donate Weghorn ◽

Daniel J. Balick ◽

Christopher Cassa ◽

Jack Kosmicki ◽

Mark J. Daly ◽

...

Keyword(s):

Population Genetics ◽

De Novo ◽

Demographic History ◽

Purifying Selection ◽

Population History ◽

Balance Model ◽

Stochastic Force ◽

Drift Estimation ◽

Human Genes ◽

Per Gene

Abstract The fate of alleles in the human population is believed to be highly affected by the stochastic force of genetic drift. Estimation of the strength of natural selection in humans generally necessitates a careful modeling of drift including complex effects of the population history and structure. Protein truncating variants (PTVs) are expected to evolve under strong purifying selection and to have a relatively high per-gene mutation rate. Thus, it is appealing to model the population genetics of PTVs under a simple deterministic mutation-selection balance, as has been proposed earlier [1]. Here, we investigated the limits of this approximation using both computer simulations and data-driven approaches. Our simulations rely on a model of demographic history estimated from 33,370 individual exomes of the Non-Finnish European subset of the ExAC dataset [2]. Additionally, we compared the African and European subset of the ExAC study and analyzed de novo PTVs. We show that the mutation-selection balance model is applicable to the majority of human genes, but not to genes under the weakest selection.


Insights into platypus population structure and history from whole-genome sequencing

10.1101/221481 ◽

2017 ◽

Author(s):

Hilary C. Martin ◽

Elizabeth M. Batty ◽

Julie Hussin ◽

Portia Westall ◽

Tasman Daish ◽

...

Keyword(s):

Population Structure ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Demographic History ◽

Population Decline ◽

Egg Laying ◽

Whole Genome ◽

Data Set ◽

Important Species

Abstract The platypus is an egg-laying mammal which, alongside the echidna, occupies a unique place in the mammalian phylogenetic tree. Despite widespread interest in its unusual biology, little is known about its population structure or recent evolutionary history. To provide new insights into the dispersal and demographic history of this iconic species, we sequenced the genomes of 57 platypuses from across the whole species range in eastern mainland Australia and Tasmania. Using a highly-improved reference genome, we called over 6.7M SNPs, providing an informative genetic data set for population analyses. Our results show very strong population structure in the platypus, with our sampling locations corresponding to discrete groupings between which there is no evidence for recent gene flow. Genome-wide data allowed us to establish that 28 of the 57 sampled individuals had at least a third-degree relative amongst other samples from the same river, often taken at different times. Taking advantage of a sampled family quartet, we estimated the de novo mutation rate in the platypus at 7.0×10⁻⁹/bp/generation (95% CI 4.1×10⁻⁹ to 1.2×10⁻⁸/bp/generation). We estimated effective population sizes of ancestral populations and haplotype sharing between current groupings, and found evidence for bottlenecks and long-term population decline in multiple regions, and early divergence between populations in different regions. This study demonstrates the power of whole-genome sequencing for studying natural populations of an evolutionarily important species.


High quality whole genome sequence of an abundant Holarctic odontocete, the harbour porpoise (Phocoena phocoena)

10.1101/246173 ◽

2018 ◽

Author(s):

Marijke Autenrieth ◽

Stefanie Hartmann ◽

Ljerka Lah ◽

Anna Roos ◽

Alice B. Dennis ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Bos Taurus ◽

Demographic History ◽

Harbour Porpoise ◽

Population History ◽

Whole Genome Sequence ◽

Phocoena Phocoena ◽

Total Size ◽

High Level

Abstract The harbour porpoise (Phocoena phocoena) is a highly mobile cetacean found in waters across the Northern hemisphere. It occurs in coastal water and inhabits water basins that vary broadly in salinity, temperature, and food availability. These diverse habitats could drive differentiation among populations. Here we report the first harbour porpoise genome, assembled de novo from a Swedish Kattegat individual. The genome is one of the most complete cetacean genomes currently available, with a total size of 2.7 Gb and 50% of the total length found in just 34 scaffolds. Using the largest 122 scaffolds, we were able to validate a high level of homology to the chromosome-level genome assembly of the closest related species for which such resource was available, the domestic cattle (Bos taurus). The draft annotation comprises 22,154 predicted gene models, which we further annotated through matches to the NCBI nucleotide database, GO categorization, and motif prediction. To infer the adaptive abilities of this species, as well as their population history, we performed a Bayesian skyline analysis, and produced results that are concordant with the demographic history of this species, including expansion and fragmentation events. Overall, this genome assembly, together with the draft annotation, represents a crucial addition to the limited genetic markers currently available for the study of porpoises and Phocoenidae conservation, phylogeny, and evolution.


Overcoming constraints on the detection of recessive selection in human genes from population frequency data

10.1101/2021.05.06.443024 ◽

2021 ◽

Author(s):

Daniel J Balick ◽

Daniel M Jordan ◽

Shamil Sunyaev ◽

Ron Do

Keyword(s):

Population Genetics ◽

Population Sample ◽

Purifying Selection ◽

Selective Constraint ◽

Monogenic Disorders ◽

Genetics Research ◽

Disease Gene Discovery ◽

Gene Sets ◽

Human Genes ◽

Recessive Genes

The identification of genes that evolve under recessive natural selection is a longstanding goal of population genetics research with important applications to disease gene discovery. We found that commonly used methods to evaluate selective constraint at the gene level are highly sensitive to genes under heterozygous selection but ubiquitously fail to detect recessively evolving genes. Additionally, more sophisticated likelihood-based methods designed to detect recessivity similarly lack power for a human gene of realistic length from current population sample sizes. However, extensive simulations suggested that recessive genes may be detectable in aggregate. Here, we offer a method informed by population genetics simulations designed to detect recessive purifying selection in gene sets. Applying this to empirical gene sets produced significant enrichments for strong recessive selection in genes previously inferred to be under recessive selection in a consanguineous cohort and in genes involved in autosomal recessive monogenic disorders.


Ever-increasing viral diversity associated with the red imported fire ant Solenopsis invicta (Formicidae: Hymenoptera)

Virology Journal ◽

10.1186/s12985-020-01469-w ◽

2021 ◽

Vol 18 (1) ◽

Author(s):

César Augusto Diniz Xavier ◽

Margaret Louise Allen ◽

Anna Elizabeth Whitfield

Keyword(s):

Solenopsis Invicta ◽

De Novo ◽

Rna Virus ◽

Housekeeping Genes ◽

Virus Genome ◽

Single Strand ◽

Viral Diversity ◽

Red Imported Fire Ant ◽

Sequence Comparisons ◽

Data Set

Abstract Background Advances in sequencing and analysis tools have facilitated discovery of many new viruses from invertebrates, including ants. Solenopsis invicta is an invasive ant that has quickly spread worldwide causing significant ecological and economic impacts. Its virome has begun to be characterized pertaining to potential use of viruses as natural enemies. Although the S. invicta virome is the best characterized among ants, most studies have been performed in its native range, with less information from invaded areas. Methods Using a metatranscriptome approach, we further identified and molecularly characterized virus sequences associated with S. invicta, in two introduced areas, U.S. and Taiwan. The data set used here was obtained from different stages (larvae, pupa, and adults) of S. invicta life cycle. Publicly available RNA sequences from GenBank’s Sequence Read Archive were downloaded and de novo assembled using CLC Genomics Workbench 20.0.1. Contigs were compared against the non-redundant protein sequences and those showing similarity to viral sequences were further analyzed. Results We characterized five putative new viruses associated with S. invicta transcriptomes. Sequence comparisons revealed extensive divergence across ORFs and genomic regions with most of them sharing less than 40% amino acid identity with the closest homologous sequences previously characterized. The first negative-sense single-stranded RNA virus genomic sequences included in the orders Bunyavirales and Mononegavirales are reported. In addition, two positive single-strand virus genome sequences and one single strand DNA virus genome sequence were also identified. While the presence of a putative tenuivirus associated with S. invicta was previously suggested to be a contamination, here we characterized and present strong evidence that Solenopsis invicta virus 14 (SINV-14) is a tenui-like virus that has a long-term association with the ant. Furthermore, based on virus sequence abundance compared to housekeeping genes, phylogenetic relationships, and completeness of viral coding sequences, our results suggest that four of five virus sequences reported, those being SINV-14, SINV-15, SINV-16 and SINV-17, may be associated with viruses actively replicating in the ant S. invicta. Conclusions The present study expands our knowledge about viral diversity associated with S. invicta in introduced areas with potential to be used as biological control agents, which will require further biological characterization.


An Integrated Framework for the Inference of Viral Population History From Reconstructed Genealogies

Genetics ◽

10.1093/genetics/155.3.1429 ◽

2000 ◽

Vol 155 (3) ◽

pp. 1429-1437

Author(s):

Oliver G Pybus ◽

Andrew Rambaut ◽

Paul H Harvey

Keyword(s):

Maximum Likelihood ◽

Sequence Data ◽

Demographic History ◽

Population History ◽

Maximum Likelihood Estimates ◽

Viral Population ◽

True Parameter ◽

Subtype B ◽

Exponential Growth Model ◽

Parameter Values

Abstract We describe a unified set of methods for the inference of demographic history using genealogies reconstructed from gene sequence data. We introduce the skyline plot, a graphical, nonparametric estimate of demographic history. We discuss both maximum-likelihood parameter estimation and demographic hypothesis testing. Simulations are carried out to investigate the statistical properties of maximum-likelihood estimates of demographic parameters. The simulations reveal that (i) the performance of exponential growth model estimates is determined by a simple function of the true parameter values and (ii) under some conditions, estimates from reconstructed trees perform as well as estimates from perfect trees. We apply our methods to HIV-1 sequence data and find strong evidence that subtypes A and B have different demographic histories. We also provide the first (albeit tentative) genetic evidence for a recent decrease in the growth rate of subtype B.


Global production networks and the evolution of industrial capabilities: does production sharing warp the product space?

Oxford Economic Papers ◽

10.1093/oep/gpaa007 ◽

2020 ◽

Vol 72 (3) ◽

pp. 731-747

Author(s):

Russell Thomson ◽

Prema-Chandra Athukorala

Keyword(s):

Product Space ◽

Industrial Structure ◽

De Novo ◽

Trade Openness ◽

Production Networks ◽

Global Production Networks ◽

Industrial Upgrading ◽

Data Set ◽

Production Sharing ◽

Space Approach

Abstract Do production capabilities of countries evolve from existing capabilities or emerge de novo? The Product Space approach developed by Hidalgo, Klinger, Barabási and Hausmann postulates that a country’s existing industrial structure largely determines its opportunities for industrial upgrading. However, this is difficult to reconcile with the export dynamism of many developing countries such as Thailand, Malaysia, Costa Rica and Vietnam that transformed from primary commodity dependence to exporters of dynamic manufactured products. In each of these cases, global production sharing facilitated industrial transition. In this article, we advance the Product Space approach to accommodate the role of global production sharing. Using a newly constructed multi-country data set of manufacturing exports that distinguishes between trade within global production networks and traditional horizontal trade, we find that existing industrial structure has a smaller impact, but trade openness has a greater impact, on industrial upgrading within vertically integrated global industries.


De Novo Genome Assembly of Limpet Bathyacmaea lactea (Gastropoda: Pectinodontidae): The First Reference Genome of a Deep-Sea Gastropod Endemic to Cold Seeps

Genome Biology and Evolution ◽

10.1093/gbe/evaa100 ◽

2020 ◽

Vol 12 (6) ◽

pp. 905-910 ◽

Cited By ~ 2

Author(s):

Ruoyu Liu ◽

Kun Wang ◽

Jun Liu ◽

Wenjie Xu ◽

Yang Zhou ◽

...

Keyword(s):

Deep Sea ◽

Metal Ion ◽

De Novo ◽

Demographic History ◽

Gene Families ◽

Phylogenetic Position ◽

Cold Seeps ◽

Nitrogen And Phosphorus ◽

De Novo Genome Assembly ◽

A Genome

Abstract Cold seeps, characterized by the methane, hydrogen sulfide, and other hydrocarbon chemicals, foster one of the most widespread chemosynthetic ecosystems in deep sea that are densely populated by specialized benthos. However, scarce genomic resources severely limit our knowledge about the origin and adaptation of life in this unique ecosystem. Here, we present a genome of a deep-sea limpet Bathyacmaea lactea, a common species associated with the dominant mussel beds in cold seeps. We yielded 54.6 gigabases (Gb) of Nanopore reads and 77.9-Gb BGI-seq raw reads, respectively. Assembly harvested a 754.3-Mb genome for B. lactea, with 3,720 contigs and a contig N50 of 1.57 Mb, covering 94.3% of metazoan Benchmarking Universal Single-Copy Orthologs. In total, 23,574 protein-coding genes and 463.4 Mb of repetitive elements were identified. We analyzed the phylogenetic position, substitution rate, demographic history, and TE activity of B. lactea. We also identified 80 expanded gene families and 87 rapidly evolving Gene Ontology categories in the B. lactea genome. Many of these genes were associated with heterocyclic compound metabolism, membrane-bounded organelle, metal ion binding, and nitrogen and phosphorus metabolism. The high-quality assembly and in-depth characterization suggest the B. lactea genome will serve as an essential resource for understanding the origin and adaptation of life in the cold seeps.


Phylogeographic structure and gene flow of Himalayan snowcock (Tetraogallus himalayensis)

Animal Biology ◽

10.1163/157075610x523314 ◽

2010 ◽

Vol 60 (4) ◽

pp. 449-465

Author(s):

Wen Longying ◽

Zhang Lixun ◽

An Bei ◽

Luo Huaxing ◽

Liu Naifa ◽

...

Keyword(s):

Demographic History ◽

Phylogenetic Analyses ◽

Divergence Time ◽

Population Expansion ◽

Population History ◽

Tibet Plateau ◽

Phylogeographic Structure ◽

Mitochondrial Cytochrome B ◽

History Of ◽

Qinghai Tibet Plateau

Abstract We have used phylogeographic methods to investigate the genetic structure and population history of the endangered Himalayan snowcock (Tetraogallus himalayensis) in northwestern China. The mitochondrial cytochrome b gene was sequenced in 102 individuals sampled throughout the distribution range. In total, we found 26 different haplotypes defined by 28 polymorphic sites. Phylogenetic analyses indicated that the samples were divided into two major haplogroups corresponding to one western and one eastern clade. The divergence time between these major clades was estimated to be approximately one million years. An analysis of molecular variance showed that 40% of the total genetic variability was found within local populations, 12% among populations within regional groups and 48% among groups. An analysis of the demographic history of the populations suggested that major expansions have occurred in the Himalayan snowcock populations and these correlate mainly with the first and the second largest glaciations during the Pleistocene. In addition, the data indicate that there was a population expansion of the Tianshan population during the uplift of the Qinghai-Tibet Plateau, approximately 2 million years ago.


Understanding population history for conservation purposes: population genetics of Saxifraga aizoides (Saxifragaceae) in the lowlands and lower mountains north of the Alps

American Journal of Botany ◽

10.2307/2656602 ◽

2000 ◽

Vol 87 (4) ◽

pp. 583-590 ◽

Cited By ~ 23

Author(s):

Eva Lutz ◽

J. Jakob Schneller ◽

Rolf Holderegger

Keyword(s):

Population Genetics ◽

Population History ◽

The Alps
