Fine-Scale Characterization of Genomic Structural Variation in the Human Genome Reveals Adaptive and Biomedically Relevant Hotspots

Abstract: Genomic structural variants (SVs) are distributed nonrandomly across the human genome, clustering in “hotspots” that have been implicated in critical evolutionary innovations as well as serious medical conditions. However, the evolutionary and biomedical features of these hotspots remain incompletely understood. In this study, we analyzed data from 2,504 genomes from the 1000 Genomes Project Consortium and constructed a refined map of 1,148 SV hotspots in human genomes. By studying the genomic architecture of these hotspots, we found that both nonallelic homologous recombination and non-homologous mechanisms act as mechanistic drivers of SV formation. We found that the majority of SV hotspots are within gene-poor regions and evolve under relaxed negative selection or neutrality. However, we found that a small subset of SV hotspots harbor genes that are enriched for anthropologically crucial functions, including blood oxygen transport, olfaction, synapse assembly, and antigen binding. We provide evidence that balancing selection may have maintained these SV hotspots, which include two independent hotspots on different chromosomes affecting the alpha and beta hemoglobin gene clusters. Biomedically, we found that the SV hotspots coincide with breakpoints of clinically relevant, large de novo SVs significantly more often than genome-wide expectations. As an example, we showed that the breakpoints of multiple large de novo SVs that lead to idiopathic short stature coincide with SV hotspots. As such, the mutational instability in SV hotspots likely enables chromosomal breaks that lead to the formation of pathogenic structural variants. Our study contributes to a better understanding of the mutational landscape of the genome and implicates both mechanistic and adaptive forces in the formation and maintenance of SV hotspots.

Faculty Opinions recommendation of Fine-scale structural variation of the human genome.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1025956.323360 ◽

2005 ◽

Author(s):

Sue Malcolm

Keyword(s):

Human Genome ◽

Structural Variation ◽

Fine Scale

A thirty year, fine-scale, characterization of area burned in Canadian forests shows evidence of regionally increasing trends in the last decade

PLoS ONE ◽

10.1371/journal.pone.0197218 ◽

2018 ◽

Vol 13 (5) ◽

pp. e0197218 ◽

Cited By ~ 21

Author(s):

Nicholas C. Coops ◽

Txomin Hermosilla ◽

Michael A. Wulder ◽

Joanne C. White ◽

Douglas K. Bolton

Keyword(s):

Fine Scale ◽

Area Burned ◽

Scale Characterization ◽

Increasing Trends

Faculty Opinions recommendation of Fine-scale structural variation of the human genome.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1025956.326529 ◽

2005 ◽

Author(s):

Ulf Pettersson

Keyword(s):

Human Genome ◽

Structural Variation ◽

Fine Scale

Fine-scale characterization of bird habitat using airborne LiDAR in an urban park in Japan

Urban Forestry & Urban Greening ◽

10.1016/j.ufug.2016.03.007 ◽

2016 ◽

Vol 17 ◽

pp. 16-22 ◽

Cited By ~ 8

Author(s):

Takeshi Sasaki ◽

Junichi Imanishi ◽

Wataru Fukui ◽

Yukihiro Morimoto

Keyword(s):

Airborne Lidar ◽

Urban Park ◽

Fine Scale ◽

Bird Habitat ◽

Scale Characterization

MetaPalette: a k-mer Painting Approach for Metagenomic Taxonomic Profiling and Quantification of Novel Strain Variation

mSystems ◽

10.1128/msystems.00020-16 ◽

2016 ◽

Vol 1 (3) ◽

Cited By ~ 31

Author(s):

David Koslicki ◽

Daniel Falush

Keyword(s):

Community Composition ◽

Strain Level ◽

Fine Scale ◽

Taxonomic Profiling ◽

Evolutionary Relatedness ◽

Metagenomic Sample ◽

Conserved Genes ◽

Level Information ◽

Scale Characterization

ABSTRACT Metagenomic profiling is challenging in part because of the highly uneven sampling of the tree of life by genome sequencing projects and the limitations imposed by performing phylogenetic inference at fixed taxonomic ranks. We present the algorithm MetaPalette, which uses long k-mer sizes (k = 30, 50) to fit a k-mer “palette” of a given sample to the k-mer palette of reference organisms. By modeling the k-mer palettes of unknown organisms, the method also gives an indication of the presence, abundance, and evolutionary relatedness of novel organisms present in the sample. The method returns a traditional, fixed-rank taxonomic profile which is shown on independently simulated data to be one of the most accurate to date. Tree figures are also returned that quantify the relatedness of novel organisms to reference sequences, and the accuracy of such figures is demonstrated on simulated spike-ins and a metagenomic soil sample. The software implementing MetaPalette is available at: https://github.com/dkoslicki/MetaPalette. Pretrained databases are included for Archaea, Bacteria, Eukaryota, and viruses.

IMPORTANCE Taxonomic profiling is a challenging first step when analyzing a metagenomic sample. This work presents a method that facilitates fine-scale characterization of the presence, abundance, and evolutionary relatedness of organisms present in a given sample but absent from the training database. We calculate a “k-mer palette” which summarizes the information from all reads, not just those in conserved genes or containing taxon-specific markers. The compositions of palettes are easy to model, allowing rapid inference of community composition. In addition to providing strain-level information where applicable, our approach provides taxonomic profiles that are more accurate than those of competing methods. Author Video: An author video summary of this article is available.
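
The idea of comparing a sample's k-mer content with that of reference organisms can be illustrated with a toy sketch. This is not the MetaPalette algorithm or its API: the function names `kmer_counts` and `shared_kmer_fraction` are hypothetical, the shared-k-mer score is a simple stand-in chosen for illustration, and k = 4 is used only to keep the example readable (the abstract reports k = 30 and k = 50).

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq, skipping k-mers with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def shared_kmer_fraction(sample: Counter, reference: Counter) -> float:
    """Fraction of the reference's distinct k-mers that also occur in the sample.

    Hypothetical illustrative score, not the statistic used by MetaPalette.
    """
    if not reference:
        return 0.0
    shared = sum(1 for kmer in reference if kmer in sample)
    return shared / len(reference)


# Toy usage: one "sample" sequence compared with two "reference" sequences.
sample = kmer_counts("ACGTACGT", 4)
close_ref = kmer_counts("ACGTAC", 4)    # all of its 4-mers occur in the sample
distant_ref = kmer_counts("ACGTTT", 4)  # only ACGT occurs in the sample
print(shared_kmer_fraction(sample, close_ref))
print(shared_kmer_fraction(sample, distant_ref))
```

A reference whose k-mers are mostly present in the sample scores near 1, while a more distant reference scores lower. Longer k-mers make chance matches between unrelated genomes less likely.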
