Whole genome re-sequencing reveals adaptation prior to the divergence of buffalo subspecies

Genome Biology and Evolution ◽

10.1093/gbe/evaa231 ◽

2020 ◽

Author(s):

Mostafa Rafiepour ◽

Esmaeil Ebrahimie ◽

Mohammad Farhad Vahidi ◽

Ghasem Hosseini Salekdeh ◽

Ali Niazi ◽

...

Keyword(s):

Read Depth ◽

Genomic Diversity ◽

River Buffalo ◽

Sequencing Data ◽

Ecological Zones ◽

Swamp Buffaloes ◽

Taurine Cattle ◽

East Azerbaijan ◽

Stop Loss ◽

Genomic Regions

Abstract The application of high-throughput genotyping or sequencing data helps us to understand the genomic response to natural and artificial selection. In this study, we scanned the genomes of five indigenous buffalo populations belonging to three recognized breeds, adapted to different geographical and agro-ecological zones in Iran, to unravel the extent of genomic diversity and to localize genomic regions and genes that underwent past selection. A total of 46 river buffalo whole genomes, from West and East Azerbaijan, Gilan, Mazandaran and Khuzestan provinces, were re-sequenced. Our sequencing data reached a coverage above 99% of the river buffalo reference genome and an average read depth of around 9.2X per sample. We identified 20.55 million SNPs, including 63,097 missense, 707 stop-gain and 159 stop-loss mutations that might have functional consequences. Genomic diversity analyses showed modest structuring among Iranian buffalo populations following frequent gene flow or admixture in the recent past. Evidence of positive selection was investigated using both differentiation (Fst) and fixation (Pi) metrics. Analysis of fixation revealed three genomic regions in all three breeds with aberrant polymorphism content on BBU2, 20 and 21. The fixation signal on BBU2 overlapped with the OCA2-HERC2 genes, suggestive of adaptation to UV exposure through a pigmentation mechanism. Further validation using re-sequencing data from five other bovine species as well as the Axiom® Buffalo Genotyping Array 90 K data of river and swamp buffaloes indicated that these fixation signals persisted across river and swamp buffaloes and extended to taurine cattle, implying that an ancient evolutionary event occurred before the speciation of buffalo and taurine cattle. These results contribute to our understanding of major genetic switches that took place during the evolution of modern buffaloes.
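
The two metrics named in this abstract can be illustrated with a minimal Python sketch. It assumes phased, aligned haplotype strings and uses a Hudson-style FST estimator; the abstract does not state which estimators or window sizes the authors used, so the function names and toy data here are hypothetical. A window with a nucleotide diversity (Pi) close to zero is what the abstract refers to as a fixation signal.

```python
from itertools import combinations


def pairwise_differences(h1, h2):
    """Number of sites at which two aligned haplotypes differ."""
    return sum(a != b for a, b in zip(h1, h2))


def nucleotide_diversity(haplotypes):
    """Pi: mean pairwise differences per site among aligned haplotypes."""
    n_sites = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    total = sum(pairwise_differences(h1, h2) for h1, h2 in pairs)
    return total / (len(pairs) * n_sites)


def hudson_fst(pop1, pop2):
    """Hudson-style FST: 1 - (mean within-population Pi / between-population Pi)."""
    n_sites = len(pop1[0])
    within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    between_total = sum(pairwise_differences(h1, h2) for h1 in pop1 for h2 in pop2)
    between = between_total / (len(pop1) * len(pop2) * n_sites)
    return 1 - within / between if between > 0 else 0.0


# Toy data (hypothetical): one genomic window, two haplotypes per population.
pop_a = ["AAAA", "AAAT"]
pop_b = ["TTTT", "TTTA"]
pi_a = nucleotide_diversity(pop_a)   # within-population diversity of pop_a
fst_ab = hudson_fst(pop_a, pop_b)    # differentiation between pop_a and pop_b
fixed_window_pi = nucleotide_diversity(["ACGT", "ACGT", "ACGT"])  # no polymorphism
```

In a genome scan, these quantities would typically be computed per sliding window, with unusually low Pi or unusually high FST windows flagged as candidates.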

Assessing genomic diversity and signatures of selection in Jiaxian Red cattle using whole-genome sequencing data

BMC Genomics ◽

10.1186/s12864-020-07340-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Xiaoting Xia ◽

Shunjin Zhang ◽

Huaju Zhang ◽

Zijing Zhang ◽

Ningbo Chen ◽

...

Keyword(s):

Population Structure ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genomic Variation ◽

Genomic Diversity ◽

System Response ◽

Whole Genome ◽

Population Structure Analysis ◽

Native Cattle ◽

Genomic Regions

Abstract Background: Native cattle breeds are an important source of genetic variation because they might carry alleles that enable them to adapt to local environments and tough feeding conditions. Jiaxian Red, a Chinese native cattle breed, is reported to have originated from crossbreeding between taurine and indicine cattle; their history as a draft and meat animal dates back at least 30 years. Using whole-genome sequencing (WGS) data of 30 animals from the core breeding farm, we investigated the genetic diversity, population structure and genomic regions under selection of Jiaxian Red cattle. Furthermore, we used 131 published genomes of world-wide cattle to characterize the genomic variation of Jiaxian Red cattle. Results: The population structure analysis revealed that Jiaxian Red cattle harboured ancestry from East Asian taurine (0.493), Chinese indicine (0.379), European taurine (0.095) and Indian indicine (0.033). Three methods (nucleotide diversity, linkage disequilibrium decay and runs of homozygosity) implied relatively high genomic diversity in Jiaxian Red cattle. We used θπ, CLR, FST and XP-EHH methods to look for candidate signatures of positive selection in Jiaxian Red cattle. A total of 171 (θπ and CLR) and 17 (FST and XP-EHH) shared genes were identified using different detection strategies. Functional annotation analysis revealed that these genes are potentially responsible for growth and feed efficiency (CCSER1), meat quality traits (ROCK2, PPP1R12A, CYB5R4, EYA3, PHACTR1), fertility (RFX4, SRD5A2) and immune system response (SLAMF1, CD84 and SLAMF6). Conclusion: We provide a comprehensive overview of sequence variations in Jiaxian Red cattle genomes. Selection signatures were detected in genomic regions that are possibly related to economically important traits in Jiaxian Red cattle. We observed a high level of genomic diversity and low inbreeding in Jiaxian Red cattle. These results provide a basis for further resource protection and breeding improvement of this breed.

Control of artefactual variation in reported inter-sample relatedness during clinical use of a Mycobacterium tuberculosis sequencing pipeline

10.1101/252460 ◽

2018 ◽

Author(s):

David H Wyllie ◽

Nicholas Sanderson ◽

Richard Myers ◽

Tim Peto ◽

Esther Robinson ◽

...

Keyword(s):

Consensus Sequence ◽

Read Depth ◽

Pairwise Distance ◽

Contact Tracing ◽

Clinical Samples ◽

Bacterial Dna ◽

Consensus Sequences ◽

Minor Variant ◽

Validation Set ◽

Genomic Regions

Abstract Contact tracing requires reliable identification of closely related bacterial isolates. When we noticed the reporting of artefactual variation between M. tuberculosis isolates during routine next generation sequencing of Mycobacterium spp., we investigated its basis in 2,018 consecutive M. tuberculosis isolates. In the routine process used, clinical samples were decontaminated and inoculated into broth cultures; from positive broth cultures DNA was extracted, sequenced, reads mapped, and consensus sequences determined. We investigated the process of consensus sequence determination, which selects the most common nucleotide at each position. Having determined the high-quality read depth and depth of minor variants across 8,006 M. tuberculosis genomic regions, we quantified the relationship between the minor variant depth and the amount of non-Mycobacterial bacterial DNA, which originates from commensal microbes killed during sample decontamination. In the presence of non-Mycobacterial bacterial DNA, we found significant increases in minor variant frequencies of more than 1.5 fold in 242 regions covering 5.1% of the M. tuberculosis genome. Included within these were four high variation regions strongly influenced by the amount of non-Mycobacterial bacterial DNA. Excluding these four regions from pairwise distance comparisons reduced biologically implausible variation from 5.2% to 0% in an independent validation set derived from 226 individuals. Thus, we have demonstrated an approach identifying critical genomic regions contributing to clinically relevant artefactual variation in bacterial similarity searches. The approach described monitors the outputs of the complex multi-step laboratory and bioinformatics process, allows periodic process adjustments, and will have application to quality control of routine bacterial genomics.
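
The consensus rule described in this abstract (take the most common nucleotide at each position) and the associated minor variant fraction can be sketched in a few lines of Python. This is an illustration only, not the pipeline's code: the input format (one string of base calls per position) and the function name are assumptions, and quality filtering is omitted.

```python
from collections import Counter


def consensus_and_minor_fraction(pileup_columns):
    """Return the majority-rule consensus and, per position, the fraction of
    base calls that disagree with the consensus base (the minor variant fraction).

    pileup_columns: one string of base calls per genomic position (assumed format).
    """
    consensus = []
    minor_fractions = []
    for bases in pileup_columns:
        if not bases:
            # No coverage: emit an ambiguous base and a zero minor fraction.
            consensus.append("N")
            minor_fractions.append(0.0)
            continue
        counts = Counter(bases)
        # On a tie, Counter.most_common keeps the base that was seen first.
        major_base, major_depth = counts.most_common(1)[0]
        consensus.append(major_base)
        minor_fractions.append(1 - major_depth / len(bases))
    return "".join(consensus), minor_fractions


# Toy pileup (hypothetical): three positions with four reads each.
sequence, fractions = consensus_and_minor_fraction(["AAAA", "CCCT", "GGTT"])
```

A position where contaminating reads inflate the minor fraction towards 0.5 is exactly where a majority-rule call can flip between samples, which is the kind of artefactual variation the abstract describes.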

CNV-P: a machine-learning framework for predicting high confident copy number variations

PeerJ ◽

10.7717/peerj.12564 ◽

2021 ◽

Vol 9 ◽

pp. e12564

Author(s):

Taifu Wang ◽

Jinghua Sun ◽

Xiuqing Zhang ◽

Wen-Jing Wang ◽

Qing Zhou

Keyword(s):

Machine Learning ◽

False Positive ◽

Copy Number ◽

Genetic Disorders ◽

Genetic Diseases ◽

Basic Research ◽

Read Depth ◽

Copy Number Variations ◽

Sequencing Data ◽

Learning Framework

Background: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current software for detecting CNVs has high false-positive rates and needs further improvement. Methods: Here, we propose a novel post-processing approach for CNV prediction (CNV-P), a machine-learning framework that can efficiently remove false-positive fragments from the results of CNV detection tools. A series of CNV signals such as read depth (RD), split reads (SR) and read pairs (RP) around the putative CNV fragments were defined as features to train a classifier. Results: The prediction results on several real biological datasets showed that our models could accurately classify the CNVs with over 90% precision and 85% recall, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different CNV sizes and sequencing platforms. Conclusions: Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases.
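
For readers unfamiliar with the two performance figures quoted in this abstract, the following short Python sketch shows how precision and recall are computed from classification counts. The counts used are hypothetical and are not taken from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 90 real CNVs kept, 10 false calls kept, 15 real CNVs discarded.
precision, recall = precision_recall(true_positives=90, false_positives=10, false_negatives=15)
```

A post-processing filter of this kind mainly raises precision by discarding false calls; recall shows how many real CNVs survive the filtering.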

Assessing genomic diversity and selective pressures in Bashan cattle by whole-genome sequencing data

Animal Biotechnology ◽

10.1080/10495398.2021.1998094 ◽

2021 ◽

pp. 1-12

Author(s):

Luyang Sun ◽

Kaixing Qu ◽

Yangkai Liu ◽

Xiaohui Ma ◽

Ningbo Chen ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genomic Diversity ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Selective Pressures

CpG_MPs: identification of CpG methylation patterns of genomic regions from high-throughput bisulfite sequencing data

Nucleic Acids Research ◽

10.1093/nar/gks829 ◽

2012 ◽

Vol 41 (1) ◽

pp. e4-e4 ◽

Cited By ~ 36

Author(s):

Jianzhong Su ◽

Haidan Yan ◽

Yanjun Wei ◽

Hongbo Liu ◽

Hui Liu ◽

...

Keyword(s):

High Throughput ◽

Bisulfite Sequencing ◽

Cpg Methylation ◽

Sequencing Data ◽

Bisulfite Sequencing Data ◽

Genomic Regions ◽

Methylation Patterns

csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows

Nucleic Acids Research ◽

10.1093/nar/gkv1191 ◽

2015 ◽

Vol 44 (5) ◽

pp. e45-e45 ◽

Cited By ~ 120

Author(s):

Aaron T.L. Lun ◽

Gordon K. Smyth

Keyword(s):

De Novo ◽

Massively Parallel Sequencing ◽

Real Data ◽

Sequencing Data ◽

Scientific Application ◽

Sliding Windows ◽

Treatment Conditions ◽

Bioconductor Project ◽

Differential Binding ◽

Genomic Regions

Abstract Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify binding sites for a target protein in the genome. An important scientific application is to identify changes in protein binding between different treatment conditions, i.e. to detect differential binding (DB). This can reveal potential mechanisms through which changes in binding may contribute to the treatment effect. The csaw package provides a framework for the de novo detection of differentially bound genomic regions. It uses a window-based strategy to summarize read counts across the genome. It exploits existing statistical software to test for significant differences in each window. Finally, it clusters windows into regions for output and controls the false discovery rate properly over all detected regions. The csaw package can handle arbitrarily complex experimental designs involving biological replicates. It can be applied to both transcription factor and histone mark datasets, and, more generally, to any type of sequencing data measuring genomic coverage. csaw performs favorably against existing methods for de novo DB analyses on both simulated and real data. csaw is implemented as an R software package and is freely available from the open-source Bioconductor project.
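
The window-based strategy summarized in this abstract can be illustrated with a small Python sketch: count reads in overlapping windows, then merge selected windows into regions. This is not csaw's API (csaw is an R package); the function names, window width, spacing and read positions are all hypothetical, and the statistical testing step between counting and merging is omitted.

```python
def window_counts(read_positions, chrom_length, width=10, spacing=5):
    """Count reads whose position falls in each window [start, start + width)."""
    counts = {}
    for start in range(0, chrom_length, spacing):
        end = min(start + width, chrom_length)
        counts[(start, end)] = sum(start <= p < end for p in read_positions)
    return counts


def merge_windows(windows, max_gap=0):
    """Merge windows that overlap or lie within max_gap bases of each other."""
    regions = []
    for start, end in sorted(windows):
        if regions and start - regions[-1][1] <= max_gap:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


# Toy data (hypothetical): read positions on a 50 bp chromosome.
reads = [1, 2, 3, 12, 13, 14, 40]
counts = window_counts(reads, chrom_length=50)
# Stand-in for the testing step: keep windows with at least 3 reads.
selected = [window for window, n in counts.items() if n >= 3]
regions = merge_windows(selected)
```

In a real analysis the selection step would come from a per-window statistical test across replicates, and the false discovery rate would be controlled at the level of the merged regions rather than individual windows.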

Genome-wide patterns of divergence during speciation: the lake whitefish case study

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2011.0197 ◽

2012 ◽

Vol 367 (1587) ◽

pp. 354-363 ◽

Cited By ~ 81

Author(s):

S. Renaut ◽

N. Maillet ◽

E. Normandeau ◽

C. Sauvage ◽

N. Derome ◽

...

Keyword(s):

Reproductive Isolation ◽

Genomic Islands ◽

Next Generation Sequencing Data ◽

Adaptive Divergence ◽

Phenotypic Traits ◽

Lake Whitefish ◽

Sequencing Data ◽

Species Pairs ◽

Selective Forces ◽

Genomic Regions

The nature, size and distribution of the genomic regions underlying divergence and promoting reproductive isolation remain largely unknown. Here, we summarize ongoing efforts using young (12 000 yr BP) species pairs of lake whitefish (Coregonus clupeaformis) to expand our understanding of the initial genomic patterns of divergence observed during speciation. Our results confirmed the predictions that: (i) on average, phenotypic quantitative trait loci (pQTL) show higher FST values and are more likely to be outliers (and therefore candidates for being targets of divergent selection) than non-pQTL markers; (ii) large islands of divergence rather than small independent regions under selection characterize the early stages of adaptive divergence of lake whitefish; and (iii) there is a general trend towards an increase in terms of numbers and size of genomic regions of divergence from the least (East L.) to the most differentiated species pair (Cliff L.). This is consistent with previous estimates of reproductive isolation between these species pairs being driven by the same selective forces responsible for environment specialization. Altogether, dwarf and normal whitefish species pairs represent a continuum of both morphological and genomic differentiation contributing to ecological speciation. Admittedly, much progress is still required to more finely map and circumscribe genomic islands of speciation. This will be achieved through the use of next generation sequencing data but also through a better quantification of phenotypic traits moulded by selection as organisms adapt to new environmental conditions.

Modeling Read Counts for CNV Detection in Exome Sequencing Data

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1732 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 27

Author(s):

Michael I. Love ◽

Alena Myšičková ◽

Ruping Sun ◽

Vera Kalscheuer ◽

Martin Vingron ◽

...

Keyword(s):

Exome Sequencing ◽

High Throughput Sequencing ◽

Gc Content ◽

High Sensitivity ◽

Read Depth ◽

Large Chromosome ◽

Sequencing Data ◽

Sequencing Project ◽

Exome Sequencing Data ◽

Control Set

Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.
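
The idea of a hidden Markov model over read counts can be sketched with a compact Viterbi decoder in Python. This is a simplified illustration, not exomeCopy itself: it assumes Poisson emissions whose mean scales with copy number relative to an expected diploid count from a control set, a single "stay" probability for transitions, and no GC-content covariate. All names and numbers are hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_number(counts, expected, states=(1, 2, 3), stay=0.99):
    """Most likely copy-number path given observed counts per target and the
    expected diploid counts (e.g. from a control set). Assumed emission model:
    Poisson with mean expected * copy_number / 2."""
    n_states = len(states)
    switch = (1 - stay) / (n_states - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]

    def emit(s, t):
        return poisson_logpmf(counts[t], expected[t] * states[s] / 2)

    # Uniform prior over the starting state.
    score = [math.log(1 / n_states) + emit(s, 0) for s in range(n_states)]
    back = []
    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best)
            new_score.append(score[best] + log_trans[best][j] + emit(j, t))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]


# Toy data (hypothetical): eight exons, with a heterozygous deletion over exons 4 to 6.
observed = [100, 98, 103, 50, 52, 49, 101, 99]
control = [100] * 8
calls = viterbi_copy_number(observed, control)
```

The high "stay" probability is what lets the model call a contiguous segment rather than reacting to noise at single targets; a real implementation would also need an emission model that tolerates overdispersed counts.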

SvABA: Genome-wide detection of structural variants and indels by local assembly

10.1101/105080 ◽

2017 ◽

Cited By ~ 9

Author(s):

Jeremiah Wala ◽

Pratiti Bandopadhayay ◽

Noah Greenwald ◽

Ryan O’Rourke ◽

Ted Sharpe ◽

...

Keyword(s):

Variant Calling ◽

Accurate Method ◽

Structural Variants ◽

Sequencing Data ◽

Cancer Driver ◽

Insertion And Deletion ◽

Genome Wide ◽

Cancer Genomes ◽

Local Assembly ◽

Genomic Regions

Abstract Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA’s performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs, and substantially improved detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (< 1,000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types, and found that templated-sequence insertions occur in ~4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized SVs.

LRSDAY: Long-read Sequencing Data Analysis for Yeasts

10.1101/184572 ◽

2017 ◽

Author(s):

Jia-Xing Yue ◽

Gianni Liti

Keyword(s):

Genome Assembly ◽

Model Organism ◽

Sequencing Data ◽

Protein Coding ◽

Sequencing Technologies ◽

Long Reads ◽

Long Read ◽

Downstream Analysis ◽

Eukaryotic Organisms ◽

Genomic Regions

Abstract Long-read sequencing technologies have become increasingly popular in genome projects due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast, Saccharomyces cerevisiae, has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here we present LRSDAY, the first one-stop solution to streamline this process. LRSDAY can produce chromosome-level end-to-end genome assembly and comprehensive annotations for various genomic features (including centromeres, protein-coding genes, tRNAs, transposable elements and telomere-associated elements) that are ready for downstream analysis. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable for virtually any eukaryotic organism. Applying LRSDAY to an S. cerevisiae strain takes ∼43 hrs to generate a complete and well-annotated genome from ∼100X Pacific Biosciences (PacBio) reads using four threads.
