New whole genome de novo assemblies of three divergent strains of rice (O. sativa) documents novel gene space of aus and indica

2014 ◽  
Author(s):  
Michael C Schatz ◽  
Lyza G Maron ◽  
Joshua C Stein ◽  
Alejandro Hernandez Wences ◽  
James Gurtowski ◽  
...  

The use of high-throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. Currently, when the genomes of different strains of a given organism are compared, whole genome resequencing data are aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Here, we use rice as a model to explore the extent of structural variation among strains adapted to different ecologies and geographies, and show that this variation can be significant, often matching or exceeding the variation present in closely related human populations or other mammals. We demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next-generation sequence data into high-quality assemblies that can be directly compared to provide an unbiased assessment. Using this approach, we are able to accurately assess the "pan-genome" of three divergent rice varieties and document several megabases of each genome absent in the other two. Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard resequencing approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.

2018 ◽  
Author(s):  
Weiling Li ◽  
Lin Lin ◽  
Raunaq Malhotra ◽  
Lei Yang ◽  
Raj Acharya ◽  
...  

Abstract: Human Endogenous Retrovirus type K (HERV-K) is the only HERV known to be insertionally polymorphic. It is possible that HERV-Ks contribute to human disease because people differ in both number and genomic location of these retroviruses. Indeed, viral transcripts, proteins, and antibodies against HERV-K are detected in cancers and in auto-immune and neurodegenerative diseases. However, attempts to link a polymorphic HERV-K with any disease have been frustrated, in part because the population frequency of the HERV-K provirus at each site is lacking and because it is challenging to identify closely related elements such as HERV-K from short read sequence data. We present an integrated and computationally robust approach that uses whole genome short read data to determine the occupation status at all sites reported to contain a HERV-K provirus. Our method estimates the proportion of fixed-length genomic sequences (k-mers) from whole genome sequence data matching a reference set of k-mers unique to each HERV-K locus and applies mixture model-based clustering to account for low depth sequence data. Our analysis of 1000 Genomes Project Data (KGP) reveals numerous differences among the five KGP super-populations in the frequency of individual and co-occurring HERV-K proviruses; we provide a visualization tool to easily depict the prevalence of any combination of HERV-K among KGP populations. Further, the genome burden of polymorphic HERV-K is variable in humans, with East Asian (EAS) individuals having the fewest integration sites. Our study identifies population-specific sequence variation for several HERV-K proviruses. We expect these resources will advance research on HERV-K contributions to human diseases.

Author summary: Human Endogenous Retrovirus type K (HERV-K) is the youngest of the retrovirus families in the human genome and is the only group that is polymorphic; a HERV-K can be present in one individual but absent from others. HERV-Ks could contribute to disease risk, but establishing a link between a polymorphic HERV-K and a specific disease has been difficult. We develop an easy-to-use method that reveals the considerable variation existing among global populations in the frequency of individual and co-occurring polymorphic HERV-K, and in the total number of HERV-K that any individual has in their genome. Our study provides a global reference set of HERV-K genomic diversity and the tools needed to determine the genomic landscape of HERV-K in any patient population.
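
The k-mer matching step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy sequences, and the choice of k = 4 are all assumptions made for illustration, and the mixture model-based clustering that the abstract applies afterwards is not shown. The sketch only computes, for one locus, the fraction of locus-specific reference k-mers that appear in at least one read.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_match_proportion(reads, reference_kmers, k):
    """Fraction of the reference k-mers seen in at least one read.

    reference_kmers is assumed to hold k-mers unique to a single locus
    (hypothetical input; building that set is not shown here).
    """
    if not reference_kmers:
        raise ValueError("reference k-mer set is empty")
    seen = set()
    for read in reads:
        # Keep only the read k-mers that belong to the locus reference set.
        seen |= kmers(read, k) & reference_kmers
    return len(seen) / len(reference_kmers)


# Toy data, invented for illustration only.
locus_kmers = kmers("ACGTACGGA", 4)   # 6 distinct 4-mers
reads = ["ACGTAC", "TTTTTT"]          # first read covers 3 of them
proportion = kmer_match_proportion(reads, locus_kmers, 4)
print(proportion)
```

A proportion near 1 would suggest the provirus is present and a proportion near 0 that it is absent; with low-depth data the values fall in between, which is presumably why the abstract adds a clustering step rather than a fixed threshold.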


Author(s):  
Seyoung Mun ◽  
Songmi Kim ◽  
Wooseok Lee ◽  
Keunsoo Kang ◽  
Thomas J. Meyer ◽  
...  

Abstract: Advances in next-generation sequencing (NGS) technology have made personal genome sequencing possible, and indeed, many individual human genomes have now been sequenced. Comparisons of these individual genomes have revealed substantial genomic differences between human populations as well as between individuals from closely related ethnic groups. Transposable elements (TEs) are known to be one of the major sources of these variations and act through various mechanisms, including de novo insertion, insertion-mediated deletion, and TE–TE recombination-mediated deletion. In this study, we carried out de novo whole-genome sequencing of one Korean individual (KPGP9) via multiple insert-size libraries. The de novo whole-genome assembly resulted in 31,305 scaffolds with a scaffold N50 size of 13.23 Mb. Furthermore, through computational data analysis and experimental verification, we revealed that 182 TE-associated structural variation (TASV) insertions and 89 TASV deletions contributed 64,232 bp in sequence gain and 82,772 bp in sequence loss, respectively, in the KPGP9 genome relative to the hg19 reference genome. We also verified structural differences associated with TASVs by comparative analysis with TASVs in recent genomes (AK1 and TCGA genomes) and reported their details. Here, we constructed a new Korean de novo whole-genome assembly and provide the first study, to our knowledge, focused on the identification of TASVs in an individual Korean genome. Our findings again highlight the role of TEs as a major driver of structural variations in individual human genomes.
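
For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of the standard definition: the length L such that scaffolds of length L or longer together cover at least half of the total assembly length. The function name and the toy lengths are assumptions for illustration and are unrelated to the KPGP9 data.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy scaffold lengths, invented for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

With these toy values the total is 290, and the two longest scaffolds (80 + 70 = 150) already cover more than half of it, so the N50 is 70.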


2018 ◽  
Author(s):  
Peng Xu ◽  
Zechen Chong ◽  

Abstract: Meiotic recombination (MR), which transmits exchanged genetic materials between homologous chromosomes to offspring, plays a crucial role in shaping genomic diversity in eukaryotic organisms. In humans, thousands of meiotic recombination hotspots have been mapped by population genetics approaches. However, direct identification of MR events for individuals is still challenging due to the difficulty in resolving the haplotypes of homologous chromosomes and reconstructing the gamete genome. Whole genome linked-read sequencing (lrWGS) can generate haplotype sequences of mega-base pairs (N50 ~2.5 Mb) after computational phasing. However, the haplotype information is still isolated in a large number of fragmented genomic regions and limited by switch errors, impeding its further application in chromosome-scale analysis. In this study, we developed a tool, MRLR (Meiotic Recombination identification by Linked-Read sequencing), for the analysis of individual MR events. By leveraging trio pedigree information with lrWGS haplotypes, our pipeline can reconstruct the whole human gamete genome with 99.8% haplotyping accuracy. By analyzing the haplotype exchange between homologous chromosomes, MRLR identified 462 high-resolution MR events in 6 human trio samples from the Genome In A Bottle (GIAB) and the Human Genome Structural Variation Consortium (HGSVC). In three datasets of the HGSVC, our results recapitulated 149 (92%) previously identified high-confidence MR events and discovered 85 novel events. About half (40) of the new events are supported by single-cell template strand sequencing (Strand-seq) results. We found that 332 (71.9%) MR events co-localize with recombination hotspots (>10 cM/Mb) in human populations, and MR breakpoint regions are enriched in PRDM9 and DMC1 binding sites. In addition, 48% (221) of breakpoint regions were detected inside a gene, indicating that these MRs can directly affect the haplotype diversity of genic regions. Taken together, our approach provides new opportunities in the haplotype-based genomic analysis of individual meiotic recombination. The MRLR software is implemented in Perl and is freely available at https://github.com/ChongLab/MRLR.
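
The idea of finding MR events by "analyzing the haplotype exchange between homologous chromosomes" can be sketched in a few lines. This is not MRLR's algorithm (MRLR is written in Perl and its internals are not described here); it is a simplified illustration that assumes each informative marker on the transmitted gamete has already been labelled with the parental haplotype it matches (0 or 1), and it ignores the switch errors mentioned in the abstract. The function name and toy data are hypothetical.

```python
def find_crossovers(positions, parental_origin):
    """Return (left, right) position intervals where the origin label switches.

    positions: sorted marker coordinates along one chromosome.
    parental_origin: for each marker, 0 or 1, the parental haplotype
    that the transmitted gamete matches at that marker.
    """
    if len(positions) != len(parental_origin):
        raise ValueError("positions and parental_origin differ in length")
    events = []
    for i in range(1, len(positions)):
        # A change of label between adjacent markers brackets a breakpoint.
        if parental_origin[i] != parental_origin[i - 1]:
            events.append((positions[i - 1], positions[i]))
    return events


# Toy markers, invented for illustration only.
toy_positions = [100, 200, 300, 400, 500]
toy_origin = [0, 0, 1, 1, 0]
toy_events = find_crossovers(toy_positions, toy_origin)
print(toy_events)
```

Each returned interval is the region between two adjacent informative markers, so the resolution of a breakpoint depends on marker density. A real pipeline would also need to separate true crossovers from phasing switch errors, which this sketch does not attempt.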


Plants ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 2740
Author(s):  
Yuya Liang ◽  
Shichen Wang ◽  
Chersty L. Harper ◽  
Nithya K. Subramanian ◽  
Rodante E. Tabien ◽  
...  

Global climate change has increased the number of severe flooding events that affect agriculture, including rice production in the U.S. and internationally. Heavy rainfall can cause rice plants to be completely submerged, which can significantly affect grain yield or completely destroy the plants. Recently, a major-effect QTL for submergence tolerance during the vegetative stage, qSub8.1, which originated from Ciherang-Sub1, was identified in a mapping population derived from a cross between Ciherang-Sub1 and IR10F365. Ciherang-Sub1 was, in turn, derived from a cross between Ciherang and IR64-Sub1. Here, we characterize the qSub8.1 region by analyzing the sequence information of Ciherang-Sub1 and its two parents (Ciherang and IR64-Sub1) and compare the whole genome profile of these varieties with the Nipponbare and Minghui 63 (MH63) reference genomes. The three rice varieties were sequenced with 150 bp paired-end whole-genome shotgun sequencing (Illumina HiSeq4000), followed by genome assembly with the Trimmomatic-SOAPdenovo2-MUMmer3 pipeline, resulting in approximate genome sizes of 354.4, 343.7, and 344.7 Mb, with N50 values of 25.1, 25.4, and 26.1 kb, respectively. The results showed that the Ciherang-Sub1 genome is composed of 59–63% Ciherang, 22–24% IR64-Sub1, and 15–17% unknown sources. The genome profile revealed a more detailed genomic composition than previous marker-assisted breeding and showed that the qSub8.1 region is mostly from Ciherang, with some introgressed segments from IR64-Sub1 and currently unknown source(s).


2016 ◽  
Author(s):  
Peter A. Andrews ◽  
Ivan Iossifov ◽  
Jude Kendall ◽  
Steven Marks ◽  
Lakshmi Muthuswamy ◽  
...  

Abstract

Motivation: Standard genome sequence alignment tools primarily designed to find one alignment per read have difficulty detecting inversion, translocation and large insertion and deletion (indel) events. Moreover, dedicated split read alignment methods that depend only upon the reference genome may misidentify or find too many potential split read alignments because of reference genome anomalies.

Methods: We introduce MUMdex, a Maximal Unique Match (MUM)-based genomic analysis software package consisting of a sequence aligner to the reference genome, a storage-indexing format and analysis software. Discordant reference alignments of MUMs are especially suitable for identifying inversion, translocation and large indel differences in unique regions. Extracted population databases are used as filters for flaws in the reference genome. We describe the concepts underlying MUM-based analysis, the software implementation and its usage.

Results: We demonstrate via simulation that the MUMdex aligner and alignment format are able to correctly detect and record genomic events. We characterize alignment performance and output file sizes for human whole genome data and compare to Bowtie 2 and the BAM format. Preliminary results demonstrate the practicality of the analysis approach by detecting de novo mutation candidates in human whole genome DNA sequence data from 510 families. We provide a population database of events from these families for use by others.

Availability: http://mumdex.com/

Contact: [email protected] (or [email protected])

Supplementary information: Supplementary data are available online.


2016 ◽  
Vol 11 (1) ◽  
pp. 7 ◽  
Author(s):  
I Made Tasma ◽  
Dani Satyawan ◽  
Habib Rijzaani

Resequencing of the soybean genome facilitates SNP marker discoveries useful for supporting the national soybean breeding programs. The objectives of the present study were to construct soybean genomic libraries, to resequence the whole genome of five Indonesian soybean genotypes, and to identify SNPs based on the resequence data. The studies consisted of genomic library construction and quality analysis, resequencing the whole genome of five soybean genotypes, and genome-wide SNP identification based on alignment of the resequence data with the reference sequence, Williams 82. The five Indonesian soybean genotypes were Tambora, Grobogan, B3293, Malabar, and Davros. The results showed that soybean genomic libraries were successfully constructed with a size of 400 bp and library concentrations ranging from 21.2 to 64.5 ng/μl. Resequencing of the libraries resulted in 50.1 × 10^9 bp of total genomic sequence. The quality of the genomic libraries and sequence data resulting from this study was high, as indicated by a Q score of 88.6% with a low sequencing error of only 0.97%. Bioinformatic analysis resulted in a total of 2,597,286 SNPs, 257,598 insertions, and 202,157 deletions. Of the total SNPs identified, only 95,207 SNPs (2.15%) were located within exons. Among those, 49,926 SNPs caused missense mutations and 1,535 SNPs caused nonsense mutations. Once verified, the SNPs resulting from this study will be very useful for genome-wide SNP chip development for the soybean genome to accelerate soybean breeding programs.


2021 ◽  
Author(s):  
Víctor García-Olivares ◽  
Adrián Muñoz-Barrera ◽  
José Miguel Lorenzo-Salazar ◽  
Carlos Zaragoza-Trello ◽  
Luis A. Rubio-Rodríguez ◽  
...  

Abstract: The mitochondrial genome (mtDNA) is of interest for a range of fields including evolutionary, forensic, and medical genetics. Human mitogenomes can be classified into evolutionarily related haplogroups that provide ancestral information and pedigree relationships. Because of this and the advent of high-throughput sequencing (HTS) technology, there is a diversity of bioinformatic tools for haplogroup classification. We present a benchmarking of the 11 most salient tools for human mtDNA classification using empirical whole-genome (WGS) and whole-exome (WES) short-read sequencing data from 36 unrelated donors. In addition, because of its relevance, we also assess the best performing tool on third-generation long noisy read WGS data obtained with nanopore technology for a subset of the donors. We found that, for short-read WGS, most of the tools exhibit high accuracy for haplogroup classification irrespective of the input file used for the analysis. However, for short-read WES, Haplocheck and MixEmt were the most accurate tools. Based on the performance shown for WGS and WES, and the accompanying qualitative assessment, Haplocheck stands out as the most complete tool. For third-generation HTS data, we also showed that Haplocheck was able to accurately retrieve mtDNA haplogroups for all samples assessed, although only after following assembly-based approaches (based on either a reference-based assembly or a hybrid de novo assembly). Taken together, our results provide guidance for researchers to select the most suitable tool to conduct mtDNA analyses from HTS data.


2020 ◽  
Author(s):  
Bourema Kouriba ◽  
Angela Duerr ◽  
Alexandra Rehn ◽  
Abdoul Karim Sangare ◽  
Brehima Youssouf Traoure ◽  
...  

We are currently facing a pandemic of COVID-19, caused by a spillover from an animal-originating coronavirus to humans occurring in the Wuhan region, China, in December 2019. From China the virus has spread to 188 countries and regions worldwide, reaching the Sahel region on the 2nd of March 2020. Since whole genome sequencing (WGS) data are crucial to understanding the spreading dynamics of the ongoing pandemic, but only limited sequence data are available from the Sahel region to date, we have focused our efforts on generating the first Malian sequencing data available. Screening of 217 Malian patient samples for the presence of SARS-CoV-2 resulted in 38 positive isolates, from which 21 whole genome sequences were generated. Our analysis shows that both the early A (19B) and the fast-evolving B (20A/C) clades are present in Mali, indicating multiple and independent introductions of SARS-CoV-2 to the Sahel region.


2011 ◽  
Vol 29 (8) ◽  
pp. 723-730 ◽  
Author(s):  
Yingrui Li ◽  
Hancheng Zheng ◽  
Ruibang Luo ◽  
Honglong Wu ◽  
Hongmei Zhu ◽  
...  

2019 ◽  
Author(s):  
Sarah J. Vancuren ◽  
Scott J. Dos Santos ◽  
Janet E. Hill ◽  

Abstract: Amplification and sequencing of conserved genetic barcodes such as the cpn60 gene is a common approach to determining the taxonomic composition of microbiomes. Exact sequence variant calling has been proposed as an alternative to previously established methods for aggregation of sequence reads into operational taxonomic units (OTU). We investigated the utility of variant calling for cpn60 barcode sequences and determined the minimum sequence length required to provide species-level resolution. Sequence data from the 5’ region of the cpn60 barcode amplified from the human vaginal microbiome (n=45) and a mock community were used to compare variant calling to de novo assembly of reads and to mapping to a reference sequence database, in terms of the number of OTU formed and overall community composition. Variant calling resulted in microbiome profiles that were consistent in apparent composition with those generated with the other methods, but with significant logistical advantages. Variant calling is rapid, achieves high resolution of taxa, and does not require reference sequence data. Our results further demonstrate that 150 bp from the 5’ end of the cpn60 barcode sequence is sufficient to provide species-level resolution of microbiota.

