PACVr: Plastome Assembly Coverage Visualization in R

2019 ◽  
Author(s):  
Michael Gruenstaeudl ◽  
Nils Jenke

Abstract Background: The circular, quadripartite structure of plastid genomes, which includes two inverted repeat regions, renders the automatic assembly of plastid genomes challenging. The correct assembly of plastid genomes is a prerequisite for the validity of subsequent analyses on plastid genome structure and evolution. Plastome-based phylogenetic or population genetic investigations, for example, require the precise identification of DNA sequence and length to determine the location of nucleotide polymorphisms. The average coverage depth of a genome assembly is often used as an indicator for assembly quality. Visualizing coverage depth across a draft genome allows users to inspect the quality of the assembly and, where applicable, identify regions of reduced assembly confidence. Based on such visualizations, users can conduct a local re-assembly or other forms of targeted error correction. Few, if any, contemporary software tools can visualize the coverage depth of a plastid genome assembly while taking its quadripartite structure into account, despite the interplay between genome structure and assembly quality. A software tool is needed that visualizes the coverage depth of a plastid genome assembly on a circular, quadripartite map of the plastid genome. Results: We introduce ‘PACVr’, an R package that visualizes the coverage depth of a plastid genome assembly in relation to the circular, quadripartite structure of the genome as well as to the individual plastome genes. The tool allows visualizations on different scales using a variable window approach and also visualizes the equality of gene synteny in the inverted repeat regions of the plastid genome, thus providing an additional measure of assembly quality. As a tool for plastid genomics, PACVr provides the functionality to identify regions of coverage depth above or below user-defined threshold values and helps to identify non-identical IR regions. To allow easy integration into bioinformatic workflows, PACVr can be directly invoked from a Unix shell, thus facilitating its use in automated quality control. We illustrate the application of PACVr on two empirical datasets and compare the resulting visualizations with alternative software tools for displaying plastome sequencing coverage. Conclusions: PACVr provides a user-friendly tool to visualize (a) the coverage depth of a plastid genome assembly on a circular, quadripartite plastome map and in relation to individual plastome genes, and (b) the equality of gene synteny in the inverted repeat regions. It thus contributes to optimizing plastid genome assemblies and increasing the reliability of publicly available plastome sequences, especially in light of incongruence among the visualization results of alternative software tools. The software, example datasets, technical documentation, and a tutorial are available with the package at https://github.com/michaelgruenstaeudl/PACVr.
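
The window-and-threshold idea described in this abstract can be illustrated with a minimal Python sketch. It is not PACVr's implementation (PACVr is an R package, and its internals are not described here). The function names, the toy depth values, and the thresholds are all assumptions made for illustration.

```python
from typing import List, Tuple

Window = Tuple[int, int, float]  # (start, end, mean depth); end is exclusive


def windowed_mean_coverage(depths: List[int], window: int) -> List[Window]:
    """Split per-base depths into consecutive windows and average each one."""
    if window <= 0:
        raise ValueError("window must be positive")
    result = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]  # last chunk may be shorter
        result.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    return result


def flag_windows(windows: List[Window], low: float, high: float) -> List[Window]:
    """Return windows whose mean depth is below `low` or above `high`."""
    return [w for w in windows if w[2] < low or w[2] > high]


# Toy per-base depths for a 12 bp sequence (invented values, not real data).
depths = [30, 32, 31, 29, 5, 4, 6, 5, 30, 31, 90, 95]
windows = windowed_mean_coverage(depths, window=4)
flagged = flag_windows(windows, low=10, high=60)
print(flagged)
```

A larger window smooths local dips, while a smaller one exposes short low-coverage stretches, which is the trade-off a variable window size lets a user explore.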

2015 ◽  
Author(s):  
Rob W Ness ◽  
Susanne A Kraemer ◽  
Nick Colegrave ◽  
Peter D Keightley

Plastids perform crucial cellular functions, including photosynthesis, across a wide variety of eukaryotes. Since endosymbiosis, plastids have maintained independent genomes that now display a wide diversity of gene content, genome structure, gene regulation mechanisms, and transmission modes. The evolution of plastid genomes depends on an input of de novo mutation, but our knowledge of mutation in the plastid is limited to indirect inference from patterns of DNA divergence between species. Here, we use a mutation accumulation experiment, where selection acting on mutations is rendered ineffective, combined with whole-plastid genome sequencing to directly characterize de novo mutation in Chlamydomonas reinhardtii. We show that the mutation rates of the plastid and nuclear genomes are similar, but that the base spectra of mutations differ significantly. We integrate our measure of the mutation rate with a population genomic dataset of 20 individuals, and show that the plastid genome is subject to substantially stronger genetic drift than the nuclear genome. We also show that high levels of linkage disequilibrium in the plastid genome are not due to restricted recombination, but are instead a consequence of increased genetic drift. One likely explanation for increased drift in the plastid genome is that there are stronger effects of genetic hitchhiking. The presence of recombination in the plastid is consistent with laboratory studies in C. reinhardtii and demonstrates that although the plastid genome is thought to be uniparentally inherited, it recombines in nature at a rate similar to the nuclear genome.


Plants ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 965 ◽  
Author(s):  
Xian-Lin Guo ◽  
Hong-Yi Zheng ◽  
Megan Price ◽  
Song-Dong Zhou ◽  
Xing-Jin He

Chamaesium H. Wolff (Apiaceae, Apioideae) is a small genus mainly distributed in the Hengduan Mountains and the Himalayas. Ten species of Chamaesium have been described and nine species are distributed in China. Recent advances in molecular phylogenetics have revolutionized our understanding of Chinese Chamaesium taxonomy and evolution. However, phylogenetic relationships within Chamaesium inferred from second-generation sequencing data remain poorly understood. Here, we newly assembled nine plastid genomes from the nine Chinese Chamaesium species and combined these genomes with eight other species from five genera to perform a phylogenetic analysis by maximum likelihood (ML) using the complete plastid genome, and analyzed genome structure, GC content, species pairwise Ka/Ks ratios and the simple sequence repeat (SSR) component. We found that the nine species’ plastid genomes ranged from 152,703 bp (C. thalictrifolium) to 155,712 bp (C. mallaeanum), and contained 133 genes, 34 SSR types and 585 SSR loci. We also found 20,953–21,115 codons from 53 coding sequence (CDS) regions, 38.4–38.7% GC content of the total genome and low Ka/Ks (0.27–0.43) ratios of 53 aligned CDS. These results will facilitate our further understanding of the evolution of the genus Chamaesium.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10774
Author(s):  
Yingfeng Niu ◽  
Chengwen Gao ◽  
Jin Liu

Mango is an important commercial fruit crop belonging to the genus Mangifera. In this study, we report and compare four newly sequenced plastid genomes of the genus Mangifera, which show high similarity in overall size (157,780–157,853 bp), genome structure, gene order, and gene content. Three mutation hotspots (trnG-psbZ, psbD-trnT, and ycf4-cemA) were identified as candidate DNA barcodes for Mangifera. These three candidate barcode sequences have high species identification ability. We also identified 12 large fragments that were transferred from the plastid genome to the mitochondrial genome, and found that the similarity was more than 99%. The total size of the transferred fragments was 35,652 bp, accounting for 22.6% of the plastid genome. Fifteen intact chloroplast genes, four tRNAs and numerous partial genes and intergenic spacer regions were identified. Many of these genes have also been transferred from mitochondria to the chloroplast in the genomes of other species. Phylogenetic analysis based on whole plastid genome data provided a high support value, and the interspecies relationships within Mangifera were resolved well.


2019 ◽  
Author(s):  
Georgios D. Koutsovoulos ◽  
Marine Poullet ◽  
Abdelnaser El Ashry ◽  
Djampa K. Kozlowski ◽  
Erika Sallet ◽  
...  

Abstract Root-knot nematodes (genus Meloidogyne) are plant parasitic species that cause huge economic loss in the agricultural industry and affect the prosperity of communities in developing countries. Control methods against these plant pests are sparse and the current preferred method is deployment of plant cultivars bearing resistance genes against Meloidogyne species. However, some species such as M. enterolobii are not controlled by the resistance genes deployed in the most important crop plants cultivated in Europe. The recent identification of this species in Europe is thus a major concern. Like the other most damaging Meloidogyne species (e.g. M. incognita, M. arenaria and M. javanica), M. enterolobii reproduces by obligatory mitotic parthenogenesis. Genomic singularities such as a duplicated genome structure and a relatively high proportion of transposable elements have previously been described in the above mentioned mitotic parthenogenetic Meloidogyne. To gain a better understanding of the genomic and evolutionary background we sequenced the genome of M. enterolobii using high coverage short and long read technologies. The information contained in the long reads helped produce a highly contiguous genome assembly of M. enterolobii, thus enabling us to perform high quality annotations of coding and non-coding genes, and transposable elements. The genome assembly and annotation reveal a genome structure similar to the ones described in the other mitotic parthenogenetic Meloidogyne, described as recent hybrids. Most of the genome is present in 3 different copies that show high divergence. Because most of the genes belong to these duplicated regions only few gene losses took place, which suggests a recent polyploidization. The most likely hypothesis to reconcile high divergence between genome copies despite few gene losses and translocations is also a recent hybrid origin. Consistent with this hypothesis, we found an abundance of transposable elements at least as high as the one observed in the mitotic parthenogenetic nematodes M. incognita and M. javanica.


Author(s):  
Lin Kang ◽  
Pawel Michalak ◽  
Eric Hallerman ◽  
Nancy D Moncrief

Abstract The eastern fox squirrel, Sciurus niger, exhibits marked geographic variation in size and coat color, is a model organism for studies of behavior and ecology, and a potential model for investigating physiological solutions to human porphyrias. We assembled a genome using Illumina HiSeq, PacBio SMRT, and Oxford Nanopore MinION sequencing platforms. Together, the sequencing data resulted in a draft genome of 2.99 Gb, containing 32,830 scaffolds with an average size of 90.9 Kb and N50 of 183.8 Kb. Genome completeness was estimated to be 93.78%. A total of 24,443 protein-encoding genes were predicted from the assembly and 23,079 (94.42%) were annotated. Repeat elements comprised an estimated 38.49% of the genome, with the majority being LINEs (13.92%), SINEs (6.04%), and LTR elements. The topology of the species tree reconstructed using maximum-likelihood phylogenetic analysis was congruent with those of previous studies. This genome assembly can prove useful for comparative studies of genome structure and function in this rapidly diversifying lineage of mammals, for studies of population genomics and adaptation, and for biomedical research. Predicted amino acid sequence alignments for genes affecting heme biosynthesis, color vision, and hibernation showed point mutations and indels that may affect protein function and ecological adaptation.
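
For readers unfamiliar with the N50 statistic quoted above, the following Python sketch shows one common way to compute it: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The abstract does not state how its N50 was calculated, so this is an assumed, conventional definition, and the scaffold lengths are invented toy values.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Return the N50 of a list of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy scaffold lengths (invented, not taken from the squirrel assembly).
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))
```

Because N50 weights long scaffolds heavily, it can differ substantially from the average scaffold size, as the two figures reported in the abstract illustrate.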


2021 ◽  
Vol 104 (4) ◽  
pp. 003685042110599
Author(s):  
Dhafer Alzahrani ◽  
Enas Albokhari ◽  
Abidina Abba ◽  
Samaila Yaradua

Caylusea hexagyna and Ochradenus baccatus are two species in the Resedaceae family. In this study, we analysed the complete plastid genomes of these two species using high-throughput sequencing technology and compared their genomic data. The length of the plastid genome of C. hexagyna was 154,390 bp while that of O. baccatus was 153,380 bp. The lengths of the inverted repeats (IR) regions were 26,526 bp and 26,558 bp, those of the large single copy (LSC) regions were 83,870 bp and 83,023 bp; and those of the small single copy (SSC) regions were 17,468 bp and 17,241 bp in C. hexagyna and O. baccatus, respectively. Both genomes consisted of 113 genes: 79 protein-coding genes, 30 tRNA genes and 4 rRNA genes. Repeat analysis showed that the plastid genome included all types of repeats, with more frequent occurrences of palindromic sequences. Comparative studies of SSR markers showed that there were 256 markers in C. hexagyna and 255 in O. baccatus; the majority of the SSRs in these plastid genomes were mononucleotide repeats (A/T). All the clusters in the phylogenetic tree had high support. This study reported the first complete plastid genomes of the genera Caylusea and Ochradenus and the first for the Resedaceae family.
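
The region lengths reported above can be cross-checked against the total genome lengths, since a quadripartite plastid genome consists of one LSC, one SSC, and two IR copies. The short Python sketch below does this arithmetic with the figures from the abstract; the function and variable names are illustrative only.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a genome made of one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Lengths in bp as reported in the abstract.
reported = {
    "C. hexagyna": {"total": 154_390, "lsc": 83_870, "ssc": 17_468, "ir": 26_526},
    "O. baccatus": {"total": 153_380, "lsc": 83_023, "ssc": 17_241, "ir": 26_558},
}

for species, r in reported.items():
    computed = quadripartite_length(r["lsc"], r["ssc"], r["ir"])
    status = "consistent" if computed == r["total"] else "MISMATCH"
    print(f"{species}: computed {computed} bp, reported {r['total']} bp ({status})")
```

Both sums match the reported totals (83,870 + 17,468 + 2 × 26,526 = 154,390 and 83,023 + 17,241 + 2 × 26,558 = 153,380).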


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tilman Mehl ◽  
Michael Gruenstaeudl

Abstract Background In most flowering plants, the plastid genome exhibits a quadripartite genome structure, comprising a large and a small single copy as well as two inverted repeat regions. Thousands of plastid genomes have been sequenced and submitted to public sequence repositories in recent years. The quality of sequence annotations in many of these submissions is known to be problematic, especially regarding annotations that specify the length and location of the inverted repeats: such annotations are either missing or portray the length or location of the repeats incorrectly. However, many biological investigations employ publicly available plastid genomes at face value and implicitly assume the correctness of their sequence annotations. Results We introduce airpg, a Python package that automatically assesses the frequency of incomplete or incorrect annotations of the inverted repeats among publicly available plastid genomes. Specifically, the tool automatically retrieves plastid genomes from NCBI Nucleotide under variable search parameters, surveys them for length and location specifications of inverted repeats, and confirms any inverted repeat annotations through self-comparisons of the genome sequences. The package also includes functionality for automatic identification and removal of duplicate genome records and accounts for taxa that genuinely lack inverted repeats. A survey of the presence of inverted repeat annotations among all plastid genomes of flowering plants submitted to NCBI Nucleotide until the end of 2020 using airpg, followed by a statistical analysis of potential associations with record metadata, highlights that release year and publication status of the genome records have a significant effect on the frequency of complete and equal-length inverted repeat annotations. Conclusion The number of plastid genomes on NCBI Nucleotide has increased dramatically in recent years, and many more genomes will likely be submitted over the next decade. airpg enables researchers to automatically access and evaluate the inverted repeats of these plastid genomes as well as their sequence annotations and, thus, contributes to increasing the reliability of publicly available plastid genomes. The software is freely available via the Python package index at http://pypi.python.org/pypi/airpg.
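
To make the idea of confirming inverted repeat annotations through a self-comparison of the genome sequence concrete, here is a minimal Python sketch. It is not the method used by the package described above, whose internals are not given in the abstract. It assumes the simplest possible check, an exact reverse-complement match between two annotated regions, and every name, coordinate, and sequence in it is hypothetical.

```python
from typing import Tuple

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_inverted_repeat(genome: str, region_a: Tuple[int, int],
                       region_b: Tuple[int, int]) -> bool:
    """Check that two 0-based, half-open regions are exact inverted repeats."""
    a = genome[region_a[0]:region_a[1]]
    b = genome[region_b[0]:region_b[1]]
    if not a or len(a) != len(b):
        return False  # missing or unequal-length annotations fail the check
    return a.upper() == reverse_complement(b).upper()


# Toy genome: LSC + IRb + SSC + IRa (invented sequence, not real data).
lsc, irb, ssc = "AAAAAA", "ACGGT", "TTT"
genome = lsc + irb + ssc + reverse_complement(irb)

print(is_inverted_repeat(genome, (6, 11), (14, 19)))  # correctly annotated
print(is_inverted_repeat(genome, (6, 11), (13, 18)))  # shifted by one base
```

Real plastid genomes may contain a few mismatches between the two repeat copies, so a practical check would likely tolerate small differences rather than require exact identity.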


Author(s):  
Luyu Xie ◽  
Limsoon Wong

Abstract Motivation Existing genome assembly evaluation metrics provide only limited insight on specific aspects of genome assembly quality, and sometimes even disagree with each other. For better integrative comparison between assemblies, we propose, here, a new genome assembly evaluation metric, Pairwise Distance Reconstruction (PDR). It derives from a common concern in genetic studies, and takes completeness, contiguity, and correctness into consideration. We also propose an approximation implementation to accelerate PDR computation. Results Our results on publicly available datasets affirm PDR’s ability to integratively assess the quality of a genome assembly. In fact, this is guaranteed by its definition. The results also indicated the error introduced by approximation is extremely small and thus negligible. Availability and implementation https://github.com/XLuyu/PDR. Supplementary information Supplementary data are available at Bioinformatics online.


2016 ◽  
Author(s):  
Saemundur Sveinsson ◽  
Quentin Cronk

The plastid genome retains several features from its cyanobacterial-like ancestor, one being the co-transcriptional organization of genes into operon-like structures. Some plastid operons have been identified but undoubtedly many more remain undiscovered. Here we utilize the highly variable plastome structure that exists within certain legumes of the inverted repeat lost clade (IRLC) to find conserved gene clusters. These plastomes exhibit an unusually high frequency of translocations and inversions. We analysed the plastomes of 23 legume species and identified 32 locally collinear blocks (LCBs), which are regions within the plastid genomes that occur in different orientation and/or order among the plastid genomes but are themselves free from internal rearrangements. Several represent gene clusters that have previously been recognized as plastid operons. It appears that the number of LCBs has reached saturation in our data set, suggesting that these LCBs are not random, but likely represent legume plastid operons protected from internal rearrangement by functional constraint. Some of the LCBs we identify, such as psbD/C/Z, are previously known plastid operons. Others, such as rpl32-ndhF-psbA-matK-rbcL-atpB-atpE, may represent novel polycistronic operons in legumes.


2021 ◽  
Author(s):  
Eka Giorgashvili ◽  
Katja Reichel ◽  
Calvinna Caswara ◽  
Vuqar Kerimov ◽  
Thomas Borsch ◽  
...  

Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequence coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense, which forms a distinct lineage in the genus Calligonum. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequence coverage and to determine if these differences are large enough to affect the phylogenetic position inferred for C. bakuense. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and three levels of sequence coverage (original depth, 2,000x, and 500x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic tree inference is also assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produced the most consistent assemblies for C. bakuense. Moreover, we found that a cap in sequence coverage can reduce both the sequence variability across assembly contigs and computation time. While no evidence was found that the sequence variability across assemblies was large enough to affect the phylogenetic position inferred for C. bakuense, differences among the assemblies may influence genotype recognition at the population level.

