Can we use it? On the utility of de novo and reference-based assembly of Nanopore data for plant plastome sequencing

Mapping Intimacies ◽

10.1101/855981 ◽

2019 ◽

Author(s):

Agnes Scheunert ◽

Marco Dorfner ◽

Thomas Lingl ◽

Christoph Oberprieler

Keyword(s):

Chloroplast Genome ◽

De Novo ◽

Phylogenetic Analyses ◽

Consensus Sequence ◽

Suitable Alternative ◽

Short Read ◽

Chloroplast Genomes ◽

Long Reads ◽

Illumina Data

Abstract The chloroplast genome harbors plenty of valuable information for phylogenetic research. Illumina short-read data is generally used for de novo assembly of whole plastomes. PacBio or Oxford Nanopore long reads are additionally employed in hybrid approaches to enable assembly across the highly similar inverted repeats of a chloroplast genome. Unlike for PacBio, plastome assemblies based solely on Nanopore reads are rarely found, due to their high error rate and non-random error profile. However, the actual quality decline connected to their use has never been quantified. Furthermore, no study has employed reference-based assembly using Nanopore reads, which is common with Illumina data. Using Leucanthemum Mill. as an example, we compared the sequence quality of seven plastome assemblies of the same species, using combinations of two sequencing platforms and three analysis pipelines. In addition, we assessed the factors which might influence Nanopore assembly quality during sequence generation and bioinformatic processing. The consensus sequence derived from de novo assembly of Nanopore data had a sequence identity of 99.59% compared to Illumina short-read de novo assembly. Most of the errors found are indels (81.5%), and a large majority of them lie in homopolymer regions. The quality of reference-based assembly is heavily dependent upon the choice of a close-enough reference. Using a reference with 0.83% sequence divergence from the studied species, mapping of Nanopore reads results in a consensus comparable to that from Nanopore de novo assembly, and of only slightly inferior quality compared to a reference-based assembly with Illumina data (0.49% and 0.26% divergence from Illumina de novo). 
For optimal assembly of Nanopore data, appropriate filtering of contaminants and chimeric sequences, as well as employing moderate read coverage, is essential. Based on these results, we conclude that Nanopore long reads are a suitable alternative to Illumina short reads in plastome phylogenomics. Only a few errors remain in the finalized assembly, and these can easily be masked in phylogenetic analyses without loss of analytical accuracy. The easily applicable and cost-effective technology might warrant more attention from researchers dealing with plant chloroplast genomes.
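
To make figures such as "99.59% sequence identity" and "81.5% indels" more concrete, the following is a minimal sketch of how such values could be derived from a pairwise alignment. It assumes two sequences that are already aligned, with gaps written as "-". The function name `alignment_stats` and the toy sequences are hypothetical, and this is not the pipeline used by the authors. Each gapped column is counted as one indel error here, which is an assumption; counting whole indel events instead would give different shares.

```python
def alignment_stats(ref_aln: str, qry_aln: str) -> dict:
    """Summarise matches, mismatches and indel columns of a pairwise alignment.

    Both inputs must be aligned strings of equal length using '-' for gaps.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")

    matches = mismatches = indel_columns = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-" and q == "-":
            continue  # column gapped in both sequences carries no information
        if r == "-" or q == "-":
            indel_columns += 1
        elif r == q:
            matches += 1
        else:
            mismatches += 1

    columns = matches + mismatches + indel_columns
    errors = mismatches + indel_columns
    return {
        "matches": matches,
        "mismatches": mismatches,
        "indel_columns": indel_columns,
        # identity over all informative alignment columns
        "identity_pct": 100.0 * matches / columns if columns else 0.0,
        # share of all errors that are indel columns
        "indel_share_pct": 100.0 * indel_columns / errors if errors else 0.0,
    }


# Toy example: one deletion inside a homopolymer run and one substitution.
stats = alignment_stats("ACGTAAAAACGT", "ACGTAAAA-CGA")
print(stats)
```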


The complete chloroplast genome sequence of strawberry (Fragaria × ananassa Duch.) and comparison with related species of Rosaceae

PeerJ ◽

10.7717/peerj.3919 ◽

2017 ◽

Vol 5 ◽

pp. e3919 ◽

Cited By ~ 27

Author(s):

Hui Cheng ◽

Jinfeng Li ◽

Hong Zhang ◽

Binhua Cai ◽

Zhihong Gao ◽

...

Keyword(s):

Chloroplast Genome ◽

De Novo ◽

Phylogenetic Analyses ◽

Genome Structure ◽

Rrna Genes ◽

Trna Genes ◽

Illumina Hiseq ◽

Complete Chloroplast Genome ◽

Coding Regions ◽

Chloroplast Genomes

Compared with other members of the family Rosaceae, the chloroplast genomes of Fragaria species exhibit low variation, and this situation has limited phylogenetic analyses; thus, complete chloroplast genome sequencing of Fragaria species is needed. In this study, we sequenced the complete chloroplast genome of F. × ananassa ‘Benihoppe’ using the Illumina HiSeq 2500-PE150 platform and then performed a combination of de novo assembly and reference-guided mapping of contigs to generate complete chloroplast genome sequences. The chloroplast genome exhibits a typical quadripartite structure with a pair of inverted repeats (IRs, 25,936 bp) separated by large (LSC, 85,531 bp) and small (SSC, 18,146 bp) single-copy (SC) regions. The length of the F. × ananassa ‘Benihoppe’ chloroplast genome is 155,549 bp, representing the smallest Fragaria chloroplast genome observed to date. The genome encodes 112 unique genes, comprising 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Comparative analysis of the overall nucleotide sequence identity among ten complete chloroplast genomes confirmed that for both coding and non-coding regions in Rosaceae, SC regions exhibit higher sequence variation than IRs. The Ka/Ks ratio of most genes was less than 1, suggesting that most genes are under purifying selection. Moreover, the mVISTA results also showed a high degree of conservation in genome structure, gene order and gene content in Fragaria, particularly among three octoploid strawberries which were F. × ananassa ‘Benihoppe’, F. chiloensis (GP33) and F. virginiana (O477). However, when the sequences of the coding and non-coding regions of F. × ananassa ‘Benihoppe’ were compared in detail with those of F. chiloensis (GP33) and F. virginiana (O477), a number of SNPs and InDels were revealed by MEGA 7. 
Six non-coding regions (trnK-matK, trnS-trnG, atpF-atpH, trnC-petN, trnT-psbD and trnP-psaJ) with a percentage of variable sites greater than 1% and no less than five parsimony-informative sites were identified and may be useful for phylogenetic analysis of the genus Fragaria.
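
The region sizes reported in this abstract can be cross-checked with simple arithmetic: in a quadripartite plastome the total length is the sum of the LSC, the SSC and two copies of the IR. The short sketch below does this with the figures quoted above; the function name `plastome_length` is a hypothetical helper chosen for illustration.

```python
def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes as quoted in the abstract (bp).
total = plastome_length(lsc=85_531, ssc=18_146, ir=25_936)
print(total)  # 155549, matching the reported genome length of 155,549 bp

# The reported gene counts are also internally consistent.
unique_genes = 78 + 30 + 4  # protein-coding + tRNA + rRNA
print(unique_genes)  # 112
```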


Comparison of Chloroplast Genomes among Species of Unisexual and Bisexual Clades of the Monocot Family Araceae

Plants ◽

10.3390/plants9060737 ◽

2020 ◽

Vol 9 (6) ◽

pp. 737 ◽

Cited By ~ 2

Author(s):

Abdullah ◽

Claudia L. Henriquez ◽

Furrukh Mehmood ◽

Iram Shahzadi ◽

Zain Ali ◽

...

Keyword(s):

Chloroplast Genome ◽

De Novo ◽

Phylogenetic Analyses ◽

Genome Structure ◽

Trna Genes ◽

Protein Coding ◽

Chloroplast Genomes ◽

History Of ◽

And Bisexual ◽

Contraction And Expansion

The chloroplast genome provides insight into the evolution of plant species. We de novo assembled and annotated chloroplast genomes of four genera representing three subfamilies of Araceae: Lasia spinosa (Lasioideae), Stylochaeton bogneri, Zamioculcas zamiifolia (Zamioculcadoideae), and Orontium aquaticum (Orontioideae), and performed comparative genomics using these chloroplast genomes. The sizes of the chloroplast genomes ranged from 163,770 bp to 169,982 bp. These genomes comprise 113 unique genes, including 79 protein-coding, 4 rRNA, and 30 tRNA genes. Among these genes, 17–18 genes are duplicated in the inverted repeat (IR) regions, comprising 6–7 protein-coding (including trans-splicing gene rps12), 4 rRNA, and 7 tRNA genes. The total number of genes ranged between 130 and 131. The infA gene was found to be a pseudogene in all four genomes reported here. These genomes exhibited high similarities in codon usage, amino acid frequency, RNA editing sites, and microsatellites. The oligonucleotide repeats and junctions JSB (IRb/SSC) and JSA (SSC/IRa) were highly variable among the genomes. The patterns of IR contraction and expansion were shown to be homoplasious, and therefore unsuitable for phylogenetic analyses. Signatures of positive selection were seen in three genes in S. bogneri, including ycf2, clpP, and rpl36. This study is a valuable addition to the evolutionary history of chloroplast genome structure in Araceae.


Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case

10.1101/320085 ◽

2018 ◽

Author(s):

Weiwen Wang ◽

Miriam Schalamun ◽

Alejandro Morales Suarez ◽

David Kainer ◽

Benjamin Schwessinger ◽

...

Keyword(s):

Chloroplast Genome ◽

Inverted Repeat ◽

Single Copy ◽

Test Case ◽

Short Read ◽

Short Reads ◽

Chloroplast Genomes ◽

Long Reads ◽

Long Read ◽

Hybrid Assemblies

Abstract Background Chloroplasts are organelles that conduct photosynthesis in plant and algal cells. Chloroplast genomes code for around 130 genes, and the information they contain is widely used in agriculture and studies of evolution and ecology. Correctly assembling complete chloroplast genomes can be challenging because the chloroplast genome contains a pair of long inverted repeats (10–30 kb). The advent of long-read sequencing technologies should alleviate this problem by providing sufficient information to completely span the inverted repeat regions. Yet, long-reads tend to have higher error rates than short-reads, and relatively little is known about the best way to combine long- and short-reads to obtain the most accurate chloroplast genome assemblies. Using Eucalyptus pauciflora, the snow gum, as a test case, we evaluated the effect of multiple parameters, such as different coverage of long (Oxford Nanopore) and short (Illumina) reads, different long-read lengths, different assembly pipelines, and different genome polishing steps, with a view to determining the most accurate and efficient approach to chloroplast genome assembly. Results Hybrid assemblies combining at least 20x coverage of both long-reads and short-reads generated a single contig spanning the entire chloroplast genome with few or no detectable errors. Short-read-only assemblies generated three contigs representing the long single copy, short single copy and inverted repeat regions of the chloroplast genome. These contigs contained few single-base errors but tended to exclude several bases at the beginning or end of each contig. Long-read-only assemblies tended to create multiple contigs with a much higher single-base error rate, even after polishing. 
The chloroplast genome of Eucalyptus pauciflora is 159,942 bp, contains 131 genes of known function, and confirms the phylogenetic position of Eucalyptus pauciflora as a close relative of Eucalyptus regnans. Conclusions Our results suggest that very accurate assemblies of chloroplast genomes can be achieved using a combination of at least 20x coverage of long- and short-reads respectively, provided that the long-reads contain at least ~5x coverage of reads longer than the inverted repeat region. We show that further increases in coverage give little or no improvement in accuracy, and that hybrid assemblies are more accurate than long-read-only or short-read-only assemblies.
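
The coverage thresholds mentioned here (at least 20x overall, and roughly 5x from reads longer than the inverted repeat) can be illustrated with a small calculation. This is only a sketch: coverage is taken as total read bases divided by genome size, the helper name `coverage` is hypothetical, and the inverted repeat length and read lengths used below are placeholder values, since the abstract does not state them.

```python
from typing import Iterable


def coverage(read_lengths: Iterable[int], genome_size: int, min_length: int = 0) -> float:
    """Mean depth contributed by reads strictly longer than min_length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(n for n in read_lengths if n > min_length) / genome_size


GENOME_SIZE = 159_942        # bp, as reported for Eucalyptus pauciflora
ASSUMED_IR_LENGTH = 26_000   # bp, placeholder: not given in the abstract

# Placeholder read set: many mid-sized reads plus some very long ones.
reads = [8_000] * 400 + [30_000] * 30

total_cov = coverage(reads, GENOME_SIZE)
ir_spanning_cov = coverage(reads, GENOME_SIZE, min_length=ASSUMED_IR_LENGTH)

print(f"total long-read coverage: {total_cov:.1f}x")
print(f"coverage from reads longer than the IR: {ir_spanning_cov:.1f}x")
print("meets thresholds:", total_cov >= 20 and ir_spanning_cov >= 5)
```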


Comparison among the first representative chloroplast genomes of Orontium, Lasia, Zamioculcas, and Stylochaeton of the plant family Araceae: inverted repeat dynamics are not linked to phylogenetic signaling

10.1101/2020.04.07.029389 ◽

2020 ◽

Author(s):

Abdullah ◽

Claudia L. Henriquez ◽

Furrukh Mehmood ◽

Iram Shahzadi ◽

Zain Ali ◽

...

Keyword(s):

Chloroplast Genome ◽

De Novo ◽

Phylogenetic Analyses ◽

Genome Structure ◽

Trna Genes ◽

Protein Coding ◽

Chloroplast Genomes ◽

History Of ◽

Contraction And Expansion ◽

Insight Into

Abstract The chloroplast genome provides insight into the evolution of plant species. We de novo assembled and annotated chloroplast genomes of the first representatives of four genera representing three subfamilies: Lasia spinosa (Lasioideae), Stylochaeton bogneri, Zamioculcas zamiifolia (Zamioculcadoideae), and Orontium aquaticum (Orontioideae), and performed comparative genomics using the plastomes. The size of the chloroplast genomes ranged from 163,770 to 169,982 bp. These genomes comprise 114 unique genes, including 80 protein-coding, 4 rRNA, and 30 tRNA genes. These genomes exhibited high similarities in codon usage, amino acid frequency, RNA editing sites, and microsatellites. The junctions JSB (IRb/SSC) and JSA (SSC/IRa) are highly variable among the genomes, as is the oligonucleotide repeat content. The patterns of inverted repeat contraction and expansion were shown to be homoplasious and therefore unsuitable for phylogenetic analyses. Signatures of positive selection were shown for several genes in S. bogneri. This study is a valuable addition to the evolutionary history of chloroplast genome structure in Araceae.


Comparative Chloroplast Genomics of Litsea Lam. (Lauraceae) and Its Phylogenetic Implications

Forests ◽

10.3390/f12060744 ◽

2021 ◽

Vol 12 (6) ◽

pp. 744

Author(s):

Yunyan Zhang ◽

Yongjing Tian ◽

David Y. P. Tng ◽

Jingbo Zhou ◽

Yuntian Zhang ◽

...

Keyword(s):

Chloroplast Genome ◽

Phylogenetic Analyses ◽

Eastern China ◽

Chloroplast Microsatellites ◽

Endangered Tree ◽

Protein Coding ◽

Intergenic Spacers ◽

Chloroplast Genomes ◽

Phylogenetic Implications ◽

Chloroplast Genome Sequence

Litsea Lam. is an ecologically and economically important genus of the “core Lauraceae” group in the Lauraceae. The few studies to date on the comparative chloroplast genomics and phylogenomics of Litsea have been conducted as part of other studies on the Lauraceae. Here, we sequenced the whole chloroplast genome of Litsea auriculata, an endangered tree endemic to eastern China, and compared it with previously published chloroplast genome sequences of 11 other Litsea species. The chloroplast genomes of the 12 Litsea species ranged from 152,132 (L. szemaois) to 154,011 bp (L. garrettii) and exhibited a typical quadripartite structure with conserved genome arrangement and content, with length variations in the inverted repeat regions (IRs). No codon usage preferences were detected within the 30 codons used in the chloroplast genomes, indicating a conserved evolution model for the genus. Ten intergenic spacers (psbE–petL, trnH–psbA, petA–psbJ, ndhF–rpl32, ycf4–cemA, rpl32–trnL, ndhG–ndhI, psbC–trnS, trnE–trnT, and psbM–trnD) and five protein coding genes (ndhD, matK, ccsA, ycf1, and ndhF) were identified as divergence hotspot regions and DNA barcodes of Litsea species. In total, 876 chloroplast microsatellites were located within the 12 chloroplast genomes. Phylogenetic analyses conducted using the 51 additional complete chloroplast genomes of “core Lauraceae” species demonstrated that the 12 Litsea species grouped into four sub-clades within the Laurus–Neolitsea clade, and that Litsea is polyphyletic and closely related to the genera Lindera and Laurus. Our phylogeny strongly supported the monophyly of the following three clades (Laurus–Neolitsea, Cinnamomum–Ocotea, and Machilus–Persea) among the above investigated “core Lauraceae” species. 
Overall, our study highlighted the taxonomic utility of chloroplast genomes in Litsea, and the genetic markers identified here will facilitate future studies on the evolution, conservation, population genetics, and phylogeography of L. auriculata and other Litsea species.


Chloroplast genome variation and phylogenetic relationships of Atractylodes species

BMC Genomics ◽

10.1186/s12864-021-07394-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yiheng Wang ◽

Sheng Wang ◽

Yanlei Liu ◽

Qingjun Yuan ◽

Jiahui Sun ◽

...

Keyword(s):

Chloroplast Genome ◽

High Throughput Sequencing ◽

Phylogenetic Analyses ◽

Herbal Medicines ◽

Structural Genomic ◽

Protein Coding ◽

Chloroplast Genomes ◽

Original Plant ◽

Morphological Differences ◽

Rna Genes

Abstract Background Atractylodes DC is the basic original plant of the widely used herbal medicines “Baizhu” and “Cangzhu” and an endemic genus in East Asia. Species within the genus have minor morphological differences, and the universal DNA barcodes cannot clearly distinguish the systemic relationship or identify the species of the genus. To resolve these questions, we sequenced the chloroplast genomes of all species of Atractylodes using high-throughput sequencing. Results The results indicate that the chloroplast genome of Atractylodes has a typical quadripartite structure and ranges from 152,294 bp (A. carlinoides) to 153,261 bp (A. macrocephala) in size. The genome of all species contains 113 genes, including 79 protein-coding genes, 30 transfer RNA genes and four ribosomal RNA genes. Four hotspots, rpl22-rps19-rpl2, psbM-trnD, trnR-trnT(GGU), and trnT(UGU)-trnL, and a total of 42–47 simple sequence repeats (SSR) were identified as the most promising potentially variable markers for species delimitation and population genetic studies. Phylogenetic analyses of the whole chloroplast genomes indicate that Atractylodes is a clade within the tribe Cynareae; Atractylodes species form a monophyletic group that clearly reflects the relationship within the genus. Conclusions Our study included investigations of the sequences and structural genomic variations, phylogenetics and mutation dynamics of Atractylodes chloroplast genomes and will facilitate future studies in population genetics, taxonomy and species identification.


Tigridiopalma longmenensis (Melastomataceae), a new species from Guangdong, China

Phytotaxa ◽

10.11646/phytotaxa.500.3.8 ◽

2021 ◽

Vol 500 (3) ◽

pp. 241-247

Author(s):

HUI-FENG WANG ◽

ZHENG-FENG WANG ◽

QIAO-MEI QIN ◽

HONG-LIN CAO ◽

XIAO-MING GUO

Keyword(s):

New Species ◽

Chloroplast Genome ◽

Phylogenetic Analyses ◽

Monotypic Genus ◽

Distribution Map ◽

Complete Chloroplast Genome ◽

Chloroplast Genomes ◽

A New Species

Tigridiopalma longmenensis, a new species from Guangdong, China, is described. This species differs from its ally, T. magnifica, by the polychasium consisting of scorpioid cymes, hypanthium with carinas on angles, and longer stamens with a conspicuously white or pink spur at the connective base of the anther. A diagnosis and a distribution map of the two species are also provided. The complete chloroplast genome of T. longmenensis is reported here. Phylogenetic analyses based on complete chloroplast genomes from T. longmenensis and 15 other Melastomataceae species indicated that T. longmenensis is sister to T. magnifica. With the discovery of T. longmenensis, Tigridiopalma is no longer a monotypic genus.


A haplotype-resolved, de novo genome assembly for the wood tiger moth (Arctia plantaginis) through trio binning

GigaScience ◽

10.1093/gigascience/giaa088 ◽

2020 ◽

Vol 9 (8) ◽

Cited By ~ 2

Author(s):

Eugenie C Yen ◽

Shane A McCarthy ◽

Juan A Galarza ◽

Tomas N Generalovic ◽

Sarah Pelan ◽

...

Keyword(s):

Population Structure ◽

Genome Assembly ◽

De Novo ◽

Consensus Sequence ◽

Parental Origin ◽

De Novo Genome Assembly ◽

Geographic Population ◽

Long Reads ◽

Tiger Moth ◽

Innovative Solution

ABSTRACT Background Diploid genome assembly is typically impeded by heterozygosity because it introduces errors when haplotypes are collapsed into a consensus sequence. Trio binning offers an innovative solution that exploits heterozygosity for assembly. Short, parental reads are used to assign parental origin to long reads from their F1 offspring before assembly, enabling complete haplotype resolution. Trio binning could therefore provide an effective strategy for assembling highly heterozygous genomes, which are traditionally problematic, such as insect genomes. This includes the wood tiger moth (Arctia plantaginis), which is an evolutionary study system for warning colour polymorphism. Findings We produced a high-quality, haplotype-resolved assembly for Arctia plantaginis through trio binning. We sequenced a same-species family (F1 heterozygosity ∼1.9%) and used parental Illumina reads to bin 99.98% of offspring Pacific Biosciences reads by parental origin, before assembling each haplotype separately and scaffolding with 10X linked reads. Both assemblies are contiguous (mean scaffold N50: 8.2 Mb) and complete (mean BUSCO completeness: 97.3%), with annotations and 31 chromosomes identified through karyotyping. We used the assembly to analyse genome-wide population structure and relationships between 40 wild resequenced individuals from 5 populations across Europe, revealing the Georgian population as the most genetically differentiated with the lowest genetic diversity. Conclusions We present the first invertebrate genome to be assembled via trio binning. This assembly is one of the highest quality genomes available for Lepidoptera, supporting trio binning as a potent strategy for assembling heterozygous genomes. Using our assembly, we provide genomic insights into the geographic population structure of A. plantaginis.


The complete chloroplast genome of Saxifraga sinomontana (Saxifragaceae) and comparative analysis with other Saxifragaceae species

Revista Brasileira de Botânica ◽

10.1007/s40415-019-00561-y ◽

2019 ◽

Vol 42 (4) ◽

pp. 601-611 ◽

Cited By ~ 1

Author(s):

Yan Li ◽

Liukun Jia ◽

Zhihua Wang ◽

Rui Xing ◽

Xiaofeng Chi ◽

...

Keyword(s):

Comparative Analysis ◽

Chloroplast Genome ◽

Phylogenetic Relationships ◽

De Novo ◽

Single Copy ◽

Bootstrap Support ◽

Protein Coding ◽

Complete Chloroplast Genome ◽

Protein Coding Genes ◽

Chloroplast Genomes

Abstract Saxifraga sinomontana J.-T. Pan & Gornall belongs to Saxifraga sect. Ciliatae subsect. Hirculoideae, a lineage containing ca. 110 species whose phylogenetic relationships are largely unresolved due to recent rapid radiations. Analyses of complete chloroplast genomes have the potential to significantly improve the resolution of phylogenetic relationships in this young plant lineage. The complete chloroplast genome of S. sinomontana was de novo sequenced, assembled and then compared with those of six other Saxifragaceae species. The S. sinomontana chloroplast genome is 147,240 bp in length with a typical quadripartite structure, including a large single-copy region of 79,310 bp and a small single-copy region of 16,874 bp separated by a pair of inverted repeats (IRs) of 25,528 bp each. The chloroplast genome contains 113 unique genes, including 79 protein-coding genes, four rRNAs and 30 tRNAs, with 18 duplicates in the IRs. The gene content and organization are similar to those of other Saxifragaceae chloroplast genomes. Sixty-one simple sequence repeats were identified in the S. sinomontana chloroplast genome, mostly represented by mononucleotide repeats of polyadenine or polythymine. Comparative analysis revealed 12 highly divergent regions in the intergenic spacers, as well as in the coding genes matK, ndhK, accD, cemA, rpoA, rps19, ndhF, ccsA, ndhD and ycf1. Phylogenetic reconstruction of seven Saxifragaceae species based on 66 protein-coding genes received high bootstrap support values for nearly all identified nodes, suggesting a promising opportunity to resolve infrasectional relationships of the most species-rich section Ciliatae of Saxifraga.


Full chloroplast genome assembly and phylogeny of “red” and “yellow” Bixa orellana, “achiote”; popular source of food coloring and traditional medicine.

10.21203/rs.3.rs-20035/v1 ◽

2020 ◽

Author(s):

Jorge Villacres Vallejo ◽

José Aranda Ventura ◽

Anna Wallis ◽

Robin Cagle ◽

Sara M. Handy ◽

...

Keyword(s):

Chloroplast Genome ◽

Base Pair ◽

Phylogenetic Analyses ◽

Color Variation ◽

Synthetic Dyes ◽

Bixa Orellana ◽

Chloroplast Genomes ◽

Base Pair Deletion ◽

Wide Range ◽

Future Work

Abstract Background Seeds from Bixa orellana, commonly known as “achiote” and “annatto”, produce bixin and norbixin apocarotenoids which impart bright red and orange colors that have been used for thousands of years for food, medicine and body painting by indigenous Americans, and by Europeans for ~ 500 years as food coloring, especially for cheeses. Use of Bixa colorants continues to grow as synthetic dyes come under increased scrutiny for toxicity to human and environmental systems. There is a wide range of color variation in pods of Bixa orellana for which genetic loci that delineate phenotypes have not yet been identified. Whole chloroplast genomes and raw genome skims provide a wide variety of genetic markers that can be used for identification purposes as well as phylogenetic inference of broad scale evolutionary relationships. Here we apply whole chloroplast genome sequencing of “red” and “yellow” individuals of Bixa orellana for phylogenetic analyses to explore the position of Bixaceae relative to other families within the Malvales as well as to underpin future work that may delineate diverse color phenotypes. Results Fully assembled chloroplast genomes were produced for both red and yellow Bixa orellana accessions (158,918 and 158,823 bp respectively). Synteny and gene content were identical to the only other previously reported full chloroplast genome of Bixa orellana (NC_041550). We observed a 17 base pair deletion at position 58399-58415 in both of our accessions, relative to NC_041550, and a 6 base pair deletion at position 75531-75526 in the accession of “red” Bixa. A phylogeny based on alignment-free kmer distance metrics was used to confirm monophyly of Bixa accessions, and to place Bixaceae relative to other families within the Malvales. Conclusions Our data support Bixaceae as sister to Malvaceae and identified several potentially diagnostic insertion-deletion mutations that may, with future work, reliably distinguish between red and yellow phenotypes. 
In addition to utility for phylogenetic questions and development of identity markers, we demonstrate that chloroplast genomes can be used in conjunction with modern bioinformatic search tools (kmer based) to provide rapid and precise identification of Bixa orellana for Next Generation Sequencing approaches to natural product authentication.
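
To give a feel for what an alignment-free k-mer distance looks like, the sketch below computes a Jaccard distance between the k-mer sets of two sequences. This is an assumption made for illustration: the abstract does not say which k-mer metric or tool was used, and the function names, the value of k and the toy sequences are all hypothetical.

```python
def kmer_set(seq: str, k: int) -> set:
    """All k-mers of seq that consist only of unambiguous bases A, C, G, T."""
    seq = seq.upper()
    valid = set("ACGT")
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    }


def jaccard_distance(seq_a: str, seq_b: str, k: int = 4) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the k-mer sets of the two sequences."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        return 0.0  # neither sequence yields a k-mer; treat as identical
    return 1.0 - len(a & b) / len(union)


# Toy sequences: identical inputs give 0.0, fully disjoint k-mer sets give 1.0.
print(jaccard_distance("ACGTACGTAC", "ACGTACGTAC"))
print(jaccard_distance("AAAAAAAA", "CCCCCCCC"))
print(jaccard_distance("ACGTACGTAC", "ACGTTCGTAC"))
```

A matrix of such pairwise distances could then be passed to a distance-based tree method such as neighbor joining; for whole plastomes a much larger k than in this toy example would normally be chosen.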
