The complex architecture of plant transgene insertions

Mapping Intimacies ◽

10.1101/282772 ◽

2018 ◽

Cited By ~ 1

Author(s):

Florian Jupe ◽

Todd P. Michael ◽

Angeline C. Rivkin ◽

Mark Zander ◽

S. Timothy Motley ◽

...

Keyword(s):

Dna Methylation ◽

Single Molecule ◽

Large Scale ◽

Genome Engineering ◽

De Novo ◽

Plant Genome ◽

Nanopore Sequencing ◽

Dna Arrays ◽

In Planta ◽

Actual Length

Abstract Over the last 35 years the soil bacterium Agrobacterium tumefaciens has been the workhorse tool for plant genome engineering. Replacement of native tumor-inducing (Ti) plasmid elements with customizable cassettes enabled insertion of a sequence of interest called Transfer DNA (T-DNA) into any plant genome. Although these T-DNA transfer mechanisms are well understood, detailed understanding of structure and epigenomic status of insertion events was limited by current technologies. To fill this gap, we analyzed transgenic Arabidopsis thaliana lines from three widely used collections (SALK, SAIL and WISC) with two single molecule technologies, optical genome mapping and nanopore sequencing. Optical maps for four randomly selected T-DNA lines revealed between one and seven insertions/rearrangements, and for the first time the actual length of individual transgene insertions from 27 to 236 kilobases. De novo nanopore sequencing-based genome assemblies for two segregating lines resolved T-DNA structures up to 36 kb into the insertions and revealed large-scale T-DNA associated translocations and exchange of chromosome arm ends. The multiple internally rearranged nature of T-DNA arrays made full assembly impossible, even with long nanopore reads. For the current TAIR10 reference genome, nanopore contigs corrected 83% of non-centromeric misassemblies. This unprecedented nucleotide-level definition of T-DNA insertions enabled the mapping of epigenome data. We identify variable small RNA transgene targeting and DNA methylation. SALK_059379 T-DNA insertions were enriched for 24 nt siRNAs and contained dense cytosine DNA methylation. Transgene silencing via the RNA-directed DNA methylation pathway was confirmed by in planta assays. In contrast, SAIL_232 T-DNA insertions are predominantly targeted by 21/22 nt siRNAs, with DNA methylation and silencing limited to a reporter, but not the resistance gene. With the emergence of genome editing technologies that rely on Agrobacterium for gene delivery, this study provides new insights into the structural impact of engineering plant genomes and demonstrates the utility of state-of-the-art long-range sequencing technologies to rapidly identify unanticipated genomic changes.

MBRS-47. RAPID MOLECULAR SUBGROUPING OF MEDULLOBLASTOMA BASED ON DNA METHYLATION BY NANOPORE SEQUENCING

Neuro-Oncology ◽

10.1093/neuonc/noaa222.556 ◽

2020 ◽

Vol 22 (Supplement_3) ◽

pp. iii406-iii406

Author(s):

Julien Masliah-Planchon ◽

Elodie Girard ◽

Philipp Euskirchen ◽

Christine Bourneix ◽

Delphine Lequin ◽

...

Keyword(s):

Dna Methylation ◽

Single Molecule ◽

Nanopore Sequencing ◽

Molecular Subgroup ◽

Group Assignment ◽

Group 4 ◽

Methylation Assay ◽

Tumor Group ◽

Long Read ◽

Group 3

Abstract Medulloblastoma (MB) can be classified into four molecular subgroups (WNT group, SHH group, group 3, and group 4). The gold standard for assigning a molecular subgroup through DNA methylation profiling uses the Illumina EPIC array. However, this tool has limitations in cost and turnaround time that make it difficult to obtain results soon enough for clinical use. We present an alternative DNA methylation assay based on nanopore sequencing that enables rapid, cheaper, and reliable subgrouping of clinical MB samples. Low-depth whole-genome, long-read, single-molecule nanopore sequencing was used to simultaneously assess copy number profile and MB subgrouping based on DNA methylation. The DNA methylation data generated by nanopore sequencing were compared to a publicly available reference cohort comprising over 2,800 brain tumors including the four subgroups of MB (Capper et al. Nature; 2018) to generate a score that estimates the confidence of a tumor group assignment. All 24 MB analyzed with nanopore sequencing (six WNT, nine SHH, five group 3, and four group 4) were classified in the appropriate subgroup established by expression-based Nanostring subgrouping. In addition to the subgrouping, we also examined the genomic profile: all previously identified clinically relevant genomic rearrangements (mostly MYC and MYCN amplifications) were also detected with our assay. In conclusion, we confirm the reliability of nanopore sequencing as a novel rapid and cheap assay for methylation-based MB subgrouping. We now plan to extend this technology to other embryonal tumors of the central nervous system.

The sequencing and de novo assembly of the Larimichthys crocea genome using PacBio and Hi-C technologies

Scientific Data ◽

10.1038/s41597-019-0194-3 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 8

Author(s):

Baohua Chen ◽

Zhixiong Zhou ◽

Qiaozhen Ke ◽

Yidi Wu ◽

Huaqiang Bai ◽

...

Keyword(s):

Marine Fish ◽

Single Molecule ◽

Large Scale ◽

Reference Genome ◽

De Novo ◽

Larimichthys Crocea ◽

Chromosome Conformation ◽

Protein Coding ◽

Total Length ◽

Chromosome Level

Abstract Larimichthys crocea is an endemic marine fish in East Asia that belongs to Sciaenidae in Perciformes. L. crocea is now recognized as an “iconic” marine fish species in China: it is a popular food fish, a representative victim of overfishing, and still a source of high-value fish products supported by the modern large-scale mariculture industry. Here, we report a chromosome-level reference genome of L. crocea generated by employing the PacBio single molecule sequencing technique (SMRT) and high-throughput chromosome conformation capture (Hi-C) technologies. The genome sequences were assembled into 1,591 contigs with a total length of 723.86 Mb and a contig N50 length of 2.83 Mb. After chromosome-level scaffolding, 24 scaffolds were constructed with a total length of 668.67 Mb (92.48% of the total length). Genome annotation identified 23,657 protein-coding genes and 7,262 ncRNAs. This highly accurate, chromosome-level reference genome of L. crocea provides an essential genome resource to support the development of genome-scale selective breeding and restocking strategies of L. crocea.
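
The contig N50 quoted in this abstract is a standard contiguity statistic: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly size. A minimal Python sketch follows; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300; 80 + 70 reaches 150, so this prints 70
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misjoined assembly can have a high N50.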

Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the mega-reads algorithm

10.1101/066100 ◽

2016 ◽

Cited By ~ 6

Author(s):

Aleksey V. Zimin ◽

Daniela Puiu ◽

Ming-Cheng Luo ◽

Tingting Zhu ◽

Sergey Koren ◽

...

Keyword(s):

Single Molecule ◽

Large Scale ◽

Aegilops Tauschii ◽

Large Data ◽

Artificial Chromosome ◽

Error Rates ◽

Plant Genome ◽

Hybrid Assembly ◽

Data Set ◽

Long Reads

Abstract Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and highly repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy.
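
The abstract's remark that a single indel may shift the reading frame and truncate a protein can be illustrated with a short Python sketch. It uses the standard genetic code; the `translate` helper and the 21-base toy sequence are illustrative assumptions and have nothing to do with the mega-reads algorithm itself.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy coding sequence: ATG GCA TTG ACC GGT AAA TAA
reference = "ATGGCATTGACCGGTAAATAA"
# Simulate a 1-base deletion error at index 5 in the consensus.
with_deletion = reference[:5] + reference[6:]

print(translate(reference))      # MALTGK
print(translate(with_deletion))  # MA (frame shifts to ATG GCT TGA, a premature stop)
```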

Common alleles of CMT2 and NRPE1 are major determinants of de novo DNA methylation variation in Arabidopsis thaliana

10.1101/819516 ◽

2019 ◽

Author(s):

Eriko Sasaki ◽

Taiji Kawakatsu ◽

Joseph Ecker ◽

Magnus Nordborg

Keyword(s):

Dna Methylation ◽

Large Scale ◽

Cytosine Methylation ◽

De Novo ◽

Association Studies ◽

Published Data ◽

Epigenetic Mark ◽

Stabilizing Selection ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract DNA cytosine methylation is an epigenetic mark associated with silencing of transposable elements (TEs) and heterochromatin formation. In plants, it occurs in three sequence contexts: CG, CHG, and CHH (where H is A, T, or C). The latter does not allow direct inheritance of methylation during DNA replication due to lack of symmetry, and methylation must therefore be re-established every cell generation. Genome-wide association studies (GWAS) have previously shown that CMT2 and NRPE1 are major determinants of genome-wide patterns of TE CHH-methylation. Here we instead focus on CHH-methylation of individual TEs and TE-families, allowing us to identify the pathways involved in CHH-methylation simply from natural variation and confirm the associations by comparing them with mutant phenotypes. Methylation at TEs targeted by the RNA-directed DNA methylation (RdDM) pathway is unaffected by CMT2 variation, but is strongly affected by variation at NRPE1, which is largely responsible for the longitudinal cline in this phenotype. In contrast, CMT2-targeted TEs are affected by both loci, which jointly explain 7.3% of the phenotypic variation (13.2% of total genetic effects). There is no longitudinal pattern for this phenotype, however, because the geographic patterns appear to compensate for each other in a pattern suggestive of stabilizing selection. Author Summary DNA methylation is a major component of transposon silencing, and essential for genomic integrity. Recent studies revealed large-scale geographic variation as well as the existence of major trans-acting polymorphisms that partly explained this variation. In this study, we re-analyze previously published data (The 1001 Epigenomes), focusing on de novo DNA methylation patterns of individual TEs and TE families rather than on genome-wide averages (as was done in previous studies). GWAS of the patterns reveals the underlying regulatory networks, and allows us to comprehensively characterize trans-regulation of de novo DNA methylation and its role in the striking geographic pattern for this phenotype.
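
The three sequence contexts named in this abstract (CG, CHG, and CHH, where H is A, T, or C) can be assigned from the two bases downstream of each cytosine. The Python sketch below is a minimal illustration that assumes a forward-strand sequence containing only A, C, G, and T; the function name and the toy sequence are hypothetical.

```python
def cytosine_context(seq, pos):
    """Classify the forward-strand cytosine at seq[pos] as CG, CHG, or CHH.

    Returns None when the cytosine is too close to the end of the
    sequence for its context to be determined.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position is not a cytosine")
    downstream = seq[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2:
        return None
    if downstream[1] == "G":
        return "CHG"
    return "CHH"


# Hypothetical toy sequence with one cytosine in each context.
toy = "ACGTCAGTCTA"
contexts = {i: cytosine_context(toy, i) for i, base in enumerate(toy) if base == "C"}
print(contexts)  # {1: 'CG', 4: 'CHG', 8: 'CHH'}
```

A full analysis would also scan the reverse strand, which this sketch omits for brevity.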

Taking the Wheel – de novo DNA Methylation as a Driving Force of Plant Embryonic Development

Frontiers in Plant Science ◽

10.3389/fpls.2021.764999 ◽

2021 ◽

Vol 12 ◽

Author(s):

Lucija Markulin ◽

Andreja Škiljaica ◽

Mirta Tokić ◽

Mateja Jagić ◽

Tamara Vuk ◽

...

Keyword(s):

Dna Methylation ◽

Somatic Embryogenesis ◽

Embryonic Development ◽

Large Scale ◽

De Novo ◽

Current Knowledge ◽

Early Embryo ◽

Egg Cell ◽

Zygotic Embryogenesis ◽

Transcription Control

During plant embryogenesis, regardless of whether it begins with a fertilized egg cell (zygotic embryogenesis) or an induced somatic cell (somatic embryogenesis), significant epigenetic reprogramming occurs with the purpose of parental or vegetative transcript silencing and establishment of next-generation epigenetic patterning. To ensure genome stability of a developing embryo, large-scale transposon silencing occurs by an RNA-directed DNA methylation (RdDM) pathway, which introduces methylation patterns de novo and as such potentially serves as a global mechanism of transcription control during developmental transitions. RdDM is controlled by a two-armed mechanism based around the activity of two RNA polymerases. While PolIV produces siRNAs accompanied by protein complexes comprising the methylation machinery, PolV produces lncRNA which guides the methylation machinery toward specific genomic locations. Recently, RdDM has been proposed as a dominant methylation mechanism during gamete formation and early embryo development in Arabidopsis thaliana, overshadowing all other methylation mechanisms. Here, we provide an overview of current knowledge about the different roles of DNA methylation, with emphasis on RdDM, during plant zygotic and somatic embryogenesis. Based on published chromatin immunoprecipitation data on PolV binding sites within the A. thaliana genome, we uncover groups of auxin metabolism, reproductive development and embryogenesis-related genes, and discuss possible roles of RdDM at the onset of early embryonic development via targeted methylation at sites involved in different embryogenesis-related developmental mechanisms.

DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation

Genome Biology ◽

10.1186/s13059-021-02510-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yang Liu ◽

Wojciech Rosikiewicz ◽

Ziwei Pan ◽

Nathaniel Jillette ◽

Ping Wang ◽

...

Keyword(s):

Dna Methylation ◽

Single Molecule ◽

Evaluation Criteria ◽

Systematic Evaluation ◽

Whole Genome ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Long Read ◽

Genome Scale ◽

Analytical Tools

Abstract Background Nanopore long-read sequencing technology greatly expands the capacity of long-range, single-molecule DNA-modification detection. A growing number of analytical tools have been developed to detect DNA methylation from nanopore sequencing reads. Here, we assess the performance of different methylation-calling tools to provide a systematic evaluation to guide researchers performing human epigenome-wide studies. Results We compare seven analytic tools for detecting DNA methylation from nanopore long-read sequencing data generated from human natural DNA at a whole-genome scale. We evaluate the per-read and per-site performance of CpG methylation prediction across different genomic contexts, CpG site coverage, and computational resources consumed by each tool. The seven tools exhibit different performances across the evaluation criteria. We show that the methylation prediction at regions with discordant DNA methylation patterns, intergenic regions, low CG density regions, and repetitive regions show room for improvement across all tools. Furthermore, we demonstrate that 5hmC levels at least partly contribute to the discrepancy between bisulfite and nanopore sequencing. Lastly, we provide an online DNA methylation database (https://nanome.jax.org) to display the DNA methylation levels detected by nanopore sequencing and bisulfite sequencing data across different genomic contexts. Conclusions Our study is the first systematic benchmark of computational methods for detection of mammalian whole-genome DNA modifications in nanopore sequencing. We provide a broad foundation for cross-platform standardization and an evaluation of analytical tools designed for genome-scale modified base detection using nanopore sequencing.
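
The distinction this abstract draws between per-read and per-site performance can be made concrete with a small aggregation step: per-read calls at the same genomic position are pooled into a per-site methylation frequency and a coverage count. The Python sketch below assumes a hypothetical input of `(chromosome, position, is_methylated)` tuples; it does not reproduce the output format of any of the evaluated tools.

```python
from collections import defaultdict


def per_site_frequency(read_calls, min_coverage=1):
    """Pool per-read calls into {site: (methylation_frequency, coverage)}.

    read_calls: iterable of (chrom, pos, is_methylated) tuples, one per
    read per CpG site (a hypothetical, simplified input format).
    Sites covered by fewer than min_coverage reads are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated reads, total reads]
    for chrom, pos, is_methylated in read_calls:
        tally = counts[(chrom, pos)]
        tally[0] += int(is_methylated)
        tally[1] += 1
    return {
        site: (methylated / total, total)
        for site, (methylated, total) in counts.items()
        if total >= min_coverage
    }


# Hypothetical per-read calls: three reads over one site, one read over another.
calls = [
    ("chr1", 100, True),
    ("chr1", 100, False),
    ("chr1", 100, True),
    ("chr1", 250, False),
]
sites = per_site_frequency(calls)
print(sites[("chr1", 100)][1])  # coverage at chr1:100 is 3
```

Raising `min_coverage` trades the number of reported sites against the reliability of each frequency estimate, which is one reason site coverage appears among the evaluation criteria.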

Expanding the scope of plant genome engineering with Cas12a orthologs and highly multiplexable editing systems

Nature Communications ◽

10.1038/s41467-021-22330-w ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Yingxiao Zhang ◽

Qiurong Ren ◽

Xu Tang ◽

Shishi Liu ◽

Aimee A. Malzahn ◽

...

Keyword(s):

Large Scale ◽

Genome Engineering ◽

Plant Genome ◽

Gene Repression ◽

Wild Type ◽

Editing Efficiency ◽

Editing Activity ◽

Single Transcript ◽

Genomic Regions ◽

Crop Genome

Abstract CRISPR-Cas12a is a promising genome editing system for targeting AT-rich genomic regions. Comprehensive genome engineering requires simultaneous targeting of multiple genes at defined locations. Here, to expand the targeting scope of Cas12a, we screen nine Cas12a orthologs that have not been demonstrated in plants, and identify six, ErCas12a, Lb5Cas12a, BsCas12a, Mb2Cas12a, TsCas12a and MbCas12a, that possess high editing activity in rice. Among them, Mb2Cas12a stands out with high editing efficiency and tolerance to low temperature. An engineered Mb2Cas12a-RVRR variant enables editing with more relaxed PAM requirements in rice, yielding two times higher genome coverage than the wild type SpCas9. To enable large-scale genome engineering, we compare 12 multiplexed Cas12a systems and identify a potent system that exhibits nearly 100% biallelic editing efficiency with the ability to target as many as 16 sites in rice. This is the highest level of multiplex edits in plants to date using Cas12a. Two compact single transcript unit CRISPR-Cas12a interference systems are also developed for multi-gene repression in rice and Arabidopsis. This study greatly expands the targeting scope of Cas12a for crop genome engineering.

Reconstructing the Gigabase Plant Genome of Solanum pennellii using Nanopore Sequencing

10.1101/129148 ◽

2017 ◽

Cited By ~ 2

Author(s):

Maximilian H.-W. Schmidt ◽

Alexander Vogel ◽

Alisandra K. Denton ◽

Benjamin Istace ◽

Alexandra Wormit ◽

...

Keyword(s):

Error Rate ◽

De Novo ◽

Sequence Data ◽

Fragment Size ◽

Plant Genome ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Sequencing Technology ◽

Solanum Pennellii ◽

Wild Tomato Species

Recent updates in sequencing technology have made it possible to obtain Gigabases of sequence data from one single flowcell. Prior to this update, the nanopore sequencing technology was mainly used to analyze and assemble microbial samples [1-3]. Here, we describe the generation of a comprehensive nanopore sequencing dataset with a median fragment size of 11,979 bp for the wild tomato species Solanum pennellii featuring an estimated genome size of ca 1.0 to 1.1 Gbases. We describe its genome assembly to a contig N50 of 2.5 Mb using a pipeline comprising a Canu [4] pre-processing and a subsequent assembly using SMARTdenovo. We show that the obtained nanopore based de novo genome reconstruction is structurally highly similar to that of the reference S. pennellii LA716 [5] genome but has a high error rate caused mostly by deletions in homopolymers. After polishing the assembly with Illumina short read data we obtained an error rate of <0.02 % when assessed versus the same Illumina data. More importantly however we obtained a gene completeness of 96.53% which even slightly surpasses that of the reference S. pennellii genome [5]. Taken together our data indicate such long read sequencing data can be used to affordably sequence and assemble Gbase sized diploid plant genomes. Raw data is available at http://www.plabipd.de/portal/solanum-pennellii and has been deposited as PRJEB19787.

A de novo DNA Sequencing and Variant Calling Algorithm for Nanopores

10.1101/019448 ◽

2015 ◽

Author(s):

Tamas Szalay ◽

Jene A Golovchenko

Keyword(s):

Single Molecule ◽

Statistical Models ◽

De Novo ◽

Variant Calling ◽

High Accuracy ◽

Nanopore Sequencing ◽

M13 Bacteriophage ◽

Assembly Pipeline ◽

Calling Algorithm ◽

Novel Algorithm

The single-molecule accuracy of nanopore sequencing has been an area of rapid academic and commercial advancement, but remains insufficient for the de novo analysis of genomes. We introduce here a novel algorithm for the error correction of nanopore data, utilizing statistical models of the physical system in order to obtain high accuracy de novo sequences at a range of coverage depths. We demonstrate the technique by sequencing M13 bacteriophage DNA to 99% accuracy at moderate coverage as well as its use in an assembly pipeline by sequencing λ DNA at a range of coverages. We also show the algorithm’s ability to accurately classify sequence variants at far lower coverage than existing methods.

DNA Methylation Diversification at the Integrated Organellar DNA-Like Sequence

Genes ◽

10.3390/genes9120602 ◽

2018 ◽

Vol 9 (12) ◽

pp. 602 ◽

Cited By ~ 1

Author(s):

Takanori Yoshida ◽

Yoshiaki Tarutani ◽

Tetsuji Kakutani ◽

Akira Kawabe

Keyword(s):

Dna Methylation ◽

De Novo ◽

Epigenetic Modification ◽

Natural Populations ◽

Nuclear Genome ◽

Amplicon Sequencing ◽

Epigenetic Modifications ◽

Plant Genome ◽

Organellar Dna ◽

Epigenetic Diversity

Plants show considerable diversity in epigenetic modifications such as DNA methylation in their natural populations or cultivars. Although many studies observing the epigenetic diversity within and among species have been reported, the mechanisms by which these variations are generated are still not clear. In addition to de novo spontaneous epi-mutation, intra- and inter-specific crossing can also cause a change of epigenetic modifications in the progeny. Here we report an example of diversification of DNA methylation by crossing and succeeding selfing. We traced the inheritance pattern of epigenetic modification during a crossing experiment between two natural strains, Columbia (Col) and Landsberg erecta (Ler), in the model plant Arabidopsis thaliana to observe the inheritance of DNA methylation in two organellar DNA-like sequence regions in the nuclear genome. Because organellar DNA integration into the nuclear genome is common in flowering plants and these sequences are occasionally methylated, such DNA could be a novel source of plant genome evolution. Amplicon sequencing, using bisulfite-converted DNA and a next-generation auto-sequencer, was able to efficiently track the heredity of DNA methylation in F1 and F2 populations. One region showed hypomethylation in the F1 population and succeeding elevation of DNA methylation with large variance in the F2 population. The methylation level of Col and Ler alleles in F2 heterozygotes showed a significant positive correlation, implying a trans-chromosomal effect on DNA methylation. The results may suggest a possible mechanism causing the natural epigenetic diversity within plant populations.
