Single-molecule real-time sequencing identifies massive full-length cDNAs and alternative-splicing events that facilitate comparative and functional genomics study in the hexaploid crop sweet potato

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7933 ◽  
Author(s):  
Na Ding ◽  
Huihui Cui ◽  
Ying Miao ◽  
Jun Tang ◽  
Qinghe Cao ◽  
...  

Background Sweet potato (Ipomoea batatas (L.) Lam.) is one of the most important crops in many developing countries and provides a candidate source of bioenergy. However, neither a complete reference genome nor large-scale full-length cDNA sequences are available for this outcrossing hexaploid crop, which impedes progress in I. batatas functional genomics and molecular breeding. Methods In this study, we sequenced full-length transcriptomes in I. batatas and its diploid ancestor I. trifida by single-molecule real-time sequencing and Illumina second-generation sequencing technologies. With the generated datasets, we conducted comprehensive intraspecific and interspecific sequence analyses and experimental characterization. Results A total of 53,861/51,184 high-quality long-read transcripts were obtained, which covered about 10,439/10,452 loci in the I. batatas/I. trifida genome. These datasets enabled us to predict open reading frames successfully in 96.83%/96.82% of transcripts and identify 34,963/33,637 full-length cDNA sequences, 1,401/1,457 transcription factors, 25,315/27,090 simple sequence repeats, 1,656/1,389 long non-coding RNAs, and 5,251/8,901 alternative splicing events. Approximately 32.34%/38.54% of transcripts and 46.22%/51.18% of multi-exon transcripts underwent alternative splicing in I. batatas/I. trifida. Moreover, we validated one alternative splicing event in each of 10 genes and identified tuberous-root-specific expressed isoforms from a starch-branching enzyme, an alpha-glucan phosphorylase, a neutral invertase, and several ABC transporters. Overall, the collection and analysis of large-scale long-read transcripts generated in this study will serve as a valuable resource for the I. batatas research community, which may accelerate progress in its structural, functional, and comparative genomics studies.

DNA Research ◽  
2019 ◽  
Vol 26 (4) ◽  
pp. 301-311 ◽  
Author(s):  
Yue Zhang ◽  
Tonny Maraga Nyong'A ◽  
Tao Shi ◽  
Pingfang Yang

Abstract Alternative splicing (AS) plays a critical role in regulating different physiological and developmental processes in eukaryotes, by dramatically increasing the diversity of the transcriptome and the proteome. However, the saturation and complexity of AS remain unclear in lotus because full-length multiple-splice isoforms have rarely been obtained. In this study, we apply a hybrid assembly strategy by combining single-molecule real-time sequencing and Illumina RNA-seq to get a comprehensive insight into the lotus transcriptomic landscape. We identified 211,802 high-quality full-length non-chimeric reads, with 192,690 non-redundant isoforms, and updated the lotus reference gene model. Moreover, our analysis identified a total of 104,288 AS events from 16,543 genes, with alternative 3ʹ splice-site being the predominant model, followed by intron retention. By exploring tissue datasets, 370 tissue-specific AS events were identified among 12 tissues. Both the tissue-specific genes and isoforms might play important roles in tissue or organ development, and partly fit the ‘ABCE’ model in floral tissues. The large number of AS events and isoform variants identified in our study enhances the understanding of transcriptional diversity in lotus, and provides a valuable resource for further functional genomic studies.


2017 ◽  
Author(s):  
Yonghai Luo ◽  
Na Ding ◽  
Xuan Shi ◽  
Yunxiang Wu ◽  
Ruyuan Wang ◽  
...  

Abstract Sweetpotato [Ipomoea batatas (L.) Lam.] is one of the most important crops in many developing countries and provides a candidate source of bioenergy. However, both a high-quality reference genome and large-scale full-length cDNA sequences for this outcrossing hexaploid are still lacking, which in turn impedes progress in sweetpotato functional genomics and molecular breeding. In this study, we apply a combination of second- and third-generation sequencing technologies to sequence full-length transcriptomes in sweetpotato and its putative ancestor I. trifida. In total, we obtained 53,861/51,184 high-quality transcripts, which include 34,963/33,637 putative full-length cDNA sequences, from sweetpotato/I. trifida. Among them, we identified 104,540/94,174 open reading frames, 1,476/1,475 transcription factors, 25,315/27,090 simple sequence repeats, and 417/531 long non-coding RNAs in the sweetpotato/I. trifida dataset. By utilizing publicly available genomic contigs, we analyzed the gene features (including exon number, exon size, intron number, intron size, exon-intron structure) of 33,119 and 32,793 full-length transcripts in sweetpotato and I. trifida, respectively. Furthermore, comparative analysis between our transcript datasets and other large-scale cDNA datasets from different plant species enabled us to assess the quality of public datasets, estimate the genetic similarity across related species, and survey the evolutionary pattern of genes. Overall, our study provides, for the first time, fundamental resources of large-scale full-length transcripts in sweetpotato and its putative ancestor, and will facilitate structural, functional and comparative genomics studies in this important crop.


2021 ◽  
Author(s):  
Jing Song ◽  
Ping Li ◽  
De-Long Guan ◽  
Yan Sun

Abstract Although leeches are of great medical and economic value in anticoagulant therapy, full-length transcriptomes for leeches remain scarce. Here, we generated the first full-length transcriptome for the paddy leech Whitmania pigra (the most widely utilized medical leech in traditional Chinese medicine) through Pacific Biosciences (PacBio) single-molecule long-read sequencing. A total of 191,676 full-length non-chimeric (FLNC) reads were obtained, of which 30,660 were high-quality unique full-length transcripts. The BUSCO (Benchmarking Universal Single-Copy Orthologues) assessment of completeness demonstrated that 74.8% of BUSCOs were complete. We functionally annotated 28,144 transcripts in public databases, including NR, gene ontology (GO), Pfam, etc. Furthermore, 1,314 long non-coding RNAs (lncRNAs), 2,574 alternative splicing (AS) events, 932 transcription factors (TFs), and 33,258 simple sequence repeats (SSRs) were identified across all transcripts. From the generated data, a total of 426 anticoagulant genes, including 122 Antistasins, 124 with the Fibrinogen beta and gamma chains, and 62 Kazal-type serine protease inhibitors, were screened out. Twenty-five novel proteins were revealed following the evaluation of the annotations and products of these anticoagulant transcripts. The regulation network between lncRNAs and corresponding coding transcripts showed the typical many-to-many pattern, which was especially obvious in a specific type of protein, Guamerin. Collectively, the present findings provide a rich set of full-length cDNA sequences for W. pigra, which will greatly facilitate research on transcriptomic genetics for this species and leeches.


2020 ◽  
Author(s):  
Yanping Long ◽  
Zhijian Liu ◽  
Jinbu Jia ◽  
Weipeng Mo ◽  
Liang Fang ◽  
...  

Abstract The broad application of large-scale single-cell RNA profiling in plants has been restricted by the prerequisite of protoplasting. We recently found that the Arabidopsis nucleus contains abundant polyadenylated mRNAs, many of which are incompletely spliced. To capture the isoform information, we combined 10x Genomics and Nanopore long-read sequencing to develop a protoplasting-free full-length single-nucleus RNA profiling method in plants. Using Arabidopsis root, our results demonstrated that nuclear mRNAs faithfully retain cell identity information, and that single-molecule full-length RNA sequencing could further improve cell type identification by revealing splicing status and alternative polyadenylation at the single-cell level.


2021 ◽  
Author(s):  
Zhi-hong Fang ◽  
Yonghong Shi ◽  
Xin-ming Wu ◽  
Bin-lin Ren ◽  
Yan Zhang ◽  
...  

Abstract Background: Medicago sativa L. (M. sativa L.) is a legume with high salt tolerance and a major forage crop with high biomass production. However, large-scale full-length cDNA sequences of M. sativa L. in response to abiotic stress remain unavailable. Results: We provided the complete transcriptome for M. sativa L. roots under different abiotic stressors using a combination of single-molecule real-time sequencing and next-generation sequencing. Our results indicated that there were 21.53 Gb of clean reads, which consisted of 566,076 insert reads and 409,291 full-length non-chimeric reads. We obtained 194,286 consistent transcripts based on a cluster analysis of full-length reads, and 41,248 high-quality transcript sequences based on non-full-length reads. After correcting the third-generation low-quality data with second-generation data, we obtained 81,017 transcript sequences according to a cogent analysis. Sequence structure analysis identified 33,058 simple sequence repeats and 42,725 complete coding sequence regions. In addition, 77,221 transcripts were annotated by eight functional databases; 3,043 lncRNAs were predicted and 4,971 alternative splicing events were identified. Moreover, we confirmed the levels of highly differentially expressed transcripts (ADH1, PEPC, MJG19.6, PCKA and GAPC1) in M. sativa L. roots under NaCl and polyethylene glycol stress. Conclusions: We comprehensively characterized the full-length transcripts related to abiotic stress in M. sativa L., which will lay the foundation for understanding gene regulation in M. sativa L. under abiotic stress.


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
D Oehler ◽  
A Goedecke ◽  
A Spychala ◽  
K Lu ◽  
N Gerdes ◽  
...  

Abstract Background Alternative splicing is a process by which exons within a pre-mRNA are joined or skipped, resulting in multiple isoforms being encoded by a single gene. Alternative splicing affecting transcription factors may have a substantial impact on cellular dynamics. The PPARG Coactivator 1 Alpha (PGC1-α) is a major modulator in energy metabolism. Data from murine skeletal muscle revealed distinctive isoform patterns giving rise to different phenotypes, i.e. mitogenesis and hypertrophy. Here, we aimed to establish a complete dataset of isoforms in murine and human heart applying single-molecule real-time (SMRT) sequencing as a novel approach to identify transcripts without the need for assembly, resulting in true full-length sequences. Moreover, we aimed to unravel the functional relevance of the various isoforms during experimental ischemia reperfusion (I/R). Methods RNA isolation was performed in murine (C57Bl/6J) or human heart tissue (obtained during LVAD surgery), followed by library preparation and SMRT sequencing. Bioinformatic analysis was done using a modified IsoSeq3 pipeline and OS-tools. PGC1-α isoforms were identified by similarity search against exonic sequences within the full-length, non-concatemer (FLNC) reads. Isoforms with an open reading frame (ORF) were manually curated and validated by PCR and Sanger sequencing. I/R was induced by ligature of the LAD for 45 min in mice on standard chow as well as on high-fat-high-sucrose diet. Area at risk (AAR) and remote tissue were collected three and 16 days after I/R or sham surgery (n=4 per time point). Promoter patterns were analyzed by qPCR. Results Deciphering the full-length transcriptome of murine and human heart resulted in ∼60000 isoforms with 99% accuracy on mRNA sequence. Focusing on murine PGC1-α isoforms, we discovered and verified 15 novel transcripts generated by hitherto unknown splicing events. Additionally, we identified a novel exon 1 originating between the known promoters followed by a valid ORF, suggesting the discovery of a novel promoter. Remarkably, we found a homologous novel exon 1 in human heart, suggesting conservation of the postulated promoter. In I/R, the AAR exhibited a significantly lower expression of established and novel promoters compared to remote tissue under standard chow 3 d post I/R. At 16 d post I/R, the difference between AAR and remote tissue had equalized under standard chow but persisted under high-fat diet. Conclusion Applying the SMRT technique, we generated for the first time a complete full-length transcriptome of the murine and human heart, identifying 15 novel potentially coding transcripts of PGC1-α and a novel exon 1. These transcripts are differentially regulated in experimental I/R in AAR and remote myocardium, suggesting transcriptional regulation and alternative splicing modulating PGC1-α function in heart. Differences between standard chow and high-fat diet suggest an impact of impaired glucose metabolism on regulatory processes after myocardial infarction. Funding Acknowledgement Type of funding source: Public grant(s) – National budget only. Main funding source(s): Collaborative Research Centre 1116 (German Research Foundation)


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yanping Long ◽  
Zhijian Liu ◽  
Jinbu Jia ◽  
Weipeng Mo ◽  
Liang Fang ◽  
...  

Abstract The broad application of single-cell RNA profiling in plants has been hindered by the prerequisite of protoplasting that requires digesting the cell walls from different types of plant tissues. Here, we present a protoplasting-free approach, flsnRNA-seq, for large-scale full-length RNA profiling at a single-nucleus level in plants using isolated nuclei. Combined with 10x Genomics and Nanopore long-read sequencing, we validate the robustness of this approach in Arabidopsis root cells and the developing endosperm. Sequencing results demonstrate that it allows for uncovering alternative splicing and polyadenylation-related RNA isoform information at the single-cell level, which facilitates characterizing cell identities.


DNA Research ◽  
2017 ◽  
pp. dsw056 ◽  
Author(s):  
Yuko Makita ◽  
Kiaw Kiaw Ng ◽  
G. Veera Singham ◽  
Mika Kawashima ◽  
Hideki Hirakawa ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Fiza Liaquat ◽  
Muhammad Farooq Hussain Munis ◽  
Samiah Arif ◽  
Urooj Haroon ◽  
Jianxin Shi ◽  
...  

Schima superba (Theaceae) is a subtropical evergreen tree and is used widely for forest firebreaks and gardening. It is a plant that tolerates salt and typically accumulates elevated amounts of manganese in the leaves. With large ecological amplitude, this tree species grows quickly. Due to its substantial biomass, it has great potential for soil remediation. To evaluate the thorough framework of the mRNA, we employed PacBio sequencing technology for the first time to generate the S. superba transcriptome. In this analysis, overall, 511,759 full-length non-chimeric reads were acquired, and 163,834 high-quality full-length reads were obtained. Overall, 93,362 open reading frames were obtained, of which 78,255 were complete. In gene annotation analyses, 91,082, 71,839, 38,914, and 38,376 transcripts were assigned to the Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Genes (COG), Gene Ontology (GO), and Non-Redundant (Nr) databases, respectively. To identify long non-coding RNAs (lncRNAs), we utilized four computational methods, protein families (Pfam), Coding Potential Calculator (CPC), Coding-Potential Assessment Tool (CPAT), and Coding-Non-Coding Index (CNCI), and observed 8,551, 9,174, 20,720, and 18,669 lncRNAs, respectively. Moreover, nine genes were randomly selected for expression analysis, which showed that Gene 6 (Na_Ca_ex gene) had the highest expression and that expression of CAX (CAX-interacting protein 4) was higher in the manganese (Mn)-treated group. This work provided a significant number of full-length transcripts and refined the annotation of the reference genome, which will ease advanced genetic analyses of S. superba.

