Structural variation and evolution of chloroplast tRNAs in green algae

PeerJ ◽

10.7717/peerj.11524 ◽

2021 ◽

Vol 9 ◽

pp. e11524

Author(s):

Fangbing Qi ◽

Yajing Zhao ◽

Ningbo Zhao ◽

Kai Wang ◽

Zhonghu Li ◽

...

Keyword(s):

Chloroplast Genome ◽

Green Algae ◽

Sequence Alignment ◽

Structural Variation ◽

Evolutionary Relationship ◽

Multiple Sequence ◽

Evolutionary Patterns ◽

Systematic Analysis ◽

Origin And Evolution ◽

Green Algal

As one of the major groups of the core Chlorophyta (green algae), Chlorophyceae plays an important role in the evolution of plants. As a carrier of amino acids, tRNA plays an indispensable role in life activities. However, the structural variation of chloroplast tRNAs and their evolutionary characteristics in Chlorophyta species have not been well studied. In this study, we analyzed the chloroplast genome tRNAs of 14 species in five categories of green algae. We found that the number of chloroplast tRNAs in Chlorophyceae ranges from 28 to 32, and the length of the gene sequences ranges from 71 nt to 91 nt. There are 23–27 anticodon types of tRNAs, and some tRNAs have missing anticodons that are compensated for by other anticodon types of the same tRNA. In addition, three tRNAs were found to contain introns in the anticodon loop, but these predictions scored poorly and the introns are presumed to be non-functional. Multiple sequence alignment showed that the Ψ-loop is the most conserved structural unit in the tRNA secondary structure, mostly containing the conserved sequence U-U-C-x-A-x-U. The number of transitions in tRNA is higher than the number of transversions. The replication loss analysis indicated that green algal chloroplast tRNAs may have undergone substantial gene loss during the course of evolution. Based on the constructed phylogenetic tree, mutations were found to accompany the evolution of green algal chloroplast tRNAs. Moreover, chloroplast tRNAs of Chlorophyceae are consistent with those of monocotyledons and gymnosperms in terms of evolutionary patterns, sharing a common multi-phylogenetic pattern and rooted in a rich common ancestor. Sequence alignment and systematic analysis of tRNAs in the chloroplast genomes of Chlorophyceae clarified the characteristics and rules of tRNA changes, which will promote understanding of the evolutionary relationships of tRNAs and the origin and evolution of chloroplasts.
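
The transition/transversion comparison mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `count_ts_tv` and the rule of skipping gaps and ambiguous bases are assumptions made for illustration. A transition is a purine-purine or pyrimidine-pyrimidine substitution; a transversion swaps a purine for a pyrimidine or the reverse.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}
BASES = PURINES | PYRIMIDINES


def count_ts_tv(seq_a, seq_b):
    """Count (transitions, transversions) between two aligned sequences.

    Hypothetical helper: T is treated as U, and positions holding a gap
    or an ambiguous base in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a_norm = seq_a.upper().replace("T", "U")
    b_norm = seq_b.upper().replace("T", "U")
    transitions = transversions = 0
    for a, b in zip(a_norm, b_norm):
        if a == b or a not in BASES or b not in BASES:
            continue  # identical, gapped, or ambiguous position
        if (a in PURINES) == (b in PURINES):
            transitions += 1  # same chemical class
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions
```

Applied to every pair of aligned tRNA sequences, the two counts can be summed to compare overall transition and transversion frequencies.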

Promising prospects of nanopore sequencing for algal hologenomics and structural variation discovery

BMC Genomics ◽

10.1186/s12864-019-6248-2 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Thomas Sauvage ◽

William E. Schmidt ◽

Hwan Su Yoon ◽

Valerie J. Paul ◽

Suzanne Fredericq

Keyword(s):

Chloroplast Genome ◽

Structural Variation ◽

Rapid Development ◽

Nuclear Genome ◽

Nanopore Sequencing ◽

Homing Endonucleases ◽

Hybrid Assembly ◽

Bacterial Genomes ◽

Current Output ◽

Green Algal

Background: The MinION Access Program (MAP, 2014–2016) allowed selected users to test the prospects of long nanopore reads for diverse organisms and applications through the rapid development of improving chemistries. In 2014, faced with a fragmented Illumina assembly for the chloroplast genome of the green algal holobiont Caulerpa ashmeadii, we applied to the MAP to test the prospects of nanopore reads to investigate such intricacies, as well as further explore the hologenome of this species with native and hybrid approaches. Results: The chloroplast genome could only be resolved as a circular molecule in nanopore assemblies, which also revealed structural variants (i.e. chloroplast polymorphism or heteroplasmy). Signal and Illumina polishing of nanopore-assembled organelle genomes (chloroplast and mitochondrion) reflected the importance of coverage on final quality and current limitations. In hybrid assembly, our modest nanopore data sets showed encouraging results to improve assembly length, contiguity, repeat content, and binning of the larger nuclear and bacterial genomes. Profiling of the holobiont with nanopore or Illumina data unveiled a dominant Rhodospirillaceae (Alphaproteobacteria) species among six putative endosymbionts. While very fragmented, the cumulative hybrid assembly length of C. ashmeadii’s nuclear genome reached 24.4 Mbp, including 2.1 Mbp in repeat, ranging closely with GenomeScope’s estimate (> 26.3 Mbp, including 4.8 Mbp in repeat). Conclusion: Our findings relying on a very modest number of nanopore R9 reads as compared to current output with newer chemistries demonstrate the promising prospects of the technology for the assembly and profiling of an algal hologenome and resolution of structural variation. The discovery of polymorphic ‘chlorotypes’ in C. ashmeadii, most likely mediated by homing endonucleases and/or retrohoming by reverse transcriptases, represents the first report of chloroplast heteroplasmy in the siphonous green algae. Improving contiguity of C. ashmeadii’s nuclear and bacterial genomes will require deeper nanopore sequencing to greatly increase the coverage of these larger genomic compartments.

Identification of Polycistronic Transcriptional Units and Non-canonical Introns in Green Algal Chloroplasts Based on Long-read RNA Sequencing Data

10.21203/rs.3.rs-114353/v1 ◽

2020 ◽

Author(s):

Xiaoxiao Zou ◽

Heroen Verbruggen ◽

Tianjingwei Li ◽

Jun Zhu ◽

Zuo Chen ◽

...

Keyword(s):

Chloroplast Genome ◽

Green Algae ◽

Higher Plants ◽

Green Algal ◽

Chloroplast Genomes ◽

Caulerpa Lentillifera ◽

Transcriptional Units ◽

Group Ii ◽

Distinct Features ◽

Algal Chloroplast

Background: Chloroplasts are important semi-autonomous organelles in plants and algae. Unlike higher plants, the chloroplast genomes of the green algal lineage have distinct features both in organization and expression. Despite the architecture of the chloroplast genome having been extensively studied in higher plants and several model species of algae, little is known about transcriptional features in green algal lineages. Results: Based on full-length cDNA (Iso-Seq) sequencing, we identified widely co-transcribed polycistronic transcriptional units (PTUs) in the green alga Caulerpa lentillifera. In addition to clusters of genes from the same pathway, we identified a series of PTUs of up to nine genes whose function in the plastid is not understood. The RNA data further allowed us to confirm widespread expression of fragmented genes and conserved open reading frames, which are both important features in green algal chloroplast genomes. In addition, a newly fragmented gene specific to C. lentillifera was discovered, which may represent a recent gene fragmentation event in the chloroplast genome. Using the accurate exon-intron boundary information, gene structural annotation was greatly improved across the siphonous green algae lineages. Our data also revealed a type of non-canonical Group II intron, with a deviant secondary structure and intronic ORFs lacking known splicing or mobility domains. These widespread introns have conserved positions in their genes and are excised precisely despite lacking clear consensus intron boundaries. Conclusion: Our study fills important knowledge gaps in chloroplast genome organization and transcription in green algae, and provides new insights into expression of polycistronic transcripts, freestanding ORFs and fragmented genes in algal chloroplast genomes. Moreover, we revealed an unusual type of Group II intron with distinct features and conserved positions in Bryopsidales. Our data represent interesting additions to knowledge of chloroplast intron structure and highlight clusters of uncharacterized genes that probably play important roles in plastids.

A Comparative Genomic and Phylogenetic Analysis of the Origin and Evolution of the CCN Gene Family

BioMed Research International ◽

10.1155/2019/8620878 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Kuan Hu ◽

Yiming Tao ◽

Juanni Li ◽

Zhuang Liu ◽

Xinyan Zhu ◽

...

Keyword(s):

Phylogenetic Analysis ◽

Gene Family ◽

Evolutionary Relationship ◽

Evolutionary Process ◽

Structural Features ◽

Comparative Genomic ◽

Sequence Alignments ◽

Multiple Sequence ◽

Origin And Evolution ◽

Functional Sites

CCN gene family members have recently been identified as multifunctional regulators involved in diverse biological functions, especially in vascular and skeletal development. In the present study, a comparative genomic and phylogenetic analysis was performed to show the similarities and differences in structure and function of CCNs from different organisms and to reveal their potential evolutionary relationship. First, CCN homologs from different metazoan species were identified. We then performed multiple sequence alignments, MEME analysis, and functional site prediction, which showed highly conserved structural features among metazoan CCNs. A phylogenetic tree was then constructed, which showed that CCNs underwent extensive lineage-specific duplication events and lineage-specific expansion during the evolutionary process. In addition, comparative analysis of the genomic organization and the chromosomal surroundings of CCN genes indicated a clear orthologous relationship among the counterparts in these species. Finally, based on these results, a potential evolutionary scenario was generated to give an overview of the origin and evolution of the CCN gene family.

Performance Evaluation of Leading Protein Multiple Sequence Alignment Methods

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1369.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 771-776

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Evolutionary Relationship ◽

Biological Data ◽

Sequence Alignments ◽

Multiple Sequence ◽

Sequencing Technologies ◽

Benchmark Database ◽

Execution Speed ◽

Protein Multiple Sequence Alignment

Protein multiple sequence alignment (MSA) is the process of aligning more than two protein sequences to establish an evolutionary relationship between them. In protein MSA, the biological sequences are aligned so as to identify maximum similarity. As sequencing technologies become more sophisticated, the volume of biological data generated is increasing at an enormous rate. This growth challenges the existing MSA methods, because computational complexity increases and processing speed decreases as the data volume grows. The accuracy of MSA is also critically important, as many bioinformatics inferences depend on its output. This paper reviews the existing state-of-the-art methods of protein MSA and compares four leading methods, namely MAFFT, Clustal Omega, MUSCLE and ProbCons, in terms of speed and accuracy. BAliBASE version 3.0 (a repository of manually refined multiple sequence alignments) is used as the benchmark database, and the accuracy of the alignment methods is computed through two widely used criteria: the sum-of-pairs score (SPscore) and the total column score (TCscore). We also recorded the execution time of each method in order to compute its execution speed.
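
The two accuracy criteria named in this abstract can be sketched as follows. This is a simplified illustration, not the benchmark's own scoring program: it scores every column of the reference alignment, whereas benchmark tools may restrict scoring to annotated core regions, and the function names are assumptions. The SP score is taken as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment; the TC score is the fraction of reference columns reproduced exactly.

```python
from itertools import combinations


def columns(alignment):
    """Turn each column into a tuple of per-sequence residue indices (None = gap)."""
    counters = [0] * len(alignment)
    cols = []
    for col in zip(*alignment):
        entry = []
        for s, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[s])
                counters[s] += 1
        cols.append(tuple(entry))
    return cols


def aligned_pairs(cols):
    """Collect every pair of residues that share a column."""
    pairs = set()
    for col in cols:
        present = [(s, i) for s, i in enumerate(col) if i is not None]
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(reference, test):
    """Fraction of reference residue pairs that the test alignment also aligns."""
    ref = aligned_pairs(columns(reference))
    return len(ref & aligned_pairs(columns(test))) / len(ref)


def tc_score(reference, test):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = columns(reference)
    test_cols = set(columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)
```

Both functions expect the same sequences in the same order in the reference and test alignments, given as lists of equal-length gapped strings with `-` as the gap character.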

LemK_MSA: A Multiple Sequence Alignment Method with Sequence Vectorization Based on Lempel-Ziv

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.284-287.3203 ◽

2013 ◽

Vol 284-287 ◽

pp. 3203-3207 ◽

Cited By ~ 1

Author(s):

Guo Li Ji ◽

Jing Ci Yao ◽

Zi Jiang Yang ◽

Cong Ting Ye

Keyword(s):

Sequence Alignment ◽

High Throughput ◽

Multiple Sequence Alignment ◽

Clustering Analysis ◽

Large Scale ◽

Evolutionary Relationship ◽

Structural Features ◽

Multiple Sequence ◽

Guide Tree ◽

Mouse Antibody

In this paper, we propose a method for multiple sequence alignment, LemK_MSA, which integrates Lempel-Ziv based sequence vectorization and k-means clustering analysis. LemK_MSA converts multiple sequence alignment into the alignment of corresponding 10-dimensional vectors, using 10 types of copy modes. It then uses the k-means algorithm and the NJ algorithm to divide the sequences into several groups and calculate a guide tree for each group from the vectors of its sequences. A complete guide tree for multiple sequence alignment can be constructed by merging the guide trees of all groups. Thus, the time efficiency of multiple sequence alignment, especially for large-scale sequence sets, can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared to ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times as efficient while maintaining alignment accuracy. LemK_MSA also provides an effective method to analyze the evolutionary relationship and structural features among high-throughput sequences.

Identification of polycistronic transcriptional units and non-canonical introns in green algal chloroplasts based on long-read RNA sequencing data

BMC Genomics ◽

10.1186/s12864-021-07598-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Xiaoxiao Zou ◽

Heroen Verbruggen ◽

Tianjingwei Li ◽

Jun Zhu ◽

Zou Chen ◽

...

Keyword(s):

Chloroplast Genome ◽

Green Algae ◽

Higher Plants ◽

Green Algal ◽

Chloroplast Genomes ◽

Caulerpa Lentillifera ◽

Transcriptional Units ◽

Group Ii ◽

Distinct Features ◽

Algal Chloroplast

Background: Chloroplasts are important semi-autonomous organelles in plants and algae. Unlike higher plants, the chloroplast genomes of the green algal lineage have distinct features both in organization and expression. Despite the architecture of the chloroplast genome having been extensively studied in higher plants and several model species of algae, little is known about the transcriptional features of green algal chloroplast-encoded genes. Results: Based on full-length cDNA (Iso-Seq) sequencing, we identified widely co-transcribed polycistronic transcriptional units (PTUs) in the green alga Caulerpa lentillifera. In addition to clusters of genes from the same pathway, we identified a series of PTUs of up to nine genes whose function in the plastid is not understood. The RNA data further allowed us to confirm widespread expression of fragmented genes and conserved open reading frames, which are both important features in green algal chloroplast genomes. In addition, a newly fragmented gene specific to C. lentillifera was discovered, which may represent a recent gene fragmentation event in the chloroplast genome. With the newly annotated exon-intron boundary information, gene structural annotation was greatly improved across the siphonous green algae lineages. Our data also revealed a type of non-canonical Group II intron, with a deviant secondary structure and intronic ORFs lacking known splicing or mobility domains. These widespread introns have conserved positions in their genes and are excised precisely despite lacking clear consensus intron boundaries. Conclusion: Our study fills important knowledge gaps in chloroplast genome organization and transcription in green algae, and provides new insights into expression of polycistronic transcripts, freestanding ORFs and fragmented genes in algal chloroplast genomes. Moreover, we revealed an unusual type of Group II intron with distinct features and conserved positions in Bryopsidales. Our data represent interesting additions to knowledge of chloroplast intron structure and highlight clusters of uncharacterized genes that probably play important roles in plastids.

Multiple Sequence Alignment and Profile Analysis of Protein Family Using Hidden Markov Model

International Journal of Scientific Research ◽

10.15373/22778179/june2013/66 ◽

2012 ◽

Vol 2 (6) ◽

pp. 208-211

Author(s):

Navjot Kaur ◽

Rajbir Singh Cheema ◽

Harmandeep Singh

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Profile Analysis ◽

Hidden Markov ◽

Protein Family ◽

Multiple Sequence

Faculty Opinions recommendation of MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.731078852.793536612 ◽

2017 ◽

Author(s):

Feng Gao

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Online Service ◽

Multiple Sequence

Computational Analysis of Therapeutic Enzyme Uricase from Different Source Organisms

Current Proteomics ◽

10.2174/1570164616666190617165107 ◽

2020 ◽

Vol 17 (1) ◽

pp. 59-77

Author(s):

Anand Kumar Nelapati ◽

JagadeeshBabu PonnanEttiyappan

Keyword(s):

Uric Acid ◽

Amino Acid ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Amino Acid Sequences ◽

Amino Acid Residues ◽

Multiple Sequence ◽

Physiochemical Properties ◽

Pharmaceutical Industries

Background: Hyperuricemia and gout are conditions that result from the accumulation of uric acid in the blood and urine. Uric acid is the product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that reduces the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are widely available from several sources, such as bacteria, fungi, yeast, plants and animals. Objective: The present study aims to elucidate the structure and physiochemical properties of uricase by in silico analysis. Methods: A total of sixty amino acid sequences of uricase from different sources were obtained from NCBI, and different analyses were performed, including Multiple Sequence Alignment (MSA), homology search, phylogenetic relation, motif search, domain architecture and physiochemical properties including pI, EC, Ai and Ii. Results: Multiple sequence alignment of all the selected protein sequences exhibited distinct differences between bacterial, fungal, plant and animal sources based on the position-specific existence of conserved amino acid residues. The maximum homology of all the selected protein sequences is between 51-388. Within single categories, homology is between 16-337 for bacterial uricase, 14-339 for fungal uricase, 12-317 for plant uricase, and 37-361 for animal uricase. The phylogenetic tree constructed based on the amino acid sequences disclosed clusters indicating that the uricases are from different sources. The physiochemical features revealed that the uricases have between 300 and 338 amino acid residues, with a molecular weight of 33-39 kDa and a theoretical pI ranging from 4.95 to 8.88. The amino acid composition results showed that valine has a high average frequency of 8.79 percent compared to other amino acids in all analyzed species. Conclusion: In the field of bioinformatics, this work might be informative and a stepping-stone for other researchers to get an idea of the physicochemical features, evolutionary history and structural motifs of uricase, which can be widely used in the biotechnological and pharmaceutical industries. Therefore, the proposed in silico analysis can be considered for protein engineering work, as well as for gout therapy.
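
The average amino acid frequency reported in this abstract is a simple composition statistic, which can be sketched as below. The function names and the choice to average per-sequence percentages (rather than pooling all residues) are assumptions for illustration; the abstract does not state which tool or averaging rule the authors used.

```python
from collections import Counter


def composition_percent(protein):
    """Return the percentage of each residue in one protein sequence."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def mean_frequency(proteins, residue):
    """Average the percentage of one residue over several sequences."""
    values = [composition_percent(p).get(residue, 0.0) for p in proteins]
    return sum(values) / len(values)
```

For example, `mean_frequency(sequences, "V")` gives the average valine percentage over a list of protein strings named `sequences` (a hypothetical variable).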

LegumeDB: Development of Legume Medicinal Plant Database and Comparative Molecular Evolutionary Analysis of matK Proteins of Legumes and Mangroves

Current Nutrition & Food Science ◽

10.2174/1573401314666180223143523 ◽

2019 ◽

Vol 15 (4) ◽

pp. 353-362

Author(s):

Sambhaji B. Thakar ◽

Maruti J. Dhanavade ◽

Kailas D. Sonawane

Keyword(s):

Phylogenetic Analysis ◽

Medicinal Plants ◽

Homology Modeling ◽

Sequence Alignment ◽

Vigna Unguiculata ◽

Multiple Sequence Alignment ◽

Legume Species ◽

Mangrove Species ◽

Multiple Sequence ◽

Thespesia Populnea

Background: Legume plants are known for their rich medicinal and nutritional values. A large amount of medicinal information on various legume plants is dispersed in the form of text. Objective: It is essential to design and construct a legume medicinal plants database that integrates the respective classes of legumes and includes knowledge regarding medicinal applications along with their protein/enzyme sequences. Methods: The Legume Medicinal Plants Database (LegumeDB) was designed and developed using Microsoft Structured Query Language Server 2017. The DBMS was used as the back end, ASP.Net was used for front-end operations, and VB.Net was used for coding. Multiple sequence alignment, phylogenetic analysis and homology modeling techniques were also used. Results: This database includes information on 50 legume medicinal species, which might help researchers explore this information. Further, maturase K (matK) protein sequences of legumes and mangroves were retrieved from NCBI for multiple sequence alignment and phylogenetic analysis to understand the evolutionary lineage between legumes and mangroves. Homology modeling was used to determine the three-dimensional structure of matK from a legume species, Vigna unguiculata, using matK of a mangrove species, Thespesia populnea, as a template. The matK sequence analysis results indicate the conserved residues among legume and mangrove species. Conclusion: Phylogenetic analysis revealed closeness between the legume species Vigna unguiculata and the mangrove species Thespesia populnea, indicating their similarity and origin from a common ancestor. Thus, these studies might help to understand the evolutionary relationship between legumes and mangroves. LegumeDB availability: http://legumedatabase.co.in
