Analysis of the genomic basis of functional diversity in dinoflagellates using a transcriptome-based sequence similarity network

Arnaud Meng; Erwan Corre; Ian Probert; Andres Gutierrez-Rodriguez; Raffaele Siano; Anita Annamale; Adriana Alberti; Corinne Da Silva; Patrick Wincker; Stéphane Le Crom; Fabrice Not; Lucie Bittner

doi:10.1111/mec.14579

Analysis of the Genomic Basis of Functional Diversity in Dinoflagellates using a Transcriptome-Based Sequence Similarity Network

doi:10.1101/211243; 2017

Author(s): Arnaud Meng; Erwan Corre; Ian Probert; Andres Gutierrez-Rodriguez; Raffaele Siano; ...

Keyword(s): High Throughput Sequencing; De Novo; Sequence Similarity; Connected Components; Similarity Network; Network Analyses; Comprehensive Picture; Core Proteome; Functional Features; Genomic Basis

Abstract. Dinoflagellates are one of the most abundant and functionally diverse groups of eukaryotes. Despite an overall scarcity of genomic information for dinoflagellates, constantly emerging high-throughput sequencing resources can be used to characterize and compare these organisms. We assembled de novo and processed 46 dinoflagellate transcriptomes and used a sequence similarity network (SSN) to compare the underlying genomic basis of functional features within the group. This approach constitutes the most comprehensive picture to date of the genomic potential of dinoflagellates. A core proteome composed of 252 connected components (CCs) of putative conserved protein domains (pCDs) was identified. Of these, 206 were novel and 16 lacked any functional annotation in public databases. Integration of functional information in our network analyses allowed investigation of pCDs specifically associated with functional traits. With respect to toxicity, sequences homologous to those of proteins involved in toxin biosynthesis pathways (e.g. sxtA1-4 and sxtG) were not specific to known toxin-producing species. Although not fully specific to symbiosis, the most represented functions associated with proteins involved in the symbiotic trait were related to membrane processes and ion transport. Overall, our SSN approach identified 45,207 and 90,794 specific and constitutive pCDs for the toxic and symbiotic species, respectively, represented in our analyses. Of these, 56% and 57% respectively (i.e. 25,393 and 52,193 pCDs) completely lacked annotation in public databases. This stresses the extent of our lack of knowledge, while emphasizing the potential of SSNs to identify candidate pCDs for further functional genomic characterization.
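
The network idea in this abstract can be illustrated with a minimal sketch: sequences are nodes, sufficiently similar pairs are edges, and connected components (CCs) are read off the graph. Everything specific here is an assumption made for illustration, not the authors' pipeline: the hit format `(seq_a, seq_b, percent_identity)`, the 80% identity cutoff, the toy sequence and transcriptome names, and the simplified definition of a "core" CC as one containing sequences from every transcriptome.

```python
from collections import defaultdict


def build_ssn_edges(hits, min_identity=80.0):
    """Keep pairwise hits at or above an (assumed) identity cutoff as SSN edges."""
    return [(a, b) for a, b, identity in hits if a != b and identity >= min_identity]


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


def core_components(components, source_of, all_sources):
    """Return components that contain at least one sequence from every source."""
    wanted = set(all_sources)
    return [cc for cc in components if {source_of[s] for s in cc} == wanted]


# Toy data (hypothetical): six sequences from three transcriptomes T1..T3.
source_of = {"s1": "T1", "s2": "T2", "s3": "T3", "s4": "T1", "s5": "T2", "s6": "T3"}
hits = [("s1", "s2", 95.0), ("s2", "s3", 88.0), ("s4", "s5", 91.0), ("s5", "s6", 40.0)]

edges = build_ssn_edges(hits)
components = connected_components(list(source_of), edges)
core = core_components(components, source_of, {"T1", "T2", "T3"})
```

In this toy network the weak `s5`/`s6` hit is filtered out, so only the component holding `s1`, `s2` and `s3` spans all three transcriptomes and counts as core under the assumed definition.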

Siblings or doppelgängers? Deciphering the evolution of structured cis-regulatory RNAs beyond homology

Biochemical Society Transactions; doi:10.1042/bst20191060; 2020; Vol 48 (5); pp. 1941-1951

Author(s): Elizabeth C. Gray; Daniel M. Beringer; Michelle M. Meyer

Keyword(s): Functional Diversity; Sequence Similarity; Functional Validation; RNA Sequences; Full Spectrum; Regulatory Molecule; mRNA Transcripts; Domains Of Life; Regulatory RNAs

Structured cis-regulatory RNAs have evolved across all domains of life, highlighting the utility and plasticity of RNA as a regulatory molecule. Homologous RNA sequences and structures often have similar functions, but homology may also be deceiving. The challenges that derive from trying to assign function to structure and vice versa are not trivial. Bacterial riboswitches, viral and eukaryotic IRESes, CITEs, and 3′ UTR elements employ an array of mechanisms to exert their effects. Bioinformatic searches coupled with biochemical and functional validation have elucidated some shared and many unique ways cis-regulators are employed in mRNA transcripts. As cis-regulatory RNAs are resolved in greater detail, it is increasingly apparent that shared homology can mask the full spectrum of mRNA cis-regulator functional diversity. Furthermore, similar functions may be obscured by lack of obvious sequence similarity. Thus looking beyond homology is crucial for furthering our understanding of RNA-based regulation.

Visualizing the superfamily of metallo-β-lactamases through sequence similarity network neighborhood connectivity analysis

doi:10.1101/2020.04.16.045138; 2020

Author(s): Javier M. González

Keyword(s): Redox Reactions; Sequence Similarity; Phylogenetic Reconstruction; Transition Metal Ions; Sequence Annotation; Connectivity Analysis; Topological Properties; Structural Homology; Similarity Network; Domains Of Life

Abstract. The superfamily of metallo-β-lactamases (MBL) comprises an ancient group of proteins found in all domains of life, sharing a characteristic αββα fold and a histidine-rich motif for binding of transition metal ions, with the ability to catalyze a variety of hydrolysis and redox reactions. Herein, structural homology and sequence similarity network (SSN) analysis are used to assist the phylogenetic reconstruction of the MBL superfamily, introducing tanglegrams to evaluate structure-function relationships. SSN neighborhood connectivity is applied for spotting protein families within SSN clusters, showing that 98% of the superfamily remains to be explored experimentally. Further research on SSNs is suggested to determine their topological properties, which will be instrumental for the improvement of automated sequence annotation methods.
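
As a rough illustration of the neighborhood connectivity measure mentioned above, the sketch below uses the common graph-analysis definition: the neighborhood connectivity of a node is the average degree of its neighbors. Whether the paper uses exactly this definition is an assumption, and the edge list is toy data.

```python
from collections import defaultdict


def neighborhood_connectivity(edges):
    """Map each node to the mean degree of its neighbors (undirected graph)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    return {
        node: sum(degree[m] for m in neighbors) / len(neighbors)
        for node, neighbors in adjacency.items()
    }


# Toy SSN (hypothetical): a triangle A-B-C with a pendant node D attached to C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
nc = neighborhood_connectivity(edges)
```

Nodes inside a densely linked family tend to share similar values, while a node bridging two families usually stands out, which is one way such a measure could help separate families inside a single SSN cluster.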

Discovery of novel glycoside hydrolases from C-glycoside-degrading bacteria using sequence similarity network analysis

The Journal of Microbiology; doi:10.1007/s12275-021-1292-4; 2021; Vol 59 (10); pp. 931-940

Author(s): Bin Wei; Ya-Kun Wang; Jin-Biao Yu; Si-Jia Wang; Yan-Lei Yu; ...

Keyword(s): Network Analysis; Sequence Similarity; Glycoside Hydrolases; Similarity Network; Degrading Bacteria

Sequence similarity network analysis, crystallization, and X-ray crystallographic analysis of the lactate metabolism regulator LldR from Pseudomonas aeruginosa

Bioresources and Bioprocessing; doi:10.1186/s40643-016-0109-5; 2016; Vol 3 (1)

Author(s): Bo Xin; Geng Wu; Kunzhi Zhang; Yongxing He; Hongzhi Tang; ...

Keyword(s): Pseudomonas Aeruginosa; Network Analysis; Sequence Similarity; Crystallographic Analysis; Lactate Metabolism; Similarity Network; X-Ray

Clustering and Visualizing of Membrane Proteins Sequence Similarity Network

Biophysical Journal; doi:10.1016/j.bpj.2015.11.2648; 2016; Vol 110 (3); pp. 495a

Author(s): Geng-Ming Hu; Te-Lun Mai; Chi-Ming Chen

Keyword(s): Membrane Proteins; Sequence Similarity; Similarity Network

Sequence Similarity Network Reveals Common Ancestry of Multidomain Proteins

PLoS Computational Biology; doi:10.1371/journal.pcbi.1000063; 2008; Vol 4 (5); pp. e1000063; Cited By ~ 45

Author(s): Nan Song; Jacob M. Joseph; George B. Davis; Dannie Durand

Keyword(s): Sequence Similarity; Common Ancestry; Multidomain Proteins; Similarity Network

Sequence similarity network reveals the imprints of major diversification events in the evolution of microbial life

Frontiers in Ecology and Evolution; doi:10.3389/fevo.2014.00072; 2014; Vol 2; Cited By ~ 13

Author(s): Shu Cheng; Slim Karkar; Eric Bapteste; Nathan Yee; Paul Falkowski; ...

Keyword(s): Sequence Similarity; Similarity Network; Microbial Life

Glycoconjugate pathway connections revealed by sequence similarity network analysis of the monotopic phosphoglycosyl transferases

Proceedings of the National Academy of Sciences; doi:10.1073/pnas.2018289118; 2021; Vol 118 (4); pp. e2018289118

Author(s): Katherine H. O’Toole; Barbara Imperiali; Karen N. Allen

Keyword(s): Catalytic Domain; Sequence Similarity; Phylogenetic Reconstruction; Biosynthetic Pathways; En Bloc; Similarity Network; Regulatory Domains; The Family; Membrane Interfaces; Domains Of Life

The monotopic phosphoglycosyl transferase (monoPGT) superfamily comprises over 38,000 nonredundant sequences represented in bacterial and archaeal domains of life. Members of the superfamily catalyze the first membrane-committed step in en bloc oligosaccharide biosynthetic pathways, transferring a phosphosugar from a soluble nucleoside diphosphosugar to a membrane-resident polyprenol phosphate. The singularity of the monoPGT fold and its employment in the pivotal first membrane-committed step allow confident assignment of both protein and corresponding pathway. The diversity of the family is revealed by the generation and analysis of a sequence similarity network for the superfamily, with fusion of monoPGTs with other pathway members being the most frequent and extensive elaboration. Three common fusions were identified: sugar-modifying enzymes, glycosyl transferases, and regulatory domains. Additionally, unexpected fusions of the monoPGT with members of the polytopic PGT superfamily were discovered, implying a possible evolutionary link through the shared polyprenol phosphate substrate. Notably, a phylogenetic reconstruction of the monoPGT superfamily shows a radial burst of functionalization, with a minority of members comprising only the minimal PGT catalytic domain. The commonality and identity of the fusion partners in the monoPGT superfamily are consistent with advantageous colocalization of pathway members at membrane interfaces.

Predicting virus-host association by Kernelized logistic matrix factorization and similarity network fusion

BMC Bioinformatics; doi:10.1186/s12859-019-3082-0; 2019; Vol 20 (S16); Cited By ~ 2

Author(s): Dan Liu; Yingjun Ma; Xingpeng Jiang; Tingting He

Keyword(s): Matrix Factorization; Sequence Similarity; Rapid Development; Frequency Measurement; Host Association; Similarity Network; Oligonucleotide Frequency; Proposed Model; Better Than; Host Associations

Abstract. Background: Viruses are closely associated with bacteria and human diseases. Predicting associations between viruses and hosts is of great significance for understanding the dynamics and complex functional networks in microbial communities. With the rapid development of metagenomic sequencing, some methods based on sequence similarity and genomic homology have been used to predict associations between viruses and hosts. However, these methods ignore the known virus-host association network. Results: We propose ILMF-VH, a kernelized logistic matrix factorization method that integrates different sources of information to predict potential virus-host associations on a heterogeneous network, which is constructed by connecting a virus network with a host network based on known virus-host associations. The virus network is constructed based on oligonucleotide frequency measurement, and the host network is constructed by integrating oligonucleotide frequency similarity and Gaussian interaction profile kernel similarity through similarity network fusion. The host prediction accuracy of our method is better than that of other methods. In addition, case studies show that the host of crAssphage predicted by ILMF-VH is consistent with the presumed host in previous studies, and another potential host, Escherichia coli, is also predicted. Conclusions: The proposed model is an effective computational tool for predicting interactions between viruses and hosts, and it has great potential for discovering novel hosts of viruses.
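
Two of the similarity ingredients named in this abstract can be sketched in a few lines. The code below is not the ILMF-VH implementation. It assumes a relative k-mer frequency vector as the oligonucleotide frequency measurement, and the commonly used Gaussian interaction profile (GIP) kernel form K(i, j) = exp(-γ‖y_i − y_j‖²), with γ set to a chosen constant divided by the mean squared profile norm. The function names, the choice k = 2, the bandwidth constant of 1, and the toy association profiles are all hypothetical.

```python
import math
from itertools import product


def kmer_frequencies(seq, k=2):
    """Relative frequency of every DNA k-mer, in lexicographic order (AA, AC, ...)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between binary association profiles."""
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    # Bandwidth normalised by the mean squared norm; 0 if all profiles are empty.
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm else 0.0
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Toy data (hypothetical): rows are hosts, columns are known virus associations.
host_profiles = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
K = gip_kernel(host_profiles)
freqs = kmer_frequencies("ACGTACGT", k=2)
```

Hosts with identical known-association profiles get a kernel value of 1, and the value decays as profiles diverge. Combining such a kernel with a frequency-based similarity through similarity network fusion, as the abstract describes, is a separate step not shown here.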
