Computational Methods for Predicting Functions at the mRNA Isoform Level

2020 ◽  
Vol 21 (16) ◽  
pp. 5686
Author(s):  
Sambit K. Mishra ◽  
Viraj Muthye ◽  
Gaurav Kandoi

Multiple mRNA isoforms of the same gene are produced via alternative splicing, a biological mechanism that regulates protein diversity while maintaining genome size. Alternatively spliced mRNA isoforms of the same gene may sometimes have very similar sequences, but they can have significantly diverse effects on cellular function and regulation. The products of alternative splicing have important and diverse functional roles, such as response to environmental stress, regulation of gene expression, and involvement in human heritable diseases and plant diseases. The mRNA isoforms of the same gene can have dramatically different functions. Despite the functional importance of mRNA isoforms, very little has been done to annotate their functions. Recent years have, however, seen the development of several computational methods aimed at predicting biological functions at the mRNA isoform level. These methods use a wide array of proteo-genomic data to develop machine learning-based mRNA isoform function prediction tools. In this review, we discuss the computational methods developed for predicting the biological function at the individual mRNA isoform level.

Author(s):  
Sambit Kumar Mishra ◽  
Viraj Muthye ◽  
Gaurav Kandoi

Multiple mRNA isoforms of the same gene are produced via alternative splicing, a biological mechanism that regulates protein diversity while maintaining genome size. Alternatively spliced mRNA isoforms of the same gene may sometimes have very similar sequences, but they can have significantly diverse effects on cellular function and regulation. The products of alternative splicing have important and diverse functional roles, such as response to environmental stress, regulation of gene expression, and involvement in human heritable diseases and plant diseases. The mRNA isoforms of the same gene, such as the apoptosis-associated CASP3 gene, can have dramatically different functions. The shorter mRNA isoform product CASP3-S inhibits apoptosis, while the longer CASP3-L mRNA isoform promotes apoptosis. Despite the functional importance of mRNA isoforms, very little has been done to annotate their functions. Recent years have, however, seen the development of several computational methods aimed at predicting biological functions at the mRNA isoform level. These methods use a wide array of proteo-genomic data to develop machine learning-based mRNA isoform function prediction tools. In this review, we discuss the computational methods developed for predicting the biological function at the individual mRNA isoform level.


2019 ◽  
Author(s):  
Gaurav Kandoi ◽  
Julie A. Dickerson

Abstract Alternative Splicing produces multiple mRNA isoforms of genes which have important diverse roles such as regulation of gene expression, human heritable diseases, and response to environmental stresses. However, little has been done to assign functions at the mRNA isoform level. Functional networks, where the interactions are quantified by their probability of being involved in the same biological process, are typically generated at the gene level. We use a diverse array of tissue-specific RNA-seq datasets and sequence information to train random forest models that predict the functional networks. Since there is no mRNA isoform-level gold standard, we use single isoform genes co-annotated to Gene Ontology biological process annotations, Kyoto Encyclopedia of Genes and Genomes pathways, BioCyc pathways and protein-protein interactions as functionally related (positive pair). To generate the non-functional pairs (negative pair), we use the Gene Ontology annotations tagged with the “NOT” qualifier. We describe 17 Tissue-spEcific mrNa iSoform functIOnal Networks (TENSION) following a leave-one-tissue-out strategy in addition to an organism level reference functional network for mouse. We validate our predictions by comparing their performance with previous methods, randomized positive and negative class labels, updated Gene Ontology annotations, and by literature evidence. We demonstrate the ability of our networks to reveal tissue-specific functional differences of the isoforms of the same genes.


2021 ◽  
Vol 12 ◽  
Author(s):  
Ting-Lin Pang ◽  
Zhan Ding ◽  
Shao-Bo Liang ◽  
Liang Li ◽  
Bei Zhang ◽  
...  

Interrupted exons in the pre-mRNA transcripts are ligated together through RNA splicing, which plays a critical role in the regulation of gene expression. Exons with a length ≤ 30 nt are defined as microexons that are unique in identification. However, microexons, especially those shorter than 8 nt, have not been well studied in many organisms due to difficulties in mapping short segments from sequencing reads. Here, we analyzed mRNA-seq data from a variety of Drosophila samples with a newly developed bioinformatic tool, ce-TopHat. In addition to those annotated in FlyBase, 465 new microexons were identified. Differentially alternatively spliced (AS) microexons were investigated between the Drosophila tissues (head, body, and gonad) and genders. Most of the AS microexons were found in the head and two AS microexons were identified in the sex-determination pathway gene fruitless.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Gaurav Kandoi ◽  
Julie A. Dickerson

Abstract Alternative Splicing produces multiple mRNA isoforms of genes which have important diverse roles such as regulation of gene expression, human heritable diseases, and response to environmental stresses. However, little has been done to assign functions at the mRNA isoform level. Functional networks, where the interactions are quantified by their probability of being involved in the same biological process, are typically generated at the gene level. We use a diverse array of tissue-specific RNA-seq datasets and sequence information to train random forest models that predict the functional networks. Since there is no mRNA isoform-level gold standard, we use single isoform genes co-annotated to Gene Ontology biological process annotations, Kyoto Encyclopedia of Genes and Genomes pathways, BioCyc pathways and protein-protein interactions as functionally related (positive pair). To generate the non-functional pairs (negative pair), we use the Gene Ontology annotations tagged with the “NOT” qualifier. We describe 17 Tissue-spEcific mrNa iSoform functIOnal Networks (TENSION) following a leave-one-tissue-out strategy in addition to an organism level reference functional network for mouse. We validate our predictions by comparing their performance with previous methods, randomized positive and negative class labels, updated Gene Ontology annotations, and by literature evidence. We demonstrate the ability of our networks to reveal tissue-specific functional differences of the isoforms of the same genes. All scripts and data from TENSION are available at: 10.25380/iastate.c.4275191.
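
The labeling idea in the abstract above (co-annotated single-isoform genes as positive pairs, "NOT"-qualified annotations as the source of negative pairs) might be sketched roughly as follows. This is an illustrative sketch only, not the TENSION code: the function name `build_labeled_pairs`, the gene and term identifiers, and in particular the rule that a pair is negative when one gene is annotated to a term and the other carries a "NOT" annotation for that same term are all assumptions made for the example.

```python
from itertools import combinations


def build_labeled_pairs(single_isoform_genes, annotations, not_annotations):
    """Label pairs of single-isoform genes as 1 (functional) or 0 (non-functional).

    annotations:     gene -> set of terms the gene is annotated to
    not_annotations: gene -> set of terms annotated with a "NOT" qualifier
    Pairs with no shared evidence either way are left unlabeled.
    """
    pairs = {}
    for a, b in combinations(sorted(single_isoform_genes), 2):
        pos_a = annotations.get(a, set())
        pos_b = annotations.get(b, set())
        neg_a = not_annotations.get(a, set())
        neg_b = not_annotations.get(b, set())
        # ASSUMPTION: negative if one gene has term X and the other has NOT X.
        if (pos_a & neg_b) or (pos_b & neg_a):
            pairs[(a, b)] = 0
        # Positive if the two genes share at least one annotation term.
        elif pos_a & pos_b:
            pairs[(a, b)] = 1
    return pairs


# Hypothetical toy data; the identifiers are placeholders, not real annotations.
genes = {"G1", "G2", "G3", "G4"}
annotations = {"G1": {"GO:A"}, "G2": {"GO:A"}, "G3": {"GO:B"}}
not_annotations = {"G4": {"GO:A"}}

pairs = build_labeled_pairs(genes, annotations, not_annotations)
for pair, label in sorted(pairs.items()):
    print(pair, label)
```

Labeled pairs of this kind could then be turned into feature vectors (for example from expression and sequence data) and passed to a classifier such as a random forest, which is the model family the abstract names.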


1999 ◽  
Vol 112 (10) ◽  
pp. 1449-1453
Author(s):  
T. Kumazaki ◽  
Y. Mitsui ◽  
K. Hamada ◽  
H. Sumida ◽  
M. Nishiyama

Pre-fibronectin mRNA is subject to alternative splicing at three sites, EDA, EDB and IIICS. We analyzed the alternative splicing of fibronectin mRNA in a single cell. Reverse transcription-polymerase chain reaction analyses showed cells that produced a single form of mRNA at each one of these sites as well as cells that produced multiple forms at a given site: for example, some cells produced either the EDA(+) or EDA(-) form of the mRNA and other cells produced both forms. About 80% of the cells produced both (+) and (-) forms of the mRNA at the EDA and EDB sites, and the remaining cells contained either the (+) or (-) form. Five forms of fibronectin mRNA can result from alternative splicing at the IIICS site. Complex combinations of alternative splicing products were observed among the individual cells: there were ten different combinations of mRNA isoforms with respect to the IIICS site. Statistically significant changes in alternative splicing at the IIICS site were observed during cellular senescence.


2014 ◽  
Vol 42 (4) ◽  
pp. 1196-1205 ◽  
Author(s):  
Christopher R. Sibley

Alternative splicing is universally accredited for expanding the information encoded within the transcriptome. In recent years, several tightly regulated alternative splicing events have been reported which do not lead to generation of protein products, but lead to unstable mRNA isoforms. Instead these transcripts are targets for NMD (nonsense-mediated decay) or retained in the nucleus and degraded. In the present review I discuss the regulation of these events, and how many have been implicated in control of gene expression that is instrumental to a number of developmental paradigms. I further discuss their relevance to disease settings and conclude by highlighting technologies that will aid identification of more candidate events in future.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Wenguang Cao ◽  
Biting Cao ◽  
Xuan Wang ◽  
Jinjuan Bai ◽  
Yong-Zhen Xu ◽  
...  

Abstract The curd of cauliflower (Brassica oleracea L. var. botrytis) is a modified inflorescence that is consumed as a vegetable. Curd formation is proposed to be due to a mutation in the BobCAULIFLOWER (BobCAL) gene, but the genetic relationship between BobCAL variation and curd morphotypes remains obscure. To address this question, we collected and classified a collection of 78 cauliflower accessions into four subpopulations according to curd surface features: smooth, coarse, granular, and hairy curd morphotypes. Through the cDNA sequencing of BobCAL alleles, we showed that smooth and coarse accessions characterized by inflorescence meristem arrest presented a strong association with the 451T SNP (BobCAL_T), whereas granular and hairy accessions marked with floral organ arrest presented an association with 451G (BobCAL_G). Interestingly, all BobCAL alleles were alternatively spliced, resulting in a total of four alternative splice (AS) variants due to the retention of the fourth and/or seventh introns. Among accessions with BobCAL_G alleles, the total expression of all these AS variants in granular plants was almost equal to that in hairy plants; however, the expression of the individual AS variants encoding intact proteins relative to those encoding truncated proteins differed. Hairy accessions showed relatively high expression of the individual variants encoding intact proteins, whereas granular accessions displayed relatively low expression. In smooth cauliflower, the overexpression of the BobCAL_Ga variant caused an alteration in the curd morphotype from smooth to hairy, concurrent with an increase in the expression levels of downstream floral identity genes. These results reveal that alternative splicing of BobCAL transcripts is involved in the determination of cauliflower curd morphotypes.


2001 ◽  
Vol 21 (15) ◽  
pp. 4985-4995 ◽  
Author(s):  
Caroline A. Spike ◽  
Jocelyn E. Shaw ◽  
Robert K. Herman

ABSTRACT Mutations in the smu-1 gene of Caenorhabditis elegans were previously shown to suppress mutations in the genes mec-8 and unc-52. mec-8 encodes a putative RNA binding protein that affects the accumulation of specific alternatively spliced mRNA isoforms produced by unc-52 and other genes. unc-52 encodes a set of basement membrane proteins, homologs of mammalian perlecan, that are important for body wall muscle assembly and attachment to basement membrane, hypodermis, and cuticle. We show that a presumptive null mutation in smu-1 suppresses nonsense mutations in exon 17 but not exon 18 of unc-52 and enhances the phenotype conferred by an unc-52 splice site mutation in intron 16. We have used reverse transcription-PCR and RNase protection to show that loss-of-function smu-1 mutations enhance accumulation in larvae of an alternatively spliced isoform that skips exon 17 but not exon 18 of unc-52. We have identified smu-1 molecularly; it encodes a nuclearly localized protein that contains five WD motifs and is ubiquitously expressed. The SMU-1 amino acid sequence is more than 60% identical to a predicted human protein of unknown function. We propose that smu-1 encodes a trans-acting factor that regulates the alternative splicing of the pre-mRNA of unc-52 and other genes.


2020 ◽  
Vol 477 (16) ◽  
pp. 3091-3104 ◽  
Author(s):  
Luciana E. Giono ◽  
Alberto R. Kornblihtt

Gene expression is an intricately regulated process that is at the basis of cell differentiation, the maintenance of cell identity and the cellular responses to environmental changes. Alternative splicing, the process by which multiple functionally distinct transcripts are generated from a single gene, is one of the main mechanisms that contribute to expand the coding capacity of genomes and help explain the level of complexity achieved by higher organisms. Eukaryotic transcription is subject to multiple layers of regulation both intrinsic — such as promoter structure — and dynamic, allowing the cell to respond to internal and external signals. Similarly, alternative splicing choices are affected by all of these aspects, mainly through the regulation of transcription elongation, making it a regulatory knob on a par with the regulation of gene expression levels. This review aims to recapitulate some of the history and stepping-stones that led to the paradigms held today about transcription and splicing regulation, with major focus on transcription elongation and its effect on alternative splicing.


2019 ◽  
Vol 14 (6) ◽  
pp. 470-479 ◽  
Author(s):  
Nazia Parveen ◽  
Amen Shamim ◽  
Seunghee Cho ◽  
Kyeong Kyu Kim

Background: Although most nucleotides in the genome form canonical double-stranded B-DNA, many repeated sequences transiently present as non-canonical conformations (non-B DNA) such as triplexes, quadruplexes, Z-DNA, cruciforms, and slipped/hairpins. Those noncanonical DNAs (ncDNAs) are not only associated with many genetic events such as replication, transcription, and recombination, but are also related to the genetic instability that results in the predisposition to disease. Due to the crucial roles of ncDNAs in cellular and genetic functions, various computational methods have been implemented to predict sequence motifs that generate ncDNA. Objective: Here, we review strategies for the identification of ncDNA motifs across the whole genome, which is necessary for further understanding and investigation of the structure and function of ncDNAs. Conclusion: There is a great demand for computational prediction of non-canonical DNAs that play key functional roles in gene expression and genome biology. In this study, we review the currently available computational methods for predicting the non-canonical DNAs in the genome. Current studies not only provide an insight into the computational methods for predicting the secondary structures of DNA but also increase our understanding of the roles of non-canonical DNA in the genome.

