Improved Identification of Small Open Reading Frames Encoded Peptides by Top-Down Proteomic Approaches and De Novo Sequencing

2021, Vol. 22 (11), pp. 5476
Authors: Bing Wang, Zhiwei Wang, Ni Pan, Jiangmei Huang, Cuihong Wan

Small open reading frames (sORFs) can be translated into peptides that play essential roles in various biological processes. Nevertheless, many sORF-encoded peptides (SEPs) remain only predicted. Here, we present a strategy that combines top-down proteomics and de novo sequencing to improve SEP identification and sequence coverage. With de novo sequencing, we identified 1682 peptides mapping to 2544 human sORFs, all of which are characterized here for the first time. Two-thirds of these new sORFs have reading frame shifts and use a non-ATG start codon. The top-down approach identified 241 human SEPs with high sequence coverage. The average peptide length was 19 amino acids (AA) from the bottom-up database search, 9 AA from de novo sequencing, and 25 AA from the top-down approach. Longer peptides increase sequence coverage and therefore distinguish SEPs from known gene coding sequences more efficiently. Top-down also has the advantage of identifying peptides with consecutive K/R residues or high K/R content, which are unfavorable for the bottom-up approach. Our method can uncover new coding sORFs and obtain highly accurate SEP sequences, which can also benefit future functional studies.
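Why K/R-rich peptides are hard to see bottom-up can be illustrated with an in-silico trypsin digest. The sketch below assumes the common simplified trypsin rule (cut after K or R unless the next residue is P) and an illustrative 7 to 30 residue "detectable" window; both the window and the helper names `tryptic_digest` and `detectable` are hypothetical choices for this example, not parameters from the paper. It needs Python 3.9+ for the `list[str]` hints.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """In-silico trypsin digest: cut after K or R unless the next residue is P."""
    # Zero-width split points (re.split supports these since Python 3.7):
    # the lookbehind requires a preceding K/R, the lookahead forbids a following P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    # A C-terminal K/R produces an empty trailing piece; drop it.
    return [p for p in pieces if p]


def detectable(peptides: list[str], min_len: int = 7, max_len: int = 30) -> list[str]:
    """Keep peptides inside an assumed, illustrative length window for bottom-up MS."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    # Hypothetical sequences, not taken from any of the studies listed here.
    kr_rich = "MKRRKSRKPLRRGKKA"
    kr_poor = "MSTEVLAGDQNFTPLWEAVHSK"
    for name, seq in [("K/R-rich", kr_rich), ("K/R-poor", kr_poor)]:
        peps = tryptic_digest(seq)
        print(name, peps, "->", detectable(peps))
```

Under these assumptions the K/R-rich sequence falls apart into fragments of at most four residues, none of which survive the window, while the K/R-poor sequence stays intact as one 22-residue peptide. Measuring the intact peptide, as top-down does, sidesteps this fragmentation.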

PROTEOMICS, 2017, Vol. 17 (23-24), pp. 1600321
Authors: Kira Vyatkina, Lennard J. M. Dekker, Si Wu, Martijn M. VanDuijn, Xiaowen Liu, ...

Authors: Ni Pan, Zhiwei Wang, Bing Wang, Jian Wan, Cuihong Wan

Small open reading frame encoded peptides (SEPs), also called microproteins, play vital roles in biological processes. Many of their open reading frames are located within non-coding RNAs (ncRNAs). Recent research has shown that ncRNA-encoded polypeptides have essential functions and are ubiquitous across tissues. To better understand the roles of microproteins, especially ncRNA-encoded ones, in different tissues, we characterized the proteomes of five mouse tissues by mass spectrometry using bottom-up, top-down, and de novo sequencing strategies. Database-dependent bottom-up and top-down searches identified 811 microproteins in the OpenProt database. De novo sequencing identified 290 microproteins, including 12 ncRNA-encoded microproteins not found in current databases. In total, we discovered 1,074 microproteins, including 270 encoded by ncRNAs. Annotation of these microproteins showed that the brain contains the largest number of neuropeptides, while the spleen contains the most immune-associated microproteins, suggesting that microproteins have tissue-specific functions. The unannotated ncRNA-encoded microproteins carry predicted domains, such as the macrophage migration inhibitory factor domain and the Prefoldin domain. These results expand the mouse proteome and provide insight into the molecular biology of mouse tissues.


2019
Authors: Thomas F. Martinez, Qian Chu, Cynthia Donaldson, Dan Tan, Maxim N. Shokhirev, ...

Protein-coding small open reading frames (smORFs) are emerging as an important class of genes; however, the coding capacity of smORFs in the human genome is unclear. By integrating de novo transcriptome assembly and Ribo-Seq, we confidently annotate thousands of novel translated smORFs in three human cell lines. We find that translation prediction is noisier for smORFs than for annotated coding sequences, underscoring the importance of analyzing multiple experiments and footprinting conditions. These smORFs are located within non-coding and antisense transcripts, the UTRs of mRNAs, and unannotated transcripts. Analysis of RNA levels and translation efficiency during cellular stress identifies regulated smORFs, providing a way to prioritize smORFs for further investigation. Sequence conservation and signatures of positive selection indicate that the encoded microproteins are likely functional. Additionally, proteomics data from enriched human leukocyte antigen complexes validate the translation of hundreds of smORFs and position them as a source of novel antigens. Thus, smORFs represent a significant number of important, yet unexplored, human genes.


2020
Authors: Justin A. Bosch, Berrak Ugur, Israel Pichardo-Casas, Jorden Rabasco, Felipe Escobedo, ...

Naturally produced peptides (<100 amino acids) are important regulators of physiology, development, and metabolism. Recent studies have predicted that thousands of peptides may be translated from transcripts containing small open reading frames (smORFs). Here, we describe two previously uncharacterized peptides in Drosophila, Sloth1 and Sloth2, encoded by conserved smORFs. These peptides are translated from the same bicistronic transcript and share sequence similarity, suggesting that they are paralogs. We provide evidence that Sloth1/2 are highly expressed in neurons, localize to mitochondria, and form a complex. Double-mutant analysis in animals and cell culture revealed that sloth1 and sloth2 are not functionally redundant, and their loss causes animal lethality, reduced neuronal function, impaired mitochondrial function, and neurodegeneration. These results suggest that phenotypic analysis of smORF genes in Drosophila can provide a wealth of information on the biological functions of this poorly characterized class of genes.


2020, Vol. 6 (4), pp. 41
Authors: Mihnea P. Dragomir, Ganiraju C. Manyam, Leonie Florence Ott, Léa Berland, Erik Knutsen, ...

Non-coding RNAs (ncRNAs) are essential players in many cellular processes, from normal development to oncogenic transformation. Initially, ncRNAs were defined as transcripts that lack an open reading frame (ORF). However, multiple lines of evidence suggest that certain ncRNAs encode small peptides of fewer than 100 amino acids. The sequences encoding these peptides are known as small open reading frames (smORFs); many initiate with the canonical AUG start codon but terminate with atypical stop codons, suggesting a different biogenesis. These ncRNA-encoded peptides (ncPEPs) are increasingly recognized as a new class of functional molecules that contribute to diverse cellular processes and are deregulated in various diseases, contributing to pathogenesis. Because multiple publications have identified unique ncPEPs, we saw the need for a new web resource that gathers information about these functional ncPEPs. We developed FuncPEP, a database of functional ncRNA-encoded peptides that contains all experimentally validated and functionally characterized ncPEPs. Currently, FuncPEP includes a comprehensive annotation of 112 functional ncPEPs, along with specific details of the ncRNA transcripts that encode them. We believe that FuncPEP will serve as a platform for further deciphering the biological significance and medical use of ncPEPs. The link to the FuncPEP database is given at the end of the Introduction section.
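To make the "fewer than 100 amino acids" definition concrete, here is a minimal smORF scanner sketch. It assumes the standard genetic code with the three canonical stop codons (so it does not model the atypical terminations mentioned above), scans only the three forward frames (antisense smORFs would need the reverse complement as well), and uses a hypothetical lower bound of 10 residues. The `start_codons` parameter lets you admit non-ATG starts such as CTG, in the spirit of the non-ATG sORFs reported in the first abstract; the function name and defaults are illustrative assumptions, not any published pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # canonical stops of the standard genetic code


def find_sorfs(dna: str, start_codons=("ATG",), min_aa: int = 10, max_aa: int = 100):
    """Scan the three forward reading frames for small ORFs (hypothetical helper).

    Returns (frame, start, end, aa_len) tuples. `end` is exclusive and includes
    the stop codon; `aa_len` counts residues encoded before the stop.
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon in start_codons:
                    start = i  # first in-frame start codon opens a candidate ORF
            elif codon in STOP_CODONS:
                aa_len = (i - start) // 3
                if min_aa <= aa_len < max_aa:  # "fewer than 100 amino acids"
                    hits.append((frame, start, i + 3, aa_len))
                start = None  # resume looking for the next start in this frame
    return hits


if __name__ == "__main__":
    # Hypothetical CTG-initiated 13-residue ORF: missed with ATG only, found with CTG allowed.
    seq = "CTG" + "GCT" * 12 + "TAG"
    print(find_sorfs(seq))
    print(find_sorfs(seq, start_codons=("ATG", "CTG")))
```

Keeping only the first start codon per ORF reports the longest candidate in each frame; a real annotation pipeline would instead weigh evidence such as Ribo-Seq footprints or MS-detected peptides, as the studies above do.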

