Proteomics: The protein complement of the genome

Hannes Ponstingl; Janet M. Thornton

doi:10.1042/bio02501007

The Biochemist ◽

10.1042/bio02501007 ◽

2003 ◽

Vol 25 (1) ◽

pp. 7-9

Author(s):

Hannes Ponstingl ◽

Janet M. Thornton

Keyword(s):

Messenger RNA ◽

Protein Separation ◽

Large Scale ◽

Protein Complexes ◽

Protein Isoforms ◽

Structural Proteomics ◽

Protein Markers ◽

Proteomics Data ◽

Single Experiment ◽

Large Sets

Recent advances in protein separation technology and mass spectrometry (MS) have enabled the systematic identification and quantification of large sets of proteins from an organelle, cell type or organism. In principle, protein isoforms, enzymically modified variants and protein complexes can be studied, for instance, at a certain stage in development or in response to stress or more subtle changes of the environment. An important pre-clinical application is the search for protein markers in body fluids for diagnostic purposes. Such proteomics studies can be performed increasingly at high-throughput rates that are reminiscent of those of genomic sequencing or the monitoring of messenger RNA levels. Thus, large sets of proteins can be monitored simultaneously in a single experiment. Proteomics data will increasingly be followed up by investigations of the three-dimensional structures of proteins and protein complexes at atomic detail in large-scale structural proteomics projects. We attempt in this article to give a flavour of what to us seem important experimental developments and to point to links with bioinformatics resources where appropriate.

FLEXIQuant-LF to quantify protein modification extent in label-free proteomics data

eLife ◽

10.7554/elife.58783 ◽

2020 ◽

Vol 9 ◽

Author(s):

Christoph N Schlaffner ◽

Konstantin Kahnert ◽

Jan Muntel ◽

Ruchi Chauhan ◽

Bernhard Y Renard ◽

...

Keyword(s):

Protein Modification ◽

Large Scale ◽

Software Tool ◽

Label Free ◽

Anaphase Promoting Complex ◽

Proteomics Data ◽

Single Experiment ◽

Post Translational Modifications ◽

Data Independent Acquisition ◽

Modified Peptides

Improvements in LC-MS/MS methods and technology have enabled the identification of thousands of modified peptides in a single experiment. However, protein regulation by post-translational modifications (PTMs) is not binary, making methods to quantify the modification extent crucial to understanding the role of PTMs. Here, we introduce FLEXIQuant-LF, a software tool for large-scale identification of differentially modified peptides and quantification of their modification extent without knowledge of the types of modifications involved. We developed FLEXIQuant-LF using label-free quantification of unmodified peptides and robust linear regression to quantify the modification extent of peptides. As proof of concept, we applied FLEXIQuant-LF to data-independent-acquisition (DIA) data of the anaphase promoting complex/cyclosome (APC/C) during mitosis. The unbiased FLEXIQuant-LF approach to assess the modification extent in quantitative proteomics data provides a better understanding of the function and regulation of PTMs. The software is available at https://github.com/SteenOmicsLab/FLEXIQuantLF.
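The idea of using robust regression on peptide intensities to find peptides whose unmodified form is depleted can be illustrated with a minimal sketch. It is not the FLEXIQuant-LF implementation: the Theil-Sen estimator is swapped in here as a simple robust fit (the published tool may use a different robust estimator), and the function names, the intensity values and the 0.5 cutoff are all hypothetical.

```python
from statistics import median


def theil_sen(x, y):
    """Robust line fit: median of all pairwise slopes, then median intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


def relative_scores(reference, sample):
    """Observed / predicted intensity of each unmodified peptide.

    A score near 1 means the peptide follows the protein-level trend;
    a score well below 1 suggests that part of the peptide pool is
    no longer in its unmodified form.
    """
    slope, intercept = theil_sen(reference, sample)
    return [obs / (slope * ref + intercept) for ref, obs in zip(reference, sample)]


# Hypothetical unmodified-peptide intensities for one protein (illustrative only).
reference = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
# The sample contains twice as much protein; the fourth peptide is depleted.
sample = [20.0, 40.0, 60.0, 24.0, 100.0, 120.0]

scores = relative_scores(reference, sample)
THRESHOLD = 0.5  # hypothetical cutoff, not taken from the paper
flagged = [i for i, s in enumerate(scores) if s < THRESHOLD]
print(flagged, [round(s, 2) for s in scores])
```

Because the fit is robust, the single depleted peptide does not pull the regression line towards itself, so the remaining peptides still score close to 1 and only the outlier is flagged.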

Large-scale allosteric switch in the 7SK RNA regulates transcription in response to growth and stress

10.1101/2021.09.16.460563 ◽

2021 ◽

Author(s):

Samuel W Olson ◽

Anne-Marie W Turner ◽

J Winston Arney ◽

Irfana Saleem ◽

Chase A Weidmann ◽

...

Keyword(s):

RNA Structure ◽

Large Scale ◽

Protein Complexes ◽

Transcription Elongation ◽

Elongation Factor ◽

Regulatory Function ◽

Significant Heterogeneity ◽

Single Experiment ◽

Eukaryotic Transcription ◽

In Cells

7SK is a highly conserved non-coding RNA that regulates eukaryotic transcription by sequestering positive transcription elongation factor b (P-TEFb). 7SK regulatory function likely entails changes in RNA structure, but characterizing dynamic RNA-protein complexes in cells has remained an unsolved challenge. We describe a new chemical probing strategy (DANCE-MaP) that uses maximum likelihood deconvolution and probabilistic read assignment to define simultaneously (i) per-nucleotide reactivity profiles, (ii) direct base pairing interactions, and (iii) tertiary and higher-order interactions for each conformation of multi-state RNA structural ensembles, all from a single experiment. We show that human 7SK RNA, despite significant heterogeneity, intrinsically codes for a large-scale structural switch that couples dissolution of the P-TEFb binding site to structural remodeling at distal release factor binding sites. The 7SK structural equilibrium is regulated by cell type, shifts dynamically in response to cell growth and stress, and can be exogenously targeted to modulate transcription in cells. Our data support a role for the 7SK structural ensemble as an integrator of diverse cellular signals that controls transcription elongation in environment- and cell-specific ways, and establish DANCE-MaP as a powerful strategy for comprehensively defining RNA structure and dynamics in cells.

RoDiCE: Robust differential protein co-expression analysis for cancer complexome

10.1101/2020.12.22.423973 ◽

2020 ◽

Author(s):

Yusuke Matsui ◽

Yuichi Abe ◽

Kohei Uno ◽

Satoru Miyano

Keyword(s):

Protein Complex ◽

Drug Targets ◽

Large Scale ◽

Protein Complexes ◽

Noisy Data ◽

Identification Accuracy ◽

Proteomics Data ◽

Differential Protein ◽

Full Picture ◽

Linear Correlations

Motivation: The full picture of abnormalities in protein complexes in cancer remains largely unknown. Comparing the co-expression structure of each protein complex between tumor and normal groups could help us understand the cancer-specific dysfunction of proteins. However, the technical limitations of mass spectrometry-based proteomics and biological variations contaminate protein expression measurements with noise, leading to non-negligible over- or under-estimation of co-expression. Results: We propose a robust algorithm for identifying protein complex aberrations in cancer based on differential protein co-expression testing. Our copula-based method improves identification accuracy with noisy data over a conventional linear correlation-based approach. As an application, we show that important protein complexes can be identified along with regulatory signaling pathways, and even drug targets can be identified using large-scale proteomics data from renal cancer. The proposed approach goes beyond traditional linear correlations to provide insights into higher order differential co-expression structures. Availability and Implementation: https://github.com/ymatts/[email protected]
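Why a copula-based measure can be more robust to noisy intensities than a linear correlation can be seen in a small sketch. It does not reproduce the RoDiCE test, which the abstract does not specify: Spearman's rank correlation is used here as the simplest dependence measure computed on the copula (rank) scale, and all function names and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Conventional linear (Pearson) correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def pseudo_observations(values):
    """Map values to rank / (n + 1), the empirical copula scale (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    n = len(values)
    return [r / (n + 1) for r in ranks]


def copula_correlation(x, y):
    """Correlation on the copula scale; equals Spearman's rho when there are no ties."""
    return pearson(pseudo_observations(x), pseudo_observations(y))


# Hypothetical abundances of two subunits of one complex (illustrative only).
normal_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
normal_b = [1.1, 2.0, 3.2, 3.9, 5.1, 60.0]  # co-expressed, with one noisy outlier
tumor_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tumor_b = [4.0, 1.0, 6.0, 2.0, 5.0, 3.0]    # co-expression largely lost

linear_normal = pearson(normal_a, normal_b)
rank_normal = copula_correlation(normal_a, normal_b)
rank_tumor = copula_correlation(tumor_a, tumor_b)
differential = rank_normal - rank_tumor  # crude differential co-expression score
print(round(linear_normal, 2), round(rank_normal, 2), round(differential, 2))
```

In the normal group the two subunits rise together, but the single outlier pulls the linear correlation well below 1, whereas the rank-based value is unaffected. A real differential test would also need a null distribution, for example from permuting the group labels, which is omitted here.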

FLEXIQuant-LF: Robust Regression to Quantify Protein Modification Extent in Label-Free Proteomics Data

10.1101/2020.05.11.088492 ◽

2020 ◽

Author(s):

Konstantin Kahnert ◽

Christoph N. Schlaffner ◽

Jan Muntel ◽

Ruchi Chauhan ◽

Bernhard Y. Renard ◽

...

Keyword(s):

Protein Modification ◽

Large Scale ◽

Robust Regression ◽

Software Tool ◽

Label Free ◽

Anaphase Promoting Complex ◽

Proteomics Data ◽

Single Experiment ◽

Data Independent Acquisition ◽

Modified Peptides

Improvements in LC-MS/MS methods and technology have enabled the identification of thousands of modified peptides in a single experiment. However, protein regulation by post-translational modifications (PTMs) is not binary, making methods to quantify the modification extent crucial to understanding the role of PTMs. Here, we introduce FLEXIQuant-LF, a software tool for large-scale identification of differentially modified peptides and quantification of their modification extent without prior knowledge of the type of modification. We developed FLEXIQuant-LF using label-free quantification of unmodified peptides and robust linear regression to quantify the modification extent of peptides. As proof of concept, we applied FLEXIQuant-LF to data-independent-acquisition (DIA) data of the anaphase promoting complex/cyclosome (APC/C) during mitosis. The unbiased FLEXIQuant-LF approach to assess the modification extent in quantitative proteomics data provides a better understanding of the function and regulation of PTMs. The software is available at https://github.com/SteenOmicsLab/FLEXIQuantLF.

Assessing the functional relevance of splice isoforms

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab044 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Fernando Pozo ◽

Laura Martinez-Gomez ◽

Thomas A Walsh ◽

José Manuel Rodriguez ◽

Tomas Di Domenico ◽

...

Keyword(s):

Alternative Splicing ◽

Messenger RNA ◽

Large Scale ◽

Somatic Mutations ◽

Species Conservation ◽

Purifying Selection ◽

Protein Isoforms ◽

Splice Isoforms ◽

Population Genetic Variation ◽

Functional Relevance

Alternative splicing of messenger RNA can generate an array of mature transcripts, but it is not clear how many go on to produce functionally relevant protein isoforms. There is only limited evidence for alternative proteins in proteomics analyses, and data from population genetic variation studies indicate that most alternative exons are evolving neutrally. Determining which transcripts produce biologically important isoforms is key to understanding isoform function and to interpreting the real impact of somatic mutations and germline variations. Here we have developed a method, TRIFID, to classify the functional importance of splice isoforms. TRIFID was trained on isoforms detected in large-scale proteomics analyses and distinguishes these biologically important splice isoforms with high confidence. Isoforms predicted as functionally important by the algorithm had measurable cross-species conservation and significantly fewer broken functional domains. Additionally, exons that code for these functionally important protein isoforms are under purifying selection, while exons from low-scoring transcripts largely appear to be evolving neutrally. TRIFID has been developed for the human genome, but it could in principle be applied to other well-annotated species. We believe that this method will generate valuable insights into the cellular importance of alternative splicing.

Faculty Opinions recommendation of Large-scale analysis of human alternative protein isoforms: pattern classification and correlation with subcellular localization signals.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1026748.325361 ◽

2005 ◽

Author(s):

Arcady Mushegian

Keyword(s):

Subcellular Localization ◽

Pattern Classification ◽

Large Scale ◽

Protein Isoforms ◽

Scale Analysis ◽

Large Scale Analysis ◽

Alternative Protein

BonMOLière: Small-Sized Libraries of Readily Purchasable Compounds, Optimized to Produce Genuine Hits in Biological Screens across the Protein Space

International Journal of Molecular Sciences ◽

10.3390/ijms22157773 ◽

2021 ◽

Vol 22 (15) ◽

pp. 7773

Author(s):

Neann Mathai ◽

Conrad Stork ◽

Johannes Kirchmair

Keyword(s):

Large Scale ◽

Computational Approach ◽

Large Sets ◽

Compound Libraries ◽

Wide Range ◽

Protein Space ◽

High Chance ◽

Large Scale Screening ◽

Early Drug ◽

Selection Of

Experimental screening of large sets of compounds against macromolecular targets is a key strategy to identify novel bioactivities. However, large-scale screening requires substantial experimental resources and is time-consuming and challenging. Therefore, small to medium-sized compound libraries with a high chance of producing genuine hits on an arbitrary protein of interest would be of great value to fields related to early drug discovery, in particular biochemical and cell research. Here, we present a computational approach that incorporates drug-likeness, predicted bioactivities, biological space coverage, and target novelty, to generate optimized compound libraries with maximized chances of producing genuine hits for a wide range of proteins. The computational approach evaluates drug-likeness with a set of established rules, predicts bioactivities with a validated, similarity-based approach, and optimizes the composition of small sets of compounds towards maximum target coverage and novelty. We found that, in comparison to the random selection of compounds for a library, our approach generates substantially improved compound sets. Quantified as the “fitness” of compound libraries, the calculated improvements ranged from +60% (for a library of 15,000 compounds) to +184% (for a library of 1000 compounds). The best of the optimized compound libraries prepared in this work are available for download as a dataset bundle (“BonMOLière”).

Messenger RNA guanylyltransferase from Saccharomyces cerevisiae. Large scale purification, subunit functions, and subcellular localization.

Journal of Biological Chemistry ◽

10.1016/s0021-9258(18)61609-6 ◽

1987 ◽

Vol 262 (5) ◽

pp. 1989-1995 ◽

Cited By ~ 2

Author(s):

N. Itoh ◽

H. Yamada ◽

Y. Kaziro ◽

K. Mizumoto

Keyword(s):

Saccharomyces Cerevisiae ◽

Subcellular Localization ◽

Messenger RNA ◽

Large Scale

A primary human T-cell spectral library to facilitate large scale quantitative T-cell proteomics

Scientific Data ◽

10.1038/s41597-020-00744-3 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Harshi Weerakoon ◽

Jeremy Potriquet ◽

Alok K. Shah ◽

Sarah Reed ◽

Buddhika Jayakody ◽

...

Keyword(s):

T Cell ◽

Large Scale ◽

Large Data ◽

Spectral Library ◽

Proteomics Data ◽

Cell Library ◽

Public Resource ◽

Human T Cell ◽

Theoretical Mass ◽

Current Resource

Data independent analysis (DIA) exemplified by sequential window acquisition of all theoretical mass spectra (SWATH-MS) provides robust quantitative proteomics data, but the lack of a public primary human T-cell spectral library is a current resource gap. Here, we report the generation of a high-quality spectral library containing data for 4,833 distinct proteins from human T-cells across genetically unrelated donors, covering ~24% of the proteins in the UniProt/SwissProt reviewed human proteome. SWATH-MS analysis of 18 primary T-cell samples using the new human T-cell spectral library reliably identified and quantified 2,850 proteins at 1% false discovery rate (FDR). In comparison, the larger Pan-human spectral library identified and quantified 2,794 T-cell proteins in the same dataset. As the libraries identified an overlapping set of proteins, combining the two libraries resulted in quantification of 4,078 human T-cell proteins. Collectively, this large data archive will be a useful public resource for human T-cell proteomic studies. The human T-cell library is available at SWATHAtlas and the data are available via ProteomeXchange (PXD019446 and PXD019542) and PeptideAtlas (PASS01587).

Upstream – News in Genomics

Comparative and Functional Genomics ◽

10.1002/cfg.172 ◽

2002 ◽

Vol 3 (3) ◽

pp. 221-225

Keyword(s):

Saccharomyces Cerevisiae ◽

Ovarian Cancer ◽

Oryza Sativa ◽

Schizosaccharomyces Pombe ◽

Fission Yeast ◽

Proteomic Analysis ◽

Large Scale ◽

Protein Complexes ◽

The Other ◽

Blood Samples

In recent months a bumper crop of genomes has been completed, including the fission yeast (Schizosaccharomyces pombe) and rice (Oryza sativa). Two large-scale studies of Saccharomyces cerevisiae protein complexes provided a picture of the eukaryotic proteome as a network of complexes. Amongst the other stories of interest was a demonstration that proteomic analysis of blood samples can be used to detect ovarian cancer, perhaps even as early as stage I.
