Dimensionality reduction by UMAP to visualize physical and genetic interactions

2019 ◽  
Author(s):  
Michael W. Dorrity ◽  
Lauren M. Saunders ◽  
Christine Queitsch ◽  
Stanley Fields ◽  
Cole Trapnell

Dimensionality reduction is often used to visualize complex expression profiling data. Here, we use the Uniform Manifold Approximation and Projection (UMAP) method on published transcript profiles of 1484 single gene deletions of Saccharomyces cerevisiae. Proximity in low-dimensional UMAP space identifies clusters of genes that correspond to protein complexes and pathways, and finds novel protein interactions even within well-characterized complexes. This approach is more sensitive than previous methods and should be broadly useful as additional transcriptome datasets become available for other organisms.
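The key idea in this abstract is that genes whose deletions land close together in the low-dimensional embedding tend to belong to the same complex or pathway. The minimal sketch below illustrates that idea. It assumes the 2D UMAP coordinates have already been computed (for example with the `umap-learn` package, not shown). It then groups genes by single-linkage clustering under a hypothetical distance cutoff `radius`. The strain labels and coordinates are made up for illustration, and this is not the authors' procedure.

```python
import math
from itertools import combinations


def proximity_clusters(coords: dict[str, tuple[float, float]], radius: float) -> list[set[str]]:
    """Group points whose embedded positions lie within `radius` of one another,
    chaining neighbours transitively (single linkage) with a union-find structure."""
    parent = {name: name for name in coords}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of points that are close in the embedding.
    for a, b in combinations(coords, 2):
        if math.dist(coords[a], coords[b]) <= radius:
            parent[find(a)] = find(b)

    # Collect members by their root to form the clusters.
    groups: dict[str, set[str]] = {}
    for name in coords:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical 2D embedding coordinates for five deletion strains (illustrative only).
embedding = {
    "g1": (0.0, 0.0),
    "g2": (0.3, 0.1),
    "g3": (0.5, 0.4),
    "g4": (5.0, 5.0),
    "g5": (5.2, 4.9),
}
print(sorted(sorted(c) for c in proximity_clusters(embedding, radius=0.5)))
# → [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

Here g1 and g3 are farther apart than the cutoff, yet they end up in the same group through g2. That transitive chaining is a design choice of single linkage, and it can merge distinct complexes if the cutoff is too loose.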

2007 ◽  
Vol 8 (12) ◽  
pp. R256 ◽  
Author(s):  
Sara Zanivan ◽  
Ilaria Cascone ◽  
Chiara Peyron ◽  
Ivan Molineris ◽  
Serena Marchio ◽  
...  

2020 ◽  
Author(s):  
Andrea Fossati ◽  
Chen Li ◽  
Peter Sykacek ◽  
Moritz Heusel ◽  
Fabian Frommelt ◽  
...  

Protein complexes, macromolecular assemblies of two or more proteins, play vital roles in numerous cellular activities and collectively determine the cellular state. Despite the availability of a range of methods for analysing protein complexes, systematic analysis of complexes under multiple conditions has remained challenging. Approaches based on biochemical fractionation of intact, native complexes and correlation of protein profiles have shown promise, for instance the combination of size exclusion chromatography (SEC) with accurate protein quantification by SWATH/DIA-MS. However, most approaches for interpreting co-fractionation datasets to yield complex composition, abundance and rearrangements between samples depend heavily on prior evidence. We introduce PCprophet, a computational framework to identify novel protein complexes from SEC-SWATH-MS data and to characterize their changes across different experimental conditions. We demonstrate accurate prediction of protein complexes (AUC > 0.99 and accuracy around 97%) via five-fold cross-validation on SEC-SWATH-MS data, show improved performance over state-of-the-art approaches on multiple annotated co-fractionation datasets, and describe a Bayesian approach to analyse altered protein-protein interactions across conditions. PCprophet is a generic computational tool consisting of modules for data pre-processing, hypothesis generation, machine-learning prediction, post-prediction processing, and differential analysis. It can be applied to any co-fractionation MS dataset, independent of the separation method or quantitative LC-MS workflow employed, to support the detection and quantitative tracking of novel protein complexes and their physiological dynamics.
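Co-fractionation approaches such as SEC-SWATH-MS rest on correlating protein profiles: subunits of the same complex co-elute, so their abundances across fractions rise and fall together. The sketch below uses a Pearson correlation between two elution profiles as a simple co-elution score. The profiles and the 0.9 cutoff are hypothetical. PCprophet itself relies on a machine-learning prediction step rather than a single correlation threshold.

```python
import math
from itertools import combinations
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long elution profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-elution signal
    return cov / (sx * sy)


def co_eluting_pairs(profiles: dict[str, list[float]], cutoff: float = 0.9) -> list[tuple[str, str, float]]:
    """Return protein pairs whose elution profiles correlate at or above `cutoff`."""
    hits = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= cutoff:
            hits.append((a, b, r))
    return hits


# Hypothetical abundances of three proteins across eight SEC fractions.
profiles = {
    "P1": [0, 1, 5, 9, 5, 1, 0, 0],
    "P2": [0, 2, 6, 10, 6, 2, 0, 0],
    "P3": [8, 6, 2, 0, 0, 1, 4, 9],
}
for a, b, r in co_eluting_pairs(profiles):
    print(f"{a}-{b}: r = {r:.3f}")
# prints: P1-P2: r = 0.994
```

A single correlation cannot tell apart two complexes that happen to elute in the same fractions. That limitation is one reason tools in this space add further evidence and a trained classifier on top of raw co-elution.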


2008 ◽  
Vol 06 (03) ◽  
pp. 435-466 ◽  
Author(s):  
HON NIAN CHUA ◽  
KANG NING ◽  
WING-KIN SUNG ◽  
HON WAI LEONG ◽  
LIMSOON WONG

Protein complexes are fundamental for understanding the principles of cellular organization. As protein–protein interaction (PPI) networks grow in size, accurate and fast prediction of protein complexes from these networks can guide biological experiments aimed at discovering novel complexes. However, predicting protein complexes from PPI networks is difficult, especially when the network is noisy and still incomplete. Here, we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. Previous work has shown that proteins which do not interact but share interaction partners (level-2 neighbors) often share biological functions. We have proposed a method in which all direct and indirect interactions are first weighted using a topological weight (FS-Weight) that estimates the strength of functional association. Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into it. Existing clustering algorithms can then be applied to this modified network. We have also proposed a novel algorithm that searches for cliques in the modified network and merges them to form clusters using a "partial clique merging" method. Experiments show that (1) augmenting protein–protein interactions with indirect interactions and topological weights can improve the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no information other than the original PPI network is used, our approach should be very useful for protein complex prediction, especially for the prediction of novel protein complexes.
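The network-modification step in this abstract can be sketched in code: score direct and level-2 pairs, drop weak direct edges, and add strong level-2 edges. One assumption must be stated up front. A simple shared-neighbour (Jaccard) score stands in for FS-Weight, whose exact formula is not given here. The thresholds and the toy network are hypothetical, and the partial clique merging step is not shown.

```python
from itertools import combinations


def neighbours(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Adjacency sets for an undirected interaction network."""
    adj: dict[str, set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shared_neighbour_score(adj: dict[str, set[str]], u: str, v: str) -> float:
    """Stand-in for FS-Weight (NOT the published formula): Jaccard overlap of
    the closed neighbourhoods of u and v."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def augment(edges: list[tuple[str, str]], keep_min: float, add_min: float) -> set[frozenset[str]]:
    """Drop weak direct edges and add strong level-2 edges."""
    adj = neighbours(edges)
    # Keep direct interactions whose score reaches keep_min.
    kept = {frozenset(e) for e in edges if shared_neighbour_score(adj, *e) >= keep_min}
    # Level-2 pairs: proteins that do not interact but share at least one partner.
    added = set()
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u] or not (adj[u] & adj[v]):
            continue
        if shared_neighbour_score(adj, u, v) >= add_min:
            added.add(frozenset((u, v)))
    return kept | added


# Hypothetical toy network: A, B, C, D form a complex missing the C-D edge,
# and D-E is a spurious interaction.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("D", "E")]
result = augment(toy_edges, keep_min=0.55, add_min=0.35)
print(sorted(tuple(sorted(e)) for e in result))
# → [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```

In this toy case the weak D-E edge (score 0.5) is removed. The missing C-D edge (score 0.4) is recovered as a level-2 interaction, so a clique search on the modified network finds A, B, C and D together.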


2019 ◽  
Vol 26 (21) ◽  
pp. 3890-3910 ◽  
Author(s):  
Branislava Gemovic ◽  
Neven Sumonja ◽  
Radoslav Davidovic ◽  
Vladimir Perovic ◽  
Nevena Veljkovic

Background: The large number of protein-protein interactions (PPIs) discovered by harnessing concomitant advances in sequencing, crystallography, spectrometry and two-hybrid screening suggests remarkable prospects for remodelling drug discovery. The PPI space, which includes up to 650 000 entities, is a rich reservoir of potential therapeutic targets for every human disease. For modern drug discovery programs to leverage this, we should be able to discern complete PPI maps associated with a specific disorder and with the corresponding normal physiology. Objective: Here, we review community-available computational programs for predicting PPIs and web-based resources for storing experimentally annotated interactions. Methods: We compared the capacities of the prediction tools iLoops, Struct2Net, HOMCOS, COTH, PrePPI, InterPreTS and PRISM to predict recently discovered protein interactions. Results: We described sequence-based and structure-based PPI prediction tools and addressed their peculiarities. Additionally, since the usefulness of prediction algorithms critically depends on the quality and quantity of the experimental data they are built on, we extensively discussed community resources for protein interactions. We focused on active and recently updated primary and secondary PPI databases, repositories specialized by subject or species, and databases that include both experimental and predicted PPIs. Conclusion: PPI complexes are the basis of important physiological processes and are therefore possible targets for cell-penetrating ligands. Reliable computational PPI predictions can speed up the discovery of new targets by prioritizing therapeutically relevant protein–protein complexes for experimental studies.


2020 ◽  
Vol 27 (37) ◽  
pp. 6306-6355 ◽  
Author(s):  
Marian Vincenzi ◽  
Flavia Anna Mercurio ◽  
Marilisa Leone

Background: Many pathways in healthy cells, and those linked to disease onset and progression, depend on large assemblies including multi-protein complexes. Protein-protein interactions may occur through a vast array of modules known as protein interaction domains (PIDs). Objective: This review concerns PIDs that recognize post-translationally modified peptide sequences and aims to provide the scientific community with state-of-the-art knowledge of their 3D structures, binding topologies and potential applications in drug discovery. Method: Several databases, such as Pfam (Protein family), SMART (Simple Modular Architecture Research Tool) and the PDB (Protein Data Bank), were searched for different domain families and for structural information on protein complexes in which particular PIDs are involved. Recent literature on PIDs and related drug discovery campaigns was retrieved through PubMed and analyzed. Results and Conclusion: PIDs are rather versatile in their binding preferences. Many of them specifically recognize only particular amino acid stretches carrying post-translational modifications, whereas a few others can interact with several post-translationally modified sequences or with unmodified ones. Many PIDs can be linked to different diseases, including cancer. The large amount of available structural data has led to the structure-based design of several molecules targeting protein-protein interactions mediated by PIDs, including peptides, peptidomimetics and small compounds. More studies are needed to fully establish which PIDs, across different families, can be considered reliable therapeutic targets; however, targeting PIDs rather than the catalytic domains of a particular protein may offer a route to selective inhibitors.


2021 ◽  
Vol 7 (1) ◽  
pp. 11 ◽  
Author(s):  
André P. Gerber

RNA–protein interactions frame post-transcriptional regulatory networks and modulate transcription and epigenetics. While technological advances in RNA sequencing have significantly expanded the known repertoire of RNAs, recently developed biochemical approaches combined with sensitive mass spectrometry have revealed hundreds of previously unrecognized, potentially novel RNA-binding proteins. Nevertheless, a major challenge remains: understanding how thousands of RNA molecules and their interacting proteins assemble and control the fate of each individual RNA in a cell. Here, I review recent methodological advances that address this problem through systematic identification of proteins that interact with particular RNAs in living cells. A specific focus is given to in vivo approaches that crosslink RNA–protein interactions by ultraviolet irradiation or chemical treatment of cells, followed by capture of the RNA under study with antisense oligonucleotides and identification of bound proteins by mass spectrometry. Several recent studies defining the interactomes of long non-coding RNAs, viral RNAs and mRNAs are highlighted, and brief reference is made to recent in-cell protein labeling techniques. These experimental improvements could open the door to broader applications and to studying the remodeling of RNA–protein complexes in response to different environmental cues and in disease.


Author(s):  
Rohan Dandage ◽  
Caroline M Berger ◽  
Isabelle Gagnon-Arsenault ◽  
Kyung-Mee Moon ◽  
Richard Greg Stacey ◽  
...  

Hybrids between species often show extreme phenotypes, including some at the molecular level. In this study, we investigated the phenotypes of an interspecies diploid hybrid in terms of protein-protein interactions inferred from protein correlation profiling. We used two yeast species, Saccharomyces cerevisiae and Saccharomyces uvarum, which are interfertile yet have proteins diverged enough to be distinguished by mass spectrometry. Most protein-protein interactions are similar between the hybrid and its parents and are consistent with the assembly of chimeric complexes, which we validated for the prefoldin complex using an orthogonal approach. We also identified instances of altered protein-protein interactions in the hybrid, for example in complexes related to proteostasis and in mitochondrial protein complexes. Overall, this study suggests that chimeric protein complexes likely form frequently, with few exceptions, which may result from incompatibilities or imbalances between the parental proteins.


Life ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 15
Author(s):  
Radek Kaňa ◽  
Gábor Steinbach ◽  
Roman Sobotka ◽  
György Vámosi ◽  
Josef Komenda

Biological membranes were originally described as a fluid mosaic with a uniform distribution of proteins and lipids. Later, heterogeneous membrane areas were found in many membrane systems, including cyanobacterial thylakoids. In fact, cyanobacterial pigment–protein complexes (photosystems, phycobilisomes) form a heterogeneous mosaic of thylakoid membrane microdomains (MDs) that restrict protein mobility. The trafficking of membrane proteins is one of the key factors for long-term survival under stress conditions, for instance during exposure to photoinhibitory light. However, the mobility of unbound 'free' proteins in the thylakoid membrane is poorly characterized. In this work, we assessed the maximal diffusional ability of a small, unbound thylakoid membrane protein by semi-single-molecule FCS (fluorescence correlation spectroscopy) in the cyanobacterium Synechocystis sp. PCC6803. We used a GFP-tagged variant of the cytochrome b6f subunit PetC1 (PetC1-GFP), which was not assembled into the b6f complex because of the tag. Subsequent FCS measurements revealed very fast diffusion of PetC1-GFP in the thylakoid membrane (D = 0.14–2.95 µm² s⁻¹). This means that the mobility of PetC1-GFP was comparable to that of free lipids and 50–500 times higher than that of proteins naturally associated with larger thylakoid membrane complexes such as photosystems (e.g., IsiA or LHCII, the light-harvesting complex of PSII). Our results thus demonstrate that free thylakoid membrane proteins can move very fast, revealing the crucial role of protein–protein interactions in restricting the mobility of large thylakoid protein complexes.
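The reported diffusion coefficients can be given a physical scale. For free two-dimensional diffusion, the mean squared displacement after time t is ⟨r²⟩ = 4Dt. The sketch below converts the reported range of D into a root-mean-square displacement over one second. It also shows how a 50- to 500-fold difference in D translates into distance. The sketch assumes unobstructed 2D Brownian motion, which idealizes the crowded thylakoid membrane, so the numbers are illustrative rather than results from the study.

```python
import math


def rms_displacement_um(d_um2_per_s: float, t_s: float) -> float:
    """Root-mean-square displacement for free 2D diffusion, sqrt(<r^2>) = sqrt(4 D t)."""
    return math.sqrt(4.0 * d_um2_per_s * t_s)


# Reported range of D for PetC1-GFP (µm² s⁻¹); one second of free diffusion.
for d in (0.14, 2.95):
    print(f"D = {d} µm²/s -> RMS displacement in 1 s ≈ {rms_displacement_um(d, 1.0):.2f} µm")

# Displacement scales with sqrt(D), so a 50-500-fold drop in D shortens it far less.
for fold in (50, 500):
    print(f"{fold}-fold lower D -> {math.sqrt(fold):.1f}-fold shorter RMS displacement")
```

Because distance grows only with the square root of D, the 50–500-fold mobility gap corresponds to roughly a 7- to 22-fold difference in how far a protein typically travels in a given time.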


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Joshua T. Vogelstein ◽  
Eric W. Bridgeford ◽  
Minh Tang ◽  
Da Zheng ◽  
Christopher Douville ◽  
...  

To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, in the hope that data science techniques will be able to build accurate data-driven inferences. Because sample sizes are typically orders of magnitude smaller than the dimensionality of these data, valid inferences require finding a low-dimensional representation that preserves the discriminating information (e.g., whether an individual suffers from a particular disease). There is a lack of interpretable supervised dimensionality reduction methods that scale to millions of dimensions with strong statistical theoretical guarantees. We introduce an approach that extends principal component analysis by incorporating class-conditional moment estimates into the low-dimensional projection. The simplest version, Linear Optimal Low-Rank Projection, incorporates the class-conditional means. We prove, and substantiate with both synthetic and real data benchmarks, that Linear Optimal Low-Rank Projection and its generalizations lead to improved data representations for subsequent classification while maintaining computational efficiency and scalability. Using multiple brain imaging datasets with more than 150 million features, and several genomics datasets with more than 500,000 features, we show that Linear Optimal Low-Rank Projection outperforms other scalable linear dimensionality reduction techniques in terms of accuracy while requiring only a few minutes on a standard desktop computer.
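The simplest variant described above, which adds class-conditional means to principal components, can be sketched for two classes and a two-dimensional projection. This dependency-free sketch follows our reading of the idea and is not the authors' implementation. The first direction is the normalized difference between class means. The second is the leading principal direction of the class-centred data, found by power iteration. Gram-Schmidt then makes the two directions orthonormal. Helper names and toy data are hypothetical, and a real high-dimensional analysis would use an optimized linear-algebra library.

```python
import math
import random


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def scale(a: list[float], s: float) -> list[float]:
    return [x * s for x in a]


def sub(a: list[float], b: list[float]) -> list[float]:
    return [x - y for x, y in zip(a, b)]


def unit(a: list[float]) -> list[float]:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def mean(rows: list[list[float]]) -> list[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def top_pc(rows: list[list[float]], iters: int = 200, seed: int = 0) -> list[float]:
    """Leading principal direction of already-centred rows, by power iteration on X^T X."""
    rng = random.Random(seed)
    v = unit([rng.gauss(0.0, 1.0) for _ in rows[0]])
    for _ in range(iters):
        scores = [dot(r, v) for r in rows]  # X v
        v = unit([sum(s * r[j] for s, r in zip(scores, rows)) for j in range(len(v))])  # X^T (X v)
    return v


def lol_2d(X: list[list[float]], y: list[int]) -> list[list[float]]:
    """Two-class, 2D sketch: [class-mean difference, top PC of class-centred data], orthonormalised."""
    X0 = [x for x, lab in zip(X, y) if lab == 0]
    X1 = [x for x, lab in zip(X, y) if lab == 1]
    m0, m1 = mean(X0), mean(X1)
    d1 = unit(sub(m1, m0))
    # Centre each sample on its own class mean before extracting variance directions.
    centred = [sub(x, m0) for x in X0] + [sub(x, m1) for x in X1]
    pc = top_pc(centred)
    # Gram-Schmidt: remove the part of the PC that lies along the mean difference.
    d2 = unit(sub(pc, scale(d1, dot(pc, d1))))
    return [d1, d2]


def project(X: list[list[float]], basis: list[list[float]]) -> list[list[float]]:
    return [[dot(x, b) for b in basis] for x in X]


# Hypothetical toy data: classes differ along feature 0, most variance lies along feature 1.
X_train = [[0.0, -5.0, 0.1], [0.0, 5.0, -0.1], [0.1, -3.0, 0.0], [-0.1, 3.0, 0.0],
           [1.0, -5.0, 0.1], [1.0, 5.0, -0.1], [1.1, -3.0, 0.0], [0.9, 3.0, 0.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
basis = lol_2d(X_train, y_train)
for z, lab in zip(project(X_train, basis), y_train):
    print(lab, [round(c, 2) for c in z])
```

On this toy data, plain PCA would rank feature 1 first because it carries most of the variance, even though it does not separate the classes. The mean-difference direction puts the class signal into the first projected coordinate.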

