PTMphinder: an R package for PTM site localization and motif extraction from proteomic datasets

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7046 ◽  
Author(s):  
Jacob M. Wozniak ◽  
David J. Gonzalez

Background: Mass-spectrometry-based proteomics is a prominent field of study that allows for the unbiased quantification of thousands of proteins from a particular sample. A key advantage of these techniques is the ability to detect protein post-translational modifications (PTMs) and localize them to specific amino acid residues. These approaches have led to many significant findings in a wide range of biological disciplines, from developmental biology to cancer and infectious diseases. However, there is a current lack of tools available to connect raw PTM site information to biologically meaningful results in a high-throughput manner. Furthermore, many of the available tools require significant programming knowledge to implement. Results: The R package PTMphinder was designed to enable researchers, particularly those with minimal programming background, to thoroughly analyze PTMs in proteomic data sets. The package contains three functions: parseDB, phindPTMs and extractBackground. Together, these functions allow users to reformat proteome databases for easier analysis, localize PTMs within full proteins, extract motifs surrounding the identified sites and create proteome-specific motif backgrounds for statistical purposes. Beta-testing of this R package has demonstrated its simplicity and ease of integration with existing tools. Conclusion: PTMphinder empowers researchers to fully analyze and interpret PTMs derived from proteomic data. This package is simple enough for researchers with limited programming experience to understand and implement. The data produced from this package can inform subsequent research by itself and also be used in conjunction with other tools, such as motif-x, for further analysis.
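The localization and motif-extraction steps described in this abstract can be illustrated with a minimal Python sketch. It is not the PTMphinder API: the function names `localize_ptm` and `background_motifs`, the `flank` and `pad` parameters, and the example sequence are all hypothetical, chosen only to show the idea of mapping a peptide-level site onto a full protein and collecting fixed-width sequence windows around it.

```python
def localize_ptm(protein_seq, peptide, site_in_peptide, flank=7, pad="_"):
    """Map a PTM from peptide coordinates to protein coordinates.

    Returns (protein_position, motif) with 1-based positions, or None
    if the peptide does not occur in the protein.
    """
    start = protein_seq.find(peptide)  # 0-based index of the first match
    if start == -1:
        return None
    pos0 = start + site_in_peptide - 1  # 0-based index of the modified residue
    # Pad both ends so sites near a terminus still give a full-width motif.
    padded = pad * flank + protein_seq + pad * flank
    motif = padded[pos0 : pos0 + 2 * flank + 1]
    return pos0 + 1, motif


def background_motifs(protein_seq, residue, flank=7, pad="_"):
    """Collect motifs centred on every occurrence of `residue` in a protein."""
    padded = pad * flank + protein_seq + pad * flank
    return [
        padded[i : i + 2 * flank + 1]
        for i, aa in enumerate(protein_seq)
        if aa == residue
    ]


protein = "MKTAYIAKQRSPISFVK"  # made-up sequence, for illustration only
print(localize_ptm(protein, "QRSPISFVK", site_in_peptide=3, flank=3))
print(background_motifs(protein, "S", flank=3))
```

Padding with a placeholder character is one simple way to keep every motif the same width, which matters if the motifs are later passed to a tool that expects aligned windows.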

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yance Feng ◽  
Lei M. Li

Abstract Background: Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common for a very large collection of samples, especially under a wide range of conditions, is questionable. Results: We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. Then the pairwise intermediates are integrated based on a linear model that adjusts the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt the robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The assessment of normalization quality emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set of the cell cycle. MUREN is implemented as an R package. The code, licensed under GPL-3, is available on GitHub: github.com/hippo-yf/MUREN and on conda: anaconda.org/hippo-yf/r-muren. Conclusions: MUREN performs the RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations are used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.
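To make the role of least trimmed squares (LTS) in pairwise normalization concrete, here is a minimal Python sketch. It is a deliberate simplification and not the MUREN implementation: it assumes an intercept-only model (a single log-scale shift between one sample and one reference), a pseudo-count of 1, and a hypothetical `keep_fraction` parameter. The function names are illustrative. For one-dimensional data the exact LTS location estimate is the mean of the contiguous window of sorted values with the smallest sum of squares, which is what the code searches for.

```python
import math


def lts_location(values, keep_fraction=0.5):
    """Exact univariate LTS location estimate.

    Keeps the h values that fit best and ignores the rest, so a minority
    of strongly differentiated genes cannot pull the estimate.
    """
    xs = sorted(values)
    n = len(xs)
    h = max(1, min(n, int(n * keep_fraction) + 1))
    best_mean, best_ss = None, float("inf")
    # The optimal h-subset is always a contiguous window of the sorted data.
    for start in range(n - h + 1):
        window = xs[start : start + h]
        mean = sum(window) / h
        ss = sum((x - mean) ** 2 for x in window)
        if ss < best_ss:
            best_mean, best_ss = mean, ss
    return best_mean


def pairwise_log_shift(sample, reference, keep_fraction=0.5):
    """Robust log2 shift of `sample` relative to `reference` (per-gene counts)."""
    diffs = [
        math.log2(s + 1) - math.log2(r + 1)  # pseudo-count of 1 is an assumption
        for s, r in zip(sample, reference)
    ]
    return lts_location(diffs, keep_fraction)


reference = [10, 20, 30, 40, 50, 60]
sample = [20, 40, 60, 80, 100, 6000]  # last gene is strongly up-regulated
shift = pairwise_log_shift(sample, reference)
print(round(shift, 3))
```

Subtracting `shift` from the sample's log counts moves the bulk of the log-ratios toward zero, while the strongly up-regulated gene keeps its large positive difference instead of being averaged away.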


2003 ◽  
Vol 3 ◽  
pp. 138-155 ◽  
Author(s):  
R. Gordon Paul ◽  
Allen J. Bailey

Collagen is the most abundant protein in animals and, because of its high mechanical strength and good resistance to degradation, has been utilized in a wide range of industrial products, whilst its low antigenicity has resulted in its widespread use in medicine. Collagen products can be purified from fibres, molecules reconstituted as fibres or from specific recombinant polypeptides with preferred properties. A common feature of all these biomaterials is the need for stable chemical cross-linking to control the mechanical properties and the residence time in the body, and to some extent the immunogenicity of the device. This can be achieved by a number of different cross-linking agents that react with specific amino acid residues on the collagen molecule, imparting individual biochemical, thermal and mechanical characteristics to the biomaterial. In this review we have summarised the major techniques for testing these characteristics and the mechanisms involved in the variety of cross-linking reactions to achieve particular properties.


2012 ◽  
Vol 90 (3) ◽  
pp. 362-377 ◽  
Author(s):  
Evan F. Haney ◽  
Kamran Nazmi ◽  
Jan G.M. Bolscher ◽  
Hans J. Vogel

Lactoferrin is an 80 kDa iron binding protein found in the secretory fluids of mammals and it plays a major role in host defence. An antimicrobial peptide, lactoferrampin, was identified through sequence analysis of bovine lactoferrin and its antimicrobial activity against a wide range of bacteria and yeast species is well documented. In the present work, the contribution of specific amino acid residues of lactoferrampin was examined to evaluate the role that they play in membrane binding and bilayer disruption. The structures of all the bovine lactoferrampin derivatives were examined with circular dichroism and nuclear magnetic resonance spectroscopy, and their interactions with phospholipids were evaluated with differential scanning calorimetry and isothermal titration calorimetry techniques. From our results it is apparent that the amphipathic N-terminal helix anchors the peptide to membranes with Trp 268 and Phe 278 playing important roles in determining the strength of the interaction and for inducing peptide folding. In addition, the N-terminal helix capping residues (DLI) increase the affinity for negatively charged vesicles and they mediate the depth of membrane insertion. Finally, the unique flexibility in the cationic C-terminal region of bovine lactoferrampin does not appear to be essential for the antimicrobial activity of the peptide.


2019 ◽  
Vol 63 (2) ◽  
pp. 267-279
Author(s):  
Huipeng Yang ◽  
Jie Wu

Abstract An increasing amount of evidence supports the view that the evolution of eusociality is accompanied by shifts in ancient molecular and physiological pathways. The juvenile hormone, one of the most important hormones in the post-embryonic development of insects, attracts the most attention in the context of social organization. Allatoregulatory neuropeptides (Allatotropin, Allatostatin-A and Allatostatin-C) are known to regulate juvenile hormone synthesis and release in insects. In order to clarify the transitions of juvenile hormone synthesis involved in eusocial evolution, the substitutions of amino acid residues and the complexity of post-translational modifications in allatoregulatory neuropeptide receptors were characterized. Both allatotropin and allatostatin receptors are identified in all examined bee species, regardless of whether they are solitary or eusocial. Although the amino acid sequences are highly conserved, phylogenetic results are consistent with the eusocial status. The abundance of predicted post-translational modifications correlates with social complexity, except in allatostatin-C receptors. Even though the consequences of these specific amino acid substitutions and the varying complexity of post-translational modifications have not been studied, they likely contribute to the localizing, binding and coupling characteristics of the receptor groups.


2021 ◽  
Vol 14 (12) ◽  
pp. 612
Author(s):  
Jianan Zhu ◽  
Yang Feng

We propose a new ensemble classification algorithm, named super random subspace ensemble (Super RaSE), to tackle the sparse classification problem. The proposed algorithm is motivated by the random subspace ensemble algorithm (RaSE). The RaSE method was shown to be a flexible framework that can be coupled with any existing base classifier. However, the success of RaSE largely depends on the proper choice of the base classifier, which is unfortunately unknown to us. In this work, we show that Super RaSE avoids the need to choose a base classifier by randomly sampling a collection of classifiers together with the subspace. As a result, Super RaSE is more flexible and robust than RaSE. In addition to the vanilla Super RaSE, we also develop the iterative Super RaSE, which adaptively changes the base classifier distribution as well as the subspace distribution. We show that the Super RaSE algorithm and its iterative version perform competitively for a wide range of simulated data sets and two real data examples. The new Super RaSE algorithm and its iterative version are implemented in a new version of the R package RaSEn.
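The idea of sampling base classifiers together with subspaces can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the RaSEn implementation: the two base classifiers (nearest centroid and 1-nearest-neighbour), the single hold-out split used to score candidates, the plain majority vote, and all names and parameters (`super_rase_fit`, `n_learners`, `n_candidates`, `max_dim`) are choices made here for brevity.

```python
import random


def fit_centroid(X, y):
    """Nearest-centroid classifier; returns a predict function."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(
        cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c]))
    )


def fit_1nn(X, y):
    """1-nearest-neighbour classifier; returns a predict function."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
        return y[dists.index(min(dists))]
    return predict


BASES = {"centroid": fit_centroid, "1nn": fit_1nn}


def project(X, subspace):
    return [[row[j] for j in subspace] for row in X]


def holdout_error(fit, X, y, subspace, rng):
    """Error of one (classifier, subspace) candidate on a random half split."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = len(idx) // 2
    train, valid = idx[:cut], idx[cut:]
    Xs = project(X, subspace)
    model = fit([Xs[i] for i in train], [y[i] for i in train])
    return sum(model(Xs[i]) != y[i] for i in valid) / len(valid)


def super_rase_fit(X, y, n_learners=15, n_candidates=10, max_dim=2, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    ensemble = []
    for _ in range(n_learners):
        best = None
        for _ in range(n_candidates):
            # Sample the base classifier AND the feature subspace jointly.
            name = rng.choice(sorted(BASES))
            size = rng.randint(1, min(max_dim, p))
            subspace = sorted(rng.sample(range(p), size))
            err = holdout_error(BASES[name], X, y, subspace, rng)
            if best is None or err < best[0]:
                best = (err, name, subspace)
        _, name, subspace = best
        ensemble.append((name, subspace, BASES[name](project(X, subspace), y)))
    return ensemble


def super_rase_predict(ensemble, x):
    votes = [model([x[j] for j in subspace]) for _, subspace, model in ensemble]
    return max(set(votes), key=votes.count)  # simple majority vote


# Toy sparse problem: only feature 0 carries signal, the other four are noise.
data_rng = random.Random(1)
X, y = [], []
for i in range(60):
    label = i % 2
    X.append([3.0 * label + data_rng.gauss(0, 0.5)]
             + [data_rng.gauss(0, 1) for _ in range(4)])
    y.append(label)

ensemble = super_rase_fit(X, y)
print(super_rase_predict(ensemble, [3.0, 0, 0, 0, 0]))
```

Inspecting which `(name, subspace)` pairs end up in `ensemble` gives a rough picture of which features and base classifiers the selection step favours, which is the kind of information an iterative variant could feed back into its sampling distributions.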


2017 ◽  
Author(s):  
Florian Rohart ◽  
Benoît Gautier ◽  
Amrit Singh ◽  
Kim-Anh Lê Cao

Abstract The advent of high throughput technologies has led to a wealth of publicly available ‘omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a ‘molecular signature’) to explain or predict biological conditions, but mainly for a single type of ‘omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a system biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous ‘omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple ‘omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of ‘omics data available from the package.


2020 ◽  
Vol 19 (03) ◽  
pp. 2040003 ◽  
Author(s):  
Jonathon E. Mohl ◽  
Thomas Gerken ◽  
Ming-Ying Leung

Mucin-type O-glycosylation is one of the most common post-translational modifications of proteins. This glycosylation is initiated in the Golgi by the addition of the sugar N-acetylgalactosamine (GalNAc) onto protein Ser and Thr residues by a family of polypeptide GalNAc transferases. In humans, there are 20 isoforms that are differentially expressed across tissues that serve multiple important biological roles. Using random peptide substrates, isoform specific amino acid preferences have been obtained in the form of enhancement values (EV). These EVs alone have previously been used to predict O-glycosylation sites via the web based ISOGlyP (Isoform Specific O-Glycosylation Prediction) tool. Here, we explore additional protein features to determine whether these can complement the random peptide derived enhancement values and increase the predictive power of ISOGlyP. The inclusion of additional protein substrate features (such as secondary structure and surface accessibility) was found to increase sensitivity with minimal loss of specificity, when tested with three different published in vivo O-glycoproteomics data sets, thus increasing the overall accuracy of the ISOGlyP predictions.


Cancers ◽  
2021 ◽  
Vol 13 (17) ◽  
pp. 4247
Author(s):  
Alexandra De Zutter ◽  
Jo Van Damme ◽  
Sofie Struyf

Chemokines are a large family of small chemotactic cytokines that fulfill a central function in cancer. Both tumor-promoting and -impeding roles have been ascribed to chemokines, which they exert in a direct or indirect manner. An important post-translational modification that regulates chemokine activity is the NH2-terminal truncation by peptidases. CD26 is a dipeptidyl peptidase (DPPIV), which typically clips an NH2-terminal dipeptide from the chemokine. With a certain degree of selectivity in terms of chemokine substrate, CD26 only recognizes chemokines with a penultimate proline or alanine. Chemokines can be protected against CD26 recognition by specific amino acid residues within the chemokine structure, by oligomerization or by binding to cellular glycosaminoglycans (GAGs). Upon truncation, the binding affinity for receptors and GAGs is altered, which influences chemokine function. The consequences of CD26-mediated clipping vary, as unchanged, enhanced, and reduced activities are reported. In tumors, CD26 most likely has the most profound effect on CXCL12 and the interferon (IFN)-inducible CXCR3 ligands, which are converted into receptor antagonists upon truncation. Depending on the tumor type, expression of CD26 is upregulated or downregulated and often results in the preferential generation of the chemokine isoform most favorable for tumor progression. Considering the tight relationship between chemokine sequence and chemokine binding specificity, molecules with the appropriate characteristics can be chemically engineered to provide innovative therapeutic strategies in a cancer setting.
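The sequence rule stated in this abstract (removal of the NH2-terminal dipeptide when the penultimate residue is proline or alanine) can be written as a small Python sketch. It encodes only that rule: the function name `cd26_truncate` and the example peptides are hypothetical, and the protective effects the abstract mentions (specific residues elsewhere in the chemokine, oligomerization, GAG binding) are deliberately ignored.

```python
def cd26_truncate(sequence):
    """Apply the simplified CD26 rule to a one-letter amino acid sequence.

    If the second residue from the NH2-terminus is proline (P) or
    alanine (A), the first two residues are removed; otherwise the
    sequence is returned unchanged.
    """
    if len(sequence) > 2 and sequence[1] in ("P", "A"):
        return sequence[2:]
    return sequence


# Toy peptides for illustration only, not curated chemokine sequences.
for peptide in ("KPVSLS", "GAVSLS", "KQVSLS"):
    print(peptide, "->", cd26_truncate(peptide))
```

A rule this simple can only flag candidate substrates; whether a given chemokine is actually processed in a tumor depends on the context described in the review.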


2020 ◽  
Vol 64 (1) ◽  
pp. 97-110
Author(s):  
Christian Sibbersen ◽  
Mogens Johannsen

Abstract In living systems, nucleophilic amino acid residues are prone to non-enzymatic post-translational modification by electrophiles. α-Dicarbonyl compounds are a special type of electrophiles that can react irreversibly with lysine, arginine, and cysteine residues via complex mechanisms to form post-translational modifications known as advanced glycation end-products (AGEs). Glyoxal, methylglyoxal, and 3-deoxyglucosone are the major endogenous dicarbonyls, with methylglyoxal being the most well-studied. There are several routes that lead to the formation of dicarbonyl compounds, most originating from glucose and glucose metabolism, such as the non-enzymatic decomposition of glycolytic intermediates and fructosyl amines. Although dicarbonyls are removed continuously mainly via the glyoxalase system, several conditions lead to an increase in dicarbonyl concentration and thereby AGE formation. AGEs have been implicated in diabetes and aging-related diseases, and for this reason the elucidation of their structure as well as protein targets is of great interest. Though the dicarbonyls and reactive protein side chains are of relatively simple nature, the structures of the adducts as well as their mechanism of formation are not that trivial. Furthermore, detection of sites of modification can be demanding and current best practices rely on either direct mass spectrometry or various methods of enrichment based on antibodies or click chemistry followed by mass spectrometry. Future research into the structure of these adducts and protein targets of dicarbonyl compounds may improve the understanding of how the mechanisms of diabetes and aging-related physiological damage occur.


Author(s):  
Parag A Pathade ◽  
Vinod A Bairagi ◽  
Yogesh S. Ahire ◽  
Neela M Bhatia

“Proteomics” is the emerging technology leading to high-throughput identification and understanding of proteins. Proteomics is the protein equivalent of genomics and has captured the imagination of biomolecular scientists worldwide. Because the proteome reveals more accurately the dynamic state of a cell, tissue, or organism, much is expected from proteomics to indicate better disease markers for diagnosis and therapy monitoring. Proteomics is expected to play a major role in biomedical research, and it will have a significant impact on the development of diagnostics and therapeutics for cancer, heart ailments and infectious diseases in the future. Proteomics research leads to the identification of new protein markers for diagnostic purposes and novel molecular targets for drug discovery. Though the potential is great, many challenges and issues remain to be solved, such as gene expression, peptides, generation of low abundant proteins, analytical tools, drug target discovery and cost. A systematic and efficient analysis of vast genomic and proteomic data sets is a major challenge for researchers today. Nevertheless, proteomics is the groundwork for constructing and extracting useful knowledge for biomedical research. This review article covers some opportunities and challenges offered by proteomics.

