Structural and sequence features of two residue turns in beta-hairpins

Background: Epigenetic repression mechanisms play an important role in gene regulation, particularly in cancer development. In many cases, local DNA sequence features have been shown to contribute to a CpG island’s (CGI) susceptibility or resistance to methylation.

Objective: To develop unbiased machine learning models, both individually and in combination for different biological features, that predict the methylation propensity of a CGI.

Methods: We built our model from CGI sequence features on a dataset of 75 sequences (28 prone, 47 resistant) representing a genome-wide methylation structure. We tested the model on two independent datasets, one chromosome-specific (132 sequences) and one disease-specific (70 sequences).

Results: Our models improve prediction accuracy over previous models. The results indicate that combined features better predict the methylation propensity of a CGI (area under the curve (AUC) ~0.81). Our global methylation classifier performs well on the independent datasets, reaching an AUC of ~0.82 for the complete model and an AUC of ~0.88 for the model using select sequences that better represent their classes in the training set. We report certain de novo motifs and transcription factor binding site (TFBS) motifs that are consistently better at separating prone and resistant CGIs.

Conclusion: Predictive models for the methylation propensity of CGIs lead to a better understanding of disease mechanisms and can be used to classify genes by their tendency to contain methylation-prone CGIs, which may lead to preventative treatment strategies. MATLAB and Python™ scripts used for model building, prediction, and downstream analyses are available at https://github.com/dicleyalcin/methylProp_predictor.
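
The AUC values quoted in this abstract can be read as the probability that a randomly chosen methylation-prone CGI receives a higher score than a randomly chosen resistant one. The sketch below illustrates that definition only; it is not taken from the authors' scripts, and the function name `roc_auc`, the labels, and the scores are all made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = methylation prone, 0 = resistant.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of sequences, which is acceptable for datasets of the size described here (tens to low hundreds of sequences); a rank-based formulation would be preferable for larger sets.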

PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features

International Journal of Molecular Sciences ◽

10.3390/ijms22052704 ◽

2021 ◽

Vol 22 (5) ◽

pp. 2704

Author(s):

Andi Nur Nilamyani ◽

Firda Nurul Auliah ◽

Mohammad Ali Moni ◽

Watshara Shoombuatong ◽

Md Mehedi Hasan ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Web Application ◽

Computational Prediction ◽

Vital Role ◽

Machine Learning Algorithms ◽

Recursive Feature Elimination ◽

Post Translational Modification ◽

Multiple Sequence ◽

Sequence Features

Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identifying site-specific nitration of tyrosine is a prerequisite to understanding the molecular function of nitrated proteins. Thanks to progress in machine learning, computational prediction can play a vital role ahead of biological experimentation. Herein, we developed a computational predictor, PredNTS, by integrating multiple sequence features including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. The important features were selected by recursive feature elimination using a random forest classifier. Finally, we linearly combined the successive random forest (RF) probability scores generated by the different RF models, each of which employs a single encoding. The resulting PredNTS predictor achieved an area under the curve (AUC) of 0.910 in five-fold cross-validation. It outperformed the existing predictors on a comprehensive and independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. PredNTS is a useful computational resource for the prediction of nitrotyrosine sites. The web application, together with the curated PredNTS datasets, is publicly available.
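
As a rough illustration of two of the encodings named above (binary and CKSAAP) and of the final linear combination of per-encoding scores, here is a minimal sketch. It is not the PredNTS implementation: the window sequence, the maximum gap, the weights, and the example scores are assumptions made for illustration, and the K-mer and AAindex encodings, feature selection, and random forest training are omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encoding(window):
    """One-hot encode each residue as 20 values (all zeros if non-standard)."""
    vec = []
    for res in window:
        vec.extend(1 if res == aa else 0 for aa in AMINO_ACIDS)
    return vec


def cksaap(window, max_gap=2):
    """Frequencies of the 400 residue pairs separated by 0..max_gap residues."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    vec = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(window) - gap - 1):
            pair = window[i] + window[i + gap + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
                total += 1
        vec.extend(counts[p] / total if total else 0.0 for p in pairs)
    return vec


def combine_scores(scores, weights):
    """Weighted linear combination of per-encoding probability scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical 7-residue window centred on a tyrosine (Y).
window = "ACDYKLM"
onehot = binary_encoding(window)
spaced = cksaap(window, max_gap=2)

# Hypothetical probability scores from four single-encoding models,
# combined with equal weights (the weights used by PredNTS are not given here).
final_score = combine_scores([0.8, 0.6, 0.7, 0.9], [0.25, 0.25, 0.25, 0.25])
print(len(onehot), len(spaced), round(final_score, 3))
```

In a full pipeline, each encoding would feed its own classifier, and the weights of the linear combination would be tuned on held-out data rather than fixed by hand.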

Multi-function Prediction of Unknown Protein Sequences Using Multilabel Classifiers and Augmented Sequence Features

Iranian Journal of Science and Technology Transactions A Science ◽

10.1007/s40995-021-01134-z ◽

2021 ◽

Author(s):

Saurabh Agrawal ◽

Dilip Singh Sisodia ◽

Naresh Kumar Nagwani

Keyword(s):

Protein Sequences ◽

Function Prediction ◽

Sequence Features ◽

Unknown Protein

Sequence features, structure, ligand interaction, and diseases in small leucine rich repeat proteoglycans

Journal of Cell Communication and Signaling ◽

10.1007/s12079-021-00616-4 ◽

2021 ◽

Author(s):

Norio Matsushima ◽

Hiroki Miyashita ◽

Robert H. Kretsinger

Keyword(s):

Leucine Rich Repeat ◽

Ligand Interaction ◽

Sequence Features

Correlation of mRNA Expression and Protein Abundance Affected by Multiple Sequence Features Related to Translational Efficiency in Desulfovibrio vulgaris: A Quantitative Analysis

Genetics ◽

10.1534/genetics.106.065862 ◽

2006 ◽

Vol 174 (4) ◽

pp. 2229-2243 ◽

Cited By ~ 133

Author(s):

Lei Nie ◽

Gang Wu ◽

Weiwen Zhang

Keyword(s):

Quantitative Analysis ◽

mRNA Expression ◽

Translational Efficiency ◽

Protein Abundance ◽

Multiple Sequence ◽

Sequence Features

Construction of a Contiguous 874-kb Sequence of the Escherichia coli-K12 Genome Corresponding to 50.0-68.8 min on the Linkage Map and Analysis of Its Sequence Features

DNA Research ◽

10.1093/dnares/4.2.91 ◽

1997 ◽

Vol 4 (2) ◽

pp. 91-95 ◽

Cited By ~ 34

Author(s):

Y. Yamamoto

Keyword(s):

Escherichia Coli ◽

Linkage Map ◽

Its Sequence ◽

Sequence Features ◽

Escherichia Coli K12

Sequence features of monoclonal antiphospholipid antibodies: Comment on the article by Ikematsu et al

Arthritis & Rheumatism ◽

10.1002/1529-0131(199908)42:8<1783::aid-anr37>3.0.co;2-t ◽

1999 ◽

Vol 42 (8) ◽

pp. 1783-1784

Author(s):

Anisur Rahman ◽

David Latchman ◽

David Isenberg

Keyword(s):

Antiphospholipid Antibodies ◽

Sequence Features

Extracting DNA words based on the sequence features: non-uniform distribution and integrity

Theoretical Biology and Medical Modelling ◽

10.1186/s12976-016-0028-3 ◽

2016 ◽

Vol 13 (1) ◽

Cited By ~ 2

Author(s):

Zhi Li ◽

Hongyan Cao ◽

Yuehua Cui ◽

Yanbo Zhang

Keyword(s):

Uniform Distribution ◽

Sequence Features

Sequence Analysis of the Genome of the Unicellular Cyanobacterium Synechocystis sp. strain PCC6803. I. Sequence Features in the 1 Mb Region from Map Positions 64% to 92% of the Genome (Supplement)

DNA Research ◽

10.1093/dnares/2.4.191 ◽

1995 ◽

Vol 2 (4) ◽

pp. 191-198 ◽

Cited By ~ 14

Author(s):

T. Kaneko

Keyword(s):

Sequence Analysis ◽

Unicellular Cyanobacterium ◽

Sequence Features ◽

Synechocystis Sp
