Computer-aided prediction of antigen presenting cell modulators for designing peptide-based vaccine adjuvants

Mapping Intimacies · 10.1101/232025 · 2017

Author(s): Gandharva Nagpal, Kumardeep Chaudhary, Piyush Agrawal, Gajendra P.S. Raghava

Keyword(s): Training Dataset, Support Vector, T Cell Epitopes, Vaccine Adjuvants, Web Based, Link Type, Motif Occurrence, A Cell, Immunomodulatory Peptides, Antigen Presenting

ABSTRACT

Background: Evidence in the literature strongly supports the potential of immunomodulatory peptides as vaccine adjuvants. All mechanisms by which vaccine adjuvants produce immunostimulatory effects directly or indirectly stimulate antigen-presenting cells (APCs). While numerous methods have been developed for predicting B-cell and T-cell epitopes, no method is available for predicting peptides that can modulate APCs.

Methods: We named the peptides that can activate APCs "A-cell epitopes" and developed methods for their prediction in this study. A dataset of experimentally validated A-cell epitopes was collected and compiled from various resources. To predict A-cell epitopes, we developed Support Vector Machine (SVM)-based models using different sequence-based features.

Results: A hybrid model built on a combination of sequence-based features (dipeptide composition and motif occurrence) achieved the highest accuracy, 96.91%, with a Matthews Correlation Coefficient (MCC) of 0.94 on the training dataset. We also evaluated the hybrid models on an independent dataset and achieved a comparable accuracy of 94.93% with an MCC of 0.90.

Conclusion: The models developed in this study were implemented in a web-based platform, VaxinPAD, to predict and design immunomodulatory peptides or A-cell epitopes. This web server, available at http://webs.iiitd.edu.in/raghava/vaxinpad/ and http://crdd.osdd.net/raghava/vaxinpad/, will help researchers design peptide-based vaccine adjuvants.
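Dipeptide composition, one of the two feature types behind the hybrid model, turns a peptide of any length into a fixed 400-dimensional vector that an SVM can consume. The Python sketch below is illustrative only: it assumes overlapping dipeptides, normalization to percentages, and that pairs containing non-standard residues are skipped, which may differ from VaxinPAD's exact encoding. The motif-occurrence features and the SVM itself are not shown, and the function name `dipeptide_composition` is hypothetical.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides in a fixed order, so every vector shares one layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq: str) -> list[float]:
    """Return the 400-dimensional dipeptide composition of a peptide.

    Each entry is the percentage of overlapping dipeptides in `seq` equal to
    the corresponding entry of DIPEPTIDES. Pairs containing a non-standard
    residue are skipped (an assumption made for this sketch).
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # No valid dipeptide (e.g. a single residue): return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    vec = dipeptide_composition("GIGKFLHSAK")  # arbitrary example peptide
    print(len(vec), round(sum(vec), 6))
```

Because every peptide maps to the same 400 positions, vectors for peptides of different lengths can be stacked directly into a training matrix.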

CrosstalkNet: mining large-scale bipartite co-expression networks to characterize epi-stroma crosstalk

10.1101/102848 · 2017

Author(s): Venkata Manem, George Adam, Tina Gruosso, Mathieu Gigoux, Nicholas Bertos, ...

Keyword(s): Stromal Cells, Large Scale, Network Visualization, Visualization Tool, Biological Processes, Web Based, Link Type, A Cell, Large Scale Networks, User Friendly

ABSTRACT

Background: Over the last several years, network biology has evolved from a mere representation of molecular interactions into models that enable inference of complex biological processes. Networks provide promising tools to elucidate the intercellular interactions that contribute to the functioning of key biological pathways. However, exploring these large-scale networks remains a challenge because of their high dimensionality.

Results: CrosstalkNet is a user-friendly, web-based network visualization tool for retrieving and mining interactions in large-scale bipartite co-expression networks. In this study, we discuss the use of gene co-expression networks to explore the rewiring of interactions between tumor epithelial and stromal cells. We show how CrosstalkNet can be used to efficiently visualize, mine, and interpret large co-expression networks representing the crosstalk between the tumor and its microenvironment.

Conclusion: CrosstalkNet serves as a tool to help biologists and clinicians explore large, complex interaction graphs and gain insight into the biological processes that govern tumor epithelial-stromal crosstalk. A comprehensive tutorial, along with case studies, is provided with the application.

Availability: The web-based application is available at http://epistroma.pmgenomics.ca/app/. The code is open source and freely available from http://github.com/bhklab/EpiStroma-webapp.
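In a bipartite co-expression network of this kind, one node set holds epithelial genes, the other holds stromal genes, and edges connect only across the two compartments. The abstract does not say how CrosstalkNet's networks are built, so the Python sketch below is a hypothetical illustration: it assumes matched epithelial and stromal expression profiles from the same patients, uses plain Pearson correlation, and keeps pairs whose absolute correlation meets a user-chosen threshold. The gene names, values, and function names are made up for the example.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant profile has no defined correlation; 0.0 means no edge
        # is drawn for any positive threshold.
        return 0.0
    return sxy / sqrt(sxx * syy)


def bipartite_coexpression(epi: dict, stroma: dict, threshold: float) -> list:
    """Return (epithelial_gene, stromal_gene, r) edges with |r| >= threshold.

    `epi` and `stroma` map gene names to expression values measured on the
    same patients in the same order. Only epithelial-stromal pairs are tested,
    so the resulting graph is bipartite by construction.
    """
    edges = []
    for e_gene, e_profile in epi.items():
        for s_gene, s_profile in stroma.items():
            r = pearson(e_profile, s_profile)
            if abs(r) >= threshold:
                edges.append((e_gene, s_gene, r))
    return edges


if __name__ == "__main__":
    # Hypothetical genes and expression values for four patients.
    epi = {"E1": [1, 2, 3, 4], "E2": [4, 3, 2, 1]}
    stroma = {"S1": [2, 4, 6, 8], "S2": [1, 3, 2, 3]}
    for edge in bipartite_coexpression(epi, stroma, threshold=0.9):
        print(edge)
```

The number of candidate edges grows as the product of the two gene counts, which is one way to see why such networks become hard to explore without dedicated tooling.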

A cell-based microarray to investigate combinatorial effects of microparticle-encapsulated adjuvants on dendritic cell activation

Journal of Materials Chemistry B · 10.1039/c5tb01754h · 2016 · Vol 4 (9) · pp. 1672-1685 · Cited By ~ 9

Author(s): Abhinav P. Acharya, Matthew R. Carstens, Jamal S. Lewis, Natalia Dolgova, C. Q. Xia, ...

Keyword(s): Dendritic Cells, Dendritic Cell, Cell Activation, Antigen Presenting Cells, Vaccine Adjuvants, Toll Like Receptors, Dendritic Cell Activation, A Cell, Antigen Presenting

Experimental vaccine adjuvants are being designed to target, alone or in combination, specific toll-like receptors (TLRs) expressed by antigen-presenting cells, notably dendritic cells (DCs).

ABC-Gly: identifying protein lysine glycation sites with artificial bee colony algorithm

Current Proteomics · 10.2174/1570164617666191227120136 · 2019 · Vol 17

Author(s): Yanqiu Yao, Xiaosa Zhao, Qiao Ning, Junping Zhou

Keyword(s): Support Vector Machine, Amino Acid, Artificial Bee Colony Algorithm, Artificial Bee Colony, Training Dataset, Support Vector, Supplementary File, Feature Subset, Lipid Molecule, Bee Colony

Background: Glycation is a nonenzymatic post-translational modification in which a sugar molecule is attached to a protein or lipid molecule. It may impair protein function and alter protein characteristics, which can lead to metabolic diseases. To understand the underlying molecular mechanisms of glycation, computational prediction methods have been developed because of their convenience and speed. However, building a more effective computational tool remains a challenging task in computational biology.

Methods: In this study, we present ABC-Gly, an accurate tool for identifying lysine glycation sites. First, we encoded the peptides with three informative features: position-specific amino acid propensity, secondary structure, and the composition of k-spaced amino acid pairs. Next, to fully exploit discriminative features and thereby improve the prediction and generalization ability of the model, we developed a two-step feature selection that combined the Fisher score with an improved binary artificial bee colony algorithm based on a support vector machine. Finally, based on the optimal feature subset, we constructed the model using a Support Vector Machine on the training dataset.

Results: In 10-fold cross-validation on the training dataset, ABC-Gly achieved a sensitivity of 76.43%, a specificity of 91.10%, a balanced accuracy of 83.76%, an area under the receiver operating characteristic curve (AUC) of 0.9313, and a Matthews Correlation Coefficient (MCC) of 0.6861; on the independent dataset, it achieved a balanced accuracy of 59.05%. Compared with state-of-the-art predictors on the training dataset, ABC-Gly achieved significant improvements of 0.156 in AUC and 0.336 in MCC.

Conclusion: The detailed analysis indicated that our predictor may serve as a powerful complementary tool to existing methods for predicting protein lysine glycation. The source code and datasets of ABC-Gly are provided in Supplementary File 1.
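The first step of ABC-Gly's feature selection ranks features by Fisher score. The Python sketch below uses one common definition of the Fisher score (per feature, between-class scatter divided by within-class scatter); whether ABC-Gly uses exactly this variant is an assumption. It covers only the filtering step: the improved binary artificial bee colony search wrapped around an SVM is not reproduced, and the function names and toy data are hypothetical.

```python
def fisher_scores(X: list, y: list) -> list:
    """Fisher score of each feature column in X for class labels y.

    score_j = sum_c n_c * (mean_cj - mean_j)**2 / sum_c n_c * var_cj,
    i.e. between-class scatter over within-class scatter. Higher is better.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating if the class means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(scores: list, k: int) -> list:
    """Indices of the k highest-scoring features (the filtering step)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 barely does.
    X = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 6.0]]
    y = [0, 0, 1, 1]
    s = fisher_scores(X, y)
    print(s, top_k_features(s, 1))
```

A cheap filter like this shrinks the candidate set before the more expensive wrapper search, which has to train an SVM for every subset it evaluates.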

Prediction of unconventional protein secretion by exosomes

BMC Bioinformatics · 10.1186/s12859-021-04219-z · 2021 · Vol 22 (1)

Author(s): Alvaro Ras-Carmona, Marta Gomez-Perosanz, Pedro A. Reche

Keyword(s): Protein Secretion, Random Forests, Signal Peptide, Area Under The Curve, Dipeptide Composition, Web Based, Independent Dataset, Link Type, Tenfold Cross Validation, Dependent Pathway

Abstract

Motivation: In eukaryotes, proteins targeted for secretion contain a signal peptide, which allows them to proceed through the conventional ER/Golgi-dependent pathway. However, a substantial number of proteins lacking a signal peptide can be secreted through unconventional routes, including secretion mediated by exosomes. Currently, no method is available to predict protein secretion via exosomes.

Results: Here, we first assembled a dataset including the sequences of 2992 proteins secreted by exosomes and 2961 proteins that are not secreted by exosomes. Subsequently, we trained different random forest models on feature vectors derived from the sequences in this dataset. In tenfold cross-validation, the best model was trained on dipeptide composition, reaching an accuracy of 69.88% ± 2.08 and an area under the curve (AUC) of 0.76 ± 0.03. On an independent dataset, this model reached an accuracy of 75.73% and an AUC of 0.840. Based on these results, we developed ExoPred, a web-based tool that uses random forests to predict protein secretion by exosomes.

Conclusion: ExoPred is available for free public use at http://imath.med.ucm.es/exopred/. Datasets are available at http://imath.med.ucm.es/exopred/datasets/.
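The AUC values reported above can be read as the probability that the model scores a randomly chosen exosome-secreted protein higher than a randomly chosen non-secreted one (the Mann-Whitney formulation). The Python sketch below computes AUC that way; it is not ExoPred's code, the function name and scores are hypothetical, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(scores: list, labels: list) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    labels use 1 for positives (e.g. exosome-secreted) and 0 for negatives;
    tied scores count as half a win. Runs in O(P * N), fine for a sketch.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted probabilities for four proteins.
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

Unlike accuracy, this quantity does not depend on a decision threshold, which is why a model can report a modest accuracy alongside a noticeably higher AUC.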

A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers

BMC Medical Research Methodology · 10.1186/s12874-021-01299-6 · 2021 · Vol 21 (1)

Author(s): Yue Jiao, Fabienne Lesueur, Chloé-Agathe Azencott, Maïté Laurent, Noura Mebirouk, ...

Keyword(s): Record Linkage, Gold Standard, Brca2 Mutation, Epidemiological Studies, Supervised Machine Learning, Training Dataset, Support Vector, Genetic Modifiers, Brca1 And Brca2, Mutation Carriers

Abstract

Background: Linking independent sources of data describing the same individuals enables innovative epidemiological and health studies but requires a robust record linkage approach. We describe a hybrid record linkage process to link databases from two independent ongoing French national studies: GEMO (Genetic Modifiers of BRCA1 and BRCA2), which focuses on identifying genetic factors that modify cancer risk in BRCA1 and BRCA2 mutation carriers, and GENEPSO (prospective cohort of BRCAx mutation carriers), which focuses on environmental and lifestyle risk factors.

Methods: To identify as many as possible of the individuals who participate in both studies but are not registered under a shared identifier, we combined probabilistic record linkage (PRL) and supervised machine learning (ML). This approach, named "PRL + ML", combined the candidate matches identified by both methods. We built the ML model using, as the training dataset, a gold standard established on a first version of the two databases. This gold standard was obtained from PRL-derived matches verified by an exhaustive manual review.

Results: The Random Forest (RF) algorithm showed the highest recall (0.985) among six widely used ML algorithms: RF, Bagged trees, AdaBoost, Support Vector Machine, Neural Network. RF was therefore selected to build the ML model, since our goal was to identify the maximum number of true matches. The combined PRL + ML linkage showed a higher recall (range 0.988–0.992) than either PRL (range 0.916–0.991) or ML (0.981) alone. It identified 1995 individuals participating in both GEMO (6375 participants) and GENEPSO (4925 participants).

Conclusions: Our hybrid linkage process is an efficient tool for linking GEMO and GENEPSO, and it may generalize to other epidemiological studies involving other databases and registries.
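Why combining the two methods raises recall can be seen in a toy calculation: taking the union of candidate pairs can only keep or raise recall relative to either method alone, at the possible cost of extra false positives. The Python sketch below assumes that "combined" means a set union of candidate pairs, which the abstract suggests but does not spell out. The identifiers are hypothetical (GEMO ID, GENEPSO ID) tuples, not real study records, and the function names are illustrative.

```python
def combine_candidates(prl_matches: set, ml_matches: set) -> set:
    """Union of candidate pairs proposed by either linkage method."""
    return set(prl_matches) | set(ml_matches)


def recall(predicted: set, gold: set) -> float:
    """Share of gold-standard matches that were found."""
    if not gold:
        raise ValueError("gold standard is empty")
    return len(set(predicted) & set(gold)) / len(gold)


def precision(predicted: set, gold: set) -> float:
    """Share of predicted pairs that are true matches (0.0 if none predicted)."""
    if not predicted:
        return 0.0  # convention chosen for this sketch
    return len(set(predicted) & set(gold)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical (GEMO ID, GENEPSO ID) pairs.
    gold = {(1, "a"), (2, "b"), (3, "c"), (4, "d")}
    prl = {(1, "a"), (2, "b"), (5, "e")}  # (5, "e") is a false positive
    ml = {(2, "b"), (3, "c")}
    both = combine_candidates(prl, ml)
    for name, pred in [("PRL", prl), ("ML", ml), ("PRL + ML", both)]:
        print(name, recall(pred, gold), precision(pred, gold))
```

Here the union recovers three of the four true matches where each method alone finds two, while its precision falls below the ML-only value because the PRL false positive is carried along; a manual review step is one way to catch such pairs.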

Proposing a machine-learning based method to predict stillbirth before and during delivery and ranking the features: nationwide retrospective cross-sectional study

BMC Pregnancy and Childbirth · 10.1186/s12884-021-03658-z · 2021 · Vol 21 (1)

Author(s): Toktam Khatibi, Elham Hanifi, Mohammad Mehdi Sepehri, Leila Allahqoli

Keyword(s): Machine Learning, External Validation, Fetal Loss, Null Distribution, Training Dataset, Gradient Boosting, Support Vector, Cross Sectional, Boosting Method, Demographic Features

Abstract

Background: The WHO defines stillbirth as fetal loss in pregnancy beyond 28 weeks. In this study, a machine-learning based method is proposed to discriminate stillbirth from live birth, to distinguish stillbirth before delivery from stillbirth during delivery, and to rank the features.

Method: A two-step stack ensemble (SE) classifier is proposed that first classifies instances into stillbirth and live birth and then, in the second step, classifies stillbirth before delivery versus stillbirth during labor. The proposed SE has two consecutive layers containing the same classifiers. The base classifiers in each layer are decision tree, gradient boosting classifier, logistic regression, random forest, and support vector machines, which are trained independently and aggregated using the Vote boosting method. In addition, a new feature ranking method based on mean decrease accuracy, Gini index, and model coefficients is proposed to find high-ranked features.

Results: The IMAN registry dataset is used in this study, covering all births at or beyond the 28th gestational week from 2016/04/01 to 2017/01/01, including 1,415,623 live births and 5502 stillbirth cases. A combination of maternal demographic features, clinical history, fetal properties, delivery descriptors, environmental features, healthcare service provider descriptors, and socio-demographic features is considered. The experimental results show that the proposed SE outperforms the compared classifiers, with an average accuracy of 90%, sensitivity of 91%, and specificity of 88%. The discrimination of the proposed SE is assessed, and average AUCs (±95% CI) of 90.51% ± 1.08 and 90% ± 1.12 are obtained on the training dataset for model development and on the test dataset for external validation, respectively. The proposed SE is calibrated using the isotonic nonparametric calibration method, with a score of 0.07. The process is repeated 10,000 times, and the AUCs of SE classifiers trained on different random training datasets are used as the null distribution. The obtained p-value for assessing the specificity of the proposed SE is 0.0126, which shows that the proposed SE is significant.

Conclusions: Gestational age and fetal height are the two most important features for discriminating live birth from stillbirth. Moreover, hospital, province, main cause of delivery, perinatal abnormality, number of miscarriages, and maternal age are the most important features for classifying stillbirth before and during delivery.
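The abstract describes a permutation-style significance test: the model is retrained 10,000 times on randomly varied training data, and the resulting AUCs form a null distribution against which the real result is compared. The exact p-value formula is not given, so the Python sketch below assumes the common one-sided estimator with a +1 correction. The function name and the simulated null values in the usage block are hypothetical and do not reproduce the study's numbers.

```python
import random


def empirical_p_value(observed: float, null_values: list) -> float:
    """One-sided empirical p-value for 'observed is unusually large'.

    Counts null statistics at least as large as the observed one and adds 1
    to numerator and denominator, so the estimate is never exactly zero.
    """
    if not null_values:
        raise ValueError("null distribution is empty")
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated stand-ins for 10,000 null AUCs from retrained classifiers.
    null_aucs = [rng.gauss(0.85, 0.02) for _ in range(10_000)]
    print(empirical_p_value(0.90, null_aucs))
```

With 10,000 null values, the smallest p-value this estimator can return is 1/10,001, so the number of repetitions bounds how strong the reported significance can be.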

Effectiveness of a psychological online training to promote physical activity among students: protocol of a randomized-controlled trial

Trials · 10.1186/s13063-021-05333-2 · 2021 · Vol 22 (1)

Author(s): Lena Violetta Krämer, Nadine Eschrig, Lena Keinhorst, Luisa Schöchlin, Lisa Stephan, ...

Keyword(s): Physical Activity, Randomized Controlled Trial, Mixed Model, Controlled Trial, Control Group, Web Based, Intention To Treat, Randomized Controlled, Link Type

Abstract

Background: Many students in Germany do not meet recommended levels of physical activity. To promote physical activity among students, web-based interventions are increasingly being implemented. Yet data on the effectiveness of web-based interventions in university students are scarce. Our study investigates a web-based intervention for students. The intervention is based on the Health Action Process Approach (HAPA), which distinguishes between processes of intention formation (motivational processes) and processes of intention implementation (volitional processes). The primary outcome is change in physical activity; secondary outcomes are motivational and volitional variables as proposed by the HAPA, as well as quality of life and depressive symptoms.

Methods: A two-armed, parallel-design randomized controlled trial (RCT) is conducted. Participants are recruited via the internet platform StudiCare (www.studicare.com). After the baseline assessment (t1), participants are randomized to either the intervention group (immediate access to the web-based intervention) or the control group (access only after the follow-up assessment). Four weeks later, a post-assessment (t2) is performed in both groups, followed by a follow-up assessment (t3) 3 months later. Assessments take place online. Main outcome analyses will follow the intention-to-treat principle by including all randomized participants. Outcomes will be analysed using a linear mixed model, assuming data are missing at random. The mixed model will include group, time, and the group × time interaction as fixed effects and participant and university as random effects.

Discussion: This study is a high-quality RCT with three assessment points and intention-to-treat analysis, meeting the state of the art for effectiveness studies. Recruitment covers almost 20 universities in three countries, leading to high external validity. The results of this study will be highly relevant for student health campaigns, as they reflect the effectiveness of self-help interventions for young adults with regard to behaviour change as well as motivational and volitional determinants. From a lifespan perspective, it is important to help students establish regular physical activity.

Trial registration: German Clinical Trials Register (DRKS), DRKS00016889. Registered on 28 February 2019.
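The analysis model described in the Methods can be written out as below. This is a sketch under stated assumptions: time is coded as categorical with baseline t1 as the reference, the random effects are normally distributed random intercepts, and participants are nested within universities. The protocol itself specifies only which terms are fixed and which are random; the notation (G, D, γ, δ, b, u) is introduced here for illustration. G_i is 1 for the intervention group and 0 for controls, and D_{kt} is 1 when assessment t is time point k.

```latex
\begin{aligned}
y_{iut} &= \beta_0 + \beta_1 G_i
          + \sum_{k=2}^{3} \left( \gamma_k D_{kt} + \delta_k G_i D_{kt} \right)
          + b_i + u_u + \varepsilon_{iut}, \\
b_i &\sim \mathcal{N}(0, \sigma_b^2), \quad
u_u \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{iut} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

Under this coding, each interaction coefficient δ_k is the difference between groups in the change from baseline to time point k, which is the quantity an intervention effect would show up in.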

A randomized wait-list controlled trial of a social support intervention for caregivers of patients with primary malignant brain tumor

BMC Health Services Research · 10.1186/s12913-021-06372-w · 2021 · Vol 21 (1)

Author(s): Maija Reblin, Dana Ketcher, Rachael McCormick, Veronica Barrios-Monroy, Steven K. Sutton, ...

Keyword(s): Social Support, Social Network, Controlled Trial, Well Being, Wait List, Web Based, Caregiver Support, Support Intervention, Link Type, Prospective Longitudinal

Abstract

Background: Informal family caregivers play an important and increasingly demanding role in the cancer healthcare system. This is especially true for caregivers of patients with primary malignant brain tumors because of the rapid disease progression, including physical and cognitive debilitation. Informal social network resources such as friends and family can provide social support to caregivers, which lowers caregiver burden and improves overall quality of life. However, caregivers face barriers to obtaining needed social support. To address this need, our team developed, and is now assessing, a multi-component caregiver support intervention that uses a blend of technology and personal contact to improve caregiver social support.

Methods: We are conducting a prospective, longitudinal, two-group randomized controlled trial comparing caregivers who receive the intervention with a wait-list control group. Only caregivers receive the intervention directly, but patient-caregiver dyads are enrolled so that outcomes can be assessed in both. The 8-week intervention consists of two components: (1) the electronic Social Network Assessment Program (eSNAP), a web-based tool that visualizes existing social support resources and provides a tailored list of additional resources; and (2) Caregiver Navigation, including weekly phone sessions with a Caregiver Navigator to address caregiver social support needs. Outcomes are assessed by questionnaires completed by the caregiver (baseline, 4 weeks, 8 weeks) and the cancer patient (baseline and 8 weeks). At 8 weeks, caregivers in the wait-list condition may opt into the intervention. Our primary outcome is caregiver well-being; we also explore patient well-being and caregiver and patient health care utilization.

Discussion: This protocol describes a study testing a novel social support intervention that pairs a web-based social network visualization tool and resource list (eSNAP) with personalized caregiver navigation. This intervention is responsive to a family-centered model of care and to calls for clinical and research priorities focused on informal caregiving.

Trial registration: clinicaltrials.gov, registration number NCT04268979; date of registration: February 10, 2020 (retrospectively registered).

Opinion Mining Using Support Vector Machine with Web Based Diverse Data

Lecture Notes in Computer Science - Pattern Recognition and Machine Intelligence · 10.1007/978-3-319-69900-4_85 · 2017 · pp. 673-678 · Cited By ~ 3

Author(s): Mir Shahriar Sabuj, Zakia Afrin, K. M. Azharul Hasan

Keyword(s): Support Vector Machine, Opinion Mining, Support Vector, Web Based, Diverse Data

KDClassifier: Urinary Proteomic Spectra Analysis Based on Machine Learning for Classification of Kidney Diseases

10.1101/2020.12.01.20242198 · 2020

Author(s): Wanjun Zhao, Yong Zhang, Xinming Li, Yonghong Mao, Changwei Wu, ...

Keyword(s): Machine Learning, Mass Spectrum, Kidney Disease, Kidney Diseases, Training Dataset, Validation Dataset, Support Vector, Urinary Proteomics, Diagnosis Model

Abstract

Background: Extracting spectral features from urinary proteomics data generated with an advanced mass spectrometer and applying machine learning algorithms can make disease classification more accurate. We attempted to establish a novel diagnosis model for kidney diseases by applying an extreme gradient boosting (XGBoost) machine learning algorithm to complete mass spectrum information from urinary proteomics.

Methods: We enrolled 134 patients (including those with IgA nephropathy, membranous nephropathy, and diabetic kidney disease) and 68 healthy participants as controls. For training and validation of the diagnostic model, we used a total of 610,102 mass spectra from their urinary proteomics, produced using high-resolution mass spectrometry. We divided the mass spectrum data into a training dataset (80%) and a validation dataset (20%). The training dataset was used directly to build diagnosis models with XGBoost, random forest (RF), a support vector machine (SVM), and artificial neural networks (ANNs). Diagnostic accuracy was evaluated using a confusion matrix. We also constructed receiver operating characteristic, Lorenz, and gain curves to evaluate the diagnosis model.

Results: Compared with RF, the SVM, and ANNs, the modified XGBoost model, called the Kidney Disease Classifier (KDClassifier), showed the best performance. The accuracy of the diagnostic XGBoost model was 96.03% (CI = 95.17%–96.77%; Kappa = 0.943; McNemar's test, P value = 0.00027). The area under the curve of the XGBoost model was 0.952 (CI = 0.9307–0.9733). The Kolmogorov-Smirnov (KS) value of the Lorenz curve was 0.8514. The Lorenz and gain curves showed that the developed model is highly robust.

Conclusions: This study presents the first XGBoost diagnosis model, the KDClassifier, combined with complete mass spectrum information from urinary proteomics for distinguishing different kidney diseases. KDClassifier achieves high accuracy and robustness, providing a potential tool for classifying all types of kidney diseases.
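The reported accuracy and Kappa are both derived from the confusion matrix: accuracy is the observed agreement between true and predicted classes, and Cohen's kappa corrects that agreement for what would be expected by chance. The Python sketch below computes kappa from a square, multi-class confusion matrix using the standard formula; the function name and the example counts are hypothetical and are not taken from the study.

```python
def cohens_kappa(confusion: list) -> float:
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement (accuracy)
    and p_e is the agreement expected by chance from the row and column totals.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical 4-class matrix (e.g. control, IgAN, MN, DKD); not study data.
    cm = [[48, 1, 0, 1],
          [2, 45, 2, 1],
          [0, 3, 46, 1],
          [1, 0, 2, 47]]
    print(round(cohens_kappa(cm), 3))
```

Kappa is always at most the accuracy when chance agreement is positive, so a kappa close to the accuracy, as reported above, indicates that little of the agreement is attributable to chance.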
