Profile-guided receiver class prediction

Author(s):  
David Grove ◽  
Jeffrey Dean ◽  
Charles Garrett ◽  
Craig Chambers
Keyword(s):  
2019 ◽  
Vol 16 (4) ◽  
pp. 317-324
Author(s):  
Liang Kong ◽  
Lichao Zhang ◽  
Xiaodong Han ◽  
Jinfeng Lv

Protein structural class prediction is beneficial to protein structure and function analysis. Exploring good feature representations is a key step for this prediction task. Prior works have demonstrated the effectiveness of secondary-structure-based feature extraction methods, especially for low-similarity protein sequences. However, prediction accuracy remains limited. To explore the potential of secondary structure information, a novel feature extraction method based on a generalized chaos game representation of predicted secondary structure is proposed. Each protein sequence is converted into a 20-dimensional distance-related statistical feature vector that characterizes the distribution of secondary structure elements and segments. The feature vectors are then fed into a support vector machine classifier to predict the protein structural class. Our experiments on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640) show that the proposed method outperforms state-of-the-art methods. It is anticipated that our method could be extended to other graphical representations of protein sequences and be helpful in future protein research.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Janna Hastings ◽  
Martin Glauer ◽  
Adel Memariani ◽  
Fabian Neuhaus ◽  
Till Mossakowski

Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.


2006 ◽  
Vol 17 (3) ◽  
pp. 337-352 ◽  
Author(s):  
J. J. Chen ◽  
C.-A. Tsai ◽  
H. Moon ◽  
H. Ahn ◽  
J. J. Young ◽  
...  

2014 ◽  
Vol 38 (6) ◽  
pp. 1681-1693 ◽  
Author(s):  
Braz Calderano Filho ◽  
Helena Polivanov ◽  
César da Silva Chagas ◽  
Waldir de Carvalho Júnior ◽  
Emílio Velloso Barroso ◽  
...  

Soil information is needed for managing the agricultural environment. The aim of this study was to apply artificial neural networks (ANNs) to the prediction of soil classes, using orbital remote sensing products, terrain attributes derived from a digital elevation model and local geology information as data sources. This approach to digital soil mapping was evaluated in an area with a high degree of lithologic diversity in the Serra do Mar. The neural network simulator used in this study was JavaNNS, with the backpropagation learning algorithm. For soil class prediction, different combinations of the selected discriminant variables were tested: elevation, slope, aspect, curvature, plan curvature, profile curvature, topographic index, solar radiation, LS topographic factor, local geology information, and clay mineral indices, iron oxides and the normalized difference vegetation index (NDVI) derived from a Landsat-7 Enhanced Thematic Mapper Plus (ETM+) image. Among the tested sets, the best results were obtained when all discriminant variables were associated with geological information (set 13: overall accuracy 93.2-95.6%, Kappa index 0.924-0.951). With the profile curvature variable excluded (set 12), overall accuracy ranged from 93.9 to 95.4% and the Kappa index from 0.932 to 0.948. The maps based on the neural network classifier were consistent with and similar to conventional soil maps drawn for the study area, although with more spatial detail. The results show the potential of ANNs for soil class prediction in mountainous areas with lithological diversity.
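The two figures of merit quoted in this abstract, overall accuracy and the Kappa index, are both computed from a confusion matrix. The following is a minimal sketch assuming the standard Cohen's kappa definition; the function name and the example matrix are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def overall_accuracy_and_kappa(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are reference classes and columns are predicted classes.
    Illustrative helper only; not the evaluation code used in the study.
    """
    k = len(confusion)
    if any(len(row) != k for row in confusion):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total

    # Chance agreement: sum over classes of (row marginal * column marginal) / total^2.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class matrix, for illustration only.
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [5, 45]])
```

Kappa discounts the agreement expected by chance, which is why it is lower than overall accuracy for the same classifier.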


Circulation ◽  
2015 ◽  
Vol 131 (suppl_2) ◽  
Author(s):  
Preeti Jaggi ◽  
Asuncion Mejias ◽  
Adriana Tremoulet ◽  
Jane Burns ◽  
Wei Wang ◽  
...  

Background: Kawasaki disease (KD) is often difficult to distinguish from human adenovirus (HAdV) infection. Objective: 1) To characterize the specific transcriptional profiles of KD patients versus acute HAdV infection, and 2) to determine whether the molecular distance to health (MDTH) score (a molecular score that reflects the perturbation derived from whole genome transcriptional analysis) correlates with response to therapy. Methods: Whole blood RNA samples collected in Tempus tubes were analyzed using Illumina chips and GeneSpring software 7.4 from 76 pediatric patients with complete KD, 13 with incomplete KD, 19 with HAdV infection, and 20 age- and sex-matched healthy controls (HC). We used class comparison algorithms (Mann-Whitney p<0.01, Benjamini-Hochberg, and 1.25-fold change filter) and modular analysis to define the KD profiles; a class prediction algorithm was used to identify genes that best differentiate KD and HAdV. Results: Statistical group comparisons identified 7,899 genes differentially expressed in 39 complete KD patients versus HC (KD biosignature). This signature was validated in another 37 patients with complete KD and in 13 patients with incomplete KD. Modular analysis in children with complete KD demonstrated overexpression of inflammation, neutrophil, myeloid cell, coagulation cascade, and cell cycle genes. The class prediction algorithm identified 25 classifier genes that differentiated children with KD from those with HAdV infection in two independent cohorts of patients with 92% (95% CI [73%-99%]) sensitivity and 90% [67%-98%] specificity. MDTH scores in KD patients significantly correlated with baseline C-reactive protein (R=0.29, p=0.008) and were fourfold higher than in children with HAdV (p<0.01). In addition, KD patients who remained febrile 36 hours after treatment with IVIG (non-responders) demonstrated higher baseline, pre-treatment MDTH values compared with responders [12,290 vs. 5,572, respectively; p=0.009].
Conclusion: Transcriptional signatures can be used as a tool to discriminate between KD and HAdV infection, and may also provide prognostic information.
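Sensitivity and specificity, as reported for the classifier genes above, are ratios over the counts of a two-class confusion matrix. The following is a minimal sketch; the function name is hypothetical and the counts are illustrative placeholders, not the study's data.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from binary classification counts.

    tp/fn: positive cases classified correctly/incorrectly.
    tn/fp: negative cases classified correctly/incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # fraction of true positives detected
    specificity = tn / (tn + fp)  # fraction of true negatives detected
    return sensitivity, specificity


# Illustrative counts only (assumed, not from the abstract).
sens, spec = sensitivity_specificity(tp=9, fn=1, tn=8, fp=2)
```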

