The Promise of Class Prediction: How Multivariate Statistics Can Help Determine Botanical Quality and Authenticity

Botanicals ◽  
2015 ◽  
pp. 126-143

1973 ◽  
Vol 18 (7) ◽  
pp. 325-326
Author(s):  
STANLEY A. MULAIK

2019 ◽  
Vol 16 (4) ◽  
pp. 317-324
Author(s):  
Liang Kong ◽  
Lichao Zhang ◽  
Xiaodong Han ◽  
Jinfeng Lv

Protein structural class prediction is beneficial to protein structure and function analysis. Exploring good feature representations is a key step for this prediction task. Prior works have demonstrated the effectiveness of secondary-structure-based feature extraction methods, especially for low-similarity protein sequences. However, prediction accuracy remains limited. To explore the potential of secondary structure information, a novel feature extraction method based on a generalized chaos game representation of predicted secondary structure is proposed. Each protein sequence is converted into a 20-dimensional distance-related statistical feature vector to characterize the distribution of secondary structure elements and segments. The feature vectors are then fed into a support vector machine classifier to predict the protein structural class. Our experiments on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640) show that the proposed method achieves superior performance to state-of-the-art methods. It is anticipated that our method could be extended to other graphical representations of protein sequences and be helpful in future protein research.
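
As a rough illustration of the chaos game idea mentioned in this abstract, the sketch below maps a secondary-structure string to points inside a triangle and derives a toy distance-based summary. It is not the authors' generalized representation or their 20-dimensional feature vector: the three-state alphabet (H, E, C), the vertex layout, the `ratio` parameter, and the function names `cgr_points` and `mean_vertex_distances` are all assumptions made for the example.

```python
import math

# Assumed layout: one triangle vertex per secondary-structure state
# (H = helix, E = strand, C = coil). The paper's actual layout may differ.
VERTICES = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}


def cgr_points(ss, ratio=0.5):
    """Return the chaos-game points for a secondary-structure string."""
    x, y = 0.5, math.sqrt(3) / 6  # start at the triangle's centroid
    points = []
    for state in ss:
        vx, vy = VERTICES[state]
        # Move a fixed fraction of the way toward the current state's vertex.
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def mean_vertex_distances(points):
    """Toy feature: mean distance from the points to each vertex."""
    return {
        state: sum(math.hypot(px - vx, py - vy) for px, py in points) / len(points)
        for state, (vx, vy) in VERTICES.items()
    }


features = mean_vertex_distances(cgr_points("CCHHHHHHCCEEEEC"))
```

In a pipeline like the one the abstract describes, a vector of such statistics would then be passed to a classifier; the summary here is only meant to show how point positions encode the order and composition of the states.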


2019 ◽  
Vol 40 (4) ◽  
pp. 526-543
Author(s):  
Andrew J. Collins ◽  
Craig A. Jordan ◽  
R. Michael Robinson ◽  
Caitlin Cornelius ◽  
Ross Gore

2021 ◽  
Vol 193 (3) ◽  
Author(s):  
Jéssica Bandeira de Melo Carvalho Passos ◽  
David Bruno de Sousa Teixeira ◽  
Jasmine Alves Campos ◽  
Rafael Petruceli Coelho Lima ◽  
Elpídio Inácio Fernandes-Filho ◽  
...  

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Janna Hastings ◽  
Martin Glauer ◽  
Adel Memariani ◽  
Fabian Neuhaus ◽  
Till Mossakowski

Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.
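
The character-based encoding of line notations mentioned in this abstract can be pictured with a minimal sketch such as the following, which turns SMILES strings into fixed-length integer sequences of the kind a sequence model might consume. The helper names `build_vocab` and `encode`, the padding scheme, and the plain per-character split (which treats a two-letter atom such as `Cl` as two tokens) are assumptions for illustration, not the encoding used in the paper.

```python
def build_vocab(smiles_list):
    """Map each distinct character to a positive integer id (0 is padding)."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles, vocab, max_len):
    """Encode a SMILES string as a list of ids, truncated or zero-padded.

    Characters missing from the vocabulary raise KeyError in this sketch.
    """
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))


vocab = build_vocab(["CCO", "C=O"])   # ethanol, formaldehyde
encoded = encode("CCO", vocab, max_len=5)
```

Fixed-length integer sequences like `encoded` are a common input format for recurrent networks, whereas fingerprint-style bit vectors are the more usual input for models such as logistic regression.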

