Protein Structure Prediction and Interpretation with Support Vector Machines and Decision Trees

Closed-circuit television inspection technology is traditionally used to identify aging sewer pipes requiring rehabilitation. While these inspections provide essential information on the condition of pipes hidden from day-to-day view, they are expensive and often limited to small portions of an entire sewer system. Municipalities may benefit from utilizing predictive analytics to leverage existing inspection datasets so that reliable predictions of condition are available for pipes that have not yet been inspected. The predictive capabilities of data mining systems, namely support vector machines (SVMs) and decision tree classifiers, are demonstrated using a case study of sanitary sewer pipe inspection data collected by the municipality of Guelph, Ontario, Canada. The modeling algorithms are implemented using open-source software and are tuned to counteract the negative impact on predictive performance resulting from class imbalance common within pipe inspection datasets. The decision tree classifier outperforms SVM for this classification task – achieving an acceptable area under the receiver operating characteristic curve of 0.77 and an overall accuracy of 76% on a stratified test set. Although predicting individual pipe condition is a notoriously difficult task, decision trees are found to be a useful screening tool for planning future inspection-related activities.
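The abstract does not say which tuning was used against class imbalance or how the AUC was computed. As a minimal sketch under stated assumptions, the snippet below shows one common approach: inverse-frequency ("balanced") class weights, and AUC computed as the probability that a randomly chosen positive case outscores a randomly chosen negative one. The function names and the toy labels and scores are hypothetical illustrations, not the Guelph inspection data or the authors' implementation.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy data: 1 = pipe in poor condition, 0 = acceptable.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.8, 0.7, 0.9]

weights = balanced_class_weights(labels)  # the rare class gets the larger weight
auc = roc_auc(labels, scores)
print(weights, round(auc, 3))
```

Weights of this kind would typically be passed to the classifier's training routine so that errors on the rare class cost more. The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration but not for large inspection datasets.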

A Semantic Scattering model for the automatic interpretation of English genitives

Natural Language Engineering ◽ 10.1017/s1351324908004798 ◽ 2009 ◽ Vol 15 (2) ◽ pp. 215-239

Cited By ~ 1

Author(s): ADRIANA BADULESCU ◽ DAN MOLDOVAN

Keyword(s): Support Vector Machines ◽ Decision Trees ◽ Naive Bayes ◽ Word Sense Disambiguation ◽ Semantic Relations ◽ Support Vector ◽ Word Sense ◽ Vector Machines ◽ Bayes Algorithm

Abstract: An important problem in knowledge discovery from text is the automatic extraction of semantic relations. This paper addresses the automatic classification of the semantic relations expressed by English genitives. A learning model is introduced based on the statistical analysis of the distribution of genitives' semantic relations in a corpus. The semantic and contextual features of the genitive's noun phrase constituents play a key role in the identification of the semantic relation. The algorithm was trained and tested on a corpus of approximately 20,000 sentences and achieved an f-measure of 79.80 per cent for of-genitives, far better than the 40.60 per cent obtained using a Decision Trees algorithm, the 50.55 per cent obtained using a Naive Bayes algorithm, or the 72.13 per cent obtained using a Support Vector Machines algorithm on the same corpus using the same features. The results were similar for s-genitives: 78.45 per cent using Semantic Scattering, 47.00 per cent using Decision Trees, 43.70 per cent using Naive Bayes, and 70.32 per cent using a Support Vector Machines algorithm. The results demonstrate the importance of word sense disambiguation and semantic generalization/specialization for this task. They also demonstrate that different patterns (in our case the two types of genitive constructions) encode different semantic information and should be treated differently, in the sense that different models should be built for different patterns.
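The f-measure figures quoted above are conventionally the harmonic mean of precision and recall. The abstract does not state which averaging scheme was used across relation classes, so the following is only a sketch of the basic single-class computation. The function name and the counts are hypothetical and are not taken from the paper.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if true_pos == 0:
        return 0.0  # no correct predictions, so precision and recall are both 0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one semantic relation class.
score = f_measure(true_pos=6, false_pos=2, false_neg=4)
print(f"{100 * score:.2f} per cent")
```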