Question terminology and representation for question type classification

Noriko Tomuro

doi:10.1075/term.10.1.08tom

Terminology ◽

2004 ◽

Vol 10 (1) ◽

pp. 153-168 ◽

Cited By ~ 4

Author(s):

Noriko Tomuro

Keyword(s):

Machine Learning ◽

Classification Accuracy ◽

Cross Validation ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Question Type ◽

Semantic Features ◽

Feature Sets ◽

Fixed Expressions ◽

Type Classification

Question terminology is a set of terms that appear in the keywords, idioms and fixed expressions commonly observed in questions. This paper investigates ways to automatically extract question terminology from a corpus of questions and to represent it for the purpose of classifying questions by type. Our key interest is whether semantic features can enhance the representation of the strongly lexical nature of question sentences. We compare two feature sets: one with lexical features only, and another with a mixture of lexical and semantic features. For evaluation, we measure the classification accuracy achieved by two machine learning algorithms, C5.0 and PEBLS, using a procedure called domain cross-validation, which effectively measures the domain transferability of features.
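The abstract does not spell out how domain cross-validation works. A minimal sketch, assuming it means holding out all questions from one domain in turn and training on the remaining domains, might look like the following. The word-voting classifier is a toy stand-in (not C5.0 or PEBLS), and the domains, questions and type labels are hypothetical.

```python
from collections import Counter, defaultdict


def train_keyword_model(examples):
    """Count how often each lowercased word co-occurs with each question type."""
    word_type_counts = defaultdict(Counter)
    type_counts = Counter()
    for text, qtype in examples:
        type_counts[qtype] += 1
        for word in set(text.lower().split()):
            word_type_counts[word][qtype] += 1
    return word_type_counts, type_counts


def predict(model, text):
    """Let each known word vote for its types; fall back to the majority type."""
    word_type_counts, type_counts = model
    votes = Counter()
    for word in set(text.lower().split()):
        votes.update(word_type_counts.get(word, {}))
    if not votes:
        return type_counts.most_common(1)[0][0]
    return votes.most_common(1)[0][0]


def domain_cross_validation(data_by_domain):
    """Assumed procedure: test on one held-out domain, train on all the others."""
    accuracies = {}
    for held_out, test_examples in data_by_domain.items():
        train_examples = [
            example
            for domain, examples in data_by_domain.items()
            if domain != held_out
            for example in examples
        ]
        model = train_keyword_model(train_examples)
        correct = sum(predict(model, text) == qtype for text, qtype in test_examples)
        accuracies[held_out] = correct / len(test_examples)
    return accuracies


# Hypothetical toy corpus: three domains, two question types.
toy_data = {
    "travel": [("how do i renew a passport", "procedure"),
               ("where is the nearest embassy", "location")],
    "cooking": [("how do i bake bread", "procedure"),
                ("where is saffron grown", "location")],
    "cars": [("how do i change a tire", "procedure"),
             ("where is the fuel filter", "location")],
}
accuracies = domain_cross_validation(toy_data)
```

Because the test domain contributes no training data, only domain-independent cues (here, words such as "how" and "where") can help, which is one way to read the abstract's "domain transferability of features".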


Using Configuration Semantic Features and Machine Learning Algorithms to Predict Build Result in Cloud-Based Container Environment

2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS) ◽

10.1109/icpads51040.2020.00042 ◽

2020 ◽

Author(s):

Yiwen Wu ◽

Yang Zhang ◽

Junsheng Chang ◽

Bo Ding ◽

Tao Wang ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Semantic Features


Predicting Health Material Cognitive Accessibility Using Multidimensional Semantic Features and Readability Tools as Predicators (Preprint)

10.2196/preprints.29175 ◽

2021 ◽

Author(s):

Meng Ji ◽

Yanmeng Liu ◽

Tianyong Hao

Keyword(s):

Machine Learning ◽

Health Education ◽

Health Information ◽

Domain Knowledge ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Semantic Features ◽

Integrated Models ◽

Advanced Education ◽

Cognitive Accessibility

BACKGROUND Much current research on the understandability of health information uses medical readability formulas (MRF) to assess the cognitive difficulty of health education resources. This rests on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, forms the sole barrier to health information access among the public. Our study challenged this by showing that, for readers from non-English speaking backgrounds with higher education attainment, semantic features of English health texts rather than medical jargon can explain the lack of cognitive access to health materials among readers with a better understanding of health terms yet limited exposure to English health education materials.

OBJECTIVE Our study explored combining MRF and multidimensional semantic features (MSF) to develop machine learning algorithms that predict the actual level of cognitive accessibility of English health materials on health risks and diseases for specific populations. We compare algorithms to evaluate the cognitive accessibility of specialised health information for non-native English speakers with advanced education levels yet very limited exposure to English health education environments.

METHODS We used 108 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from international health organization websites and rated by international tertiary students, we compared machine learning algorithms (decision tree, SVM, discriminant analysis, ensemble tree and logistic regression) after automatic hyperparameter optimization (grid search for the combination of hyperparameters with minimal classification error). We applied 10-fold cross-validation on the whole dataset for model training and testing, and calculated the AUC, sensitivity, specificity and accuracy as measures of model performance.

RESULTS Using two sets of predictor features, the widely tested MRF and the MSF proposed in our study, we developed and compared three sets of machine learning algorithms: the first used MRF only as predictors, the second used MSF only as predictors, and the last used both MRF and MSF as integrated models. The results showed that the integrated models outperformed the others in terms of AUC, sensitivity, accuracy and specificity.

CONCLUSIONS Our study showed that the cognitive accessibility of English health texts is not limited to the word length and sentence length conventionally measured by MRF. We compared machine learning algorithms combining MRF and MSF to explore the cognitive accessibility of health information from syntactic and semantic perspectives. The results showed the strength of integrated models in terms of statistically increased AUC, sensitivity and accuracy in predicting health resource accessibility for the target readership, indicating that both MRF and MSF contribute to the comprehension of health information, and that for readers with advanced education, semantic features outweigh syntax and domain knowledge.
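The four performance measures named in the abstract can be illustrated with a short sketch. This is not the authors' code: it assumes a binary labelling (1 for the positive class, 0 for the negative class), and the labels, scores and 0.5 threshold below are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 and 0."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def sensitivity_specificity_accuracy(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up labels and classifier scores, thresholded at an assumed 0.5.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec, acc = sensitivity_specificity_accuracy(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity, specificity and accuracy depend on the chosen threshold, whereas AUC is computed from the raw scores and summarises ranking quality across all thresholds.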


Performance analysis of machine learning algorithms on automated sleep staging feature sets

CAAI Transactions on Intelligence Technology ◽

10.1049/cit2.12042 ◽

2021 ◽

Author(s):

Santosh Satapathy ◽

D Loganathan ◽

Hari Kishan Kondaveeti ◽

RamaKrushna Rath

Keyword(s):

Machine Learning ◽

Performance Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Sleep Staging ◽

Feature Sets


Evaluating explorative prediction power of machine learning algorithms for materials discovery using k-fold forward cross-validation

Computational Materials Science ◽

10.1016/j.commatsci.2019.109203 ◽

2020 ◽

Vol 171 ◽

pp. 109203 ◽

Cited By ~ 26

Author(s):

Zheng Xiong ◽

Yuxin Cui ◽

Zhonghao Liu ◽

Yong Zhao ◽

Ming Hu ◽

...

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Learning Algorithms ◽

Machine Learning Algorithms


Assessment of Machine Learning Algorithms for Prediction of Breast Cancer Malignancy Based on Mammogram Numeric Data

10.1101/2020.01.08.20016949 ◽

2020 ◽

Cited By ~ 1

Author(s):

Peter T. Habib ◽

Alsamman M. Alsamman ◽

Sameh E. Hassnein ◽

Ghada A. Shereif ◽

Aladdin Hamwieh

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Cross Validation ◽

Mean Squared Error ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Adjusted Rand Index ◽

Support Vector ◽

Cancer Information ◽

Term Care

In 2019, with an estimated 268,600 new cases, breast cancer was one of the most common cancers and one of the world's leading causes of death for women. Classification and data mining are an efficient way to classify information, particularly in the medical field, where prediction techniques are commonly used for early detection and effective treatment in diagnosis and research. This paper tests models for the mammogram analysis of breast cancer information using 23 of the more widely used machine learning algorithms, such as decision tree, random forest, k-nearest neighbors and support vector machine. Results are obtained from randomly split data using a replicated 10-fold cross-validation method. Accuracy is calculated with regression metrics such as mean absolute error, mean squared error and R2 score, and with clustering metrics such as adjusted Rand index, homogeneity and V-measure; it is also checked with F-measure, AUC and cross-validation. Proper identification of patients with breast cancer would create care opportunities; for example, the supervision and implementation of intervention plans could benefit the quality of long-term care. Experimental results reveal that the maximum precision of 100%, with the lowest error rate, is obtained with the AdaBoost classifier.


The Impact of Selecting a Validation Method in Machine Learning on Predicting Basketball Game Outcomes

Symmetry ◽

10.3390/sym12030431 ◽

2020 ◽

Vol 12 (3) ◽

pp. 431 ◽

Cited By ~ 1

Author(s):

Tomislav Horvat ◽

Ladislav Havaš ◽

Dunja Srpak

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Test Validation ◽

Sporting Events ◽

Validation Method ◽

Validation Methods ◽

Independent Events ◽

The Impact

Interest in sports predictions as well as the public availability of large amounts of structured and unstructured data are increasing every day. As sporting events are not completely independent events, but characterized by the influence of the human factor, the adequate selection of the analysis process is very important. In this paper, seven different classification machine learning algorithms are used and validated with two validation methods: Train&Test and cross-validation. Validation methods were analyzed and critically reviewed. The obtained results are analyzed and compared. Analyzing the results of the used machine learning algorithms, the best average prediction results were obtained by using the nearest neighbors algorithm and the worst prediction results were obtained by using decision trees. The cross-validation method obtained better results than the Train&Test validation method. The prediction results of the Train&Test validation method by using disjoint datasets and up-to-date data were also compared. Better results were obtained by using up-to-date data. In addition, directions for future research are also explained.
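The difference between the two validation methods can be sketched with index splits alone. This is an illustrative sketch, not the authors' setup: the function names, the 20% test fraction, k = 5 and the fixed seed are all assumptions, and both splits shuffle the samples, which ignores the chronological order of games.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Train&Test style: one shuffled split into a training and a test part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples, k=5, seed=0):
    """Cross-validation: yield k (train, test) splits; each sample is tested once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


holdout_train, holdout_test = train_test_split_indices(10)
cv_splits = list(k_fold_indices(10, k=5))
```

With a single split, the reported score depends on which samples happen to land in the test part; averaging over k folds uses every sample for testing exactly once, at the cost of training k models.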


A cross-validation scheme for machine learning algorithms in shotgun proteomics

BMC Bioinformatics ◽

10.1186/1471-2105-13-s16-s3 ◽

2012 ◽

Vol 13 (Suppl 16) ◽

pp. S3 ◽

Cited By ~ 15

Author(s):

Viktor Granholm ◽

William Noble ◽

Lukas Käll

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Learning Algorithms ◽

Shotgun Proteomics ◽

Machine Learning Algorithms ◽

Validation Scheme


Cross-validation of machine learning algorithms for malware detection using static features of Windows portable executables: A Comparative Study

2020 IEEE 17th International Conference on Smart Communities: Improving Quality of Life Using ICT, IoT and AI (HONET) ◽

10.1109/honet50430.2020.9322809 ◽

2020 ◽

Author(s):

Warda Aslam ◽

M. M. Fraz ◽

S.K. Rizvi ◽

S. Saleem

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Cross Validation ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms


Identifying the Main Risk Factors for CVD Prediction Using Machine Learning Algorithms

10.20944/preprints202108.0471.v1 ◽

2021 ◽

Author(s):

Luis Rolando Guarneros-Nolasco ◽

Nancy Aracely Cruz-Ramos ◽

Giner Alor-Hernández ◽

Lisbeth Rodríguez-Mazahua ◽

José Luis Sánchez-Cervantes

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Performance Metrics ◽

Learning Algorithms ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Algorithm Performance ◽

Body Regions ◽

Risks Factors ◽

Fold Cross Validation

CVDs are a leading cause of death globally. In CVDs, the heart is unable to deliver enough blood to other body regions. Since effective and accurate diagnosis of CVDs is essential for CVD prevention and treatment, machine learning (ML) techniques can be effectively and reliably used to distinguish patients suffering from a CVD from those who do not suffer from any heart condition. Namely, machine learning algorithms (MLAs) play a key role in the diagnosis of CVDs through predictive models that allow us to identify the main risk factors influencing CVD development. In this study, we analyze the performance of ten MLAs on two datasets for CVD prediction and two for CVD diagnosis. Algorithm performance is analyzed on the top-two and top-four dataset attributes/features with respect to five performance metrics (accuracy, precision, recall, F1-score and ROC AUC), using the train-test split technique and k-fold cross-validation. Our study identifies the top two and top four attributes from each CVD diagnosis/prediction dataset. As our main finding, the ten MLAs exhibited appropriate diagnostic and predictive performance; hence, they can be successfully implemented to improve current CVD diagnosis efforts and help patients around the world, especially in regions where medical staff is lacking.


Detecting Spam Messages in Twitter Data by Machine learning Algorithms using Cross Validation

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1913.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 2941-2946

Keyword(s):

Machine Learning ◽

Social Media ◽

Cross Validation ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Human Relations ◽

Detection Model ◽

Social Media Networks ◽

Twitter Data

Nowadays human relations are maintained through social media networks, and traditional relationships are becoming obsolete. To stay in touch, share ideas and exchange knowledge, we use social media networking sites such as Twitter, Facebook and LinkedIn. On Twitter, users share their opinions, interests and knowledge with others through messages. At the same time, some users mislead genuine users. Genuine users are also called solicited users, and the users who mislead them are called spammers. Spammers post unwanted information to non-spam users, who may retweet it to others and follow the spammers. To avoid these spam messages, we propose a methodology using machine learning algorithms. Our approach uses a set of content-based features. In the spam detection model we use the support vector machine (SVM) algorithm and the naive Bayes classification algorithm. To measure the performance of our model we use precision, recall and F-measure metrics.
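The three evaluation metrics mentioned at the end of the abstract can be computed as in the sketch below. It assumes "spam" is the positive class and takes the F-measure to be the balanced F1 score; the label names and the example predictions are hypothetical, not data from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Precision, recall and F1 for one positive class (0.0 when undefined)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold labels and classifier output for five tweets.
gold = ["spam", "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "spam", "spam"]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Precision penalises genuine tweets flagged as spam, recall penalises spam that slips through, and F1 is their harmonic mean.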
