Acoustic feature-based sentiment analysis of call center data

Machine Learning Approaches for Sentiment Analysis

Big Data, 2016, pp. 1917-1933

DOI: 10.4018/978-1-4666-9840-6.ch088

Author(s): Basant Agarwal, Namita Mittal

Keyword(s): Machine Learning, Sentiment Analysis, Opinion Mining, Machine Learning Algorithms, Sentiment Classification, Learning Approaches, Learning Methods, Machine Learning Methods, Knowledge Based, Semantic Orientation

Opinion mining, or sentiment analysis, is the field of study that analyzes people's opinions or sentiments, as expressed in text, towards entities such as products and services. It has always been important to know what other people think. With the rapid growth in the availability and popularity of online review sites, blogs, forums, and social networking sites, the need to analyze and understand these reviews has arisen. The main approaches to sentiment analysis can be categorized into semantic orientation-based approaches, knowledge-based approaches, and machine learning algorithms. This chapter surveys the machine learning approaches applied to sentiment analysis applications, with the main emphasis on research that applies machine learning methods to sentiment classification, mostly at the document level. Machine learning-based approaches work in the following phases, each discussed in detail in this chapter: (1) feature extraction, (2) feature weighting schemes, (3) feature selection, and (4) machine learning methods. The chapter also discusses the standard, freely available benchmark datasets and evaluation methods for sentiment analysis. The authors conclude with a comparative study of some state-of-the-art methods for sentiment analysis and some possible future research directions in opinion mining and sentiment analysis.
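The four phases named in this abstract can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration rather than the chapter's own method: the whitespace tokenizer, the smoothed TF-IDF weighting, the document-frequency-difference selection rule, the Naive Bayes style scorer, and the four toy reviews are all hypothetical.

```python
import math
from collections import Counter

def extract_features(text):
    """Phase 1: feature extraction as lowercase unigram tokens (assumed tokenizer)."""
    return text.lower().split()

def document_frequencies(docs):
    """Count in how many tokenized documents each term appears."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df

def tfidf(tokens, df, n_docs):
    """Phase 2: feature weighting with a smoothed TF-IDF scheme."""
    return {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t, c in Counter(tokens).items()}

def select_features(docs, labels, k):
    """Phase 3: keep the k terms whose document frequency differs most
    between the two classes (a simple stand-in for chi-square or similar)."""
    classes = sorted(set(labels))  # assumes exactly two classes
    first = document_frequencies(d for d, y in zip(docs, labels) if y == classes[0])
    second = document_frequencies(d for d, y in zip(docs, labels) if y == classes[1])
    ranked = sorted(set(first) | set(second),
                    key=lambda t: (-abs(first[t] - second[t]), t))
    return set(ranked[:k])

def train(texts, labels, k=4):
    """Phase 4: accumulate per-class weighted term totals for a Naive Bayes style model."""
    docs = [extract_features(t) for t in texts]
    df = document_frequencies(docs)
    selected = select_features(docs, labels, k)
    totals = {y: Counter() for y in set(labels)}
    for tokens, y in zip(docs, labels):
        for term, weight in tfidf(tokens, df, len(docs)).items():
            if term in selected:
                totals[y][term] += weight
    return {"df": df, "n": len(docs), "selected": selected,
            "totals": totals, "priors": Counter(labels)}

def classify(model, text):
    """Score each class with log prior plus weighted, add-one-smoothed log likelihoods."""
    weights = tfidf(extract_features(text), model["df"], model["n"])
    scores = {}
    for y, totals in model["totals"].items():
        denom = sum(totals.values()) + len(model["selected"])
        score = math.log(model["priors"][y] / model["n"])
        for term, weight in weights.items():
            if term in model["selected"]:
                score += weight * math.log((totals[term] + 1.0) / denom)
        scores[y] = score
    return max(scores, key=scores.get)

# Hypothetical toy corpus, for illustration only.
texts = ["great product love it", "excellent service great support",
         "terrible product hate it", "awful service terrible support"]
labels = ["pos", "pos", "neg", "neg"]
model = train(texts, labels)
print(classify(model, "great service"))
```

In practice each phase would be swapped for a stronger component (n-gram features, information-gain selection, an SVM), but the order of the phases stays the same.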


A Very Large-Scale Bioactivity Comparison of Deep Learning and Multiple Machine Learning Algorithms for Drug Discovery

2020

DOI: 10.26434/chemrxiv.12781241

Author(s): Thomas R. Lane, Daniel H. Foil, Eni Minerali, Fabio Urbina, Kimberley M. Zorn, ...

Keyword(s): Machine Learning, Neural Networks, Deep Learning, Drug Discovery, Deep Neural Networks, Learning Algorithms, Machine Learning Algorithms, Support Vector, Learning Methods, Machine Learning Methods

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and applications beyond. In recent studies we have applied multiple machine learning algorithms and modeling metrics, and in some cases compared molecular descriptors, to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL in order to evaluate machine learning methods of interest to them. The largest of these types of studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL for use with the ECFP6 fingerprint and compared our proprietary software Assay Central™ with random forest, k-Nearest Neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (3 levels). Model performance was assessed using an array of five-fold cross-validation metrics including area-under-the-curve, F1 score, Cohen’s kappa and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets, all methods appeared comparable, while the distance from the top indicated that Assay Central™ and support vector classification were comparable. Unlike prior studies, which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case, where minimal tuning was performed for any of the methods. If anything, Assay Central™ may have been at a slight advantage, as the activity cutoff for each of the over 5000 datasets, representing over 570,000 unique compounds, was based on Assay Central™ performance, but support vector classification seems to be a strong competitor. We also apply Assay Central™ to prospective predictions for PXR and hERG to further validate these models. This work currently appears to be the largest comparison of machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors and algorithms, as well as further refining methods for evaluating and comparing models.
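Three of the metrics named in this abstract (F1 score, Cohen's kappa and the Matthews correlation coefficient) can all be derived from a binary confusion matrix. The sketch below uses the standard textbook formulas. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Compute F1, Cohen's kappa and MCC from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"f1": f1, "kappa": kappa, "mcc": mcc}

# Hypothetical counts for one cross-validation fold.
fold_metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(fold_metrics)
```

Under five-fold cross-validation, a function like this would be applied to each held-out fold and the results averaged. The averaging step is left out here to keep the sketch short.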


APPLICATION OF MACHINE LEARNING METHODS TO APPROXIMATE THE EXPERIMENTAL CHARACTERISTICS OF A MEMRISTOR

Mathematical modeling in materials science of electronic component, 2020

DOI: 10.29003/m1536.mmmsec-2020/116-119

Author(s): V. Lopatenko

Keyword(s): Machine Learning, Neural Networks, Scientific Community, Learning Algorithms, Machine Learning Algorithms, Gradient Boosting, Learning Methods, Passive Element, Machine Learning Methods, Boosting Algorithm

A memristor is a passive element in microelectronics whose properties are similar to those of a biological synapse. The possibility of using a memristor as an analog element in neural networks has increased the scientific community's interest in studying its properties. In this paper, we study the possibility of modeling some characteristics of a memristor using machine learning algorithms, in particular the gradient boosting algorithm.
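As a rough illustration of how gradient boosting approximates a measured curve, the sketch below fits depth-one regression trees (stumps) to the residuals of a squared-error loss. It is a generic textbook version under stated assumptions, not the paper's model: the function names, the learning rate, the number of rounds, and the sample points are all hypothetical and do not represent memristor measurements.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on x that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # needs at least two distinct x values
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                       for x, p in zip(xs, predictions)]
    return base, stumps, learning_rate

def predict_boosted(model, x):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps)

# Hypothetical nonlinear curve, standing in for an experimental characteristic.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]
model = fit_gradient_boosting(xs, ys)
print([round(predict_boosted(model, x), 2) for x in xs])
```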


Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

BMC Medical Informatics and Decision Making, 2021, Vol 21 (1)

DOI: 10.1186/s12911-021-01403-2

Author(s): Alan Brnabic, Lisa M. Hess

Keyword(s): Machine Learning, Decision Making, Literature Review, Systematic Literature Review, Real World, Learning Algorithms, External Validation, Machine Learning Algorithms, Learning Methods, Machine Learning Methods

Background: Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making. Methods: This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results: A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. Diverse methods, statistical packages and approaches were used across the identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation, but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies. Conclusions: A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used and that the model selection strategy is clearly defined, and both internal and external validation are necessary to be sure that decisions for patient care are being made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.


Machine-learning based prediction of Cushing’s syndrome in dogs attending UK primary-care veterinary practice

Scientific Reports, 2021, Vol 11 (1)

DOI: 10.1038/s41598-021-88440-z

Author(s): Imogen Schofield, David C. Brodbelt, Noel Kennedy, Stijn J. M. Niessen, David B. Church, ...

Keyword(s): Machine Learning, Cushing's Syndrome, Clinical Decision Making, Predictive Performance, Clinical Decision, Machine Learning Algorithms, Learning Methods, Machine Learning Methods, Clinical Records

Cushing’s syndrome is an endocrine disease in dogs that negatively impacts the quality of life of affected animals. Cushing’s syndrome can be a challenging diagnosis to confirm; therefore, new methods to aid diagnosis are warranted. Four machine-learning algorithms were applied to predict a future diagnosis of Cushing’s syndrome, using structured clinical data from the VetCompass programme in the UK. Dogs suspected of having Cushing’s syndrome were included in the analysis and classified based on their final reported diagnosis within their clinical records. Demographic and clinical features available at the point of first suspicion by the attending veterinarian were included within the models. The machine-learning methods were able to classify the recorded Cushing’s syndrome diagnoses with good predictive performance. The LASSO penalised regression model showed the best overall performance when applied to the test set, with AUROC = 0.85 (95% CI 0.80–0.89), sensitivity = 0.71, specificity = 0.82, PPV = 0.75 and NPV = 0.78. The findings of our study indicate that machine-learning methods could predict the future diagnosis made by a practising veterinarian. New approaches using these methods could support clinical decision-making and contribute to improved diagnosis of Cushing’s syndrome in dogs.
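The sensitivity, specificity, PPV and NPV reported above are all ratios over the four cells of a binary confusion matrix. The sketch below shows the standard definitions. The function name and the counts are hypothetical and are not the study's test-set data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy ratios from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),          # share of positive calls that were right
        "npv": tn / (tn + fn),          # share of negative calls that were right
    }

# Hypothetical counts, for illustration only.
result = diagnostic_metrics(tp=30, fp=10, fn=10, tn=50)
print(result)
```

Sensitivity and specificity are properties of the classifier at a given threshold, whereas PPV and NPV also depend on how common the condition is in the tested population.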


MODIS-FIRMS and ground-truthing based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms.

2021

DOI: 10.21203/rs.3.rs-750123/v1

Author(s): Polash Banerjee

Keyword(s): Machine Learning, Machine Learning Algorithms, Tree Cover, Anthropogenic Factors, Gradient Boosting, Support Vector, Learning Methods, Sikkim Himalaya, Environmental Features, Machine Learning Methods

Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the 2019 wildfires in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors have been considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study of the efficiency of machine learning methods, namely Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM), has been performed to identify the best performing algorithm for wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires. However, RF performed best, followed by GBM. Also, environmental features such as average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can be considered a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.


Machine Learning Methods Applied for Modeling the Process of Obtaining Bricks Using Silicon-Based Materials

Materials, 2021, Vol 14 (23), pp. 7232

DOI: 10.3390/ma14237232

Author(s): Costel Anton, Silvia Curteanu, Cătălin Lisa, Florin Leon

Keyword(s): Machine Learning, Neural Networks, Raw Materials, Optimization Procedure, Exhaust Emission, Energy Potential, Learning Methods, Machine Learning Methods, Emission Changes, The Impact

Most of the time, industrial brick manufacturing facilities are designed and commissioned for a particular type of manufacture mix and a particular type of burning process. Maintaining and improving productivity and product quality is a challenge for process engineers. Our paper aims at using machine learning methods to evaluate the impact of adding new auxiliary materials on the amount of exhaust emissions. Experimental determinations made in similar conditions enabled us to build a database containing information about 121 brick batches. Various models (artificial neural networks and regression algorithms) were designed to make predictions about exhaust emission changes when auxiliary materials are introduced into the manufacture mix. The best models were feed-forward neural networks with two hidden layers, having MSE < 0.01 and r² > 0.82 and, as a regression model, kNN with error < 0.6. Also, an optimization procedure including the best models was developed in order to determine the optimal values for the parameters that ensure the minimum quantities of gas emissions. The Pareto front obtained in the multi-objective optimization conducted with the grid search method allows the user to choose the most convenient values for the dry product mass, clay, ash and organic raw materials which minimize gas emissions with energy potential.
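The combination of grid search and a Pareto front mentioned here can be sketched in a few lines: every grid point is scored on each objective, and a point is kept only if no other point is at least as good on all objectives and strictly better on one. The two objective functions and the grid values below are hypothetical placeholders, not the paper's emission models or process parameters.

```python
from itertools import product

def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and
    strictly better somewhere (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(candidates, objectives):
    """Return the (candidate, scores) pairs that no other candidate dominates."""
    scored = [(c, objectives(c)) for c in candidates]
    return [(c, s) for c, s in scored
            if not any(dominates(other, s) for _, other in scored)]

def toy_objectives(params):
    """Hypothetical pair of competing objectives over two parameters."""
    x, y = params
    return (x ** 2 + y, (x - 1.0) ** 2 + y)

# Grid search: evaluate every combination of the assumed parameter levels.
grid = list(product([0.0, 0.5, 1.0], [0.0, 1.0]))
front = pareto_front(grid, toy_objectives)
for params, scores in front:
    print(params, scores)
```

The front is a set of trade-offs rather than a single optimum, which is why the final choice among its points is left to the user.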


Landslide susceptibility mapping using machine learning for Wenchuan County, Sichuan province, China

E3S Web of Conferences, 2020, Vol 198, pp. 03023

DOI: 10.1051/e3sconf/202019803023

Author(s): Xin Yang, Rui Liu, Luyao Li, Mei Yang, Yuantao Yang

Keyword(s): Machine Learning, Landslide Susceptibility, Susceptibility Mapping, Machine Learning Algorithms, Landslide Susceptibility Mapping, Support Vector, ROC Curve Analysis, Learning Methods, Machine Learning Methods, Boosted Decision Tree

Landslide susceptibility mapping is a method used to assess the probability and spatial distribution of landslide occurrences. Machine learning methods have been widely used in landslide susceptibility assessment in recent years. In this paper, six popular machine learning algorithms, namely logistic regression, multi-layer perceptron, random forests, support vector machine, AdaBoost, and gradient boosted decision tree, were used to construct landslide susceptibility models with a total of 1365 landslide points and 14 predisposing factors. Subsequently, landslide susceptibility maps (LSM) were generated by the trained models. The LSM show that the main landslide zone is concentrated in the southeastern area of Wenchuan County. ROC curve analysis shows that all models fitted the training datasets and achieved satisfactory results on the validation datasets. The results of this paper reveal that machine learning methods are a feasible way to build robust landslide susceptibility models.
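ROC curve analysis is usually summarized by the area under the curve (AUC), which equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The function name, labels and susceptibility scores are hypothetical and are not drawn from the paper's validation data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = landslide point) and model scores.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(auc)
```

The pairwise form is quadratic in the number of samples, so for large validation sets a rank-based computation would be preferable.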
