Incorporating Background Checks with Sentiment Analysis to Identify Violence Risky Chinese Microblogs

Yun-Fei Jia; Shan Li; Renbiao Wu

doi:10.3390/fi11090200

Future Internet ◽

10.3390/fi11090200 ◽

2019 ◽

Vol 11 (9) ◽

pp. 200

Author(s):

Yun-Fei Jia ◽

Shan Li ◽

Renbiao Wu

Keyword(s):

False Alarm ◽

Sentiment Analysis ◽

False Alarm Rate ◽

Training Data ◽

Support Vector ◽

The Internet ◽

Violence Risk ◽

Multiple Perspectives ◽

Speech Detection ◽

Background Checks

Based on Web 2.0 technology, more and more people express their attitudes and opinions on the Internet. Radical ideas, rumors, terrorism, and violent content are also propagated online, causing several incidents of social panic in China every year. In practice, however, most of this content consists of jokes or emotional catharsis, so detecting it with conventional techniques usually incurs a high false alarm rate. To address this problem, this paper introduces a technique that combines sentiment analysis with background checks. State-of-the-art sentiment analysis usually depends on training datasets from a specific topic area. Unfortunately, for some domains, such as violence risk speech detection, no definitive training data exist. In particular, topic-independent sentiment analysis of short Chinese texts has rarely been reported in the literature. In this paper, the violence risk of Chinese microblogs is calculated from multiple perspectives. First, a lexicon-based method is used to retrieve violence-related microblogs, and a similarity-based method is then used to extract sentiment words. Semantic rules and emoticons are employed to obtain the sentiment polarity and sentiment strength of short texts. Second, the activity risk is calculated from the characteristics of part-of-speech (PoS) sequences and from semantic rules, and a threshold is then set to capture the key users. Finally, the risk is confirmed by the historical speech of the key users and the opinions of their friend circles. The experimental results show that the proposed approach outperforms the support vector machine (SVM) method on a topic-independent corpus and can effectively reduce the false alarm rate.
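A minimal Python sketch of the rule-based scoring and background-check steps described above, under stated assumptions: the tiny lexicons, the Weibo-style bracketed emoticons, the pre-segmented tokens, and the confirmation rule in `confirm_risk` are all hypothetical illustrations, not the authors' lexicons or formulas. The activity risk is taken as a given number in [0, 1], because the PoS-sequence computation is not specified here.

```python
# Hypothetical mini-lexicons; a real system would use full sentiment dictionaries.
SENTIMENT_WORDS = {"喜欢": 1.0, "开心": 1.0, "讨厌": -1.0, "恨": -2.0}  # like, happy, dislike, hate
NEGATIONS = {"不", "没", "别"}                                         # not, no, don't
DEGREE_ADVERBS = {"非常": 2.0, "很": 1.5, "有点": 0.5}                 # extremely, very, slightly
EMOTICONS = {"[哈哈]": 1.0, "[怒]": -1.5, "[泪]": -0.5}                # laugh, angry, tears


def sentiment_score(tokens):
    """Score a pre-segmented short text: sign = polarity, magnitude = strength."""
    score = 0.0
    modifier = 1.0  # accumulated negation/degree effect for the next sentiment word
    for tok in tokens:
        if tok in NEGATIONS:
            modifier *= -1.0
        elif tok in DEGREE_ADVERBS:
            modifier *= DEGREE_ADVERBS[tok]
        elif tok in SENTIMENT_WORDS:
            score += modifier * SENTIMENT_WORDS[tok]
            modifier = 1.0  # a modifier only applies to the next sentiment word
        elif tok in EMOTICONS:
            score += EMOTICONS[tok]
    return score


def mean(values):
    return sum(values) / len(values) if values else 0.0


def confirm_risk(activity_risk, history_risks, friend_risks, threshold=0.6):
    """Background check for a flagged post (hypothetical rule; risks in [0, 1])."""
    if activity_risk < threshold:
        return False  # below the threshold: the user is not captured as a key user
    # Confirm only if the user's history and friend circle also lean risky,
    # which filters out one-off jokes or emotional venting.
    half = threshold / 2
    return mean(history_risks) >= half and mean(friend_risks) >= half


print(sentiment_score(["不", "喜欢", "[怒]"]))  # negated "like" plus angry emoticon: -2.5
```

Resetting the modifier after each sentiment word keeps a negation or degree adverb from leaking into later clauses; real semantic rules would also handle punctuation and clause boundaries.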

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations when performing sentiment analysis. A training data set of 3,000 movie reviews and tweets was manually labeled by native Hindi speakers into three classes: positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques: Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets, and SVM performed best among the classifiers, with an accuracy of 68% on movie reviews and 82% on tweets. The results are very promising, and the system can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, the proposed system could serve other cultural and social purposes, such as predicting or countering riots.
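The normalization and string-to-matrix steps can be sketched in Python as follows. The variant table is a hypothetical example of folding Romanized Hindi spelling variants and abbreviations onto one form; the count matrix is similar in spirit to what WEKA's StringToWordVector filter produces, but this is plain Python, not the WEKA API.

```python
import re

# Hypothetical table folding Romanized Hindi spelling variants/abbreviations onto one form.
VARIANTS = {"bohot": "bahut", "bhot": "bahut", "acha": "accha", "achha": "accha", "gr8": "great"}


def normalize(text):
    """Lowercase, tokenize, shorten letter runs, and map known variants."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    normalized = []
    for tok in tokens:
        tok = re.sub(r"(.)\1{2,}", r"\1\1", tok)  # cap repeated letters at two: "achhhha" -> "achha"
        normalized.append(VARIANTS.get(tok, tok))
    return normalized


def to_count_matrix(texts):
    """Build a sorted vocabulary and a document-term count matrix from raw texts."""
    token_lists = [normalize(t) for t in texts]
    vocab = sorted({tok for toks in token_lists for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for toks in token_lists:
        row = [0] * len(vocab)
        for tok in toks:
            row[index[tok]] += 1
        matrix.append(row)
    return vocab, matrix


vocab, matrix = to_count_matrix(["bohot acha", "bhot bhot gr8"])
print(vocab)   # ['accha', 'bahut', 'great']
print(matrix)  # [[1, 1, 0], [0, 2, 1]]
```

Normalizing before vectorizing means "bohot" and "bhot" share one column, so classifiers such as NB or SVM see one feature instead of several sparse ones.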

Sentiment Analysis of Twitter Data on the 2019 Election Presidential and Vice-Presidential Candidate Pairs Using the Lexicon Based and Support Vector Machine Methods

Jurnal Ilmiah FIFO ◽

10.22441/fifo.2019.v11i2.004 ◽

2019 ◽

Vol 11 (2) ◽

pp. 144

Author(s):

Danar Wido Seno ◽

Arief Wibowo

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Vice President ◽

Training Data ◽

Support Vector ◽

New Words ◽

Textual Data ◽

Data Content ◽

Combination Of Methods

The growing volume of social media content has produced many new words and abbreviations on Twitter, which makes it increasingly difficult for sentiment analysis to achieve high accuracy on Twitter text data. In this study, the authors performed sentiment analysis on the pairs of candidates for President and Vice President of Indonesia in the 2019 election. To obtain higher accuracy and to cope with the evolving textual data on Twitter, the authors combined an unsupervised method, namely Lexicon Based, with a supervised method, namely Support Vector Machine (SVM). The study used 800 tweets collected in October 2018 with search keywords containing the names of each pair of presidential and vice-presidential candidates for the 2019 election. On these 800 records, the best accuracy, 92.5%, was obtained with 80% of the data used for training and 20% for testing, with per-class precision between 85.7% and 97.2% and per-class recall between 78.2% and 93.5%. Because the Lexicon Based method labels the dataset, the SVM training data no longer has to be labeled manually, and the lexicon dictionary can be extended as the content on Twitter evolves.
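The lexicon-as-labeler idea can be sketched in Python: a lexicon score assigns each tweet a label automatically, and the labeled set is then shuffled and split 80/20 for SVM training and testing (the SVM itself is not shown). The Indonesian lexicon entries, the three-way positive/negative/neutral labeling, and the fixed shuffle seed are hypothetical choices for illustration.

```python
import random

# Hypothetical lexicon entries (Indonesian); it can be extended as new words appear.
LEXICON = {"bagus": 1, "hebat": 2, "dukung": 1, "buruk": -1, "bohong": -2, "gagal": -2}


def lexicon_label(tweet):
    """Label a tweet by the sign of its summed lexicon scores."""
    score = sum(LEXICON.get(tok, 0) for tok in tweet.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def label_and_split(tweets, train_ratio=0.8, seed=42):
    """Auto-label tweets, shuffle, and split into training/testing sets (80/20 by default)."""
    labeled = [(t, lexicon_label(t)) for t in tweets]
    random.Random(seed).shuffle(labeled)
    cut = round(len(labeled) * train_ratio)
    return labeled[:cut], labeled[cut:]


train, test = label_and_split(["Capres hebat", "janji bohong", "debat malam ini"] * 10)
print(len(train), len(test))  # 24 6
```

Extending the lexicon (for example, with new slang) changes the labels without any manual re-annotation, which is the maintenance advantage the abstract describes.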

Intelligent Agent-Based Intrusion Detection System Using Enhanced Multiclass SVM

Computational Intelligence and Neuroscience ◽

10.1155/2012/850259 ◽

2012 ◽

Vol 2012 ◽

pp. 1-10 ◽

Cited By ~ 29

Author(s):

S. Ganapathy ◽

P. Yogesh ◽

A. Kannan

Keyword(s):

Intrusion Detection ◽

False Alarm ◽

False Alarm Rate ◽

Outlier Detection ◽

Intelligent Agent ◽

Support Vector ◽

Detection Accuracy ◽

Data Set ◽

Agent Based ◽

Multiclass Svm

Intrusion detection systems have long been used, together with various techniques, to detect network intrusions effectively. However, most of these systems detect intruders only at the cost of a high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks that combines attribute selection, outlier detection, and enhanced multiclass SVM classification. For this purpose, an effective preprocessing technique is proposed that improves detection accuracy and reduces processing time. Moreover, two new algorithms, namely an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm, are proposed for detecting intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. The experimental results show that the proposed model detects anomalies with a low false alarm rate and a high detection rate when tested on the KDD Cup 99 data set.
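The general idea of weighted distance-based outlier detection can be illustrated with a generic k-nearest-neighbour score, in which per-attribute weights play the role of attribute selection. This is a standard technique sketched under our own assumptions (the weights, `k`, and the threshold are hypothetical), not the paper's Intelligent Agent Weighted Distance Outlier Detection algorithm.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with per-attribute weights (weight 0 drops an attribute)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_outlier_scores(points, weights, k=2):
    """Mean weighted distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(weighted_distance(p, q, weights)
                       for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(points, weights, k=2, threshold=3.0):
    """Flag records whose outlier score exceeds a hypothetical threshold."""
    return [s > threshold for s in knn_outlier_scores(points, weights, k)]


records = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(flag_outliers(records, weights=(1.0, 1.0)))  # [False, False, False, False, True]
```

The brute-force neighbour search is quadratic in the number of records; it is kept simple here for clarity rather than speed.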

Effects of kernels and the proportion of training data on the accuracy of SVM sentiment analysis in lecturer evaluation

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v9.i4.pp734-743 ◽

2020 ◽

Vol 9 (4) ◽

pp. 734

Author(s):

Daniel Febrian Sengkey ◽

Agustinus Jacobus ◽

Fabian Johanes Manoppo

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Statistical Methods ◽

Statistical Test ◽

The Other ◽

Training Data ◽

Support Vector ◽

Linear Kernel ◽

Linear Polynomial ◽

Accuracy Data

Support vector machine (SVM) is a well-known method for supervised learning in sentiment analysis, and many studies have used SVM to classify sentiments in lecturer evaluations. SVM has various parameters that can be tuned and kernels that can be chosen to improve classifier accuracy. However, not all of these options have been explored. Therefore, in this study we compared four SVM kernels (radial, linear, polynomial, and sigmoid) to discover how each kernel influences the accuracy of the classifier. To make a proper assessment, we used our own labeled dataset of students' evaluations of their lecturers. The dataset was split into two parts, one for training the classifier and the other for testing the model. In addition, we used several different training proportions, ranging from 0.5 to 0.95 in increments of 0.05. Because the dataset was split randomly, the splitting, training, and testing process was repeated 1,000 times for each kernel and split ratio, yielding 40,000 accuracy values at the end of the experiment. We then applied statistical methods to check whether the differences were significant. Based on the statistical tests, we found that in this particular case the linear kernel achieves significantly higher accuracy than the other kernels. However, there is a tradeoff: the results become more varied as the proportion of data used for training increases.
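For reference, the four kernels can be written in the usual LIBSVM-style parameterization (`gamma`, `coef0`, `degree`; the default values below are illustrative, not the study's settings), and the size of the experiment follows directly from the numbers in the abstract: 4 kernels × 10 training proportions × 1,000 repetitions = 40,000 runs.

```python
import math


# Kernel forms as commonly parameterized (e.g. in LIBSVM); default values are illustrative.
def linear(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * linear(x, y) + coef0) ** degree


def radial(x, y, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sigmoid(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * linear(x, y) + coef0)


KERNELS = {"radial": radial, "linear": linear, "polynomial": polynomial, "sigmoid": sigmoid}
TRAIN_RATIOS = [round(0.50 + 0.05 * i, 2) for i in range(10)]  # 0.5, 0.55, ..., 0.95
REPEATS = 1000


def experiment_runs():
    """Yield one (kernel, training proportion, repetition) triple per random split."""
    for name in KERNELS:
        for ratio in TRAIN_RATIOS:
            for rep in range(REPEATS):
                yield name, ratio, rep


print(sum(1 for _ in experiment_runs()))  # 40000
```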

Aspect Based Sentiment Analysis using POS Tagging and TFIDF

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f7935.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 1960-1963 ◽

Cited By ~ 1

Keyword(s):

Sentiment Analysis ◽

Support Vector ◽

The Internet ◽

Media Content ◽

Product Reviews ◽

Interesting Aspect ◽

Web Based ◽

Pos Tagging ◽

Day By Day ◽

The Web

Social media content on the internet is increasing day by day. Since media knowledge helps people make decisions, web-based businesses give their clients an opportunity to express their opinions about items available on the web in the form of surveys and reviews. Sentiment analysis can be applied to product reviews, tweets, comments, and blogs to infer individuals' feelings or attitudes. Here, Aspect Based Sentiment Analysis is used to extract the most interesting aspects of a particular product from unlabeled text. We have developed two models for aspect/feature extraction: Model 1 uses POS tagging, whereas Model 2 uses TF-IDF. In Model 1, we start with a noun phrase algorithm and extend it to adjectives and adverbs to extract all the aspect terms. In Model 2, the TF-IDF technique is applied after data preprocessing. The relative importance of the aspects is calculated, and the most important positive, negative, and neutral aspects are presented to the user. Naïve Bayes, Support Vector Machine, Decision Tree, and KNN classifiers were used to classify the sentiment polarity of the generated aspects.
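A minimal Python sketch of TF-IDF weighting for ranking candidate aspect terms. It assumes the reviews have already been tokenized and filtered to candidate aspect terms (for example, nouns from POS tagging); summing TF-IDF weights across reviews as an importance score is our own illustrative choice, not necessarily the paper's.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return a {term: tf-idf weight} dict for each tokenized (non-empty) document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def top_aspects(docs, k=3):
    """Rank candidate aspect terms by their total TF-IDF weight across reviews."""
    totals = Counter()
    for w in tf_idf(docs):
        totals.update(w)  # Counter.update adds the weights term by term
    return [term for term, _ in totals.most_common(k)]


reviews = [["battery", "life"], ["battery", "screen"], ["camera"]]
print(top_aspects(reviews, k=1))  # ['camera']
```

A term that occurs in every review gets an IDF of zero, so this scheme favours distinctive aspects over ubiquitous ones; a frequency-based importance score would rank them differently.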

Machine Learning Techniques for Intrusion Detection

Handbook of Research on Intrusion Detection Systems - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-7998-2242-4.ch003 ◽

2020 ◽

pp. 47-65

Author(s):

Tameem Ahmad ◽

Mohd Asad Anwar ◽

Misbahul Haque

Keyword(s):

Random Forest ◽

Intrusion Detection ◽

False Alarm ◽

False Alarm Rate ◽

Detection Rate ◽

Clustering Algorithms ◽

Training Data ◽

Hybrid Classifier ◽

Random Forest Classification ◽

Forest Classification

This chapter proposes a hybrid classifier technique for network intrusion detection systems that combines the Random Forest classification technique with K-Means and Gaussian Mixture clustering algorithms. Random forest builds patterns of intrusion from training data for misuse detection, while anomaly-based intrusions are identified by the outlier-detection mechanism. The proposed method is implemented and simulated under varying threshold values, and its effectiveness is evaluated using metrics such as precision, recall, accuracy rate, false alarm rate, and detection rate. Various existing algorithms are analyzed extensively. Experiments show that the proposed method gives better results than existing simple classifiers as well as existing hybrid classifier techniques, achieving an accuracy of 99.84%, a false alarm rate of 0.09%, and a detection rate of 99.7%.
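The metrics listed above are usually computed from the confusion-matrix counts of a binary attack/normal decision. The sketch below uses the common definitions; treating detection rate as identical to recall is a convention in much of the intrusion detection literature and an assumption here, since the chapter's exact definitions are not given.

```python
def ids_metrics(tp, fp, tn, fn):
    """Common IDS metrics from confusion-matrix counts (attack = positive class)."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),            # share of attacks that are caught
        "detection_rate": tp / (tp + fn),    # assumed identical to recall here
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_alarm_rate": fp / (fp + tn),  # share of normal traffic flagged as attack
    }


print(ids_metrics(tp=90, fp=5, tn=95, fn=10))
```

Note that the false alarm rate is normalized by the number of normal records, not by the number of alarms, so it is not simply one minus precision.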

Detection of weak monocycle sinusoidal signals with a low constant false alarm rate based on the support vector machine

The Journal of Engineering ◽

10.1049/joe.2018.8584 ◽

2019 ◽

Vol 2019 (16) ◽

pp. 2255-2260

Author(s):

Bo Tan ◽

Jingbo Guo ◽

Guang Chang

Keyword(s):

Support Vector Machine ◽

False Alarm ◽

False Alarm Rate ◽

Support Vector ◽

Constant False Alarm Rate ◽

Sinusoidal Signals

An Alarm Method for a Loose Parts Monitoring System

Shock and Vibration ◽

10.1155/2012/891085 ◽

2012 ◽

Vol 19 (4) ◽

pp. 753-761 ◽

Cited By ~ 4

Author(s):

Yanlong Cao ◽

Yuanfeng He ◽

Huawen Zheng ◽

Jiangxin Yang

Keyword(s):

False Alarm ◽

False Alarm Rate ◽

Monitoring System ◽

Power Plants ◽

Detection Rate ◽

Recognition Rate ◽

Support Vector ◽

Linear Predictive Coding ◽

Loose Parts ◽

The Impact

To reduce the false alarm rate and missed detection rate of a Loose Parts Monitoring System (LPMS) for nuclear power plants, a new hybrid method that combines Linear Predictive Coding (LPC) and a Support Vector Machine (SVM) to discriminate loose-part signals is proposed. The alarm process is divided into two stages. The first stage detects weak burst signals to reduce the missed detection rate: the signal is whitened to improve the SNR, and weak burst signals are then detected by checking the short-term Root Mean Square (RMS) of the whitened signal. The second stage identifies the detected burst signals to reduce the false alarm rate: taking the signal's LPC coefficients as its features, an SVM determines whether the signal was generated by the impact of a loose part. The experiments show that whitening the signal in the first stage can detect a loose-part burst signal even at very low SNR and thus significantly reduces the rate of missed detection. In the second stage, the SVM distinguishes loose-part burst signals from pulse disturbances. Even when the SNR is −15 dB, the system can still achieve a 100% recognition rate.
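The first alarm stage can be sketched in Python under stated assumptions. Whitening is done here with an LPC prediction-error filter (Levinson-Durbin recursion on the autocorrelation), and a frame is flagged when its short-term RMS exceeds a hypothetical multiple of the median frame RMS. The abstract does not say how the signal is whitened, so this is one plausible choice rather than the authors' method; the LPC coefficients computed this way are also the kind of feature the second (SVM) stage would use. `synthetic_signal` is a hypothetical test helper.

```python
import math
import random


def autocorr(x, max_lag):
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """Prediction-error filter [1, a1, ..., ap] via the Levinson-Durbin recursion."""
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # degenerate (e.g. all-zero) signal
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def whiten(x, a):
    """Apply the prediction-error filter: e[n] = sum_j a[j] * x[n - j]."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0) for n in range(len(x))]


def short_term_rms(x, frame):
    return [math.sqrt(sum(v * v for v in x[i:i + frame]) / frame)
            for i in range(0, len(x) - frame + 1, frame)]


def detect_bursts(x, order=4, frame=32, factor=4.0):
    """Indices of frames whose whitened RMS exceeds factor times the median frame RMS."""
    rms = short_term_rms(whiten(x, lpc(x, order)), frame)
    baseline = sorted(rms)[len(rms) // 2]
    return [i for i, v in enumerate(rms) if v > factor * baseline]


def synthetic_signal(burst_frame=10, frame=32, frames=20, seed=0):
    """Unit-variance noise with a strong burst in one frame (illustration only)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(frame * frames)]
    for n in range(burst_frame * frame, (burst_frame + 1) * frame):
        x[n] += 10.0 * rng.gauss(0.0, 1.0)
    return x


print(detect_bursts(synthetic_signal()))
```

Using the median frame RMS as the noise-floor estimate keeps a single loud burst from inflating the threshold it is compared against.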

Hybrid Deep Learning Models for Sentiment Analysis

Complexity ◽

10.1155/2021/9986920 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Cach N. Dang ◽

María N. Moreno-García ◽

Fernando De la Prieta

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Computation Time ◽

Hybrid Models ◽

Training Data ◽

Support Vector ◽

Learning Models ◽

Hybrid Techniques ◽

Wide Range

Sentiment analysis of public opinion expressed on social networks such as Twitter or Facebook has been developed into a wide range of applications, but many challenges remain. Hybrid techniques have shown potential for reducing sentiment errors on increasingly complex training data. This paper aims to test the reliability of several hybrid techniques on various datasets from different domains. Our research questions aim to determine whether it is possible to produce hybrid models that outperform single models across different domains and types of datasets. Hybrid deep sentiment analysis models that combine long short-term memory (LSTM) networks, convolutional neural networks (CNN), and support vector machines (SVM) are built and tested on eight textual datasets of tweets and reviews from different domains. The hybrid models are compared against three single models: SVM, LSTM, and CNN. Both reliability and computation time were considered in the evaluation of each technique. The hybrid models increased the accuracy of sentiment analysis compared with the single models on all types of datasets, especially the combinations of deep learning models with SVM, whose reliability was significantly higher.
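One way a hybrid model can end in an SVM is to feed features from a CNN or LSTM into a linear SVM. The sketch below shows only that final stage: a Pegasos-style subgradient trainer for a linear SVM over feature vectors that are assumed to come from an upstream network (not implemented here). The regularization constant, epoch count, and toy features are illustrative assumptions, not the paper's configuration.

```python
def train_linear_svm(features, labels, lam=0.01, epochs=50):
    """Pegasos-style subgradient training of a linear SVM; labels must be -1 or +1.

    A constant 1.0 is appended to each feature vector so the last weight acts as a
    bias (the bias is regularized too, a simplification).
    """
    w = [0.0] * (len(features[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            xb = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: L2 regularization step
            if margin < 1.0:  # hinge loss is active: step toward the example
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy 2-D "deep features"; in a hybrid model these would come from a CNN/LSTM layer.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Replacing the network's softmax layer with a max-margin classifier like this is a common hybrid design; the network would normally be trained first and then frozen as a feature extractor.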

Effective Smoke Detection Using Spatial-Temporal Energy and Weber Local Descriptors in Three Orthogonal Planes (WLD-TOP)

Journal of Computer Science and Technology ◽

10.24215/16666038.18.e05 ◽

2018 ◽

Vol 18 (01) ◽

pp. e05 ◽

Cited By ~ 1

Author(s):

John Adedapo Ojo ◽

Jamiu Alabi Oladosu

Keyword(s):

False Alarm ◽

False Alarm Rate ◽

Detection Rate ◽

Robot Vision ◽

Fire Detection ◽

Support Vector ◽

False Alarms ◽

High Detection Rate ◽

Local Descriptor ◽

Video Frames

Video-based fire detection (VFD) technologies have recently received significant attention from both academic and industrial communities. However, existing VFD approaches are still susceptible to false alarms due to changes in illumination, camera noise, variability of shape, motion, and colour, irregular patterns of smoke and flames, and modelling and training inaccuracies. Hence, this work aimed to develop a video smoke detection (VSD) system with a high detection rate, a low false alarm rate, and a short response time. Moving blocks in video frames were segmented and analysed in the HSI colour space, and wavelet energy analysis of the smoke candidate blocks was performed. In addition, dynamic texture descriptors were obtained using the Weber Local Descriptor in Three Orthogonal Planes (WLD-TOP). These features were combined and used as inputs to a Support Vector Classifier with a radial basis kernel function, while a post-processing stage employs temporal image filtering to reduce false alarms. The algorithm was implemented in MATLAB 8.1.0.604 (R2013a). An accuracy of 99.30%, a detection rate of 99.28%, and a false alarm rate of 0.65% were obtained when tested on online videos. The results of this work could find applications in early fire detection systems and in other areas such as robot vision and automated inspection.
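The wavelet energy cue can be illustrated with a one-level 2D Haar transform of a block: smoke tends to blur edges, so the high-frequency (detail) energy of a block drops relative to the background. The list-of-lists block representation, the use of Haar wavelets, and the `energy_drop_ratio` comparison are assumptions for illustration, not the paper's exact feature.

```python
def haar2d(block):
    """One-level 2D Haar transform of an even-sized block -> (LL, LH, HL, HH)."""
    h, w = len(block), len(block[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = block[i][j], block[i][j + 1]
            c, d = block[i + 1][j], block[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2  # approximation
            lh[i // 2][j // 2] = (a + b - c - d) / 2  # horizontal detail
            hl[i // 2][j // 2] = (a - b + c - d) / 2  # vertical detail
            hh[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def high_frequency_energy(block):
    """Sum of squared detail coefficients (edge/texture energy)."""
    _, lh, hl, hh = haar2d(block)
    return sum(v * v for band in (lh, hl, hh) for row in band for v in row)


def energy_drop_ratio(background_block, current_block):
    """Ratio below 1 means the block lost edge energy, a possible smoke cue."""
    bg = high_frequency_energy(background_block)
    return high_frequency_energy(current_block) / bg if bg > 0 else 1.0


checker = [[(i + j) % 2 for j in range(4)] for i in range(4)]  # sharp background texture
blurred = [[0.5] * 4 for _ in range(4)]                         # same block fully smoothed
print(energy_drop_ratio(checker, blurred))  # 0.0
```

With the division by 2, this Haar step is orthonormal, so the energy of a 2x2 patch is split exactly between the approximation and the three detail bands.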
