Detecting Suspicious Texts Using Machine Learning Techniques

Omar Sharif; Mohammed Moshiul Hoque; A. S. M. Kayes; Raza Nowrozy; Iqbal H. Sarker

doi:10.3390/app10186527

Detecting Suspicious Texts Using Machine Learning Techniques

Applied Sciences ◽

10.3390/app10186527 ◽

2020 ◽

Vol 10 (18) ◽

pp. 6527 ◽

Cited By ~ 1

Author(s):

Omar Sharif ◽

Mohammed Moshiul Hoque ◽

A. S. M. Kayes ◽

Raza Nowrozy ◽

Iqbal H. Sarker

Keyword(s):

Machine Learning ◽

Instant Messaging ◽

Classification Model ◽

Machine Learning Techniques ◽

Text Documents ◽

Digital Platforms ◽

Malicious Activity ◽

Internet Users ◽

Learning Techniques ◽

The Web

Due to the substantial growth of internet users and its spontaneous access via electronic devices, the amount of electronic contents has been growing enormously in recent years through instant messaging, social networking posts, blogs, online portals and other digital platforms. Unfortunately, the misapplication of technologies has increased with this rapid growth of online content, which leads to the rise in suspicious activities. People misuse the web media to disseminate malicious activity, perform the illegal movement, abuse other people, and publicize suspicious contents on the web. The suspicious contents usually available in the form of text, audio, or video, whereas text contents have been used in most of the cases to perform suspicious activities. Thus, one of the most challenging issues for NLP researchers is to develop a system that can identify suspicious text efficiently from the specific contents. In this paper, a Machine Learning (ML)-based classification model is proposed (hereafter called STD) to classify Bengali text into non-suspicious and suspicious categories based on its original contents. A set of ML classifiers with various features has been used on our developed corpus, consisting of 7000 Bengali text documents where 5600 documents used for training and 1400 documents used for testing. The performance of the proposed system is compared with the human baseline and existing ML techniques. The SGD classifier ‘tf-idf’ with the combination of unigram and bigram features are used to achieve the highest accuracy of 84.57%.

Download Full-text

Detecting Suspicious Texts Using Machine Learning Techniques

10.20944/preprints202008.0033.v1 ◽

2020 ◽

Author(s):

Omar Sharif ◽

Mohammed Moshiul Hoque ◽

A. S. M. Kayes ◽

Raza Nowrozy ◽

Iqbal H. Sarker

Keyword(s):

Machine Learning ◽

Instant Messaging ◽

Classification Model ◽

Machine Learning Techniques ◽

Text Documents ◽

Digital Platforms ◽

Malicious Activity ◽

Internet Users ◽

Learning Techniques ◽

The Web

Due to the substantial growth of internet users and its spontaneous access via electronic devices, the amount of electronic contents is growing enormously in recent years through instant messaging, social networking posts, blogs, online portals, and other digital platforms. Unfortunately, the misapplication of technologies has boosted with this rapid growth of online content which leads to the rise in suspicious activities. People misuse the web media to disseminate malicious activity, perform the illegal movement, abuse other people, and publicize suspicious contents on the web. The suspicious contents usually available in the form of text, audio or video, whereas text contents have been used in most of the cases to perform suspicious activities. Thus, one of the most challenging issues for NLP researchers is to develop a system that can identify suspicious text efficiently from the specific contents. In this paper, a Machine Learning (ML)-based classification model is proposed (hereafter called STD) to classify Bengali text into non-suspicious and suspicious categories based on its original contents. A set of ML classifiers with various features has been used on our developed corpus, consisting of 7000 Bengali text documents where 5600 documents used for training and 1400 documents used for testing. The performance of the proposed system is compared with the human baseline and existing ML techniques. The SGD classifier `tf-idf’ with the combination of unigram and bigram features are used to achieve the highest accuracy of 84.57%.

Download Full-text

A Leaf Disease Classification Model in Betel Vine Using Machine Learning Techniques

2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST) ◽

10.1109/icrest51555.2021.9331142 ◽

2021 ◽

Author(s):

Md Zahid Hasan ◽

Nahid Zeba ◽

Md. Abdul Malek ◽

Sanjida Sultana Reya

Keyword(s):

Machine Learning ◽

Disease Classification ◽

Classification Model ◽

Machine Learning Techniques ◽

Leaf Disease ◽

Learning Techniques

Download Full-text

Identification with machine learning techniques of a classification model for the degree of damage to rubber-textile conveyor belts with the aim to achieve sustainability

Engineering Failure Analysis ◽

10.1016/j.engfailanal.2021.105564 ◽

2021 ◽

pp. 105564

Author(s):

Andrejiova Miriam ◽

Anna Grincova ◽

Daniela Marasova

Keyword(s):

Machine Learning ◽

Classification Model ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Conveyor Belts ◽

Degree Of Damage

Download Full-text

A Macrocause Classification Model for Violent Crime Analysis in the Field of Public Safety Based on Machine Learning Techniques

10.1109/isc253183.2021.9562842 ◽

2021 ◽

Author(s):

Ramiro de Vasconcelos dos Santos Junior ◽

Joao Vitor Venceslau Coelho ◽

Nelio Alessandro Azevedo Cacho

Keyword(s):

Machine Learning ◽

Violent Crime ◽

Public Safety ◽

Classification Model ◽

Machine Learning Techniques ◽

Crime Analysis ◽

Learning Techniques

Download Full-text

An Insight of Machine Learning in Web Network Analysis

International Journal of Distributed Artificial Intelligence ◽

10.4018/ijdai.2019070103 ◽

2019 ◽

Vol 11 (2) ◽

pp. 20-34

Author(s):

Meenakshi Sharma ◽

Anshul Garg

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Effective Solution ◽

Business Organisation ◽

Web Structure ◽

Learning Techniques ◽

Effective Decision ◽

Effective Decision Making ◽

The Given ◽

The Web

The World Wide Web is immensely rich in knowledge. The knowledge comes from both the content and distinctive characteristics of the web like its hyperlink structure. The problem comes in digging the relevant data from the web and giving the most appropriate decision to solve the given problem, which can be used for improving any business organisation. The effective solution of the problem depends on how efficiently and effectively the analysis of the web data is done. In analysing the data on web, not only relevant content analysis is essential but also the analysis of web structure is important. This article gives a brief introduction about the various terminologies and measures like centrality, Page Rank, and density used in the web networking analysis. This article will also give a brief introduction about the various supervised ML techniques such as classification, regression, and unsupervised machine learning techniques such as clustering, etc., which are very useful in analysing the web network so that user can make quick and effective decision making

Download Full-text

Twitter sentiment analysis for the estimation of voting intention in the 2017 Chilean elections

Intelligent Data Analysis ◽

10.3233/ida-194768 ◽

2020 ◽

Vol 24 (5) ◽

pp. 1141-1160

Author(s):

Tomás Alegre Sepúlveda ◽

Brian Keith Norambuena

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Sentiment Analysis ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

Traditional Methods ◽

Actual Result ◽

Learning Techniques ◽

Vector Machines

In this paper, we apply sentiment analysis methods in the context of the first round of the 2017 Chilean elections. The purpose of this work is to estimate the voting intention associated with each candidate in order to contrast this with the results from classical methods (e.g., polls and surveys). The data are collected from Twitter, because of its high usage in Chile and in the sentiment analysis literature. We obtained tweets associated with the three main candidates: Sebastián Piñera (SP), Alejandro Guillier (AG) and Beatriz Sánchez (BS). For each candidate, we estimated the voting intention and compared it to the traditional methods. To do this, we first acquired the data and labeled the tweets as positive or negative. Afterward, we built a model using machine learning techniques. The classification model had an accuracy of 76.45% using support vector machines, which yielded the best model for our case. Finally, we use a formula to estimate the voting intention from the number of positive and negative tweets for each candidate. For the last period, we obtained a voting intention of 35.84% for SP, compared to a range of 34–44% according to traditional polls and 36% in the actual elections. For AG we obtained an estimate of 37%, compared with a range of 15.40% to 30.00% for traditional polls and 20.27% in the elections. For BS we obtained an estimate of 27.77%, compared with the range of 8.50% to 11.00% given by traditional polls and an actual result of 22.70% in the elections. These results are promising, in some cases providing an estimate closer to reality than traditional polls. Some differences can be explained due to the fact that some candidates have been omitted, even though they held a significant number of votes.

Download Full-text

Machine Learning Techniques for Land Use/Land Cover Classification of Medium Resolution Optical Satellite Imagery Focusing on Temporary Inundated Areas

Journal of Environmental Geography ◽

10.2478/jengeo-2020-0005 ◽

2020 ◽

Vol 13 (1-2) ◽

pp. 43-52

Author(s):

Boudewijn van Leeuwen ◽

Zalán Tobak ◽

Ferenc Kovács

Keyword(s):

Neural Network ◽

Machine Learning ◽

Land Use ◽

Land Cover ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

Land Use Land Cover ◽

Learning Techniques

AbstractClassification of multispectral optical satellite data using machine learning techniques to derive land use/land cover thematic data is important for many applications. Comparing the latest algorithms, our research aims to determine the best option to classify land use/land cover with special focus on temporary inundated land in a flat area in the south of Hungary. These inundations disrupt agricultural practices and can cause large financial loss. Sentinel 2 data with a high temporal and medium spatial resolution is classified using open source implementations of a random forest, support vector machine and an artificial neural network. Each classification model is applied to the same data set and the results are compared qualitatively and quantitatively. The accuracy of the results is high for all methods and does not show large overall differences. A quantitative spatial comparison demonstrates that the neural network gives the best results, but that all models are strongly influenced by atmospheric disturbances in the image.

Download Full-text

Theme Identification using Machine Learning Techniques

Journal of Integrated and Advanced Engineering (JIAE) ◽

10.51662/jiae.v1i2.24 ◽

2021 ◽

Vol 1 (2) ◽

pp. 123-134

Author(s):

Siti Hajar Jayady ◽

Hasmawati Antong

Keyword(s):

Machine Learning ◽

Confusion Matrix ◽

Classification Model ◽

Machine Learning Techniques ◽

Text Documents ◽

Search Optimization ◽

Automatic Text Classification ◽

Theme Identification ◽

Linear Svm ◽

Time And Energy

With the abundance of online research platforms, much information presented in PDF files, such as articles and journals, can be obtained easily. In this case, students completing research projects would have many downloaded PDF articles on their laptops. However, identifying the target articles manually within the collection can be tiring as most articles consist of several pages that need to be analyzed. Reading each article to determine if the article relates theme and organizing the articles based on themes is time and energy-consuming. Referring to this problem, a PDF files organizer that implemented a theme identifier is necessary. Thus, work will focus on automatic text classification using the machine learning methods to build a theme identifier employed in the PDF files organizer to classify articles into augmented reality and machine learning. A total of 1000 text documents for both themes were used to build the classification model. Moreover, the pre-preprocessing step for data cleaning and TF-IDF feature extraction for text vectorization and to reduce sparse vectors were performed. 80% of the dataset were used for training, and the remaining were used to validate the trained models. The classification models proposed in this work are Linear SVM and Multinomial Naïve Bayes. The accuracy of the models was evaluated using a confusion matrix. For the Linear SVM model, grid-search optimization was performed to determine the optimal value of the Cost parameter.

Download Full-text

Measuring inflation expectations ofthe Russian population with the help of machine learning

Voprosy Ekonomiki ◽

10.32609/0042-8736-2017-6-71-93 ◽

2017 ◽

pp. 71-93 ◽

Cited By ~ 1

Author(s):

I. Goloshchapova ◽

M. Andreev

Keyword(s):

Machine Learning ◽

Text Mining ◽

Population Based ◽

Russian Population ◽

Machine Learning Techniques ◽

The Internet ◽

Inflation Expectations ◽

New Approach ◽

Learning Techniques ◽

The Web

The paper proposes a new approach to measure inflation expectations of the Russian population based on text mining of information on the Internet with the help of machine learning techniques. Two indicators were constructed on the base of readers’ comments to inflation news in major Russian economic media available in the web at the period from 2014 through 2016: with the help of words frequency and sentiment analysis of comments content. During the whole considered period of time both indicators were characterized by dynamics adequate to the development of macroeconomic situation and were also able to forecast dynamics of official Bank of Russia indicators of population inflation expectations for approximately one month in advance.

Download Full-text

Classification Model for Water Quality using Machine Learning Techniques

International Journal of Software Engineering and Its Applications ◽

10.14257/ijseia.2015.9.6.05 ◽

2015 ◽

Vol 9 (6) ◽

pp. 45-52 ◽

Cited By ~ 10

Author(s):

Salisu Yusuf Muhammad ◽

Mokhairi Makhtar ◽

Azilawati Rozaimee ◽

Azwa Abdul Aziz ◽

Azrul Amri Jamal

Keyword(s):

Machine Learning ◽

Water Quality ◽

Classification Model ◽

Machine Learning Techniques ◽

Learning Techniques

Download Full-text