Machine learning methods analysis in the document classification problem

2020 ◽  
pp. 081-087
Author(s):  
A.P. Zhyrkova
O.P. Ignatenko

The current state of official documentation worldwide, and especially in Ukraine, calls for tools for electronic processing. One of the main tasks in this field is seal (or stamp) detection, which enables document classification based on that criterion. This article analyzes several existing methods for solving the problem, describes a new approach to document classification, and shows how model accuracy depends on the amount of input data. The result of this work is a convolutional neural network that correctly classifies 708 out of 804 images of official documents. The corresponding model accuracy is 88.03%, despite the presence of bias in the input data.

2021 ◽  
Author(s):  
Rui Liu ◽  
Xin Yang ◽  
Chong Xu ◽  
Luyao Li ◽  
Xiangqiang Zeng

Abstract Landslide susceptibility mapping (LSM) is a useful tool for estimating the probability of landslide occurrence, providing a scientific basis for natural hazard prevention, land use planning, and economic development in landslide-prone areas. To date, a large number of machine learning methods have been applied to LSM, and recently the Convolutional Neural Network (CNN) has been gradually adopted to enhance the prediction accuracy of LSM. The objective of this study is to introduce a CNN-based model for LSM and systematically compare its overall performance with the conventional machine learning models of random forest, logistic regression, and support vector machine. We selected the Jiuzhaigou region in Sichuan Province, China, as the study area. A total of 710 landslides and 12 predisposing factors were stacked to form spatial datasets for LSM. ROC analysis and several statistical metrics, such as accuracy, root mean square error (RMSE), Kappa coefficient, sensitivity, and specificity, were used to evaluate the performance of the models on the training and validation datasets. Finally, the trained models were applied and the landslide susceptibility zones were mapped. Results suggest that both the CNN and the conventional machine learning models perform satisfactorily (AUC: 85.72%–90.17%). The CNN-based model exhibits excellent goodness-of-fit and prediction capability; it not only achieves the highest performance (AUC: 90.17%) but also significantly reduces the salt-and-pepper effect, which indicates its great potential for application to LSM.
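The threshold-based metrics named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `binary_metrics`, the 0.5 threshold, and the toy labels and probabilities are assumptions made for illustration, and both classes are assumed to be present so that no denominator is zero.

```python
import math

def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity, Cohen's kappa and RMSE.

    y_true holds 0/1 labels (1 = landslide, an assumed encoding) and
    y_prob holds predicted probabilities for class 1.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(y_true)

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # RMSE between the 0/1 labels and the predicted probabilities.
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / n)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
        "rmse": rmse,
    }

# Toy data, invented for illustration only.
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.2, 0.6])
```

AUC is left out of the sketch because it is computed over all thresholds rather than from a single confusion matrix.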


2014 ◽  
Vol 5 (3) ◽  
pp. 82-96 ◽  
Author(s):  
Marijana Zekić-Sušac ◽  
Sanja Pfeifer ◽  
Nataša Šarlija

Abstract Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and post-processing stages. However, such a reduction usually provides less information and yields a lower model accuracy. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested on the same dataset: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour, in order to compare their efficiency in terms of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure, computing the sensitivity and specificity of each model. Results: The artificial neural network model based on the multilayer perceptron yielded a higher classification rate than the models produced by the other methods. The pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further improvement may be achieved by testing a few additional methodological refinements in machine learning methods.
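The evaluation procedure described here, 10-fold cross-validation followed by a pairwise t-test, can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: the helper names `kfold_indices` and `paired_t_statistic`, the fixed seed, and the example scores are all assumptions.

```python
import math
import random
import statistics

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired t-test on per-fold scores of two models."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_error

# Every sample appears in exactly one test fold.
splits = list(kfold_indices(25, k=10))

# Hypothetical per-fold accuracies of two models (three folds for brevity).
t_stat = paired_t_statistic([0.80, 0.90, 0.85], [0.70, 0.85, 0.80])
```

The statistic would then be compared against a t distribution with `len(diffs) - 1` degrees of freedom to obtain a p-value.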


Author(s):  
Denis Sato ◽  
Adroaldo José Zanella ◽  
Ernane Xavier Costa

Vehicle-animal collisions represent a serious problem in roadway infrastructure. To avoid these roadway collisions, different mitigation systems have been applied in various regions of the world. In this article, a system for detecting animals on highways is presented using computer vision and machine learning algorithms. The models were trained to classify two groups of animals: capybaras and donkeys. Two variants of the convolutional neural network called Yolo (You only look once) were used, Yolov4 and Yolov4-tiny (a lighter version of the network). The training was carried out using pre-trained models. Detection tests were performed on 147 images. The accuracy results obtained were 84.87% and 79.87% for Yolov4 and Yolov4-tiny, respectively. The proposed system has the potential to improve road safety by reducing or preventing accidents with animals.


2019 ◽  
Vol 2019 ◽  
pp. 1-8 ◽  
Author(s):  
Keqin Chen ◽  
Amit Yadav ◽  
Asif Khan ◽  
Yixin Meng ◽  
Kun Zhu

Concrete cracks are very serious and potentially dangerous. Existing machine learning methods have three obvious limitations: low recognition rate, low accuracy, and long processing time. Improved crack detection based on convolutional neural networks can automatically detect whether an image contains cracks and mark their location, which can greatly improve monitoring efficiency. Experimental results show that the Adam optimization algorithm and the batch normalization (BN) algorithm make the model converge faster and achieve a maximum accuracy of 99.71%.
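For readers unfamiliar with the two techniques named above, the following is a minimal sketch of their standard textbook forms for a single feature and a single scalar parameter. It is not taken from the paper: the function names and the default hyperparameters (the commonly used Adam defaults) are assumptions.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter; t is the 1-based step."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
new_param, m1, v1 = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected update has magnitude close to `lr` whatever the gradient scale, which is one reason Adam is often reported to converge quickly early in training.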


Author(s):  
Sergiy Pogorilyy ◽  
Artem Kramov

The detection of coreferent pairs within a text is one of the basic tasks in natural language processing (NLP). State-of-the-art methods of coreference resolution are based on machine learning algorithms. The key idea of these methods is to detect certain regularities between the semantic or grammatical features of text entities. This paper presents a comparative analysis of current methods of coreference resolution in English and Ukrainian texts. The key disadvantage of many methods is that they treat coreference resolution as a classification problem. The result of coreferent pair detection is a set of groups whose elements refer to a common entity; it is therefore advisable to treat coreference resolution as a clustering task. A method of coreference resolution using a set of filtering sieves and a convolutional neural network is suggested. The set of filtering sieves for finding candidate coreferent pairs has been implemented, and a multichannel convolutional neural network has been trained on an annotated Ukrainian corpus. The multichannel structure allows the different components of text units to be analyzed: semantic, lexical, and grammatical features of words and sentences. Furthermore, a convolutional layer makes it possible to process input data of variable size (words or sentences of a text). The output of the method is a set of clusters. In order to form clusters, it is necessary to take into account the previous steps of the model’s workflow. However, such an approach contradicts the traditional methodology of machine learning. Thus, the network was trained using the SEARN algorithm, which allows tasks with variable output structures to be solved using a classifier model. The method was evaluated experimentally on a corpus of Ukrainian news. To estimate the accuracy of the method, the corresponding common metrics for clustering tasks were calculated. The results indicate that the suggested method can be used to find coreferent pairs within Ukrainian texts. The method can also be easily adapted and applied to other natural languages.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Hua Xie ◽  
Minghua Zhang ◽  
Jiaming Ge ◽  
Xinfang Dong ◽  
Haiyan Chen

A sector is a basic unit of airspace whose operation is managed by air traffic controllers. The operation complexity of a sector plays an important role in air traffic management tasks such as airspace reconfiguration, air traffic flow management, and allocation of air traffic controller resources. Therefore, accurate evaluation of the sector operation complexity (SOC) is crucial. Because numerous factors can influence SOC, researchers have recently proposed several machine learning methods to evaluate SOC by mining the relationship between factors and complexity. However, existing studies rely on hand-crafted factors, which are computationally difficult to obtain, require specialized background knowledge, and may limit the evaluation performance of the model. To overcome these problems, this paper proposes, for the first time, an end-to-end SOC learning framework based on a deep convolutional neural network (CNN) that requires no hand-crafted factors. A new data representation, the multichannel traffic scenario image (MTSI), is proposed to represent the overall air traffic scenario. An MTSI is generated by splitting the airspace into a two-dimensional grid map and filling it with navigation information. Motivated by applications of deep learning, a specific CNN model is introduced to automatically extract high-level traffic features from MTSIs and learn the SOC pattern. The model input is thus formed by combining multiple image channels composed of air traffic information, which describe the traffic scenario. The model output is the SOC level of the target sector. Experimental results on a real dataset from the Guangzhou airspace sector in China show that the model can effectively extract traffic complexity information from MTSIs and achieves better performance than traditional machine learning methods. In practice, this work can be flexibly and conveniently applied to SOC evaluation without the additional calculation of hand-crafted factors.
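The idea of turning a traffic scenario into a multichannel grid image can be illustrated with a small sketch. The abstract does not specify the channels or the grid resolution, so everything concrete here is an assumption: the function name `build_mtsi`, the three channels (aircraft count, summed altitude, summed speed), the dictionary layout of an aircraft record, and the coordinate bounds.

```python
def build_mtsi(aircraft, lon_range, lat_range, rows, cols):
    """Rasterize aircraft states into a multichannel grid (hypothetical layout).

    Each aircraft is a dict with keys "lon", "lat", "altitude" and "speed".
    Returns one rows x cols grid per assumed channel.
    """
    channels = ("count", "altitude", "speed")
    image = {name: [[0.0] * cols for _ in range(rows)] for name in channels}
    lon_min, lon_max = lon_range
    lat_min, lat_max = lat_range

    for ac in aircraft:
        inside = lon_min <= ac["lon"] < lon_max and lat_min <= ac["lat"] < lat_max
        if not inside:
            continue  # ignore traffic outside the sector's bounding box
        # Map geographic coordinates to a grid cell.
        col = int((ac["lon"] - lon_min) / (lon_max - lon_min) * cols)
        row = int((ac["lat"] - lat_min) / (lat_max - lat_min) * rows)
        image["count"][row][col] += 1.0
        image["altitude"][row][col] += ac["altitude"]
        image["speed"][row][col] += ac["speed"]
    return image

# Invented example: one aircraft inside the box and one outside it.
traffic = [
    {"lon": 113.25, "lat": 23.1, "altitude": 9000.0, "speed": 230.0},
    {"lon": 115.00, "lat": 23.0, "altitude": 8000.0, "speed": 210.0},
]
mtsi = build_mtsi(traffic, lon_range=(112.0, 114.0), lat_range=(22.0, 24.0), rows=4, cols=4)
```

Stacking the per-channel grids gives an image-like tensor that a CNN can take as input, which is the property the abstract relies on to avoid hand-crafted factors.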


Electronics ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 170
Author(s):  
Muhammad Wasimuddin ◽  
Khaled Elleithy ◽  
Abdelshakour Abuzneid ◽  
Miad Faezipour ◽  
Omar Abuzaghleh

Cardiovascular diseases have been reported to be the leading cause of mortality across the globe. Among such diseases, Myocardial Infarction (MI), also known as “heart attack”, is of particular interest to researchers, as its early diagnosis can prevent life-threatening cardiac conditions and potentially save human lives. Analyzing the Electrocardiogram (ECG) can provide valuable diagnostic information for detecting different types of cardiac arrhythmia. Real-time ECG monitoring systems with advanced machine learning methods provide information about health status in real time and have improved the user experience. However, advanced machine learning methods place a burden on portable and wearable devices because of their high computing requirements. We present an improved, less complex Convolutional Neural Network (CNN)-based classifier model that identifies multiple arrhythmia types in real time using the two-dimensional image of the ECG wave. The proposed model is presented as a three-layer ECG signal analysis model that can potentially be adopted in real-time portable and wearable monitoring devices. We have designed, implemented, and simulated the proposed CNN using Matlab. We also present a hardware implementation of the proposed method to validate its adaptability to real-time wearable systems. The CNN classifier was validated on the European ST-T database, recorded with single lead L3, and achieved an accuracy of 99.23%, outperforming most existing solutions.


2021 ◽  
Vol 1 (1) ◽  
pp. 31
Author(s):  
Kristiawan Nugroho

The Covid-19 pandemic has been ongoing for a year. Various attempts have been made to overcome it, especially through the development of various types of vaccines around the world. The level of vaccine effectiveness in dealing with Covid-19 is one of the questions most often asked by the public. This research is an attempt to classify the names of vaccines that have been used in various nations using one of the robust machine learning methods, namely the Neural Network. The results showed that the Neural Network method provides the best accuracy, 99.9%, which is higher than that of the Random Forest and Support Vector Machine (SVM) methods.

