Fraudulent Democracy? An Analysis of Argentina's Infamous Decade Using Supervised Machine Learning

2011 ◽  
Vol 19 (4) ◽  
pp. 409-433 ◽  
Author(s):  
Francisco Cantú ◽  
Sebastián M. Saiegh

In this paper, we introduce an innovative method to diagnose electoral fraud using vote counts. Specifically, we use synthetic data to develop and train a fraud detection prototype. We employ a naive Bayes classifier as our learning algorithm and rely on digital analysis to identify the features that are most informative about class distinctions. To evaluate the detection capability of the classifier, we use authentic data drawn from a novel data set of district-level vote counts in the province of Buenos Aires (Argentina) between 1931 and 1941, a period with a checkered history of fraud. Our results corroborate the validity of our approach: The elections considered to be irregular (legitimate) by most historical accounts are unambiguously classified as fraudulent (clean) by the learner. More generally, our findings demonstrate the feasibility of generating and using synthetic data for training and testing an electoral fraud detection system.
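The digit-based detection idea can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the synthetic vote-count generators and the choice of last-digit frequencies as features are hypothetical assumptions standing in for their synthetic data and digital-analysis feature selection.

```python
import math
import random
from collections import Counter

random.seed(0)

def last_digit_freqs(counts):
    """Feature vector: relative frequency of each last digit 0-9."""
    c = Counter(n % 10 for n in counts)
    return [c[d] / len(counts) for d in range(10)]

# Hypothetical synthetic generators: "clean" districts draw vote counts
# uniformly (near-uniform last digits); "fraudulent" districts report
# rounded totals, over-producing last digits 0 and 5.
def make_district(fraud):
    if fraud:
        return [random.randrange(200) * 5 for _ in range(100)]
    return [random.randrange(1000) for _ in range(100)]

train = [(last_digit_freqs(make_district(label)), label)
         for label in [0, 1] * 200]

# Multinomial naive Bayes over the ten last-digit frequency features.
def fit(data):
    sums = {0: [1e-6] * 10, 1: [1e-6] * 10}  # small smoothing term
    for x, y in data:
        for d in range(10):
            sums[y][d] += x[d]
    return {y: [math.log(v / sum(s)) for v in s] for y, s in sums.items()}

def predict(log_probs, x):
    scores = {y: sum(f * lp for f, lp in zip(x, lps))
              for y, lps in log_probs.items()}
    return max(scores, key=scores.get)

model = fit(train)
assert predict(model, last_digit_freqs(make_district(1))) == 1
assert predict(model, last_digit_freqs(make_district(0))) == 0
```

Because the classifier is trained entirely on synthetic districts, it can then be applied to authentic district-level counts, which is the core of the synthetic-training idea described above.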

2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, we describe in this paper the optimal choice among the best-performing algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest as the best algorithm among those tested.


2022 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, we describe in this paper the optimal choice among the best-performing algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneous rock fabric, we identified Random Forest as the appropriate algorithm among those tested.


2020 ◽  
Vol 223 (3) ◽  
pp. 1565-1583
Author(s):  
Hoël Seillé ◽  
Gerhard Visser

Bayesian inversion of magnetotelluric (MT) data is a powerful but computationally expensive approach to estimating the subsurface electrical conductivity distribution and its associated uncertainty. Approximating the Earth's subsurface with 1-D physics considerably speeds up calculation of the forward problem, making the Bayesian approach tractable, but it can lead to biased results when the assumption is violated. We propose a methodology to quantitatively compensate for the bias caused by the 1-D Earth assumption within a 1-D trans-dimensional Markov chain Monte Carlo sampler. Our approach determines site-specific likelihood functions, which are calculated using a dimensionality discrepancy error model derived by a machine learning algorithm trained on a set of synthetic 3-D conductivity training images. This is achieved by exploiting known geometrical dimensional properties of the MT phase tensor. A complex synthetic model that mimics a sedimentary basin environment is used to illustrate the ability of our workflow to reliably estimate uncertainty in the inversion results, even in the presence of strong 2-D and 3-D effects. Using this dimensionality discrepancy error model, we demonstrate that on this synthetic data set our workflow performs better in 80 per cent of the cases than the existing practice of using constant errors. Finally, our workflow is benchmarked against real data acquired in Queensland, Australia, and shows its ability to accurately detect the depth to basement.
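The role of the site-specific error model can be sketched with a deliberately simplified, fixed-dimension Metropolis sampler. Everything here is a toy assumption: a single unknown parameter stands in for the 1-D conductivity model, and a constant inflation term stands in for the learned dimensionality discrepancy error; this is not the authors' trans-dimensional sampler.

```python
import math
import random

random.seed(1)

def log_likelihood(m, d, sigma_meas, sigma_disc):
    """Gaussian log-likelihood with the measurement error inflated by a
    discrepancy term, mimicking a site-specific error model."""
    sigma2 = sigma_meas ** 2 + sigma_disc ** 2  # combined error budget
    return sum(-(x - m) ** 2 / (2 * sigma2) for x in d)

def metropolis(d, sigma_meas, sigma_disc, steps=5000, step_size=0.5):
    """Minimal random-walk Metropolis sampler over one parameter."""
    m, samples = 0.0, []
    ll = log_likelihood(m, d, sigma_meas, sigma_disc)
    for _ in range(steps):
        prop = m + random.gauss(0, step_size)
        ll_prop = log_likelihood(prop, d, sigma_meas, sigma_disc)
        if math.log(random.random()) < ll_prop - ll:  # accept/reject
            m, ll = prop, ll_prop
        samples.append(m)
    return samples

true_m = 2.0
data = [true_m + random.gauss(0, 0.3) for _ in range(20)]
post = metropolis(data, sigma_meas=0.3, sigma_disc=0.2)
burned = post[1000:]
mean = sum(burned) / len(burned)
assert abs(mean - true_m) < 0.5
```

Inflating `sigma_meas` with `sigma_disc` widens the posterior rather than shifting it, which is how a discrepancy error model prevents over-confident uncertainty estimates when the forward physics is only approximate.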


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Massimiliano Zanin ◽  
Miguel Romance ◽  
Santiago Moral ◽  
Regino Criado

The detection of fraud in credit card transactions is a major topic in financial research, with profound economic implications. While it has hitherto been tackled through data analysis techniques, the resemblance between this and other problems, such as the design of recommendation systems and of diagnostic/prognostic medical tools, suggests that a complex network approach may yield important benefits. In this paper we present a first hybrid data mining/complex network classification algorithm, able to detect illegal instances in a real card transaction data set. It is based on a recently proposed network reconstruction algorithm that allows the creation of representations of the deviation of one instance from a reference group. We show how the inclusion of features extracted from the network data representation improves the score obtained by a standard, neural network-based classification algorithm, and additionally how this combined approach can outperform a commercial fraud detection system in specific operational niches. Beyond these specific results, this contribution represents a new example of how complex networks and data mining can be integrated as complementary tools, with the former providing a view of the data beyond the capabilities of the latter.
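The "deviation of one instance from a reference group" idea can be illustrated with a much simpler stand-in than the paper's network reconstruction algorithm: per-feature z-scores of a transaction against a reference group of legitimate transactions, appended to the raw features before classification. The feature names and numbers below are invented for illustration.

```python
def deviation_features(x, reference):
    """Per-feature z-scores of instance x against a reference group.
    Large absolute values flag transactions far from 'normal' behaviour."""
    n = len(reference)
    feats = []
    for j in range(len(x)):
        col = [r[j] for r in reference]
        mu = sum(col) / n
        var = sum((v - mu) ** 2 for v in col) / n
        feats.append((x[j] - mu) / (var ** 0.5 + 1e-9))
    return feats

# Hypothetical reference group of legitimate transactions: (amount, hour).
legit = [[50.0, 1.0], [40.0, 2.0], [60.0, 1.0], [45.0, 3.0]]

normal_txn = [48.0, 2.0]
odd_txn = [900.0, 2.0]  # an amount far outside the reference group

assert max(abs(z) for z in deviation_features(normal_txn, legit)) < 2
assert max(abs(z) for z in deviation_features(odd_txn, legit)) > 3
```

In the hybrid approach described above, such deviation-derived features are concatenated with the original transaction features and fed to the neural-network classifier, so the network view complements rather than replaces the data-mining view.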


2018 ◽  
Vol 7 (04) ◽  
pp. 871-888 ◽  
Author(s):  
Sophie J. Lee ◽  
Howard Liu ◽  
Michael D. Ward

Improving geolocation accuracy in text data has long been a goal of automated text processing. We depart from the conventional method and introduce a two-stage supervised machine-learning algorithm that evaluates each location mention to be either correct or incorrect. We extract contextual information from texts, i.e., N-gram patterns for location words, mention frequency, and the context of sentences containing location words. We then estimate model parameters using a training data set and use this model to predict whether a location word in the test data set accurately represents the location of an event. We demonstrate these steps by constructing customized geolocation event data at the subnational level using news articles collected from around the world. The results show that the proposed algorithm outperforms existing geocoders even in a case added post hoc to test the generality of the developed algorithm.
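The contextual features in the first stage can be sketched as follows; the window size, token layout, and corpus are illustrative assumptions, not the authors' exact feature set.

```python
from collections import Counter

def context_features(tokens, idx, n=2):
    """Context for the candidate location word at position idx:
    the n words to its left and right, plus the word itself."""
    left = tuple(tokens[max(0, idx - n):idx])
    right = tuple(tokens[idx + 1:idx + 1 + n])
    return {"word": tokens[idx], "left": left, "right": right}

# Tiny illustrative corpus of tokenised sentences.
corpus = [
    "protest erupted in Paris yesterday".split(),
    "the Paris agreement was signed".split(),
]

# Mention frequency across the corpus, another feature named above.
mention_freq = Counter(w for sent in corpus for w in sent)

feats = context_features(corpus[0], 3)
assert feats["word"] == "Paris"
assert feats["left"] == ("erupted", "in")   # "in <PLACE>" suggests an event site
assert mention_freq["Paris"] == 2
```

A binary classifier trained on such features then decides, per mention, whether the location word genuinely marks where the event happened (e.g. "in Paris") or is incidental context (e.g. "the Paris agreement").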


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1777
Author(s):  
Muhammad Ali ◽  
Stavros Shiaeles ◽  
Gueltoum Bendiab ◽  
Bogdan Ghita

Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective due to the techniques used by attackers such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting the IT infrastructure against such threats, and for ensuring security. In this paper, we investigate an alternative method for malware detection that is based on N-grams and machine learning. We use a dynamic analysis technique to extract an Indicator of Compromise (IOC) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel method to identify the most significant N-gram features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine-learning algorithms. The results show that Logistic Regression, with a score of 98.4%, provides the best classification accuracy when compared to the other classifiers used.
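The TF-IDF weighting of N-gram features can be sketched from first principles. The call traces below are invented stand-ins for the extracted IOCs, and this minimal implementation is only meant to show why shared N-grams are down-weighted while distinctive ones are kept.

```python
import math
from collections import Counter

def ngrams(seq, n=2):
    """Sliding-window n-grams over a token sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def tfidf(docs):
    """docs: list of n-gram lists -> per-document {ngram: tf-idf weight}."""
    N = len(docs)
    df = Counter(g for doc in docs for g in set(doc))  # document frequency
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({g: (c / len(doc)) * math.log(N / df[g])
                    for g, c in tf.items()})
    return out

# Hypothetical API-call traces: one benign-looking, one ransomware-like.
traces = [
    ngrams(["open", "read", "close"]),
    ngrams(["open", "read", "encrypt", "delete"]),
]
weights = tfidf(traces)

# An n-gram present in every trace gets idf = log(1) = 0: no signal.
assert weights[1][("open", "read")] == 0.0
# A distinctive n-gram keeps a positive weight and survives feature selection.
assert weights[1][("encrypt", "delete")] > 0
```

The highest-weighted N-grams per class then form the feature vectors on which the supervised classifiers, including the best-performing Logistic Regression, are trained.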


2019 ◽  
Vol 8 (1) ◽  
pp. 46-51 ◽  
Author(s):  
Mukrimah Nawir ◽  
Amiza Amir ◽  
Naimah Yaakob ◽  
Ong Bi Lynn

A network anomaly detection system makes it possible to monitor computer network behaviour that deviates from the network protocol, and such systems are implemented in many domains. Yet a problem arises because different application domains define anomalies in their environments differently, which makes choosing the algorithm that best suits and fulfils the requirements of a given domain far from straightforward. Additionally, centralization raises the risk of fatal damage to the network system when powerful malicious code is injected into it. Therefore, in this paper we conduct experiments using supervised Machine Learning (ML) for a network anomaly detection system with low communication cost and minimized network bandwidth, using the UNSW-NB15 dataset to compare classifier performance in terms of accuracy (effectiveness) and the processing time needed to build a model (efficiency). Supervised machine learning takes the important features into account through the labels provided in the datasets. The best machine learning algorithm for this network dataset is AODE, with an accuracy of 97.26% and a processing time of approximately 7 seconds. The distributed algorithm also solves the centralization issue while keeping accuracy and processing time comparable to the centralized algorithm, despite a small drop in accuracy and a slightly longer runtime.


2021 ◽  
Author(s):  
Marc Raphael ◽  
Michael Robitaille ◽  
Jeff Byers ◽  
Joseph Christodoulides

Machine learning algorithms hold the promise of greatly improving live cell image analysis by way of (1) analyzing far more imagery than can be achieved by more traditional manual approaches and (2) eliminating the subjective nature of researchers and diagnosticians selecting the cells or cell features to be included in the analyzed data set. Currently, however, even the most sophisticated model-based or machine learning algorithms require user supervision, meaning the subjectivity problem is not removed but rather incorporated into the algorithm's initial training steps and then repeatedly applied to the imagery. To address this roadblock, we have developed a self-supervised machine learning algorithm that recursively trains itself directly from the live cell imagery data, thus providing objective segmentation and quantification. The approach incorporates an optical flow algorithm component to self-label cell and background pixels for training, followed by the extraction of additional feature vectors for the automated generation of a cell/background classification model. Because it is self-trained, the software has no user-adjustable parameters and does not require curated training imagery. The algorithm was applied to automatically segment cells from their background for a variety of cell types and five commonly used imaging modalities: fluorescence, phase contrast, differential interference contrast (DIC), transmitted light and interference reflection microscopy (IRM). The approach is broadly applicable in that it enables completely automated cell segmentation for long-term live cell phenotyping applications, regardless of the input imagery's optical modality, magnification or cell type.
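The self-labelling step can be illustrated on a toy two-frame example. Everything here is a deliberate simplification: per-pixel temporal differencing stands in for the optical flow component, and a midpoint intensity threshold stands in for the learned cell/background classification model.

```python
# Two synthetic 4x4 frames: a bright one-pixel-wide "cell" shifts right.
frame1 = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0]]
frame2 = [[0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 9, 0], [0, 0, 0, 0]]
H, W = 4, 4

# Crude stand-in for optical flow: pixels whose intensity changed "moved".
moving = [(i, j) for i in range(H) for j in range(W)
          if frame1[i][j] != frame2[i][j]]
static = [(i, j) for i in range(H) for j in range(W)
          if frame1[i][j] == frame2[i][j]]

# Self-labelling: a moving pixel held the cell in one of the two frames,
# so take its brighter value; static pixels are labelled background.
cell_vals = [max(frame1[i][j], frame2[i][j]) for i, j in moving]
bg_vals = [frame1[i][j] for i, j in static]

# "Train" the simplest possible classifier from the self-generated labels:
# a midpoint threshold between the two class mean intensities.
threshold = (sum(cell_vals) / len(cell_vals)
             + sum(bg_vals) / len(bg_vals)) / 2

# Segment the new frame with the self-trained model: no curated masks used.
mask = [[1 if frame2[i][j] > threshold else 0 for j in range(W)]
        for i in range(H)]
assert sum(map(sum, mask)) == 2          # exactly the two cell pixels
assert mask[1][2] == 1 and mask[2][2] == 1
```

The key property mirrored here is that the labels come from the imagery's own temporal dynamics, so the pipeline has no user-adjustable parameters and needs no hand-curated training masks.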

