Fraudulent Democracy? An Analysis of Argentina's Infamous Decade Using Supervised Machine Learning

2011 ◽  
Vol 19 (4) ◽  
pp. 409-433 ◽  
Author(s):  
Francisco Cantú ◽  
Sebastián M. Saiegh

In this paper, we introduce an innovative method to diagnose electoral fraud using vote counts. Specifically, we use synthetic data to develop and train a fraud detection prototype. We employ a naive Bayes classifier as our learning algorithm and rely on digital analysis to identify the features that are most informative about class distinctions. To evaluate the detection capability of the classifier, we use authentic data drawn from a novel data set of district-level vote counts in the province of Buenos Aires (Argentina) between 1931 and 1941, a period with a checkered history of fraud. Our results corroborate the validity of our approach: The elections considered to be irregular (legitimate) by most historical accounts are unambiguously classified as fraudulent (clean) by the learner. More generally, our findings demonstrate the feasibility of generating and using synthetic data for training and testing an electoral fraud detection system.
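The digit-based detection idea can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the synthetic vote-count generators and the choice of last-digit frequencies as features are hypothetical assumptions standing in for their synthetic data and digital-analysis feature selection.

```python
import math
import random
from collections import Counter

random.seed(0)

def last_digit_freqs(counts):
    """Feature vector: relative frequency of each last digit 0-9."""
    c = Counter(n % 10 for n in counts)
    return [c[d] / len(counts) for d in range(10)]

# Hypothetical synthetic generators: "clean" districts draw vote counts
# uniformly (near-uniform last digits); "fraudulent" districts report
# rounded totals, over-producing last digits 0 and 5.
def make_district(fraud):
    if fraud:
        return [random.randrange(200) * 5 for _ in range(100)]
    return [random.randrange(1000) for _ in range(100)]

train = [(last_digit_freqs(make_district(label)), label)
         for label in [0, 1] * 200]

# Multinomial naive Bayes over the ten last-digit frequency features.
def fit(data):
    sums = {0: [1e-6] * 10, 1: [1e-6] * 10}  # small smoothing term
    for x, y in data:
        for d in range(10):
            sums[y][d] += x[d]
    return {y: [math.log(v / sum(s)) for v in s] for y, s in sums.items()}

def predict(log_probs, x):
    scores = {y: sum(f * lp for f, lp in zip(x, lps))
              for y, lps in log_probs.items()}
    return max(scores, key=scores.get)

model = fit(train)
assert predict(model, last_digit_freqs(make_district(1))) == 1
assert predict(model, last_digit_freqs(make_district(0))) == 0
```

Because the classifier is trained entirely on synthetic districts, it can then be applied to authentic district-level counts, which is the core of the synthetic-training idea described above.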

2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, we describe in this paper the optimal choice among the best-performing algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest as the best algorithm among those tested.


2022 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, we describe in this paper the optimal choice among the best-performing algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneous rock fabric, we identified Random Forest as the appropriate algorithm among those tested.


2020 ◽  
Vol 223 (3) ◽  
pp. 1565-1583
Author(s):  
Hoël Seillé ◽  
Gerhard Visser

Bayesian inversion of magnetotelluric (MT) data is a powerful but computationally expensive approach to estimating the subsurface electrical conductivity distribution and its associated uncertainty. Approximating the Earth's subsurface with 1-D physics considerably speeds up calculation of the forward problem, making the Bayesian approach tractable, but it can lead to biased results when the assumption is violated. We propose a methodology to quantitatively compensate for the bias caused by the 1-D Earth assumption within a 1-D trans-dimensional Markov chain Monte Carlo sampler. Our approach determines site-specific likelihood functions, which are calculated using a dimensionality discrepancy error model derived by a machine learning algorithm trained on a set of synthetic 3-D conductivity training images. This is achieved by exploiting known geometrical dimensional properties of the MT phase tensor. A complex synthetic model that mimics a sedimentary basin environment is used to illustrate the ability of our workflow to reliably estimate uncertainty in the inversion results, even in the presence of strong 2-D and 3-D effects. Using this dimensionality discrepancy error model, we demonstrate that on this synthetic data set our workflow performs better in 80 per cent of the cases than the existing practice of using constant errors. Finally, our workflow is benchmarked against real data acquired in Queensland, Australia, and shows its ability to accurately detect the depth to basement.
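The role of the site-specific error model can be sketched with a deliberately simplified, fixed-dimension Metropolis sampler. Everything here is a toy assumption: a single unknown parameter stands in for the 1-D conductivity model, and a constant inflation term stands in for the learned dimensionality discrepancy error; this is not the authors' trans-dimensional sampler.

```python
import math
import random

random.seed(1)

def log_likelihood(m, d, sigma_meas, sigma_disc):
    """Gaussian log-likelihood with the measurement error inflated by a
    discrepancy term, mimicking a site-specific error model."""
    sigma2 = sigma_meas ** 2 + sigma_disc ** 2  # combined error budget
    return sum(-(x - m) ** 2 / (2 * sigma2) for x in d)

def metropolis(d, sigma_meas, sigma_disc, steps=5000, step_size=0.5):
    """Minimal random-walk Metropolis sampler over one parameter."""
    m, samples = 0.0, []
    ll = log_likelihood(m, d, sigma_meas, sigma_disc)
    for _ in range(steps):
        prop = m + random.gauss(0, step_size)
        ll_prop = log_likelihood(prop, d, sigma_meas, sigma_disc)
        if math.log(random.random()) < ll_prop - ll:  # accept/reject
            m, ll = prop, ll_prop
        samples.append(m)
    return samples

true_m = 2.0
data = [true_m + random.gauss(0, 0.3) for _ in range(20)]
post = metropolis(data, sigma_meas=0.3, sigma_disc=0.2)
burned = post[1000:]
mean = sum(burned) / len(burned)
assert abs(mean - true_m) < 0.5
```

Inflating `sigma_meas` with `sigma_disc` widens the posterior rather than shifting it, which is how a discrepancy error model prevents over-confident uncertainty estimates when the forward physics is only approximate.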


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Massimiliano Zanin ◽  
Miguel Romance ◽  
Santiago Moral ◽  
Regino Criado

The detection of fraud in credit card transactions is a major topic in financial research, with profound economic implications. While it has hitherto been tackled through data analysis techniques, the resemblance between this and other problems, such as the design of recommendation systems and of diagnostic/prognostic medical tools, suggests that a complex network approach may yield important benefits. In this paper we present a first hybrid data mining/complex network classification algorithm, able to detect illegal instances in a real card transaction data set. It is based on a recently proposed network reconstruction algorithm that allows the creation of representations of the deviation of one instance from a reference group. We show how the inclusion of features extracted from the network data representation improves the score obtained by a standard, neural network-based classification algorithm, and additionally how this combined approach can outperform a commercial fraud detection system in specific operational niches. Beyond these specific results, this contribution represents a new example of how complex networks and data mining can be integrated as complementary tools, with the former providing a view of the data beyond the capabilities of the latter.
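The "deviation of one instance from a reference group" idea can be illustrated with a much simpler stand-in than the paper's network reconstruction algorithm: per-feature z-scores of a transaction against a reference group of legitimate transactions, appended to the raw features before classification. The feature names and numbers below are invented for illustration.

```python
def deviation_features(x, reference):
    """Per-feature z-scores of instance x against a reference group.
    Large absolute values flag transactions far from 'normal' behaviour."""
    n = len(reference)
    feats = []
    for j in range(len(x)):
        col = [r[j] for r in reference]
        mu = sum(col) / n
        var = sum((v - mu) ** 2 for v in col) / n
        feats.append((x[j] - mu) / (var ** 0.5 + 1e-9))
    return feats

# Hypothetical reference group of legitimate transactions: (amount, hour).
legit = [[50.0, 1.0], [40.0, 2.0], [60.0, 1.0], [45.0, 3.0]]

normal_txn = [48.0, 2.0]
odd_txn = [900.0, 2.0]  # an amount far outside the reference group

assert max(abs(z) for z in deviation_features(normal_txn, legit)) < 2
assert max(abs(z) for z in deviation_features(odd_txn, legit)) > 3
```

In the hybrid approach described above, such deviation-derived features are concatenated with the original transaction features and fed to the neural-network classifier, so the network view complements rather than replaces the data-mining view.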


2018 ◽  
Vol 7 (04) ◽  
pp. 871-888 ◽  
Author(s):  
Sophie J. Lee ◽  
Howard Liu ◽  
Michael D. Ward

Improving geolocation accuracy in text data has long been a goal of automated text processing. We depart from the conventional method and introduce a two-stage supervised machine-learning algorithm that evaluates each location mention to be either correct or incorrect. We extract contextual information from texts, i.e., N-gram patterns for location words, mention frequency, and the context of sentences containing location words. We then estimate model parameters using a training data set and use this model to predict whether a location word in the test data set accurately represents the location of an event. We demonstrate these steps by constructing customized geolocation event data at the subnational level using news articles collected from around the world. The results show that the proposed algorithm outperforms existing geocoders even in a case added post hoc to test the generality of the developed algorithm.
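The contextual features in the first stage can be sketched as follows; the window size, token layout, and corpus are illustrative assumptions, not the authors' exact feature set.

```python
from collections import Counter

def context_features(tokens, idx, n=2):
    """Context for the candidate location word at position idx:
    the n words to its left and right, plus the word itself."""
    left = tuple(tokens[max(0, idx - n):idx])
    right = tuple(tokens[idx + 1:idx + 1 + n])
    return {"word": tokens[idx], "left": left, "right": right}

# Tiny illustrative corpus of tokenised sentences.
corpus = [
    "protest erupted in Paris yesterday".split(),
    "the Paris agreement was signed".split(),
]

# Mention frequency across the corpus, another feature named above.
mention_freq = Counter(w for sent in corpus for w in sent)

feats = context_features(corpus[0], 3)
assert feats["word"] == "Paris"
assert feats["left"] == ("erupted", "in")   # "in <PLACE>" suggests an event site
assert mention_freq["Paris"] == 2
```

A binary classifier trained on such features then decides, per mention, whether the location word genuinely marks where the event happened (e.g. "in Paris") or is incidental context (e.g. "the Paris agreement").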


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1777
Author(s):  
Muhammad Ali ◽  
Stavros Shiaeles ◽  
Gueltoum Bendiab ◽  
Bogdan Ghita

Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective due to the techniques used by attackers such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting the IT infrastructure against such threats, and for ensuring security. In this paper, we investigate an alternative method for malware detection that is based on N-grams and machine learning. We use a dynamic analysis technique to extract an Indicator of Compromise (IOC) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel method to identify the most significant N-gram features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine-learning algorithms. The results show that Logistic Regression, with a score of 98.4%, provides the best classification accuracy when compared to the other classifiers used.
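The TF-IDF weighting of N-gram features can be sketched from first principles. The call traces below are invented stand-ins for the extracted IOCs, and this minimal implementation is only meant to show why shared N-grams are down-weighted while distinctive ones are kept.

```python
import math
from collections import Counter

def ngrams(seq, n=2):
    """Sliding-window n-grams over a token sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def tfidf(docs):
    """docs: list of n-gram lists -> per-document {ngram: tf-idf weight}."""
    N = len(docs)
    df = Counter(g for doc in docs for g in set(doc))  # document frequency
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({g: (c / len(doc)) * math.log(N / df[g])
                    for g, c in tf.items()})
    return out

# Hypothetical API-call traces: one benign-looking, one ransomware-like.
traces = [
    ngrams(["open", "read", "close"]),
    ngrams(["open", "read", "encrypt", "delete"]),
]
weights = tfidf(traces)

# An n-gram present in every trace gets idf = log(1) = 0: no signal.
assert weights[1][("open", "read")] == 0.0
# A distinctive n-gram keeps a positive weight and survives feature selection.
assert weights[1][("encrypt", "delete")] > 0
```

The highest-weighted N-grams per class then form the feature vectors on which the supervised classifiers, including the best-performing Logistic Regression, are trained.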


2019 ◽  
Vol 8 (1) ◽  
pp. 46-51 ◽  
Author(s):  
Mukrimah Nawir ◽  
Amiza Amir ◽  
Naimah Yaakob ◽  
Ong Bi Lynn

A network anomaly detection system makes it possible to monitor computer network behaviour that deviates from the network protocol, and such systems are implemented in many domains. Yet a problem arises because different application domains define anomalies in their environments differently, which makes choosing the algorithm that best suits and fulfils the requirements of a given domain far from straightforward. Additionally, centralization raises the risk of fatal damage to the network system when powerful malicious code is injected into it. Therefore, in this paper we conduct experiments using supervised Machine Learning (ML) for a network anomaly detection system with low communication cost and minimized network bandwidth, using the UNSW-NB15 dataset to compare classifier performance in terms of accuracy (effectiveness) and the processing time needed to build a model (efficiency). Supervised machine learning takes the important features into account through the labels provided in the datasets. The best machine learning algorithm for this network dataset is AODE, with an accuracy of 97.26% and a processing time of approximately 7 seconds. The distributed algorithm also solves the centralization issue while keeping accuracy and processing time comparable to the centralized algorithm, despite a small drop in accuracy and a slightly longer runtime.


2021 ◽  
Author(s):  
Marc Raphael ◽  
Michael Robitaille ◽  
Jeff Byers ◽  
Joseph Christodoulides

Machine learning algorithms hold the promise of greatly improving live cell image analysis by way of (1) analyzing far more imagery than can be achieved by more traditional manual approaches and (2) eliminating the subjective nature of researchers and diagnosticians selecting the cells or cell features to be included in the analyzed data set. Currently, however, even the most sophisticated model-based or machine learning algorithms require user supervision, meaning the subjectivity problem is not removed but rather incorporated into the algorithm's initial training steps and then repeatedly applied to the imagery. To address this roadblock, we have developed a self-supervised machine learning algorithm that recursively trains itself directly from the live cell imagery data, thus providing objective segmentation and quantification. The approach incorporates an optical flow algorithm component to self-label cell and background pixels for training, followed by the extraction of additional feature vectors for the automated generation of a cell/background classification model. Because it is self-trained, the software has no user-adjustable parameters and does not require curated training imagery. The algorithm was applied to automatically segment cells from their background for a variety of cell types and five commonly used imaging modalities: fluorescence, phase contrast, differential interference contrast (DIC), transmitted light and interference reflection microscopy (IRM). The approach is broadly applicable in that it enables completely automated cell segmentation for long-term live cell phenotyping applications, regardless of the input imagery's optical modality, magnification or cell type.
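The self-labelling step can be illustrated on a toy two-frame example. Everything here is a deliberate simplification: per-pixel temporal differencing stands in for the optical flow component, and a midpoint intensity threshold stands in for the learned cell/background classification model.

```python
# Two synthetic 4x4 frames: a bright one-pixel-wide "cell" shifts right.
frame1 = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0]]
frame2 = [[0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 9, 0], [0, 0, 0, 0]]
H, W = 4, 4

# Crude stand-in for optical flow: pixels whose intensity changed "moved".
moving = [(i, j) for i in range(H) for j in range(W)
          if frame1[i][j] != frame2[i][j]]
static = [(i, j) for i in range(H) for j in range(W)
          if frame1[i][j] == frame2[i][j]]

# Self-labelling: a moving pixel held the cell in one of the two frames,
# so take its brighter value; static pixels are labelled background.
cell_vals = [max(frame1[i][j], frame2[i][j]) for i, j in moving]
bg_vals = [frame1[i][j] for i, j in static]

# "Train" the simplest possible classifier from the self-generated labels:
# a midpoint threshold between the two class mean intensities.
threshold = (sum(cell_vals) / len(cell_vals)
             + sum(bg_vals) / len(bg_vals)) / 2

# Segment the new frame with the self-trained model: no curated masks used.
mask = [[1 if frame2[i][j] > threshold else 0 for j in range(W)]
        for i in range(H)]
assert sum(map(sum, mask)) == 2          # exactly the two cell pixels
assert mask[1][2] == 1 and mask[2][2] == 1
```

The key property mirrored here is that the labels come from the imagery's own temporal dynamics, so the pipeline has no user-adjustable parameters and needs no hand-curated training masks.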

