Do birds of a feather really flock together, or how to choose training samples for authorship attribution

M. Eder; J. Rybicki

doi:10.1093/llc/fqs036

Authorship Attribution With Few Training Samples

Machine Learning for Authorship Attribution and Cyber Forensics - International Series on Computer Entertainment and Media Technology ◽

10.1007/978-3-030-61675-5_6 ◽

2020 ◽

pp. 75-87

Author(s):

Farkhund Iqbal ◽

Mourad Debbabi ◽

Benjamin C. M. Fung

Keyword(s):

Authorship Attribution ◽

Training Samples

Efficient Learning Method for Human Detection based on Automatic Generation of Training Samples with the Negative-Bag MILBoost

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.134.450 ◽

2014 ◽

Vol 134 (3) ◽

pp. 450-458

Author(s):

Masamitsu Tsuchiya ◽

Yuji Yamauchi ◽

Hironobu Fujiyoshi

Keyword(s):

Automatic Generation ◽

Human Detection ◽

Learning Method ◽

Training Samples ◽

Efficient Learning

Authorship Attribution of YeomMa, the Novel Under the Pseudonym Seo Dong-san, by Applying Novel Corpus

The Journal of Language & Literature ◽

10.15565/jll.2019.06.78.63 ◽

2019 ◽

Vol 78 ◽

pp. 63-91

Author(s):

Hanbyoul Moon ◽

Dogil Lee

Keyword(s):

Authorship Attribution ◽

The Novel

A study on authorship attribution of Chinese texts based on discourse information analysis

International Journal of Speech Language and the Law ◽

10.1558/ijsll.v23i1.28304 ◽

2016 ◽

Vol 23 (1) ◽

pp. 147-150

Author(s):

Shaomin Zhang

Keyword(s):

Authorship Attribution ◽

Information Analysis ◽

Chinese Texts

Not All Character N-grams Are Created Equal: A Study in Authorship Attribution

10.3115/v1/n15-1010 ◽

2015 ◽

Cited By ~ 43

Author(s):

Upendra Sapkota ◽

Steven Bethard ◽

Manuel Montes ◽

Thamar Solorio

Keyword(s):

Authorship Attribution

Authorship attribution added to "Inertial technology for the future"

IEEE Transactions on Aerospace and Electronic Systems ◽

10.1109/taes.1984.310471 ◽

1984 ◽

Vol AES-20 (6) ◽

pp. 834-834 ◽

Cited By ~ 2

Author(s):

Daniel B. DeBra

Keyword(s):

Authorship Attribution ◽

The Future

Siamese Reconstruction Network: Accurate Image Reconstruction from Human Brain Activity by Learning to Compare

Applied Sciences ◽

10.3390/app9224749 ◽

2019 ◽

Vol 9 (22) ◽

pp. 4749

Author(s):

Lingyun Jiang ◽

Kai Qiao ◽

Linyuan Wang ◽

Chi Zhang ◽

Jian Chen ◽

...

Keyword(s):

Deep Learning ◽

Human Brain ◽

Brain Activity ◽

Feature Space ◽

Training Data ◽

Reconstruction Method ◽

Learning Method ◽

Training Samples ◽

Visual Reconstruction ◽

Relationship Of

Decoding human brain activity, especially reconstructing visual stimuli from functional magnetic resonance imaging (fMRI), has gained increasing attention in recent years. However, the high dimensionality and small quantity of fMRI data limit reconstruction quality, especially for deep learning methods that require large amounts of labelled samples. Unlike such methods, humans can recognize a new image because the human visual system is naturally capable of extracting features from any object and comparing them. Inspired by this visual mechanism, we introduced comparison into the deep learning method to achieve better visual reconstruction by making full use of each sample and of the relationship within each sample pair, that is, by learning to compare. On this basis, we proposed a Siamese reconstruction network (SRN). Using the SRN, we obtained improved results on two fMRI recording datasets: 72.5% accuracy on the digit dataset and 44.6% accuracy on the character dataset. Essentially, this approach increases the training data from roughly n samples to 2n sample pairs, which takes full advantage of the limited quantity of training samples. The SRN learns to pull sample pairs of the same class together and push sample pairs of different classes apart in feature space.
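The pairing idea in this abstract can be illustrated with a minimal sketch. It assumes one plausible pairing scheme (each sample gets one same-class partner and one different-class partner, giving 2n pairs from n samples) and uses the standard contrastive loss as a stand-in for "converge or disperse in feature space". The abstract does not state the SRN's actual pairing rule or loss, so `make_pairs`, `contrastive_loss`, and the `margin` parameter are hypothetical names chosen for illustration.

```python
import math
import random


def make_pairs(samples, labels, seed=0):
    """Build (a, b, same_class) pairs: up to two pairs per sample.

    Assumption: one same-class and one different-class partner per sample,
    so n samples yield 2n pairs when every class has at least two members
    and there are at least two classes.
    """
    rng = random.Random(seed)
    pairs = []
    for i, lab in enumerate(labels):
        same = [j for j, other in enumerate(labels) if other == lab and j != i]
        diff = [j for j, other in enumerate(labels) if other != lab]
        if same:
            pairs.append((samples[i], samples[rng.choice(same)], 1))
        if diff:
            pairs.append((samples[i], samples[rng.choice(diff)], 0))
    return pairs


def contrastive_loss(a, b, same_class, margin=1.0):
    """Standard contrastive loss on two feature vectors (assumed, not from the paper).

    Same-class pairs are penalised by squared distance (pulled together);
    different-class pairs are penalised only when closer than `margin`
    (pushed apart).
    """
    d = math.dist(a, b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    feats = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
    labs = ["digit", "digit", "char", "char"]
    pairs = make_pairs(feats, labs)
    print(len(feats), "samples ->", len(pairs), "pairs")
```

In a real Siamese network the two inputs would pass through a shared encoder before the distance is computed; here the feature vectors are given directly to keep the sketch self-contained.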

Author identification of short texts using dependency treebanks without vocabulary

Digital Scholarship in the Humanities ◽

10.1093/llc/fqz070 ◽

2019 ◽

Vol 35 (4) ◽

pp. 812-825 ◽

Cited By ~ 1

Author(s):

Robert Gorman

Keyword(s):

Text Classification ◽

Authorship Attribution ◽

Support Vector ◽

Ancient Greek ◽

Combinatorial Explosion ◽

Independent Variables ◽

Author Identification ◽

Important Addition ◽

Digital Methods ◽

And Control

How to classify short texts effectively remains an important question in computational stylometry. This study presents the results of an experiment involving authorship attribution of ancient Greek texts. These texts were chosen to explore the effectiveness of digital methods as a supplement to the author’s work on text classification based on traditional stylometry. Here it is crucial to avoid the confounding effects of shared topic and similar factors. Therefore, this study attempts to identify authorship using only morpho-syntactic data, without regard to specific vocabulary items. The data are taken from the dependency annotations published in the Ancient Greek and Latin Dependency Treebank. The independent variables for classification are combinations generated from the dependency label and the morphology of each word in the corpus and of its dependency parent. To avoid the effects of combinatorial explosion, only the most frequent combinations are retained as input features. The authorship classification (with thirteen classes) is done with standard algorithms: logistic regression and support vector classification. During classification, the corpus is partitioned into increasingly smaller ‘texts’. To explore and control for possible confounding effects of, e.g., different genres and annotators, three corpora were tested: a mixed corpus of several genres of both prose and verse, a corpus of prose including oratory, history, and essay, and a corpus restricted to narrative history. Results are surprisingly good compared with those previously published. Accuracy for fifty-word inputs is 84.2–89.6%. Thus, this approach may prove an important addition to the prevailing methods for short-text classification.
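The feature construction described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's implementation: the token fields `rel`, `morph`, and `head` are hypothetical stand-ins for the treebank's annotation scheme, the four-part tuple is only one of the possible label and morphology combinations, and the classifier step (logistic regression or support vector classification) is left out.

```python
from collections import Counter


def combos(sentence):
    """Yield one vocabulary-free feature per word.

    Each feature combines the word's dependency label and morphology with
    those of its dependency parent; the word form itself is never used.
    `head` is the parent's index in the sentence, or None for the root.
    """
    for tok in sentence:
        head = tok["head"]
        parent = sentence[head] if head is not None else None
        yield (
            tok["rel"],
            tok["morph"],
            parent["rel"] if parent else "ROOT",
            parent["morph"] if parent else "ROOT",
        )


def top_features(sentences, k):
    """Keep only the k most frequent combinations to limit feature growth."""
    counts = Counter(c for sent in sentences for c in combos(sent))
    return [combo for combo, _ in counts.most_common(k)]


def vectorize(sentences, features):
    """Relative frequency of each retained combination in one text chunk."""
    counts = Counter(c for sent in sentences for c in combos(sent))
    total = sum(counts.values()) or 1
    return [counts[f] / total for f in features]


if __name__ == "__main__":
    # Toy annotated sentence with made-up labels and morphology codes.
    sent = [
        {"rel": "PRED", "morph": "v3s", "head": None},
        {"rel": "SBJ", "morph": "n-s-n", "head": 0},
        {"rel": "ATR", "morph": "a-s-n", "head": 1},
    ]
    feats = top_features([sent], k=2)
    print(vectorize([sent], feats))
```

Vectors produced this way for each text chunk could then be passed to any standard classifier; restricting the features to the top `k` is what keeps the input dimensionality manageable as chunks shrink.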

ER rule classifier with an optimization operator recommendation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210629 ◽

2021 ◽

pp. 1-13

Author(s):

Xiaoyan Wang ◽

Jianbin Sun ◽

Qingsong Zhao ◽

Yaqian You ◽

Jiang Jiang

Keyword(s):

Classification Accuracy ◽

Comprehensive Analysis ◽

Small Sample ◽

Empirical Knowledge ◽

Classification Methods ◽

Turbofan Engine ◽

Training Samples ◽

Optimization Operator ◽

Recommendation Strategy

Many classic classification methods struggle to incorporate expert experience and to classify small-sample datasets well. The evidential reasoning rule (ER rule) classifier can address these problems. The ER rule has strong processing and comprehensive analysis abilities for diversified mixed information and can handle problems involving expert experience effectively. Moreover, the initial parameters of a classifier constructed on the ER rule can be set according to empirical knowledge instead of being trained on a large number of samples, which helps the classifier perform well on small-sample datasets. However, the initial parameters of the ER rule classifier still need to be optimized, and choosing the best optimization algorithm remains a challenge. To address these problems, this paper proposes an ER rule classifier with an optimization operator recommendation. First, the initial ER rule classifier is constructed based on training samples and expert experience. Second, the adjustable parameters are optimized: the optimization operator recommendation strategy is applied to select the best algorithm using a subset of the samples, and experiments with the full sample set are then carried out. Finally, a case study on a turbofan engine degradation simulation dataset is carried out. The results indicate that the ER rule classifier achieves higher classification accuracy than other classic classifiers, which demonstrates the capability and effectiveness of the proposed ER rule classifier with an optimization operator recommendation.

On Combining DeepSnake and Global Saliency for Detection of Orchard Apples

Applied Sciences ◽

10.3390/app11146269 ◽

2021 ◽

Vol 11 (14) ◽

pp. 6269

Author(s):

Wang Jing ◽

Wang Leqi ◽

Han Yanling ◽

Zhang Yun ◽

Zhou Ruyan

Keyword(s):

Saliency Detection ◽

Color Difference ◽

Saliency Map ◽

Apple Fruit ◽

Initial Contour ◽

Detection Accuracy ◽

Natural Environments ◽

Fast Detection ◽

Single Target ◽

Training Samples

To support fast detection and recognition of apple fruit targets, this paper builds on the real-time DeepSnake deep learning instance segmentation model and provides an algorithmic basis for the practical application and promotion of apple-picking robots. Because the initial detection results strongly affect the subsequent edge prediction, the paper proposes an automatic detection method for apple fruit targets in natural environments based on saliency detection and traditional color difference methods. Combined with the original image, the histogram backprojection algorithm is used to further optimize the saliency results. To handle possibly overlapping fruit regions in the saliency map, a dynamic adaptive overlapping-target separation algorithm is proposed to locate each single target fruit and to determine the initial contour for DeepSnake. Finally, the target fruit is labeled based on the instance segmentation results. In the experiment, 300 training datasets were used to train the DeepSnake model, and a self-built dataset containing 1036 pictures of apples in various situations in natural environments was used for testing. The detection accuracies for target fruits under non-overlapping shaded fruits, overlapping fruits, shaded branches and leaves, and poor illumination conditions were 99.12%, 94.78%, 90.71%, and 94.46%, respectively. The comprehensive detection accuracy was 95.66%, and the average processing time was 0.42 s over the 1036 test images, which shows that the proposed algorithm can effectively separate overlapping fruits with a relatively small number of training samples and achieve rapid and accurate detection of apple targets.
