Nonuniform Granularity-Based Classification in Social Interest Detection

Mathematical Problems in Engineering ◽

10.1155/2017/5054825 ◽

2017 ◽

Vol 2017 ◽

pp. 1-10

Author(s):

Wenjuan Shao ◽

Qingguo Shen ◽

Xianli Jin ◽

Liaoruo Huang ◽

Jingjing Chen

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Social Interest ◽

Classification Performance ◽

Classification Algorithm ◽

Computing Paradigm ◽

The Social ◽

Interest Detection

Social interest detection is a new computing paradigm that processes a great variety of large-scale resources. Effective classification of these resources is necessary for social interest detection. In this paper, we describe some concepts and principles of classification and present a novel classification algorithm based on nonuniform granularity. A clustering algorithm is used to generate a clustering pedigree chart. By cutting the chart at suitable classification cutting values, we obtain different branches, which are used as categories. The size of the cutting value is vital to performance and can be dynamically adapted in the proposed algorithm. Experimental results on blog posts illustrate the effectiveness of the proposed algorithm. Furthermore, comparisons with Naive Bayes, k-nearest neighbor, and other methods validate the better classification performance of the proposed algorithm for large-scale resources.
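The idea of cutting a clustering pedigree chart at a chosen value can be illustrated with a minimal sketch. It assumes one-dimensional points, single-linkage merging, and a fixed cut value; the function name `single_linkage_cut` and the sample data are hypothetical, and the paper's own clustering method and dynamic adaptation of the cutting value are not reproduced here.

```python
from itertools import combinations


def single_linkage_cut(points, cut):
    """Merge clusters bottom-up (single linkage) and stop at height `cut`.

    Stopping the merge process once the closest pair of clusters is farther
    apart than `cut` gives the same branches as building the full chart and
    cutting it at that height.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the two closest members.
            d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > cut:
            break  # every remaining merge lies above the cutting value
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical data: a small cut keeps fine-grained categories,
# a large cut merges everything into one coarse category.
points = [1.0, 1.2, 5.0, 5.1, 9.0]
fine = single_linkage_cut(points, cut=0.5)
coarse = single_linkage_cut(points, cut=4.0)
```

A smaller cutting value yields more, finer categories and a larger one yields fewer, coarser categories, which is why the abstract calls the size of the cutting value vital to performance.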


A High Performace of Local Binary Pattern on Classify Javanese Character Classification

Scientific Journal of Informatics ◽

10.15294/sji.v5i1.14017 ◽

2018 ◽

Vol 5 (1) ◽

pp. 8 ◽

Cited By ~ 1

Author(s):

Ajib Susanto ◽

Daurat Sinaga ◽

Christy Atika Sari ◽

Eko Hari Rachmawanto ◽

De Rosal Ignatius Moses Setiadi

Keyword(s):

Feature Extraction ◽

Image Classification ◽

Local Binary Pattern ◽

Nearest Neighbor ◽

Classification Algorithm ◽

K Nearest Neighbor ◽

Characteristic Extraction ◽

Research Objects ◽

Character Classification

The classification of Javanese character images is done with the aim of recognizing each character. The selected classification algorithm is K-Nearest Neighbor (KNN) at K = 1, 3, 5, 7, and 9. The study aims to improve KNN performance on Javanese characters written by the author and to show that feature extraction is needed in the image classification of Javanese characters. Local Binary Pattern (LBP) was selected as the feature extraction method because some research objects have a certain level of slope. The LBP parameters tested were [16 16], [32 32], [64 64], [128 128], and [256 256]. Experiments were performed on 80 training images and 40 test images. The KNN result after combination with LBP feature extraction was 82.5% at K = 3 with LBP parameters [64 64].
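As a rough illustration of what an LBP feature is, the sketch below computes the basic 8-bit LBP code for the centre pixel of a 3x3 patch. The function name `lbp_code`, the clockwise neighbour ordering, and the `>=` comparison are assumptions made for illustration; the study's actual LBP variant and the meaning of its [N N] parameters are not specified in the abstract.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the centre pixel of a 3x3 patch.

    `patch` is a list of three rows, each a list of three grey values.
    A neighbour contributes a 1 bit when it is at least as bright as
    the centre pixel.
    """
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner (assumed ordering).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


# Hypothetical patches: a flat region, a bright centre, one bright corner.
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
corner = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]
codes = [lbp_code(flat), lbp_code(peak), lbp_code(corner)]
```

In a typical pipeline, a histogram of such codes over image cells would form the feature vector passed to KNN; that step is omitted here.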


Tissue classification of large-scale multi-site MR data using fuzzy k-nearest neighbor method

10.1117/12.2216625 ◽

2016 ◽

Cited By ~ 4

Author(s):

Ali Ghayoor ◽

Jane S. Paulsen ◽

Regina E. Y. Kim ◽

Hans J. Johnson

Keyword(s):

Large Scale ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Tissue Classification


Log-Based Anomaly Detection with the Improved K-Nearest Neighbor

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194020500114 ◽

2020 ◽

Vol 30 (02) ◽

pp. 239-262 ◽

Cited By ~ 1

Author(s):

Bingming Wang ◽

Shi Ying ◽

Guoli Cheng ◽

Rui Wang ◽

Zhe Yang ◽

...

Keyword(s):

Anomaly Detection ◽

Large Scale ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Keyword Search ◽

Mean Shift ◽

Recall Rate ◽

K Nearest Neighbor ◽

Mean Shift Clustering ◽

Negative Effect

Logs play an important role in the maintenance of large-scale systems. The number of logs that indicate normal operation (normal logs) differs greatly from the number of logs that indicate anomalies (abnormal logs), and the two types of logs have certain differences. Automatically identifying faults with the K-Nearest Neighbor (KNN) algorithm, an outlier detection method with high accuracy, is an effective way to detect anomalies from logs. However, logs are large in scale and their samples are very unevenly distributed, which affects the results of the KNN algorithm in log-based anomaly detection. Thus, we propose a method based on an improved KNN algorithm, which uses the existing mean-shift clustering algorithm to efficiently select the training set from massive logs. We then assign different weights to samples at different distances, which reduces the negative effect of the unbalanced distribution of log samples on the accuracy of the KNN algorithm. Comparative experiments on log sets from five supercomputers show that the proposed method can be effectively applied to log-based anomaly detection, and that its accuracy, recall rate, and F-measure are higher than those of the traditional keyword search method.
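The effect of weighting neighbours by distance can be sketched as follows. This is a generic inverse-distance weighting, chosen as an assumption; the abstract does not state the paper's actual weighting formula, and the name `weighted_knn` and the toy feature vectors are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn(train, query, k=3):
    """Classify `query` by a distance-weighted vote of its k nearest samples.

    `train` is a list of (vector, label) pairs. Each neighbour votes with
    weight 1 / (distance + eps), so close samples count more than far ones.
    """
    eps = 1e-9  # avoids division by zero for exact matches
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for vector, label in nearest:
        votes[label] += 1.0 / (math.dist(vector, query) + eps)
    return max(votes, key=votes.get)


# Hypothetical, imbalanced sample: two distant "normal" points and one
# nearby "abnormal" point. A plain majority vote with k=3 would say "normal".
train = [((1.0, 0.0), "normal"), ((0.0, 1.0), "normal"), ((0.1, 0.1), "abnormal")]
label = weighted_knn(train, (0.0, 0.0), k=3)
```

With these toy values the single close abnormal sample outweighs the two distant normal samples, which is the kind of correction for unbalanced classes that the abstract describes.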


Classification of Sentinel-2 Images Utilizing Abundance Representation

Proceedings ◽

10.3390/ecrs-2-05141 ◽

2018 ◽

Vol 2 (7) ◽

pp. 328 ◽

Cited By ~ 6

Author(s):

Eleftheria Mylona ◽

Vassiliki Daskalopoulou ◽

Olga Sykioti ◽

Konstantinos Koutroumbas ◽

Athanasios Rontogiannis

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbor ◽

Unsupervised Classification ◽

Bare Soil ◽

Spectral Unmixing ◽

Support Vector ◽

Endmember Extraction ◽

Bayes Algorithm ◽

Sentinel 2

This paper deals with (both supervised and unsupervised) classification of multispectral Sentinel-2 images, utilizing the abundance representation of the pixels of interest. The latter pixel representation uncovers the hidden structured regions that are not often available in the reference maps. Additionally, it encourages class distinctions and bolsters accuracy. The adopted methodology, which has been successfully applied to hyperspectral data, involves two main stages: (I) the determination of the pixel’s abundance representation; and (II) the employment of a classification algorithm applied to the abundance representations. More specifically, stage (I) incorporates two key processes, namely (a) endmember extraction, utilizing spectrally homogeneous regions of interest (ROIs); and (b) spectral unmixing, which hinges upon the endmember selection. The adopted spectral unmixing process assumes the linear mixing model (LMM), where each pixel is expressed as a linear combination of the endmembers. The pixel’s abundance vector is estimated via a variational Bayes algorithm that is based on a suitably defined hierarchical Bayesian model. The resulting abundance vectors are then fed to stage (II), where two off-the-shelf supervised classification approaches (namely nearest neighbor (NN) classification and support vector machines (SVM)), as well as an unsupervised classification process (namely the online adaptive possibilistic c-means (OAPCM) clustering algorithm), are adopted. Experiments are performed on a Sentinel-2 image acquired for a specific region of the Northern Pindos National Park in north-western Greece containing water, vegetation and bare soil areas. The experimental results demonstrate that the ad-hoc classification approaches utilizing abundance representations of the pixels outperform those utilizing the spectral signatures of the pixels in terms of accuracy.
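For readers unfamiliar with the linear mixing model, its usual form can be written as below. The symbols are assumptions introduced here, not taken from the paper: y is a pixel's spectrum, e_1, ..., e_p are the endmember spectra, a_k are the abundances, and n is additive noise. The nonnegativity and sum-to-one constraints are the ones commonly imposed in spectral unmixing; the abstract does not say which constraints this paper uses.

```latex
\mathbf{y} = \sum_{k=1}^{p} a_k \, \mathbf{e}_k + \mathbf{n},
\qquad a_k \ge 0, \qquad \sum_{k=1}^{p} a_k = 1 .
```

The abundance vector (a_1, ..., a_p) is the per-pixel representation that stage (II) classifies in place of the raw spectral signature.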


Spatial-Adaptive Siamese Residual Network for Multi-/Hyperspectral Classification

Remote Sensing ◽

10.3390/rs12101640 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1640 ◽

Cited By ~ 1

Author(s):

Zhi He ◽

Dan He

Keyword(s):

Large Scale ◽

Nearest Neighbor ◽

Three Dimensional ◽

Classification Performance ◽

Majority Voting ◽

Residual Network ◽

Training Samples ◽

Simple Linear Iterative Clustering ◽

Limited Training Samples ◽

Hyperspectral Classification

Deep learning methods have been successfully applied to multispectral and hyperspectral image classification due to their ability to extract hierarchical abstract features. However, the performance of these methods relies heavily on large-scale training samples. In this paper, we propose a three-dimensional spatial-adaptive Siamese residual network (3D-SaSiResNet) that requires fewer samples and still enhances the performance. The proposed method consists of two main steps: construction of 3D spatial-adaptive patches and a Siamese residual network for multiband image classification. In the first step, the spectral dimension of the original multiband images is reduced by a stacked autoencoder and superpixels of each band are obtained by the simple linear iterative clustering (SLIC) method. Superpixels of the original multiband image are then generated by majority voting. Subsequently, the 3D spatial-adaptive patch of each pixel is extracted from the original multiband image by reference to the previously generated superpixels. In the second step, a Siamese network composed of two 3D residual networks is designed to extract discriminative features for classification, and we train the 3D-SaSiResNet by feeding the training samples into the networks in pairs. The testing samples are then fed into the trained 3D-SaSiResNet and the learned features of the testing samples are classified by the nearest neighbor classifier. Experimental results on three multiband image datasets show the feasibility of the proposed method in enhancing classification performance even with limited training samples.


Phytosociological study of the forest vegetation of Kyiv urban area (Ukraine)

Hacquetia ◽

10.2478/hacq-2019-0012 ◽

2020 ◽

Vol 19 (1) ◽

pp. 99-126

Author(s):

Igor V. Goncharenko ◽

Halina M. Yatsenko

Keyword(s):

Urban Area ◽

Large Scale ◽

Clustering Algorithm ◽

Urban Forests ◽

Forest Vegetation ◽

Sorting Algorithm ◽

New Associations ◽

Floristic Analysis ◽

Phytosociological Study

The study presents a floristic-sociological classification of the forest vegetation of Kyiv urban area. We identified 18 syntaxa within 7 classes, 7 orders, and 8 alliances, and 3 new associations were allocated (Aristolochio clematitis-Populetum nigrae, Galio aparines-Aceretum negundi, Dryopterido carthusianae-Pinetum sylvestris). We analyzed vegetation data using quantitative approaches of ordination and phytoindication. Because the collected data on urban forests contain many relevés of a transitional nature, the DRSA (Distance-Ranked Sorting Algorithm) clustering algorithm was applied to classify the vegetation matrix. A large-scale comparative floristic analysis of syntaxa from different regions and countries has been conducted and summarized in differentiating tables.


To the Problem of Normative Data in Pathopsychological Diagnostics

Clinical Psychology and Special Education ◽

10.17759/cpse.2017060207 ◽

2017 ◽

Vol 6 (2) ◽

pp. 83-96 ◽

Cited By ~ 2

Author(s):

A. Sultanova ◽

I.A. Ivanova

Keyword(s):

Pilot Study ◽

Healthy Subjects ◽

Normative Data ◽

Large Scale ◽

Experimental Studies ◽

Test Interpretation ◽

Social Changes ◽

Personal Attitude ◽

The Social

The article raises the question of how current normative data are. In the tradition of Russian psychology, such data are needed as a baseline against which the results of experimental studies are compared. It can be assumed that the social changes of recent decades are reflected in the formation of thinking and other mental functions. A pilot study was conducted to identify how healthy subjects perform classical pathopsychological techniques. The study involved mentally healthy and socially adapted people aged 20-39 who were graduates or undergraduates. We used the following techniques: "Classification of objects", "Pictogram", filling in words missing from a text (Ebbinghaus test), and "Interpretation of proverbs". The results of the experiment made it possible to identify two areas in which the changes were most significant: the emotional-motivational (personality) sphere and thinking. Many subjects were characterized by a wary-anxious attitude to the experiment, an increased emotional-personal attitude to the stimulus material, a decrease in criticality toward the results of their activities, neurodynamic disorders, inconsistency of thinking, versatility of thinking, a tendency to resonate, and self-centered thinking (according to the authors, these features manifest as a "pathopsychology of everyday life"). Special large-scale scientific research devoted to this problem is needed.


dropClust: Efficient clustering of ultra-large scRNA-seq data

10.1101/170308 ◽

2017 ◽

Cited By ~ 2

Author(s):

Debajyoti Sinha ◽

Akhilesh Kumar ◽

Himanshu Kumar ◽

Sanghamitra Bandyopadhyay ◽

Debarka Sengupta

Keyword(s):

Single Cell ◽

Large Scale ◽

Best Practice ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

De Novo ◽

Single Cells ◽

Nearest Neighbor Search ◽

Locality Sensitive Hashing ◽

Clustering Methods

Droplet-based single cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale for such high dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing, an approximate nearest neighbor search technique, to develop a de novo clustering algorithm for large-scale single cell data. On a number of real datasets, dropClust outperformed the existing best practice methods in terms of execution time, clustering accuracy and detectability of minor cell sub-types.
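To show how Locality Sensitive Hashing narrows a nearest neighbor search, here is a minimal sketch of one common LSH family, random-hyperplane (sign) hashing. It is not dropClust's implementation; the names `make_hasher` and `build_buckets`, the bit count, and the toy vectors are all hypothetical.

```python
import random


def make_hasher(dim, n_bits, seed=0):
    """Return a function mapping a vector to an n_bits-long sign signature.

    Each bit records on which side of a random hyperplane the vector lies,
    so vectors pointing in similar directions tend to share a signature.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def signature(vec):
        return tuple(
            int(sum(p * x for p, x in zip(plane, vec)) >= 0) for plane in planes
        )

    return signature


def build_buckets(vectors, signature):
    """Group vector indices by signature; each bucket is a candidate set."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(signature(vec), []).append(idx)
    return buckets


# Hypothetical expression profiles: the first two point in the same direction.
cells = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-3.0, 0.5, -1.0]]
signature = make_hasher(dim=3, n_bits=8)
buckets = build_buckets(cells, signature)
candidates = buckets[signature([1.0, 2.0, 3.0])]
```

An approximate neighbor query then compares the query only against the vectors in its own bucket instead of the whole dataset, trading some accuracy for speed.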


Introduction

10.1093/oso/9780198701378.003.0001 ◽

2018 ◽

Author(s):

Clive Holes

Keyword(s):

Social Factors ◽

Large Scale ◽

Early History ◽

Small Scale ◽

Linguistic Change ◽

The Social ◽

History Of ◽

Arabic Dialects ◽

Early Occurrence

This chapter outlines the scholarly background of the study of Arabic historical dialectology, and addresses the following issues: the early history of Arabic: myth and reality; the definition and exemplification of ‘Middle Arabic’ and ‘Mixed Arabic through history’; evidence for the early occurrence of certain Arabic dialectal features; examples of substrates and borrowing in Arabic dialects; the dialect geography of Arabic and its typology, especially the ‘sedentary’ and ‘bedouin’ divide; how and why dialects have undergone change, large-scale and small-scale, and the causative social factors; a classification of the typology of internal linguistic change in Arabic; causes of the social indexicalization of dialectal features of Arabic; examples of the pidginization and creolization of Arabic, and the reasons for the apparent rarity of this phenomenon.


Human Activity Recognition from the Acceleration Data of a Wearable Device. Which Features Are More Relevant by Activities?

Proceedings ◽

10.3390/proceedings2191242 ◽

2018 ◽

Vol 2 (19) ◽

pp. 1242 ◽

Cited By ~ 8

Author(s):

Macarena Espinilla ◽

Javier Medina ◽

Alberto Salguero ◽

Naomi Irvine ◽

Mark Donnelly ◽

...

Keyword(s):

Activity Recognition ◽

Human Activity ◽

Large Scale ◽

Nearest Neighbor ◽

Human Activity Recognition ◽

Classification Algorithm ◽

Wearable Device ◽

K Nearest Neighbor ◽

Multiple Features ◽

Acceleration Data

Data-driven approaches for human activity recognition learn from pre-existing large-scale datasets to generate a classification algorithm that can recognize target activities. Typically, several activities are represented within such datasets, characterized by multiple features that are computed from sensor devices. Often, some features are found to be more relevant to particular activities, which can make the classification algorithm less accurate at detecting activities for which such features are not so relevant. This work presents an experiment in human activity recognition with features derived from the acceleration data of a wearable device. Specifically, this work analyzes which features are most relevant for each activity and investigates which classifier provides the best accuracy with those features. The results indicate that the best classifier is the k-nearest neighbor and confirm that there are redundant features that generally introduce noise into the classification, leading to decreased accuracy.
