Graph-based recommendations: from data representation to feature extraction and application

Author(s):  
Amit Tiroshi
Tsvi Kuflik
Shlomo Berkovsky
Mohamed Ali Kaafar


2016
Vol 78 (8-2)
Author(s):
Aini Najwa Azmi
Dewi Nasien
Azurah Abu Samah

Over recent years, there has been explosive growth of interest in pattern recognition. For example, the handwritten signature is a human biometric that can be used in many access control and security applications. However, a handwritten signature is not a uniform characteristic like a fingerprint, iris, or vein pattern: it may change due to several factors, such as mood, environment, and age. A Signature Verification System (SVS) is a pattern recognition application that addresses this variability. The system can be decomposed into three stages: data acquisition and preprocessing, feature extraction, and verification. This paper presents techniques for an SVS that uses the Freeman chain code (FCC) as its data representation. In the first part of the feature extraction stage, the FCC was extracted using a boundary-based style on the largest contiguous part of the signature image. The extracted FCC was divided into four, eight, or sixteen equal parts. In the second part of feature extraction, six global features were calculated. Finally, verification used k-Nearest Neighbour (k-NN) to test the performance. The MCYT bimodal database was used in every stage of the system. The best result achieved was a False Rejection Rate (FRR) of 14.67%, a False Acceptance Rate (FAR) of 15.83%, and an Equal Error Rate (EER) of 0.43%, with the shortest computation time of 7.53 seconds and 47 features.
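As a rough illustration of the segmentation step described above, the sketch below splits a toy Freeman chain code into equal parts, builds per-segment direction histograms, and feeds them to a k-NN classifier. The histogram features, function names, and toy data are assumptions for illustration only; the paper's six global features are not reproduced here.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

def split_chain_code(fcc, n_parts):
    """Split a Freeman chain code into n_parts roughly equal segments."""
    return np.array_split(np.asarray(fcc), n_parts)

def segment_histogram(segment):
    """Normalized frequency of the 8 Freeman directions within one segment."""
    counts = Counter(segment.tolist())
    total = max(len(segment), 1)
    return [counts.get(d, 0) / total for d in range(8)]

def fcc_features(fcc, n_parts=8):
    """Concatenate per-segment direction histograms into one feature vector."""
    feats = []
    for seg in split_chain_code(fcc, n_parts):
        feats.extend(segment_histogram(seg))
    return np.array(feats)

# Toy chain codes standing in for two signers' boundary traces.
fcc_a = [0, 1, 1, 2, 3, 4, 4, 5, 6, 7, 0, 1] * 4
fcc_b = [4, 4, 5, 6, 6, 7, 0, 0, 1, 2, 3, 3] * 4

X = np.vstack([fcc_features(fcc_a), fcc_features(fcc_b)])
y = [0, 1]                                   # signer identities
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.predict([fcc_features(fcc_a)]))    # -> [0]
```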


Author(s):  
Mahmoud Khonji ◽  
Youssef Iraqi

Electronic text stylometry is a collection of forensic methods that analyze the writing styles of input electronic texts in order to extract information about their authors. Such extracted information could be the identity of the authors, or attributes of the authors such as their gender, age group, or ethnicity. This survey paper presents the following contributions: 1) a description of all stylometry problems in probabilistic terms, under a unified notation; to the best of our knowledge, this is the most comprehensive definition to date. 2) A survey of key methods, with particular attention to data representation (or feature extraction) methods. 3) An evaluation of 23,760 feature extraction methods, the most comprehensive evaluation of feature extraction methods in the stylometry literature to date. The importance of this evaluation is twofold: first, it identifies the relative effectiveness of the features (currently, many are not evaluated jointly; e.g., syntactic n-grams are not evaluated against k-skip n-grams, and so forth); second, thanks to our generalizations, we could evaluate novel grams, such as what we name compound grams. 4) The release of our associated Python feature extraction library, Fextractor. Essentially, the library generalizes all existing n-gram-based feature extraction methods under "at least l-frequent, dir-directed, k-skipped n-grams", and allows grams to be defined diversely, including definitions based on high-level grammatical aspects, such as POS tags, as well as lower-level ones, such as the distribution of function words, word shapes, etc. This makes the library by far the most extensive in this domain to date. 5) The construction, evaluation, and release of the first dataset of Emirati social media text, together with the first evaluation of author identification against Emirati social media texts. Interestingly, we find that, when using our models and feature extraction library (Fextractor), authors can be identified significantly more accurately than what is reported with similarly sized datasets. The dataset also contains sub-datasets in other languages (Dutch, English, Greek, and Spanish), and our findings are consistent across them.
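To make the gram generalization concrete, here is a minimal sketch of k-skipped n-grams filtered by a minimum frequency l. This illustrates the concept only and is not the Fextractor API; the function and parameter names are assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_skip_ngrams(tokens, n=2, k=1, l=1):
    """Count n-grams allowing up to k skipped tokens between adjacent
    elements, keeping only grams that occur at least l times."""
    counts = Counter()
    for idxs in combinations(range(len(tokens)), n):
        # keep an index tuple only if consecutive picks skip at most k tokens
        if all(j - i - 1 <= k for i, j in zip(idxs, idxs[1:])):
            counts[tuple(tokens[i] for i in idxs)] += 1
    return {gram: c for gram, c in counts.items() if c >= l}

tokens = "the quick brown fox jumps over the lazy dog".split()
# 1-skip bigrams include ('the', 'brown') in addition to plain bigrams.
print(frequent_skip_ngrams(tokens, n=2, k=1, l=1))
```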


2011
pp. 682-704
Author(s):  
Petra Perner

This chapter introduces image mining as a method to discover implicit, previously unknown, and potentially useful information from digital image and video repositories. It argues that image mining is a distinct discipline because of the special type of data involved; therefore, image-mining methods that account for this data representation and for the different aspects of image mining have to be developed. Furthermore, a bridge has to be established between image mining and image processing, feature extraction, and image understanding, since the latter topics are concerned with developing methods for the automatic extraction of higher-level image representations. We introduce our methodology, the methods developed, and the image mining system that we successfully applied to several medical image-diagnostic tasks.


2021
Author(s):
Caiwei Zhen
Yuxian Wang
Lu Han
Jingyi Li
Jinghao Peng
...

The three-dimensional genome structure plays a key role in cellular function and gene regulation. Single-cell Hi-C technology can capture genome structure information at the cell level, which provides the opportunity to study how genome structure varies among different cell types. However, few methods are well designed for single-cell Hi-C clustering, because of the high sparsity, noise, and heterogeneity of single-cell Hi-C data. In this manuscript, we propose a novel framework, named ScHiC-Rep, for single-cell Hi-C data representation and clustering. ScHiC-Rep mainly contains two parts: data imputation and feature extraction. In the imputation part, a novel imputation workflow is proposed, including graph convolution-based, random walk with restart-based, and genomic neighbor-based imputation. In the feature extraction part, a two-phase feature extraction method is proposed, including a linear phase for chromosome-level and a non-linear phase for cell-level feature extraction. The evaluation results show that the proposed framework outperforms existing state-of-the-art approaches on both human and mouse datasets.
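As a rough sketch of one of the three imputation steps named above, the following applies random walk with restart smoothing to a toy contact matrix. The parameter names (restart_prob, n_iter) and toy data are illustrative assumptions, not ScHiC-Rep's actual implementation.

```python
import numpy as np

def rwr_impute(contact_map, restart_prob=0.5, n_iter=30):
    """Smooth a symmetric contact matrix with random walk with restart:
    Q <- restart_prob * I + (1 - restart_prob) * Q @ P."""
    A = contact_map + np.eye(len(contact_map))    # self-loops keep rows non-zero
    P = A / A.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
    Q = np.eye(len(A))
    for _ in range(n_iter):
        Q = restart_prob * np.eye(len(A)) + (1 - restart_prob) * Q @ P
    return Q

sparse_map = np.array([[0., 3., 0., 0.],
                       [3., 0., 1., 0.],
                       [0., 1., 0., 2.],
                       [0., 0., 2., 0.]])
# Zero entries gain mass via multi-hop paths, filling in sparse contacts.
print(np.round(rwr_impute(sparse_map), 3))
```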


2021
Author(s):
Alvin Subakti
Hendri Murfi
Nora Hariadi

Text clustering is the task of grouping a set of texts so that texts in the same group are more similar to one another than to those in other groups. Grouping texts manually requires significant time and labor, so automation through machine learning is necessary. The standard method for representing textual data is Term Frequency-Inverse Document Frequency (TFIDF). However, TFIDF cannot consider the position and context of a word in a sentence. The Bidirectional Encoder Representations from Transformers (BERT) model can produce text representations that incorporate the position and context of a word in a sentence. This research analyzed the performance of the BERT model as a data representation for text. Moreover, various feature extraction and normalization methods were also applied to the BERT representations. To examine the performance of BERT, we used four clustering algorithms: k-means clustering, eigenspace-based fuzzy c-means, deep embedded clustering, and improved deep embedded clustering. Our simulations show that BERT outperforms the standard TFIDF method in 28 out of 36 metrics. Furthermore, different feature extraction and normalization methods produced varied performance, so their choice should be adapted to the text clustering algorithm used.
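A minimal sketch of the pipeline analyzed above: sentence representations from a pretrained BERT model, mean-pooled over tokens, then clustered with k-means. The model name, pooling choice, and toy texts are assumptions, not necessarily the paper's exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

texts = ["stocks fell sharply today",
         "the market closed lower",
         "the team won the final match",
         "a thrilling game ended in victory"]

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

with torch.no_grad():
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (batch, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1)     # ignore padding tokens
    emb = (hidden * mask).sum(1) / mask.sum(1)       # mean pooling per text

labels = KMeans(n_clusters=2, n_init=10).fit_predict(emb.numpy())
print(labels)   # finance texts vs. sports texts should separate
```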


Author(s):  
Nguyen Thi Ngoc Anh
Ho Phan Hieu
Tran Anh Kiet
Vo Trung Hung

With advances in data collection and storage capabilities, large amounts of multidimensional data, known as higher-order data representations, have recently been generated in bioinformatics applications, especially in DNA sequence recognition. This paper therefore proposes a mathematical model capable of handling the multidimensional problem of DNA similarity detection with high accuracy and reliability. To this end, the paper covers the central issues of multidimensional DNA gene expression data, including: (1) formulating multidimensional DNA data as a higher-order representation; (2) recovering missing values; and (3) decomposing higher-order DNA data directly from their tensorial representation to extract useful information for classification. Consequently, a novel type of third-order microarray expression, termed gene-sample-time (GST), is presented for biological sample classification. The contributions follow two main thrusts: a latent modeling setting for imputing missing values based on the High-Order Kalman Filter, and feature extraction based on Tensor Discriminative Feature Extraction. Experimental performance on a real dataset of DNA sequences corroborates the advantages of the proposed approaches over matrix-based algorithms and recent tensor-based discriminant decompositions, in terms of missing value completion, classification accuracy, and computation time.
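To illustrate what a third-order gene-sample-time tensor looks like in practice, here is a generic baseline: unfold the tensor along the sample mode and extract per-sample features via a truncated SVD. This is a simple matricization sketch under assumed dimensions, not the paper's Kalman-filter or discriminative decomposition method.

```python
import numpy as np

rng = np.random.default_rng(0)
G, S, T = 50, 12, 8                       # genes x samples x time points
gst = rng.normal(size=(G, S, T))          # toy GST expression tensor

def mode_unfold(tensor, mode):
    """Matricize a 3-way tensor along the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

X = mode_unfold(gst, mode=1)              # each row: one sample's gene-time profile
U, s, Vt = np.linalg.svd(X, full_matrices=False)
features = X @ Vt[:5].T                   # project samples onto top 5 components
print(features.shape)                     # (12, 5): compact per-sample features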


Sensors
2021
Vol 21 (24)
pp. 8205
Author(s):
Lisa-Marie Vortmann
Felix Putze

Statistical measurements of eye movement-specific properties, such as fixations, saccades, blinks, or pupil dilation, are frequently used as input features for machine learning algorithms applied to eye tracking recordings. These characteristics are intended to be interpretable aspects of eye gaze behavior. However, prior research has demonstrated that neural networks trained on implicit representations of raw eye tracking data outperform these traditional techniques. To leverage the strengths and information of both feature sets, in this work we integrated implicit and explicit eye tracking features in one classification approach. A neural network was adapted to process the heterogeneous input and predict the internally and externally directed attention of 154 participants. We compared the accuracies reached by the implicit and combined features for different window lengths and evaluated the approaches in terms of person- and task-independence. The results indicate that combining implicit and explicit feature extraction techniques for eye tracking data significantly improves classification results for attentional state detection. The attentional state was correctly classified during new tasks with an accuracy better than chance, and person-independent classification even outperformed person-dependently trained classifiers for some settings. For future experiments and applications that require eye tracking data classification, we suggest considering implicit data representations in addition to interpretable explicit features.
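One plausible way to fuse the two feature sets, sketched below, is a two-branch network: a recurrent branch over the raw gaze sequence (implicit) and a dense branch over precomputed statistics (explicit), concatenated before the classification head. The architecture, dimensions, and names are assumptions for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class HybridGazeClassifier(nn.Module):
    def __init__(self, n_explicit=12, hidden=32):
        super().__init__()
        # implicit branch: raw (x, y, pupil) samples processed sequentially
        self.gru = nn.GRU(input_size=3, hidden_size=hidden, batch_first=True)
        # explicit branch: fixation/saccade/blink statistics per window
        self.mlp = nn.Sequential(nn.Linear(n_explicit, hidden), nn.ReLU())
        # fused head: internal vs. external attention
        self.head = nn.Linear(2 * hidden, 2)

    def forward(self, raw_seq, explicit_feats):
        _, h = self.gru(raw_seq)                      # h: (1, batch, hidden)
        fused = torch.cat([h.squeeze(0), self.mlp(explicit_feats)], dim=1)
        return self.head(fused)

model = HybridGazeClassifier()
raw = torch.randn(4, 250, 3)        # 4 windows of 250 gaze samples each
stats = torch.randn(4, 12)          # 12 explicit features per window
print(model(raw, stats).shape)      # torch.Size([4, 2])
```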


Author(s):  
Abdul Razak
Nirmala C. R

For several decades, many statistical and scientific efforts have been made toward better analysis and prediction of stock trading, yet the field remains open to new avenues for scientists to rethink and discover new inferences by adopting the latest technological scenarios. In this regard, this paper applies classification techniques to a stock data stream through feature extraction for trend analysis. The proposed work uses k-means to cluster samples into two clusters: stocks in trend and stocks not in trend. The trend analysis is done based on density estimation of the stocks with respect to sectors. A histogram, a well-known data representation method, is used to represent the sectors that are in trend. This work has been implemented and evaluated on live NSE (India) data using Python and its related tools.
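A hedged sketch of the two-cluster step described above: k-means separating "in trend" from "not in trend" stocks. The features (mean daily return, volume change) and synthetic data are illustrative assumptions, not the paper's actual NSE pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# toy per-stock features: (mean daily return, relative volume change)
trending = rng.normal(loc=[0.02, 0.30], scale=0.01, size=(20, 2))
flat = rng.normal(loc=[0.00, 0.00], scale=0.01, size=(20, 2))
X = np.vstack([trending, flat])

km = KMeans(n_clusters=2, n_init=10).fit(X)
in_trend = km.cluster_centers_[:, 0].argmax()   # cluster with the higher mean return
print((km.labels_ == in_trend).sum(), "stocks flagged as in trend")
```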

