Graph-based recommendations: from data representation to feature extraction and application

Author(s):  
Amit Tiroshi
Tsvi Kuflik
Shlomo Berkovsky
Mohamed Ali Kaafar


2016
Vol 78 (8-2)
Author(s):
Aini Najwa Azmi
Dewi Nasien
Azurah Abu Samah

Over recent years, there has been explosive growth of interest in pattern recognition. For example, the handwritten signature is a human biometric that can be used in many access control and security applications. However, a handwritten signature is not a uniform characteristic like a fingerprint, iris, or vein pattern: it may change due to several factors, such as mood, environment, and age. A Signature Verification System (SVS) is a pattern recognition application that addresses this variability. The system can be decomposed into three stages: data acquisition and preprocessing, feature extraction, and verification. This paper presents techniques for an SVS that uses the Freeman chain code (FCC) as its data representation. In the first part of the feature extraction stage, the FCC was extracted using a boundary-based style on the largest contiguous part of the signature image. The extracted FCC was divided into four, eight, or sixteen equal parts. In the second part of feature extraction, six global features were calculated. Finally, verification used k-Nearest Neighbour (k-NN) to test the performance. The MCYT bimodal database was used in every stage of the system. The best result achieved was a False Rejection Rate (FRR) of 14.67%, a False Acceptance Rate (FAR) of 15.83%, and an Equal Error Rate (EER) of 0.43%, with the shortest computation time of 7.53 seconds and 47 features.
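As a rough illustration of the segmentation step described above, the sketch below splits a toy Freeman chain code into equal parts, builds per-segment direction histograms, and feeds them to a k-NN classifier. The histogram features, function names, and toy data are assumptions for illustration only; the paper's six global features are not reproduced here.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

def split_chain_code(fcc, n_parts):
    """Split a Freeman chain code into n_parts roughly equal segments."""
    return np.array_split(np.asarray(fcc), n_parts)

def segment_histogram(segment):
    """Normalized frequency of the 8 Freeman directions within one segment."""
    counts = Counter(segment.tolist())
    total = max(len(segment), 1)
    return [counts.get(d, 0) / total for d in range(8)]

def fcc_features(fcc, n_parts=8):
    """Concatenate per-segment direction histograms into one feature vector."""
    feats = []
    for seg in split_chain_code(fcc, n_parts):
        feats.extend(segment_histogram(seg))
    return np.array(feats)

# Toy chain codes standing in for two signers' boundary traces.
fcc_a = [0, 1, 1, 2, 3, 4, 4, 5, 6, 7, 0, 1] * 4
fcc_b = [4, 4, 5, 6, 6, 7, 0, 0, 1, 2, 3, 3] * 4

X = np.vstack([fcc_features(fcc_a), fcc_features(fcc_b)])
y = [0, 1]                                   # signer identities
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.predict([fcc_features(fcc_a)]))    # -> [0]
```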


Author(s):  
Mahmoud Khonji ◽  
Youssef Iraqi

Electronic text stylometry is a collection of forensic methods that analyze the writing styles of input electronic texts in order to extract information about their authors. Such extracted information could be the identity of the authors, or attributes of the authors such as their gender, age group, or ethnicity. This survey paper presents the following contributions: 1) a description of all stylometry problems in probabilistic terms, under a unified notation; to the best of our knowledge, this is the most comprehensive definition to date. 2) A survey of key methods, with particular attention to data representation (or feature extraction) methods. 3) An evaluation of 23,760 feature extraction methods, the most comprehensive evaluation of feature extraction methods in the stylometry literature to date. The importance of this evaluation is twofold: first, it identifies the relative effectiveness of the features (currently, many are not evaluated jointly; e.g., syntactic n-grams are not evaluated against k-skip n-grams, and so forth); second, thanks to our generalizations, we could evaluate novel grams, such as what we name compound grams. 4) The release of our associated Python feature extraction library, Fextractor. Essentially, the library generalizes all existing n-gram-based feature extraction methods under "at least l-frequent, dir-directed, k-skipped n-grams", and allows grams to be defined diversely, including definitions based on high-level grammatical aspects, such as POS tags, as well as lower-level ones, such as the distribution of function words, word shapes, etc. This makes the library by far the most extensive in this domain to date. 5) The construction, evaluation, and release of the first dataset of Emirati social media text, together with the first evaluation of author identification against Emirati social media texts. Interestingly, we find that, when using our models and feature extraction library (Fextractor), authors can be identified significantly more accurately than what is reported with similarly sized datasets. The dataset also contains sub-datasets in other languages (Dutch, English, Greek, and Spanish), and our findings are consistent across them.
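To make the gram generalization concrete, here is a minimal sketch of k-skipped n-grams filtered by a minimum frequency l. This illustrates the concept only and is not the Fextractor API; the function and parameter names are assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_skip_ngrams(tokens, n=2, k=1, l=1):
    """Count n-grams allowing up to k skipped tokens between adjacent
    elements, keeping only grams that occur at least l times."""
    counts = Counter()
    for idxs in combinations(range(len(tokens)), n):
        # keep an index tuple only if consecutive picks skip at most k tokens
        if all(j - i - 1 <= k for i, j in zip(idxs, idxs[1:])):
            counts[tuple(tokens[i] for i in idxs)] += 1
    return {gram: c for gram, c in counts.items() if c >= l}

tokens = "the quick brown fox jumps over the lazy dog".split()
# 1-skip bigrams include ('the', 'brown') in addition to plain bigrams.
print(frequent_skip_ngrams(tokens, n=2, k=1, l=1))
```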


2011
pp. 682-704
Author(s):  
Petra Perner

This chapter introduces image mining as a method to discover implicit, previously unknown, and potentially useful information from digital image and video repositories. It argues that image mining is a distinct discipline because of the special type of data involved; therefore, image-mining methods that account for this data representation and for the different aspects of image mining have to be developed. Furthermore, a bridge has to be established between image mining and image processing, feature extraction, and image understanding, since the latter topics are concerned with developing methods for the automatic extraction of higher-level image representations. We introduce our methodology, the methods developed, and the image mining system that we successfully applied to several medical image-diagnostic tasks.


2021
Author(s):
Caiwei Zhen
Yuxian Wang
Lu Han
Jingyi Li
Jinghao Peng
...

The three-dimensional genome structure plays a key role in cellular function and gene regulation. Single-cell Hi-C technology can capture genome structure information at the cell level, which provides the opportunity to study how genome structure varies among different cell types. However, few methods are well designed for single-cell Hi-C clustering, because of the high sparsity, noise, and heterogeneity of single-cell Hi-C data. In this manuscript, we propose a novel framework, named ScHiC-Rep, for single-cell Hi-C data representation and clustering. ScHiC-Rep mainly contains two parts: data imputation and feature extraction. In the imputation part, a novel imputation workflow is proposed, including graph convolution-based, random walk with restart-based, and genomic neighbor-based imputation. In the feature extraction part, a two-phase feature extraction method is proposed, including a linear phase for chromosome-level and a non-linear phase for cell-level feature extraction. The evaluation results show that the proposed framework outperforms existing state-of-the-art approaches on both human and mouse datasets.
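As a rough sketch of one of the three imputation steps named above, the following applies random walk with restart smoothing to a toy contact matrix. The parameter names (restart_prob, n_iter) and toy data are illustrative assumptions, not ScHiC-Rep's actual implementation.

```python
import numpy as np

def rwr_impute(contact_map, restart_prob=0.5, n_iter=30):
    """Smooth a symmetric contact matrix with random walk with restart:
    Q <- restart_prob * I + (1 - restart_prob) * Q @ P."""
    A = contact_map + np.eye(len(contact_map))    # self-loops keep rows non-zero
    P = A / A.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
    Q = np.eye(len(A))
    for _ in range(n_iter):
        Q = restart_prob * np.eye(len(A)) + (1 - restart_prob) * Q @ P
    return Q

sparse_map = np.array([[0., 3., 0., 0.],
                       [3., 0., 1., 0.],
                       [0., 1., 0., 2.],
                       [0., 0., 2., 0.]])
# Zero entries gain mass via multi-hop paths, filling in sparse contacts.
print(np.round(rwr_impute(sparse_map), 3))
```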


2021
Author(s):
Alvin Subakti
Hendri Murfi
Nora Hariadi

Text clustering is the task of grouping a set of texts so that texts in the same group are more similar to one another than to those in other groups. Grouping texts manually requires significant time and labor, so automation through machine learning is necessary. The standard method for representing textual data is Term Frequency-Inverse Document Frequency (TFIDF). However, TFIDF cannot consider the position and context of a word in a sentence. The Bidirectional Encoder Representations from Transformers (BERT) model can produce text representations that incorporate the position and context of a word in a sentence. This research analyzed the performance of the BERT model as a data representation for text. Moreover, various feature extraction and normalization methods were also applied to the BERT representations. To examine the performance of BERT, we used four clustering algorithms: k-means clustering, eigenspace-based fuzzy c-means, deep embedded clustering, and improved deep embedded clustering. Our simulations show that BERT outperforms the standard TFIDF method in 28 out of 36 metrics. Furthermore, different feature extraction and normalization methods produced varied performance, so their choice should be adapted to the text clustering algorithm used.
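A minimal sketch of the pipeline analyzed above: sentence representations from a pretrained BERT model, mean-pooled over tokens, then clustered with k-means. The model name, pooling choice, and toy texts are assumptions, not necessarily the paper's exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

texts = ["stocks fell sharply today",
         "the market closed lower",
         "the team won the final match",
         "a thrilling game ended in victory"]

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

with torch.no_grad():
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (batch, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1)     # ignore padding tokens
    emb = (hidden * mask).sum(1) / mask.sum(1)       # mean pooling per text

labels = KMeans(n_clusters=2, n_init=10).fit_predict(emb.numpy())
print(labels)   # finance texts vs. sports texts should separate
```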


Author(s):  
Nguyen Thi Ngoc Anh
Ho Phan Hieu
Tran Anh Kiet
Vo Trung Hung

With advances in data collection and storage capabilities, large amounts of multidimensional data, known as higher-order data representations, have recently been generated in bioinformatics applications, especially in DNA sequence recognition. This paper therefore proposes a mathematical model capable of handling the multidimensional problem of DNA similarity detection with high accuracy and reliability. To this end, the paper covers the central issues of multidimensional DNA gene expression data, including: (1) formulating multidimensional DNA data as a higher-order representation; (2) recovering missing values; and (3) decomposing higher-order DNA data directly from their tensorial representation to extract useful information for classification. Consequently, a novel type of third-order microarray expression, termed gene-sample-time (GST), is presented for biological sample classification. The contributions follow two main thrusts: a latent modeling setting for imputing missing values based on the High-Order Kalman Filter, and feature extraction based on Tensor Discriminative Feature Extraction. Experimental performance on a real dataset of DNA sequences corroborates the advantages of the proposed approaches over matrix-based algorithms and recent tensor-based discriminant decompositions, in terms of missing value completion, classification accuracy, and computation time.
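To illustrate what a third-order gene-sample-time tensor looks like in practice, here is a generic baseline: unfold the tensor along the sample mode and extract per-sample features via a truncated SVD. This is a simple matricization sketch under assumed dimensions, not the paper's Kalman-filter or discriminative decomposition method.

```python
import numpy as np

rng = np.random.default_rng(0)
G, S, T = 50, 12, 8                       # genes x samples x time points
gst = rng.normal(size=(G, S, T))          # toy GST expression tensor

def mode_unfold(tensor, mode):
    """Matricize a 3-way tensor along the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

X = mode_unfold(gst, mode=1)              # each row: one sample's gene-time profile
U, s, Vt = np.linalg.svd(X, full_matrices=False)
features = X @ Vt[:5].T                   # project samples onto top 5 components
print(features.shape)                     # (12, 5): compact per-sample features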


Sensors
2021
Vol 21 (24)
pp. 8205
Author(s):
Lisa-Marie Vortmann
Felix Putze

Statistical measurements of eye movement-specific properties, such as fixations, saccades, blinks, or pupil dilation, are frequently used as input features for machine learning algorithms applied to eye tracking recordings. These characteristics are intended to be interpretable aspects of eye gaze behavior. However, prior research has demonstrated that neural networks trained on implicit representations of raw eye tracking data outperform these traditional techniques. To leverage the strengths and information of both feature sets, in this work we integrated implicit and explicit eye tracking features in one classification approach. A neural network was adapted to process the heterogeneous input and predict the internally and externally directed attention of 154 participants. We compared the accuracies reached by the implicit and combined features for different window lengths and evaluated the approaches in terms of person- and task-independence. The results indicate that combining implicit and explicit feature extraction techniques for eye tracking data significantly improves classification results for attentional state detection. The attentional state was correctly classified during new tasks with an accuracy better than chance, and person-independent classification even outperformed person-dependently trained classifiers for some settings. For future experiments and applications that require eye tracking data classification, we suggest considering implicit data representations in addition to interpretable explicit features.
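One plausible way to fuse the two feature sets, sketched below, is a two-branch network: a recurrent branch over the raw gaze sequence (implicit) and a dense branch over precomputed statistics (explicit), concatenated before the classification head. The architecture, dimensions, and names are assumptions for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class HybridGazeClassifier(nn.Module):
    def __init__(self, n_explicit=12, hidden=32):
        super().__init__()
        # implicit branch: raw (x, y, pupil) samples processed sequentially
        self.gru = nn.GRU(input_size=3, hidden_size=hidden, batch_first=True)
        # explicit branch: fixation/saccade/blink statistics per window
        self.mlp = nn.Sequential(nn.Linear(n_explicit, hidden), nn.ReLU())
        # fused head: internal vs. external attention
        self.head = nn.Linear(2 * hidden, 2)

    def forward(self, raw_seq, explicit_feats):
        _, h = self.gru(raw_seq)                      # h: (1, batch, hidden)
        fused = torch.cat([h.squeeze(0), self.mlp(explicit_feats)], dim=1)
        return self.head(fused)

model = HybridGazeClassifier()
raw = torch.randn(4, 250, 3)        # 4 windows of 250 gaze samples each
stats = torch.randn(4, 12)          # 12 explicit features per window
print(model(raw, stats).shape)      # torch.Size([4, 2])
```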


Author(s):  
Abdul Razak
Nirmala C. R

For several decades, many statistical and scientific efforts have been made toward better analysis and prediction of stock trading, yet the field remains open to new avenues for scientists to rethink and discover new inferences by adopting the latest technological scenarios. In this regard, this paper applies classification techniques to a stock data stream through feature extraction for trend analysis. The proposed work uses k-means to cluster samples into two clusters: stocks in trend and stocks not in trend. The trend analysis is done based on density estimation of the stocks with respect to sectors. A histogram, a well-known data representation method, is used to represent the sectors that are in trend. This work has been implemented and evaluated on live NSE (India) data using Python and its related tools.
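A hedged sketch of the two-cluster step described above: k-means separating "in trend" from "not in trend" stocks. The features (mean daily return, volume change) and synthetic data are illustrative assumptions, not the paper's actual NSE pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# toy per-stock features: (mean daily return, relative volume change)
trending = rng.normal(loc=[0.02, 0.30], scale=0.01, size=(20, 2))
flat = rng.normal(loc=[0.00, 0.00], scale=0.01, size=(20, 2))
X = np.vstack([trending, flat])

km = KMeans(n_clusters=2, n_init=10).fit(X)
in_trend = km.cluster_centers_[:, 0].argmax()   # cluster with the higher mean return
print((km.labels_ == in_trend).sum(), "stocks flagged as in trend")
```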

