Histogram-Based Dimensionality Reduction of Term Vector Space

Author(s):  
Krzysztof Ciesielski ◽  
Mieczyslaw A. Klopotek ◽  
Slawomir T. Wierzchon
2019 ◽  
Vol 10 (3) ◽  
pp. 221-239
Author(s):  
Enrico Au-Yeung

Abstract The problem of how to find a sparse representation of a signal is an important one in applied and computational harmonic analysis. It is closely related to the problem of how to reconstruct a sparse vector from its projection in a much lower-dimensional vector space. This is the setting of compressed sensing, where the projection is given by a matrix with many more columns than rows. We introduce a class of random matrices that can be used to reconstruct sparse vectors in this paradigm. These matrices satisfy the restricted isometry property with overwhelming probability. We also discuss an application in dimensionality reduction where we initially discovered this class of matrices.
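As a rough illustration of the compressed-sensing setting described in this abstract, the sketch below projects a sparse vector with a matrix that has many more columns than rows and checks that its squared norm is roughly preserved. The abstract does not specify the new class of random matrices, so a standard Gaussian matrix is used here as an assumed stand-in; the names `gaussian_matrix`, `sparse_vector`, and `project`, and the sizes 200 by 1000, are hypothetical choices for illustration only.

```python
import math
import random


def gaussian_matrix(rows, cols, rng):
    """Matrix with i.i.d. N(0, 1/rows) entries (assumed stand-in, not the paper's class)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def sparse_vector(dim, nonzeros, rng):
    """Vector of length dim with exactly `nonzeros` randomly placed Gaussian entries."""
    x = [0.0] * dim
    for i in rng.sample(range(dim), nonzeros):
        x[i] = rng.gauss(0.0, 1.0)
    return x


def project(matrix, x):
    """Matrix-vector product: maps x into the lower-dimensional space."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def squared_norm(v):
    return sum(a * a for a in v)


rng = random.Random(0)              # fixed seed so the run is repeatable
A = gaussian_matrix(200, 1000, rng)  # many more columns than rows
x = sparse_vector(1000, 10, rng)     # 10 nonzero entries out of 1000
y = project(A, x)                    # 200-dimensional measurement

# For a matrix that behaves like a restricted isometry on sparse vectors,
# this ratio should be close to 1.
ratio = squared_norm(y) / squared_norm(x)
print(f"squared-norm ratio: {ratio:.3f}")
```

This only checks near-isometry on one sparse vector; it does not verify the restricted isometry property, which is a statement about all sufficiently sparse vectors at once.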


Author(s):  
Baojia Chen ◽  
Baoming Shen ◽  
Fajun Zhang ◽  
Wenrong Xiao ◽  
Fafa Chen ◽  
...  

The traditional reliability evaluation method based on large-sample statistics is inefficient for a single computer numerical control (CNC) turning tool or a small batch of such tools, because it describes time and dynamic processes inadequately, relies on inaccurate models, and ignores individual differences. To solve the problem, a new operation reliability evaluation method based on the singular value decomposition transform and support vector space is proposed. In this new method, singular value decomposition is used for the dimensionality reduction of high-dimensional feature data so as to reduce the computational complexity and the redundant components. The hypersphere space of the similar data is established based on the dimension-reduced data. The relative distance between the sample points and the hypersphere is then calculated and used to describe the performance of the tool. The semi-normal function is introduced to define the mapping between the relative distance and the operation reliability of the tool. Finally, two cutting tools in the experiment are taken as the research example to verify the effectiveness of the method. The results show that this method can evaluate the operation reliability of the tool effectively and that singular value decomposition dimensionality reduction improves the accuracy of the evaluation. The method provides new theoretical and practical support for the reliability evaluation of small-sample data.
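The distance-to-reliability mapping described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: the hypersphere is fitted as the centroid and maximum radius of reference samples rather than by an optimized support vector boundary, the singular value decomposition step is omitted (the inputs are assumed to be already dimension-reduced), and the exact form of the semi-normal function, including the parameter `k`, is assumed. All function names are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_hypersphere(points):
    """Assumed simplification: center = mean, radius = largest distance to the mean."""
    center = centroid(points)
    radius = max(distance(p, center) for p in points)
    return center, radius


def relative_distance(sample, center, radius):
    """Distance to the center expressed in units of the sphere radius."""
    return distance(sample, center) / radius


def semi_normal_reliability(rel_dist, k=1.0):
    """Assumed descending half-normal map: 1 inside the sphere, decaying outside."""
    if rel_dist <= 1.0:
        return 1.0
    return math.exp(-k * (rel_dist - 1.0) ** 2)


# Hypothetical dimension-reduced features recorded while the tool was healthy.
reference = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center, radius = fit_hypersphere(reference)

healthy = semi_normal_reliability(relative_distance([1.0, 1.0], center, radius))
worn = semi_normal_reliability(relative_distance([4.0, 1.0], center, radius))
print(healthy, worn)
```

A sample inside the sphere maps to a reliability of 1, and the value falls smoothly toward 0 as the sample moves away from the boundary.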


Author(s):  
Roman Shaptala ◽  
Gennadiy Kyselov

In this study, we explore and compare two ways of vector space model creation for Kyiv city petitions. Both models are built on top of word vectors based on the distributional hypothesis, namely Word2Vec and FastText. We train word vectors on the dataset of Kyiv city petitions, preprocess the documents, and apply averaging to create petition vectors. Visualizations of the vector spaces after dimensionality reduction via UMAP are demonstrated in an attempt to show their overall structure. We show that the resulting models can be used to effectively query semantically related petitions as well as search for clusters of related petitions. The advantages and disadvantages of both models are analyzed.
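The averaging step and the similarity query mentioned in this abstract can be sketched in a few lines. The word vectors below are made-up three-dimensional toy values, not Word2Vec or FastText output, and the petition names and tokens are hypothetical; the UMAP visualization step is not shown.

```python
import math

# Hypothetical toy word vectors; a trained model would supply these in practice.
WORD_VECTORS = {
    "road": [0.9, 0.1, 0.0],
    "repair": [0.8, 0.2, 0.1],
    "street": [0.85, 0.15, 0.05],
    "park": [0.1, 0.9, 0.1],
    "trees": [0.0, 0.95, 0.2],
    "plant": [0.1, 0.8, 0.3],
}


def document_vector(tokens, word_vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_tokens, documents, word_vectors):
    """Rank documents by cosine similarity to the query, best match first."""
    query = document_vector(query_tokens, word_vectors)
    scored = []
    for name, tokens in documents.items():
        doc = document_vector(tokens, word_vectors)
        if query is not None and doc is not None:
            scored.append((cosine(query, doc), name))
    return sorted(scored, reverse=True)


petitions = {
    "fix_road": ["repair", "road"],
    "plant_trees": ["plant", "trees", "park"],
}
ranking = most_similar(["street", "repair"], petitions, WORD_VECTORS)
print(ranking)
```

Plain averaging ignores word order and weights every token equally, which keeps the model simple at the cost of some precision.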


Molecules ◽  
2019 ◽  
Vol 24 (9) ◽  
pp. 1698 ◽  
Author(s):  
Samina Kausar ◽  
Andre O. Falcao

The performance of quantitative structure–activity relationship (QSAR) models largely depends on the relevance of the selected molecular representation used as input data matrices. This work presents a thorough comparative analysis of two main categories of molecular representations (vector space and metric space) for fitting robust machine learning models in QSAR problems. For the assessment of these methods, seven different molecular representations, comprising RDKit descriptors, five different fingerprint types (MACCS, PubChem, FP2-based, Atom Pair, and ECFP4), and a graph matching approach (non-contiguous atom matching structure similarity; NAMS), in both vector space and metric space, were subjected to state-of-the-art machine learning methods that included different dimensionality reduction methods (feature selection and linear dimensionality reduction). Five distinct QSAR data sets were used for direct assessment and analysis. Results show that, in general, metric-space and vector-space representations are able to produce equivalent models, but there are significant differences between individual approaches. The NAMS-based similarity approach consistently outperformed most fingerprint representations in model quality, closely followed by Atom Pair fingerprints. To further verify these findings, the metric space-based models were fitted to the same data sets with the closest neighbors removed. These latter results further strengthened the above conclusions. The metric space graph-based approach appeared significantly superior to the other representations, albeit at a significant computational cost.


Author(s):  
Htay Htay Win ◽  
Aye Thida Myint ◽  
Mi Cho Cho

For years, the achievements and discoveries of researchers have been made known through research papers published in appropriate journals or conferences. Often, established researchers, and especially new ones, are caught up in the predicament of choosing an appropriate conference for their work. Every scientific conference and journal is inclined towards a particular field of research, and there is an extensive group of them for any particular field. Choosing an appropriate venue is important because it helps in reaching the right audience and also improves one’s chance of getting a paper published. In this work, we address the problem of recommending appropriate conferences to authors to increase their chances of acceptance. We present three different approaches to this problem, involving the use of the authors' social network and the content of the paper in the settings of dimensionality reduction and topic modelling. In all these approaches, we apply Correspondence Analysis (CA) to obtain appropriate relationships between the entities in question, such as conferences and papers. Our models show promising results when compared with existing methods such as content-based filtering, collaborative filtering and hybrid filtering.


2018 ◽  
Vol 14 (3) ◽  
pp. 184-192
Author(s):  
Radhi Ali ◽  
◽  
Jalal Hussein Bayati ◽  
Suhad Hameed

2013 ◽  
Vol 38 (4) ◽  
pp. 465-470 ◽  
Author(s):  
Jingjie Yan ◽  
Xiaolan Wang ◽  
Weiyi Gu ◽  
LiLi Ma

Abstract Speech emotion recognition is deemed to be a meaningful and intractable issue in a number of domains, including sentiment analysis, computer science, pedagogy, and so on. In this study, we investigate in depth speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach. We use the SPLSR method to perform feature selection and dimensionality reduction on the whole set of acquired speech emotion features. With the SPLSR method, the components of redundant and meaningless speech emotion features are reduced to zero, while the useful and informative speech emotion features are retained and passed on to the subsequent classification step. A number of tests on the Berlin database reveal that the recognition rate of the SPLSR method can reach up to 79.23% and is superior to that of the other dimensionality reduction methods compared.

