Assessing Methods for Evaluating the Number of Components in Non-Negative Matrix Factorization

Mathematics ◽  
2021 ◽  
Vol 9 (22) ◽  
pp. 2840
Author(s):  
José M. Maisog ◽  
Andrew T. DeMarco ◽  
Karthik Devarajan ◽  
Stanley Young ◽  
Paul Fogel ◽  
...  

Non-negative matrix factorization (NMF) is a relatively new method of matrix decomposition which factors an m × n data matrix X into an m × k matrix W and a k × n matrix H, so that X ≈ W × H. Importantly, all values in X, W, and H are constrained to be non-negative. NMF can be used for dimensionality reduction, since the k columns of W can be considered components into which X has been decomposed. The question arises: how does one choose k? In this paper, we first assess methods for estimating k in the context of NMF using synthetic data. Second, we examine the effect of normalization on the accuracy of this estimate using empirical data. In synthetic data with orthogonal underlying components, methods based on PCA and Brunet’s Cophenetic Correlation Coefficient achieved the highest accuracy. When evaluated on a well-known real dataset, normalization had an unpredictable effect on the estimate, and for any given normalization method, the methods for estimating k gave widely varying results. We conclude that when estimating k, it is best not to apply normalization. If the underlying components are known to be orthogonal, then Velicer’s MAP or Minka’s Laplace-PCA method might be best; however, when the orthogonality of the underlying components is unknown, none of the methods seemed preferable.
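The factorization itself can be sketched in a few lines. The following is a minimal Lee-Seung multiplicative-update implementation (a generic NMF solver, not one of the paper's k-estimation methods), run on synthetic data whose true k is known:

```python
import numpy as np

def nmf(X, k, n_iter=1000, seed=0):
    """Factor a non-negative m x n matrix X into W (m x k) and H (k x n)
    so that X ~= W @ H, using Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 1e-4
    H = rng.random((k, n)) + 1e-4
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-12)   # update H with W fixed
        W *= (X @ H.T) / (W @ H @ H.T + 1e-12)   # update W with H fixed
    return W, H

# Synthetic non-negative data whose true rank is k = 2
rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(X, k=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Estimating k then amounts to running such a solver over a range of candidate k values and applying a model-selection criterion (such as the cophenetic correlation across random restarts) to each.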

Mathematics ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 540
Author(s):  
Soodabeh Asadi ◽  
Janez Povh

This article applies the projected gradient (PG) method to the non-negative matrix factorization (NMF) problem in which one or both matrix factors must have orthonormal columns or rows. We penalize the orthonormality constraints and apply the PG method within a block coordinate descent approach: at each step, one matrix factor is fixed while the other is updated by moving along the steepest descent direction of the penalized objective function and projecting onto the set of non-negative matrices. Our method is tested on two sets of synthetic data for various values of the penalty parameters. Its performance is compared with the well-known multiplicative update (MU) method of Ding et al. (2006) and with a modified, globally convergent variant of the MU algorithm proposed by Mirzal (2014). We provide extensive numerical results, coupled with appropriate visualizations, demonstrating that our method is very competitive and usually outperforms the other two methods.
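A minimal sketch of one such block-coordinate projected-gradient pass on the penalized objective, with an illustrative step size and penalty weight rather than the paper's tuned values:

```python
import numpy as np

def penalized_obj(X, W, H, rho):
    """||X - W H||_F^2 + rho ||W^T W - I||_F^2 (orthonormality penalized)."""
    k = W.shape[1]
    return (np.linalg.norm(X - W @ H) ** 2
            + rho * np.linalg.norm(W.T @ W - np.eye(k)) ** 2)

def pg_step(X, W, H, rho=1.0, step=1e-3):
    """One block-coordinate pass: update W with H fixed, then H with W
    fixed, each time moving along the negative gradient of the penalized
    objective and projecting onto the non-negative orthant."""
    k = W.shape[1]
    gW = 2 * (W @ H - X) @ H.T + 4 * rho * W @ (W.T @ W - np.eye(k))
    W = np.maximum(W - step * gW, 0.0)   # projection: clip at zero
    gH = 2 * W.T @ (W @ H - X)
    H = np.maximum(H - step * gH, 0.0)
    return W, H

rng = np.random.default_rng(0)
X = rng.random((15, 2)) @ rng.random((2, 20))
W, H = rng.random((15, 2)), rng.random((2, 20))
obj0 = penalized_obj(X, W, H, rho=1.0)
for _ in range(500):
    W, H = pg_step(X, W, H)
obj1 = penalized_obj(X, W, H, rho=1.0)
```

The penalty weight rho trades reconstruction accuracy against how closely W^T W approaches the identity, which is exactly the tradeoff the paper studies across penalty-parameter values.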


Author(s):  
Xiaolong Gong ◽  
Linpeng Huang ◽  
Fuwei Wang

Real web datasets are often associated with multiple views, such as long and short commentaries, user preferences, and so on. However, with the rapid growth of user-generated text, each view of the dataset has a large feature space, which poses a computational challenge during the matrix decomposition process. In this paper, we propose a novel multi-view clustering algorithm based on non-negative matrix factorization that uses a feature-sampling strategy to reduce the complexity of the iteration process. In particular, our method exploits unsupervised semantic information in the learning process to capture intrinsic similarity through a graph regularization. Moreover, we use the Hilbert-Schmidt Independence Criterion (HSIC) to explore the unsupervised semantic diversity among the multi-view contents of a web item. The overall objective is to minimize a multi-view non-negative matrix factorization loss function combined with an intra-semantic similarity graph regularizer and an inter-semantic diversity term. Comparisons with several state-of-the-art methods demonstrate the effectiveness of the proposed method on the large real-world dataset Doucom and on three other, smaller datasets.
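The empirical HSIC used as the diversity term has a compact closed form; the following small illustration uses linear kernels (the paper's kernel choice is not stated in the abstract):

```python
import numpy as np

def hsic(X, Y):
    """Empirical HSIC between two views (rows = samples), linear kernels:
    trace(K H L H) / (n - 1)^2, where H centers the Gram matrices.
    Larger values indicate stronger statistical dependence."""
    n = X.shape[0]
    K = X @ X.T                            # Gram matrix of view 1
    L = Y @ Y.T                            # Gram matrix of view 2
    C = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    return np.trace(K @ C @ L @ C) / (n - 1) ** 2

rng = np.random.default_rng(0)
A = rng.random((50, 4))
dependent = hsic(A, 2 * A)                 # two fully dependent views
independent = hsic(A, rng.random((50, 4))) # two unrelated views
```

In a multi-view objective, minimizing HSIC between the factor representations of different views pushes them to encode diverse, complementary information.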


2019 ◽  
Vol 13 (S1) ◽  
Author(s):  
Na Yu ◽  
Ying-Lian Gao ◽  
Jin-Xing Liu ◽  
Juan Wang ◽  
Junliang Shang

Abstract Background As one of the most popular data representation methods, non-negative matrix factorization (NMF) has received wide attention in clustering and feature-selection tasks. However, most previously proposed NMF-based methods do not adequately explore the hidden geometric structure of the data, and noise and outliers are inevitably present. Results To alleviate these problems, we present a novel NMF framework named robust hypergraph regularized non-negative matrix factorization (RHNMF). In particular, a hypergraph Laplacian regularization is imposed to capture the geometric information of the original data. Unlike graph Laplacian regularization, which captures only pairwise relationships between sample points, hypergraph regularization captures higher-order relationships among multiple sample points. Moreover, the robustness of RHNMF is enhanced by using the L2,1-norm when estimating the residual, because the L2,1-norm is insensitive to noise and outliers. Conclusions Clustering and common abnormal expression gene (com-abnormal expression gene) selection are conducted to test the validity of the RHNMF model. Extensive experimental results on multi-view datasets reveal that the proposed model outperforms other state-of-the-art methods.
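The robustness claim rests on how the L2,1-norm scores a residual. A small illustration (here summing the Euclidean norms of the residual's columns; the paper may sum over rows instead) shows that an outlier column contributes linearly rather than quadratically:

```python
import numpy as np

def l21_norm(E):
    """L2,1-norm of E: sum of the Euclidean norms of its columns.
    An outlier column contributes its norm once (linearly), whereas the
    squared Frobenius norm penalizes it quadratically."""
    return np.sum(np.linalg.norm(E, axis=0))

E = np.zeros((4, 5))
E[:, 0] = 10.0                       # a single large outlier column
frob_sq = np.linalg.norm(E) ** 2     # squared Frobenius cost: 400
l21 = l21_norm(E)                    # L2,1 cost: sqrt(4 * 100) = 20
```

Because the outlier dominates the squared Frobenius objective far more than the L2,1 objective, a factorization fitted under L2,1 loss is pulled around less by corrupted samples.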


2012 ◽  
Vol 226-228 ◽  
pp. 760-764
Author(s):  
Ning Li ◽  
Hai Ting Chen

Blind source separation (BSS) has been used successfully to extract undetected fault vibration sources from mixed observation signals under the assumption that the unknown vibration sources are mutually independent. However, conventional BSS algorithms cannot handle the situation in which a fault source is partially dependent on, or correlated with, other sources. To address this, a matrix decomposition method, Non-negative Matrix Factorization (NMF), is introduced to separate such partially correlated signals. In this paper, the observed temporal signals are transformed into the frequency domain to satisfy the non-negativity requirement of NMF. A constraint of least correlation between the separated sources is added to the cost function of NMF to enhance its stability, yielding the proposed constrained non-negative matrix factorization (CNMF). Simulation results show that the separation performance of CNMF is superior to that of common BSS algorithms, and experimental results verify the practical performance of CNMF.
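The frequency-domain transformation step is the simplest part to illustrate. With hypothetical sinusoidal sources and an assumed mixing matrix, time-domain mixtures (which contain negative values) become non-negative magnitude spectra suitable for NMF:

```python
import numpy as np

# Two partially correlated synthetic sources (s2 leaks 30% of s1)
t = np.linspace(0, 1, 512, endpoint=False)
s1 = np.sin(2 * np.pi * 10 * t)
s2 = np.sin(2 * np.pi * 25 * t) + 0.3 * s1

A = np.array([[0.8, 0.4],            # assumed mixing matrix, for
              [0.3, 0.9]])           # illustration only
X_time = A @ np.vstack([s1, s2])     # observed mixtures (have negatives)

# Magnitude spectra are non-negative, so NMF can factor them directly.
X_freq = np.abs(np.fft.rfft(X_time, axis=1))
```

X_freq (2 observations x 257 frequency bins) is the non-negative matrix that the CNMF cost function, with its added decorrelation constraint, would then factor into source spectra.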


2011 ◽  
Vol 2 (4) ◽  
pp. 12-34 ◽  
Author(s):  
Andreas Janecek ◽  
Ying Tan

The Non-negative Matrix Factorization (NMF) is a special low-rank approximation which allows for an additive, parts-based, and interpretable representation of the data. This article presents efforts to improve the convergence, approximation quality, and classification accuracy of NMF using five different meta-heuristics based on swarm intelligence. Several properties of the NMF objective function motivate the use of meta-heuristics: the function is non-convex, discontinuous, and may possess many local minima. The proposed optimization strategies are two-fold. On the one hand, a new initialization strategy is presented to initialize the NMF factors prior to the factorization; on the other hand, an iterative update strategy is proposed which improves the accuracy per runtime of the multiplicative update NMF algorithm. The success of the proposed optimization strategies is shown by applying them to synthetic data and to data sets from spam filtering/email classification, and by evaluating them in their application context. Experimental results show that both optimization strategies improve NMF in terms of faster convergence, lower approximation error, and better classification accuracy. In particular, the initialization strategy leads to significant reductions in the runtime-per-accuracy ratio for both the NMF approximation and the classification results achieved with NMF.
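The swarm-intelligence algorithms themselves are beyond an abstract, but the shape of the initialization idea, searching over candidate starting factors and keeping the fittest, can be sketched with a crude population-based stand-in that simply keeps the best of several random initializations:

```python
import numpy as np

def best_of_random_inits(X, k, n_candidates=20, seed=0):
    """Evaluate a population of random (W, H) initializations and keep
    the one with the lowest initial error ||X - W H||_F. A crude
    stand-in for the swarm-based initialization described above, which
    additionally moves the candidates through the search space."""
    rng = np.random.default_rng(seed)
    best, best_err = None, np.inf
    for _ in range(n_candidates):
        W = rng.random((X.shape[0], k))
        H = rng.random((k, X.shape[1]))
        err = np.linalg.norm(X - W @ H)
        if err < best_err:
            best, best_err = (W, H), err
    return best, best_err

X = np.random.default_rng(1).random((10, 12))
(W, H), err = best_of_random_inits(X, k=3)
```

Starting the multiplicative updates from a lower-error point is what reduces the runtime needed to reach a given accuracy; the meta-heuristics refine this by iteratively improving the candidate population rather than sampling it once.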


2017 ◽  
Vol 29 (8) ◽  
pp. 2164-2176 ◽  
Author(s):  
Steven Squires ◽  
Adam Prügel-Bennett ◽  
Mahesan Niranjan

Nonnegative matrix factorization (NMF) is primarily a linear dimensionality reduction technique that factorizes a nonnegative data matrix into two smaller nonnegative matrices: one that represents the basis of the new subspace and a second that holds the coefficients of all the data points in that new space. In principle, the nonnegativity constraint forces the representation to be sparse and parts-based. Instead of extracting holistic features from the data, real parts are extracted that should be significantly easier to interpret and analyze. The size of the new subspace determines how many features will be extracted from the data; an effective choice should minimize the noise while extracting the key features. We propose a mechanism for selecting the subspace size using a minimum description length (MDL) technique. We demonstrate that our technique provides plausible estimates for real data as well as accurately predicting the known size of synthetic data. We provide a MATLAB implementation of our method.
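The core MDL intuition can be shown with a generic two-part code (this is an illustrative scoring rule, not the paper's exact description-length formulation): the total cost is the bits needed to encode the residual plus the bits needed to store the factors, so an overparameterized model pays for its extra parameters:

```python
import numpy as np

def mdl_score(X, W, H, sigma=0.1, bits_per_param=8):
    """Generic two-part MDL score in bits: Gaussian coding cost of the
    residual plus a fixed cost per stored factor entry. Both sigma and
    bits_per_param are illustrative assumptions."""
    resid = X - W @ H
    fit_cost = 0.5 * np.sum(resid ** 2) / sigma ** 2 / np.log(2)
    model_cost = bits_per_param * (W.size + H.size)
    return fit_cost + model_cost

rng = np.random.default_rng(0)
W2 = rng.random((12, 2))
H2 = rng.random((2, 15))
X = W2 @ H2                                   # true subspace size is 2

# An exact but overparameterized factorization at k = 4 (zero padding)
W4 = np.hstack([W2, np.zeros((12, 2))])
H4 = np.vstack([H2, np.zeros((2, 15))])

score2 = mdl_score(X, W2, H2)
score4 = mdl_score(X, W4, H4)                 # same fit, more parameters
```

Scanning k and picking the minimum of such a score is the selection mechanism: too-small k pays in residual bits, too-large k pays in parameter bits.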


2012 ◽  
Vol 6-7 ◽  
pp. 583-588
Author(s):  
Yu Qing Shi ◽  
Shi Qiang Du ◽  
Wei Lan Wang

Concept Factorization (CF) is a matrix decomposition technique for data representation. A modified CF algorithm, Graph Regularized Semi-supervised Concept Factorization (GRSCF), is proposed to address the limitations of CF and of Locally Consistent Concept Factorization (LCCF), which do not consider both the geometric structure and the label information of the data. GRSCF preserves the intrinsic geometry of the data through a regularization term and uses the label information for semi-supervised learning, making nearby samples with the same class label more compact while keeping nearby classes separated. Compared with Non-negative Matrix Factorization (NMF), CNMF, CF, and LCCF, experimental results on the ORL face database and the Coil20 image database show that the proposed method achieves better clustering results.
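The geometric regularizer in such methods is typically built from a nearest-neighbor graph Laplacian; a minimal construction (illustrative, with a hypothetical k-NN affinity rather than the paper's exact graph) looks like this:

```python
import numpy as np

def knn_graph_laplacian(X, k=1):
    """Build a symmetric k-NN affinity graph S and its Laplacian
    L = D - S. Adding tr(V^T L V) to a factorization objective pulls
    the representations of neighboring samples together."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    S = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(d2[i])[1:k + 1]                  # skip self at index 0
        S[i, nn] = 1.0
    S = np.maximum(S, S.T)                               # symmetrize
    return np.diag(S.sum(axis=1)) - S

# Two tight clusters of two points each
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
L = knn_graph_laplacian(X, k=1)
```

The semi-supervised part of GRSCF goes further by also using class labels, but the Laplacian term above is the piece that encodes "nearby samples should have nearby representations."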


2020 ◽  
Vol 2 (3) ◽  
Author(s):  
Shuqin Zhang ◽  
Liu Yang ◽  
Jinwen Yang ◽  
Zhixiang Lin ◽  
Michael K Ng

Abstract Single cell RNA-sequencing (scRNA-seq) technology, a powerful tool for analyzing the entire transcriptome at single cell level, is receiving increasing research attention. The presence of dropouts is an important characteristic of scRNA-seq data that may affect the performance of downstream analyses, such as dimensionality reduction and clustering. Cells sequenced to lower depths tend to have more dropouts than those sequenced to greater depths. In this study, we aimed to develop a dimensionality reduction method to address both dropouts and the non-negativity constraints in scRNA-seq data. The developed method simultaneously performs dimensionality reduction and dropout imputation under the non-negative matrix factorization (NMF) framework. The dropouts were modeled as a non-negative sparse matrix. Summation of the observed data matrix and dropout matrix was approximated by NMF. To ensure the sparsity pattern was maintained, a weighted ℓ1 penalty that took into account the dependency of dropouts on the sequencing depth in each cell was imposed. An efficient algorithm was developed to solve the proposed optimization problem. Experiments using both synthetic data and real data showed that dimensionality reduction via the proposed method afforded more robust clustering results compared with those obtained from the existing methods, and that dropout imputation improved the differential expression analysis.
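One step of the model, updating the non-negative dropout matrix D given the current fit W H, reduces to a weighted soft-threshold of the positive part of the residual. The following is an illustrative closed-form step under a squared loss with a per-entry weighted ℓ1 penalty (the weights standing in for the paper's sequencing-depth dependence), not the paper's full algorithm:

```python
import numpy as np

def update_dropouts(X_obs, WH, weights):
    """Solve min_D ||X_obs + D - W H||_F^2 + 2 <weights, D>, D >= 0,
    entrywise: D = max(WH - X_obs - weights, 0). Dropouts can only add
    counts, so only the positive part of the residual is imputed;
    larger weights (deeper-sequenced cells) yield smaller imputations."""
    resid = WH - X_obs
    return np.maximum(resid - weights, 0.0)

X_obs = np.array([[0.0, 2.0],        # the (0, 0) entry dropped out
                  [1.0, 3.0]])
WH = np.array([[1.5, 2.0],           # current low-rank model fit
               [1.0, 3.0]])
weights = np.full_like(X_obs, 0.5)   # uniform penalty weight (assumed)
D = update_dropouts(X_obs, WH, weights)
```

Here only the zeroed entry receives a non-zero imputed value, which is the behavior the sparsity-preserving penalty is designed to enforce.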


2020 ◽  
Vol 10 (5) ◽  
pp. 655-661
Author(s):  
Vivek Khalane ◽  
Shekhar Suralkar ◽  
Umesh Bhadade

In this paper, we present a matrix decomposition-based approach to image cryptography. The proposed method decomposes the image into components and scrambles those components to form the encryption. We use two different matrix decomposition techniques, Independent Component Analysis (ICA) and Non-negative Matrix Factorization (NMF), to check the efficiency of the proposed encryption method. The technique has unique user-defined parameters (the key), such as the decomposition method, the number of decomposition components, and the order in which the components are arranged; the encryption is designed on the basis of these key parameters. The original image can be reconstructed at the decryption end only if the selected parameters are known to the user. Design examples for both decomposition approaches are presented for illustration. We analyze the complexity and encryption time of the cryptography system. Results show that the proposed scheme is more secure, in that the correlation between the input image and its encrypted version is lower than with state-of-the-art methods, while the computation time of the proposed approach remains comparable.
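The component-order scrambling idea can be sketched as follows. This toy version uses SVD in place of ICA/NMF for brevity and a fixed permutation as the secret key; it illustrates only the principle that reconstruction fails without the correct component order:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((16, 16))                 # stand-in for an image matrix

# Decompose the image into rank-1 components (SVD here for simplicity;
# the paper uses ICA or NMF as the decomposition).
U, s, Vt = np.linalg.svd(img, full_matrices=False)
W, H = U * s, Vt                           # img == W @ H

key = np.roll(np.arange(W.shape[1]), 1)    # secret component order (key)
W_enc = W[:, key]                          # scrambled components

inv = np.argsort(key)                      # decryption: invert the key
img_dec = W_enc[:, inv] @ H                # correct order restores image
img_bad = W_enc @ H                        # wrong order: garbled output
```

Without knowing the decomposition method, the number of components, and the permutation, an attacker holding the scrambled factors cannot reorder them into the original image, which is the basis of the scheme's key space.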

