Enter the matrix: factorization uncovers knowledge from omics

Mapping Intimacies ◽

10.1101/196915 ◽

2017 ◽

Cited By ~ 2

Author(s):

Genevieve L. Stein-O’Brien ◽

Raman Arora ◽

Aedin C. Culhane ◽

Alexander V. Favorov ◽

Lana X. Garmire ◽

...

Keyword(s):

High Throughput ◽

Matrix Factorization ◽

Time Course ◽

High Dimensional Data ◽

Dimensional Structure ◽

High Dimensional ◽

Biological Knowledge ◽

Omics Data ◽

Cellular Interactions ◽

Low Dimensional

Omics data contain signal from the molecular, physical, and kinetic inter- and intra-cellular interactions that control biological systems. Matrix factorization techniques can reveal low-dimensional structure in high-dimensional data that reflects these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in topics ranging from pathway discovery to time course analysis. We review exemplary applications of matrix factorization for systems-level analyses. We discuss appropriate application of these methods and their limitations, and focus on analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with matrix factorization enables discovery from high-throughput data beyond the limits of current biological knowledge, answering questions from high-dimensional data that we have not yet thought to ask.
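
As a rough illustration of what "revealing low-dimensional structure" by matrix factorization can mean, the sketch below factors a small nonnegative matrix V into W times H using Lee-Seung multiplicative updates (nonnegative matrix factorization). This is one common technique chosen for illustration, not necessarily a method the review itself uses. The toy matrix, the "genes by samples" reading of its rows and columns, and all function names are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate nonnegative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Hypothetical toy data: 4 "genes" x 5 "samples" built from 2 hidden patterns.
true_w = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
true_h = [[1.0, 2.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 2.0, 1.0]]
v = matmul(true_w, true_h)

w, h, errors = nmf(v, rank=2)
print("initial error:", errors[0], "final error:", errors[-1])
```

In this reading, each column of W is a pattern over features and each row of H says how strongly that pattern is present in each sample. Real omics matrices would need a numerical library, a principled choice of rank, and several random restarts, none of which this sketch covers.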

Efficient recovery of low-dimensional structure from high-dimensional data

Proceedings of the Seventh IEEE International Conference on Computer Vision ◽

10.1109/iccv.1999.791278 ◽

1999 ◽

Author(s):

S. Mahamud ◽

M. Hebert

Keyword(s):

High Dimensional Data ◽

Dimensional Structure ◽

High Dimensional ◽

Efficient Recovery ◽

Low Dimensional

LOCALLY LINEAR EMBEDDING: A REVIEW

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001411008993 ◽

2011 ◽

Vol 25 (07) ◽

pp. 985-1008 ◽

Cited By ~ 10

Author(s):

JING CHEN ◽

ZHENGMING MA

Keyword(s):

High Dimensional Data ◽

Dimensional Structure ◽

High Dimensional ◽

Locally Linear Embedding ◽

Future Research ◽

Nonlinear Dimensionality Reduction ◽

The Novel ◽

Linear Embedding ◽

Low Dimensional ◽

Locally Linear

The goal of nonlinear dimensionality reduction is to find the meaningful low dimensional structure of a nonlinear manifold from high dimensional data. As a classic method of nonlinear dimensionality reduction, locally linear embedding (LLE) is increasingly attractive to researchers because it can handle large amounts of high dimensional data and finds the embeddings noniteratively. However, several problems in the LLE algorithm remain open, such as its sensitivity to noise, inevitable ill-conditioned eigenproblems, and its inability to deal with novel data. This paper comprehensively reviews the existing extensions and classifies them into categories. Their strategies, advantages, disadvantages, and performance are elaborated. By generalizing the tactics used by extensions at different stages of LLE and evaluating their performance, the paper suggests several promising directions for future research.
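
To make the conditioning issue concrete, here is a minimal sketch of the first stage of standard LLE: computing the weights that best reconstruct one point from its neighbors, subject to the weights summing to one. The local Gram matrix is singular whenever the neighbors outnumber the input dimensions or are collinear with the point, so a small multiple of its trace is added to the diagonal, a common remedy. The function names, the `reg` default, and the toy points are assumptions, and the embedding stage is omitted.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def lle_weights(point, neighbors, reg=1e-3):
    """Weights (summing to 1) that reconstruct `point` from `neighbors`."""
    k = len(neighbors)
    # Neighbors expressed relative to the point being reconstructed.
    diffs = [[nb_i - p_i for nb_i, p_i in zip(nb, point)] for nb in neighbors]
    # Local Gram matrix G[i][j] = diffs[i] . diffs[j]
    gram = [[sum(a * b for a, b in zip(diffs[i], diffs[j])) for j in range(k)]
            for i in range(k)]
    # Regularize: without this, G is singular in the example below.
    trace = sum(gram[i][i] for i in range(k))
    for i in range(k):
        gram[i][i] += reg * trace
    w = solve(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]


# The point is the midpoint of its two neighbors, so G = [[2, -2], [-2, 2]]
# has zero determinant before regularization.
weights = lle_weights([1.0, 1.0], [[0.0, 0.0], [2.0, 2.0]])
print(weights)
```

With the regularizer, the two neighbors receive equal weight, which reconstructs the midpoint exactly. The sketch assumes the neighbors are not all identical to the point, since the trace would then be zero.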

A Review of Recent Advancement in Integrating Omics Data with Literature Mining towards Biomedical Discoveries

International Journal of Genomics ◽

10.1155/2017/6213474 ◽

2017 ◽

Vol 2017 ◽

pp. 1-10 ◽

Cited By ~ 15

Author(s):

Kalpana Raja ◽

Matthew Patrick ◽

Yilin Gao ◽

Desmond Madu ◽

Yuyang Yang ◽

...

Keyword(s):

Text Mining ◽

High Throughput ◽

Literature Mining ◽

Biological Knowledge ◽

Omics Data ◽

Huge Number ◽

Automated Generation ◽

The Past ◽

Independent Information ◽

Recent Advancement

In the past decade, the volume of “omics” data generated by the different high-throughput technologies has expanded exponentially. Managing, storing, and analyzing this big data has been a great challenge for researchers, especially when moving towards the goal of generating testable data-driven hypotheses, which has been the promise of high-throughput experimental techniques. Different bioinformatics approaches have been developed to streamline the downstream analyses by providing independent information to interpret results and support biological inference. Text mining (also known as literature mining) is one of the commonly used approaches for automated generation of biological knowledge from the huge number of published articles. In this review paper, we discuss the recent advancement in approaches that integrate results from omics data and information generated from text mining approaches to uncover novel biomedical information.

Statistical Approaches for High Dimensional Data Derived from High Throughput Assays: A Case Study of Protein Expression Levels in Lung Cancer

Handbook of Statistics in Clinical Oncology ◽

10.1201/9781420027761-34 ◽

2005 ◽

pp. 477-490

Keyword(s):

Lung Cancer ◽

Protein Expression ◽

High Throughput ◽

High Dimensional Data ◽

High Dimensional ◽

Expression Levels ◽

Statistical Approaches

A Novel Density-based Technique for Outlier Detection of High Dimensional Data Utilizing Full Feature Space

Information Technology And Control ◽

10.5755/j01.itc.50.1.25588 ◽

2021 ◽

Vol 50 (1) ◽

pp. 138-152

Author(s):

Mujeeb Ur Rehman ◽

Dost Muhammad Khan

Keyword(s):

Data Mining ◽

Outlier Detection ◽

High Dimensional Data ◽

Research Work ◽

Feature Space ◽

High Dimensional ◽

Data Set ◽

Data Points ◽

Low Dimensional ◽

Intrinsic Feature

Anomaly detection has recently attracted growing attention from data mining researchers as its use has spread across practical domains such as product marketing, fraud detection, medical diagnosis, and fault detection. Outlier detection in high dimensional data poses exceptional challenges because of the curse of dimensionality: distant and adjacent points come to resemble one another. Traditional algorithms were tested on the full feature space, but they concentrate largely on low dimensional data and are therefore ineffective at discovering anomalies in data sets with a large number of dimensions. Finding anomalies in a high dimensional data set becomes very difficult and tedious when all subsets of projections need to be explored. All data points in high dimensional data behave like similar observations because the distances between observations become nearly indistinguishable as the number of dimensions grows. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings inside well established density-based techniques. The technique opens a new line of research towards resolving inherent problems of high dimensional data in which outliers reside within clusters of different densities. A high dimensional dataset from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
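
The distance behavior this abstract appeals to is usually stated in terms of relative contrast: the gap between the farthest and nearest neighbor of a query point, divided by the nearest distance, shrinks as dimensionality grows. The small simulation below is a sketch of that effect on uniform random points in the unit cube. The sampling scheme, sample size, and function name are assumptions and are not taken from the paper.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(d_max - d_min) / d_min for distances from one random query point
    to n_points uniform random points in the unit cube of dimension dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={value:.3f}")
```

When the contrast is small, a distance-based or density-based outlier score has little to separate a true anomaly from an ordinary point, which is the difficulty the abstract describes.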

Unsupervised Text Feature Learning via Deep Variational Auto-encoder

Information Technology And Control ◽

10.5755/j01.itc.49.3.25918 ◽

2020 ◽

Vol 49 (3) ◽

pp. 421-437

Author(s):

Genggeng Liu ◽

Lin Xie ◽

Chi-Hua Chen

Keyword(s):

Dimensionality Reduction ◽

High Dimensional Data ◽

Image Data ◽

Original Data ◽

Feature Representation ◽

High Dimensional ◽

Learning To Learn ◽

Text Feature ◽

Reduction Methods ◽

Low Dimensional

Dimensionality reduction plays an important role in data processing for machine learning and data mining, making the processing of high-dimensional data more efficient. It extracts a low-dimensional feature representation of high-dimensional data, and an effective method not only retains most of the useful information in the original data but also removes useless noise. Dimensionality reduction methods can be applied to all types of data, especially image data. Although supervised learning has achieved good results in dimensionality reduction, its performance depends on the number of labeled training samples. As the volume of information on the internet grows, labeling data requires more resources and becomes more difficult. Using unsupervised learning to learn features of data therefore has important research value. In this paper, an unsupervised multilayered variational auto-encoder model is studied on text data, so that the mapping from high-dimensional features to low-dimensional features is efficient and the low-dimensional features retain as much of the main information as possible. Low-dimensional features obtained by different dimensionality reduction methods are compared with the results of the variational auto-encoder (VAE), and the method shows significant improvement over the comparison methods.

Improved Nonnegative Matrix Factorization Based Feature Selection for High Dimensional Data Analysis

Proceedings of the 2nd International Conference on Computer Science and Electronics Engineering (ICCSEE 2013) ◽

10.2991/iccsee.2013.583 ◽

2013 ◽

Author(s):

Lincheng Jiang ◽

Wentang Tan ◽

Zhenwen Wang ◽

Fengjing Yin ◽

Bin Ge ◽

...

Keyword(s):

Feature Selection ◽

Data Analysis ◽

Matrix Factorization ◽

Nonnegative Matrix Factorization ◽

High Dimensional Data ◽

Nonnegative Matrix ◽

High Dimensional ◽

High Dimensional Data Analysis ◽

Selection For

Machine learning for single cell genomics data analysis

10.1101/2021.02.04.429763 ◽

2021 ◽

Author(s):

Félix Raimundo ◽

Laetitia Papaxanthos ◽

Céline Vallot ◽

Jean-Philippe Vert

Keyword(s):

Machine Learning ◽

Single Cell ◽

Network Inference ◽

Method Development ◽

Biological Knowledge ◽

Omics Data ◽

Gene Regulatory Network Inference ◽

Multimodal Data ◽

Low Dimensional ◽

Type Classification

Single-cell omics technologies produce large quantities of data describing the genomic, transcriptomic or epigenomic profiles of many individual cells in parallel. In order to infer biological knowledge and develop predictive models from these data, machine learning (ML)-based models are increasingly used due to their flexibility, scalability, and impressive success in other fields. In recent years, we have seen a surge of new ML-based method development for low-dimensional representations of single-cell omics data, batch normalization, cell type classification, trajectory inference, gene regulatory network inference or multimodal data integration. To help readers navigate this fast-moving literature, we survey in this review recent advances in ML approaches developed to analyze single-cell omics data, focusing mainly on peer-reviewed publications published in the last two years (2019-2020).

Subspace Clustering with Sparsity and Grouping Effect

Mathematical Problems in Engineering ◽

10.1155/2017/4787039 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 1

Author(s):

Binbin Zhang ◽

Weiwei Wang ◽

Xiangchu Feng

Keyword(s):

State Of The Art ◽

Subspace Clustering ◽

The Self ◽

Dimensional Structure ◽

High Dimensional ◽

Affinity Matrix ◽

Grouping Effect ◽

Art Methods ◽

Data Points ◽

Low Dimensional

Subspace clustering aims to assign each data point drawn from a union of subspaces to the subspace from which it was drawn. It has become a popular method for recovering the low-dimensional structure underlying high-dimensional datasets. The state-of-the-art methods construct an affinity matrix based on the self-representation of the dataset and then use a spectral clustering method to obtain the final clustering result. These methods show that the sparsity and grouping effect of the affinity matrix are important in recovering the low-dimensional structure. In this work, we propose a weighted sparse penalty and a weighted grouping effect penalty in modeling the self-representation of data points. The experimental results on the Extended Yale B, USPS, and Berkeley 500 image segmentation datasets show that the proposed model is more effective than state-of-the-art methods in revealing the subspace structure underlying high-dimensional datasets.

Knowledge-Guided Statistical Learning Methods for Analysis of High-Dimensional -Omics Data in Precision Oncology

JCO Precision Oncology ◽

10.1200/po.19.00018 ◽

2019 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Yize Zhao ◽

Changgee Chang ◽

Qi Long

Keyword(s):

Statistical Learning ◽

Current Knowledge ◽

Complex Diseases ◽

High Dimensional ◽

Future Research ◽

Great Promise ◽

Biological Knowledge ◽

Precision Oncology ◽

Omics Data ◽

Learning Methods

High-dimensional -omics data such as genomic, transcriptomic, and metabolomic data offer great promise in advancing precision medicine. In particular, such data have enabled the investigation of complex diseases such as cancer at an unprecedented scale and in multiple dimensions. However, a number of analytical challenges complicate analysis of high-dimensional -omics data. One is the growing recognition that complex diseases such as cancer are multifactorial and may be attributed to harmful changes on multiple -omics levels and on the pathway level. When individual genes in an important pathway have relatively weak signals, it can be challenging to detect them on their own, but the aggregated signal in the pathway can be considerably stronger and hence easier to detect with the same sample size. To address these challenges, there is a growing body of literature on knowledge-guided statistical learning methods for analysis of high-dimensional -omics data that can incorporate biological knowledge such as functional genomics and functional proteomics. These methods have been shown to improve prediction and classification accuracy and yield biologically more interpretable results compared with statistical learning methods that do not use biological knowledge. In this review, we survey current knowledge-guided statistical learning methods, including both supervised learning and unsupervised learning, and their applications to precision oncology, and we discuss future research directions.
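
The claim that weak per-gene signals become easier to detect once aggregated over a pathway can be illustrated with a back-of-the-envelope power calculation. The sketch below assumes, purely for illustration, that each of m genes yields an independent z-statistic with the same small mean shift, and that they are combined with Stouffer's method (sum of the z-statistics divided by the square root of m). Neither the independence assumption nor the choice of Stouffer's method comes from the review, and the numbers are hypothetical.

```python
from statistics import NormalDist


def one_sided_power(shift, alpha=0.05):
    """Power of a one-sided z-test whose statistic is distributed N(shift, 1)."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)
    return 1 - z.cdf(critical - shift)


def pathway_shift(gene_shift, n_genes):
    """Mean of the Stouffer statistic sum(Z_i) / sqrt(m) when each of the
    m independent Z_i is N(gene_shift, 1); its variance stays 1."""
    return gene_shift * n_genes ** 0.5


# Hypothetical numbers: a weak per-gene shift of 0.5, pathway of 25 genes.
gene_power = one_sided_power(0.5)
pathway_power = one_sided_power(pathway_shift(0.5, 25))
print(f"single gene power: {gene_power:.3f}")
print(f"pathway power:     {pathway_power:.3f}")
```

Under these assumptions a single gene is detected about 13% of the time, while the 25-gene aggregate is detected about 80% of the time at the same sample size. Correlated genes, as in real pathways, would reduce this gain.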
