Enter the matrix: factorization uncovers knowledge from omics

2017
Author(s): Genevieve L. Stein-O’Brien, Raman Arora, Aedin C. Culhane, Alexander V. Favorov, Lana X. Garmire, ...

Omics data contain signal from the molecular, physical, and kinetic inter- and intra-cellular interactions that control biological systems. Matrix factorization techniques can reveal low-dimensional structure in high-dimensional data that reflects these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to time-course analysis. We review exemplary applications of matrix factorization for systems-level analyses. We discuss the appropriate application of these methods and their limitations, and we focus on the analysis of results to facilitate optimal biological interpretation. Inferring biologically relevant features with matrix factorization enables discovery from high-throughput data beyond the limits of current biological knowledge, answering questions from high-dimensional data that we have not yet thought to ask.
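As a minimal sketch of how a factorization exposes low-dimensional structure, the Python code below applies non-negative matrix factorization (NMF), one common member of the family such reviews cover, using the Lee-Seung multiplicative updates. The toy matrix, the rank `k=2`, and the function name `nmf` are illustrative assumptions rather than the authors' pipeline; for real data a tested library implementation (for example scikit-learn's `NMF` class) would be the safer choice.

```python
import numpy as np

def nmf(X, k, n_iter=1000, eps=1e-10, seed=0):
    """Factor a nonnegative matrix X (features x samples) as X ~ W @ H.

    Illustrative sketch (assumed setup): W (features x k) holds k patterns of
    co-varying features; H (k x samples) holds how strongly each sample uses
    each pattern. Lee-Seung multiplicative updates for the Frobenius objective.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative and do not
        # increase ||X - W H||_F.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy "omics" matrix built from two hidden nonnegative patterns.
rng = np.random.default_rng(1)
W_true = rng.random((50, 2))
H_true = rng.random((2, 20))
X = W_true @ H_true

W, H = nmf(X, k=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Each column of `W` can then be read as a candidate "pattern" (for instance, a set of co-regulated genes), and each row of `H` as that pattern's activity across samples; the biological interpretation step the abstract emphasizes happens on these two matrices.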

Author(s): Jing Chen, Zhengming Ma

The goal of nonlinear dimensionality reduction is to find the meaningful low-dimensional structure of a nonlinear manifold from high-dimensional data. As a classic method of nonlinear dimensionality reduction, locally linear embedding (LLE) has attracted growing interest from researchers because it can handle large amounts of high-dimensional data and finds its embeddings non-iteratively. However, several problems with the LLE algorithm remain open, such as its sensitivity to noise, unavoidable ill-conditioned eigenproblems, and its inability to handle novel (out-of-sample) data. This paper comprehensively reviews and discusses the existing extensions, classifying them into different categories, and details their strategies, advantages and disadvantages, and performance. By generalizing the tactics that the various extensions apply at different stages of LLE and evaluating their performance, the paper suggests several promising directions for future research.
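The stages of LLE that these extensions target (neighbor search, local reconstruction weights, and a final eigenproblem) can be sketched in a few lines of NumPy. This is a minimal illustration, assuming Euclidean k-nearest neighbors and a ridge term `reg` on each local Gram matrix, a common fix for the singular local systems that arise when the neighborhood size exceeds the input dimension; it does not implement any of the reviewed extensions, and the function name `lle` is hypothetical.

```python
import numpy as np

def lle(X, n_neighbors=6, n_components=1, reg=1e-3):
    """Minimal locally linear embedding. X: (n_samples, n_features)."""
    n = X.shape[0]
    # Stage 1: k nearest neighbors by Euclidean distance (index 0 is the point itself).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    nbrs = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]

    # Stage 2: weights that best reconstruct each point from its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]              # neighbors centered on x_i
        G = Z @ Z.T                        # local Gram matrix (k x k)
        # Assumed regularization: G is singular when n_neighbors > n_features.
        G = G + reg * np.trace(G) * np.eye(n_neighbors)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()        # weights sum to one

    # Stage 3: bottom eigenvectors of (I - W)^T (I - W); the first one is the
    # constant vector (eigenvalue ~0) and is discarded.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, vecs = np.linalg.eigh(M)
    return vecs[:, 1:n_components + 1]

# Usage: points along a line in R^3 should map to a 1-D coordinate tracking t.
t = np.linspace(0.0, 1.0, 40)
X = np.outer(t, [1.0, 2.0, 2.0]) + np.array([0.5, -1.0, 3.0])
Y = lle(X, n_neighbors=6, n_components=1)
print(abs(np.corrcoef(Y[:, 0], t)[0, 1]))
```

The single `eigh` call at the end is where the ill-conditioning mentioned in the abstract shows up in practice: when several of the smallest eigenvalues of `M` are nearly equal, the returned embedding can change sharply with small perturbations of the data.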


2017
Vol 2017
pp. 1-10
Author(s): Kalpana Raja, Matthew Patrick, Yilin Gao, Desmond Madu, Yuyang Yang, ...

In the past decade, the volume of “omics” data generated by different high-throughput technologies has expanded exponentially. Managing, storing, and analyzing these big data has been a great challenge for researchers, especially when moving toward the goal of generating testable data-driven hypotheses, which has been the promise of high-throughput experimental techniques. Different bioinformatics approaches have been developed to streamline downstream analyses by providing independent information to interpret the data and support biological inference. Text mining (also known as literature mining) is one of the commonly used approaches for automatically generating biological knowledge from the huge number of published articles. In this review paper, we discuss recent advances in approaches that integrate results from omics data with information generated by text mining to uncover novel biomedical information.


2021
Vol 50 (1)
pp. 138-152
Author(s): Mujeeb Ur Rehman, Dost Muhammad Khan

Recently, anomaly detection has attracted growing attention from data mining researchers as its importance has increased steadily in practical domains such as product marketing, fraud detection, medical diagnosis, and fault detection. Outlier detection in high-dimensional data poses exceptional challenges for data mining experts because of the curse of dimensionality and the growing similarity between the distances to distant and nearby points. Traditional algorithms and techniques for outlier detection operate on the full feature space. These conventional methods concentrate largely on low-dimensional data and are therefore ineffective at discovering anomalies in datasets with a large number of dimensions. Digging out the anomalies in a high-dimensional dataset becomes very difficult and laborious when all subsets of projections need to be explored. In high-dimensional data, all points tend to look like similar observations because of an intrinsic property: the relative difference between the distances to the nearest and the farthest points approaches zero as the number of dimensions tends to infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings within well-established density-based techniques. This state-of-the-art technique opens a new line of research toward resolving inherent problems of high-dimensional data in which outliers reside within clusters of different densities. A high-dimensional dataset from the UCI Machine Learning Repository is used to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
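The loss of distance contrast that the abstract attributes to high dimensionality can be seen in a short simulation. This sketch, assuming uniformly distributed random points and a single random query point, measures the relative contrast, (farthest distance minus nearest distance) divided by the nearest distance. It is this ratio, not the raw distance, that shrinks as the dimension grows. The function name `relative_contrast` is hypothetical.

```python
import numpy as np

def relative_contrast(n_points, dim, seed=0):
    """(max_dist - min_dist) / min_dist from one random query to n random points.

    Assumed setup: points and query drawn uniformly from the unit hypercube.
    """
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    d = np.linalg.norm(data - query, axis=1)
    return (d.max() - d.min()) / d.min()

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(500, dim), 3))
```

When the contrast is small, "nearest neighbor" and "local density" carry little information, which is why density-based outlier scores computed on the full feature space degrade in exactly the setting the abstract describes.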


2020
Vol 49 (3)
pp. 421-437
Author(s): Genggeng Liu, Lin Xie, Chi-Hua Chen

Dimensionality reduction plays an important role in data processing for machine learning and data mining, making the processing of high-dimensional data more efficient. Dimensionality reduction extracts a low-dimensional feature representation of high-dimensional data, and an effective dimensionality reduction method can not only retain most of the useful information in the original data but also remove useless noise. Dimensionality reduction methods can be applied to many types of data, especially image data. Although supervised learning methods have achieved good results in dimensionality reduction, their performance depends on the number of labeled training samples. As the amount of information on the internet grows, labeling data requires more resources and becomes more difficult. Therefore, learning data features with unsupervised methods has great research value. In this paper, an unsupervised multilayer variational auto-encoder (VAE) model is studied on text data, so that high-dimensional features are mapped efficiently to low-dimensional features that retain as much of the main information as possible. Low-dimensional features obtained by different dimensionality reduction methods are compared with the dimensionality reduction results of the VAE, and the VAE shows significant improvements over the other methods compared.
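A variational auto-encoder differs from a plain auto-encoder in two ingredients: the latent code is sampled through the reparameterization trick, and a KL-divergence term pulls the approximate posterior toward a standard normal prior. The NumPy sketch below shows only those pieces under the usual diagonal-Gaussian assumption. The encoder and decoder networks are omitted, the unit-variance Gaussian reconstruction term is an assumption (text models often use a categorical or Bernoulli likelihood instead), and all function names are illustrative rather than the paper's multilayer architecture.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO, up to constants, assuming a unit-variance Gaussian decoder."""
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=-1)
    return np.mean(recon + kl_to_standard_normal(mu, log_var))

# Usage with hypothetical encoder outputs for a batch of 3 items, 2 latent dims.
rng = np.random.default_rng(0)
mu = np.zeros((3, 2))
log_var = np.zeros((3, 2))
z = reparameterize(mu, log_var, rng)          # draws from N(0, I), shape (3, 2)
print(kl_to_standard_normal(mu, log_var))     # [0. 0. 0.]
```

After training, the encoder mean `mu` is typically used as the low-dimensional feature vector, which is the representation compared against other dimensionality reduction methods.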


2021
Author(s): Félix Raimundo, Laetitia Papaxanthos, Céline Vallot, Jean-Philippe Vert

Single-cell omics technologies produce large quantities of data describing the genomic, transcriptomic, or epigenomic profiles of many individual cells in parallel. To infer biological knowledge and develop predictive models from these data, machine learning (ML)-based models are increasingly used because of their flexibility, scalability, and impressive success in other fields. In recent years, we have seen a surge of new ML-based methods for low-dimensional representation of single-cell omics data, batch normalization, cell type classification, trajectory inference, gene regulatory network inference, and multimodal data integration. To help readers navigate this fast-moving literature, in this review we survey recent advances in ML approaches developed to analyze single-cell omics data, focusing mainly on peer-reviewed articles published in the last two years (2019-2020).


2017
Vol 2017
pp. 1-9
Author(s): Binbin Zhang, Weiwei Wang, Xiangchu Feng

Subspace clustering aims to group data points drawn from a union of subspaces according to the subspace from which each point was drawn. It has become a popular method for recovering the low-dimensional structure underlying high-dimensional datasets. State-of-the-art methods construct an affinity matrix from a self-representation of the dataset and then apply spectral clustering to obtain the final clustering result. These methods show that the sparsity and the grouping effect of the affinity matrix are important for recovering the low-dimensional structure. In this work, we propose a weighted sparse penalty and a weighted grouping-effect penalty for modeling the self-representation of data points. Experimental results on the Extended Yale B, USPS, and Berkeley 500 image segmentation datasets show that the proposed model is more effective than state-of-the-art methods at revealing the subspace structure underlying high-dimensional datasets.
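The two-stage pipeline described here (self-representation, then spectral clustering on the resulting affinity) can be sketched with the simplest self-representation penalty, a ridge term as in least-squares regression (LSR) subspace clustering, whose closed form is C = (XᵀX + λI)⁻¹XᵀX. This is explicitly not the weighted sparse and grouping-effect penalty proposed in the paper; the value of `lam`, the unnormalized graph Laplacian, the small k-means loop, and the function names are assumptions chosen to keep the example dependency-free.

```python
import numpy as np

def lsr_coefficients(X, lam=0.1):
    """Ridge self-representation X ~ X @ C (LSR-style, assumed penalty).

    X is (n_features, n_samples); each column is a data point.
    Closed-form minimizer of ||X - X C||_F^2 + lam * ||C||_F^2.
    """
    G = X.T @ X
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), G)

def spectral_clustering(W, n_clusters, n_iter=50):
    """Unnormalized spectral clustering followed by a tiny k-means."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :n_clusters]                   # eigenvectors of smallest eigenvalues
    # Farthest-point initialization keeps the example deterministic.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        d = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dist, axis=1)
        centers = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(n_clusters)])
    return labels

# Usage: two orthogonal 1-D subspaces (lines) in R^3, five points on each.
t = np.arange(1.0, 6.0)
X = np.hstack([np.outer([1, 0, 0], t), np.outer([0, 1, 0], t)])   # shape (3, 10)
C = lsr_coefficients(X)
W = (np.abs(C) + np.abs(C.T)) / 2              # symmetric affinity matrix
labels = spectral_clustering(W, n_clusters=2)
print(labels)
```

With independent subspaces and noise-free data, the ridge solution is block diagonal, so the affinity graph splits into one connected component per subspace; the sparsity and grouping-effect penalties discussed in the abstract aim to keep that block structure when the data are noisy.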


2019
pp. 1-9
Author(s): Yize Zhao, Changgee Chang, Qi Long

High-dimensional -omics data such as genomic, transcriptomic, and metabolomic data offer great promise for advancing precision medicine. In particular, such data have enabled the investigation of complex diseases such as cancer at an unprecedented scale and in multiple dimensions. However, a number of analytical challenges complicate the analysis of high-dimensional -omics data. One is the growing recognition that complex diseases such as cancer are multifactorial and may be attributed to harmful changes at multiple -omics levels and at the pathway level. When individual genes in an important pathway have relatively weak signals, they can be hard to detect on their own, but the aggregated signal of the pathway can be considerably stronger and hence easier to detect with the same sample size. To address these challenges, a growing body of literature develops knowledge-guided statistical learning methods for high-dimensional -omics data that incorporate biological knowledge such as functional genomics and functional proteomics. These methods have been shown to improve prediction and classification accuracy and to yield more biologically interpretable results than statistical learning methods that do not use biological knowledge. In this review, we survey current knowledge-guided statistical learning methods, including both supervised and unsupervised learning, and their applications to precision oncology, and we discuss future research directions.
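The point that weak per-gene signals can add up at the pathway level can be made concrete with Stouffer's method, which sums z-scores and divides by the square root of their count. This is a deliberately simple stand-in, assuming independent genes and equal weights, and not one of the knowledge-guided learning methods surveyed in the review; the function names are illustrative.

```python
import numpy as np
from math import erfc, sqrt

def stouffer_z(z_scores):
    """Combine per-gene z-scores into one pathway-level z (Stouffer's method).

    Assumes independent genes with equal weights.
    """
    z = np.asarray(z_scores, dtype=float)
    return z.sum() / np.sqrt(z.size)

def one_sided_p(z):
    """Upper-tail p-value of a standard normal statistic."""
    return 0.5 * erfc(z / sqrt(2))

gene_z = [1.0] * 9                # nine genes, each individually weak
pathway_z = stouffer_z(gene_z)
print(pathway_z)                  # 3.0
print(round(one_sided_p(1.0), 3), round(one_sided_p(pathway_z), 5))  # 0.159 0.00135
```

Each gene alone has a one-sided p-value of about 0.16, far from significant, whereas the combined pathway statistic reaches about 0.0013 with the same data. Correlation between genes in a pathway would inflate this naive combination, which is one reason the reviewed methods model pathway structure explicitly.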

