Transcriptome Deconvolution of Heterogeneous Tumor Samples with Immune Infiltration


10.1101/146795 ◽

2017 ◽

Cited By ~ 1

Author(s):

Zeya Wang ◽

Shaolong Cao ◽

Jeffrey S. Morris ◽

Jaeil Ahn ◽

Rongjie Liu ◽

...

Keyword(s):

Experimental Validation ◽

Expression Profiles ◽

R Package ◽

High Accuracy ◽

High Dimensional ◽

Transcriptomic Data ◽

Conditional Mode ◽

Improve Accuracy ◽

Heterogeneous Tissues ◽

Heterogeneous Tumor

Abstract: Transcriptomic deconvolution in cancer and other heterogeneous tissues remains challenging. Available methods lack the ability to estimate both component-specific proportions and expression profiles for individual samples. We present DeMixT, a new tool to deconvolve high dimensional data from mixtures of more than two components. DeMixT implements an iterated conditional mode algorithm and a novel gene-set-based component merging approach to improve accuracy. In a series of experimental validation studies and application to TCGA data, DeMixT showed high accuracy. Improved deconvolution is an important step towards linking tumor transcriptomic data with clinical outcomes. An R package, scripts and data are available: https://github.com/wwylab/DeMixT/.


Tissue-Specific Enrichment Analysis (TSEA) to decode tissue specificity

10.1101/456293 ◽

2018 ◽

Author(s):

Guangsheng Pei ◽

Yulin Dai ◽

Zhongming Zhao ◽

Peilin Jia

Keyword(s):

Tissue Specificity ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Enrichment Analysis ◽

R Package ◽

Tissue Specific ◽

Genome Wide ◽

Specific Regulation ◽

Association Data ◽

Heterogeneous Tissues

Abstract. Motivation: Diseases and traits are under dynamic tissue-specific regulation. However, heterogeneous tissues are often collected in biomedical studies, which reduces the power to identify disease-associated variants and gene expression profiles. Results: We present TSEA, an R package to conduct Tissue-Specific Enrichment Analysis (TSEA) with two built-in reference panels. Statistical methods are developed and implemented for detecting tissue-specific genes and for enrichment tests of different forms of query data. Our applications using multi-trait genome-wide association data and cancer expression data showed that TSEA could effectively identify the most relevant tissues for each query trait or sample, providing insights for future studies. Availability: https://github.com/bsml320/[email protected] or [email protected]


Robust partial reference-free cell composition estimation from tissue expression

Bioinformatics ◽

10.1093/bioinformatics/btaa184 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3431-3438

Author(s):

Ziyi Li ◽

Zhenxing Guo ◽

Ying Cheng ◽

Peng Jin ◽

Hao Wu

Keyword(s):

Expression Profiles ◽

Gene Expression Profiles ◽

Real Data ◽

Estimation Procedure ◽

Free Cell ◽

Biological Information ◽

Supplementary Information ◽

Tissue Samples ◽

Cell Composition ◽

Heterogeneous Tissues

Abstract. Motivation: In the analysis of high-throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder cell composition quantification using cell-sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy. Results: We introduce TOols for the Analysis of heterogeneouS Tissues TOAST/-P and TOAST/+P, two partial reference-free algorithms for estimating cell composition of heterogeneous tissues based on their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell-type-specific markers and prior knowledge of compositions, in the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimation than existing methods. Availability and implementation: The proposed methods TOAST/-P and TOAST/+P are implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


An R Package for Divergence Analysis of Omics Data

10.1101/720391 ◽

2019 ◽

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

High Throughput Sequencing ◽

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis

Abstract: Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high throughput sequencing data from the Cancer Genome Atlas.
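
The ternary digitization described in this abstract can be illustrated with a minimal Python sketch. It is not the divergence package's implementation: the abstract does not say how the baseline range is estimated, so the sketch assumes the baseline interval of each feature is simply the observed minimum and maximum over the baseline samples. The function names (`baseline_interval`, `ternary_code`, `digitize_profile`) and the toy gene values are hypothetical.

```python
from typing import Dict, List, Tuple


def baseline_interval(baseline: List[float]) -> Tuple[float, float]:
    """Return a (low, high) interval covering the baseline values.

    Assumption: the interval is just the observed min and max of the
    baseline samples; the real package may estimate it differently.
    """
    return (min(baseline), max(baseline))


def ternary_code(value: float, low: float, high: float) -> int:
    """Code a value as -1 (below), 0 (inside) or +1 (above) the interval."""
    if value < low:
        return -1
    if value > high:
        return 1
    return 0


def digitize_profile(
    profile: Dict[str, float], baseline: Dict[str, List[float]]
) -> Dict[str, int]:
    """Digitize every feature of one sample against its own baseline."""
    codes = {}
    for feature, value in profile.items():
        low, high = baseline_interval(baseline[feature])
        codes[feature] = ternary_code(value, low, high)
    return codes


# Hypothetical toy data: three features measured in three baseline samples.
baseline = {
    "geneA": [1.0, 1.2, 0.9],
    "geneB": [5.0, 5.5, 4.8],
    "geneC": [0.1, 0.3, 0.2],
}
sample = {"geneA": 2.0, "geneB": 5.1, "geneC": 0.05}

codes = digitize_profile(sample, baseline)
print(codes)
```

Each sample is coded independently of the other query samples, which is what makes the sample-level analysis mentioned in the abstract possible.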


R/qtlcharts: interactive graphics for quantitative trait locus mapping

10.1101/011437 ◽

2014 ◽

Cited By ~ 1

Author(s):

Karl W Broman

Keyword(s):

Quantitative Trait Locus ◽

Quantitative Trait Locus Mapping ◽

Quantitative Trait ◽

Quantitative Traits ◽

R Package ◽

High Dimensional ◽

Interactive Graphics ◽

Phenotype Data ◽

Trait Locus ◽

Locus Mapping

Every data visualization can be improved with some level of interactivity. Interactive graphics hold particular promise for the exploration of high-dimensional data. R/qtlcharts is an R package to create interactive graphics for experiments to map quantitative trait loci (QTL; genetic loci that influence quantitative traits). R/qtlcharts serves as a companion to the R/qtl package, providing interactive versions of R/qtl's static graphs, as well as additional interactive graphs for the exploration of high-dimensional genotype and phenotype data.


An R package for divergence analysis of omics data

PLoS ONE ◽

10.1371/journal.pone.0249002 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0249002

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis ◽

Data Analysis Methods ◽

Genome Atlas ◽

Omics Data Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from the Cancer Genome Atlas.


Simulation of the Laminar Flow in a PRIMIX Static Mixer

Computational Technologies for Fluid/Thermal/Structural/Chemical Systems With Industrial Applications, Volume 1 ◽

10.1115/pvp2002-1575 ◽

2002 ◽

Author(s):

R. F. Mudde ◽

C. Van Pijpen ◽

R. Beugels

Keyword(s):

Pressure Drop ◽

Particle Tracking ◽

Experimental Validation ◽

Independent Solution ◽

High Accuracy ◽

Static Mixer ◽

Laminar Regime ◽

Time Step ◽

Static Mixers ◽

Direct Use

The PRIMIX helical static mixer has been investigated using numerical simulations. The flow is in the laminar regime (Re = 1 to 1000). The simulations concentrate on the pressure drop and on the use of particle tracking for mixing studies. For the pressure drop, experimental validation is provided. It is found that the pressure drop can be simulated with high accuracy for Re < 350. For higher Re values, no grid-independent solution could be obtained and the experimental results no longer agree with those of the simulations. The simulated pressure drop, scaled to the empty-pipe pressure drop, is well summarized by K = 4.99 + Re/31.4. Using particle tracking, it has been possible to reproduce literature data. However, it has been shown that the obtained results are rather sensitive to the choice of the time step. This limits the direct use of particle tracking techniques for studying the mixing of static mixers in the laminar regime.
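
As a worked illustration of the correlation quoted in this abstract, the short Python sketch below evaluates K = 4.99 + Re/31.4. It reads K as the ratio of the mixer pressure drop to the empty-pipe pressure drop, which is an interpretation of the phrase "scaled to the empty pipe pressure drop". The function names and the pascal units in the example are hypothetical. The abstract reports good agreement with experiment only for Re < 350, so values above that are flagged rather than trusted.

```python
def pressure_drop_ratio(reynolds: float) -> float:
    """Return K = 4.99 + Re/31.4 for the simulated laminar range."""
    if not 1.0 <= reynolds <= 1000.0:
        # The abstract only covers Re = 1 to 1000.
        raise ValueError("correlation was fitted for Re between 1 and 1000")
    return 4.99 + reynolds / 31.4


def is_validated(reynolds: float) -> bool:
    """True where the abstract reports agreement with experiment (Re < 350)."""
    return reynolds < 350.0


def mixer_pressure_drop(reynolds: float, empty_pipe_drop_pa: float) -> float:
    """Scale a known empty-pipe pressure drop (assumed in Pa) by K."""
    return pressure_drop_ratio(reynolds) * empty_pipe_drop_pa


for re_value in (31.4, 314.0, 628.0):
    k = pressure_drop_ratio(re_value)
    note = "validated" if is_validated(re_value) else "outside validated range"
    print(f"Re = {re_value:6.1f}  K = {k:6.2f}  ({note})")
```

At Re = 31.4 the second term equals 1, so K is about 5.99; at Re = 314 it is about 14.99.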


A Tutorial on : R Package for the Linearized Bregman Algorithm in High-Dimensional Statistics

Handbook of Big Data Analytics - Springer Handbooks of Computational Statistics ◽

10.1007/978-3-319-18284-1_17 ◽

2018 ◽

pp. 425-453

Author(s):

Jiechao Xiong ◽

Feng Ruan ◽

Yuan Yao

Keyword(s):

R Package ◽

High Dimensional ◽

High Dimensional Statistics


Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent

BMC Bioinformatics ◽

10.1186/s12859-020-03725-w ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Jan Klosa ◽

Noah Simon ◽

Pål Olof Westermark ◽

Volkmar Liebscher ◽

Dörte Wittenburg

Keyword(s):

Linear Regression ◽

Regression Models ◽

Gradient Descent ◽

Methylation Status ◽

R Package ◽

Group Lasso ◽

High Dimensional ◽

Linear Regression Models ◽

Sparse Group Lasso ◽

Proximal Gradient Descent

Abstract. Background: Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in case of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull, the R package presented here, produces complete regularization paths. Results: Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. But even though the results of seagull and SGL were very similar (R² > 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature. Conclusions: The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.
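
The three ingredients named in this abstract (proximal gradient descent, backtracking line search and warm starts along the penalty grid) can be sketched in plain Python for the ordinary lasso. This is an illustrative sketch, not seagull's code: it covers only the lasso, not the group or sparse-group variants, and it assumes the objective (1/(2n))·||y − Xb||² + λ·||b||₁. All function names and the toy data are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |z| (shrinks z towards zero by t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def residuals(X: Matrix, y: Vector, b: Vector) -> Vector:
    return [sum(x * bj for x, bj in zip(row, b)) - yi for row, yi in zip(X, y)]


def smooth_loss(X: Matrix, y: Vector, b: Vector) -> float:
    """Squared-error part of the objective: ||Xb - y||^2 / (2n)."""
    r = residuals(X, y, b)
    return sum(ri * ri for ri in r) / (2.0 * len(y))


def gradient(X: Matrix, y: Vector, b: Vector) -> Vector:
    r = residuals(X, y, b)
    n = len(y)
    return [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(len(b))]


def lasso_prox_grad(X: Matrix, y: Vector, lam: float, b0: Vector,
                    step: float = 1.0, shrink: float = 0.5,
                    tol: float = 1e-10, max_iter: int = 10000) -> Vector:
    """Proximal gradient descent for the lasso, started from b0."""
    b = list(b0)
    for _ in range(max_iter):
        g = gradient(X, y, b)
        f_b = smooth_loss(X, y, b)
        t = step
        while True:  # backtracking line search on the step size t
            cand = [soft_threshold(bj - t * gj, t * lam) for bj, gj in zip(b, g)]
            d = [cj - bj for cj, bj in zip(cand, b)]
            bound = (f_b + sum(gj * dj for gj, dj in zip(g, d))
                     + sum(dj * dj for dj in d) / (2.0 * t))
            if smooth_loss(X, y, cand) <= bound + 1e-12:
                break
            t *= shrink
        if max(abs(dj) for dj in d) < tol:
            return cand
        b = cand
    return b


def lasso_path(X: Matrix, y: Vector, lambdas: List[float]) -> List[Tuple[float, Vector]]:
    """Solve from the largest penalty down, warm-starting each fit."""
    b = [0.0] * len(X[0])
    path = []
    for lam in sorted(lambdas, reverse=True):
        b = lasso_prox_grad(X, y, lam, b)  # previous solution is the warm start
        path.append((lam, b))
    return path


# Hypothetical toy data with orthogonal columns, so the exact lasso solution
# is the soft-thresholded least-squares fit [2, 1].
X = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y = [3.0, 1.0, 3.0, 1.0]
path = lasso_path(X, y, [0.5, 1.5, 3.0])
for lam, coef in path:
    print(lam, coef)
```

Starting each fit from the previous solution means neighbouring penalty values usually need only a few iterations, which is the motivation for warm starts when computing a full regularization path.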


A Compendium to Ensure Computational Reproducibility in High-Dimensional Classification Tasks

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1078 ◽

2004 ◽

Vol 3 (1) ◽

pp. 1-24 ◽

Cited By ~ 46

Author(s):

Markus Ruschhaupt ◽

Wolfgang Huber ◽

Annemarie Poustka ◽

Ulrich Mansmann

Keyword(s):

Expression Profiles ◽

Gene Expression Profiles ◽

Statistical Processing ◽

Primary Data ◽

High Dimensional ◽

Microarray Gene Expression ◽

Data Set ◽

Dimensional Classification ◽

Biological Interpretation

We demonstrate a concept and implementation of a compendium for the classification of high-dimensional data from microarray gene expression profiles. A compendium is an interactive document that bundles primary data, statistical processing methods, figures, and derived data together with the textual documentation and conclusions. Interactivity allows the reader to modify and extend these components. We address the following questions: how much does the discriminatory power of a classifier depend on the choice of the algorithm that was used to identify it; what alternative classifiers could be used just as well; how robust is the result. The answers to these questions are essential prerequisites for validation and biological interpretation of the classifiers. We show how to use this approach by looking at these questions for a specific breast cancer microarray data set that was first studied by Huang et al. (2003).


Interest-Based Ordering for Fuzzy Morphology on White Blood Cell Image Segmentation

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2012.p0076 ◽

2012 ◽

Vol 16 (1) ◽

pp. 76-86 ◽

Cited By ~ 20

Author(s):

Chastine Fatichah ◽


Martin Leonard Tangel ◽

Muhammad Rahmat Widyanto ◽

Fangyan Dong ◽

...

Keyword(s):

Image Segmentation ◽

Blood Cell ◽

White Blood Cell ◽

Cancer Diagnosis ◽

Fuzzy Clustering ◽

High Accuracy ◽

Low Density ◽

Fuzzy Morphology ◽

Cell Image ◽

Improve Accuracy

An Interest-based Ordering Scheme (IOS) for fuzzy morphology on White-Blood-Cell (WBC) image segmentation is proposed to improve the accuracy of segmentation. The proposed method shows high accuracy in segmenting both high- and low-density nuclei. Further, its running time is low, so it can be used for real applications. To evaluate the performance of the proposed method, 100 WBC images and 10 leukemia images are used, and the experimental results show that the proposed IOS segments a nucleus in WBC images 3.99% more accurately on average than the Lexicographical Ordering Scheme (LOS) does and 5.29% more accurately on average than the combined Fuzzy Clustering and Binary Morphology (FCBM) method does. The proposed method segments a cytoplasm 20.72% more accurately on average than the FCBM method. The WBC image segmentation is a part of WBC classification in an automatic cancer-diagnosis application that is being developed. In addition, the proposed method can be used to segment any images that focus on the important color of an object of interest.
