An R Package for Divergence Analysis of Omics Data

10.1101/720391 ◽

2019 ◽

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

High Throughput Sequencing ◽

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable to many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high throughput sequencing data from the Cancer Genome Atlas.

An R package for divergence analysis of omics data

PLoS ONE ◽

10.1371/journal.pone.0249002 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0249002

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis ◽

Data Analysis Methods ◽

Genome Atlas ◽

Omics Data Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable to many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from the Cancer Genome Atlas.
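The ternary digitization described in this abstract can be illustrated with a minimal univariate sketch. It is written in Python rather than R and does not use the divergence package's API. The function names, the percentile-based baseline range, and the 5th/95th cut-offs are all assumptions made for illustration; the package may estimate its baseline differently.

```python
from statistics import quantiles


def baseline_interval(values, lower_pct=5, upper_pct=95):
    """Return a (low, high) range covering the central bulk of the baseline values.

    Assumption: the baseline range is taken as an empirical percentile range.
    """
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def ternary_code(value, interval):
    """Code a value as -1 (below the baseline range), 0 (inside), or +1 (above)."""
    low, high = interval
    if value < low:
        return -1
    if value > high:
        return 1
    return 0


def digitize_profile(profile, baseline):
    """Digitize one sample.

    profile:  dict mapping feature name -> value for a single sample
    baseline: dict mapping feature name -> list of values in the baseline population
    """
    return {
        feature: ternary_code(value, baseline_interval(baseline[feature]))
        for feature, value in profile.items()
    }
```

Because every sample is coded against the same baseline, the resulting codes can be compared sample by sample, which is the kind of sample-level analysis the abstract refers to. A binary code would follow by mapping both -1 and +1 to 1.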

Scalable Nonparametric Prescreening Method for Searching Higher-Order Genetic Interactions Underlying Quantitative Traits

Genetics ◽

10.1534/genetics.119.302658 ◽

2019 ◽

Vol 213 (4) ◽

pp. 1209-1224 ◽

Cited By ~ 2

Author(s):

Juho A. J. Kontio ◽

Mikko J. Sillanpää

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Real Data ◽

Higher Order ◽

The Cancer Genome Atlas ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Automatic Relevance Determination ◽

Cancer Genome Atlas ◽

Low Dimensional

Gaussian process (GP)-based automatic relevance determination (ARD) is known to be an efficient technique for identifying determinants of gene-by-gene interactions important to trait variation. However, the estimation of GP models is feasible only for low-dimensional datasets (∼200 variables), which severely limits application of the GP-based ARD method for high-throughput sequencing data. In this paper, we provide a nonparametric prescreening method that preserves virtually all the major benefits of the GP-based ARD method and extends its scalability to the typical high-dimensional datasets used in practice. In several simulated test scenarios, the proposed method compared favorably with existing nonparametric dimension reduction/prescreening methods suitable for higher-order interaction searches. As a real-data example, the proposed method was applied to a high-throughput dataset downloaded from The Cancer Genome Atlas (TCGA) with measured expression levels of 16,976 genes (after preprocessing) from patients diagnosed with acute myeloid leukemia.

MicroRNA-Related Prognosis Biomarkers from High-Throughput Sequencing Data of Colorectal Cancer

BioMed Research International ◽

10.1155/2020/7905380 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Xiao-Liang Xing ◽

Zhi-Yong Yao ◽

Ti Zhang ◽

Ning Zhu ◽

Yuan-Wu Liu ◽

...

Keyword(s):

Colorectal Cancer ◽

Differential Expression ◽

High Throughput Sequencing ◽

Survival Rates ◽

Colon Adenocarcinoma ◽

The Cancer Genome Atlas ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Differential Expression Genes ◽

Cancer Genome Atlas

Background. Colorectal cancer (CRC) is the third most common cancer in the world, and most cases are adenocarcinomas. CRC can be classified as colon adenocarcinoma (COAD) or rectum adenocarcinoma (READ) according to the site of the original tumor. Increasing evidence indicates that microRNAs (miRNAs) play an important role in the occurrence of multiple tumors. Methods. In this study, we first downloaded miRNA (COAD, 8 controls vs. 455 tumors; READ, 3 controls vs. 161 tumors) and mRNA (COAD, 41 controls vs. 478 tumors; READ, 10 controls vs. 166 tumors) data from The Cancer Genome Atlas (TCGA) database and then used DESeq2, RegParallel, miRDB, TargetScanHuman 7.2, DAVID 6.8, STRING, and Cytoscape software to identify potential prognostic biomarkers. Results. We identified 175 differentially expressed miRNAs (DEMs) and 3747 differentially expressed genes (DEGs) in COAD and 184 DEMs and 3928 DEGs in READ. We then obtained 21 DEMs (13 in COAD and 8 in READ) associated with survival rates, which correlated with 440 overlapping DEGs (217 in COAD and 223 in READ). Through survival analysis of those overlapping DEGs, we found 11 overlapping DEGs (8 in COAD and 3 in READ) associated with patient survival rates, which were significantly correlated with 9 DEMs (7 in COAD and 2 in READ). Conclusion. We found several candidate prognostic biomarkers that have previously been identified in various cancers, as well as several new prognostic biomarkers of COAD and READ. This analysis, which is based on theoretical knowledge and clinical outcomes, needs further confirmation by additional research.

hypeR: An R Package for Geneset Enrichment Workflows

10.1101/656637 ◽

2019 ◽

Cited By ~ 1

Author(s):

Anthony Federico ◽

Stefano Monti

Keyword(s):

High Throughput Sequencing ◽

R Package ◽

Supplementary Information ◽

Sequencing Data ◽

Wide Audience ◽

Popular Method ◽

Link Type ◽

High Throughput Sequencing Data ◽

One Stop ◽

Recent Version

Summary. Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation. The most recent version of the package is available at https://github.com/montilab/hypeR. Supplementary information. Comprehensive documentation and tutorials are available at https://montilab.github.io/hypeR-docs.

hypeR: an R package for geneset enrichment workflows

Bioinformatics ◽

10.1093/bioinformatics/btz700 ◽

2019 ◽

Cited By ~ 3

Author(s):

Anthony Federico ◽

Stefano Monti

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

R Package ◽

Use Cases ◽

Sequencing Data ◽

Wide Audience ◽

Popular Method ◽

High Throughput Sequencing Data ◽

One Stop ◽

Recent Version

Summary. Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation. The most recent version of the package is available at https://github.com/montilab/hypeR.
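One widely used form of geneset enrichment is an over-representation test based on the hypergeometric distribution. The sketch below shows that test in Python and is not hypeR's API. The function name and arguments are hypothetical, and it assumes that both the signature and the geneset are drawn from a common background of `background_size` genes.

```python
from math import comb


def hypergeom_enrichment_pvalue(signature, geneset, background_size):
    """Upper-tail hypergeometric p-value for the overlap of a signature and a geneset.

    Gives the probability of an overlap at least as large as the observed one
    when len(signature) genes are drawn at random from the background.
    Assumption: both gene collections are subsets of the same background.
    """
    signature, geneset = set(signature), set(geneset)
    k = len(signature & geneset)  # observed overlap
    n = len(signature)            # number of genes drawn
    K = len(geneset)              # geneset members in the background
    N = background_size           # total genes in the background
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return tail / comb(N, n)
```

When many signatures are tested against many genesets, as the abstract describes, the resulting p-values would normally be adjusted for multiple testing before interpretation.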

HTSSIP: An R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP) experiments

PLoS ONE ◽

10.1371/journal.pone.0189616 ◽

2018 ◽

Vol 13 (1) ◽

pp. e0189616 ◽

Cited By ~ 13

Author(s):

Nicholas D. Youngblut ◽

Samuel E. Barnett ◽

Daniel H. Buckley

Keyword(s):

Nucleic Acid ◽

Stable Isotope ◽

High Throughput ◽

High Throughput Sequencing ◽

R Package ◽

Stable Isotope Probing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Acid Stable

seqCAT: a Bioconductor R-package for variant analysis of high throughput sequencing data

F1000Research ◽

10.12688/f1000research.16083.1 ◽

2018 ◽

Vol 7 ◽

pp. 1466 ◽

Cited By ~ 2

Author(s):

Erik Fasterius ◽

Cristina Al-Khalili Szigyarto

Keyword(s):

Genetic Variation ◽

Liver Cancer ◽

High Throughput ◽

High Throughput Sequencing ◽

R Package ◽

Ease Of Use ◽

Sequencing Data ◽

Dna And Rna ◽

High Throughput Sequencing Data ◽

Wide Range

High throughput sequencing technologies are flourishing in the biological sciences, enabling unprecedented insights into e.g. genetic variation, but require extensive bioinformatic expertise for the analysis. There is thus a need for simple yet effective software that can analyse both existing and novel data, providing interpretable biological results with little bioinformatic prowess. We present seqCAT, a Bioconductor toolkit for analysing genetic variation in high throughput sequencing data. It is a highly accessible, easy-to-use and well-documented R-package that enables a wide range of researchers to analyse their own and publicly available data, providing biologically relevant conclusions and publication-ready figures. SeqCAT can provide information regarding genetic similarities between an arbitrary number of samples, validate specific variants as well as define functionally similar variant groups for further downstream analyses. Its ease of use, installation, complete data-to-conclusions functionality and the inherent flexibility of the R programming language make seqCAT a powerful tool for variant analyses compared to already existing solutions. A publicly available dataset of liver cancer-derived organoids is analysed herein using the seqCAT package, demonstrating that the organoids are genetically stable. A previously known liver cancer-related mutation is additionally shown to be present in a sample though it was not listed in the original publication. Differences between DNA- and RNA-based variant calls in this dataset are also analysed, revealing a high median concordance of 97.5%.
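The idea of concordance between two sets of variant calls, such as DNA- and RNA-based calls for the same sample, can be sketched in Python as the share of jointly called sites with identical genotypes. This is an illustration only and not seqCAT's implementation. The function name, the dictionary layout, and the exact-match rule are assumptions, and seqCAT may define similarity differently.

```python
def concordance(profile_a, profile_b):
    """Percentage of sites called in both profiles whose genotypes agree.

    Each profile is a dict mapping (chromosome, position) -> genotype string.
    Returns None when the two profiles share no called sites.
    """
    shared = profile_a.keys() & profile_b.keys()
    if not shared:
        return None
    matches = sum(1 for site in shared if profile_a[site] == profile_b[site])
    return 100.0 * matches / len(shared)


# Hypothetical toy profiles: two sites are shared, and one of them agrees.
dna_calls = {("chr1", 100): "A/G", ("chr1", 200): "C/C", ("chr2", 50): "T/T"}
rna_calls = {("chr1", 100): "A/G", ("chr1", 200): "C/T", ("chr3", 5): "G/G"}
shared_agreement = concordance(dna_calls, rna_calls)
```

Restricting the comparison to sites present in both profiles keeps a site that is simply uncovered in one assay from counting as a disagreement.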

Using association signal annotations to boost similarity network fusion

Bioinformatics ◽

10.1093/bioinformatics/btz124 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3718-3726 ◽

Cited By ~ 5

Author(s):

Peifeng Ruan ◽

Ya Wang ◽

Ronglai Shen ◽

Shuang Wang

Keyword(s):

Similarity Measures ◽

R Package ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Omics Data ◽

Similarity Network ◽

Signal Features ◽

Cancer Genome Atlas ◽

Cancer Types ◽

Similarity Networks

Motivation. Recent technology developments have made it possible to generate various kinds of omics data, which provides opportunities to better solve problems such as disease subtyping or disease mapping using more comprehensive omics data jointly. Among many developed data-integration methods, the similarity network fusion (SNF) method has shown great potential to identify new disease subtypes through separating similar subjects using multi-omics data. SNF effectively fuses similarity networks with pairwise patient similarity measures from different types of omics data into one fused network using both shared and complementary information across multiple types of omics data. Results. In this article, we proposed an association-signal-annotation boosted similarity network fusion (ab-SNF) method, adding feature-level association signal annotations as weights aiming to up-weight signal features and down-weight noise features when constructing subject similarity networks to boost the performance in disease subtyping. In various simulation studies, the proposed ab-SNF outperforms the original SNF approach without weights. Most importantly, the improvement in the subtyping performance due to association-signal-annotation weights is amplified in the integration process. Applications to somatic mutation data, DNA methylation data and gene expression data of three cancer types from The Cancer Genome Atlas project suggest that the proposed ab-SNF method consistently identifies new subtypes in each cancer that more accurately predict patient survival and are more biologically meaningful. Availability and implementation. The R package abSNF is freely available for downloading from https://github.com/pfruan/abSNF. Supplementary information. Supplementary data are available at Bioinformatics online.
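The effect of feature-level weights on a subject similarity network can be sketched in Python with a weighted Euclidean distance passed through a Gaussian kernel. This is a simplified illustration and not the abSNF package's code. The function names and the fixed bandwidth `sigma` are assumptions; SNF itself uses a locally scaled kernel and then fuses the networks from several data types, and that fusion step is not shown here.

```python
from math import exp, sqrt


def weighted_distance(x, y, weights):
    """Euclidean distance in which each feature's squared difference is scaled by its weight."""
    return sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def similarity_matrix(samples, weights, sigma=1.0):
    """Pairwise subject similarities from weighted distances.

    samples: list of equal-length feature vectors, one per subject
    weights: one non-negative weight per feature (larger = stronger association signal)
    sigma:   fixed kernel bandwidth (an assumption made for this sketch)
    """
    n = len(samples)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = weighted_distance(samples[i], samples[j], weights)
            sim[i][j] = exp(-(d ** 2) / (2 * sigma ** 2))
    return sim
```

With a weight of zero, a feature no longer contributes to the distance at all. Two subjects that differ only in such down-weighted features therefore end up with the maximum similarity of 1.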

HTSSIP: an R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP) experiments

10.1101/166009 ◽

2017 ◽

Author(s):

Nicholas D. Youngblut ◽

Samuel E. Barnett ◽

Daniel H. Buckley

Keyword(s):

High Resolution ◽

Stable Isotope ◽

High Throughput ◽

High Throughput Sequencing ◽

R Package ◽

Stable Isotope Probing ◽

Sequencing Data ◽

Metabolic Processes ◽

Link Type ◽

High Throughput Sequencing Data

Combining high throughput sequencing with stable isotope probing (HTS-SIP) is a powerful method for mapping in situ metabolic processes to thousands of microbial taxa. However, accurately mapping metabolic processes to taxa is complex and challenging. Multiple HTS-SIP data analysis methods have been developed, including high-resolution stable isotope probing (HR-SIP), multi-window high-resolution stable isotope probing (MW-HR-SIP), quantitative stable isotope probing (q-SIP), and ΔBD. Currently, the computational tools to perform these analyses are either not publicly available or lack documentation, testing, and developer support. To address this shortfall, we have developed the HTSSIP R package, a toolset for conducting HTS-SIP analyses in a straightforward and easily reproducible manner. The HTSSIP package, along with full documentation and examples, is available from CRAN at https://cran.r-project.org/web/packages/HTSSIP/index.html and GitHub at https://github.com/nick-youngblut/HTSSIP.

seqCAT: a Bioconductor R-package for variant analysis of high throughput sequencing data

F1000Research ◽

10.12688/f1000research.16083.2 ◽

2019 ◽

Vol 7 ◽

pp. 1466 ◽

Cited By ~ 1

Author(s):

Erik Fasterius ◽

Cristina Al-Khalili Szigyarto

Keyword(s):

Genetic Variation ◽

Liver Cancer ◽

High Throughput ◽

High Throughput Sequencing ◽

R Package ◽

Ease Of Use ◽

Sequencing Data ◽

Dna And Rna ◽

High Throughput Sequencing Data ◽

Wide Range

High throughput sequencing technologies are flourishing in the biological sciences, enabling unprecedented insights into e.g. genetic variation, but require extensive bioinformatic expertise for the analysis. There is thus a need for simple yet effective software that can analyse both existing and novel data, providing interpretable biological results with little bioinformatic prowess. We present seqCAT, a Bioconductor toolkit for analysing genetic variation in high throughput sequencing data. It is a highly accessible, easy-to-use and well-documented R-package that enables a wide range of researchers to analyse their own and publicly available data, providing biologically relevant conclusions and publication-ready figures. SeqCAT can provide information regarding genetic similarities between an arbitrary number of samples, validate specific variants as well as define functionally similar variant groups for further downstream analyses. Its ease of use, installation, complete data-to-conclusions functionality and the inherent flexibility of the R programming language make seqCAT a powerful tool for variant analyses compared to already existing solutions. A publicly available dataset of liver cancer-derived organoids is analysed herein using the seqCAT package, corroborating the original authors' conclusions that the organoids are genetically stable. A previously known liver cancer-related mutation is additionally shown to be present in a sample though it was not listed in the original publication. Differences between DNA- and RNA-based variant calls in this dataset are also analysed, revealing a high median concordance of 97.5%. SeqCAT is open source software under an MIT licence, available at https://bioconductor.org/packages/release/bioc/html/seqCAT.html.
