scholarly journals An R Package for Divergence Analysis of Omics Data

2019 ◽  
Author(s):  
Wikum Dinalankara ◽  
Qian Ke ◽  
Donald Geman ◽  
Luigi Marchionni

AbstractGiven the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high throughput sequencing data from the Cancer Genome Atlas.

PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0249002
Author(s):  
Wikum Dinalankara ◽  
Qian Ke ◽  
Donald Geman ◽  
Luigi Marchionni

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from the Cancer Genome Atlas.


Genetics ◽  
2019 ◽  
Vol 213 (4) ◽  
pp. 1209-1224 ◽  
Author(s):  
Juho A. J. Kontio ◽  
Mikko J. Sillanpää

Gaussian process (GP)-based automatic relevance determination (ARD) is known to be an efficient technique for identifying determinants of gene-by-gene interactions important to trait variation. However, the estimation of GP models is feasible only for low-dimensional datasets (∼200 variables), which severely limits application of the GP-based ARD method for high-throughput sequencing data. In this paper, we provide a nonparametric prescreening method that preserves virtually all the major benefits of the GP-based ARD method and extends its scalability to the typical high-dimensional datasets used in practice. In several simulated test scenarios, the proposed method compared favorably with existing nonparametric dimension reduction/prescreening methods suitable for higher-order interaction searches. As a real-data example, the proposed method was applied to a high-throughput dataset downloaded from the cancer genome atlas (TCGA) with measured expression levels of 16,976 genes (after preprocessing) from patients diagnosed with acute myeloid leukemia.


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Xiao-Liang Xing ◽  
Zhi-Yong Yao ◽  
Ti Zhang ◽  
Ning Zhu ◽  
Yuan-Wu Liu ◽  
...  

Background. Colorectal cancer (CRC) is the third most common cancer in the world, and most of them are adenocarcinomas. CRC could be classified as colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) according to the original tumorigenesis position. Increasing evidences indicated that microRNAs (miRNAs) play an important role in the occurrence of multiple tumors. Methods. In this study, we firstly downloaded miRNA (COAD, 8 controls vs. 455 tumors; READ, 3 controls vs. 161 tumors) and mRNA (COAD, 41 controls vs. 478 tumors; READ, 10 controls vs. 166 tumors) data from The Cancer Genome Atlas (TCGA) database and then used DESeq2, RegParallel, miRDB, TargetScanHuman 7.2, DAVID 6.8, STRING, and Cytoscape software to identify the potential prognosis biomarkers. Results. We identified 175 differential expression miRNAs (DEMs) and 3747 differential expression genes (DEGs) in COAD and 184 DEMs and 3928 DEGs in READ. And then, we obtained 21 (13 in COAD and 8 in READ) DEMs associated with the survival rates, which correlated with 440 (217 in COAD and 223 in READ) overlapping DEGs. Through survival analysis for those overlapping DEGs, we found 11 (8 in COAD and 3 in READ) overlapping DGEs associated with survival rates of patients, which were correlated with 9 (7 in COAD and 2 in READ) DEMs significantly. Conclusion. In this study, we found several candidate prognostic biomarkers which have been identified in various cancers and also found several new prognosis biomarkers of COAD and READ. In conclusion, this analysis based on theoretical knowledge and clinical outcomes we have done needs further confirmation by more researches.


2019 ◽  
Author(s):  
Anthony Federico ◽  
Stefano Monti

ABSTRACTSummaryGeneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases.Availability and implementationThe most recent version of the package is available at https://github.com/montilab/hypeR.Supplementary informationComprehensive documentation and tutorials, are available at https://montilab.github.io/hypeR-docs.


Author(s):  
Anthony Federico ◽  
Stefano Monti

Abstract Summary Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation The most recent version of the package is available at https://github.com/montilab/hypeR. Contact [email protected] or [email protected]


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1466 ◽  
Author(s):  
Erik Fasterius ◽  
Cristina Al-Khalili Szigyarto

High throughput sequencing technologies are flourishing in the biological sciences, enabling unprecedented insights into e.g. genetic variation, but require extensive bioinformatic expertise for the analysis. There is thus a need for simple yet effective software that can analyse both existing and novel data, providing interpretable biological results with little bioinformatic prowess. We present seqCAT, a Bioconductor toolkit for analysing genetic variation in high throughput sequencing data. It is a highly accessible, easy-to-use and well-documented R-package that enables a wide range of researchers to analyse their own and publicly available data, providing biologically relevant conclusions and publication-ready figures. SeqCAT can provide information regarding genetic similarities between an arbitrary number of samples, validate specific variants as well as define functionally similar variant groups for further downstream analyses. Its ease of use, installation, complete data-to-conclusions functionality and the inherent flexibility of the R programming language make seqCAT a powerful tool for variant analyses compared to already existing solutions. A publicly available dataset of liver cancer-derived organoids is analysed herein using the seqCAT package, demonstrating that the organoids are genetically stable. A previously known liver cancer-related mutation is additionally shown to be present in a sample though it was not listed in the original publication. Differences between DNA- and RNA-based variant calls in this dataset are also analysed revealing a high median concordance of 97.5%.


2019 ◽  
Vol 35 (19) ◽  
pp. 3718-3726 ◽  
Author(s):  
Peifeng Ruan ◽  
Ya Wang ◽  
Ronglai Shen ◽  
Shuang Wang

Abstract Motivation Recent technology developments have made it possible to generate various kinds of omics data, which provides opportunities to better solve problems such as disease subtyping or disease mapping using more comprehensive omics data jointly. Among many developed data-integration methods, the similarity network fusion (SNF) method has shown a great potential to identify new disease subtypes through separating similar subjects using multi-omics data. SNF effectively fuses similarity networks with pairwise patient similarity measures from different types of omics data into one fused network using both shared and complementary information across multiple types of omics data. Results In this article, we proposed an association-signal-annotation boosted similarity network fusion (ab-SNF) method, adding feature-level association signal annotations as weights aiming to up-weight signal features and down-weight noise features when constructing subject similarity networks to boost the performance in disease subtyping. In various simulation studies, the proposed ab-SNF outperforms the original SNF approach without weights. Most importantly, the improvement in the subtyping performance due to association-signal-annotation weights is amplified in the integration process. Applications to somatic mutation data, DNA methylation data and gene expression data of three cancer types from The Cancer Genome Atlas project suggest that the proposed ab-SNF method consistently identifies new subtypes in each cancer that more accurately predict patient survival and are more biologically meaningful. Availability and implementation The R package abSNF is freely available for downloading from https://github.com/pfruan/abSNF. Supplementary information Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Nicholas D. Youngblut ◽  
Samuel E. Barnett ◽  
Daniel H. Buckley

AbstractCombining high throughput sequencing with stable isotope probing (HTS-SIP) is a powerful method for mapping in situ metabolic processes to thousands of microbial taxa. However, accurately mapping metabolic processes to taxa is complex and challenging. Multiple HTS-SIP data analysis methods have been developed, including high-resolution stable isotope probing (HR-SIP), multi-window high-resolution stable isotope probing (MW-HR-SIP), quantitative stable isotope probing (q-SIP), and ΔBD. Currently, the computational tools to perform these analyses are either not publicly available or lack documentation, testing, and developer support. To address this shortfall, we have developed the HTSSIP R package, a toolset for conducting HTS-SIP analyses in a straightforward and easily reproducible manner. The HTSSIP package, along with full documentation and examples, is available from CRAN at https://cran.r-project.org/web/packages/HTSSIP/index.html and Github at https://github.com/nick-youngblut/HTSSIP.


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1466 ◽  
Author(s):  
Erik Fasterius ◽  
Cristina Al-Khalili Szigyarto

High throughput sequencing technologies are flourishing in the biological sciences, enabling unprecedented insights into e.g. genetic variation, but require extensive bioinformatic expertise for the analysis. There is thus a need for simple yet effective software that can analyse both existing and novel data, providing interpretable biological results with little bioinformatic prowess. We present seqCAT, a Bioconductor toolkit for analysing genetic variation in high throughput sequencing data. It is a highly accessible, easy-to-use and well-documented R-package that enables a wide range of researchers to analyse their own and publicly available data, providing biologically relevant conclusions and publication-ready figures. SeqCAT can provide information regarding genetic similarities between an arbitrary number of samples, validate specific variants as well as define functionally similar variant groups for further downstream analyses. Its ease of use, installation, complete data-to-conclusions functionality and the inherent flexibility of the R programming language make seqCAT a powerful tool for variant analyses compared to already existing solutions. A publicly available dataset of liver cancer-derived organoids is analysed herein using the seqCAT package, corroborating the original authors' conclusions that the organoids are genetically stable. A previously known liver cancer-related mutation is additionally shown to be present in a sample though it was not listed in the original publication. Differences between DNA- and RNA-based variant calls in this dataset are also analysed revealing a high median concordance of 97.5%. SeqCAT is an open source software under a MIT licence available at https://bioconductor.org/packages/release/bioc/html/seqCAT.html.


Sign in / Sign up

Export Citation Format

Share Document