MOSim: Multi-Omics Simulation in R

10.1101/421834 ◽

2018 ◽

Cited By ~ 5

Author(s):

Carlos Martínez-Mira ◽

Ana Conesa ◽

Sonia Tarazona

Keyword(s):

Time Series Data ◽

Simulated Data ◽

R Package ◽

Experimental Designs ◽

Supplementary Information ◽

Series Data ◽

Data Sets ◽

Expression Data ◽

Supplementary Material ◽

Omic Data

Abstract. Motivation: As new integrative methodologies are being developed to analyse multi-omic experiments, validation strategies are required for benchmarking. In silico approaches such as simulated data are popular as they are fast and cheap. However, few tools are available for creating synthetic multi-omic data sets. Results: MOSim is a new R package for easily simulating multi-omic experiments consisting of gene expression data, other regulatory omics and the regulatory relationships between them. MOSim supports different experimental designs including time series data. Availability: The package is freely available under the GPL-3 license from the Bitbucket repository (https://bitbucket.org/ConesaLab/mosim/). Contact: [email protected]. Supplementary information: Supplementary material is available at bioRxiv online.

Cell cycle time series gene expression data encoded as cyclic attractors in Hopfield systems

10.1101/170027 ◽

2017 ◽

Author(s):

Anthony Szedlak ◽

Spencer Sims ◽

Nicholas Smith ◽

Giovanni Paternostro ◽

Carlo Piermarocchi

Keyword(s):

Neural Network ◽

Gene Expression ◽

Cell Cycle ◽

Time Series ◽

Time Series Data ◽

Series Data ◽

Data Sets ◽

Expression Data ◽

Time Series Gene Expression ◽

Human Cervical Cancer

Abstract. Modern time series gene expression and other omics data sets have enabled unprecedented resolution of the dynamics of cellular processes such as the cell cycle and the response to pharmaceutical compounds. In anticipation of the proliferation of time series data sets in the near future, we use the Hopfield model, a recurrent neural network based on spin glasses, to model the dynamics of the cell cycle in HeLa (human cervical cancer) and S. cerevisiae cells. We study some of the rich dynamical properties of these cyclic Hopfield systems, including the ability of populations of simulated cells to recreate experimental expression data and the effects of noise on the dynamics. Next, we use a genetic algorithm to identify sets of genes which, when selectively inhibited by local external fields representing gene silencing compounds such as kinase inhibitors, disrupt the encoded cell cycle. We find, for example, that inhibiting the set of four kinases BRD4, MAPK1, NEK7, and YES1 in HeLa cells causes simulated cells to accumulate in the M phase. Finally, we suggest possible improvements and extensions to our model.

Author Summary. The cell cycle, the process in which a parent cell replicates its DNA and divides into two daughter cells, is upregulated in many forms of cancer. Identifying gene inhibition targets to regulate the cell cycle is important to the development of effective therapies. Although modern high throughput techniques offer unprecedented resolution of the molecular details of biological processes like the cell cycle, analyzing the vast quantities of the resulting experimental data and extracting actionable information remains a formidable task. Here, we create a dynamical model of the cell cycle using the Hopfield model (a type of recurrent neural network) and gene expression data from human cervical cancer cells and yeast cells. We find that the model recreates the oscillations observed in experimental data. Tuning the level of noise (representing the inherent randomness in gene expression and regulation) to the “edge of chaos” is crucial for the proper behavior of the system. We then use this model to identify potential gene targets for disrupting the cell cycle. This method could be applied to other time series data sets and used to predict the effects of untested targeted perturbations.
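
The idea of storing a cycle of expression states as an attractor can be illustrated with a small sketch. It uses the textbook asymmetric Hebbian rule for sequence storage, in which each stored pattern is coupled to its successor. This is an assumption made for illustration; the abstract does not give the authors' exact coupling rule, and the sketch is deterministic, so it omits the noise and external fields they study. The patterns are rows of a Hadamard matrix, chosen only because they are mutually orthogonal, and all function names are hypothetical.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def cyclic_couplings(patterns):
    """Asymmetric Hebbian couplings that map each pattern onto the next one."""
    n = len(patterns[0])
    p = len(patterns)
    w = [[0.0] * n for _ in range(n)]
    for mu in range(p):
        cur = patterns[mu]
        nxt = patterns[(mu + 1) % p]  # wrap around to close the cycle
        for i in range(n):
            for j in range(n):
                w[i][j] += nxt[i] * cur[j] / n
    return w


def step(w, state):
    """One synchronous, noise-free update: s_i <- sign(sum_j w_ij * s_j)."""
    return [1 if sum(wij * sj for wij, sj in zip(row, state)) >= 0 else -1
            for row in w]


# Four orthogonal +/-1 patterns stand in for four "cell cycle phases".
patterns = hadamard(8)[1:5]
weights = cyclic_couplings(patterns)

state = patterns[0]
trajectory = [state]
for _ in range(len(patterns)):
    state = step(weights, state)
    trajectory.append(state)
```

Because the patterns are orthogonal, each update moves the state exactly one step along the stored cycle, so the trajectory returns to its starting pattern after four updates. With real, correlated expression patterns the transitions would only be approximate.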

Single cell network analysis with a mixture of Nested Effects Models

10.1101/258202 ◽

2018 ◽

Author(s):

Martin Pirkl ◽

Niko Beerenwinkel

Keyword(s):

Single Cell ◽

New Technologies ◽

Single Cells ◽

R Package ◽

Supplementary Information ◽

Data Sets ◽

Cell Network ◽

A Cell ◽

Supplementary Material ◽

Cell Data

Abstract. Motivation: New technologies allow for the elaborate measurement of different traits of single cells. These data promise to elucidate intra-cellular networks in unprecedented detail and further help to improve treatment of diseases like cancer. However, cell populations can be very heterogeneous. Results: We developed a mixture of Nested Effects Models (M&NEM) for single-cell data to simultaneously identify different cellular sub-populations and their corresponding causal networks to explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study and apply it to three data sets of pooled CRISPR screens generated previously by two novel experimental techniques, namely Crop-Seq and Perturb-Seq. Availability: The mixture Nested Effects Model (M&NEM) is available as the R-package mnem at https://github.com/cbgethz/mnem/. Contact: [email protected], [email protected]. Supplementary information: Supplementary data are available online.
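
The Expectation Maximization scheme described above alternates between soft-assigning each cell to a component and re-estimating each component from its weighted cells. The sketch below shows that loop on a deliberately simple stand-in: a mixture of one-dimensional Gaussians with unit variance, where each component is a single mean rather than a causal network. It is not the mnem algorithm, and the function name and toy data are hypothetical.

```python
import math


def em_mixture(data, initial_means, n_iter=50):
    """EM for a mixture of unit-variance 1-D Gaussians (illustrative stand-in)."""
    means = list(initial_means)
    k = len(means)
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each observation belongs to each component.
        resp = []
        for x in data:
            scores = [w * math.exp(-0.5 * (x - m) ** 2)
                      for w, m in zip(weights, means)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: update each component from its responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            weights[j] = nj / len(data)
    return means, weights, resp


# Toy "cells": two well-separated sub-populations.
cells = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
means, weights, resp = em_mixture(cells, [min(cells), max(cells)])
```

In a mixture of Nested Effects Models the M-step would instead search for the best network per component, which is considerably more involved than averaging.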

BiomeHorizon: visualizing microbiome time series data in R

10.1101/2021.08.29.458140 ◽

2021 ◽

Author(s):

Isaac Fink ◽

Richard J. Abdill ◽

Ran Blekhman ◽

Laura Grieneisen

Keyword(s):

Time Series ◽

Open Source ◽

Time Series Data ◽

R Package ◽

Supplementary Information ◽

Series Data ◽

Link Type ◽

Microbiome Research ◽

Microbiome Data ◽

Over Time

Abstract. Summary: A key aspect of microbiome research is analysis of longitudinal dynamics using time series data. A method to visualize both the proportional and absolute change in the abundance of multiple taxa across multiple subjects over time is needed. We developed BiomeHorizon, an open-source R package that visualizes longitudinal compositional microbiome data using horizon plots. Availability and Implementation: BiomeHorizon is available at https://github.com/blekhmanlab/biomehorizon/ and released under the MIT license. A guide with step-by-step instructions for using the package is provided at https://blekhmanlab.github.io/biomehorizon/. The guide also provides code to reproduce all plots in this manuscript. Contact: [email protected], [email protected], [email protected]. Supplementary information: None.

DECO: decompose heterogeneous population cohorts for patient stratification and discovery of sample biomarkers using omic data profiling

Bioinformatics ◽

10.1093/bioinformatics/btz148 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3651-3662 ◽

Cited By ~ 1

Author(s):

F J Campos-Laborie ◽

A Risueño ◽

M Ortiz-Estévez ◽

B Rosón-Burgo ◽

C Droste ◽

...

Keyword(s):

Correspondence Analysis ◽

Large Scale ◽

Simulated Data ◽

R Package ◽

Heterogeneous Data ◽

Supplementary Information ◽

Patient Stratification ◽

Differential Analysis ◽

Data Profiling ◽

Omic Data

Abstract. Motivation: Patient and sample diversity is one of the main challenges when dealing with clinical cohorts in biomedical genomics studies. During the last decade, several methods have been developed to identify biomarkers assigned to specific individuals or subtypes of samples. However, current methods still fail to discover markers in complex scenarios where heterogeneity or hidden phenotypical factors are present. Here, we propose a method to analyze and understand heterogeneous data that avoids classical normalization approaches of reducing or removing variation. Results: DEcomposing heterogeneous Cohorts using Omic data profiling (DECO) is a method to find significant associations among biological features (biomarkers) and samples (individuals) by analyzing large-scale omic data. The method identifies and categorizes biomarkers of specific phenotypic conditions based on a recurrent differential analysis integrated with a non-symmetrical correspondence analysis. DECO integrates both omic data dispersion and predictor–response relationship from non-symmetrical correspondence analysis in a unique statistic (called h-statistic), allowing the identification of closely related sample categories within complex cohorts. The performance is demonstrated using simulated data and five experimental transcriptomic datasets, and by comparison with seven other methods. We show DECO greatly enhances the discovery and subtle identification of biomarkers, making it especially suited for deep and accurate patient stratification. Availability and implementation: DECO is freely available as an R package (including a practical vignette) at the Bioconductor repository (http://bioconductor.org/packages/deco/). Supplementary information: Supplementary data are available at Bioinformatics online.

MetaCycle: an integrated R package to evaluate periodicity in large scale data

10.1101/040345 ◽

2016 ◽

Cited By ~ 6

Author(s):

Gang Wu ◽

Ron C Anafi ◽

Michael E Hughes ◽

Karl Kornacker ◽

John B Hogenesch

Keyword(s):

Statistical Power ◽

Large Scale ◽

Time Series Data ◽

R Package ◽

Ease Of Use ◽

Data Availability ◽

Supplementary Information ◽

Series Data ◽

Large Scale Data ◽

Scale Data

Summary: Detecting periodicity in large scale data remains a challenge. Different algorithms offer strengths and weaknesses in statistical power, sensitivity to outliers, ease of use, and sampling requirements. While efforts have been made to identify best-of-breed algorithms, relatively little research has gone into integrating these methods in a generalizable method. Here we present MetaCycle, an R package that incorporates ARSER, JTK_CYCLE, and Lomb-Scargle to conveniently evaluate periodicity in time-series data. Availability and implementation: The MetaCycle package is available on the CRAN repository (https://cran.r-project.org/web/packages/MetaCycle/index.html) and GitHub (https://github.com/gangwug/MetaCycle). Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
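
Of the three methods named above, Lomb-Scargle is the simplest to sketch. The example below is a plain, unnormalised Lomb-Scargle periodogram evaluated at a handful of trial periods. It is not MetaCycle's implementation and does not combine several methods as MetaCycle does. The function names, the 2-hour sampling grid and the candidate period range are assumptions made for illustration.

```python
import math


def lomb_scargle_power(times, values, period):
    """Unnormalised Lomb-Scargle power of `values` at a single trial period."""
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    omega = 2.0 * math.pi / period
    # The offset tau makes the shifted sine and cosine terms orthogonal.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    yc = sum(a * b for a, b in zip(y, cos_terms))
    ys = sum(a * b for a, b in zip(y, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    return 0.5 * (yc * yc / cc + ys * ys / ss)


def best_period(times, values, candidate_periods):
    """Return the candidate period with the highest power."""
    return max(candidate_periods,
               key=lambda p: lomb_scargle_power(times, values, p))


# Noise-free toy signal: a 24-hour rhythm sampled every 2 hours for 48 hours.
times = [2.0 * k for k in range(24)]
values = [math.sin(2.0 * math.pi * t / 24.0) for t in times]
estimate = best_period(times, values, range(20, 29))
```

On this toy signal the highest power falls at the 24-hour candidate. The sketch does not guard against trial periods for which the sine or cosine sum of squares vanishes, and a real analysis would also need a significance estimate for the detected peak.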

FORESEE: a tool for the systematic comparison of translational drug response modeling pipelines

10.7287/peerj.preprints.27256v1 ◽

2018 ◽

Author(s):

Lisa-Katrin Turnhoff ◽

Ali Hadizadeh Esfahani ◽

Maryam Montazeri ◽

Nina Kusch ◽

Andreas Schuppert

Keyword(s):

Drug Response ◽

Drug Efficacy ◽

Response Prediction ◽

R Package ◽

Supplementary Information ◽

Supplementary File ◽

Data Sets ◽

Training Algorithms ◽

Model Training

Translational models that utilize omics data generated in in vitro studies to predict the drug efficacy of anti-cancer compounds in patients are highly distinct, which complicates the benchmarking process for new computational approaches. In reaction to this, we introduce the uniFied translatiOnal dRug rESponsE prEdiction platform FORESEE, an open-source R-package. FORESEE not only provides a uniform data format for public cell line and patient data sets, but also establishes a standardized environment for drug response prediction pipelines, incorporating various state-of-the-art preprocessing methods, model training algorithms and validation techniques. The modular implementation of individual elements of the pipeline facilitates a straightforward development of combinatorial models, which can be used to re-evaluate and improve already existing pipelines as well as to develop new ones. Availability and Implementation: FORESEE is licensed under GNU General Public License v3.0 and available at https://github.com/JRC-COMBINE/FORESEE. Supplementary Information: Supplementary Files 1 and 2 provide detailed descriptions of the pipeline and the data preparation process, while Supplementary File 3 presents basic use cases of the package. Contact: [email protected]

SpiderSeqR: an R package for crawling the web of high-throughput multi-omic data repositories for data-sets and annotation

10.1101/2020.04.13.039420 ◽

2020 ◽

Author(s):

Anna M. Sozanska ◽

Charles Fletcher ◽

Dóra Bihary ◽

Shamith A. Samarajiwa

Keyword(s):

High Throughput ◽

R Package ◽

Data Reuse ◽

Massively Parallel ◽

Data Sets ◽

Similar Data ◽

Data Generation ◽

Data Repositories ◽

Public Data ◽

Omic Data

Abstract. More than three decades ago, the microarray revolution brought high-throughput data generation capability to biology and medicine. Subsequently, the emergence of massively parallel sequencing technologies led to many big-data initiatives such as the human genome project and the encyclopedia of DNA elements (ENCODE) project. These, in combination with cheaper, faster massively parallel DNA sequencing capabilities, have democratised multi-omic (genomic, transcriptomic, translatomic and epigenomic) data generation, leading to a data deluge in bio-medicine. While some of these data-sets are trapped in inaccessible silos, the vast majority are stored in public data resources and controlled access data repositories, enabling their wider use (or misuse). Currently, most peer reviewed publications require the deposition of the data-set associated with a study under consideration in one of these public data repositories. However, clunky and difficult-to-use interfaces and subpar or incomplete annotation prevent the discovery, searching and filtering of these multi-omic data and hinder their re-purposing in other use cases. In addition, the proliferation of a multitude of different data repositories, with partially redundant storage of similar data, is yet another obstacle to their continued usefulness. Similarly, interfaces where annotation is spread across multiple web pages, the use of accession identifiers with ambiguous and multiple interpretations, and a lack of good curation make these data-sets difficult to use. We have produced SpiderSeqR, an R package whose main features include the integration between NCBI GEO and SRA databases, enabling an integrated unified search of SRA and GEO data-sets and associated annotations, conversion between database accessions, as well as convenient filtering of results and saving past queries for future use. All of the above features aim to promote data reuse to facilitate making new discoveries and maximising the potential of existing data-sets. Availability: https://github.com/ss-lab-cancerunit/SpiderSeqR

An Efficient Method for Forecasting Using Fuzzy Time Series

Emerging Research on Applied Fuzzy Sets and Intuitionistic Fuzzy Matrices - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-0914-1.ch013 ◽

2017 ◽

pp. 287-304 ◽

Cited By ~ 3

Author(s):

Pritpal Singh

Keyword(s):

Time Series ◽

Time Series Data ◽

Weather Forecasting ◽

Small Error ◽

Fuzzy Time Series ◽

Series Data ◽

Data Sets ◽

Proposed Model ◽

Temperature Forecasting ◽

The University

Forecasting using fuzzy time series has been applied in several areas, including university enrollments, sales, road accidents, finance and weather. Recently, many researchers have turned their attention to applying fuzzy time series to time series forecasting problems. In this paper, we present a new model, based on one-factor fuzzy time series, to forecast enrollments at the University of Alabama and the daily average temperature in Taipei. In this model, a new frequency-based clustering technique is employed for partitioning the time series data sets into different intervals. Two new principles are also incorporated for the defuzzification function. For both enrollment and daily temperature forecasting, the proposed model exhibits a very small error rate.
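
The general workflow of one-factor fuzzy time series forecasting (partition the range of values into intervals, fuzzify each observation, collect the observed transitions between fuzzy states, then defuzzify) can be sketched as follows. This is a classical first-order baseline, not the chapter's model. Equal-width intervals stand in for its frequency-based clustering and a plain average of interval midpoints stands in for its defuzzification principles, neither of which the abstract specifies. The function names and toy series are hypothetical, and all values are assumed to lie within [lo, hi].

```python
def fuzzify(value, lo, width, k):
    """Index of the equal-width interval (fuzzy set) that contains `value`."""
    return min(int((value - lo) / width), k - 1)


def forecast_next(series, lo, hi, k):
    """One-step forecast from first-order fuzzy logical relationship groups."""
    width = (hi - lo) / k
    midpoints = [lo + (i + 0.5) * width for i in range(k)]
    states = [fuzzify(x, lo, width, k) for x in series]

    # Group the observed transitions: current state -> set of next states.
    groups = {}
    for cur, nxt in zip(states, states[1:]):
        groups.setdefault(cur, set()).add(nxt)

    # Defuzzify: average the midpoints of all states that followed the
    # current one; fall back to the current interval if it was never left.
    last = states[-1]
    successors = groups.get(last, {last})
    return sum(midpoints[j] for j in successors) / len(successors)


series = [10, 20, 30, 20, 10, 20]
prediction = forecast_next(series, lo=5, hi=35, k=3)
```

Here the last observation falls in the middle interval, which was followed once by the low interval and once by the high one, so the forecast is the average of their midpoints (10 and 30), namely 20.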

iMIRAGE: an R package to impute microRNA expression using protein-coding genes

Bioinformatics ◽

10.1093/bioinformatics/btz939 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2608-2610

Author(s):

Aritro Nath ◽

Jeremy Chang ◽

R Stephanie Huang

Keyword(s):

Gene Expression ◽

Small RNAs ◽

Transcriptional Regulators ◽

R Package ◽

Machine Learning Algorithms ◽

Supplementary Information ◽

Expression Data ◽

Protein Coding ◽

Altered Protein ◽

Independent Test

Abstract. Summary: MicroRNAs (miRNAs) are critical post-transcriptional regulators of gene expression. Due to challenges in accurate profiling of small RNAs, a vast majority of public transcriptome datasets lack reliable miRNA profiles. However, the biological consequence of miRNA activity in the form of altered protein-coding gene (PCG) expression can be captured using machine-learning algorithms. Here, we present iMIRAGE (imputed miRNA activity from gene expression), a convenient tool to predict miRNA expression using PCG expression of the test datasets. The iMIRAGE package provides an integrated workflow for normalization and transformation of miRNA and PCG expression data, along with the option to utilize predicted miRNA targets to impute miRNA activity from independent test PCG datasets. Availability and implementation: The iMIRAGE package for R, along with package documentation and vignette, is available at https://aritronath.github.io/iMIRAGE/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.

Modeling cell proliferation in human acute myeloid leukemia xenografts

Bioinformatics ◽

10.1093/bioinformatics/btz063 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3378-3386 ◽

Cited By ~ 5

Author(s):

Marco S Nobile ◽

Thalia Vlachou ◽

Simone Spolaor ◽

Daniela Bossi ◽

Paolo Cazzaniga ◽

...

Keyword(s):

Experimental Data ◽

Cell Proliferation ◽

Acute Myeloid Leukemia ◽

Myeloid Leukemia ◽

Time Series Data ◽

Supplementary Information ◽

Series Data ◽

Proliferative Potential ◽

Proliferating Cells ◽

Acute Myeloid

Abstract. Motivation: Acute myeloid leukemia (AML) is one of the most common hematological malignancies, characterized by high relapse and mortality rates. The inherent intra-tumor heterogeneity in AML is thought to play an important role in disease recurrence and resistance to chemotherapy. Although experimental protocols for cell proliferation studies are well established and widespread, they are not easily applicable to in vivo contexts, and the analysis of related time-series data is often complex. To overcome these limitations, model-driven approaches can be exploited to investigate different aspects of cell population dynamics. Results: In this work, we present ProCell, a novel modeling and simulation framework to investigate cell proliferation dynamics that, unlike other approaches, takes into account the inherent stochasticity of cell division events. We apply ProCell to compare different models of cell proliferation in AML, notably leveraging experimental data derived from human xenografts in mice. ProCell is coupled with Fuzzy Self-Tuning Particle Swarm Optimization, a swarm-intelligence settings-free algorithm used to automatically infer the model parameterizations. Our results provide new insights into the intricate organization of AML cells with highly heterogeneous proliferative potential, highlighting the important role played in the progression and evolution of the disease by quiescent cells and by proliferating cells with different rates of division, thus hinting at the necessity to further characterize tumor cell subpopulations. Availability and implementation: The source code of ProCell and the experimental data used in this work are available under the GPL 2.0 license on GitHub at the following URL: https://github.com/aresio/ProCell. Supplementary information: Supplementary data are available at Bioinformatics online.
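
A toy simulation can make the notion of stochastic division events in a heterogeneous population concrete. The sketch below is not ProCell and does not reproduce its model or its parameter inference. It assumes, purely for illustration, that each proliferating cell waits a normally distributed time before dividing (clipped from below so that waits stay positive) and that quiescent cells never divide. The function name and all parameter values are hypothetical.

```python
import random


def simulate_counts(populations, t_max, seed=0):
    """Final cell count per sub-population after stochastic divisions.

    `populations` is a list of (n_cells, mean_division_time, sd) tuples;
    a mean of None marks a quiescent sub-population that never divides.
    """
    rng = random.Random(seed)
    counts = []
    for n_cells, mean, sd in populations:
        if mean is None:
            counts.append(n_cells)  # quiescent cells persist unchanged
            continue
        alive = 0
        pending = [0.0] * n_cells  # birth times of cells still to process
        while pending:
            born = pending.pop()
            # Draw a division time; clip it so that it stays positive.
            wait = max(rng.gauss(mean, sd), 0.1 * mean)
            if born + wait > t_max:
                alive += 1  # no further division before the end
            else:
                pending.extend([born + wait, born + wait])  # two daughters
        counts.append(alive)
    return counts


# Quiescent, fast-dividing and slow-dividing sub-populations (toy values).
populations = [(5, None, None), (10, 10.0, 1.0), (10, 20.0, 2.0)]
counts = simulate_counts(populations, t_max=30.0)
```

Fixing the seed makes a run reproducible. Comparing such simulated counts against measured data is the kind of task for which the abstract reports using a swarm-based optimizer.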
