snakePipes enable flexible, scalable and integrative epigenomic analysis

Mapping Intimacies ◽

10.1101/407312 ◽

2018 ◽

Cited By ~ 2

Author(s):

Vivek Bhardwaj ◽

Steffen Heyne ◽

Katarzyna Sikora ◽

Leily Rabbani ◽

Michael Rauer ◽

...

Keyword(s):

Single Cell ◽

Open Source ◽

Large Scale ◽

Integrative Analysis ◽

Exploratory Research ◽

Rna Seq ◽

Link Type ◽

Open Source License ◽

Fast Processing ◽

Downstream Analysis

AbstractThe scale and diversity of epigenomics data has been rapidly increasing and ever more studies now present analyses of data from multiple epigenomic techniques. Performing such integrative analysis is time-consuming, especially for exploratory research, since there are currently no pipelines available that allow fast processing of datasets from multiple epigenomic assays while also allow for flexibility in running or upgrading the workflows. Here we present a solution to this problem: snakePipes, which can process and perform downstream analysis of data from all common epigenomic techniques (ChIP-seq, RNA-seq, Bisulfite-seq, ATAC-seq, Hi-C and single-cell RNA-seq) in a single package. We demonstrate how snakePipes can simplify integrative analysis by reproducing and extending the results from a recently published large-scale epigenomics study with a few simple commands. snakePipes are available under an open-source license at https://github.com/maxplanck-ie/snakepipes.

Download Full-text

Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database

10.1101/206573 ◽

2017 ◽

Cited By ~ 2

Author(s):

Luke Zappia ◽

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Single Cell ◽

Open Source ◽

Rapid Development ◽

Analysis Tool ◽

Rna Seq ◽

Link Type ◽

Analysis Tools ◽

Rapid Pace ◽

Cell Data ◽

Selection Of

AbstractAs single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. In order to better facilitate selection of appropriate analysis tools we have created the scRNA-tools database (www.scRNA-tools.org) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records of the growth of the field over time.Author summaryIn recent years single-cell RNA-sequeing technologies have emerged that allow scientists to measure the activity of genes in thousands of individual cells simultaneously. This means we can start to look at what each cell in a sample is doing instead of considering an average across all cells in a sample, as was the case with older technologies. However, while access to this kind of data presents a wealth of opportunities it comes with a new set of challenges. Researchers across the world have developed new methods and software tools to make the most of these datasets but the field is moving at such a rapid pace it is difficult to keep up with what is currently available. To make this easier we have developed the scRNA-tools database and website (www.scRNA-tools.org). Our database catalogues analysis tools, recording the tasks they can be used for, where they can be downloaded from and the publications that describe how they work. By looking at this database we can see that developers have focued on methods specific to single-cell data and that they embrace an open-source approach with permissive licensing, sharing of code and preprint publications.

Download Full-text

Improved dropClust R package with integrative analysis support for scRNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btz823 ◽

2019 ◽

Author(s):

Debajyoti Sinha ◽

Pradyumn Sinha ◽

Ritwik Saha ◽

Sanghamitra Bandyopadhyay ◽

Debarka Sengupta

Keyword(s):

Single Cell ◽

Large Scale ◽

R Package ◽

Integrative Analysis ◽

Locality Sensitive Hashing ◽

Supplementary Information ◽

Expression Data ◽

Rna Seq ◽

Speed Up ◽

Cell Expression

Abstract Summary DropClust leverages Locality Sensitive Hashing (LSH) to speed up clustering of large scale single cell expression data. Here we present the improved dropClust, a complete R package that is, fast, interoperable and minimally resource intensive. The new dropClust features a novel batch effect removal algorithm that allows integrative analysis of single cell RNA-seq (scRNA-seq) datasets. Availability and implementation dropClust is freely available at https://github.com/debsin/dropClust as an R package. A lightweight online version of the dropClust is available at https://debsinha.shinyapps.io/dropClust/. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

dropClust2: An R package for resource efficient analysis of large scale single cell RNA-Seq data

10.1101/596924 ◽

2019 ◽

Author(s):

Debajyoti Sinha ◽

Pradyumn Sinha ◽

Ritwik Saha ◽

Sanghamitra Bandyopadhyay ◽

Debarka Sengupta

Keyword(s):

Single Cell ◽

Programming Languages ◽

Large Scale ◽

Principal Component ◽

Cell Types ◽

R Package ◽

Locality Sensitive Hashing ◽

Rna Seq ◽

Link Type ◽

Component Selection

ABSTRACTDropClust leverages Locality Sensitive Hashing (LSH) to speed up clustering of large scale single cell expression data. It makes ingenious use of structure persevering sampling and modality based principal component selection to rescue minor cell types. Existing implementation of dropClust involves interfacing with multiple programming languagesviz. R, python and C, hindering seamless installation and portability. Here we present dropClust2, a complete R package that’s not only fast but also minimally resource intensive. DropClust2 features a novel batch effect removal algorithm that allows integrative analysis of single cell RNA-seq (scRNA-seq) datasets.Availability and implementationdropClust2 is freely available athttps://debsinha.shinyapps.io/dropClust/as an online web service and athttps://github.com/debsin/dropClustas an R package.

Download Full-text

Corrigendum: FB5P-seq: FACS-Based 5-Prime End Single-Cell RNA-seq for Integrative Analysis of Transcriptome and Antigen Receptor Repertoire in B and T Cells

Frontiers in Immunology ◽

10.3389/fimmu.2020.02047 ◽

2020 ◽

Vol 11 ◽

Author(s):

Noudjoud Attaf ◽

Iñaki Cervera-Marzal ◽

Chuang Dong ◽

Laurine Gil ◽

Amédée Renand ◽

...

Keyword(s):

T Cells ◽

Single Cell ◽

Antigen Receptor ◽

Integrative Analysis ◽

Rna Seq ◽

Receptor Repertoire ◽

Prime End

Download Full-text

sc-REnF:An entropy guided robust feature selection for clustering of single-cell rna-seq data

10.1101/2020.10.10.334573 ◽

2020 ◽

Author(s):

Snehalika Lall ◽

Abhik Ghosh ◽

Sumanta Ray ◽

Sanghamitra Bandyopadhyay

Keyword(s):

Single Cell ◽

Gene Selection ◽

Rna Seq ◽

Technical Noise ◽

Marker Selection ◽

Cell Clustering ◽

Typing Methods ◽

Original Application ◽

Downstream Analysis ◽

Cell Typing

ABSTRACTMany single-cell typing methods require pure clustering of cells, which is susceptible towards the technical noise, and heavily dependent on high quality informative genes selected in the preliminary steps of downstream analysis. Techniques for gene selection in single-cell RNA sequencing (scRNA-seq) data are seemingly simple which casts problems with respect to the resolution of (sub-)types detection, marker selection and ultimately impacts towards cell annotation. We introduce sc-REnF, a novel and robust entropy based feature (gene) selection method, which leverages the landmark advantage of ‘Renyi’ and ‘Tsallis’ entropy achieved in their original application, in single cell clustering. Thereby, gene selection is robust and less sensitive towards the technical noise present in the data, producing a pure clustering of cells, beyond classifying independent and unknown sample with utmost accuracy. The corresponding software is available at: https://github.com/Snehalikalall/sc-REnF

Download Full-text

Predicting heterogeneity in clone-specific therapeutic vulnerabilities using single-cell transcriptomic signatures

Genome Medicine ◽

10.1186/s13073-021-01000-y ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Chayaporn Suphavilai ◽

Shumei Chia ◽

Ankur Sharma ◽

Lorna Tu ◽

Rafael Peres Da Silva ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Drug Response ◽

Drug Repurposing ◽

High Accuracy ◽

Molecular Heterogeneity ◽

Precision Oncology ◽

Scale Analysis ◽

Rna Seq ◽

Large Scale Analysis

AbstractWhile understanding molecular heterogeneity across patients underpins precision oncology, there is increasing appreciation for taking intra-tumor heterogeneity into account. Based on large-scale analysis of cancer omics datasets, we highlight the importance of intra-tumor transcriptomic heterogeneity (ITTH) for predicting clinical outcomes. Leveraging single-cell RNA-seq (scRNA-seq) with a recommender system (CaDRReS-Sc), we show that heterogeneous gene-expression signatures can predict drug response with high accuracy (80%). Using patient-proximal cell lines, we established the validity of CaDRReS-Sc’s monotherapy (Pearson r>0.6) and combinatorial predictions targeting clone-specific vulnerabilities (>10% improvement). Applying CaDRReS-Sc to rapidly expanding scRNA-seq compendiums can serve as in silico screen to accelerate drug-repurposing studies. Availability: https://github.com/CSB5/CaDRReS-Sc.

Download Full-text

Leveraging high-powered RNA-Seq datasets to improve inference of regulatory activity in single-cell RNA-Seq data

10.1101/553040 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ning Wang ◽

Andrew E. Teschendorff

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Cell Fate ◽

Regulatory Networks ◽

Large Scale ◽

Single Cells ◽

Differential Expression Analysis ◽

Dropout Rate ◽

Rna Seq ◽

Regulatory Activity

AbstractInferring the activity of transcription factors in single cells is a key task to improve our understanding of development and complex genetic diseases. This task is, however, challenging due to the relatively large dropout rate and noisy nature of single-cell RNA-Seq data. Here we present a novel statistical inference framework called SCIRA (Single Cell Inference of Regulatory Activity), which leverages the power of large-scale bulk RNA-Seq datasets to infer high-quality tissue-specific regulatory networks, from which regulatory activity estimates in single cells can be subsequently obtained. We show that SCIRA can correctly infer regulatory activity of transcription factors affected by high technical dropouts. In particular, SCIRA can improve sensitivity by as much as 70% compared to differential expression analysis and current state-of-the-art methods. Importantly, SCIRA can reveal novel regulators of cell-fate in tissue-development, even for cell-types that only make up 5% of the tissue, and can identify key novel tumor suppressor genes in cancer at single cell resolution. In summary, SCIRA will be an invaluable tool for single-cell studies aiming to accurately map activity patterns of key transcription factors during development, and how these are altered in disease.

Download Full-text

RgCop-A regularized copula based method for gene selection in single cell rna-seq data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009464 ◽

2021 ◽

Vol 17 (10) ◽

pp. e1009464

Author(s):

Snehalika Lall ◽

Sumanta Ray ◽

Sanghamitra Bandyopadhyay

Keyword(s):

Single Cell ◽

Gene Selection ◽

Real Life ◽

Classification Performance ◽

Rna Seq ◽

Scale Invariant ◽

Dependence Measure ◽

Highly Expressed Genes ◽

The Stability ◽

Downstream Analysis

Gene selection in unannotated large single cell RNA sequencing (scRNA-seq) data is important and crucial step in the preliminary step of downstream analysis. The existing approaches are primarily based on high variation (highly variable genes) or significant high expression (highly expressed genes) failed to provide stable and predictive feature set due to technical noise present in the data. Here, we propose RgCop, a novel regularized copula based method for gene selection from large single cell RNA-seq data. RgCop utilizes copula correlation (Ccor), a robust equitable dependence measure that captures multivariate dependency among a set of genes in single cell expression data. We raise an objective function by adding a l1 regularization term with Ccor to penalizes the redundant co-efficient of features/genes, resulting non-redundant effective features/genes set. Results show a significant improvement in the clustering/classification performance of real life scRNA-seq data over the other state-of-the-art. RgCop performs extremely well in capturing dependence among the features of noisy data due to the scale invariant property of copula, thereby improving the stability of the method. Moreover, the differentially expressed (DE) genes identified from the clusters of scRNA-seq data are found to provide an accurate annotation of cells. Finally, the features/genes obtained from RgCop can able to annotate the unknown cells with high accuracy.

Download Full-text

Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

10.1101/786285 ◽

2019 ◽

Cited By ~ 4

Author(s):

Marcus Alvarez ◽

Elior Rahmani ◽

Brandon Jew ◽

Kristina M. Garske ◽

Zong Miao ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Supervised Machine Learning ◽

Data Sets ◽

Rna Seq ◽

Novel Approach ◽

Single Nucleus ◽

Downstream Analysis

AbstractSingle-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. Contrary to single-cell RNA seq (scRNA-seq), we observe that snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which can lead to identification of spurious cell types in downstream clustering analyses if overlooked. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distribution of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods fail to account for contaminated droplets and led to spurious cell types. When compared to filtering using these state of the art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied DIEM to single-cell data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data fast and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.

Download Full-text

EpiScanpy: integrated single-cell epigenomic analysis

10.1101/648097 ◽

2019 ◽

Cited By ~ 4

Author(s):

Anna Danese ◽

Maria L. Richter ◽

David S. Fischer ◽

Fabian J. Theis ◽

Maria Colomé-Tatché

Keyword(s):

Dna Methylation ◽

Single Cell ◽

Large Scale ◽

Feature Space ◽

Rna Seq ◽

Computational Framework ◽

Learning Techniques ◽

Multiple Feature ◽

The Many ◽

Cell Data

ABSTRACTEpigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics, however single-cell-omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell brain mouse atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels by orthogonal epigenetic information.

Download Full-text