FourCSeq: Analysis of 4C sequencing data


10.1101/009548 ◽

2014 ◽

Cited By ~ 4

Author(s):

Felix A. Klein ◽

Tibor Pakozdi ◽

Simon Anders ◽

Yad Ghavi-Helm ◽

Eileen E. M. Furlong ◽

...

Keyword(s):

Specific Interaction ◽

Cell Types ◽

R Package ◽

Genomic Region ◽

Specific Interactions ◽

Genomic Distance ◽

Sequencing Data ◽

Experimental Conditions ◽

Chromosome Conformation ◽

Z Scores

Abstract Motivation: Circularized Chromosome Conformation Capture (4C) is a powerful technique for studying the spatial interactions of a specific genomic region, called the "viewpoint", with the rest of the genome, either in a single condition or across different experimental conditions or cell types. Observed ligation frequencies show a strong, regular dependence on genomic distance from the viewpoint, on top of which specific interaction peaks are superimposed. Here, we address the computational task of finding these specific interactions and detecting changes between interaction profiles of different conditions. Results: We model the overall trend of decreasing interaction frequency with genomic distance by fitting a smooth, monotonically decreasing function to suitably transformed count data. Based on the fit, z-scores are calculated from the residuals, with high z-scores being interpreted as peaks providing evidence for specific interactions. To compare different conditions, we normalize fragment counts between samples and call differential contact frequencies using the statistical method DESeq2, adapted from RNA-Seq analysis. Availability and Implementation: A full end-to-end analysis pipeline is implemented in the R package FourCSeq, available at www.bioconductor.org.
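The trend-plus-residual idea in the abstract above can be illustrated with a minimal Python sketch. It is not the FourCSeq implementation: the log2(count + 1) transform, the isotonic (pool-adjacent-violators) fit used in place of a smooth monotone fit, the MAD-based scale estimate, and the function names are all assumptions chosen to keep the example self-contained.

```python
import math
from statistics import median

def isotonic_decreasing(y):
    """Least-squares non-increasing fit via pool-adjacent-violators."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([v, 1])
        # Merge while the newest block's mean exceeds the previous block's mean.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   < blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)
    return fit

def trend_and_z_scores(counts):
    """counts: fragment counts ordered by increasing distance from the viewpoint."""
    # Assumed transform; a stand-in for a proper variance-stabilizing transform.
    t = [math.log2(c + 1) for c in counts]
    fit = isotonic_decreasing(t)
    resid = [a - b for a, b in zip(t, fit)]
    med = median(resid)
    mad = median([abs(r - med) for r in resid])
    sigma = 1.4826 * mad  # MAD rescaled to match the SD of a normal distribution
    if sigma == 0:
        raise ValueError("residual spread is zero; z-scores are undefined")
    z = [(r - med) / sigma for r in resid]
    return fit, z

# Hypothetical counts: a decaying profile with a bump at index 6.
counts = [1000, 400, 420, 200, 90, 100, 300, 40, 45, 20, 10, 12]
fit, z = trend_and_z_scores(counts)
print(z.index(max(z)))  # index of the strongest candidate peak
```

Because the isotonic fit is pulled upward by the peak itself, the residual at a true peak is somewhat underestimated here; a smoother fit over more fragments would reduce that effect.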


rPanglaoDB: an R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database

10.1101/2021.05.28.446161 ◽

2021 ◽

Author(s):

Daniel Osorio ◽

Marieke Lydia Kuijjer ◽

James J. Cai

Keyword(s):

Single Cell ◽

Cell Types ◽

R Package ◽

Rna Seq ◽

Cell Type ◽

Sequencing Data ◽

Single Experiment ◽

Tissue Samples ◽

Molecular Phenotypes ◽

Public Datasets

Motivation: Characterizing cells with rare molecular phenotypes is one of the promises of high-throughput single-cell RNA sequencing (scRNA-seq) techniques. However, collecting enough cells with the desired molecular phenotype in a single experiment is challenging, requiring several sample preprocessing steps to filter and collect the desired cells experimentally before sequencing. Data integration of multiple public single-cell experiments stands as a solution to this problem, allowing the collection of enough cells exhibiting the desired molecular signatures. By increasing the sample size of the desired cell type, this approach enables a robust cell type transcriptome characterization. Results: Here, we introduce rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database. To show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application collecting and characterizing a set of 157 fibrocytes. Fibrocytes are a rare monocyte-derived cell type that exhibits both the inflammatory features of macrophages and the tissue remodeling properties of fibroblasts. This constitutes the first unbiased transcriptome profile report for fibrocytes. We compared the transcriptomic profile of the fibrocytes against the fibroblasts collected from the same tissue samples and confirmed their association with healing processes in tissue damage and infection through the activation of the prostaglandin biosynthesis and regulation pathway. Availability and Implementation: rPanglaoDB is implemented as an R package available through the CRAN repositories https://CRAN.R-project.org/package=rPanglaoDB.


FR-Match: Robust matching of cell type clusters from single cell RNA sequencing data using the Friedman-Rafsky non-parametric test

10.1101/2020.05.01.073445 ◽

2020 ◽

Author(s):

Yun Zhang ◽

Brian D. Aevermann ◽

Trygve E. Bakken ◽

Jeremy A. Miller ◽

Rebecca D. Hodge ◽

...

Keyword(s):

Human Brain ◽

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

R Package ◽

Brain Regions ◽

Cortical Layer ◽

Middle Temporal Gyrus ◽

Cell Type ◽

Sequencing Data

Abstract Single cell/nucleus RNA sequencing (scRNAseq) is emerging as an essential tool to unravel the phenotypic heterogeneity of cells in complex biological systems. While computational methods for scRNAseq cell type clustering have advanced, the ability to integrate datasets to identify common and novel cell types across experiments remains a challenge. Here, we introduce a cluster-to-cluster cell type matching method, FR-Match, that utilizes supervised feature selection for dimensionality reduction and incorporates shared information among cells to determine whether two cell type clusters share the same underlying multivariate gene expression distribution. FR-Match is benchmarked with existing cell-to-cell and cell-to-cluster cell type matching methods using both simulated and real scRNAseq data. FR-Match proved to be a stringent method that produced fewer erroneous matches of distinct cell subtypes and had the unique ability to identify novel cell phenotypes in new datasets. In silico validation demonstrated that the proposed workflow is the only self-contained algorithm that was robust to increasing numbers of true negatives (i.e. non-represented cell types). FR-Match was applied to two human brain scRNAseq datasets sampled from cortical layer 1 and full thickness middle temporal gyrus. When mapping cell types identified in specimens isolated from these overlapping human brain regions, FR-Match precisely recapitulated the laminar characteristics of matched cell type clusters, reflecting their distinct neuroanatomical distributions. An R package and Shiny application are provided at https://github.com/JCVenterInstitute/FRmatch for users to interactively explore and match scRNAseq cell type clusters with complementary visualization tools.
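The Friedman-Rafsky test named in the title compares two multivariate samples by pooling them, building a minimum spanning tree (MST) over the pooled points, and counting the tree edges that join points from different samples. The Python sketch below shows only that core statistic under simplifying assumptions (Euclidean distance, no feature selection, no variance or p-value computation); the function names are hypothetical and this is not the FRmatch package code.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) index pairs."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def friedman_rafsky_cross_edges(x, y):
    """Return (observed cross-sample MST edges, expected count under the null)."""
    pooled = list(x) + list(y)
    m, n = len(x), len(y)
    cross = sum((i < m) != (j < m) for i, j in mst_edges(pooled))
    # Under random labelling, each of the m+n-1 edges is a cross edge
    # with probability 2mn / ((m+n)(m+n-1)), giving this expectation.
    expected = 2 * m * n / (m + n)
    return cross, expected

# Hypothetical clusters: two well-separated groups of four points each.
a = [(0, 0), (0, 1), (1, 0), (1, 1)]
b = [(10, 10), (10, 11), (11, 10), (11, 11)]
print(friedman_rafsky_cross_edges(a, b))
```

A cross-edge count far below its expectation suggests the two clusters come from different distributions, while a count near the expectation is consistent with a match.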


chromstaR: Tracking combinatorial chromatin state dynamics in space and time

10.1101/038612 ◽

2016 ◽

Cited By ~ 11

Author(s):

Aaron Taudt ◽

Minh Anh Nguyen ◽

Matthias Heinig ◽

Frank Johannes ◽

Maria Colomé-Tatché

Keyword(s):

Temporal Dynamics ◽

Cell Types ◽

R Package ◽

Developmental Time ◽

Genomic Region ◽

Chromatin State ◽

Post Translational Modifications ◽

Chromatin States ◽

Genome Wide ◽

Experimental Treatments

Abstract Background: Post-translational modifications of histone residue tails are an important component of genome regulation. It is becoming increasingly clear that the combinatorial presence and absence of various modifications define discrete chromatin states which determine the functional properties of a locus. An emerging experimental goal is to track changes in chromatin state maps across different conditions, such as experimental treatments, cell types or developmental time points. Results: Here we present chromstaR, an algorithm for the computational inference of combinatorial chromatin state dynamics across an arbitrary number of conditions. ChromstaR uses a multivariate Hidden Markov Model to determine the number of discrete combinatorial chromatin states using multiple ChIP-seq experiments as input and assigns every genomic region to a state based on the presence/absence of each modification in every condition. We demonstrate the advantages of chromstaR in the context of three common experimental data scenarios. First, we study how different histone modifications combine to form combinatorial chromatin states in a single tissue. Second, we infer genome-wide patterns of combinatorial state differences between two cell types or conditions. Finally, we study the dynamics of combinatorial chromatin states during tissue differentiation involving up to six differentiation points. Our findings reveal a striking sparsity in the combinatorial organization and temporal dynamics of chromatin state maps. Conclusions: chromstaR is a versatile computational tool that facilitates a deeper biological understanding of chromatin organization and dynamics. The algorithm is implemented as an R package and is freely available from http://bioconductor.org/packages/chromstaR/.


4C-ker: A method to reproducibly identify genome-wide interactions captured by 4C-Seq experiments

10.1101/030569 ◽

2015 ◽

Author(s):

Ramya Raviram ◽

Pedro P. Rocha ◽

Christian L. Müller ◽

Emily R. Miraldi ◽

Sana Badri ◽

...

Keyword(s):

Restriction Enzymes ◽

Cell Types ◽

Single Locus ◽

Spatial Proximity ◽

Experimental Conditions ◽

Entire Genome ◽

Chromosome Conformation ◽

Genome Wide ◽

The Many ◽

Close Spatial Proximity

ABSTRACT 4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or “bait”) that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the differences in resolution of signal across the genome that result from differences in 3D distance separation from the bait. This leads to the highest signal in the region immediately surrounding the bait and increasingly lower signals in far-cis and trans. Another important aspect of 4C-Seq experiments is the resolution, which is greatly influenced by the choice of restriction enzyme and the frequency at which it can cut the genome. Thus, it is important that a 4C-Seq analysis method is flexible enough to analyze data generated using different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis only identify interactions in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals of different length scales. In addition, some methods also fail in experiments where chromatin fragments are generated using frequent cutter restriction enzymes. Here, we describe 4C-ker, a Hidden Markov Model-based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for the identification of differential interactions in multiple 4C-Seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with different resolution enzymes.

AUTHOR SUMMARY Circularized chromosome conformation capture, or 4C-Seq, is a technique developed to identify regions of the genome that are in close spatial proximity to a single locus of interest (‘bait’). This technique is used to detect regulatory interactions between promoters and enhancers and to characterize the nuclear environment of different regions within and across different cell types. So far, existing methods for 4C-Seq data analysis do not comprehensively identify interactions across the entire genome due to biases in the technique that are related to the decrease in 4C signal that results from increased 3D distance from the bait. To compensate for these weaknesses in existing methods, we developed 4C-ker, a method that explicitly models these biases to improve the analysis of 4C-Seq and to better understand the genome-wide interaction profile of an individual locus.


FR-Match: robust matching of cell type clusters from single cell RNA sequencing data using the Friedman–Rafsky non-parametric test

Briefings in Bioinformatics ◽

10.1093/bib/bbaa339 ◽

2020 ◽

Author(s):

Yun Zhang ◽

Brian D Aevermann ◽

Trygve E Bakken ◽

Jeremy A Miller ◽

Rebecca D Hodge ◽

...

Keyword(s):

Human Brain ◽

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

R Package ◽

Brain Regions ◽

Cortical Layer ◽

Middle Temporal Gyrus ◽

Cell Type ◽

Sequencing Data

Abstract Single cell/nucleus RNA sequencing (scRNAseq) is emerging as an essential tool to unravel the phenotypic heterogeneity of cells in complex biological systems. While computational methods for scRNAseq cell type clustering have advanced, the ability to integrate datasets to identify common and novel cell types across experiments remains a challenge. Here, we introduce a cluster-to-cluster cell type matching method—FR-Match—that utilizes supervised feature selection for dimensionality reduction and incorporates shared information among cells to determine whether two cell type clusters share the same underlying multivariate gene expression distribution. FR-Match is benchmarked with existing cell-to-cell and cell-to-cluster cell type matching methods using both simulated and real scRNAseq data. FR-Match proved to be a stringent method that produced fewer erroneous matches of distinct cell subtypes and had the unique ability to identify novel cell phenotypes in new datasets. In silico validation demonstrated that the proposed workflow is the only self-contained algorithm that was robust to increasing numbers of true negatives (i.e. non-represented cell types). FR-Match was applied to two human brain scRNAseq datasets sampled from cortical layer 1 and full thickness middle temporal gyrus. When mapping cell types identified in specimens isolated from these overlapping human brain regions, FR-Match precisely recapitulated the laminar characteristics of matched cell type clusters, reflecting their distinct neuroanatomical distributions. An R package and Shiny application are provided at https://github.com/JCVenterInstitute/FRmatch for users to interactively explore and match scRNAseq cell type clusters with complementary visualization tools.


scClassifR: Framework to accurately classify cell types in single-cell RNA-sequencing data

10.1101/2020.12.22.424025 ◽

2020 ◽

Author(s):

Vy Nguyen ◽

Johannes Griss

Keyword(s):

Sensitivity And Specificity ◽

Unique Feature ◽

Cell Types ◽

R Package ◽

Specific Cell ◽

Cell Type ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Additional Cell ◽

Complete Framework

Abstract Motivation: Automatic cell type identification in scRNA-seq datasets is an essential method to alleviate a key bottleneck in scRNA-seq data analysis. While most existing tools show good sensitivity and specificity in classifying cell types, they often fail to leave unclassified those cells whose type is not present in the reference used. Results: scClassifR is a novel R package that provides a complete framework to automatically classify cells in scRNA-seq datasets. It supports both Seurat and Bioconductor’s SingleCellExperiment and is thereby compatible with the vast majority of R-based analysis workflows. scClassifR uses hierarchically organised SVMs to distinguish a specific cell type versus all others. It shows comparable or even superior sensitivity and specificity compared to existing tools while being robust in not classifying unknown cell types. As a unique feature, it reports ambiguous cell assignments, including the respective probabilities. Finally, scClassifR provides dedicated functions to train and evaluate classifiers for additional cell types. Availability and Implementation: scClassifR is freely available on GitHub (https://github.com/grisslab/scClassifR).


geneBasis: an iterative approach for unsupervised selection of targeted gene panels from scRNA-seq.

10.1101/2021.08.10.455720 ◽

2021 ◽

Author(s):

Alsu Missarova ◽

Jaison Jain ◽

Andrew Butler ◽

Shila Ghazanfar ◽

Tim Stuart ◽

...

Keyword(s):

Practical Importance ◽

Cell Types ◽

R Package ◽

Gene Panel ◽

Limiting Factor ◽

Iterative Approach ◽

Sequencing Data ◽

Number Of Genes ◽

Gene Panels ◽

Selection Of

The problem of selecting targeted gene panels that capture maximum variability encoded in scRNA-sequencing data has become of great practical importance. scRNA-seq datasets are increasingly being used to identify gene panels that can be probed using alternative molecular technologies, such as spatial transcriptomics. In this context, the number of genes that can be probed is an important limiting factor, so choosing the best subset of genes is vital. Existing methods for this task are limited by either a reliance on pre-existing cell type labels or by difficulties in identifying markers of rare cell types. We resolve this by introducing an iterative approach, geneBasis, for selecting an optimal gene panel, where each newly added gene captures the maximum distance between the true manifold and the manifold constructed using the currently selected gene panel. We demonstrate, using a variety of metrics and diverse datasets, that our approach outperforms existing strategies, and can not only resolve cell types but also more subtle cell state differences. Our approach is available as an open source, easy-to-use, documented R package (https://github.com/MarioniLab/geneBasisR).


Combining multiple tools outperforms individual methods in gene set enrichment analyses

10.1101/042580 ◽

2016 ◽

Cited By ~ 1

Author(s):

Monther Alhamdoosh ◽

Milica Ng ◽

Nicholas J. Wilson ◽

Julie M. Sheridan ◽

Huy Huynh ◽

...

Keyword(s):

Simulated Data ◽

R Package ◽

Sequencing Data ◽

Gene Set Enrichment ◽

Experimental Conditions ◽

Data Set ◽

Gene Set ◽

Link Type ◽

Analysis Methods ◽

Gene Sets

Abstract Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular data set. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions. Results: The ensemble of gene set enrichment analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA’s gene set database contains around 25,000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse data sets and, based on biologists' feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility to effectively and efficiently extrapolate biological functions and potential involvement in disease processes from lists of differentially regulated genes. Availability and Implementation: EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/. The gene set collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEA/.
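One simple way to combine results from several enrichment methods into a collective score is to rank the gene sets within each method and average the ranks. The Python sketch below illustrates that generic rank-aggregation idea only; it is an assumption made for illustration, not a description of how EGSEA computes its scores, and the method and gene set names are hypothetical.

```python
from statistics import mean

def average_rank(pvalues_by_method):
    """Rank gene sets within each method by p-value, then average the ranks.

    pvalues_by_method: {method: {gene_set: p_value}}, with the same gene sets
    scored by every method. Returns (average rank, gene_set) pairs, best first.
    """
    ranks = {}
    for pvals in pvalues_by_method.values():
        # Ties are broken alphabetically so the ordering is deterministic.
        ordered = sorted(pvals, key=lambda g: (pvals[g], g))
        for r, g in enumerate(ordered, start=1):
            ranks.setdefault(g, []).append(r)
    return sorted((mean(rs), g) for g, rs in ranks.items())

# Hypothetical p-values from three hypothetical methods.
results = {
    "method_a": {"set1": 0.01, "set2": 0.20, "set3": 0.50},
    "method_b": {"set1": 0.03, "set2": 0.02, "set3": 0.90},
    "method_c": {"set1": 0.001, "set2": 0.30, "set3": 0.10},
}
for score, gene_set in average_rank(results):
    print(gene_set, round(score, 2))
```

Averaging ranks rather than raw p-values keeps any single method with unusually small p-values from dominating the combined ordering.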


In-depth transcriptomic analysis of human retina reveals molecular mechanisms underlying diabetic retinopathy

Scientific Reports ◽

10.1038/s41598-021-88698-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Kolja Becker ◽

Holger Klein ◽

Eric Simon ◽

Coralie Viollet ◽

Christian Haslinger ◽

...

Keyword(s):

Diabetic Retinopathy ◽

Rna Sequencing ◽

Molecular Mechanisms ◽

Vision Loss ◽

Ganglion Cells ◽

Expression Profiles ◽

Cell Types ◽

Sequencing Data ◽

Disease Stages ◽

Post Transcriptional Regulation

Abstract Diabetic Retinopathy (DR) is among the major global causes of vision loss. With the rise in diabetes prevalence, an increase in DR incidence is expected. Current understanding of both the molecular etiology and pathways involved in the initiation and progression of DR is limited. Via RNA-Sequencing, we analyzed mRNA and miRNA expression profiles of 80 human post-mortem retinal samples from 43 patients diagnosed with various stages of DR. We found differentially expressed transcripts to be predominantly associated with late stage DR and pathways such as hippo and gap junction signaling. A multivariate regression model identified transcripts with progressive changes throughout disease stages, which in turn displayed significant overlap with sphingolipid and cGMP–PKG signaling. Combined analysis of miRNA and mRNA expression further uncovered disease-relevant miRNA/mRNA associations as potential mechanisms of post-transcriptional regulation. Finally, integrating human retinal single cell RNA-Sequencing data revealed a continuous loss of retinal ganglion cells, and Müller cell-mediated changes in histidine and β-alanine signaling. While previously considered primarily a vascular disease, attention in DR has shifted to additional mechanisms and cell types. Our findings offer an unprecedented and unbiased insight into molecular pathways and cell-specific changes in the development of DR, and provide potential avenues for future therapeutic intervention.


scSorter: assigning cells to known cell types according to marker genes

Genome Biology ◽

10.1186/s13059-021-02281-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Hongyu Guo ◽

Jun Li

Keyword(s):

Real Data ◽

Cell Types ◽

Exact Expression ◽

Marker Genes ◽

Specific Marker ◽

Sequencing Data ◽

Reference Dataset ◽

Over Expression ◽

Higher Power ◽

Cell Type Specific

Abstract On single-cell RNA-sequencing data, we consider the problem of assigning cells to known cell types, assuming that the identities of cell-type-specific marker genes are given but their exact expression levels are unavailable, that is, without using a reference dataset. Based on an observation that the expected over-expression of marker genes is often absent in a non-negligible proportion of cells, we develop a method called scSorter. scSorter allows marker genes to express at a low level and borrows information from the expression of non-marker genes. On both simulated and real data, scSorter shows much higher power compared to existing methods.
