CAGEd-oPOSSUM: motif enrichment analysis from CAGE-derived TSSs

Mapping Intimacies ◽

10.1101/040667 ◽

2016 ◽

Cited By ~ 1

Author(s):

David J. Arenillas ◽

Alistair R.R. Forrest ◽

Hideya Kawaji ◽

Timo Lassmann ◽

Wyeth W. Wasserman ◽

...

Keyword(s):

Large Scale ◽

Enrichment Analysis ◽

Cell Types ◽

Specific Cell ◽

Data Sets ◽

Transcription Start Sites ◽

Supplementary Material ◽

Supplementary Text ◽

Cap Analysis ◽

Genomic Regions

Summary: With the emergence of large-scale Cap Analysis of Gene Expression (CAGE) data sets from individual labs and the FANTOM consortium, one can now analyze the cis-regulatory regions associated with gene transcription at an unprecedented level of refinement. By coupling transcription factor binding site (TFBS) enrichment analysis with CAGE-derived genomic regions, CAGEd-oPOSSUM can identify TFs that act as key regulators of genes involved in specific mammalian cell and tissue types. The web tool allows for the analysis of CAGE-derived transcription start sites (TSSs) either provided by the user or selected from ~1,300 mammalian samples from the FANTOM5 project, with pre-computed TFBSs predicted using JASPAR TF binding profiles. The tool helps power insights into the regulation of genes through the study of the specific usage of TSSs within specific cell types and/or under specific conditions.

Availability and implementation: The CAGEd-oPOSSUM web tool is implemented in Perl, MySQL, and Apache and is available at http://cagedop.cmmt.ubc.ca/CAGEd_oPOSSUM.

Supporting Information: Supplementary Text, Figures, and Data are available online at bioRxiv.
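The abstract does not state which statistic is used for TFBS enrichment. As a minimal sketch, assuming a one-sided hypergeometric (Fisher-type) test on the number of regions that contain at least one predicted binding site for a given TF, the comparison of target regions against background regions could look like the following. The function name and its parameters are hypothetical and are not part of the CAGEd-oPOSSUM tool.

```python
from math import comb


def hypergeom_enrichment_p(target_hits, target_total, background_hits, background_total):
    """One-sided hypergeometric p-value for motif over-representation.

    Hypothetical helper: counts are numbers of regions with at least one
    predicted TFBS for a single TF, in the target and background sets.
    Returns P(X >= target_hits) when target_total regions are drawn at
    random from the pooled target + background regions.
    """
    pooled_total = target_total + background_total
    pooled_hits = target_hits + background_hits
    # The target set cannot contain more hit regions than exist overall.
    max_hits = min(pooled_hits, target_total)
    denominator = comb(pooled_total, target_total)
    tail = sum(
        comb(pooled_hits, k) * comb(pooled_total - pooled_hits, target_total - k)
        for k in range(target_hits, max_hits + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Toy counts: 40 of 100 target regions vs 100 of 1000 background regions.
    print(hypergeom_enrichment_p(40, 100, 100, 1000))
```

A small p-value here would only suggest that the motif occurs in more target regions than expected by chance under this simplified model; the real tool may score enrichment differently.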


SCISSOR™: a single-cell inferred site-specific omics resource for tumor microenvironment association study

NAR Cancer ◽

10.1093/narcan/zcab037 ◽

2021 ◽

Vol 3 (3) ◽

Author(s):

Xiang Cui ◽

Fei Qin ◽

Xuanxuan Yu ◽

Feifei Xiao ◽

Guoshuai Cai

Keyword(s):

Tumor Microenvironment ◽

Single Cell ◽

Clinical Outcomes ◽

Large Scale ◽

Cell Types ◽

Cell Interaction ◽

Specific Cell ◽

Dynamic Visualization ◽

Tissue Specific ◽

Cell Composition

Tumor tissues are heterogeneous, with different cell types in the tumor microenvironment that play an important role in tumorigenesis and tumor progression. Several computational algorithms and tools have been developed to infer cell composition from bulk transcriptome profiles. However, they ignore tissue specificity, and thus a new resource of tissue-specific cell transcriptomic references is needed for inferring cell composition in the tumor microenvironment and exploring its association with clinical outcomes and tumor omics. In this study, we developed SCISSOR™ (https://thecailab.com/scissor/), an online open resource that fulfills this demand by integrating five orthogonal omics data of >6031 large-scale bulk samples, patient clinical outcomes and 451 917 high-granularity tissue-specific single-cell transcriptomic profiles of 16 cancer types. SCISSOR™ provides five major analysis modules that enable flexible modeling with adjustable parameters and dynamic visualization approaches. SCISSOR™ is valuable as a new resource for promoting research on tumor heterogeneity and tumor–tumor microenvironment cell interaction, by delineating cells in the tissue-specific tumor microenvironment and characterizing their associations with tumor omics and clinical outcomes.


tmod: an R package for general and multivariate enrichment analysis

10.7287/peerj.preprints.2420 ◽

2016 ◽

Cited By ~ 3

Author(s):

January Weiner 3rd ◽

Teresa Domaszewska

Keyword(s):

Statistical Methods ◽

Statistical Tests ◽

Disease Process ◽

Enrichment Analysis ◽

R Package ◽

Multivariate Techniques ◽

Specific Cell ◽

Data Sets ◽

Complex Data ◽

Feature Sets

“Omics” studies generate long lists of genes, proteins, metabolites or other features which can be difficult to decipher. Feature set enrichment analysis utilizing annotated groups/classes of features (such as pathways, gene ontology terms or gene/metabolic modules) can provide a powerful gateway to associate data with phenotypes such as disease process or treatment progression. At the same time, the increasing use of technologies to generate multidimensional omics data sets based on specific cell types or responses to stimuli increases the number and breadth of annotated feature sets available for enrichment analysis, making it easier to draw biologically relevant conclusions. However, existing tools and applications for enrichment analysis are adapted specifically to gene set enrichment and lack functionalities to analyze rapidly growing amounts of metabolomics and other data. Moreover, such tools often provide only a limited range of statistical methods, rely on permutation tests, lack suitable visualization tools to facilitate result interpretation in complex experimental setups, and lack standalone versions usable in semi-automated workflows. Here, we present tmod, an R package which implements powerful statistical methods for enrichment analysis. tmod includes definitions of widely used feature sets for transcriptomic and metabolomic profiling and also allows use of custom user-provided feature sets. Moreover, it provides novel and intuitive visualization methods which facilitate interpretation of complex data sets. The implemented statistical tests allow the significance of enrichment within sorted feature lists to be calculated without randomization tests and thus are suitable for combining functional analysis with multivariate techniques.
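The abstract does not name the tests it implements. One way to compute enrichment significance from a sorted feature list without permutations is to combine the ranks of a set's members with Fisher's method; the sketch below assumes that approach purely for illustration, and it may differ from what tmod actually does. It is written in Python rather than R, and the function name and inputs are hypothetical.

```python
from math import exp, log


def rank_enrichment_p(ranked_features, feature_set):
    """Rank-based enrichment p-value without randomization.

    Assumption for illustration: under the null, the scaled rank r/N of each
    set member is treated as a uniform p-value, and the m members are
    combined with Fisher's method: -2 * sum(ln(r/N)) ~ chi-square, 2m df.
    """
    members = set(feature_set)
    n_total = len(ranked_features)
    # 1-based ranks of the set members within the sorted list.
    ranks = [i + 1 for i, f in enumerate(ranked_features) if f in members]
    if not ranks:
        return 1.0
    statistic = -2.0 * sum(log(r / n_total) for r in ranks)
    # Closed-form chi-square survival function for an even number of
    # degrees of freedom (2m): exp(-x/2) * sum_{k<m} (x/2)^k / k!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for k in range(1, len(ranks)):
        term *= half / k
        total += term
    return exp(-half) * total


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(1, 101)]  # g1 is the top-ranked feature
    print(rank_enrichment_p(genes, {"g1", "g2", "g5"}))     # members near the top
    print(rank_enrichment_p(genes, {"g60", "g75", "g90"}))  # members near the bottom
```

Because the null distribution is analytic, the p-value depends only on the ordering of the list, which is what makes this kind of test easy to apply to any ranking, for example one derived from a multivariate model.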


Methods for Automatic Reference Trees and Multilevel Phylogenetic Placement

10.1101/299792 ◽

2018 ◽

Cited By ~ 1

Author(s):

Lucas Czech ◽

Alexandros Stamatakis

Keyword(s):

Large Scale ◽

Sequence Data ◽

Sequence Similarity ◽

Computational Effort ◽

Supplementary Information ◽

Data Sets ◽

Metagenomic Sequencing ◽

Sequencing Studies ◽

Manual Selection ◽

Supplementary Material

Motivation: In most metagenomic sequencing studies, the initial analysis step consists of assessing the evolutionary provenance of the sequences. Phylogenetic (or Evolutionary) Placement methods can be employed to determine the evolutionary position of sequences with respect to a given reference phylogeny. These placement methods do, however, face certain limitations: the manual selection of reference sequences is labor-intensive; the computational effort to infer reference phylogenies is substantially larger than for methods that rely on sequence similarity; and the number of taxa in the reference phylogeny should be small enough to allow for visual inspection of the results.

Results: We present algorithms to overcome the above limitations. First, we introduce a method to automatically construct representative sequences from databases to infer reference phylogenies. Second, we present an approach for conducting large-scale phylogenetic placements on nested phylogenies. Third, we describe a preprocessing pipeline that allows for handling huge sequence data sets. Our experiments on empirical data show that our methods substantially accelerate the workflow and yield highly accurate placement results.

Implementation: Freely available under GPLv3 at http://github.com/lczech/[email protected]

Supplementary Information: Supplementary data are available at Bioinformatics online.


Bulk and Single-Cell Transcriptomics Identify Tobacco-Use Disparity in Lung Gene Expression of ACE2, the Receptor of 2019-nCov

10.20944/preprints202002.0051.v2 ◽

2020 ◽

Cited By ~ 6

Author(s):

Guoshuai Cai

Keyword(s):

Gene Expression ◽

Single Cell ◽

Large Scale ◽

Cell Types ◽

Smoking History ◽

Normal Lung ◽

Specific Cell ◽

Susceptible Population ◽

Former Smokers ◽

Ace2 Gene

In the current severe global emergency of the 2019-nCov outbreak, it is imperative to identify vulnerable and susceptible groups for effective protection and care. Recent studies found that 2019-nCov and SARS-nCov share the same receptor, ACE2. In this study, we analyzed four large-scale bulk transcriptomic datasets of normal lung tissue and two single-cell transcriptomic datasets to investigate disparities related to race, age, gender and smoking status in ACE2 gene expression and its distribution among cell types. We did not find significant disparities in ACE2 gene expression between racial groups (Asian vs Caucasian), age groups (>60 vs <60) or gender groups (male vs female). However, we observed significantly higher ACE2 gene expression in the lungs of former smokers than in those of non-smokers. We also found higher ACE2 gene expression in Asian current smokers than in non-smokers, but not in Caucasian current smokers, which may indicate a gene-smoking interaction. In addition, we found that the ACE2 gene is expressed in specific cell types related to smoking history and location. In bronchial epithelium, ACE2 is actively expressed in goblet cells of current smokers and club cells of non-smokers. In alveoli, ACE2 is actively expressed in remodelled AT2 cells of former smokers. Together, this study indicates that smokers, especially former smokers, may be more susceptible to 2019-nCov and have infection paths different from those of non-smokers. Thus, smoking history may provide valuable information for identifying susceptible populations and standardizing treatment regimens.


Influence of N6-Methyladenosine Modification Gene HNRNPC on Cell Phenotype in Parkinson’s Disease

Parkinson's Disease ◽

10.1155/2021/9919129 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Wei Quan ◽

Jia Li ◽

Li Liu ◽

Qinghui Zhang ◽

Yidan Qin ◽

...

Keyword(s):

Parkinson’s Disease ◽

Pc12 Cells ◽

Enrichment Analysis ◽

Cell Types ◽

Inflammatory Factors ◽

Cell Phenotype ◽

Data Sets ◽

And Function ◽

Experimental Group

This study aimed to explore the N6-methyladenosine (m6A) modification genes involved in the pathogenesis of Parkinson’s disease (PD) through analysis of the two data sets GSE120306 and GSE22491 in the GEO database, and to further explore their influence on cell phenotype in PD. We performed differential expression and functional enrichment analyses on the two data sets and found that the expression of the m6A-modification gene HNRNPC was significantly downregulated in the PD group, and that it played an important role in DNA metabolism, RNA metabolism, and RNA processing and may be involved in PD. Then, we constructed a cell line with differential HNRNPC expression to study the role of this gene in the pathogenesis of PD. The results showed that overexpression of HNRNPC can promote the proliferation of PC12 cells, inhibit their apoptosis, and inhibit the expression of the inflammatory factors IFN-β, IL-6, and TNF-α, suggesting that HNRNPC may cause PD by inhibiting the proliferation of dopaminergic nerve cells, promoting their apoptosis, and causing immune inflammation. Our study also has certain limitations. For example, the data of the experimental group and the validation group come from different cell types, and the data of the experimental group involve individuals with G2019S LRRK2 mutations. In addition, due to the low expression of HNRNPC in PC12 cells, we overexpressed this gene to study its function. All these factors may bias our conclusions. Therefore, more research is still needed to corroborate these findings in the future.


Altered cell and RNA isoform diversity in aging Down syndrome brains

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2114326118 ◽

2021 ◽

Vol 118 (47) ◽

pp. e2114326118

Author(s):

Carter R. Palmer ◽

Christine S. Liu ◽

William J. Romanow ◽

Ming-Hsiang Lee ◽

Jerold Chun

Keyword(s):

Down Syndrome ◽

Large Scale ◽

Cell Types ◽

Chromosome 21 ◽

Specific Cell ◽

Sequencing Technologies ◽

Isoform Diversity ◽

Long Read ◽

Single Nucleus ◽

Altered Cell

Down syndrome (DS), trisomy of human chromosome 21 (HSA21), is characterized by lifelong cognitive impairments and the development of the neuropathological hallmarks of Alzheimer’s disease (AD). The cellular and molecular modifications responsible for these effects are not understood. Here we performed single-nucleus RNA sequencing (snRNA-seq) employing both short- (Illumina) and long-read (Pacific Biosciences) sequencing technologies on a total of 29 DS and non-DS control prefrontal cortex samples. In DS, the ratio of inhibitory-to-excitatory neurons was significantly increased, which was not observed in previous reports examining sporadic AD. DS microglial transcriptomes displayed AD-related aging and activation signatures in advance of AD neuropathology, with increased microglial expression of C1q complement genes (associated with dendritic pruning) and the HSA21 transcription factor gene RUNX1. Long-read sequencing detected vast RNA isoform diversity within and among specific cell types, including numerous sequences that differed between DS and control brains. Notably, over 8,000 genes produced RNAs containing intra-exonic junctions, including amyloid precursor protein (APP) that had previously been associated with somatic gene recombination. These and related results illuminate large-scale cellular and transcriptomic alterations as features of the aging DS brain.


Large-scale chromatin organization of the major histocompatibility complex and other regions of human chromosome 6 and its response to interferon in interphase nuclei

Journal of Cell Science ◽

10.1242/jcs.113.9.1565 ◽

2000 ◽

Vol 113 (9) ◽

pp. 1565-1576 ◽

Cited By ~ 12

Author(s):

E.V. Volpi ◽

E. Chevret ◽

T. Jones ◽

R. Vatcheva ◽

J. Williamson ◽

...

Keyword(s):

Major Histocompatibility Complex ◽

Large Scale ◽

Gene Clusters ◽

Cell Types ◽

Chromatin Organization ◽

Chromosome 6 ◽

Chromatin Loop ◽

Major Histocompatibility ◽

Histocompatibility Complex ◽

Genomic Regions

The large-scale chromatin organization of the major histocompatibility complex and other regions of chromosome 6 was studied by three-dimensional image analysis in human cell types with major differences in transcriptional activity. Entire gene clusters were visualized by fluorescence in situ hybridization with multiple locus-specific probes. Individual genomic regions showed distinct configurations in relation to the chromosome 6 territory. Large chromatin loops containing several megabases of DNA were observed extending outwards from the surface of the domain defined by the specific chromosome 6 paint. The frequency with which a genomic region was observed on an external chromatin loop was cell type dependent and appeared to be related to the number of active genes in that region. Transcriptional up-regulation of genes in the major histocompatibility complex by interferon-gamma led to an increase in the frequency with which this large gene cluster was found on an external chromatin loop. Our data are consistent with an association between large-scale chromatin organization of specific genomic regions and their transcriptional status.


Building an RNA Sequencing Transcriptome of the Central Nervous System

The Neuroscientist ◽

10.1177/1073858415610541 ◽

2016 ◽

Vol 22 (6) ◽

pp. 579-592 ◽

Cited By ~ 12

Author(s):

Xiaomin Dong ◽

Yanan You ◽

Jia Qian Wu

Keyword(s):

Gene Expression ◽

Central Nervous System ◽

Nervous System ◽

Rna Sequencing ◽

Large Scale ◽

Expression Profiles ◽

Cell Types ◽

Specific Cell ◽

Rna Seq ◽

The Central Nervous System

The composition and function of the central nervous system (CNS) are extremely complex. In addition to hundreds of subtypes of neurons, other cell types, including glia (astrocytes, oligodendrocytes, and microglia) and vascular cells (endothelial cells and pericytes), also play important roles in CNS function. Such heterogeneity makes the study of gene transcription in the CNS challenging. Transcriptomic studies, namely the analyses of the expression levels and structures of all genes, are essential for interpreting the functional elements and understanding the molecular constituents of the CNS. Microarrays have been the predominant method for large-scale gene expression profiling in the past. However, RNA-sequencing (RNA-Seq) technology developed in recent years has many advantages over microarrays, and has enabled building more quantitative, accurate, and comprehensive transcriptomes of the CNS and other systems. The discovery of novel genes, diverse alternative splicing events, and noncoding RNAs has remarkably expanded the complexity of gene expression profiles and will help us to understand intricate neural circuits. Here, we discuss the procedures and advantages of RNA-Seq technology in mammalian CNS transcriptome construction, and review the approaches to sample collection as well as recent progress in building RNA-Seq-based transcriptomes from tissue samples and specific cell types.


Biological data annotation via a human-augmenting AI-based labeling system

npj Digital Medicine ◽

10.1038/s41746-021-00520-6 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Douwe van der Wal ◽

Iny Jhun ◽

Israa Laklouk ◽

Jeff Nirschl ◽

Lara Richer ◽

...

Keyword(s):

Deep Learning ◽

Large Scale ◽

Microscopic Analysis ◽

Image Data ◽

Cell Types ◽

Biological Data ◽

Data Sets ◽

Data Set ◽

Data Annotation ◽

Labeling System

Biology has become a prime area for the deployment of deep learning and artificial intelligence (AI), enabled largely by the massive data sets that the field can generate. Key to most AI tasks is the availability of a sufficiently large, labeled data set with which to train AI models. In the context of microscopy, it is easy to generate image data sets containing millions of cells and structures. However, it is challenging to obtain large-scale high-quality annotations for AI models. Here, we present HALS (Human-Augmenting Labeling System), a human-in-the-loop data labeling AI, which begins uninitialized and learns annotations from a human, in real-time. Using a multi-part AI composed of three deep learning models, HALS learns from just a few examples and immediately decreases the workload of the annotator, while increasing the quality of their annotations. Using a highly repetitive use-case (annotating cell types) and running experiments with seven pathologists (experts at the microscopic analysis of biological specimens), we demonstrate a manual work reduction of 90.60%, and an average data-quality boost of 4.34%, measured across four use-cases and two tissue stain types.


An Unsupervised Graph Embeddings Approach to Multiplex Immunofluorescence Image Exploration

10.1101/2021.06.09.447654 ◽

2021 ◽

Author(s):

Christopher Innocenti ◽

Zhenning Zhang ◽

Balaji Selvaraj ◽

Isabelle Gaffney ◽

Michalis Frangos ◽

...

Keyword(s):

Cell Types ◽

Graph Representation ◽

High Dimensionality ◽

Specific Cell ◽

Data Sets ◽

Complex Interactions ◽

Exploration Tool ◽

The Right ◽

Image Exploration ◽

User Friendly

Understanding the complex biology of the tumor microenvironment (TME) is necessary to understand the mechanisms of action of immuno-oncology therapies and to match the right therapies to the right patients. Multiplex immunofluorescence (mIF) is a useful technology that has tremendous potential to further our understanding of cancer patho-biology; however, tools that fully leverage the high dimensionality of this data are still in their infancy. We describe here a novel deep learning pipeline designed to allow Graph-based Inspection of Tissues via Embeddings, GraphITE. GraphITE transforms mIF data into a graph representation, where unsupervised learning algorithms can be utilised to generate embeddings representing cellular "neighbourhoods". The embeddings can be downprojected and explored for clustering analysis, and patterns can be mapped back to the image as well as interrogated for phenotypical, morphological, or structural distinctiveness. GraphITE supports the extraction of information not only on the phenotypes of individual cells or the relationships between specific cell types, but can also characterize cell neighborhoods to look for more complex interactions, thereby allowing pathologists and data scientists to explore mIF data sets and uncover patterns that are otherwise obscured by the high dimensionality of the data. In this work, we showcase the current setup of the system, going from raw input data all the way to a user-friendly exploration tool. Using this tool, we show how the data can be navigated in a way not previously possible.
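The abstract does not describe how the graph is constructed from mIF data. A minimal sketch, assuming that each segmented cell is a node located at its (x, y) centroid and that each cell is linked to its k nearest neighbours, could look like this. The function name, the choice of k, and the toy coordinates are hypothetical and are not taken from GraphITE.

```python
from math import dist


def knn_cell_graph(centroids, k=3):
    """Build an undirected k-nearest-neighbour graph over cell centroids.

    Hypothetical helper: `centroids` is a list of (x, y) tuples, one per
    cell. Returns a sorted list of undirected edges (i, j) with i < j.
    Brute-force O(n^2 log n); fine for a sketch, not for whole-slide images.
    """
    edges = set()
    for i, point in enumerate(centroids):
        # Indices of all other cells, ordered by Euclidean distance to cell i.
        neighbours = sorted(
            (j for j in range(len(centroids)) if j != i),
            key=lambda j: dist(point, centroids[j]),
        )[:k]
        for j in neighbours:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


if __name__ == "__main__":
    cells = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(knn_cell_graph(cells, k=1))
```

Node features (for example marker intensities per cell) would then be attached to this graph before any embedding step; that part is omitted here because the abstract gives no detail on it.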
