Exploratory bioinformatics analysis reveals importance of “junk” DNA in early embryo development

2016 ◽  
Author(s):  
Steven Xijin Ge

Abstract

Background: Instead of testing predefined hypotheses, the goal of exploratory data analysis (EDA) is to find what data can tell us. Following this strategy, we re-analyzed a large body of genomic data to investigate how the early mouse embryos develop from fertilized eggs through a complex, poorly understood process.

Results: Starting with a single-cell RNA-seq dataset of 259 mouse embryonic cells from zygote to blastocyst stages, we reconstructed the temporal and spatial dynamics of gene expression. Our analyses revealed similarities in the expression patterns of regular genes and those of retrotransposons, and the enrichment of transposable elements in the promoters of corresponding genes. Long Terminal Repeats (LTRs) are associated with transient, strong induction of many nearby genes at the 2-4 cell stages, probably by providing binding sites for Obox and other homeobox factors. The presence of B1 and B2 SINEs (Short Interspersed Nuclear Elements) in promoters is highly correlated with broad upregulation of intracellular genes in a dosage- and distance-dependent manner. Such enhancer-like effects are also found for human Alu and bovine tRNA SINEs. Promoters for genes specifically expressed in embryonic stem cells (ESCs) are rich in B1 and B2 SINEs, but low in CpG islands.

Conclusions: Our results provide evidence that transposable elements may play a significant role in establishing the expression landscape in early embryos and stem cells. This study also demonstrates that open-ended, exploratory analysis aimed at a broad understanding of a complex process can pinpoint specific mechanisms for further study.

Major findings
- Single-cell RNA-seq data enables estimation of retrotransposon expression during PD
- Similar expression dynamics of retrotransposons and regular genes during PD
- Long terminal repeats may be essential for the 1st wave of gene expression
- Obox homeobox factors are possible regulators of PD, upstream of Zscan4
- SINE repeats predict expression of nearby genes in murine, human and bovine embryos
- Exploratory analysis of large single-cell data pinpoints developmental pathways

Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 4101-4101
Author(s):  
Stephen S. Chung ◽  
Priyanka Vijay ◽  
Diana L. Stern ◽  
Deirdre O'Sullivan ◽  
Virginia M. Klimek ◽  
...  

Abstract The myelodysplastic syndromes (MDS) arise in and are maintained by hematopoietic stem cells (HSCs). Serial sampling of patients treated with DNA methyltransferase inhibitors (DNMTIs) and lenalidomide has demonstrated that disease HSCs (MDS HSCs) persist at significant levels even in patients achieving complete clinical and cytogenetic responses. As MDS HSCs are the functional unit of clonal selection both during therapy and subsequent disease progression, we hypothesized that the molecular heterogeneity of MDS HSCs may underlie therapeutic resistance. We therefore sought to perform single cell RNA-sequencing (RNA-seq) on MDS HSCs from patients with known responses to therapy, with the intention of identifying novel therapeutic vulnerabilities. To characterize MDS HSC heterogeneity, we FACS-purified HSCs (Lin-CD34+CD38-CD90+CD45RA-) from paired bone marrow (BM) specimens taken from four MDS patients before and after two to four 28-day cycles of the DNMTI decitabine, as well as two patients who were not treated due to stable disease, and two normal age matched controls. Specimens from both responding and non-responding patients were included. We captured and sequenced a total of 869 single cells from 14 samples, sequencing to an average depth of 4.8 million reads. In a subset of samples (n=7) we also performed bulk RNA-seq (average 1500 cells) for comparison. The sequencing data was of high quality, with an average of 80% mapped reads. We confirmed our ability to accurately quantify transcript levels using ERCC spike-in controls, observing a linear correlation between expected concentration and observed FPKM (fragments per kilobase per million). Single cell RNA-seq revealed vast intratumoral heterogeneity in MDS HSCs that was otherwise missed by bulk RNA-seq, as evidenced by the presence of transcripts variably expressed among cells from the same specimen (Fig. 1A). 
Despite this intratumoral heterogeneity, single cell transcriptomes were able to completely separate individual MDS patients using principal components analysis and hierarchical clustering, consistent with the known heterogeneity of MDS. MDS HSCs further clustered separately from normal age-matched HSCs, with the top 10% of genes contributing to this separation enriched for Gene Ontology (GO) categories including pathways implicated in MDS biology such as "mRNA splicing," "nonsense mediated decay," and "P53 mediated DNA Damage Response" (all P<1e-9). Unsupervised hierarchical clustering of all pre-treatment MDS HSCs revealed clustering of cells from responders separately from non-responders (Fig. 1B). Differential gene expression analysis identified a cluster of genes (FDR<0.01) enriched for GO categories including "translational termination," "SRP dependent co-translational protein targeting to membrane," and "nonsense mediated decay" (all P<1e-9). Notably, this cluster included 60 ribosomal proteins, all of which were decreased in non-responders (t-test, P<1e-16), with responders demonstrating levels of expression closer to but still lower than normal controls (t-test, P<1e-3). Thus, defective ribosomal biogenesis, a hallmark of MDS pathogenesis, may also contribute to therapeutic resistance. Finally, within each sample we measured the spread of gene expression using dispersion (log[variance/mean]) within bins based on expression levels, defining variable genes as those with a dispersion >1.75 at a mean FPKM >2. The number of variable genes was highest in normal HSCs (mean=141), followed by responders prior to treatment (mean=80), and lowest in non-responders prior to treatment (mean=9.5). We speculate that the low number of variable genes in non-responders reflects a higher degree of clonal dominance. 
All post-treatment MDS HSCs demonstrated a relatively low number of variable genes (mean=25), suggesting that therapy induces clonal selection. In sum, our data illustrate the robustness of single cell RNA-seq to define the intrinsic variability of individual MDS HSCs, implicating perturbed ribosomal biogenesis and transcriptional variability as novel predictors of response to therapy. As we expand our data set with additional patients, we expect to identify additional pathways that mediate therapeutic response and resistance, as well as mutations and variably expressed genes that are selected for during therapy and drive disease progression.

Figure 1.

Disclosures: No relevant conflicts of interest to declare.
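The variable-gene definition in the abstract above (dispersion = log[variance/mean], kept when dispersion >1.75 at mean FPKM >2) can be sketched as follows. This is a simplified illustration and not the authors' pipeline: it assumes a natural logarithm and the sample variance, and it omits the binning by expression level that the abstract mentions. The function name and the input layout (a dict of per-cell FPKM values per gene) are hypothetical.

```python
import math
from statistics import mean, variance


def variable_genes(fpkm_by_gene, min_dispersion=1.75, min_mean=2.0):
    """Return {gene: dispersion} for genes passing both thresholds.

    fpkm_by_gene maps a gene name to the list of FPKM values observed
    across the single cells of one sample (hypothetical layout).
    Dispersion is log(variance / mean); natural log is an assumption.
    """
    selected = {}
    for gene, values in fpkm_by_gene.items():
        mu = mean(values)
        var = variance(values)  # sample variance (n - 1 denominator)
        # Skip genes below the expression cutoff or with no spread,
        # since log(0) is undefined.
        if mu <= min_mean or var == 0:
            continue
        dispersion = math.log(var / mu)
        if dispersion > min_dispersion:
            selected[gene] = dispersion
    return selected
```

Counting the keys of the returned dict per sample would give a number comparable in spirit to the per-group means quoted above, although the exact values depend on the binning step left out here.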


2018 ◽  
Author(s):  
Abhishek K. Sarkar ◽  
Po-Yuan Tung ◽  
John D. Blischak ◽  
Jonathan E. Burnett ◽  
Yang I. Li ◽  
...  

Abstract
Quantification of gene expression levels at the single cell level has revealed that gene expression can vary substantially even across a population of homogeneous cells. However, it is currently unclear what genomic features control variation in gene expression levels, and whether common genetic variants may impact gene expression variation. Here, we take a genome-wide approach to identify expression variance quantitative trait loci (vQTLs). To this end, we generated single cell RNA-seq (scRNA-seq) data from induced pluripotent stem cells (iPSCs) derived from 53 Yoruba individuals. We collected data for a median of 95 cells per individual and a total of 5,447 single cells, and identified 241 mean expression QTLs (eQTLs) at 10% FDR, of which 82% replicate in bulk RNA-seq data from the same individuals. We further identified 14 vQTLs at 10% FDR, but demonstrate that these can also be explained as effects on mean expression. Our study suggests that dispersion QTLs (dQTLs) which could alter the variance of expression independently of the mean can have larger fold changes, but explain less phenotypic variance than eQTLs. We estimate 424 individuals as a lower bound to achieve 80% power to detect the strongest dQTLs in iPSCs. These results will guide the design of future studies on understanding the genetic control of gene expression variance.

Author summary
Common genetic variation can alter the level of average gene expression in human tissues, and through changes in gene expression have downstream consequences on cell function, human development, and human disease. However, human tissues are composed of many cells, each with its own level of gene expression. With advances in single cell sequencing technologies, we can now go beyond simply measuring the average level of gene expression in a tissue sample and directly measure cell-to-cell variance in gene expression. 
We hypothesized that genetic variation could also alter gene expression variance, potentially revealing new insights into human development and disease. To test this hypothesis, we used single cell RNA sequencing to directly measure gene expression variance in multiple individuals, and then associated the gene expression variance with genetic variation in those same individuals. Our results suggest that effects on gene expression variance are smaller than effects on mean expression, relative to how much the phenotypes vary between individuals, and will require much larger studies than previously thought to detect.
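The two-step idea in the author summary (measure per-individual expression mean and variance from single cells, then associate each with genotype) can be illustrated with a toy sketch. This is not the authors' statistical model: the function names, the input layout, and the use of a plain least-squares slope on genotype dosage (0, 1, 2) are assumptions made only for illustration.

```python
from statistics import mean, variance


def per_individual_stats(cells_by_individual):
    """Map each individual to (mean, sample variance) of one gene's
    expression across that individual's cells (hypothetical layout)."""
    return {ind: (mean(v), variance(v)) for ind, v in cells_by_individual.items()}


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def mean_and_variance_effects(cells_by_individual, dosage_by_individual):
    """Return (slope of mean on dosage, slope of variance on dosage)."""
    stats = per_individual_stats(cells_by_individual)
    individuals = sorted(stats)
    dosage = [dosage_by_individual[i] for i in individuals]
    means = [stats[i][0] for i in individuals]
    variances = [stats[i][1] for i in individuals]
    return ols_slope(dosage, means), ols_slope(dosage, variances)
```

Reporting both slopes side by side reflects the abstract's caution that an apparent effect on variance may also be explainable as an effect on mean expression; a real analysis would need to model that dependence explicitly.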


2018 ◽  
Author(s):  
Daniel Alpern ◽  
Vincent Gardeux ◽  
Julie Russeil ◽  
Bart Deplancke

ABSTRACT
Genome-wide gene expression analyses by RNA sequencing (RNA-seq) have quickly become a standard in molecular biology because of the widespread availability of high throughput sequencing technologies. While powerful, RNA-seq still has several limitations, including the time and cost of library preparation, which makes it difficult to profile many samples simultaneously. To deal with these constraints, the single-cell transcriptomics field has implemented the early multiplexing principle, making the library preparation of hundreds of samples (cells) markedly more affordable. However, the current standard methods for bulk transcriptomics (such as TruSeq Stranded mRNA) remain expensive, and relatively little effort has been invested to develop cheaper, but equally robust methods. Here, we present a novel approach, Bulk RNA Barcoding and sequencing (BRB-seq), that combines the multiplexing-driven cost-effectiveness of a single-cell RNA-seq workflow with the performance of a bulk RNA-seq procedure. BRB-seq produces 3’ enriched cDNA libraries that exhibit similar gene expression quantification to TruSeq and that maintain this quality, also in terms of number of detected differentially expressed genes, even with low quality RNA samples. We show that BRB-seq is about 25 times less expensive than TruSeq, enabling the generation of ready to sequence libraries for up to 192 samples in a day with only 2 hours of hands-on time. We conclude that BRB-seq constitutes a powerful alternative to TruSeq as a standard bulk RNA-seq approach. Moreover, we anticipate that this novel method will eventually replace RT-qPCR-based gene expression screens given its capacity to generate genome-wide transcriptomic data at a cost that is comparable to profiling 4 genes using RT-qPCR.

Software
We developed a suite of open source tools (BRB-seqTools) to aid with processing BRB-seq data and generating count matrices that are used for further analyses. This suite can perform demultiplexing, generate count/UMI matrices and trim BRB-seq constructs and is freely available at http://github.com/DeplanckeLab/BRB-seqTools

Highlights
- Rapid (~2h hands on time) and low-cost approach to perform transcriptomics on hundreds of RNA samples
- Strand specificity preserved
- Performance: number of detected genes is equal to Illumina TruSeq Stranded mRNA at same sequencing depth
- High capacity: low cost allows increasing the number of biological replicates
- Produces reliable data even with low quality RNA samples (down to RIN value = 2)
- Complete user-friendly sequencing data pre-processing and analysis pipeline allowing result acquisition in a day
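To make the demultiplexing and UMI-counting step mentioned above more concrete, here is a generic sketch of the idea. It does not use or reproduce BRB-seqTools: the function name, the read representation as (barcode, UMI, gene) tuples, and the exact-match barcode lookup are all assumptions chosen for illustration.

```python
from collections import defaultdict


def umi_count_matrix(reads, barcode_to_sample):
    """Build a {sample: {gene: unique UMI count}} matrix.

    reads is an iterable of (barcode, umi, gene) tuples, assumed to have
    been extracted and aligned upstream (hypothetical layout).
    barcode_to_sample maps each known sample barcode to a sample name.
    """
    umis_seen = defaultdict(set)
    for barcode, umi, gene in reads:
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            continue  # read carries an unknown barcode; drop it
        # A set collapses duplicate reads that share the same UMI.
        umis_seen[(sample, gene)].add(umi)

    matrix = defaultdict(dict)
    for (sample, gene), umis in umis_seen.items():
        matrix[sample][gene] = len(umis)
    return dict(matrix)
```

Counting distinct UMIs rather than raw reads is the usual motivation for UMI matrices, since it limits the influence of amplification duplicates; real tools typically also tolerate sequencing errors in barcodes and UMIs, which this sketch does not.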


2020 ◽  
Author(s):  
Michael T. Shanahan ◽  
Matt Kanke ◽  
Ajeet P. Singh ◽  
Jonathan W. Villanueva ◽  
Adrian J. McNairn ◽  
...  

Summary
The role of individual miRNAs in small intestinal (SI) epithelial homeostasis is under-explored. In this study, we discovered that miR-375 is among the most enriched miRNAs in intestinal crypts and stem cells (ISCs), especially facultative ISCs. We then showed by multiple manipulations, including CRISPR/Cas9 editing, that miR-375 is strongly suppressed by Wnt-signaling. Single-cell RNA-seq analysis of SI crypt-enriched cells from miR-375 knockout (375-KO) mice revealed elevated numbers of tuft cells and increased expression of pro-proliferative genes in ISCs. Accordingly, the genetic loss of miR-375 promoted resistance to helminth infection and enhanced the regenerative response to irradiation. The conserved effects of miR-375 were confirmed by gain-of-function studies in Drosophila midgut stem cells in vivo. Moreover, functional experiments in enteroids uncovered a regulatory relationship between miR-375 and Yap1 that controls cell survival. Finally, analysis of mouse model and clinical data revealed an inverse association between miR-375 levels and intestinal tumor development.

Highlights
- miR-375 is one of the most enriched miRNAs in ISCs, especially facultative ISCs.
- miR-375 modifies tuft cell abundance and pro-proliferative gene expression in ISCs.
- Loss of miR-375 in mice enhances the host response to helminth infection and crypt regeneration.
- Mouse and human intestinal cancer are associated with reduced miR-375 expression.

eTOC Blurb
Sethupathy and colleagues show that miR-375 is a Wnt-responsive, ISC-enriched miRNA that serves as a brake on intestinal crypt proliferation. They also show that miR-375 modulates tuft cell abundance and pro-proliferative gene expression in ISCs, that miR-375 loss enhances the host response to helminth infection as well as crypt regeneration post-irradiation, and that its reduced expression is associated with intestinal cancer.


2021 ◽  
Vol 2 (2) ◽  
pp. 100426
Author(s):  
Celia Alda-Catalinas ◽  
Melanie A. Eckersley-Maslin ◽  
Wolf Reik

PLoS ONE ◽  
2015 ◽  
Vol 10 (9) ◽  
pp. e0136199 ◽  
Author(s):  
Brian T. Freeman ◽  
Jangwook P. Jung ◽  
Brenda M. Ogle

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Lars Velten ◽  
Benjamin A. Story ◽  
Pablo Hernández-Malmierca ◽  
Simon Raffel ◽  
Daniel R. Leonce ◽  
...  

Abstract
Cancer stem cells drive disease progression and relapse in many types of cancer. Despite this, a thorough characterization of these cells remains elusive and with it the ability to eradicate cancer at its source. In acute myeloid leukemia (AML), leukemic stem cells (LSCs) underlie mortality but are difficult to isolate due to their low abundance and high similarity to healthy hematopoietic stem cells (HSCs). Here, we demonstrate that LSCs, HSCs, and pre-leukemic stem cells can be identified and molecularly profiled by combining single-cell transcriptomics with lineage tracing using both nuclear and mitochondrial somatic variants. While mutational status discriminates between healthy and cancerous cells, gene expression distinguishes stem cells and progenitor cell populations. Our approach enables the identification of LSC-specific gene expression programs and the characterization of differentiation blocks induced by leukemic mutations. Taken together, we demonstrate the power of single-cell multi-omic approaches in characterizing cancer stem cells.


2021 ◽  
Vol 9 (Suppl 3) ◽  
pp. A86-A86
Author(s):  
Paul DePietro ◽  
Mary Nesline ◽  
Yong Hee Lee ◽  
RJ Seager ◽  
Erik Van Roey ◽  
...  

Background: Immune checkpoint inhibitor-based therapies have achieved impressive success in the treatment of several cancer types. Predictive immune biomarkers, including PD-L1, MSI and TMB are well established as surrogate markers for immune evasion and tumor-specific neoantigens across many tumors. Positive detection across cancer types varies, but overall ~50% of patients test negative for these primary immune markers [1]. In this study, we investigated the prevalence of secondary immune biomarkers outside of PD-L1, TMB and MSI.

Methods: Comprehensive genomic and immune profiling, including PD-L1 IHC, TMB, MSI and gene expression of 395 immune related genes was performed on 6078 FFPE tumors representing 34 cancer types, predominantly composed of lung cancer (36.7%), colorectal cancer (11.9%) and breast cancer (8.5%). Expression levels by RNA-seq of 36 genes targeted by immunotherapies in solid tumor clinical trials, identified as secondary immune biomarkers, were ranked against a reference population. Genes with a rank value ≥75th percentile were considered high and values were associated with PD-L1 (positive ≥1%), MSI (MSI-H or MSS) and TMB (high ≥10 Mut/Mb) status. Additionally, secondary immune biomarker status was segmented by tumor type and cancer immune cycle roles.

Results: In total, 41.0% of cases were PD-L1+, 6.4% TMB+, and 0.1% MSI-H. 12.6% of cases were positive for >2 of these markers while 39.9% were triple negative (PD-L1-/TMB-/MSS). Of the PD-L1-/TMB-/MSS cases, 89.1% were high for at least one secondary immune biomarker, with 69.3% having ≥3 markers. PD-L1-/TMB-/MSS tumor types with ≥50% prevalence of high secondary immune biomarkers included brain, prostate, kidney, sarcoma, gallbladder, breast, colorectal, and liver cancer. High expression of cancer testis antigen secondary immune biomarkers (e.g., NY-ESO-1, LAGE-1A, MAGE-A4) was most commonly observed in bladder, ovarian, sarcoma, liver, and prostate cancer (≥15%). Tumors demonstrating T-cell priming (e.g., CD40, OX40, CD137), trafficking (e.g., TGFB1, TLR9, TNF) and/or recognition (e.g., CTLA4, LAG3, TIGIT) secondary immune biomarkers were most represented by kidney, gallbladder, and sarcoma (≥40%), with melanoma, esophageal, head & neck, cervical, stomach, and lung cancer least represented (≥15%).

Conclusions: Our studies show comprehensive tumor profiling that includes gene expression can detect secondary immune biomarkers targeted by investigational therapies in ~90% of PD-L1-/TMB-/MSS cases. While genomic profiling could also provide therapeutic choices for a percentage of these patients, detection of secondary immune biomarkers by RNA-seq provides additional options for patients without a clear therapeutic path as determined by PD-L1 testing and genomic profiling alone.

Reference
1. Huang R S P, Haberberger J, Severson E, et al. A pan-cancer analysis of PD-L1 immunohistochemistry and gene amplification, tumor mutation burden and microsatellite instability in 48,782 cases. Mod Pathol 2021;34:252–263.
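The ranking rule in the Methods above (a gene is "high" when its expression ranks at or above the 75th percentile of a reference population) can be sketched as follows. The percentile definition used here, the share of reference values less than or equal to the sample value, is an assumption, as are the function names and input layout; the abstract does not specify how ranks were computed.

```python
from bisect import bisect_right


def percentile_rank(value, reference):
    """Percent of reference values that are <= value (assumed definition)."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def high_markers(expression, reference_by_gene, cutoff=75.0):
    """Return the sorted list of genes whose percentile rank meets the cutoff.

    expression maps gene -> expression value for one tumor;
    reference_by_gene maps gene -> list of reference-population values.
    Both layouts are hypothetical.
    """
    return sorted(
        gene
        for gene, value in expression.items()
        if percentile_rank(value, reference_by_gene[gene]) >= cutoff
    )
```

A case would then count as high for at least one secondary immune biomarker whenever the returned list is non-empty.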

