SCeQTL: an R package for identifying eQTL from single-cell parallel sequencing data

Mapping Intimacies ◽ 10.1101/499863 ◽ 2018 ◽ Cited By ~ 3

Author(s): Yue Hu ◽ Xuegong Zhang

Keyword(s): Gene Expression ◽ Single Cell ◽ Negative Binomial ◽ Negative Binomial Regression ◽ R Package ◽ eQTL Analysis ◽ Sequencing Data ◽ Parallel Sequencing ◽ Single Cell Sequencing ◽ Cell Data

With the development of single-cell sequencing technologies, parallel sequencing of the transcriptome and genome is becoming available and brings the opportunity to uncover associations between genotype and phenotype at the single-cell level. Because of the special characteristics of single-cell sequencing data, new methods are needed to identify eQTL from single-cell data. We developed an R package, SCeQTL, that uses zero-inflated negative binomial regression to perform eQTL analysis on single-cell data. It can distinguish two types of gene-expression differences among genotype groups. It can also be used to find gene expression variations associated with other grouping factors, such as cell lineages or cell types.
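
To make the zero-inflated negative binomial (ZINB) model named in this abstract more concrete, here is a minimal Python sketch of its probability mass function and log-likelihood. It is not SCeQTL's implementation (the package itself is written in R); the function names and the parameterization by mean `mu`, dispersion `theta`, and zero-inflation probability `pi` are assumptions chosen for illustration.

```python
import math


def nb_log_pmf(k, mu, theta):
    """Log-probability of count k under a negative binomial
    with mean mu > 0 and dispersion (size) theta > 0."""
    return (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )


def zinb_pmf(k, mu, theta, pi):
    """P(K = k) under a zero-inflated negative binomial.

    With probability pi the count is a structural zero; otherwise
    it is drawn from the negative binomial above.
    """
    nb = math.exp(nb_log_pmf(k, mu, theta))
    return pi * (k == 0) + (1.0 - pi) * nb


def zinb_log_likelihood(counts, mu, theta, pi):
    """Log-likelihood of a list of counts under one shared ZINB."""
    return sum(math.log(zinb_pmf(k, mu, theta, pi)) for k in counts)


# Hypothetical counts for one gene in two genotype groups.
group_a = [0, 0, 3, 5, 0, 7, 4]
group_b = [0, 2, 6, 9, 8, 5, 7]
print(zinb_log_likelihood(group_a, mu=5.0, theta=2.0, pi=0.4))
print(zinb_log_likelihood(group_b, mu=5.0, theta=2.0, pi=0.4))
```

In a regression setting, `mu` and `pi` would each depend on genotype, so a group difference could show up in the zero fraction, in the mean of the non-zero component, or in both. Reading the abstract's "two types" of difference this way is an assumption, not a statement from the source.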

Normalization by distributional resampling of high throughput single-cell RNA-sequencing data

10.1101/2020.10.28.359901 ◽ 2020

Author(s): Jared Brown ◽ Zijian Ni ◽ Chitrasen Mohanty ◽ Rhonda Bacher ◽ Christina Kendziorski

Keyword(s): Gene Expression ◽ Single Cell ◽ RNA Sequencing ◽ Negative Binomial ◽ R Package ◽ Sequencing Depth ◽ Sequencing Data ◽ Specific Expression ◽ Single Cell RNA Sequencing ◽ Binomial Mixture

Motivation: Normalization to remove technical or experimental artifacts is critical in the analysis of single-cell RNA-sequencing experiments, even those for which unique molecular identifiers (UMIs) are available. The majority of methods for normalizing single-cell RNA-sequencing data adjust average expression in sequencing depth, but allow the variance and other properties of the gene-specific expression distribution to be non-constant in depth, which often results in reduced power and increased false discoveries in downstream analyses. This problem is exacerbated by the high proportion of zeros present in most datasets.

Results: To address this, we present Dino, a normalization method based on a flexible negative-binomial mixture model of gene expression. As demonstrated in both simulated and case study datasets, by normalizing the entire gene expression distribution, Dino is robust to shallow sequencing depth, sample heterogeneity, and varying zero proportions, leading to improved performance in downstream analyses in a number of settings.

Availability and implementation: The R package, Dino, is available on GitHub at https://github.com/JBrownBiostat/[email protected], [email protected]

484 Bioturing browser: interactively explore public single cell sequencing data

Journal for ImmunoTherapy of Cancer ◽ 10.1136/jitc-2020-sitc2020.0484 ◽ 2020 ◽ Vol 8 (Suppl 3) ◽ pp. A520-A520

Author(s): Son Pham ◽ Tri Le ◽ Tan Phan ◽ Minh Pham ◽ Huy Nguyen ◽ ...

Keyword(s): Single Cell ◽ Immune Cell ◽ Expression Profiles ◽ Meta Analysis ◽ Cell Types ◽ Sequencing Data ◽ Single Cell Sequencing ◽ Data Formats ◽ Cancer Types ◽ Cell Data

Background: Single-cell sequencing technology has opened an unprecedented ability to interrogate cancer. It reveals significant insights into intratumoral heterogeneity, metastasis, and therapeutic resistance, which facilitates target discovery and validation in cancer treatment. With rapid advancements in throughput and strategies, a single immuno-oncology study can produce multi-omics profiles for several thousand individual cells. This overflow of single-cell data poses formidable challenges, including standardizing data formats across studies, reanalyzing individual datasets, and performing meta-analysis.

Methods: N/A

Results: We present BioTuring Browser, an interactive platform for accessing and reanalyzing published single-cell omics data. The platform currently hosts a curated database of more than 10 million cells from 247 projects, covering more than 120 immune cell types and subtypes and 15 different cancer types. All data are processed and annotated with standardized labels of cell types, diseases, therapeutic responses, etc., so that they can be instantly accessed and explored in a uniform visualization and analytics interface. Based on this curated database, BioTuring Browser supports searching for similar expression profiles, querying a target across datasets, and automatic cell type annotation. The platform supports single-cell RNA-seq, CITE-seq and TCR-seq data. BioTuring Browser is now available for download at www.bioturing.com.

Conclusions: N/A

scTree: An R package to generate antibody-compatible classifiers from single-cell sequencing data

The Journal of Open Source Software ◽ 10.21105/joss.02061 ◽ 2020 ◽ Vol 5 (48) ◽ pp. 2061

Author(s): J. Paez ◽ Michael Wendt ◽ Nadia Lanman

Keyword(s): Single Cell ◽ R Package ◽ Sequencing Data ◽ Single Cell Sequencing

Cellsnp-lite: an efficient tool for genotyping single cells

10.1101/2020.12.31.424913 ◽ 2021

Author(s): Xianjie Huang ◽ Yuanhua Huang

Keyword(s): Single Cell ◽ Single Cells ◽ Basic Research ◽ Substantial Improvement ◽ Data Sets ◽ Sequencing Data ◽ Single Cell Sequencing ◽ Memory Efficiency ◽ Computational Speed ◽ Cell Data

Summary: Single-cell sequencing is an increasingly used technology with promising applications in basic research and clinical translation. However, genotyping methods developed for bulk sequencing data have not been well adapted to single-cell data, in terms of both computational parallelization and a simplified user interface. Here we introduce cellsnp-lite, a tool implemented in C/C++ and based on the well-supported package htslib, for genotyping single-cell sequencing data from both droplet-based and well-based platforms. On various experimental data sets, it shows substantial improvement in computational speed and memory efficiency while retaining results highly concordant with existing methods. Cellsnp-lite therefore lightens the genetic analysis of increasingly large single-cell data.

Availability: The source code is freely available at https://github.com/single-cell-genetics/[email protected]

powerEQTL: An R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis

10.1101/2020.12.15.422954 ◽ 2020

Author(s): Xianjun Dong ◽ Xiaoqi Li ◽ Tzuu-Wang Chang ◽ Scott T Weiss ◽ Weiliang Qiu

Keyword(s): Gene Expression ◽ Sample Size ◽ Single Cell ◽ Allele Frequency ◽ R Package ◽ Power Calculation ◽ eQTL Analysis ◽ Genome Wide Association Studies ◽ User Friendly ◽ Bulk Tissue

Genome-wide association studies (GWAS) have revealed thousands of genetic loci for common diseases. One of the main challenges in the post-GWAS era is to understand the causality of the genetic variants. Expression quantitative trait locus (eQTL) analysis has proven to be an effective way to address this question by examining the relationship between gene expression and genetic variation in a sufficiently powered cohort. However, it is often difficult to determine the sample size at which a variant with a specific allele frequency will be detected as associated with gene expression with sufficient power. This is particularly demanding in single-cell RNA-seq studies. A user-friendly tool for eQTL power analysis at both the bulk tissue and single-cell level is therefore critical. Here, we present an R package called powerEQTL with flexible functions to calculate power, minimal sample size, or detectable minor allele frequency in both bulk tissue and single-cell eQTL analysis. A user-friendly web application that requires no programming is also provided, allowing users to calculate and visualize the parameters interactively.

Integrated analysis of bulk multi omic and single-cell sequencing data confirms the molecular origin of hemodynamic changes in Covid-19 infection explaining coagulopathy and higher geriatric mortality

10.1101/2020.04.26.20081182 ◽ 2020

Author(s): Shreya Johri ◽ Deepali Jain ◽ Ishaan Gupta

Keyword(s): Gene Expression ◽ Single Cell ◽ Older Patients ◽ Cell Types ◽ Integrated Analysis ◽ Molecular Evidence ◽ Sequencing Data ◽ Phagocytic Cells ◽ Hemodynamic Changes ◽ Single Cell Sequencing

Besides severe respiratory distress, recent reports in Covid-19 patients have found a strong association between platelet counts and patient survival. Along with hemodynamic changes such as prolonged clotting time, high fibrin degradation products and D-dimers, increased levels of monocytes with disturbed morphology have also been identified. In this study, through an integrated analysis of bulk RNA-sequencing data from Covid-19 patients with data from single-cell sequencing studies on lung tissues, we found that most of the cell types that contributed to the altered gene expression were of hematopoietic origin. We also found that differentially expressed genes in Covid-19 patients formed a significant pool of the expressing genes in phagocytic cells such as Monocytes and platelets. Interestingly, while we observed a general enrichment for Monocytes in Covid-19 patients, we found that the signal for FCGR3A+ Monocytes was depleted. Further, we found evidence that age-associated gene expression changes in Monocytes and platelets, associated with inflammation, mirror gene expression changes in Covid-19 patients, suggesting that pro-inflammatory signalling during aging may worsen the infection in older patients. We identified more than 20 genes that change in the same direction between Covid-19 infection and aging cells and that may act as potential therapeutic targets. Of particular interest were IL2RG, GNLY and GZMA expressed in platelets, which facilitate cytokine signalling in Monocytes through an interaction with platelets. To understand whether infection can directly manipulate the biology of Monocytes and platelets, we hypothesize that these non-ACE2-expressing cells may be infected by the virus through the phagocytic route. We observed that phagocytic cells such as Monocytes, T-cells, and platelets have a significantly higher expression of genes that are part of the Covid-19 viral interactome. Hence these cell types may have an active rather than a reactive role in viral pathogenesis, manifesting clinical symptoms such as coagulopathy. Therefore, our results present molecular evidence for pursuing both anti-inflammatory and anticoagulation therapy for better patient management, especially in older patients.

Cobolt: integrative analysis of multimodal single-cell sequencing data

Genome Biology ◽ 10.1186/s13059-021-02556-z ◽ 2021 ◽ Vol 22 (1)

Author(s): Boying Gong ◽ Yun Zhou ◽ Elizabeth Purdom

Keyword(s): Gene Expression ◽ Single Cell ◽ Chromatin Accessibility ◽ Integrative Analysis ◽ RNA Seq ◽ Sequencing Data ◽ Single Cell Sequencing ◽ Multiple Datasets ◽ Novel Method ◽ Sequencing Platforms

A growing number of single-cell sequencing platforms enable joint profiling of multiple omics from the same cells. We present Cobolt, a novel method that not only allows for analyzing the data from joint-modality platforms, but also provides a coherent framework for integrating multiple datasets measured on different modalities. We demonstrate its performance on multi-modality data of gene expression and chromatin accessibility and illustrate the integration abilities of Cobolt by jointly analyzing this multi-modality data with single-cell RNA-seq and ATAC-seq datasets.

Quasi-universality in single-cell sequencing data

10.1101/426239 ◽ 2018 ◽ Cited By ~ 2

Author(s): Luis Aparicio ◽ Mykola Bordyuh ◽ Andrew J. Blumberg ◽ Raul Rabadan

Keyword(s): Single Cell ◽ Matrix Theory ◽ Biological Information ◽ Sequencing Data ◽ Data Set ◽ Single Cell Sequencing ◽ Marked Cell ◽ Eigenvector Localization ◽ Cell Data ◽ Epigenetic Processes

The development of single-cell technologies provides the opportunity to identify new cellular states and reconstruct novel cell-to-cell relationships. Applications range from understanding the transcriptional and epigenetic processes involved in metazoan development to characterizing distinct cell types in heterogeneous populations like cancers or immune cells. However, analysis of the data is impeded by its unknown intrinsic biological and technical variability together with its sparseness; these factors complicate the identification of true biological signals amidst artifact and noise. Here we show that, across technologies, roughly 95% of the eigenvalues derived from each single-cell data set can be described by universal distributions predicted by Random Matrix Theory. Interestingly, 5% of the spectrum shows deviations from these distributions and presents a phenomenon known as eigenvector localization, where information tightly concentrates in groups of cells. Some of the localized eigenvectors reflect underlying biological signal, and some are simply a consequence of the sparsity of single-cell data; roughly 3% is artifactual. Based on the universal distributions and a technique for detecting sparsity-induced localization, we present a strategy to identify the residual 2% of directions that encode biological information and thereby denoise single-cell data. We demonstrate the effectiveness of this approach by comparing with standard single-cell data analysis techniques in a variety of examples with marked cell populations.

Phenotype-guided subpopulation identification from single-cell sequencing data

10.1101/2020.06.05.137240 ◽ 2020

Author(s): Duanchen Sun ◽ Xiangnan Guan ◽ Amy E. Moran ◽ David Z. Qian ◽ Pepper Schedin ◽ ...

Keyword(s): Lung Cancer ◽ Single Cell ◽ Clinical Information ◽ Single Step ◽ Cell Subpopulation ◽ Clustering Methods ◽ Sequencing Data ◽ Single Cell Sequencing ◽ Cell Subpopulations ◽ Cell Data

Single-cell sequencing yields novel discoveries by distinguishing cell types, states and lineages within the context of heterogeneous tissues. However, interpreting complex single-cell data from highly heterogeneous cell populations remains challenging. Currently, most existing single-cell data analyses focus on cell type clusters defined by unsupervised clustering methods, which cannot directly link cell clusters with specific biological and clinical phenotypes. Here we present Scissor, a novel approach that utilizes disease phenotypes to identify cell subpopulations from single-cell data that most highly correlate with a given phenotype. This “phenotype-to-cell within a single step” strategy enables the utilization of a large amount of clinical information that has been collected for bulk assays to identify the most highly phenotype-associated cell subpopulations. When applied to a lung cancer single-cell RNA-seq (scRNA-seq) dataset, Scissor identified a subset of cells exhibiting high hypoxia activities, which predicted worse survival outcomes in lung cancer patients. Furthermore, in a melanoma scRNA-seq dataset, Scissor discerned a T cell subpopulation with low PDCD1/CTLA4 and high TCF7 expressions, which is associated with a favorable immunotherapy response. Thus, Scissor provides a novel framework to identify the biologically and clinically relevant cell subpopulations from single-cell assays by leveraging the wealth of phenotypes and bulk-omics datasets.

singleCellHaystack: Finding surprising genes in 2-dimensional representations of single cell transcriptome data

10.1101/557967 ◽ 2019

Author(s): Alexis Vandenbon ◽ Diego Diez

Keyword(s): Single Cell ◽ Expression Patterns ◽ R Package ◽ Biological Knowledge ◽ Transcriptome Data ◽ Sequencing Data ◽ Single Cell Sequencing ◽ Leibler Divergence ◽ Cell Transcriptome ◽ Single Cell Transcriptome

Summary: Single-cell sequencing data is often visualized in 2-dimensional plots, including t-SNE plots. However, it is not straightforward to extract biological knowledge, such as differentially expressed genes, from these plots. Here we introduce singleCellHaystack, a methodology that addresses this problem. singleCellHaystack uses Kullback-Leibler Divergence to find genes that are expressed in subsets of cells that are non-randomly positioned on a 2D plot. We illustrate the usage of singleCellHaystack through applications on several single-cell datasets. singleCellHaystack is implemented as an R package, and includes additional functions for clustering and visualization of genes with interesting expression patterns.

Availability and implementation: https://github.com/alexisvdb/[email protected]
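
The idea of scoring a gene by how non-randomly its expressing cells sit on a 2D plot can be illustrated with a small Python sketch. It bins cells onto a coarse grid and computes the Kullback-Leibler divergence between the distribution of expressing cells (P) and the distribution of all cells (Q). This is a simplified illustration under assumed names (`grid_bin`, `kl_divergence_on_grid`) and an assumed binning scheme; the package's actual procedure is more elaborate and is not reproduced here.

```python
import math
from collections import Counter


def grid_bin(x, y, n_bins, lo=0.0, hi=1.0):
    """Map a 2D coordinate to a grid-cell index, clamped to the grid."""
    def idx(v):
        i = int((v - lo) / (hi - lo) * n_bins)
        return min(max(i, 0), n_bins - 1)
    return idx(x), idx(y)


def kl_divergence_on_grid(coords, expressed, n_bins=2):
    """D_KL(P || Q) over grid cells.

    P: distribution of cells in which the gene is detected.
    Q: distribution of all cells (the reference).
    """
    all_counts = Counter(grid_bin(x, y, n_bins) for x, y in coords)
    expr_counts = Counter(
        grid_bin(x, y, n_bins)
        for (x, y), is_expressed in zip(coords, expressed)
        if is_expressed
    )
    n_all = sum(all_counts.values())
    n_expr = sum(expr_counts.values())
    if n_expr == 0:
        return 0.0
    kl = 0.0
    for cell, count in all_counts.items():
        q = count / n_all
        p = expr_counts.get(cell, 0) / n_expr
        if p > 0:  # terms with p == 0 contribute nothing
            kl += p * math.log(p / q)
    return kl


# Hypothetical embedding: two clusters of four cells each.
coords = [(0.1, 0.1), (0.2, 0.15), (0.15, 0.3), (0.3, 0.2),
          (0.8, 0.9), (0.9, 0.8), (0.7, 0.75), (0.85, 0.7)]
gene_a = [True, True, True, True, False, False, False, False]  # one cluster only
gene_b = [True, False, True, False, True, False, True, False]  # spread evenly

kl_a = kl_divergence_on_grid(coords, gene_a)
kl_b = kl_divergence_on_grid(coords, gene_b)
print(kl_a, kl_b)
```

A gene confined to one cluster receives a larger divergence than a gene spread evenly across both, which is the kind of ranking signal the abstract describes. Assessing whether a given divergence is surprising would additionally require a null distribution, which this sketch omits.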
