The winning methods for predicting cellular position in the DREAM single-cell transcriptomics challenge

Author(s):  
Vu V H Pham ◽  
Xiaomei Li ◽  
Buu Truong ◽  
Thin Nguyen ◽  
Lin Liu ◽  
...  

Abstract Motivation Predicting cell locations is important because knowing where cells are located lets us estimate their function and their integration with the spatial environment. Thus, the DREAM challenge on single-cell transcriptomics required participants to predict the locations of single cells in the Drosophila embryo using single-cell transcriptomic data. Results We have developed over 50 pipelines by combining different ways of preprocessing the RNA-seq data, selecting the genes, predicting the cell locations and validating predicted cell locations, resulting in the winning methods, which were ranked second in sub-challenge 1, first in sub-challenge 2 and third in sub-challenge 3. In this paper, we present an R package, SCTCwhatateam, which includes all the methods we developed and the Shiny web application to facilitate research on single-cell spatial reconstruction. All the data and the example use cases are available in the Supplementary data.

2020 ◽  
Author(s):  
Vu VH Pham ◽  
Xiaomei Li ◽  
Buu Truong ◽  
Thin Nguyen ◽  
Lin Liu ◽  
...  

Abstract Motivation Predicting cell locations is important because knowing where cells are located lets us estimate their function and their integration with the spatial environment. Thus, the DREAM Challenge on Single Cell Transcriptomics required participants to predict the locations of single cells in the Drosophila embryo using single-cell transcriptomic data. Results We have developed over 50 pipelines by combining different ways of pre-processing the RNA-seq data, selecting the genes, predicting the cell locations, and validating predicted cell locations, resulting in the winning methods for two out of three sub-challenges in the competition. In this paper, we present an R package, SCTCwhatateam, which includes all the methods we developed and the Shiny web application to facilitate research on single-cell spatial reconstruction. All the data and the example use cases are available in the Supplementary material. Availability The scripts of the package are available at https://github.com/thanhbuu04/SCTCwhatateam and the Shiny application is available at https://github.com/pvvhoang/[email protected] Supplementary information Supplementary data are available at Briefings in Bioinformatics online.


2017 ◽  
Author(s):  
Zhun Miao ◽  
Ke Deng ◽  
Xiaowo Wang ◽  
Xuegong Zhang

Abstract Summary The excessive zeros in single-cell RNA-seq data include “real” zeros due to the on-off nature of gene transcription in single cells and “dropout” zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy. Availability and Implementation The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor’s consideration [email protected] Supplementary information Supplementary data are available at bioRxiv online.
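
To make the zero-inflated negative binomial idea concrete, here is a minimal sketch in Python rather than the package's own R code. It is not DEsingle's implementation: the function names and the parameterization (`theta` for the zero-inflation weight, `size` and `prob` for the negative binomial) are assumptions chosen for illustration. It shows how a mixture model splits the probability of an observed zero between the zero-inflation component and the negative binomial component; which of the two corresponds to "real" versus "dropout" zeros is left to the paper.

```python
import math


def nb_pmf(k, size, prob):
    """Negative binomial pmf with dispersion `size` > 0 and 0 < `prob` < 1."""
    log_coef = math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
    return math.exp(log_coef + size * math.log(prob) + k * math.log1p(-prob))


def zinb_pmf(k, theta, size, prob):
    """Zero-inflated NB: extra point mass `theta` at zero, NB otherwise."""
    nb_part = (1.0 - theta) * nb_pmf(k, size, prob)
    return theta + nb_part if k == 0 else nb_part


def zero_inflation_share(theta, size, prob):
    """Fraction of all zeros attributed to the zero-inflation component."""
    return theta / zinb_pmf(0, theta, size, prob)


if __name__ == "__main__":
    # Hypothetical parameter values, not estimates from any dataset.
    print(zinb_pmf(0, theta=0.3, size=2.0, prob=0.5))
    print(zero_inflation_share(theta=0.3, size=2.0, prob=0.5))
```

In practice the three parameters would be estimated per gene by maximum likelihood; the sketch only evaluates the model for given values.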


2018 ◽  
Author(s):  
Changlin Wan ◽  
Wennan Chang ◽  
Yu Zhang ◽  
Fenil Shah ◽  
Xiaoyu Lu ◽  
...  

Abstract A key challenge in modeling single-cell RNA-seq (scRNA-seq) data is to capture the diverse gene expression states regulated by different transcriptional regulatory inputs across single cells, which is further complicated by a large number of observed zero and low expressions. We developed a left truncated mixture Gaussian (LTMG) model that stems from the kinetic relationships between the transcriptional regulatory inputs and metabolism of mRNA and gene expression abundance in a cell. LTMG infers the expression multi-modalities across single cell entities, representing a gene’s diverse expression states; meanwhile the dropouts and low expressions are treated as left truncated, specifically representing an expression state that is under suppression. We demonstrated that LTMG has significantly better goodness of fit on an extensive number of single-cell data sets, compared with three other state-of-the-art models. In addition, our systems kinetic approach of handling the low and zero expressions and correctness of the identified multimodality are validated on several independent experimental data sets. Application to data from complex tissues demonstrated the capability of LTMG in extracting varied expression states specific to cell types or cell functions. Based on LTMG, a differential gene expression test and a co-regulation module identification method, namely LTMG-DGE and LTMG-GCR, are further developed. We experimentally validated that LTMG-DGE has higher sensitivity and specificity in detecting differentially expressed genes than five other popular methods, and that LTMG-GCR is capable of retrieving the gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.


2019 ◽  
Vol 47 (18) ◽  
pp. e111-e111 ◽  
Author(s):  
Changlin Wan ◽  
Wennan Chang ◽  
Yu Zhang ◽  
Fenil Shah ◽  
Xiaoyu Lu ◽  
...  

Abstract A key challenge in modeling single-cell RNA-seq data is to capture the diversity of gene expression states regulated by different transcriptional regulatory inputs across individual cells, which is further complicated by the large number of observed zero and low expressions. We developed a left truncated mixture Gaussian (LTMG) model from the kinetic relationships of the transcriptional regulatory inputs, mRNA metabolism and abundance in single cells. LTMG infers the expression multi-modalities across single cells, while the dropouts and low expressions are treated as left truncated. We demonstrated that LTMG has significantly better goodness of fit on an extensive number of scRNA-seq data sets, compared with three other state-of-the-art models. Our biological assumption of the low non-zero expressions, rationality of the multimodality setting, and the capability of LTMG in extracting expression states specific to cell types or functions, are validated on independent experimental data sets. A differential gene expression test and a co-regulation module identification method are further developed. We experimentally validated that our differential expression test has higher sensitivity and specificity than five other popular methods. The co-regulation analysis is capable of retrieving gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.
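
The treatment of low and zero expressions as "left truncated" can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the LTMGSCA code: it uses one common formulation in which values at or below a cutoff contribute the mixture's cumulative probability at the cutoff (left-censoring), while values above it contribute the ordinary mixture density. The function names, the `cutoff` parameter and the example numbers are all hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a Gaussian with mean `mu` and standard deviation `sigma`."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of the same Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def censored_mixture_loglik(values, cutoff, weights, means, sds):
    """Log-likelihood of a Gaussian mixture with left-censoring at `cutoff`.

    Values <= cutoff are only known to lie below the cutoff, so they
    contribute the mixture CDF at the cutoff instead of a density.
    """
    total = 0.0
    for x in values:
        components = zip(weights, means, sds)
        if x <= cutoff:
            like = sum(w * normal_cdf(cutoff, m, s) for w, m, s in components)
        else:
            like = sum(w * normal_pdf(x, m, s) for w, m, s in components)
        total += math.log(like)
    return total


if __name__ == "__main__":
    # Hypothetical log-expression values with a two-component mixture.
    log_expr = [-2.0, -2.0, 0.4, 1.1, 3.2, 3.5]
    print(censored_mixture_loglik(log_expr, cutoff=0.0,
                                  weights=[0.5, 0.5],
                                  means=[0.5, 3.0],
                                  sds=[1.0, 0.5]))
```

A fitting procedure such as expectation-maximization would maximize this quantity over the weights, means and standard deviations; the sketch only evaluates it.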


2021 ◽  
Author(s):  
Konrad Thorner ◽  
Aaron M. Zorn ◽  
Praneet Chaturvedi

Abstract Annotation of single cells has become an important step in the single cell analysis framework. With advances in sequencing technology, thousands to millions of cells can be processed to understand the intricacies of the biological system in question. Annotation through manual curation of markers based on a priori knowledge is cumbersome given this exponential growth. There are currently ~200 computational tools available to help researchers automatically annotate single cells using supervised/unsupervised machine learning, cell type markers, or tissue-based markers from bulk RNA-seq. But with the expansion of publicly available data there is also a need for a tool that can help integrate multiple references into a unified atlas and show how annotations compare between datasets. Here we present ELeFHAnt: Ensemble learning for harmonization and annotation of single cells. ELeFHAnt is an easy-to-use R package that employs support vector machine and random forest algorithms together to perform three main functions: 1) CelltypeAnnotation, 2) LabelHarmonization and 3) DeduceRelationship. CelltypeAnnotation is a function to annotate cells in a query Seurat object using a reference Seurat object with annotated cell types. LabelHarmonization can be utilized to integrate multiple cell atlases (references) into a unified cellular atlas with harmonized cell types. Finally, DeduceRelationship is a function that compares cell types between two scRNA-seq datasets. ELeFHAnt can be accessed from GitHub at https://github.com/praneet1988/ELeFHAnt.


2020 ◽  
Vol 36 (11) ◽  
pp. 3588-3589 ◽  
Author(s):  
Kaiyi Zhu ◽  
Dimitris Anastassiou

Abstract Summary We developed 2DImpute, an imputation method for correcting false zeros (known as dropouts) in single-cell RNA-sequencing (scRNA-seq) data. It is designed to prevent excessive correction by predicting the false zeros and imputing their values, making use of the interrelationships between both genes and cells in the expression matrix. We showed that 2DImpute outperforms several leading imputation methods by applying it to datasets from various scRNA-seq protocols. Availability and implementation The R package of 2DImpute is freely available at GitHub (https://github.com/zky0708/2DImpute). Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Andrew E Teschendorff ◽  
Alok K Maity ◽  
Xue Hu ◽  
Chen Weiyan ◽  
Matthias Lechner

Abstract Motivation An important task in the analysis of single-cell RNA-Seq data is the estimation of differentiation potency, as this can help identify stem-or-multipotent cells in non-temporal studies or in tissues where differentiation hierarchies are not well established. A key challenge in the estimation of single-cell potency is the need for a fast and accurate algorithm, scalable to large scRNA-Seq studies profiling millions of cells. Results Here, we present a single-cell potency measure, called Correlation of Connectome and Transcriptome (CCAT), which can return accurate single-cell potency estimates of a million cells in minutes, a 100-fold improvement over current state-of-the-art methods. We benchmark CCAT against 8 other single-cell potency models and across 28 scRNA-Seq studies, encompassing over 2 million cells, demonstrating accuracy comparable to the current state-of-the-art, at a significantly reduced computational cost, and with increased robustness to dropouts. Availability and implementation CCAT is part of the SCENT R-package, freely available from https://github.com/aet21/SCENT. Supplementary information Supplementary data are available at Bioinformatics online.
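
As the name suggests, the measure correlates a cell's transcriptome with a connectome. The Python sketch below illustrates that idea under stated assumptions: the "connectome" is taken to be each gene's degree in an interaction network, the correlation is Pearson's, and the function names and toy values are hypothetical. It is not the SCENT implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def potency_score(expression, degree):
    """Correlate one cell's expression with network degree over shared genes.

    `expression` maps gene -> expression in one cell; `degree` maps
    gene -> number of network neighbours. Genes missing from either
    mapping are ignored.
    """
    genes = sorted(set(expression) & set(degree))
    return pearson([expression[g] for g in genes],
                   [degree[g] for g in genes])


if __name__ == "__main__":
    # Toy values for a single cell; gene names are placeholders.
    cell = {"geneA": 5.1, "geneB": 0.3, "geneC": 2.2, "geneD": 4.0}
    net_degree = {"geneA": 40, "geneB": 3, "geneC": 12, "geneD": 25, "geneE": 7}
    print(potency_score(cell, net_degree))
```

Because the degree vector is fixed across cells, scoring many cells reduces to one correlation per cell, which is consistent with the speed the abstract reports.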


Author(s):  
Irzam Sarfraz ◽  
Muhammad Asif ◽  
Joshua D Campbell

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.
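
The design idea described above, storing a subset as row and column indices that point back to a parent assay instead of copying the data, can be sketched in a few lines of Python. This is a conceptual illustration only: the class and method names are hypothetical and do not mirror the ExperimentSubset R API.

```python
from dataclasses import dataclass, field


@dataclass
class AssaySubset:
    """A subset recorded as indices into a named parent assay."""
    parent: str
    rows: list
    cols: list


@dataclass
class SubsetContainer:
    """Holds full assays once and subsets as lightweight index records."""
    assays: dict
    subsets: dict = field(default_factory=dict)

    def create_subset(self, name, parent, rows, cols):
        # Only the indices and the parent name are stored, so no assay
        # values are duplicated and provenance is kept explicitly.
        if parent not in self.assays:
            raise KeyError(f"unknown parent assay: {parent}")
        self.subsets[name] = AssaySubset(parent, list(rows), list(cols))

    def get(self, name):
        # Full assays are returned as stored; subsets are materialized
        # on demand from the parent assay.
        if name in self.assays:
            return self.assays[name]
        subset = self.subsets[name]
        matrix = self.assays[subset.parent]
        return [[matrix[r][c] for c in subset.cols] for r in subset.rows]


if __name__ == "__main__":
    container = SubsetContainer(assays={"counts": [[1, 2, 3],
                                                   [4, 5, 6],
                                                   [7, 8, 9]]})
    container.create_subset("variable_genes", "counts", rows=[0, 2], cols=[1, 2])
    print(container.get("variable_genes"))
    print(container.subsets["variable_genes"].parent)
```

The trade-off is that each retrieval re-reads the parent assay, in exchange for a lower memory footprint and an unambiguous link from every subset to its source.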


2021 ◽  
Vol 7 (8) ◽  
pp. eabe3610
Author(s):  
Conor J. Kearney ◽  
Stephin J. Vervoort ◽  
Kelly M. Ramsbottom ◽  
Izabela Todorovski ◽  
Emily J. Lelliott ◽  
...  

Multimodal single-cell RNA sequencing enables the precise mapping of transcriptional and phenotypic features of cellular differentiation states but does not allow for simultaneous integration of critical posttranslational modification data. Here, we describe SUrface-protein Glycan And RNA-seq (SUGAR-seq), a method that enables detection and analysis of N-linked glycosylation, extracellular epitopes, and the transcriptome at the single-cell level. Integrated SUGAR-seq and glycoproteome analysis identified tumor-infiltrating T cells with unique surface glycan properties that report their epigenetic and functional state.


Author(s):  
Wenbin Ye ◽  
Tao Liu ◽  
Hongjuan Fu ◽  
Congting Ye ◽  
Guoli Ji ◽  
...  

Abstract Motivation Alternative polyadenylation (APA) is widely recognized as a widespread, dynamically modulated mechanism. Studies based on 3′ end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. Results We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. In particular, seven metrics are provided for measuring the tissue-specificity or usage of APA sites across samples. Three methods are used for identifying 3′ UTR shortening/lengthening events between conditions. APA site switching involving non-3′ UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. Availability and implementation https://github.com/BMILAB/movAPA. Supplementary information Supplementary data are available at Bioinformatics online.

