diffloop: a computational framework for identifying and analyzing differential DNA loops from sequencing data

ABSTRACT The three-dimensional architecture of DNA within the nucleus is a key determinant of interactions between genes, regulatory elements, and transcriptional machinery. As a result, differences in loop structure are associated with differences in gene expression and cell state. Here, we introduce diffloop, an R/Bioconductor package for identifying differential DNA looping between samples. The package additionally provides a suite of functions for the quality control, statistical testing, annotation and visualization of DNA loops. We demonstrate this functionality by detecting differences in DNA loops between ENCODE ChIA-PET datasets and relate looping to differences in epigenetic state and gene expression.

SSCC: a novel computational framework for rapid and accurate clustering large single cell RNA-seq data

10.1101/344242 ◽

2018 ◽

Cited By ~ 2

Author(s):

Xianwen Ren ◽

Liangtao Zheng ◽

Zemin Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Random Projection ◽

Rna Seq ◽

Sequencing Data ◽

Computational Framework ◽

Human Blood Cells ◽

Single Cell Rna Sequencing ◽

Data Volume

ABSTRACT Clustering is a prevalent means of analyzing single cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while consuming only 66% of the memory used by the widely-used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.
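
The random projection step named in this abstract can be illustrated with a minimal sketch. This is not the sscClust implementation (which is in R); the function name `random_projection`, the Gaussian projection matrix, and the toy data are assumptions chosen only to show how a high-dimensional expression profile can be mapped to fewer dimensions before clustering.

```python
import math
import random


def random_projection(rows, target_dim, seed=0):
    """Project each row onto `target_dim` random Gaussian directions.

    Hypothetical helper for illustration; not part of sscClust.
    """
    rng = random.Random(seed)
    source_dim = len(rows[0])
    # Scaling by 1/sqrt(target_dim) keeps squared lengths unchanged in expectation.
    scale = 1.0 / math.sqrt(target_dim)
    directions = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    # Each output coordinate is the dot product of a row with one direction.
    return [
        [sum(d * x for d, x in zip(direction, row)) for direction in directions]
        for row in rows
    ]


# Toy data: 6 "cells" with 500 made-up expression values each.
data_rng = random.Random(1)
cells = [[data_rng.random() for _ in range(500)] for _ in range(6)]

# Reduce each cell from 500 to 50 dimensions.
projected = random_projection(cells, target_dim=50)
```

A clustering algorithm such as k-means would then run on `projected` rather than on `cells`, which is where the savings in time and memory would come from under these assumptions.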

Machine learning of genomic features in organotropic metastases stratifies progression risk of primary tumors

10.21203/rs.3.rs-73390/v1 ◽

2020 ◽

Author(s):

Jiguang Wang ◽

Biaobin Jiang ◽

Quanhua Mu ◽

Fufang Qiu ◽

Weiqi Xu

Keyword(s):

Early Stage ◽

Metastatic Cancer ◽

Risk Groups ◽

Sequencing Data ◽

Computational Framework ◽

Primary Tumors ◽

Genomic Features ◽

Organ Specific ◽

Spatiotemporal Behavior ◽

Prostate Cancers

Abstract Metastasis leads to most cancer deaths, but its spatiotemporal behavior remains unpredictable at early stage. Here, we developed MetaNet, a computational framework that integrates clinical and sequencing data from 32,176 primary and metastatic cancer cases, to assess metastatic risks of primary tumors. MetaNet achieved high accuracy in distinguishing the metastasis from the primary in breast and prostate cancers. From the prediction, we identified Metastasis-Featuring Primary (MFP) tumors, a subset of primary tumors with genomic features enriched in metastasis, and demonstrated their high metastatic risks with significantly shorter disease-free survivals and higher migratory potential. In addition, we identified genomic alterations associated with organ-specific metastases, and employed them to stratify patients into the risk groups with propensities toward different metastatic organs. Remarkably, this organotropic stratification achieved better prognostic value than standard histological grading system in prostate cancer, especially between Bone-MFP and Liver-MFP subtypes, with organotropic insights to inform organ-specific examinations in follow-ups.

SMaSH: A scalable, general marker gene identification framework for single-cell RNA sequencing and Spatial Transcriptomics

10.1101/2021.04.08.438978 ◽

2021 ◽

Author(s):

Michael E Nelson ◽

Simone G Riva ◽

Ann Cvejic

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Marker Gene ◽

Marker Genes ◽

Sequencing Data ◽

Computational Framework ◽

Data Set ◽

Spatially Resolved ◽

Single Cell Rna Sequencing ◽

The Given

Spatial transcriptomics is revolutionising the study of single-cell RNA and tissue-wide cell heterogeneity, but few robust methods exist for connecting spatially resolved cells to so-called marker genes from single-cell RNA sequencing, which generate significant insight gleaned from spatial methods. Here we present SMaSH, a general computational framework for extracting key marker genes from single-cell RNA sequencing data for spatial transcriptomics approaches. SMaSH extracts robust and biologically well-motivated marker genes, which characterise the given data-set better than existing and limited computational approaches for global marker gene calculation.

RiboVIEW: a computational framework for visualization, quality control and statistical analysis of ribosome profiling data

Nucleic Acids Research ◽

10.1093/nar/gkz1074 ◽

2019 ◽

Vol 48 (2) ◽

pp. e7-e7 ◽

Cited By ~ 1

Author(s):

Carine Legrand ◽

Francesca Tuorto

Keyword(s):

Quality Control ◽

Statistical Analysis ◽

Computational Analysis ◽

High Throughput Sequencing ◽

Ribosome Profiling ◽

Confounding Factors ◽

Batch Effects ◽

Sequencing Data ◽

Computational Framework ◽

Genome Wide

Abstract Recently developed ribosome profiling methods based on high-throughput sequencing of ribosome-protected mRNA footprints allow genome-wide translational changes to be studied in detail. However, computational analysis of the sequencing data still represents a bottleneck for many laboratories. Further, specific pipelines for quality control and statistical analysis of ribosome profiling data, providing high levels of both accuracy and confidence, are currently lacking. In this study, we describe automated bioinformatic and statistical diagnoses to perform robust quality control of ribosome profiling data (RiboQC), to efficiently visualize ribosome positions and to estimate ribosome speed (RiboMine) in an unbiased way. We present an R pipeline to set up and undertake the analyses that offers the user an HTML page to scan their own data regarding the following aspects: periodicity, ligation and digestion of footprints; reproducibility and batch effects of replicates; drug-related artifacts; unbiased codon enrichment including variability between mRNAs, for A, P and E sites; mining of some causal or confounding factors. We expect our pipeline to allow optimal use of the wealth of information provided by ribosome profiling experiments.

DeepMicrobes: taxonomic classification for metagenomics with deep learning

10.1101/694851 ◽

2019 ◽

Cited By ~ 1

Author(s):

Qiaoxing Liang ◽

Paul W. Bible ◽

Yu Liu ◽

Bin Zou ◽

Lai Wei

Keyword(s):

Deep Learning ◽

Large Scale ◽

Genomic Sequence ◽

Taxonomic Classification ◽

Sequencing Data ◽

Computational Framework ◽

Genome Wide ◽

Disease Diagnostics ◽

Genomic Sequence Analysis ◽

Microbial Genomic

Abstract Taxonomic classification is a crucial step for metagenomics applications including disease diagnostics, microbiome analyses, and outbreak tracing. Yet it is unknown what deep learning architecture can capture microbial genome-wide features relevant to this task. We report DeepMicrobes (https://github.com/MicrobeLab/DeepMicrobes), a computational framework that can perform large-scale training on > 10,000 RefSeq complete microbial genomes and accurately predict the species-of-origin of whole metagenome shotgun sequencing reads. We show the advantage of DeepMicrobes over state-of-the-art tools in precisely identifying species from microbial community sequencing data. Therefore, DeepMicrobes expands the toolbox of taxonomic classification for metagenomics and enables the development of further deep learning-based bioinformatics algorithms for microbial genomic sequence analysis.

Image-based representation of massive spatial transcriptomics datasets

10.1101/2021.12.07.471629 ◽

2021 ◽

Author(s):

Stephan Preibisch ◽

Nikos Karaiskos ◽

Nikolaus Rajewsky

Keyword(s):

Computer Vision ◽

Brain Tissue ◽

Spatial Resolution ◽

Mouse Brain ◽

High Throughput ◽

Serial Sections ◽

Sequencing Data ◽

Computational Framework ◽

Sequencing Technologies ◽

Transcriptomics Data

We present STIM, an imaging-based computational framework for exploring, visualizing, and processing high-throughput spatial sequencing datasets. STIM is built on the powerful ImgLib2, N5 and BigDataViewer (BDV) frameworks enabling transfer of computer vision techniques to datasets with irregular measurement-spacing and arbitrary spatial resolution, such as spatial transcriptomics data generated by multiplexed targeted hybridization or spatial sequencing technologies. We illustrate STIM's capabilities by representing, visualizing, and automatically registering publicly available spatial sequencing data from 14 serial sections of mouse brain tissue.

A computational framework for detecting signatures of accelerated somatic evolution in cancer genomes

10.1101/177261 ◽

2017 ◽

Author(s):

Kyle S. Smith ◽

Debashis Ghosh ◽

Katherine S. Pollard ◽

Subhajyoti De

Keyword(s):

Somatic Mutations ◽

Local Context ◽

Whole Genome Sequencing Data ◽

Nuclear Space ◽

Evolutionary Divergence ◽

Sequencing Data ◽

Computational Framework ◽

Somatic Evolution ◽

A Genome ◽

Cancer Genomes

ABSTRACT By accumulation of somatic mutations, cancer genomes evolve, diverging away from the genome of the host. It remains unclear to what extent somatic evolutionary divergence is comparable across different regions of the cancer genome versus concentrated in specific genomic elements. We present a novel computational framework, SASE-mapper, to identify genomic regions that show signatures of accelerated somatic evolution (SASE) in a subset of samples in a cohort, marked by accumulation of an excess of somatic mutations compared to that expected based on local, context-aware background mutation rates in the cancer genomes. Analyzing tumor whole genome sequencing data for 365 samples from 5 cohorts, we detect recurrent SASE at a genome-wide scale. The SASEs were enriched for genomic elements associated with active chromatin, and regulatory regions of several known cancer genes had SASE in multiple cohorts. Regions with SASE carried specific mutagenic signatures and often co-localized within the 3D nuclear space, suggesting their common basis. A subset of SASEs was frequently associated with regulatory changes in key cancer pathways and also poor clinical outcome. While the SASE-associated mutations were not necessarily recurrent at base-pair resolution, the SASEs recurrently targeted the same functional regions, with similar consequences. It is likely that regulatory redundancy and plasticity promote prevalence of SASE-like patterns in the cancer genomes.
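
The idea of an "excess of somatic mutations compared to that expected" from a background rate can be sketched with a simple tail probability. The abstract does not state which statistical model SASE-mapper uses, so the Poisson model below, the function name `poisson_upper_tail`, and the example counts are assumptions made purely for illustration.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected).

    Hypothetical helper: assumes mutation counts in a region follow a
    Poisson distribution whose mean is the background expectation.
    """
    if observed <= 0:
        return 1.0
    # Accumulate P(X = 0) + ... + P(X = observed - 1) iteratively.
    term = math.exp(-expected)
    cdf = term
    for i in range(1, observed):
        term *= expected / i
        cdf += term
    # Clamp at zero to absorb floating-point round-off.
    return max(0.0, 1.0 - cdf)


# Made-up example: 12 mutations observed where the background predicts 3.
p_value = poisson_upper_tail(12, 3.0)
```

A small `p_value` would flag the region as carrying more mutations than the assumed background explains; a real analysis would also need to correct for testing many regions.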

scDALI: modeling allelic heterogeneity in single cells reveals context-specific genetic regulation

Genome Biology ◽

10.1186/s13059-021-02593-8 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Tobias Heinen ◽

Stefano Secchia ◽

James P. Reddington ◽

Bingqing Zhao ◽

Eileen E. M. Furlong ◽

...

Keyword(s):

Developmental Stages ◽

Single Cells ◽

Explanatory Power ◽

Genetic Regulation ◽

Cell Types ◽

Genetic Effects ◽

Sequencing Data ◽

Computational Framework ◽

Human Ipscs ◽

Context Specific

Abstract While it is established that the functional impact of genetic variation can vary across cell types and states, capturing this diversity remains challenging. Current studies using bulk sequencing either ignore this heterogeneity or use sorted cell populations, reducing discovery and explanatory power. Here, we develop scDALI, a versatile computational framework that integrates information on cellular states with allelic quantifications of single-cell sequencing data to characterize cell-state-specific genetic effects. We apply scDALI to scATAC-seq profiles from developing F1 Drosophila embryos and scRNA-seq from differentiating human iPSCs, uncovering heterogeneous genetic effects in specific lineages, developmental stages, or cell types.

PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data

Genome Biology ◽

10.1186/gb-2009-10-2-r23 ◽

2009 ◽

Vol 10 (2) ◽

pp. R23 ◽

Cited By ~ 184

Author(s):

Jan O Korbel ◽

Alexej Abyzov ◽

Xinmeng Mu ◽

Nicholas Carriero ◽

Philip Cayting ◽

...

Keyword(s):

Structural Variants ◽

Sequencing Data ◽

Computational Framework ◽

Error Models ◽

Simulation Based ◽

Paired End Sequencing ◽

Genomic Structural Variants
