Style transfer with variational autoencoders is a promising approach to RNA-Seq data harmonization and analysis

Mapping Intimacies ◽

10.1101/791962 ◽

2019 ◽

Author(s):

N. Russkikh ◽

D. Antonets ◽

D. Shtokalo ◽

A. Makarov ◽

Y. Vyatkin ◽

...

Keyword(s):

Prediction Accuracy ◽

Supplementary Information ◽

Rna Seq ◽

Style Transfer ◽

Data Harmonization ◽

Link Type ◽

Proposed Model ◽

Technical Factors ◽

Neural Network Classifiers

Abstract. Motivation: Transcriptomic data are frequently used in research on biomarker genes of different diseases and biological states. The most common tasks there are data harmonization and treatment outcome prediction. Both can be addressed via the style transfer approach. Either technical factors or any biological details about the samples that we would like to control (gender, biological state, treatment, etc.) can be used as style components. Results: The proposed style transfer solution is based on Conditional Variational Autoencoders, Y-Autoencoders and adversarial feature decomposition. To quantitatively measure the quality of the style transfer, neural network classifiers that predict the style and semantics after training on real expression were used. Comparison with several existing style-transfer-based approaches shows that the proposed model has the highest style prediction accuracy on all considered datasets while having comparable or the best semantics prediction accuracy. Availability: https://github.com/NRshka/ Contact: [email protected] Supplementary information: FigShare.com (https://dx.doi.org/10.6084/m9.figshare.9925115)


Style transfer with variational autoencoders is a promising approach to RNA-Seq data harmonization and analysis

Bioinformatics ◽

10.1093/bioinformatics/btaa624 ◽

2020 ◽

Vol 36 (20) ◽

pp. 5076-5085

Author(s):

Nikolai Russkikh ◽

Denis Antonets ◽

Dmitry Shtokalo ◽

Alexander Makarov ◽

Yuri Vyatkin ◽

...

Keyword(s):

Prediction Accuracy ◽

Supplementary Information ◽

Rna Seq ◽

Style Transfer ◽

Data Harmonization ◽

Proposed Model ◽

Technical Factors ◽

Neural Network Classifiers ◽

Biological State

Abstract. Motivation: Transcriptomic data are frequently used in research on biomarker genes of different diseases and biological states. The most common tasks there are data harmonization and treatment outcome prediction. Both can be addressed via the style transfer approach. Either technical factors or any biological details about the samples that we would like to control (gender, biological state, treatment, etc.) can be used as style components. Results: The proposed style transfer solution is based on Conditional Variational Autoencoders, Y-Autoencoders and adversarial feature decomposition. To quantitatively measure the quality of the style transfer, neural network classifiers that predict the style and semantics after training on real expression were used. Comparison with several existing style-transfer-based approaches shows that the proposed model has the highest style prediction accuracy on all considered datasets while having comparable or the best semantics prediction accuracy. Availability and implementation: https://github.com/NRshka/stvae-source. Supplementary information: Supplementary data are available at Bioinformatics online.


PhyloFold: Precise and Swift Prediction of RNA Secondary Structures to Incorporate Phylogeny among Homologs

10.1101/2020.03.05.975797 ◽

2020 ◽

Author(s):

Masaki Tagashira

Keyword(s):

Secondary Structure ◽

Rna Secondary Structure ◽

Prediction Accuracy ◽

Structural Alignment ◽

Source Code ◽

Secondary Structures ◽

Supplementary Information ◽

Supplementary Data ◽

Link Type ◽

Structural Alignments

Abstract. Motivation: The simultaneous consideration of sequence alignment and RNA secondary structure, or structural alignment, is known to help predict more accurate secondary structures of homologs. However, this consideration is computationally heavy and can be done only roughly to decompose structural alignments. Results: The PhyloFold method, which predicts secondary structures of homologs considering likely pairwise structural alignments, was developed in this study. The method shows the best prediction accuracy while demanding a running time comparable to that of conventional methods. Availability: The source code of the programs implemented in this study is available at https://github.com/heartsh/phylofold and https://github.com/heartsh/phyloalifold. Contact: [email protected] Supplementary information: Supplementary data are available.


Curatr: a web application for creating, curating, and sharing a mass spectral library

10.1101/170571 ◽

2017 ◽

Author(s):

Andrew Palmer ◽

Prasad Phapale ◽

Dominik Fay ◽

Theodore Alexandrov

Keyword(s):

Mass Spectrometry ◽

Web Application ◽

Mass Spectrometry Analysis ◽

Supplementary Information ◽

Spectral Library ◽

Mass Spectral Fragmentation ◽

Mass Spectral ◽

Link Type ◽

Manual Curation

Abstract. Motivation: Identification from metabolomics mass spectrometry experiments requires comparison of fragmentation spectra from experimental samples to spectra from analytical standards. As the quality of identification depends directly on the quality of the reference spectra, manual curation is routine during the selection of reference spectra to include in a spectral library. Whilst building our own in-house spectral library we realised that there is currently no vendor-neutral, open-access tool for facilitating manual curation of spectra from raw LC-MS data into a custom spectral library. Results: We developed a web application, curatr, for the rapid generation of high-quality mass spectral fragmentation libraries for liquid-chromatography mass spectrometry analysis. Curatr handles datasets from single or multiplexed standards, automatically extracting chromatographic profiles and potential fragmentation spectra for multiple adducts. These are presented through an intuitive interface for manual curation before being documented in a custom spectral library. Searchable molecular information and the provenance of each standard are stored along with metadata on the experimental protocol. Curatr supports the export of spectral libraries in several standard formats for easy use with third-party software or submission to community databases, maximising the return on investment for these costly measurements. We demonstrate the use of curatr to generate the EMBL Metabolomics Core Facility spectral library, which is publicly available at http://curatr.mcf.embl.de. Availability: The source code is freely available at http://github.com/alexandrovteam/curatr/ along with example data. Supplementary information: A step-by-step user manual is available in the supplementary information.


orfipy: a fast and flexible tool for extracting ORFs

10.1101/2020.10.20.348052 ◽

2020 ◽

Author(s):

Urminder Singh ◽

Eve Syrkin Wurtele

Keyword(s):

Open Reading Frames ◽

Supplementary Information ◽

Rna Seq ◽

Flexible Tool ◽

Coding Regions ◽

Link Type ◽

Alternative Reading Frames ◽

Downstream Analysis ◽

Fine Tune ◽

Reading Frames

Summary: Searching for ORFs in transcripts is a critical step prior to annotating coding regions in newly sequenced genomes and in searching for alternative reading frames within known genes. With the tremendous increase in RNA-Seq data, faster tools are needed to handle large input datasets. These tools should be versatile enough to fine-tune search criteria and allow efficient downstream analysis. Here we present a new Python-based tool, orfipy, which allows the user to flexibly search for open reading frames in FASTA sequences. The search is rapid and fully customizable, with a choice of FASTA and BED output formats. Availability and implementation: orfipy is implemented in Python and is compatible with Python v3.6 and higher. Source code: https://github.com/urmi-21/orfipy. Installation: from the source, or via PyPI (https://pypi.org/project/orfipy) or bioconda (https://anaconda.org/bioconda/orfipy). Contact: [email protected], [email protected] Supplementary information: Supplementary data are available at https://github.com/urmi-21/orfipy
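To make the task described in this abstract concrete, the following is a minimal sketch of a forward-strand ORF search in plain Python. It is not orfipy's algorithm or API: the function name `find_orfs`, the `min_len` parameter, the restriction to ATG starts and the three standard stop codons, and the coordinate convention are all assumptions made for illustration.

```python
# Illustrative sketch only; this is NOT orfipy's implementation or interface.
# Assumptions: forward strand only, ATG as the sole start codon,
# standard stop codons, 0-based end-exclusive coordinates.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=6):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    `end` is exclusive and includes the stop codon; `min_len` is the
    minimum ORF length in nucleotides, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

For example, `find_orfs("CCATGAAATAGGG")` reports one ORF in frame 2 spanning positions 2 to 11 (ATG AAA TAG). A real tool would also scan the reverse complement and allow alternative start codons, which this sketch omits.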


Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data

10.1101/532192 ◽

2019 ◽

Author(s):

Héctor Climente-González ◽

Chloé-Agathe Azencott ◽

Samuel Kaski ◽

Makoto Yamada

Keyword(s):

Single Cell ◽

Synthetic Data ◽

Real Data ◽

Supplementary Information ◽

Rna Seq ◽

Link Type ◽

Model Free ◽

Computational Overhead ◽

Expression Microarrays ◽

And Function

Abstract. Motivation: Finding nonlinear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have crucial drawbacks, among them lack of parsimony, non-convexity, and computational overhead. Here we present the block HSIC Lasso, a nonlinear feature selector that does not have these drawbacks. Results: We compare the block HSIC Lasso to other state-of-the-art feature selection techniques on synthetic and real data, including experiments over three common types of genomic data: gene-expression microarrays, single-cell RNA-seq, and GWAS. In all cases, we observe that features selected by block HSIC Lasso retain more information about the underlying biology than features selected by other techniques. As a proof of concept, we applied the block HSIC Lasso to a single-cell RNA-seq experiment on mouse hippocampus. We discovered that many genes previously linked to brain development and function are involved in the biological differences between the types of neurons. Availability: Block HSIC Lasso is implemented in the Python 2/3 package pyHSICLasso, available on GitHub (https://github.com/riken-aip/pyHSICLasso) and PyPI (https://pypi.org/project/pyHSICLasso). Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Whisper 2: indel-sensitive short read mapping

10.1101/2019.12.18.881292 ◽

2019 ◽

Author(s):

Sebastian Deorowicz ◽

Adam Gudyś

Keyword(s):

Web Site ◽

Variant Calling ◽

Supplementary Information ◽

Supplementary Data ◽

Read Mapping ◽

Short Read ◽

Short Read Mapping ◽

Link Type ◽

Mapping Software

Abstract. Summary: Whisper 2 is short-read-mapping software providing superior quality of indel variant calling. Its running times place it among the fastest existing tools. Availability and implementation: https://github.com/refresh-bio/ Contact: [email protected] Supplementary information: Supplementary data are available at publisher’s Web site.


HAF-SVG: Hierarchical Stochastic Video Generation with Aligned Features

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/138 ◽

2020 ◽

Author(s):

Zhihui Lin ◽

Chun Yuan ◽

Maomao Li

Keyword(s):

Prediction Accuracy ◽

Spatial Information ◽

State Of The Art ◽

Independence Assumption ◽

Main Challenge ◽

Proposed Model ◽

Generation Network ◽

Multi Level ◽

The One

Stochastic video generation methods predict diverse videos based on observed frames, where the main challenge lies in modeling the complex future uncertainty and generating realistic frames. Numerous Recurrent-VAE-based methods have achieved state-of-the-art results. However, on the one hand, the independence assumption on the variables of the approximate posterior limits inference performance. On the other hand, although these methods adopt skip connections between encoder and decoder to utilize multi-level features, they still produce blurry generations due to the spatial misalignment between encoder and decoder features at different time steps. In this paper, we propose a hierarchical recurrent VAE with a feature aligner, which not only relaxes the independence assumption of a typical VAE but also enables the decoder to obtain aligned spatial information from the last observed frames. The proposed model is named Hierarchical Stochastic Video Generation network with Aligned Features, referred to as HAF-SVG. Experiments on the Moving-MNIST, BAIR, and KTH datasets demonstrate that the hierarchical structure helps model future uncertainty more accurately, and that the feature aligner helps generate realistic frames. In addition, HAF-SVG exceeds SVG in both prediction accuracy and the quality of generated frames.


Comprehensive evaluation of computational cell-type quantification methods for immuno-oncology

10.1101/463828 ◽

2018 ◽

Cited By ~ 4

Author(s):

Gregor Sturm ◽

Francesca Finotello ◽

Florent Petitprez ◽

Jitao David Zhang ◽

Jan Baumbach ◽

...

Keyword(s):

Tumor Microenvironment ◽

Single Cell ◽

Computational Methods ◽

Immune Cell ◽

Comprehensive Evaluation ◽

Supplementary Information ◽

Rna Seq ◽

Cell Type ◽

Link Type ◽

Real World Datasets

Abstract. Motivation: The composition and density of immune cells in the tumor microenvironment profoundly influence tumor progression and the success of anti-cancer therapies. Flow cytometry, immunohistochemistry staining, or single-cell sequencing are often unavailable, so we rely on computational methods to estimate the immune-cell composition from bulk RNA-sequencing (RNA-seq) data. Various methods have been proposed recently, yet their capabilities and limitations have not been evaluated systematically. A general guideline leading the research community through cell type deconvolution is missing. Results: We developed a systematic approach for benchmarking such computational methods and assessed the accuracy of tools at estimating nine different immune- and stromal cells from bulk RNA-seq samples. We used a single-cell RNA-seq dataset of ∼11,000 cells from the tumor microenvironment to simulate bulk samples of known cell type proportions, and validated the results using independent, publicly available gold-standard estimates. This allowed us to analyze and condense the results of more than a hundred thousand predictions to provide an exhaustive evaluation across seven computational methods over nine cell types and ∼1,800 samples from five simulated and real-world datasets. We demonstrate that computational deconvolution performs at high accuracy for well-defined cell-type signatures and propose how fuzzy cell-type signatures can be improved. We suggest that future efforts should be dedicated to refining cell population definitions and finding reliable signatures. Availability: A snakemake pipeline to reproduce the benchmark is available at https://github.com/grst/immune_deconvolution_benchmark. An R package allows the community to perform integrated deconvolution using different methods (https://grst.github.io/immunedeconv). Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
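The idea of simulating bulk samples with known cell type proportions from single-cell profiles can be sketched as follows. This is only an illustration under stated assumptions, not the authors' benchmark pipeline: the function name `simulate_bulk`, its parameters, sampling with replacement, and summing (rather than averaging or normalizing) expression values are all hypothetical choices.

```python
import random
from collections import Counter


def simulate_bulk(cells, labels, fractions, n_cells=100, seed=0):
    """Build one pseudo-bulk profile with known cell-type composition.

    cells:     list of per-cell expression vectors (lists of numbers)
    labels:    cell-type label for each cell, same order as `cells`
    fractions: dict mapping cell type -> target fraction (should sum to 1)
    Returns (bulk_vector, realized_fractions).
    Hypothetical sketch; not taken from the benchmark's code.
    """
    rng = random.Random(seed)

    # Group single-cell profiles by their annotated cell type.
    by_type = {}
    for vec, lab in zip(cells, labels):
        by_type.setdefault(lab, []).append(vec)

    n_genes = len(cells[0])
    bulk = [0.0] * n_genes
    counts = Counter()
    for cell_type, frac in fractions.items():
        k = round(frac * n_cells)  # number of cells drawn for this type
        for _ in range(k):
            vec = rng.choice(by_type[cell_type])  # sample with replacement
            for g in range(n_genes):
                bulk[g] += vec[g]
            counts[cell_type] += 1

    total = sum(counts.values())
    realized = {t: c / total for t, c in counts.items()}
    return bulk, realized
```

Because the realized fractions are returned alongside the pseudo-bulk vector, a deconvolution method's estimates can be compared directly against a known ground truth, which is the property the abstract relies on.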


PROSSTT: probabilistic simulation of single-cell RNA-seq data for complex differentiation processes

Bioinformatics ◽

10.1093/bioinformatics/btz078 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3517-3519 ◽

Cited By ~ 12

Author(s):

Nikolaos Papadopoulos ◽

Parra R Gonzalo ◽

Johannes Söding

Keyword(s):

Single Cell ◽

Noise Model ◽

Supplementary Information ◽

Rna Seq ◽

Tree Reconstruction ◽

Probabilistic Simulation ◽

Single Cell Rna Sequencing ◽

Lineage Tree ◽

Lineage Trees

Abstract. Summary: Cellular lineage trees can be derived from single-cell RNA sequencing snapshots of differentiating cells. Currently, only datasets with simple topologies are available. To test and further develop tools for lineage tree reconstruction, we need test datasets with known complex topologies. PROSSTT can simulate scRNA-seq datasets for differentiation processes with lineage trees of any desired complexity, noise level, noise model and size. PROSSTT also provides scripts to quantify the quality of predicted lineage trees. Availability and implementation: https://github.com/soedinglab/prosstt. Supplementary information: Supplementary data are available at Bioinformatics online.


The winning methods for predicting cellular position in the DREAM single cell transcriptomics challenge

10.1101/2020.05.09.086397 ◽

2020 ◽

Author(s):

Vu VH Pham ◽

Xiaomei Li ◽

Buu Truong ◽

Thin Nguyen ◽

Lin Liu ◽

...

Keyword(s):

Single Cell ◽

Web Application ◽

Single Cells ◽

Drosophila Embryo ◽

Supplementary Information ◽

Rna Seq ◽

Link Type ◽

Spatial Reconstruction ◽

Spatial Environment ◽

Supplementary Material

Abstract. Motivation: Predicting cell locations is important because, knowing where cells are located, we may estimate the function of cells and their integration with the spatial environment. Thus, the DREAM Challenge on Single Cell Transcriptomics required participants to predict the locations of single cells in the Drosophila embryo using single-cell transcriptomic data. Results: We have developed over 50 pipelines by combining different ways of pre-processing the RNA-seq data, selecting the genes, predicting the cell locations, and validating predicted cell locations, resulting in the winning methods for two out of three sub-challenges in the competition. In this paper, we present an R package, SCTCwhatateam, which includes all the methods we developed and a Shiny web application to facilitate research on single-cell spatial reconstruction. All the data and the example use cases are available in the Supplementary material. Availability: The scripts of the package are available at https://github.com/thanhbuu04/SCTCwhatateam and the Shiny application is available at https://github.com/pvvhoang/ Contact: [email protected] Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
