Style transfer with variational autoencoders is a promising approach to RNA-Seq data harmonization and analysis

2019
Author(s):
N. Russkikh
D. Antonets
D. Shtokalo
A. Makarov
Y. Vyatkin
...

Abstract
Motivation: Transcriptomic data are frequently used in research on biomarker genes for different diseases and biological states. The most common tasks are data harmonization and treatment outcome prediction. Both can be addressed via the style transfer approach. Either technical factors or any biological details about the samples that we would like to control (gender, biological state, treatment, etc.) can be used as style components.
Results: The proposed style transfer solution is based on Conditional Variational Autoencoders, Y-Autoencoders and adversarial feature decomposition. To quantitatively measure the quality of the style transfer, we used neural network classifiers that predict the style and semantics after training on real expression data. Comparison with several existing style-transfer-based approaches shows that the proposed model has the highest style prediction accuracy on all considered datasets while having comparable or the best semantics prediction accuracy.
Availability: https://github.com/NRshka/
Contact: [email protected]
Supplementary information: FigShare.com (https://dx.doi.org/10.6084/m9.figshare.9925115)

2020
Vol 36 (20)
pp. 5076-5085
Author(s):
Nikolai Russkikh
Denis Antonets
Dmitry Shtokalo
Alexander Makarov
Yuri Vyatkin
...

Abstract
Motivation: Transcriptomic data are frequently used in research on biomarker genes for different diseases and biological states. The most common tasks are data harmonization and treatment outcome prediction. Both can be addressed via the style transfer approach. Either technical factors or any biological details about the samples that we would like to control (gender, biological state, treatment, etc.) can be used as style components.
Results: The proposed style transfer solution is based on Conditional Variational Autoencoders, Y-Autoencoders and adversarial feature decomposition. To quantitatively measure the quality of the style transfer, we used neural network classifiers that predict the style and semantics after training on real expression data. Comparison with several existing style-transfer-based approaches shows that the proposed model has the highest style prediction accuracy on all considered datasets while having comparable or the best semantics prediction accuracy.
Availability and implementation: https://github.com/NRshka/stvae-source.
Supplementary information: Supplementary data are available at Bioinformatics online.
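The evaluation idea in the Results (score a style transfer by asking one classifier for the style and another for the semantics) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `transfer`, `style_clf` and `semantic_clf` are hypothetical callables standing in for a trained model and classifiers, and nothing here is taken from the stvae-source repository.

```python
def transfer_accuracies(samples, transfer, style_clf, semantic_clf):
    """Score a style transfer with two classifiers (all callables are hypothetical).

    samples: list of (expression, semantic_label, target_style) tuples.
    Returns (style_accuracy, semantics_accuracy).
    """
    style_hits = 0
    semantic_hits = 0
    for expression, semantic_label, target_style in samples:
        transferred = transfer(expression, target_style)
        # Style accuracy: does the output look like the requested style?
        style_hits += style_clf(transferred) == target_style
        # Semantics accuracy: is the biological content still recognisable?
        semantic_hits += semantic_clf(transferred) == semantic_label
    n = len(samples)
    return style_hits / n, semantic_hits / n


# Toy stand-ins: an "expression" is a (style, semantics) pair.
toy_samples = [(("batch1", "tumor"), "tumor", "batch2"),
               (("batch2", "normal"), "normal", "batch1")]
perfect_transfer = lambda expr, style: (style, expr[1])  # swaps style only
identity_transfer = lambda expr, style: expr             # changes nothing
read_style = lambda expr: expr[0]
read_semantics = lambda expr: expr[1]
```

A transfer that changes only the style scores high on both numbers, while one that leaves the input untouched keeps the semantics but fails the style check.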


2020
Author(s):  
Masaki Tagashira

Abstract
Motivation: The simultaneous consideration of sequence alignment and RNA secondary structure, or structural alignment, is known to help predict more accurate secondary structures of homologs. However, this consideration is computationally heavy and can be done only approximately, by decomposing structural alignments.
Results: The PhyloFold method, which predicts secondary structures of homologs while considering likely pairwise structural alignments, was developed in this study. The method shows the best prediction accuracy while requiring a running time comparable to that of conventional methods.
Availability: The source code of the programs implemented in this study is available at https://github.com/heartsh/phylofold and https://github.com/heartsh/phyloalifold.
Contact: [email protected]
Supplementary information: Supplementary data are available.


2017
Author(s):
Andrew Palmer
Prasad Phapale
Dominik Fay
Theodore Alexandrov

Abstract
Motivation: Identification from metabolomics mass spectrometry experiments requires comparison of fragmentation spectra from experimental samples to spectra from analytical standards. As the quality of identification depends directly on the quality of the reference spectra, manual curation is routine during the selection of reference spectra to include in a spectral library. Whilst building our own in-house spectral library we realised that there is currently no vendor-neutral, open-access tool for facilitating manual curation of spectra from raw LC-MS data into a custom spectral library.
Results: We developed a web application, curatr, for the rapid generation of high-quality mass spectral fragmentation libraries for liquid-chromatography mass spectrometry analysis. Curatr handles datasets from single or multiplexed standards, automatically extracting chromatographic profiles and potential fragmentation spectra for multiple adducts. These are presented through an intuitive interface for manual curation before being documented in a custom spectral library. Searchable molecular information and the provenance of each standard are stored along with metadata on the experimental protocol. Curatr supports the export of spectral libraries in several standard formats for easy use with third-party software or submission to community databases, maximising the return on investment for these costly measurements. We demonstrate the use of curatr to generate the EMBL Metabolomics Core Facility spectral library, which is publicly available at http://curatr.mcf.embl.de.
Availability: The source code is freely available at http://github.com/alexandrovteam/curatr/ along with example data.
Supplementary information: A step-by-step user manual is available in the supplementary information.


2020
Author(s):
Urminder Singh
Eve Syrkin Wurtele

Summary: Searching for ORFs in transcripts is a critical step prior to annotating coding regions in newly sequenced genomes and to searching for alternative reading frames within known genes. With the tremendous increase in RNA-Seq data, faster tools are needed to handle large input datasets. These tools should be versatile enough to fine-tune search criteria and allow efficient downstream analysis. Here we present a new Python-based tool, orfipy, which allows the user to flexibly search for open reading frames in FASTA sequences. The search is rapid and fully customizable, with a choice of FASTA and BED output formats.
Availability and implementation: orfipy is implemented in Python and is compatible with Python v3.6 and higher. Source code: https://github.com/urmi-21/orfipy. Installation: from the source, or via PyPI (https://pypi.org/project/orfipy) or bioconda (https://anaconda.org/bioconda/orfipy).
Contact: [email protected], [email protected]
Supplementary information: Supplementary data are available at https://github.com/urmi-21/orfipy
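To make the ORF search described above concrete, here is a minimal pure-Python sketch of a forward-strand, three-frame scan with a minimum-length filter and BED-style output. It is not orfipy's implementation or API: the function names `find_orfs` and `to_bed`, the parameters, and the choice to count the stop codon in the ORF length are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=30, start_codons=("ATG",)):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    Only ORFs of at least `min_len` nucleotides are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


def to_bed(seq_id, orfs):
    """Format ORFs as minimal three-column BED lines (chrom, start, end)."""
    return [f"{seq_id}\t{start}\t{end}" for start, end, _ in orfs]


orfs = find_orfs("CCATGAAATTTGGGTAACC", min_len=9)
print(orfs)                   # [(2, 17, 2)]
print(to_bed("tx1", orfs))    # ['tx1\t2\t17']
```

A production tool would also scan the reverse complement and handle ORFs that lack a start or stop codon; the sketch leaves both out to stay short.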


2019
Author(s):
Héctor Climente-González
Chloé-Agathe Azencott
Samuel Kaski
Makoto Yamada

Abstract
Motivation: Finding nonlinear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have crucial drawbacks, including lack of parsimony, non-convexity, and computational overhead. Here we present the block HSIC Lasso, a nonlinear feature selector that does not have these drawbacks.
Results: We compare the block HSIC Lasso to other state-of-the-art feature selection techniques on synthetic and real data, including experiments over three common types of genomic data: gene-expression microarrays, single-cell RNA-seq, and GWAS. In all cases, we observe that features selected by block HSIC Lasso retain more information about the underlying biology than features selected by other techniques. As a proof of concept, we applied the block HSIC Lasso to a single-cell RNA-seq experiment on mouse hippocampus. We discovered that many genes previously linked to brain development and function are involved in the biological differences between the types of neurons.
Availability: Block HSIC Lasso is implemented in the Python 2/3 package pyHSICLasso, available on GitHub (https://github.com/riken-aip/pyHSICLasso) and PyPI (https://pypi.org/project/pyHSICLasso).
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
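The dependence measure behind the method's name, the Hilbert-Schmidt Independence Criterion (HSIC), can be illustrated with a short pure-Python sketch. It computes the standard biased empirical estimate, tr(KHLH)/n², with Gaussian kernels on one-dimensional data. This is not the block HSIC Lasso itself and not the pyHSICLasso API; the function names and the fixed kernel width `sigma` are assumptions.

```python
import math


def gaussian_gram(values, sigma=1.0):
    """Gram matrix of a Gaussian kernel over a list of scalars."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - 11^T/n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two equally long lists of scalars."""
    n = len(x)
    k_centered = center(gaussian_gram(x, sigma))
    l_centered = center(gaussian_gram(y, sigma))
    return sum(k_centered[i][j] * l_centered[i][j]
               for i in range(n) for j in range(n)) / n ** 2


x = [i / 4 for i in range(-8, 9)]   # symmetric grid from -2 to 2
y_quadratic = [v * v for v in x]     # nonlinear dependence, zero linear correlation
y_constant = [1.0] * len(x)          # carries no information about x
print(hsic(x, y_quadratic) > hsic(x, y_constant))
```

The quadratic relationship has zero Pearson correlation with `x` on this grid, yet its HSIC is strictly positive, which is the kind of nonlinear signal a kernel-based feature selector is meant to pick up.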


2019
Author(s):
Sebastian Deorowicz
Adam Gudyś

Abstract
Summary: Whisper 2 is short-read-mapping software providing superior quality of indel variant calling. Its running times place it among the fastest existing tools.
Availability and implementation: https://github.com/refresh-bio/
Contact: [email protected]
Supplementary information: Supplementary data are available at the publisher's Web site.


Author(s):  
Zhihui Lin ◽  
Chun Yuan ◽  
Maomao Li

Stochastic video generation methods predict diverse videos based on observed frames, where the main challenge lies in modeling the complex future uncertainty and generating realistic frames. Numerous Recurrent-VAE-based methods have achieved state-of-the-art results. However, on the one hand, the independence assumption on the variables of the approximate posterior limits inference performance. On the other hand, although these methods adopt skip connections between encoder and decoder to utilize multi-level features, they still produce blurry frames due to the spatial misalignment between encoder and decoder features at different time steps. In this paper, we propose a hierarchical recurrent VAE with a feature aligner, which not only relaxes the independence assumption of the typical VAE but also uses the feature aligner to enable the decoder to obtain aligned spatial information from the last observed frames. The proposed model is named Hierarchical Stochastic Video Generation network with Aligned Features, referred to as HAF-SVG. Experiments on the Moving-MNIST, BAIR, and KTH datasets demonstrate that the hierarchical structure helps model future uncertainty more accurately, and that the feature aligner is beneficial for generating realistic frames. In addition, HAF-SVG outperforms SVG in both prediction accuracy and the quality of generated frames.


2018
Author(s):
Gregor Sturm
Francesca Finotello
Florent Petitprez
Jitao David Zhang
Jan Baumbach
...

Abstract
Motivation: The composition and density of immune cells in the tumor microenvironment profoundly influence tumor progression and the success of anti-cancer therapies. Flow cytometry, immunohistochemistry staining, and single-cell sequencing are often unavailable, so we rely on computational methods to estimate the immune-cell composition from bulk RNA-sequencing (RNA-seq) data. Various methods have been proposed recently, yet their capabilities and limitations have not been evaluated systematically. A general guideline leading the research community through cell type deconvolution is missing.
Results: We developed a systematic approach for benchmarking such computational methods and assessed the accuracy of tools at estimating nine different immune and stromal cell types from bulk RNA-seq samples. We used a single-cell RNA-seq dataset of ∼11,000 cells from the tumor microenvironment to simulate bulk samples of known cell type proportions, and validated the results using independent, publicly available gold-standard estimates. This allowed us to analyze and condense the results of more than a hundred thousand predictions to provide an exhaustive evaluation across seven computational methods over nine cell types and ∼1,800 samples from five simulated and real-world datasets. We demonstrate that computational deconvolution performs at high accuracy for well-defined cell-type signatures and propose how fuzzy cell-type signatures can be improved. We suggest that future efforts should be dedicated to refining cell population definitions and finding reliable signatures.
Availability: A snakemake pipeline to reproduce the benchmark is available at https://github.com/grst/immune_deconvolution_benchmark. An R package allows the community to perform integrated deconvolution using different methods (https://grst.github.io/immunedeconv).
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
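The simulation step mentioned in the Results (building bulk samples of known composition from single-cell profiles) can be sketched as follows. This is a simplified illustration, not the benchmark's snakemake pipeline: the function name `simulate_bulk`, sampling cells with replacement, and averaging their expression vectors are all assumptions.

```python
import random


def simulate_bulk(cells_by_type, proportions, n_cells=500, seed=0):
    """Mix single-cell profiles into one artificial bulk sample.

    cells_by_type: {cell_type: [expression vector, ...]} (hypothetical input).
    proportions:   {cell_type: desired fraction of the mixture}.
    Returns (bulk_expression, realized_proportions).
    """
    rng = random.Random(seed)
    n_genes = len(next(iter(cells_by_type.values()))[0])
    bulk = [0.0] * n_genes
    counts = {}
    for cell_type, fraction in proportions.items():
        n_draws = round(fraction * n_cells)
        counts[cell_type] = n_draws
        for _ in range(n_draws):
            # Sample a cell of this type with replacement and add its profile.
            cell = rng.choice(cells_by_type[cell_type])
            for gene, value in enumerate(cell):
                bulk[gene] += value
    total = sum(counts.values())
    # Average over the sampled cells; the realized fractions are the ground truth.
    return [v / total for v in bulk], {t: c / total for t, c in counts.items()}


toy_cells = {"T cell": [[10.0, 0.0], [10.0, 0.0]], "B cell": [[0.0, 4.0]]}
bulk, truth = simulate_bulk(toy_cells, {"T cell": 0.25, "B cell": 0.75}, n_cells=100)
print(bulk, truth)
```

Because the mixing fractions are chosen by the simulator, each deconvolution method's estimate can be compared directly against `truth`.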


2019
Vol 35 (18)
pp. 3517-3519
Author(s):
Nikolaos Papadopoulos
Parra R Gonzalo
Johannes Söding

Abstract
Summary: Cellular lineage trees can be derived from single-cell RNA sequencing snapshots of differentiating cells. Currently, only datasets with simple topologies are available. To test and further develop tools for lineage tree reconstruction, we need test datasets with known complex topologies. PROSSTT can simulate scRNA-seq datasets for differentiation processes with lineage trees of any desired complexity, noise level, noise model and size. PROSSTT also provides scripts to quantify the quality of predicted lineage trees.
Availability and implementation: https://github.com/soedinglab/prosstt.
Supplementary information: Supplementary data are available at Bioinformatics online.


2020
Author(s):
Vu VH Pham
Xiaomei Li
Buu Truong
Thin Nguyen
Lin Liu
...

Abstract
Motivation: Predicting cell locations is important because, once cell locations are understood, we may estimate the function of cells and their integration with the spatial environment. Thus, the DREAM Challenge on Single Cell Transcriptomics required participants to predict the locations of single cells in the Drosophila embryo using single-cell transcriptomic data.
Results: We developed over 50 pipelines by combining different ways of pre-processing the RNA-seq data, selecting the genes, predicting the cell locations, and validating the predicted cell locations, resulting in the winning methods for two out of three sub-challenges in the competition. In this paper, we present an R package, SCTCwhatateam, which includes all the methods we developed and a Shiny web application to facilitate research on single-cell spatial reconstruction. All the data and the example use cases are available in the Supplementary material.
Availability: The scripts of the package are available at https://github.com/thanhbuu04/SCTCwhatateam and the Shiny application is available at https://github.com/pvvhoang/
Contact: [email protected]
Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.

