Comparing complex variants in family trios

DOI: 10.1101/253492
Year: 2018
Cited By: ~1
Author(s): Berke Ç. Toptaş, Goran Rakocevic, Péter Kómár, Deniz Kural
Keyword(s): Source Code, Supplementary Information, Analysis Tool, Matching Problem, Concordance Analysis, Link Type, Comparison Methods, Multiple Variants

Abstract

Motivation: Several tools exist to count Mendelian violations in family trios by comparing variants at the same genomic positions. This naive variant comparison, however, fails to assess regions where multiple variants need to be examined together, which reduces the accuracy of existing Mendelian violation checking tools.

Results: We introduce VBT, a trio concordance analysis tool that identifies Mendelian violations by approximately solving the 3-way variant matching problem to resolve variant representation differences in family trios. We show that VBT outperforms previous trio comparison methods in accuracy.

Availability: VBT is implemented in C++, and the source code is available under the GNU GPLv3 license at https://github.com/sbg/VBT-TrioAnalysis.git

Contact: [email protected]

Supplementary information: Supplementary materials are available at bioRxiv.
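For orientation, the position-by-position check that the abstract above calls naive can be sketched as follows. This is only an illustration of that baseline, not VBT's 3-way matching algorithm; the function name, the genotype encoding (0 = reference allele, 1 = alternate allele) and the example sites are all assumptions made for this sketch.

```python
# Hypothetical helper for illustration only; it is not part of VBT.
def is_mendelian_consistent(mother, father, child):
    """Return True if the child's two alleles can be drawn one from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


# Made-up genotypes at two sites (0 = reference allele, 1 = alternate allele).
trio_sites = [
    {"pos": 100, "mother": (0, 0), "father": (0, 1), "child": (0, 1)},
    {"pos": 250, "mother": (0, 0), "father": (0, 0), "child": (0, 1)},
]

# Collect the positions where the child's genotype cannot be explained by the parents.
violations = [
    site["pos"]
    for site in trio_sites
    if not is_mendelian_consistent(site["mother"], site["father"], site["child"])
]
print(violations)
```

Because each site is judged in isolation, a check of this kind cannot tell when two different sets of variant records describe the same haplotype, which is the limitation the abstract attributes to existing tools.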


GalaxyCloudRunner: enhancing scalable computing for Galaxy

DOI: 10.1101/2020.05.28.121772
Year: 2020
Author(s): N Goonasekera, A Mahmoud, J Chilton, E Afgan
Keyword(s): Source Code, Supplementary Information, Scalable Computing, Link Type, Cloud Providers, Galaxy Server, Cloud Resources

Abstract

Summary: The existence of more than 100 public Galaxy servers with service quotas is indicative of the need for an increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules, and the resources can be dynamically acquired from any of 4 popular cloud providers (AWS, Azure, GCP, or OpenStack) in an automated fashion.

Availability and implementation: GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/.

Contact: Enis Afgan ([email protected])

Supplementary information: None


MODE-TASK: Large-scale protein motion tools

DOI: 10.1101/217505
Year: 2017
Author(s): Caroline Ross, Bilal Nizami, Michael Glenister, Olivier Sheik Amamuddy, Ali Rana Atilgan, ...
Keyword(s): Large Scale, Protein Complexes, Normal Mode Analysis, MD Simulations, Supplementary Information, Mode Analysis, Analysis Tool, Link Type, Supplementary Material, Anisotropic Network

Abstract

Summary: MODE-TASK, a novel software suite, comprises Principal Component Analysis, Multidimensional Scaling, and t-Distributed Stochastic Neighbor Embedding techniques using molecular dynamics trajectories. MODE-TASK also includes a Normal Mode Analysis tool based on the Anisotropic Network Model, so as to provide a variety of ways to analyse and compare large-scale motions of protein complexes for which long MD simulations are prohibitive.

Availability and Implementation: MODE-TASK has been open-sourced and is available for download from https://github.com/RUBi-ZA/MODE-TASK, implemented in Python and C++.

Supplementary information: Documentation is available at http://mode-task.readthedocs.io.


PhyloFold: Precise and Swift Prediction of RNA Secondary Structures to Incorporate Phylogeny among Homologs

DOI: 10.1101/2020.03.05.975797
Year: 2020
Author(s): Masaki Tagashira
Keyword(s): Secondary Structure, RNA Secondary Structure, Prediction Accuracy, Structural Alignment, Source Code, Secondary Structures, Supplementary Information, Supplementary Data, Link Type, Structural Alignments

Abstract

Motivation: The simultaneous consideration of sequence alignment and RNA secondary structure, or structural alignment, is known to help predict more accurate secondary structures of homologs. However, this consideration is computationally heavy and can be done only roughly, by decomposing structural alignments.

Results: The PhyloFold method, which predicts secondary structures of homologs considering likely pairwise structural alignments, was developed in this study. The method shows the best prediction accuracy while requiring running time comparable to that of conventional methods.

Availability: The source code of the programs implemented in this study is available at https://github.com/heartsh/phylofold and https://github.com/heartsh/phyloalifold.

Contact: [email protected]

Supplementary information: Supplementary data are available.


NanoPack: visualizing and processing long read sequencing data

DOI: 10.1101/237180
Year: 2017
Cited By: ~2
Author(s): Wouter De Coster, Svenn D’Hert, Darrin T. Schultz, Marc Cruts, Christine Van Broeckhoven
Keyword(s): Web Service, Graphical User Interface, Source Code, Supplementary Information, Command Line, Sequencing Data, Link Type, Oxford Nanopore, Long Read, Oxford Nanopore Technologies

Abstract

Summary: Here we describe NanoPack, a set of tools developed for visualization and processing of long read sequencing data from Oxford Nanopore Technologies and Pacific Biosciences.

Availability and Implementation: The NanoPack tools are written in Python3 and released under the GNU GPL3.0 Licence. The source code can be found at https://github.com/wdecoster/nanopack, together with links to separate scripts and their documentation. The scripts are compatible with Linux, Mac OS and the MS Windows 10 Subsystem for Linux, and are available as a graphical user interface, a web service at http://nanoplot.bioinf.be and command line tools.

Contact: [email protected]

Supplementary information: Supplementary tables and figures are available at Bioinformatics online.


Rapid screening and detection of inter-type viral recombinants using phylo-k-mers

DOI: 10.1101/2020.06.22.161422
Year: 2020
Author(s): Guillaume E. Scholz, Benjamin Linard, Nikolai Romashchenko, Eric Rivals, Fabio Pardi
Keyword(s): Source Code, Rapid Screening, Supplementary Information, Evolutionary Significance, Large Database, Recombinant Viruses, Phylogenetic Placement, Link Type, Whole Genomes, Complex Models

Abstract

Motivation: Novel recombinant viruses may have important medical and evolutionary significance, as they sometimes display new traits not present in the parental strains. This is particularly concerning when the new viruses combine fragments coming from phylogenetically distinct viral types. Here, we consider the task of screening large collections of sequences for such novel recombinants. A number of methods already exist for this task. However, these methods rely on complex models and heavy computations that are not always practical for a quick scan of a large number of sequences.

Results: We have developed SHERPAS, a new program to detect novel recombinants and provide a first estimate of their parental composition. Our approach is based on the precomputation of a large database of “phylogenetically-informed k-mers”, an idea recently introduced in the context of phylogenetic placement in metagenomics. Our experiments show that SHERPAS is hundreds to thousands of times faster than existing software, and enables the analysis of thousands of whole genomes, or long sequencing reads, within minutes or seconds, and with limited loss of accuracy.

Availability and Implementation: The source code is freely available for download at https://github.com/phylo42/[email protected], [email protected]

Supplementary information: Supplementary Materials are available online.


EpyNN: Educational python for Neural Networks

DOI: 10.1101/2021.12.06.470764
Year: 2021
Author(s): Florian Malard, Laura Danner, Emilie Rouzies, Jesse G Meyer, Ewen Lescop, ...
Keyword(s): Neural Networks, Graphical Representation, Source Code, Application Programming Interface, Supplementary Information, Practical Implementation, Data Preparation, Link Type, Application Programming, Programming Interface

Abstract

Summary: Artificial Neural Networks (ANNs) have achieved unequaled performance for numerous problems in many areas of Science, Business, Public Policy, and more. While experts are familiar with performance-oriented software and the underlying theory, ANNs are difficult for non-experts to comprehend because understanding them requires programming skills, a background in mathematics, and knowledge of terminology and concepts. In this work, we release EpyNN, an educational Python resource meant for a public willing to understand key concepts and practical implementation of scalable ANN architectures from concise, homogeneous and idiomatic source code. EpyNN contains an educational Application Programming Interface (API), educational workflows from data preparation to ANN training, and a documentation website that sets code, mathematics, graphical representation and text side by side to facilitate learning and provide teaching material. Overall, EpyNN provides basics for Python-fluent individuals who wish to learn, teach or develop from scratch.

Availability: EpyNN documentation is available at https://epynn.net and the repository can be retrieved from https://github.com/synthaze/epynn.

Contact: Stéphanie Olivier-Van-Stichelen, [email protected]

Supplementary Information: Supplementary files and listings.


Classifying cells with Scasat - a tool to analyse single-cell ATAC-seq

DOI: 10.1101/227397
Year: 2017
Cited By: ~1
Author(s): Syed Murtuza Baker, Connor Rogerson, Andrew Hayes, Andrew D. Sharrocks, Magnus Rattray
Keyword(s): Single Cell, DNA Sequences, Mammalian Cells, Cell Types, Regulatory Elements, Cellular Heterogeneity, Supplementary Information, Open Chromatin, Analysis Tool, Link Type

Abstract

Motivation: The assay for transposase-accessible chromatin using sequencing (ATAC-seq) reveals the landscape and principles of DNA regulatory mechanisms by identifying the accessible genome of mammalian cells. When done at single-cell resolution, it provides an insight into the cell-to-cell variability that emerges from identical DNA sequences by identifying the variability in the genomic location of open chromatin sites in each of the cells. Processing of single-cell ATAC-seq requires a number of steps, and a simple pipeline to process and analyse single-cell ATAC-seq is not yet available.

Results: This paper presents ScAsAT (single-cell ATAC-seq analysis tool), a complete pipeline to process scATAC-seq data with simple steps. The pipeline is developed in a Jupyter notebook environment that holds the executable code along with the necessary description and results. For the initial sequence processing steps, the pipeline uses a number of well-known tools, which it executes from a Python environment for each of the fastq files. The functions for the data analysis part are mostly written in R. The pipeline is robust, flexible, interactive and easy to extend. The pipeline was applied to a single-cell ATAC-seq dataset in order to identify different cell types from a complex cell mixture. The results from Scasat showed that open chromatin locations corresponding to potential regulatory elements can account for cellular heterogeneity and can identify regulatory regions that separate cells from a complex population.

Availability: The Jupyter notebook with the complete pipeline applied to the dataset published with this paper is publicly available on GitHub (https://github.com/ManchesterBioinference/Scasat). An additional notebook is also provided for analysis of a publicly available dataset. The fastq files are submitted at the ArrayExpress database at EMBL-EBI (www.ebi.ac.uk/arrayexpress) under accession number [email protected] and [email protected]

Supplementary information: Supplementary data are available at bioRxiv online.


Inference of CRISPR Edits from Sanger Trace Data

DOI: 10.1101/251082
Year: 2018
Cited By: ~93
Author(s): Tim Hsiau, David Conant, Nicholas Rossi, Travis Maures, Kelsey Waite, ...
Keyword(s): Source Code, Potential Outcomes, Analysis Tool, Current Analysis, Robust Analysis, Homology Directed Repair, Guide RNAs, Link Type, Generation Sequencing, Batch Analysis

Abstract

Efficient precision genome editing requires a quick, quantitative, and inexpensive assay of editing outcomes. Here we present ICE (Inference of CRISPR Edits), which enables robust analysis of CRISPR edits using Sanger data. ICE proposes potential outcomes for editing with guide RNAs (gRNAs) and then determines which are supported by the data via regression. Additionally, we develop a score called ICE-D (Discordance) that can provide information on large or unexpected edits. We empirically confirm through over 1,800 edits that the ICE algorithm is robust, reproducible, and can analyze CRISPR experiments within days after transfection. We also confirm that ICE strongly correlates with next-generation sequencing of amplicons (Amp-Seq). The ICE tool is free to use and offers several improvements over current analysis tools. For instance, ICE can analyze individual experiments as well as multiple experiments simultaneously (batch analysis). ICE can also detect a wider variety of outcomes, including multi-guide edits (multiple gRNAs per target) and edits resulting from homology-directed repair (HDR), such as knock-ins and base edits. ICE is a reliable analysis tool that can significantly expedite CRISPR editing workflows. It is available online at ice.synthego.com, and the source code is at github.com/synthego-open/ice.
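The idea of proposing candidate outcomes and keeping those that a regression supports can be illustrated with a toy example. The abstract above says only "regression"; the sketch below assumes a non-negative least-squares fit solved by projected gradient descent, chosen purely so that the example needs no external libraries. The function name, the candidate signal vectors and the observed vector are all made up for illustration and are not taken from the ICE source code.

```python
# Toy sketch under stated assumptions; this is not the ICE implementation.
def fit_nonnegative_weights(candidates, observed, steps=2000, learning_rate=0.1):
    """Find weights w >= 0 that minimise ||sum_j w[j] * candidates[j] - observed||^2."""
    weights = [0.0] * len(candidates)
    for _ in range(steps):
        # Current prediction is the weighted sum of the candidate signals.
        predicted = [
            sum(w * cand[i] for w, cand in zip(weights, candidates))
            for i in range(len(observed))
        ]
        residual = [p - o for p, o in zip(predicted, observed)]
        # Gradient step for every weight, then projection onto w >= 0.
        weights = [
            max(0.0, w - learning_rate * sum(r * c for r, c in zip(residual, cand)))
            for w, cand in zip(weights, candidates)
        ]
    return weights


# Three hypothetical candidate outcomes, each reduced to a 4-value signal.
candidates = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
# Observed signal built as 70% of the first candidate plus 30% of the second.
observed = [0.7, 0.3, 1.0, 0.0]

weights = fit_nonnegative_weights(candidates, observed)
print([round(w, 3) for w in weights])
```

In this toy setting the third candidate receives a weight of zero, which is the sense in which an outcome is "not supported by the data".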


deSPI: efficient classification of metagenomic reads with lightweight de Bruijn graph-based reference indexing

DOI: 10.1101/080200
Year: 2016
Cited By: ~1
Author(s): Dengfeng Guan, Bo Liu, Yadong Wang
Keyword(s): Source Code, Classification Method, Supplementary Information, De Bruijn Graph, Supplementary Data, Link Type, Memory Footprint, Supplementary Material, De Bruijn

Abstract

Summary: In metagenomic studies, fast and effective tools are in wide demand to implement taxonomy classification for up to billions of reads. Herein, we propose deSPI, a novel read classification method that classifies reads by recognizing and analyzing the matches between reads and reference with de Bruijn graph-based lightweight reference indexing. deSPI runs faster with a relatively small memory footprint, while also achieving higher or similar sensitivity and accuracy.

Availability: The C++ source code of deSPI is available at https://github.com/hitbc/[email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.


KmerFinderJS: A client-server method for fast species typing of bacteria over slow Internet connections

DOI: 10.1101/145284
Year: 2017
Cited By: ~2
Author(s): Jose Luis Bellod Cineros, Ole Lund
Keyword(s): Bacterial Species, Source Code, Supplementary Information, The Internet, Supplementary Data, Client Server, Link Type, Genome Data, Speed Up, Internet Connections

Abstract

Motivation: KmerFinder is a program based on K-mer statistics for identifying bacterial species in whole genome data, which, as a web server, has been used more than 10,000 times. KmerFinderJS is a development of KmerFinder that benefits from the downsampling of data by the prefix filtering used in KmerFinder, to minimize the amount of data that needs to be transferred between the client and the server.

Results: KmerFinderJS replaces the Python-based hash structure for holding the databases with a true key-value database. These improvements are shown to lead to a many-fold speed-up of species identification at the internet transfer speeds that are realistic to expect today. It is also shown that the method can find the true content of an artificial metagenomic cocktail with no false positives.

Availability: The method is freely available at https://cge.cbs.dtu.dk/services/KmerFinderJS/ and as source code at https://bitbucket.org/genomicepidemiology/[email protected]

Supplementary information: Supplementary data are available at bioRxiv online.
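The downsampling by prefix filtering mentioned in the abstract above can be sketched in a few lines: only k-mers that begin with a fixed prefix are kept, so far fewer k-mers need to be sent from client to server. The function name, the value of k, the prefix and the example sequence below are assumptions chosen for illustration; they are not taken from the KmerFinder or KmerFinderJS source code.

```python
# Minimal sketch of prefix filtering; names and parameters are hypothetical.
def prefix_filtered_kmers(sequence, k, prefix):
    """Return the set of k-mers in `sequence` that start with `prefix`."""
    return {
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if sequence.startswith(prefix, i)
    }


sequence = "ATGACCATGAAT"  # made-up example sequence
all_kmers = {sequence[i:i + 5] for i in range(len(sequence) - 5 + 1)}
kept = prefix_filtered_kmers(sequence, k=5, prefix="ATG")

# Only the k-mers starting with the prefix would be transferred to the server.
print(len(all_kmers), sorted(kept))
```

A longer prefix keeps a smaller fraction of the k-mers, trading transfer size against the amount of evidence available for matching.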
