SNaReSim: Synthetic Nanopore Read Simulator

2017 ◽  
Author(s):  
Philippe Faucon ◽  
Parithi Balachandran ◽  
Sharon Crook

Abstract. Nanopores represent the first commercial technology in decades to present a significantly different technique for DNA sequencing, and one of the first technologies to propose direct RNA sequencing. Despite significant differences from previous sequencing technologies, read simulators to date make similar assumptions with respect to error profiles and their analysis. This is a great disservice both to nanopore sequencing and to computer scientists who seek to optimize their tools for the platform. Previous works have discussed the occurrence of some k-mer bias, but this discussion has focused on homopolymers, leaving unanswered the question of whether k-mer bias exists over general k-mers, how it occurs, and what can be done to reduce its effects. In this work, we demonstrate that current read simulators fail to accurately represent k-mer error distributions. We explore the sources of k-mer bias in nanopore basecalls, and we present a model for predicting k-mers that are difficult to identify. We also propose SNaReSim, a new state-of-the-art simulator, and demonstrate that it provides higher accuracy with respect to 6-mer accuracy biases.
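To make the notion of a per-k-mer accuracy profile concrete, here is a minimal sketch of how such a profile could be tabulated from gapped pairwise alignments. It is an illustration only, not the method used by SNaReSim: the function name `kmer_accuracy`, the input format (equal-length reference/read strings with `-` for gaps), and the rule of skipping any window in which the reference has a gap are all assumptions made for this example.

```python
from collections import Counter


def kmer_accuracy(aligned_pairs, k=6):
    """Return {reference k-mer: fraction of its occurrences basecalled exactly}.

    aligned_pairs: iterable of (reference, read) strings of equal length,
    with '-' marking alignment gaps (hypothetical input format).
    """
    seen = Counter()     # how often each reference k-mer was observed
    correct = Counter()  # how often the read matched it column for column
    for ref, read in aligned_pairs:
        if len(ref) != len(read):
            raise ValueError("aligned strings must have equal length")
        for i in range(len(ref) - k + 1):
            ref_window = ref[i:i + k]
            if "-" in ref_window:
                # The read has an insertion here; this sketch skips the window.
                continue
            seen[ref_window] += 1
            if read[i:i + k] == ref_window:
                correct[ref_window] += 1
    return {kmer: correct[kmer] / seen[kmer] for kmer in seen}


pairs = [("ACGTACGT", "ACGTACGA"), ("ACGTAC", "ACGTAC")]
profile = kmer_accuracy(pairs, k=6)
print(profile)
```

Comparing such a table for real reads against the same table for simulated reads is one simple way to see whether a simulator reproduces k-mer-specific error behaviour.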

2021 ◽  
Author(s):  
Qingxi Meng ◽  
Shubham Chandak ◽  
Yifan Zhu ◽  
Tsachy Weissman

Motivation: The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, real-time and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads obtained in FASTQ files. Previous work ENANO focuses mostly on quality score compression and does not achieve significant gains for the compression of read sequences over general-purpose compressors. RENANO achieves significantly better compression for read sequences but is limited to aligned data with a reference available. Results: We present NanoSpring, a reference-free compressor for nanopore sequencing reads, relying on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole genome data. For recently basecalled high quality nanopore datasets, NanoSpring achieves close to 3x improvement in compression over state-of-the-art reference-free compressors. The computational requirements of NanoSpring are practical, although it uses more time and memory during compression than previous tools to achieve the compression gains. Availability: NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ratanond Koonchanok ◽  
Swapna Vidhur Daulatabad ◽  
Quoseena Mir ◽  
Khairi Reda ◽  
Sarath Chandra Janga

Abstract. Background: Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Results: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that distinguish these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions: Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.


2012 ◽  
Vol 2012 ◽  
pp. 1-18 ◽  
Author(s):  
Silvio Garofalo ◽  
Marisa Cornacchione ◽  
Alfonso Di Costanzo

The introduction of DNA microarrays and DNA sequencing technologies in medical genetics and diagnostics has been a challenge that has significantly transformed medical practice and patient management. Because of the great advancements in molecular genetics and the development of simple laboratory technology to identify mutations in the causative genes, the diagnostic approach to epilepsy has also changed significantly. However, the clinical use of molecular cytogenetics and high-throughput DNA sequencing technologies, which are able to test an entire genome for genetic variants associated with the disease, is preparing a further revolution in the near future. Molecular Karyotype and Next-Generation Sequencing have the potential to identify causative genes or loci in sporadic or non-familial epilepsy cases as well, and may well represent the transition from a genetic to a genomic approach to epilepsy.


2012 ◽  
pp. 68-95
Author(s):  
Marco Seri ◽  
Claudio Graziano ◽  
Daniela Turchetti ◽  
Juri Monducci

The pace of discovery in the field of human genetics has increased exponentially in the last 30 years. We have witnessed the completion of the Human Genome Project, the identification of hundreds of disease-causing genes, and the dawn of genomic medicine (clinical care based on genomic information). Reduction of DNA sequencing costs, thanks to the so-called "next generation sequencing" technologies, is driving a shift towards the era of "personal genomes", but the scientific as well as ethical challenges ahead are countless. We provide an overview of the classification of genetic tests, of informed consent procedures in the context of genetic counseling, and of specific ethical issues raised by the implementation of new DNA sequencing technologies.


PeerJ ◽  
2015 ◽  
Vol 3 ◽  
pp. e1419 ◽  
Author(s):  
Jose E. Kroll ◽  
Jihoon Kim ◽  
Lucila Ohno-Machado ◽  
Sandro J. de Souza

Motivation. Alternative splicing events (ASEs) are prevalent in the transcriptome of eukaryotic species and are known to influence many biological phenomena. The identification and quantification of these events are crucial for a better understanding of biological processes. Next-generation DNA sequencing technologies have allowed deep characterization of transcriptomes and made it possible to address these issues. ASEs analysis, however, represents a challenging task especially when many different samples need to be compared. Some popular tools for the analysis of ASEs are known to report thousands of events without annotations and/or graphical representations. A new tool for the identification and visualization of ASEs is here described, which can be used by biologists without a solid bioinformatics background. Results. A software suite named Splicing Express was created to perform ASEs analysis from transcriptome sequencing data derived from next-generation DNA sequencing platforms. Its major goal is to serve the needs of biomedical researchers who do not have bioinformatics skills. Splicing Express performs automatic annotation of transcriptome data (GTF files) using gene coordinates available from the UCSC genome browser and allows the analysis of data from all available species. The identification of ASEs is done by a known algorithm previously implemented in another tool named Splooce. As a final result, Splicing Express creates a set of HTML files composed of graphics and tables designed to describe the expression profile of ASEs among all analyzed samples. By using RNA-Seq data from the Illumina Human Body Map and the Rat Body Map, we show that Splicing Express is able to perform all tasks in a straightforward way, identifying well-known specific events. Availability and Implementation. Splicing Express is written in Perl and is suitable to run only in UNIX-like systems. More details can be found at: http://www.bioinformatics-brazil.org/splicingexpress.


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e21008-e21008
Author(s):  
Henry G. Kaplan ◽  
Alex Barrett ◽  
Jiaxin Niu ◽  
Somasundaram Subramaniam ◽  
Maria Matsangou

e21008 Background: NUT midline carcinoma (NMC) is an aggressive squamous cell carcinoma molecularly defined by a chromosomal rearrangement of nuclear protein in testis (NUTM1) with bromodomain-containing protein 3 or 4 (BRD3/4). While NMCs are characterized by this rare canonical gene rearrangement, little is known about the transcriptome and proteome of this rare disease. As such, we set out to comprehensively characterize five NMC cases in which we attained targeted DNA sequencing, full-transcriptome RNA sequencing, and targeted proteomics. We further examine and integrate these results in order to better understand the relationship between gene expression and protein abundance within the context of NMC. Methods: All cases were analyzed for genomic and transcriptomic alterations against a custom panel via the Tempus xT tissue biopsy assay (DNA sequencing of 648 genes in tumor and matched normal samples at 500x depth and full-transcriptome RNA sequencing) for germline and/or somatic mutations. The xT assay detects single nucleotide variants, specific insertions/deletions, amplifications and gene fusions, as well as tumor mutational burden (TMB) and microsatellite instability (MSI) status. Proteomic data were obtained utilizing digital spatial profiling through Nanostring immune, MAPK and PI3/AKT, and pan tumor nCounter GeoMix panels. Results: Clinical characteristics, histology, and genomic/proteomic alterations for 5 NMC cases are presented. Cases were defined by pathological assessment and the identification of the canonical NUTM1 fusion, further broken down by fusion partner with three patients having NUTM1-BRD4 fusions, one NUT-BRD3, and one NUT-ZMYND8. TMBs ranged from 0.8-.6 mutations/megabase (n=5). All patients were MSI stable (5/5). Of three patients with an available PD-L1 IHC result, one had elevated PD-L1 tumor staining at 70%. Results will be presented from full-transcriptome RNA expression analysis indicating overexpression of BRAF, MYC, mTOR, and EGFR, among others. Targeted proteomics were performed to assess relative abundance at the protein level (results to be presented). Clinical follow-up for the five patients revealed that two have survived beyond 7 months. A lung primary patient treated with surgical resection and post-op radiation (XRT) is NED at 63 months. A sinus primary patient is NED at 16 months after a partial response (PR) to taxotere/5FU/cisplatin followed by resection and XRT/cisplatin. One patient had a brief PR from ifosfamide/etoposide/vorinostat. One patient's tumor grew through XRT/cisplatin. Conclusions: Multi-omic analysis has the potential to further elucidate the mechanisms of tumor growth in NMC and identify new targets for the treatment of this aggressive and poor-prognosis disease.


Author(s):  
Kexin Huang ◽  
Tianfan Fu ◽  
Lucas M Glass ◽  
Marinka Zitnik ◽  
Cao Xiao ◽  
...  

Abstract. Summary: Accurate prediction of drug–target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models have shown promising performance for DTI prediction. However, these models can be difficult to use for both computer scientists entering the biomedical field and bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use DL library for DTI prediction. DeepPurpose supports training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with providing many other useful features. We demonstrate state-of-the-art performance of DeepPurpose on several benchmark datasets. Availability and implementation: https://github.com/kexinhuang12345/DeepPurpose. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 29 (08) ◽  
pp. 1249-1255
Author(s):  
Kamil Salikhov

Modern DNA sequencing technologies generate prodigious volumes of sequence data consisting of short DNA fragments (reads). Storing and transferring this data is often challenging. With this motivation, several specialized compression methods have been developed. In this paper, we present an improvement of the lossless reference-free compression algorithm, suggested by Rozov et al., based on the technique of cascading Bloom filters. Through computational experiments on real data, we demonstrate that our method results in a significant associated memory reduction in practice.
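For readers unfamiliar with the data structure named above, the following is a minimal sketch of the general cascading Bloom filter idea: each filter stores the false positives of the previous one, so that membership becomes exact over a known set of candidate queries. It is not the compression algorithm of Rozov et al. nor the improvement described in this paper; the names `BloomFilter`, `build_cascade`, and `query`, the choice of BLAKE2b hashing, and all parameter values are assumptions made for illustration.

```python
import hashlib
from itertools import product


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One salted BLAKE2b digest per hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_cascade(members, candidates, levels=3, bits_per_item=8, num_hashes=3):
    """Build `levels` filters plus a small exact residual set."""
    filters = []
    store, against = set(members), set(candidates) - set(members)
    for _ in range(levels):
        bf = BloomFilter(max(8, bits_per_item * len(store)), num_hashes)
        for item in store:
            bf.add(item)
        filters.append(bf)
        # The next level stores what this filter wrongly accepts from the other side.
        store, against = {x for x in against if x in bf}, store
    return filters, store


def query(item, filters, residual):
    """Exact membership for any item drawn from members or candidates."""
    for depth, bf in enumerate(filters):
        if item not in bf:
            # Rejection at depth 0, 2, ... means non-member; at depth 1, 3, ... member.
            return depth % 2 == 1
    in_residual = item in residual
    # The residual holds members after an even number of levels, non-members after odd.
    return in_residual if len(filters) % 2 == 0 else not in_residual


kmers = ["".join(p) for p in product("ACGT", repeat=5)]
members = set(kmers[::4])
filters, residual = build_cascade(members, kmers)
print(all(query(x, filters, residual) == (x in members) for x in kmers))
```

Because each successive level only has to store the false positives of the level before it, the later filters and the final residual set are typically much smaller than the first filter, which is where the memory saving of the cascade comes from.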

