ARSDA: A new approach for storing, transmitting and analyzing high-throughput sequencing data

Mapping Intimacies ◽

10.1101/114470 ◽

2017 ◽

Cited By ~ 6

Author(s):

Xuhua Xia

Keyword(s):

Bacillus Subtilis ◽

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Plain Text ◽

Link Type ◽

High Throughput Sequencing Data ◽

Fastq Format ◽

Single Entry

ABSTRACT Two major stumbling blocks exist in high-throughput sequencing (HTS) data analysis. The first is the sheer file size, typically gigabytes when uncompressed, which causes problems in storage, transmission and analysis. However, these files do not need to be so large and can be reduced without loss of information. Each HTS file, whether in compressed .SRA or plain-text .fastq format, contains numerous identical reads stored as separate entries. For example, among the 44603541 forward reads in the SRR4011234.sra file (from a Bacillus subtilis transcriptomic study) deposited in NCBI's SRA database, one read has 497027 identical copies. Instead of storing them as separate entries, one can and should store them as a single entry in the SeqID_NumCopy format (which I dub the FASTA+ format). The second stumbling block is the proper allocation of reads that map equally well to paralogous genes. I illustrate in detail a new method for such allocation. I have developed the ARSDA software, which implements these new approaches. A number of HTS files for model species are being processed and deposited at http://coevol.rdc.uottawa.ca to demonstrate that this approach not only saves a huge amount of storage space and transmission bandwidth, but also dramatically reduces the time needed for downstream data analysis. Instead of matching the 497027 identical reads separately against the Bacillus subtilis genome, one only needs to match the sequence once. ARSDA includes functions that take advantage of HTS data in the new sequence format for downstream analyses such as gene expression characterization. ARSDA runs on Windows, Linux and Macintosh computers and is freely available at http://dambe.bio.uottawa.ca/ARSDA/ARSDA.aspx.
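The idea of collapsing identical reads into SeqID_NumCopy entries can be illustrated with a short Python sketch. It is not ARSDA's implementation: it assumes well-formed four-line FASTQ records, discards the quality strings (the abstract does not say how ARSDA treats quality scores), and uses a hypothetical `R<n>` prefix for SeqID. The function names are illustrative only.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (assumes well-formed input)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line, discarded in this sketch
        yield seq


def collapse_reads(seqs: Iterable[str]) -> List[Tuple[str, int]]:
    """Count identical reads; most frequent first, ties broken by sequence."""
    counts = Counter(seqs)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def to_fasta_plus(collapsed: List[Tuple[str, int]], prefix: str = "R") -> str:
    """Render collapsed reads as FASTA with SeqID_NumCopy headers (hypothetical R<n> IDs)."""
    out = []
    for i, (seq, n) in enumerate(collapsed, start=1):
        out.append(f">{prefix}{i}_{n}")
        out.append(seq)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    fastq = [
        "@r1", "ACGT", "+", "IIII",
        "@r2", "ACGT", "+", "IIII",
        "@r3", "GGCC", "+", "IIII",
    ]
    print(to_fasta_plus(collapse_reads(read_fastq_sequences(fastq))), end="")
```

On the three toy records above, the two identical ACGT reads become a single entry with header `>R1_2`, so a downstream aligner would see that sequence once instead of twice; the copy number in the header preserves the count needed for expression estimates.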

Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis

Genomics ◽

10.1016/j.ygeno.2017.01.005 ◽

2017 ◽

Vol 109 (2) ◽

pp. 83-90 ◽

Cited By ~ 44

Author(s):

Yan Guo ◽

Yulin Dai ◽

Hui Yu ◽

Shilin Zhao ◽

David C. Samuels ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

xIP-seq Platform: An Integrative Framework for High-Throughput Sequencing Data Analysis

2009 Ohio Collaborative Conference on Bioinformatics ◽

10.1109/occbio.2009.20 ◽

2009 ◽

Cited By ~ 2

Author(s):

Xin Wang ◽

Mingxiang Teng ◽

Guohua Wang ◽

Yuming Zhao ◽

Xu Han ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Integrative Framework ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

NASQAR: A web-based platform for high-throughput sequencing data analysis and visualization

10.1101/709980 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ayman Yousif ◽

Nizar Drou ◽

Jillian Rowe ◽

Mohammed Khalfan ◽

Kristin C Gunsalus

Keyword(s):

New York ◽

Data Analysis ◽

Open Source ◽

High Throughput ◽

High Throughput Sequencing ◽

Web Applications ◽

Rna Seq ◽

Sequencing Data ◽

Web Based ◽

Link Type

Abstract. Background: As high-throughput sequencing applications continue to evolve, the rapid growth in quantity and variety of sequence-based data calls for the development of new software libraries and tools for data analysis and visualization. Often, effective use of these tools requires computational skills beyond those of many researchers. To ease this computational barrier, we have created a dynamic web-based platform, NASQAR (Nucleic Acid SeQuence Analysis Resource). Results: NASQAR offers a collection of custom and publicly available open-source web applications that make extensive use of a variety of R packages to provide interactive data analysis and visualization. The platform is publicly accessible at http://nasqar.abudhabi.nyu.edu/. Open-source code is on GitHub at https://github.com/nasqar/NASQAR, and the system is also available as a Docker image at https://hub.docker.com/r/aymanm/nasqarall. NASQAR is a collaboration between the core bioinformatics teams of the NYU Abu Dhabi and NYU New York Centers for Genomics and Systems Biology. Conclusions: NASQAR empowers non-programming experts with a versatile and intuitive toolbox to easily and efficiently explore, analyze, and visualize their transcriptomics data interactively. Popular tools for a variety of applications are currently available, including Transcriptome Data Preprocessing, RNA-seq Analysis (including Single-cell RNA-seq), Metagenomics, and Gene Enrichment.

fluff: exploratory analysis and visualization of high-throughput sequencing data

PeerJ ◽

10.7717/peerj.2209 ◽

2016 ◽

Vol 4 ◽

pp. e2209 ◽

Cited By ~ 28

Author(s):

Georgios Georgiou ◽

Simon J. van Heeringen

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Developmental Stages ◽

Command Line ◽

Clustering Methods ◽

Sequencing Data ◽

Link Type ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

Genome Wide Data

Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability: fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff.

HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

PLoS ONE ◽

10.1371/journal.pone.0085879 ◽

2014 ◽

Vol 9 (1) ◽

pp. e85879 ◽

Cited By ~ 67

Author(s):

Fabrice P. A. David ◽

Julien Delafontaine ◽

Solenne Carat ◽

Frederick J. Ross ◽

Gregory Lefebvre ◽

...

Keyword(s):

Data Analysis ◽

Open Access ◽

High Throughput ◽

Web Application ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

HaTSPiL: A modular pipeline for high-throughput sequencing data analysis

PLoS ONE ◽

10.1371/journal.pone.0222512 ◽

2019 ◽

Vol 14 (10) ◽

pp. e0222512

Author(s):

Edoardo Morandi ◽

Matteo Cereda ◽

Danny Incarnato ◽

Caterina Parlato ◽

Giulia Basile ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

clusterSeq: methods for identifying co-expression in high-throughput sequencing data

10.1101/188581 ◽

2017 ◽

Cited By ~ 1

Author(s):

Thomas J. Hardcastle ◽

Irene Papatheodorou

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Structural Elements ◽

Sequencing Data ◽

Experimental Conditions ◽

Functional Relationships ◽

Link Type ◽

High Throughput Sequencing Data ◽

Novel Approach ◽

Significant Step

ABSTRACT Summary: Identifying gene co-expression is a significant step in understanding functional relationships between genes. Existing methods primarily depend on analyses of correlation between pairs of genes; however, this neglects structural elements between experimental conditions. We present a novel approach to identifying clusters of co-expressed genes that incorporates these structures. Availability: The methods are released on Bioconductor as the clusterSeq package (https://bioconductor.org/packages/release/bioc/html/clusterSeq.html). Contact: [email protected]
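The pairwise-correlation baseline that the clusterSeq abstract contrasts itself with can be sketched in Python. This is not clusterSeq's method: it assumes a plain dict of per-gene count profiles across conditions, a log2(x+1) transform, Pearson correlation and an arbitrary threshold of 0.9, all chosen for illustration. It also makes the stated limitation concrete: each gene pair is scored in isolation, with no notion of how the experimental conditions are grouped or replicated.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(
    profiles: Dict[str, List[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return gene pairs whose log2(count + 1) profiles correlate at or above threshold."""
    logged = {g: [math.log2(v + 1) for v in vals] for g, vals in profiles.items()}
    pairs = []
    # Every pair is compared independently; condition structure is ignored.
    for g1, g2 in combinations(sorted(logged), 2):
        r = pearson(logged[g1], logged[g2])
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs


if __name__ == "__main__":
    counts = {
        "geneA": [1, 3, 7, 15],   # hypothetical counts over four conditions
        "geneB": [3, 7, 15, 31],
        "geneC": [15, 7, 3, 1],
    }
    for g1, g2, r in coexpressed_pairs(counts):
        print(g1, g2, round(r, 3))
```

Here geneA and geneB rise together across the four hypothetical conditions and are reported as a pair, while geneC, which falls, is excluded; a method that models condition structure could group these genes differently.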

Productive visualization of high-throughput sequencing data using the SeqCode open portable platform

Scientific Reports ◽

10.1038/s41598-021-98889-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Enrique Blanco ◽

Mar González-Ramírez ◽

Luciano Di Croce

Keyword(s):

High Throughput ◽

Large Scale ◽

High Throughput Sequencing ◽

Graphical Analysis ◽

Sequencing Data ◽

Efficient Manner ◽

Link Type ◽

High Throughput Sequencing Data ◽

Almost All ◽

User Friendly

Abstract. Large-scale sequencing techniques to chart genomes are now well established. Stable computational methods to perform primary tasks such as quality control, read mapping, peak calling, and counting are likewise available. However, there is a lack of uniform standards for graphical data mining, which is also of central importance. To fill this gap, we developed SeqCode, an open suite of applications that analyzes sequencing data in an elegant but efficient manner. Our software is a portable resource written in ANSI C that can be expected to work for almost all genomes in any computational configuration. Furthermore, we offer a user-friendly front-end web server that integrates SeqCode functions with other graphical analysis tools. Our analysis and visualization toolkit represents a significant improvement in terms of performance and usability as compared to other existing programs. Thus, SeqCode has the potential to become a key multipurpose instrument for high-throughput professional analysis; further, it provides an extremely useful open educational platform for the world-wide scientific community. The SeqCode website is hosted at http://ldicrocelab.crg.eu, and the source code is freely distributed at https://github.com/eblancoga/seqcode.

High throughput sequencing data analysis workflow: mtDNA variant detection and identification of STR/Y-STR alleles and iso-alleles

Forensic Science International Genetics Supplement Series ◽

10.1016/j.fsigss.2019.10.121 ◽

2019 ◽

Vol 7 (1) ◽

pp. 639-640

Author(s):

C.S. Liu ◽

L. Luo ◽

J. McGuigan ◽

J. Wu ◽

J. Todd ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Detection And Identification ◽

Analysis Workflow ◽

Variant Detection ◽

Sequencing Data Analysis

NASQAR: a web-based platform for high-throughput sequencing data analysis and visualization

BMC Bioinformatics ◽

10.1186/s12859-020-03577-4 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Ayman Yousif ◽

Nizar Drou ◽

Jillian Rowe ◽

Mohammed Khalfan ◽

Kristin C. Gunsalus

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Web Based ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis
