NASQAR: A web-based platform for high-throughput sequencing data analysis and visualization

10.1101/709980 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ayman Yousif ◽

Nizar Drou ◽

Jillian Rowe ◽

Mohammed Khalfan ◽

Kristin C Gunsalus

Keyword(s):

New York ◽

Data Analysis ◽

Open Source ◽

High Throughput ◽

High Throughput Sequencing ◽

Web Applications ◽

Rna Seq ◽

Sequencing Data ◽

Web Based ◽

Link Type

Abstract Background: As high-throughput sequencing applications continue to evolve, the rapid growth in quantity and variety of sequence-based data calls for the development of new software libraries and tools for data analysis and visualization. Often, effective use of these tools requires computational skills beyond those of many researchers. To ease this computational barrier, we have created a dynamic web-based platform, NASQAR (Nucleic Acid SeQuence Analysis Resource). Results: NASQAR offers a collection of custom and publicly available open-source web applications that make extensive use of a variety of R packages to provide interactive data analysis and visualization. The platform is publicly accessible at http://nasqar.abudhabi.nyu.edu/. Open-source code is on GitHub at https://github.com/nasqar/NASQAR, and the system is also available as a Docker image at https://hub.docker.com/r/aymanm/nasqarall. NASQAR is a collaboration between the core bioinformatics teams of the NYU Abu Dhabi and NYU New York Centers for Genomics and Systems Biology. Conclusions: NASQAR empowers non-programming experts with a versatile and intuitive toolbox to easily and efficiently explore, analyze, and visualize their Transcriptomics data interactively. Popular tools for a variety of applications are currently available, including Transcriptome Data Preprocessing, RNA-seq Analysis (including Single-cell RNA-seq), Metagenomics, and Gene Enrichment.

NASQAR: a web-based platform for high-throughput sequencing data analysis and visualization

BMC Bioinformatics ◽

10.1186/s12859-020-03577-4 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Ayman Yousif ◽

Nizar Drou ◽

Jillian Rowe ◽

Mohammed Khalfan ◽

Kristin C. Gunsalus

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Web Based ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

The simple fool's guide to population genomics via RNA‐Seq: an introduction to high‐throughput sequencing data analysis

Molecular Ecology Resources ◽

10.1111/1755-0998.12003 ◽

2012 ◽

Vol 12 (6) ◽

pp. 1058-1067 ◽

Cited By ~ 167

Author(s):

Pierre De Wit ◽

Melissa H. Pespeni ◽

Jason T. Ladner ◽

Daniel J. Barshis ◽

François Seneca ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Population Genomics ◽

Rna Seq ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

ARSDA: A new approach for storing, transmitting and analyzing high-throughput sequencing data

10.1101/114470 ◽

2017 ◽

Cited By ~ 6

Author(s):

Xuhua Xia

Keyword(s):

Bacillus Subtilis ◽

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Plain Text ◽

Link Type ◽

High Throughput Sequencing Data ◽

Fastq Format ◽

Single Entry

Abstract Two major stumbling blocks exist in high-throughput sequencing (HTS) data analysis. The first is the sheer file size, typically in gigabytes when uncompressed, causing problems in storage, transmission and analysis. However, these files do not need to be so large and can be reduced without loss of information. Each HTS file, either in compressed .SRA or plain text .fastq format, contains numerous identical reads stored as separate entries. For example, among 44603541 forward reads in the SRR4011234.sra file (from a Bacillus subtilis transcriptomic study) deposited at NCBI’s SRA database, one read has 497027 identical copies. Instead of storing them as separate entries, one can and should store them as a single entry with the SeqID_NumCopy format (which I dub as FASTA+ format). The second is the proper allocation of reads that map equally well to paralogous genes. I illustrate in detail a new method for such allocation. I have developed ARSDA software that implements these new approaches. A number of HTS files for model species are in the process of being processed and deposited at http://coevol.rdc.uottawa.ca to demonstrate that this approach not only saves a huge amount of storage space and transmission bandwidth, but also dramatically reduces time in downstream data analysis. Instead of matching the 497027 identical reads separately against the Bacillus subtilis genome, one only needs to match it once. ARSDA includes functions to take advantage of HTS data in the new sequence format for downstream data analysis such as gene expression characterization. ARSDA can be run on Windows, Linux and Macintosh computers and is freely available at http://dambe.bio.uottawa.ca/ARSDA/ARSDA.aspx.
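
The read-collapsing idea in this abstract can be illustrated with a minimal Python sketch. It is not ARSDA's implementation: the function names, the `UniqueSeq<rank>_<copies>` header pattern, and the toy reads are assumptions made for illustration, and the sketch keeps only the sequence line of each FASTQ record (quality strings are ignored).

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip()


def collapse_reads(sequences: Iterable[str]) -> List[Tuple[str, str]]:
    """Collapse identical reads into (header, sequence) pairs.

    The header is a hypothetical 'UniqueSeq<rank>_<copies>' label that
    mimics the SeqID_NumCopy idea; most abundant reads come first.
    """
    counts = Counter(sequences)
    collapsed = []
    for rank, (sequence, copies) in enumerate(counts.most_common(), start=1):
        collapsed.append((f"UniqueSeq{rank}_{copies}", sequence))
    return collapsed


def to_fasta_plus(collapsed: List[Tuple[str, str]]) -> str:
    """Render collapsed reads as FASTA-style text, one entry per unique read."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in collapsed)


# Toy FASTQ input: six reads, three distinct sequences (made-up data).
toy_reads = ["ACGT", "TTGA", "ACGT", "GGCC", "ACGT", "GGCC"]
fastq_lines = []
for number, read in enumerate(toy_reads, start=1):
    fastq_lines += [f"@read{number}", read, "+", "I" * len(read)]

collapsed = collapse_reads(read_fastq_sequences(fastq_lines))
fasta_plus_text = to_fasta_plus(collapsed)
print(fasta_plus_text)
```

With this layout, a downstream mapper would align each unique sequence once and multiply the hit by the copy number encoded in the header, which is the saving the abstract describes.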

Advancing clinical genomics and precision medicine with GVViZ: FAIR bioinformatics platform for variable gene-disease annotation, visualization, and expression analysis

Human Genomics ◽

10.1186/s40246-021-00336-1 ◽

2021 ◽

Vol 15 (1) ◽

Author(s):

Zeeshan Ahmed ◽

Eduard Gibert Renart ◽

Saman Zeeshan ◽

XinQi Dong

Keyword(s):

Data Analysis ◽

Patient Care ◽

Expression Analysis ◽

High Throughput ◽

Gene Annotation ◽

Next Generation Sequencing Data ◽

Rna Seq ◽

Sequencing Data ◽

Complex Disorders ◽

Transcriptomics Data

Abstract Background: Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing and high and low expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely high-throughput next-generation sequencing data analysis is still a challenge for non-computational biologists and geneticists. Results: In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform, and database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with the public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders. The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions: We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.

Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis

Genomics ◽

10.1016/j.ygeno.2017.01.005 ◽

2017 ◽

Vol 109 (2) ◽

pp. 83-90 ◽

Cited By ~ 44

Author(s):

Yan Guo ◽

Yulin Dai ◽

Hui Yu ◽

Shilin Zhao ◽

David C. Samuels ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

HTSeq - A Python framework to work with high-throughput sequencing data

10.1101/002824 ◽

2014 ◽

Cited By ~ 242

Author(s):

Simon Anders ◽

Paul Theodor Pyl ◽

Wolfgang Huber

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Rapid Development ◽

Differential Expression Analysis ◽

Rna Seq ◽

Sequencing Data ◽

Standard Work ◽

Data Formats ◽

High Throughput Sequencing Data ◽

Python Package

Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard work flows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information, variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index, https://pypi.python.org/pypi/HTSeq
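
The counting step described for htseq-count (counting the overlap of reads with genes) can be sketched in plain Python. This sketch deliberately does not call the HTSeq library, and it is not htseq-count's algorithm: the half-open interval convention, the function names, and the toy coordinates are assumptions, and the rule "count a read only when it overlaps exactly one gene" is a simplification. The `__no_feature` and `__ambiguous` labels are borrowed from htseq-count's output for readability.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (chromosome, start, end) with a half-open [start, end) convention (assumed).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when two intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_reads(genes: Dict[str, Interval], reads: List[Interval]) -> Counter:
    """Assign each read to the single gene it overlaps, if there is exactly one."""
    counts = Counter({name: 0 for name in genes})
    for read in reads:
        hits = [name for name, gene in genes.items() if overlaps(read, gene)]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1   # read overlaps no gene
        else:
            counts["__ambiguous"] += 1    # read overlaps several genes
    return counts


# Toy annotation and alignments (made-up coordinates).
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [
    ("chr1", 90, 120),   # overlaps geneA only
    ("chr1", 150, 170),  # overlaps geneA only
    ("chr1", 185, 195),  # overlaps geneA and geneB
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr1", 400, 450),  # overlaps nothing
    ("chr2", 100, 150),  # different chromosome
]

counts = count_reads(genes, reads)
for name, value in sorted(counts.items()):
    print(name, value)
```

The linear scan over all genes per read is fine for a toy example; a real tool would index features by genomic coordinate, which is the kind of data structure the abstract says HTSeq provides.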

xIP-seq Platform: An Integrative Framework for High-Throughput Sequencing Data Analysis

2009 Ohio Collaborative Conference on Bioinformatics ◽

10.1109/occbio.2009.20 ◽

2009 ◽

Cited By ~ 2

Author(s):

Xin Wang ◽

Mingxiang Teng ◽

Guohua Wang ◽

Yuming Zhao ◽

Xu Han ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Integrative Framework ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

fluff: exploratory analysis and visualization of high-throughput sequencing data

PeerJ ◽

10.7717/peerj.2209 ◽

2016 ◽

Vol 4 ◽

pp. e2209 ◽

Cited By ~ 28

Author(s):

Georgios Georgiou ◽

Simon J. van Heeringen

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Developmental Stages ◽

Command Line ◽

Clustering Methods ◽

Sequencing Data ◽

Link Type ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

Genome Wide Data

Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff.

Application of High-Throughput Sequencing Data Mining in Comparison of Gene Expression Profile in Renal Cell Carcinoma and Normal Renal Cell by RNA-Seq

Lecture Notes in Electrical Engineering - Innovative Computing ◽

10.1007/978-981-15-5959-4_45 ◽

2020 ◽

pp. 359-365

Author(s):

Yunhai Yu ◽

Hongmei Xu ◽

Shaoning Guo ◽

Na Wang

Keyword(s):

Gene Expression ◽

Data Mining ◽

Renal Cell Carcinoma ◽

Cell Carcinoma ◽

High Throughput ◽

Renal Cell ◽

High Throughput Sequencing ◽

Rna Seq ◽

Sequencing Data ◽

High Throughput Sequencing Data

HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

PLoS ONE ◽

10.1371/journal.pone.0085879 ◽

2014 ◽

Vol 9 (1) ◽

pp. e85879 ◽

Cited By ~ 67

Author(s):

Fabrice P. A. David ◽

Julien Delafontaine ◽

Solenne Carat ◽

Frederick J. Ross ◽

Gregory Lefebvre ◽

...

Keyword(s):

Data Analysis ◽

Open Access ◽

High Throughput ◽

Web Application ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis
