GAD: a Python script for dividing genome annotation files into feature-based files

Mapping Intimacies ◽

10.1101/815860 ◽

2019 ◽

Author(s):

Ahmed Karam ◽

Norhan Yasser

Keyword(s):

Data Analysis ◽

Genome Annotation ◽

Gene Annotation ◽

Untranslated Regions ◽

File Formats ◽

Genome Features ◽

Daily Task ◽

Intergenic Regions ◽

Feature Based ◽

Genomic Data Analysis

Abstract Manipulating and analyzing publicly available genomic datasets has become a daily task in bioinformatics and genomics laboratories. The release of data from several genome sequencing projects has prompted bioinformaticians to develop automated scripts and pipelines for analyzing genomic datasets, in particular gene annotation pipelines. Non-developers need fully featured programs for handling genome annotation files, and genomic data analysis can be accelerated by reducing genome annotation and sequence files to specific features. To extract genome features from GTF or GFF3 files precisely, the GAD script (https://github.com/bio-projects/GAD) provides a simple graphical user interface that runs under all Python versions on different operating systems. GAD contains entry widgets that can analyze multiple genome sequence and annotation files with a single click. Its functions extract genome features such as upstream genes, downstream genes, intergenic regions, genes, transcripts, exons, introns, coding sequences, five prime untranslated regions, three prime untranslated regions, and other ambiguous Sequence Ontology terms. GAD writes its results in several file formats, including BED, GTF/GFF3, and FASTA, which are supported by other bioinformatics programs. Our script could be incorporated into various pipelines in genomics laboratories with the aim of accelerating data analysis.
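The feature-based division described in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not GAD's actual code: the function names `split_by_feature` and `to_bed` and the toy GFF3 records are hypothetical, and the sketch assumes tab-separated, nine-column GTF/GFF3 records.

```python
from collections import defaultdict


def split_by_feature(lines):
    """Group GTF/GFF3 records by feature type (column 3). Hypothetical helper."""
    groups = defaultdict(list)
    for line in lines:
        # Skip blank lines and comment/directive lines such as "##gff-version 3".
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumption: a valid record has exactly nine tab-separated columns.
        if len(fields) != 9:
            continue
        groups[fields[2]].append(fields)
    return groups


def to_bed(fields):
    """Convert one GTF/GFF3 record to a 6-column BED line. Hypothetical helper.

    GTF/GFF3 coordinates are 1-based and inclusive; BED is 0-based and
    half-open, so only the start coordinate shifts by one.
    """
    seqid, _source, feature, start, end, _score, strand, _phase, _attrs = fields
    return "\t".join([seqid, str(int(start) - 1), end, feature, "0", strand])


# Toy input, invented for illustration only.
example = [
    "##gff-version 3",
    "chr1\tdemo\tgene\t100\t500\t.\t+\t.\tID=gene1",
    "chr1\tdemo\texon\t100\t200\t.\t+\t.\tParent=gene1",
    "chr1\tdemo\texon\t300\t500\t.\t+\t.\tParent=gene1",
]

groups = split_by_feature(example)
for feature, records in groups.items():
    print(feature, [to_bed(r) for r in records])
```

Each key of `groups` corresponds to one feature-based output; writing each list to its own file would give one file per feature type.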

Advancing clinical genomics and precision medicine with GVViZ: FAIR bioinformatics platform for variable gene-disease annotation, visualization, and expression analysis

Human Genomics ◽

10.1186/s40246-021-00336-1 ◽

2021 ◽

Vol 15 (1) ◽

Author(s):

Zeeshan Ahmed ◽

Eduard Gibert Renart ◽

Saman Zeeshan ◽

XinQi Dong

Keyword(s):

Data Analysis ◽

Patient Care ◽

Expression Analysis ◽

High Throughput ◽

Gene Annotation ◽

Next Generation Sequencing Data ◽

RNA Seq ◽

Sequencing Data ◽

Complex Disorders ◽

Transcriptomics Data

Abstract Background: Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing and high and low expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely high-throughput next-generation sequencing data analysis is still a challenge for non-computational biologists and geneticists. Results: In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform, and database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with the public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders. The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions: We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.

Uncovering transcriptional dark matter via gene annotation independent single-cell RNA sequencing analysis

Nature Communications ◽

10.1038/s41467-021-22496-3 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Michael F. Z. Wang ◽

Madhav Mantri ◽

Shao-Pei Chou ◽

Gaetano J. Scuderi ◽

David W. McKellar ◽

...

Keyword(s):

Single Cell ◽

Genome Annotation ◽

Gene Annotation ◽

Active Regions ◽

Sequencing Analysis ◽

Biologically Relevant ◽

Mole Rat ◽

Genome Annotations ◽

Cell Expression ◽

High Quality Genome

Abstract Conventional scRNA-seq expression analyses rely on the availability of a high quality genome annotation. Yet, as we show here with scRNA-seq experiments and analyses spanning human, mouse, chicken, mole rat, lemur and sea urchin, genome annotations are often incomplete, in particular for organisms that are not routinely studied. To overcome this hurdle, we created a scRNA-seq analysis routine that recovers biologically relevant transcriptional activity beyond the scope of the best available genome annotation by performing scRNA-seq analysis on any region in the genome for which transcriptional products are detected. Our tool generates a single-cell expression matrix for all transcriptionally active regions (TARs), performs single-cell TAR expression analysis to identify biologically significant TARs, and then annotates TARs using gene homology analysis. This procedure uses single-cell expression analyses as a filter to direct annotation efforts to biologically significant transcripts and thereby uncovers biology to which scRNA-seq would otherwise be in the dark.
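The idea of looking for transcription outside the existing annotation can be illustrated with a toy interval computation. This is a deliberately simplified sketch and not the authors' tool, whose region-calling method is not described in this abstract: it merges overlapping read intervals on one chromosome and keeps the merged regions that overlap no annotated gene. The function names, the half-open coordinates, and the example values are all assumptions made for illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the current region when the next interval reaches into it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


def unannotated_regions(reads, genes):
    """Return merged read regions that overlap no annotated gene interval."""
    regions = merge_intervals(reads)
    return [
        (start, end)
        for start, end in regions
        # Two half-open intervals overlap when each starts before the other ends.
        if not any(start < g_end and g_start < end for g_start, g_end in genes)
    ]


# Toy coordinates on a single chromosome, invented for illustration only.
reads = [(10, 60), (50, 120), (400, 450), (440, 520), (900, 950)]
genes = [(0, 200), (880, 1000)]

candidates = unannotated_regions(reads, genes)
print(candidates)
```

In this toy input only the merged region covering positions 400 to 520 lies outside both annotated genes, so it is the single candidate region returned.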

Characterizing Performance Variation of Genomic Data Analysis Workflows on the Public Cloud

2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech) ◽

10.1109/dasc-picom-cbdcom-cyberscitech49142.2020.00116 ◽

2020 ◽

Author(s):

David Perez ◽

Ling-Hong Hung ◽

Sonia Xu ◽

Ka Yee Yeung ◽

Wes Lloyd

Keyword(s):

Data Analysis ◽

Genomic Data ◽

Public Cloud ◽

The Public ◽

Performance Variation ◽

Genomic Data Analysis

Statistical Genetics for Genomic Data Analysis

Springer Handbook of Engineering Statistics ◽

10.1007/978-1-84628-288-1_32 ◽

2006 ◽

pp. 591-605

Author(s):

Jae Lee

Keyword(s):

Data Analysis ◽

Genomic Data ◽

Statistical Genetics ◽

Genomic Data Analysis

Introduction to R for Genomic Data Analysis

Computational Genomics with R ◽

10.1201/9780429084317-2 ◽

2020 ◽

pp. 23-66

Author(s):

Altuna Akalin

Keyword(s):

Data Analysis ◽

Genomic Data ◽

Genomic Data Analysis

Uncovering Effective Explanations for Interactive Genomic Data Analysis

Patterns ◽

10.1016/j.patter.2020.100093 ◽

2020 ◽

Vol 1 (6) ◽

pp. 100093

Author(s):

Silu Huang ◽

Charles Blatti ◽

Saurabh Sinha ◽

Aditya Parameswaran

Keyword(s):

Data Analysis ◽

Genomic Data ◽

Genomic Data Analysis

Triple Threat: OnRamp Bioinformatics, Cloudian, ScaleMatrix Combine Services for Genomic Data Analysis and Storage Offering

Clinical OMICs ◽

10.1089/clinomi.04.04.20 ◽

2017 ◽

Vol 4 (4) ◽

pp. 30-31

Author(s):

Chris Anderson

Keyword(s):

Data Analysis ◽

Genomic Data ◽

Genomic Data Analysis ◽

And Storage ◽

Triple Threat

Impact of Gene Annotation on RNA-seq Data Analysis

Next Generation Sequencing - Advances, Applications and Challenges ◽

10.5772/61197 ◽

2016 ◽

Author(s):

Shanrong Zhao ◽

Baohong Zhang

Keyword(s):

Data Analysis ◽

Gene Annotation ◽

RNA Seq

Reproducible biomedical benchmarking in the cloud: lessons from crowd-sourced data challenges

Genome Biology ◽

10.1186/s13059-019-1794-0 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 7

Author(s):

Kyle Ellrott ◽

Alex Buchanan ◽

Allison Creason ◽

Michael Mason ◽

Thomas Schaffter ◽

...

Keyword(s):

Data Analysis ◽

Data Sharing ◽

Software Architectures ◽

Biomedical Data ◽

Output File ◽

Biomedical Data Analysis ◽

Input And Output ◽

Software Packages ◽

File Formats ◽

Computing Environments

Abstract Challenges are achieving broad acceptance for addressing many biomedical questions and enabling tool assessment. But ensuring that the methods evaluated are reproducible and reusable is complicated by the diversity of software architectures, input and output file formats, and computing environments. To mitigate these problems, some challenges have leveraged new virtualization and compute methods, requiring participants to submit cloud-ready software packages. We review recent data challenges with innovative approaches to model reproducibility and data sharing, and outline key lessons for improving quantitative biomedical data analysis through crowd-sourced benchmarking challenges.
