Biochat: a database for natural language processing of Gene Expression Omnibus data

Mapping Intimacies ◽

10.1101/480020 ◽

2018 ◽

Author(s):

Bohdan B. Khomtchouk ◽

Vsevolod Dyomkin ◽

Kasra A. Vand ◽

Themistocles Assimes ◽

Or Gozani

Keyword(s):

Gene Expression ◽

Language Processing ◽

Web Application ◽

Gene Expression Omnibus ◽

Omics Data ◽

Textual Information ◽

Link Type ◽

Different Types ◽

User Friendly ◽

Omics Data Integration

Abstract: A biological dataset’s metadata profile (e.g., study description, organism name, sequencing type) typically contains terse but descriptive textual information that can be used to link it with other similar biological datasets, so that omics data of different types can be integrated to inform hypotheses and biological questions. Here we present Biochat, a database containing a multi-omics data integration support system that helps cross-link Gene Expression Omnibus (GEO) records to each other by metadata similarity through a user-friendly web application. Biochat is publicly available at: http://www.biochat.ai. Biochat source code is hosted at: https://github.com/Bohdan-Khomtchouk/Bio-chat.

Database URL: https://github.com/Bohdan-Khomtchouk/Bio-chat
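The idea of linking records by metadata similarity can be illustrated with a small sketch. The abstract does not say which similarity measure Biochat uses, so the bag-of-words cosine similarity below is an assumption chosen for simplicity, and the record IDs and descriptions are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a metadata string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(query_id, records):
    """Rank all other records by metadata similarity to records[query_id]."""
    vectors = {rid: Counter(tokenize(text)) for rid, text in records.items()}
    query = vectors[query_id]
    scores = [
        (rid, cosine_similarity(query, vec))
        for rid, vec in vectors.items()
        if rid != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical records: the IDs and descriptions are illustrative only.
records = {
    "REC_A": "RNA-seq of human heart tissue in cardiomyopathy",
    "REC_B": "ChIP-seq of histone marks in human heart tissue",
    "REC_C": "Microarray profiling of mouse liver after fasting",
}
ranking = rank_similar("REC_A", records)
for record_id, score in ranking:
    print(f"{record_id}\t{score:.3f}")
```

In this toy data, REC_B ranks above REC_C for the query REC_A because it shares the organism and tissue terms. A production system would more likely weight terms (for example with TF-IDF) or use embeddings, which this sketch leaves out.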

GREIN: An Interactive Web Platform for Reanalyzing GEO RNA-seq Data

10.1101/326223 ◽

2018 ◽

Cited By ~ 1

Author(s):

Naim Al Mahi ◽

Mehdi Fazel Najafabadi ◽

Marcin Pilarczyk ◽

Michal Kouril ◽

Mario Medvedovic

Keyword(s):

Gene Expression ◽

User Interfaces ◽

Web Application ◽

Statistical Power ◽

Functional Characterization ◽

Gene Expression Omnibus ◽

RNA-seq ◽

Link Type ◽

Front End ◽

User Friendly

Abstract: The vast amount of RNA-seq data deposited in Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) is still a grossly underutilized resource for biomedical research. To remove technical roadblocks for reusing these data, we have developed the web application GREIN (GEO RNA-seq Experiments Interactive Navigator), which provides user-friendly interfaces to manipulate and analyze GEO RNA-seq data. GREIN is powered by a back-end computational pipeline for uniform processing of RNA-seq data and by a large number (>6,500) of already processed datasets. The front-end user interfaces provide a wealth of user-analytics options including sub-setting and downloading processed data, interactive visualization, statistical power analyses, construction of differential gene expression signatures and their comprehensive functional characterization, and connectivity analysis with LINCS L1000 data. The combination of the massive amount of back-end data and front-end analytics options driven by user-friendly interfaces makes GREIN a unique open-source resource for re-using GEO RNA-seq data. GREIN is accessible at: https://shiny.ilincs.org/grein, the source code at: https://github.com/uc-bd2k/grein, and the Docker container at: https://hub.docker.com/r/ucbd2k/grein.

ProTrack: An Interactive Multi-Omics Data Browser for Proteogenomic Studies

10.1101/2020.02.05.935650 ◽

2020 ◽

Cited By ~ 1

Author(s):

Anna Pamela Calinawan ◽

Xiaoyu Song ◽

Jiayi Ji ◽

Saravana Mohan Dhanasekaran ◽

Francesca Petralia ◽

...

Keyword(s):

Web Application ◽

Research Community ◽

Omics Data ◽

Multiple Cancer ◽

Cell Renal Cell Carcinoma ◽

Data Set ◽

Data Resource ◽

Link Type ◽

Cancer Types ◽

User Friendly

Abstract: The Clinical Proteomic Tumor Analysis Consortium (CPTAC) initiative has generated extensive multi-omics data resources of deep proteogenomic profiles for multiple cancer types. To enable the broader community of biological and medical researchers to intuitively query, explore, and download data and analysis results from various CPTAC projects, we built a prototype user-friendly web application called “ProTrack” with the CPTAC clear cell renal cell carcinoma (ccRCC) data set (http://ccrcc.cptac-data-view.org). Here we describe the salient features of this application, which provides a dynamic, comprehensive, and granular visualization of the rich proteogenomic data.

Statement of Significance: The CPTAC initiative (https://proteomics.cancer.gov/) has generated multi-omics data for multiple cancer types to understand the proteogenomic aberrations of these malignancies. Collectively, this effort has so far produced a large data resource for the research community, including high-throughput profiles for proteome, phosphoproteome, whole exome, whole genome, transcriptome, and DNA methylome. To make this valuable data resource useful to the larger research community, there is a pressing need for user-friendly, readily accessible, and easily shared analytic and visualization tools for aligning multi-omics data and exploring alterations in key cancer genes, to drive and support new biological hypotheses. To bridge this gap, we have developed CPTAC ProTrack, an interactive web application that uses a multilayered client-server architecture to deliver an interactive web experience to any user of a modern web browser. The tool is intentionally designed to be accessible to researchers, biologists, and clinicians who are interested in multi-omic data, without any need to code.

BDdb: a comprehensive platform for exploration and utilization of birth defect multi-omics data

BMC Medical Genomics ◽

10.1186/s12920-021-01110-x ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Dengwei Zhang ◽

Si Zhou ◽

Ziheng Zhou ◽

Xiaosen Jiang ◽

Dongsheng Chen ◽

...

Keyword(s):

Gene Expression ◽

Birth Defects ◽

Birth Defect ◽

Infant Health ◽

Expression Patterns ◽

Gene Expression Omnibus ◽

Omics Data ◽

Chromatin Immunoprecipitation Sequencing ◽

Underlying Mechanisms ◽

User Friendly

Abstract. Background: Birth defects pose a major challenge to infant health. Thus far, however, the causes of most birth defects remain cryptic. Over the past few decades, considerable effort has been expended on disclosing the underlying mechanisms related to birth defects, yielding myriad treatises and data. To meet the increasing requirements for data resources, we developed a freely accessible birth defect multi-omics database (BDdb, http://t21omics.cngb.org) consisting of multi-omics data and potential disease biomarkers. Results: In total, omics datasets from 136 Gene Expression Omnibus (GEO) Series records, including 5245 samples, as well as 869 biomarkers of 22 birth defects in six different species, were integrated into the BDdb. The database provides a user-friendly interface for searching, browsing, and downloading data of interest. The BDdb also enables users to explore the correlations among different sequencing methods, such as chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq) from different studies, to obtain information on gene expression patterns from diverse aspects. Conclusion: To the best of our knowledge, the BDdb is the first comprehensive database associated with birth defects, which should benefit the diagnosis and prevention of birth defects.

A curated collection of transcriptome datasets to investigate the molecular mechanisms of immunoglobulin E-mediated atopic diseases

10.1101/525477 ◽

2019 ◽

Author(s):

Susie S. Y. Huang ◽

Fatima Al Ali ◽

Sabri Boughorbel ◽

Mohammed Toufiq ◽

Damien Chaussabel ◽

...

Keyword(s):

Gene Expression ◽

Web Application ◽

Molecular Mechanisms ◽

Complex Disease ◽

Immunoglobulin E ◽

Gene Expression Omnibus ◽

Atopic Diseases ◽

Sample Number ◽

Link Type ◽

Good Concordance

Abstract: The prevalence of allergies has reached ~50% in industrialized populations, with children under ten being the most susceptible. However, the combination of the complexity of atopic allergy susceptibility/development and environmental factors has made identification of gene biomarkers challenging. The amount of publicly accessible transcriptomic data presents an unprecedented opportunity for mechanistic discoveries and validation of complex disease signatures across studies. However, this necessitates structured methodologies and visual tools for the interpretation of results. Here, we present a curated collection of transcriptomic datasets relevant to immunoglobulin E (IgE)-mediated atopic diseases (ranging from allergies to primary immunodeficiencies). Thirty datasets from the Gene Expression Omnibus (GEO), encompassing 1761 transcriptome profiles, were made available on the Gene Expression Browser (GXB), an online and open-source web application that allows for the query, visualization, and annotation of metadata. The thematic compositions, disease categories, sample number, and platforms of the collection are described. Ranked gene lists and sample grouping are used to facilitate data visualization/interpretation and are available online via GXB (http://ige.gxbsidra.org/dm3/geneBrowser/list). Dataset validation using associated publications showed good concordance in GXB gene expression trend and fold-change.

Database URL: http://ige.gxbsidra.org/dm3/geneBrowser/list

CovidNLP: A Web Application for Distilling Systemic Implications of COVID-19 Pandemic with Natural Language Processing

10.1101/2020.04.25.20079129 ◽

2020 ◽

Cited By ~ 4

Author(s):

Raghav Awasthi ◽

Ridam Pal ◽

Pradeep Singh ◽

Aditya Nagori ◽

Suryatej Reddy ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Web Application ◽

Data Availability ◽

Research Articles ◽

Global Literature ◽

Link Type ◽

Novel Coronavirus ◽

User Friendly

Abstract: The flood of conflicting COVID-19 research has revealed that COVID-19 continues to be an enigma. Although more than 14,000 research articles on COVID-19 have been published as the disease has reached pandemic proportions, clinicians and researchers are struggling to distill knowledge for furthering clinical management and research. In this study, we address this gap for a targeted user group, i.e. clinicians, researchers, and policymakers, by applying natural language processing to develop a CovidNLP dashboard in order to speed up knowledge discovery. The WHO has created a repository of more than 5000 peer-reviewed and curated research articles on varied aspects including epidemiology, clinical features, diagnosis, treatment, social factors, and economics. We summarised all the articles in the WHO Database through an extractive summarizer, followed by an exploration of the feature space using word embeddings, which were then used to visualize the summarized associations of COVID-19 as found in the text. Clinicians, researchers, and policymakers will discover not only the direct effects of COVID-19 but also the systematic implications, such as the anticipated rise in TB and cancer mortality due to the non-availability of drugs during the export lockdown, as highlighted by our models. These demonstrate the utility of mining massive literature with natural language processing for rapid distillation and knowledge updates. This can help the users understand, synthesize, and take pre-emptive action with the available peer-reviewed evidence on COVID-19. Our models will be continuously updated with new literature and we have made our resource CovidNLP publicly available in a user-friendly fashion at http://covidnlp.tavlab.iiitd.edu.in/.

Data Availability Statement: All the data used in this study are publicly available from the WHO Covid-19 Global Literature on coronavirus disease maintained at https://search.bvsalud.org/global-literature-on-novel-coronavirus-2019-ncov/. Our analysis and the interactive resource CovidNLP are publicly available in a user-friendly fashion at http://covidnlp.tavlab.iiitd.edu.in

PathScore: a web tool for identifying altered pathways in cancer data

10.1101/067090 ◽

2016 ◽

Cited By ~ 2

Author(s):

Stephen G. Gaffney ◽

Jeffrey P. Townsend

Keyword(s):

Web Application ◽

Somatic Mutations ◽

Supplementary Information ◽

Web Tool ◽

Cancer Data ◽

Link Type ◽

Novel Approach ◽

Supplementary Material ◽

User Friendly ◽

Pathway Effect

Abstract. Summary: PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation: Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 license. Contact: [email protected] Supplementary Information: Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.

MetENP/MetENPWeb: An R package and web application for metabolomics enrichment and pathway analysis in Metabolomics Workbench

10.1101/2020.11.20.391912 ◽

2020 ◽

Author(s):

Kumari Sonal Choudhary ◽

Eoin Fahy ◽

Kevin Coakley ◽

Manish Sud ◽

Mano R Maurya ◽

...

Keyword(s):

Pathway Analysis ◽

Web Application ◽

Enrichment Analysis ◽

R Package ◽

Pathway Enrichment Analysis ◽

Pathway Enrichment ◽

KEGG Pathways ◽

Link Type ◽

Species Specific ◽

User Friendly

Abstract: With the advent of high-throughput mass spectrometric methods, metabolomics has emerged as an essential area of research in biomedicine with the potential to provide deep biological insights into normal and diseased functions in physiology. However, to achieve the potential offered by metabolomics measures, there is a need for biologist-friendly integrative analysis tools that can transform data into mechanisms that relate to phenotypes. Here, we describe MetENP, an R package, and a user-friendly web application deployed at the Metabolomics Workbench site that extend metabolomics enrichment analysis to include species-specific pathway analysis, pathway enrichment scores, gene-enzyme information, and enzymatic activities of the significantly altered metabolites. MetENP provides a highly customizable workflow through various user-specified options and includes support for all metabolite species with available KEGG pathways. MetENPweb is a web application for calculating metabolite and pathway enrichment analysis. Availability and Implementation: The MetENP package is freely available from the Metabolomics Workbench GitHub (https://github.com/metabolomicsworkbench/MetENP); the web application is freely available at https://www.metabolomicsworkbench.org/data/analyze.php.
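As a rough illustration of what a pathway enrichment score can look like, the sketch below computes a one-sided hypergeometric p-value for the overlap between a set of significantly altered metabolites and a pathway. The abstract does not state which statistic MetENP computes, so this is an assumed, commonly used over-representation test rather than the package's method; the function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total, in_pathway, selected, overlap):
    """One-sided over-representation p-value, P(X >= overlap).

    total      -- number of metabolites in the background set
    in_pathway -- number of background metabolites annotated to the pathway
    selected   -- number of significantly altered metabolites
    overlap    -- number of altered metabolites that fall in the pathway
    """
    upper = min(in_pathway, selected)
    denominator = comb(total, selected)
    numerator = sum(
        comb(in_pathway, i) * comb(total - in_pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / denominator


# Hypothetical numbers: 200 measured metabolites, 12 in the pathway,
# 20 significantly altered, 5 of which belong to the pathway.
p_value = hypergeom_enrichment_p(total=200, in_pathway=12, selected=20, overlap=5)
print(f"p = {p_value:.4g}")
```

A small p-value suggests that the pathway contains more altered metabolites than expected by chance. Correcting for multiple pathways is a separate step not shown here.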

OSkirc: a web tool for identifying prognostic biomarkers in kidney renal clear cell carcinoma

Future Oncology ◽

10.2217/fon-2019-0296 ◽

2019 ◽

Vol 15 (27) ◽

pp. 3103-3110 ◽

Cited By ~ 15

Author(s):

Longxiang Xie ◽

Qiang Wang ◽

Yifang Dang ◽

Linna Ge ◽

Xiaoxiao Sun ◽

...

Keyword(s):

Gene Expression ◽

Cell Carcinoma ◽

Web Application ◽

Clear Cell ◽

Clear Cell Carcinoma ◽

Renal Clear Cell Carcinoma ◽

Gene Expression Omnibus ◽

The Cancer Genome Atlas ◽

P Value ◽

Survival Plot

Aim: To develop a free and quick online analysis tool that allows users to easily investigate the prognostic potential of genes of interest in kidney renal clear cell carcinoma (KIRC). Patients & methods: A total of 629 KIRC cases with gene expression profiling data and clinical follow-up information were collected from the public Gene Expression Omnibus and The Cancer Genome Atlas databases. Results: A web application called Online consensus Survival analysis for KIRC (OSkirc) was constructed for exploring the prognostic implications of genes of interest in KIRC. With OSkirc, users simply enter a gene symbol to receive the Kaplan–Meier survival plot with the hazard ratio and log-rank p-value. Conclusion: OSkirc is extremely valuable for basic and translational researchers who wish to screen and validate the prognostic potential of genes for KIRC, and it is publicly accessible at http://bioinfo.henu.edu.cn/KIRC/KIRCList.jsp
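For readers unfamiliar with the Kaplan–Meier estimator behind such survival plots, here is a minimal sketch of how the survival curve is computed from follow-up times and event indicators. It is not OSkirc's code: the function name and the toy data are assumptions for illustration, and the hazard ratio and log-rank test are not covered.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
    return curve


# Toy data (hypothetical): five patients, follow-up in months.
curve = kaplan_meier(times=[5, 8, 8, 12, 15], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t={t}\tS(t)={s:.2f}")
```

In a gene-based tool, one would typically split patients into groups (for example by expression level of the queried gene) and compute one curve per group before comparing them.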

re-Searcher: GUI-based bioinformatics tool for simplified genomics data mining of VCF files

PeerJ ◽

10.7717/peerj.11333 ◽

2021 ◽

Vol 9 ◽

pp. e11333

Author(s):

Daniyar Karabayev ◽

Askhat Molkenov ◽

Kaiyrgali Yerulanuly ◽

Ilyas Kabimoldayev ◽

Asset Daniyarov ◽

...

Keyword(s):

Web Application ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Data Types ◽

Standard Format ◽

Standard Data ◽

Additional Information ◽

Link Type ◽

Sequencing Platforms ◽

User Friendly

Background: High-throughput sequencing platforms generate a massive amount of high-dimensional genomic data that are available for analysis. Modern and user-friendly bioinformatics tools for the analysis and interpretation of genomics data have become essential when working with sequencing data. Different standard data types and file formats have been developed to store and analyze sequence and genomics data. Variant Call Format (VCF) is the most widespread genomics file type and standard format containing genomic information and variants of sequenced samples. Results: Existing tools for processing VCF files usually lack an intuitive graphical interface and instead offer only a command-line interface, which may be challenging to use for the broader biomedical community interested in genomics data analysis. re-Searcher addresses this problem and pre-processes VCF files in chunks to avoid loading entire files into the computer's RAM. The tool can be used as a standalone, user-friendly, multiplatform GUI application as well as a web application (https://nla-lbsb.nu.edu.kz). The software, including source code, tested VCF files, and additional information, is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher).
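Chunked processing of a VCF file can be sketched as a generator that yields a bounded number of parsed records at a time, so memory use does not grow with file size. This is not re-Searcher's implementation: the function name, the chunk size, and the sample data are assumptions, and only the first five fixed VCF columns (CHROM, POS, ID, REF, ALT) are parsed.

```python
import io


def iter_vcf_chunks(handle, chunk_size=1000):
    """Yield lists of parsed variant records, at most chunk_size per list."""
    chunk = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        chunk.append(
            {
                "chrom": fields[0],
                "pos": int(fields[1]),
                "id": fields[2],
                "ref": fields[3],
                "alt": fields[4],
            }
        )
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter, chunk


# Tiny in-memory VCF (hypothetical variants) standing in for a large file.
sample_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\trs1\tA\tG\t50\tPASS\t.\n"
    "1\t200\trs2\tC\tT\t60\tPASS\t.\n"
    "2\t300\trs3\tG\tA\t70\tPASS\t.\n"
)
for chunk in iter_vcf_chunks(io.StringIO(sample_vcf), chunk_size=2):
    print(len(chunk), [record["id"] for record in chunk])
```

With a real file, the same generator would be called on an open file handle, and each chunk could be filtered or written out before the next one is read.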

MicroScope: ChIP-seq and RNA-seq software analysis suite for gene expression heatmaps

10.1101/034694 ◽

2015 ◽

Cited By ~ 1

Author(s):

Bohdan B. Khomtchouk ◽

James R. Hennessy ◽

Claes Wahlestedt

Keyword(s):

Web Application ◽

Differential Expression Analysis ◽

Dynamic Network ◽

Principal Component ◽

RNA-seq ◽

Software Suite ◽

Software Analysis ◽

Link Type ◽

R Shiny ◽

User Friendly

Abstract: We propose a user-friendly ChIP-seq and RNA-seq software suite for the interactive visualization and analysis of genomic data, including integrated features to support differential expression analysis, interactive heatmap production, principal component analysis, gene ontology analysis, and dynamic network analysis. MicroScope is hosted online as an R Shiny web application based on the D3 JavaScript library: http://microscopebioinformatics.org/. The methods are implemented in R, and are available as part of the MicroScope project at: https://github.com/Bohdan-Khomtchouk/Microscope.
