Finding useful data across multiple biomedical data repositories using DataMed

Lucila Ohno-Machado; Susanna-Assunta Sansone; George Alter; Ian Fore; Jeffrey Grethe; Hua Xu; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; Anupama E Gururaj; Elizabeth Bell; Ergin Soysal; Nansu Zong; Hyeon-eui Kim

doi:10.1038/ng.3864

An omics perspective on drug target discovery platforms

Briefings in Bioinformatics, 2019, Vol 21 (6), pp. 1937-1953

doi:10.1093/bib/bbz122

Cited by ~5

Author(s): Jussi Paananen; Vittorio Fortino

Keyword(s): Drug Target; Drug Targets; Knowledge Bases; Scientific Data; Data Sources; Biomedical Data; Drug Target Discovery; Data Repositories; Target Discovery; Omics Technologies

Abstract The drug discovery process starts with identification of a disease-modifying target. This critical step traditionally begins with manual investigation of scientific literature and biomedical databases to gather evidence linking a molecular target to disease, and to evaluate the efficacy, safety and commercial potential of the target. The high throughput and affordability of current omics technologies, which allow quantitative measurements of many putative targets (e.g. DNA, RNA, protein, metabolite), have exponentially increased the volume of scientific data available for this arduous task. Therefore, computational platforms that identify and rank disease-relevant targets from existing biomedical data sources, including omics databases, are needed. To date, more than 30 drug target discovery (DTD) platforms exist. They provide information-rich databases and graphical user interfaces to help scientists identify putative targets and pre-evaluate their therapeutic efficacy and potential side effects. Here we survey and compare a set of popular DTD platforms that utilize multiple data sources and omics-driven knowledge bases (either directly or indirectly) for identifying drug targets. We also provide a description of omics technologies and related data repositories which are important for DTD tasks.

A Data Mining Based Approach for Collaborative Analysis of Biomedical Data

International Journal of Artificial Intelligence Tools, 2014, Vol 23 (04), pp. 1460010

doi:10.1142/s0218213014600100

Cited by ~1

Author(s): Georgia Tsiliki; Sophia Kossida; Natalja Friesen; Stefan Rüping; Manolis Tzagarakis; ...

Keyword(s): Data Mining; Data Mining Algorithm; Biomedical Data; Data Repositories; Mining Algorithm; Research Questions; Collaborative Analysis; Set Up; Available Information; Collaborative Decision

Biomedical research is becoming increasingly multidisciplinary and collaborative in nature. At the same time, it has recently seen a vast growth in publicly and instantly available information. As the available resources become more specialized, there is a growing need for multidisciplinary collaborations between biomedical researchers to address complex research questions. We present an application of a data mining algorithm to genomic data in a collaborative decision-making support environment, as a typical example of how multidisciplinary researchers can collaborate in analyzing and interpreting biomedical data. Through the proposed approach, researchers can easily decide which data repositories should be considered, analyze the algorithmic results, discuss the weaknesses of the patterns identified, and set up new iterations of the data mining algorithm by defining other descriptive attributes or integrating other relevant data. Evaluation results show that the proposed approach helps users set their research objectives and better understand the data and methodologies used in their research.

Bias invariant RNA-seq metadata annotation

2020

doi:10.1101/2020.11.26.399568

Author(s): Hannes Wartmann; Sven Heins; Karin Kloiber; Stefan Bonn

Keyword(s): Deep Learning; Domain Adaptation; Tissue Sample; Large Data; Biomedical Data; RNA Seq; Adaptation Algorithm; Data Repositories; Technological Advances; Metadata Annotation

Abstract Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Here we investigate RNA-seq metadata prediction based on gene expression values. We present a deep learning-based domain adaptation algorithm for the automatic annotation of RNA-seq metadata. We show how our algorithm outperforms existing approaches as well as traditional deep learning methods for the prediction of tissue, sample source, and patient sex information across several large data repositories. By using a model architecture similar to siamese networks, the algorithm is able to learn biases from datasets with few samples. Our domain adaptation approach achieves metadata annotation accuracies up to 12.3% better than a previously published method. Lastly, we provide a list of more than 10,000 novel tissue and sex label annotations for 8,495 unique SRA samples.

Improving Research Patient Data Repositories from a Health Data Industry Viewpoint (Preprint)

2021

doi:10.2196/preprints.32845

Author(s): Chunlei Tang; Li Zhou; Joseph Plasek; Yangyong Zhu; Yajun Huang; ...

Keyword(s): Best Practices; Data Science; Patient Data; Biomedical Data; Data Repositories; New Developments; Cross Border; Critical Resource; Central Resource; Science And Research

Abstract Electronic patient data are critical to clinical and translational science, and research patient data repositories (RPDRs) are a central resource for any work in biomedical data science. However, the data science ecosystem, due to its inherently transdisciplinary nature, poses challenges to existing RPDRs and demands expansions and new developments, calling for a wide variety of new functions and capabilities in the administrative, educational, and organizational domains. The power of data science in the business realm is tremendous. In business, it is already viewed as a critical resource, and this will likely occur in healthcare as well. This perspective focuses on best practices in developing RPDRs, and identifies areas which we believe have not received enough attention. These include deployment, contribution calculation, internal talent marketplaces, data partnerships, data sovereigns’ new capital assets, and cross-border data sharing.

SATORI: a system for ontology-guided visual exploration of biomedical data repositories

Bioinformatics, 2017, Vol 34 (7), pp. 1200-1207

doi:10.1093/bioinformatics/btx739

Cited by ~2

Author(s): Fritz Lekschas; Nils Gehlenborg

Keyword(s): Visual Exploration; Biomedical Data; Data Repositories

Dug: A Semantic Search Engine Leveraging Peer-Reviewed Literature to Span Biomedical Data Repositories

2021

doi:10.1101/2021.07.07.451461

Author(s): Alexander M Waldrop; John B Cheadle; Kira Bradford; Nathan T Braswell; Matt Watson; ...

Keyword(s): Semantic Search; Evidence Based; Biomedical Data; Data Repositories; Public Data; National Heart Lung; Search Tool; Public Datasets; Scientific Questions; Semantic Search Engine

As the number of public data resources continues to proliferate, identifying relevant datasets across heterogeneous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets that utilizes evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned. Developed through the National Heart, Lung, and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug can index more than 15,911 study variables from public datasets in just over 39 minutes. On a manually curated search dataset, Dug's mean recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's mean recall of 0.76. When synonyms or related concepts are used as search queries, Dug (0.28) far outperforms Elasticsearch (0.1) in terms of mean recall. Dug is freely available at https://github.com/helxplatform/dug, and an example Dug deployment is also available for use at https://helx.renci.org/ui.

Kinematics of Big Biomedical Data to characterize temporal variability and seasonality of data repositories: Functional Data Analysis of data temporal evolution over non-parametric statistical manifolds

International Journal of Medical Informatics, 2018, Vol 119, pp. 109-124

doi:10.1016/j.ijmedinf.2018.09.015

Cited by ~6

Author(s): Carlos Sáez; Juan M García-Gómez

Keyword(s): Data Analysis; Functional Data Analysis; Functional Data; Temporal Variability; Temporal Evolution; Biomedical Data; Data Repositories; Statistical Manifolds; Non Parametric

Precision Medicine Landscape of Genomic Testing for Patients With Cancer in the National Institutes of Health All of Us Database Using Informatics Approaches

JCO Clinical Cancer Informatics, 2022

doi:10.1200/cci.21.00152

Author(s): Jay G. Ronquillo; William T. Lester

Keyword(s): Precision Medicine; Race And Ethnicity; Gene Mutations; National Institutes Of Health; Genomic Testing; Biomedical Data; Data Repositories; Cancer Data; Patients With Cancer; Significant Difference

PURPOSE The rapid growth of biomedical data ecosystems has catalyzed research for oncology and precision medicine. We leverage federal cloud-based precision medicine databases and tools to better understand the current landscape of precision medicine and genomic testing for patients with cancer. METHODS Retrospective observational study of genomic testing for patients with cancer in the National Institutes of Health All of Us Research Program, with the cancer cohort defined as having at least two documented or reported cancer diagnoses. RESULTS There were 5,678 (1.8%) All of Us participants in the cancer cohort, with a significant difference between cancer status by age category, sex, race, and ethnicity (P < .001 for all). There were 295 (5.2%) patients with cancer who received genomic testing compared with 6,734 (2.2%) of noncancer patients, with 752 genomic tests commonly focused on gene mutations (primarily pharmacogenomics), molecular pathology, or clinical cytogenetic reports. CONCLUSION Although not yet ubiquitous, diverse clinical genomic analyses in oncology can set the stage to grow the practice of precision medicine by integrating research patient data repositories, cancer data ecosystems, and biomedical informatics.

DataMed: Finding useful data across multiple biomedical data repositories

2016

doi:10.1101/094888

Cited by ~1

Author(s): L Ohno-Machado; SA Sansone; G Alter; I Fore; J Grethe; ...

Keyword(s): Big Data; Biomedical Research; Scientific Literature; Service Providers; Research Community; Biomedical Data; Data Repositories; Data Intensive; Fair Principles; Community Of Researchers

Abstract The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the NIH Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various datasets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports Findability and Accessibility of datasets. These characteristics, along with Interoperability and Reusability, compose the four FAIR principles to facilitate knowledge discovery in today’s big data-intensive science landscape.

SATORI: A System for Ontology-Guided Visual Exploration of Biomedical Data Repositories

2016

doi:10.1101/046755

Author(s): Fritz Lekschas; Nils Gehlenborg

Keyword(s): Real World; Web Application; Large Data; Current Data; Visual Exploration; Data Sets; Biomedical Data; Structured Interviews; Real World Data; Data Repositories

Abstract The ever-increasing number of biomedical data sets provides tremendous opportunities for re-use, but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating data sets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find data sets of interest. We developed SATORI, an integrative search and visual exploration interface for biomedical data repositories. The design is informed by a requirements analysis through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse, and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface. SATORI is an open-source web application, which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform.
