Finding useful data across multiple biomedical data repositories using DataMed

2017 ◽  
Vol 49 (6) ◽  
pp. 816-819 ◽  
Author(s):  
Lucila Ohno-Machado ◽  
Susanna-Assunta Sansone ◽  
George Alter ◽  
Ian Fore ◽  
Jeffrey Grethe ◽  
...  
2019 ◽  
Vol 21 (6) ◽  
pp. 1937-1953 ◽  
Author(s):  
Jussi Paananen ◽  
Vittorio Fortino

Abstract The drug discovery process starts with identification of a disease-modifying target. This critical step traditionally begins with manual investigation of scientific literature and biomedical databases to gather evidence linking a molecular target to disease, and to evaluate the efficacy, safety and commercial potential of the target. The high throughput and affordability of current omics technologies, which allow quantitative measurements of many putative targets (e.g. DNA, RNA, protein, metabolite), have exponentially increased the volume of scientific data available for this arduous task. Therefore, computational platforms that identify and rank disease-relevant targets from existing biomedical data sources, including omics databases, are needed. To date, more than 30 drug target discovery (DTD) platforms exist. They provide information-rich databases and graphical user interfaces to help scientists identify putative targets and pre-evaluate their therapeutic efficacy and potential side effects. Here we survey and compare a set of popular DTD platforms that utilize multiple data sources and omics-driven knowledge bases (either directly or indirectly) for identifying drug targets. We also provide a description of omics technologies and related data repositories which are important for DTD tasks.


2014 ◽  
Vol 23 (04) ◽  
pp. 1460010 ◽  
Author(s):  
Georgia Tsiliki ◽  
Sophia Kossida ◽  
Natalja Friesen ◽  
Stefan Rüping ◽  
Manolis Tzagarakis ◽  
...  

Biomedical research is becoming increasingly multidisciplinary and collaborative in nature. At the same time, it has recently seen a vast growth in publicly and instantly available information. As the available resources become more specialized, there is a growing need for multidisciplinary collaborations between biomedical researchers to address complex research questions. We present an application of a data mining algorithm to genomic data in a collaborative decision-making support environment, as a typical example of how multidisciplinary researchers can collaborate in analyzing and interpreting biomedical data. Through the proposed approach, researchers can easily decide which data repositories should be considered, analyze the algorithmic results, discuss the weaknesses of the patterns identified, and set up new iterations of the data mining algorithm by defining other descriptive attributes or integrating other relevant data. Evaluation results show that the proposed approach helps users set their research objectives and better understand the data and methodologies used in their research.


2020 ◽  
Author(s):  
Hannes Wartmann ◽  
Sven Heins ◽  
Karin Kloiber ◽  
Stefan Bonn

Abstract Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Here we investigate RNA-seq metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-seq metadata. We show how our algorithm outperforms existing approaches as well as traditional deep learning methods for the prediction of tissue, sample source, and patient sex information across several large data repositories. By using a model architecture similar to Siamese networks, the algorithm is able to learn biases from datasets with few samples. Our domain adaptation approach achieves metadata annotation accuracies up to 12.3% better than a previously published method. Lastly, we provide a list of more than 10,000 novel tissue and sex label annotations for 8,495 unique SRA samples.


2021 ◽  
Author(s):  
Chunlei Tang ◽  
Li Zhou ◽  
Joseph Plasek ◽  
Yangyong Zhu ◽  
Yajun Huang ◽  
...  

UNSTRUCTURED Electronic patient data are critical to clinical and translational science, and research patient data repositories (RPDRs) are a central resource for any work in biomedical data science. However, the data science ecosystem, due to its inherently transdisciplinary nature, poses challenges to existing RPDRs and demands expansions and new developments, calling for a wide variety of new functions and capabilities in the administrative, educational, and organizational domains. The power of data science in the business realm is tremendous. In business, it is already viewed as a critical resource, and this will likely occur in healthcare as well. This perspective focuses on best practices in developing RPDRs, and identifies areas which we believe have not received enough attention. These include deployment, contribution calculation, internal talent marketplaces, data partnerships, data sovereigns’ new capital assets, and cross-border data sharing.


2021 ◽  
Author(s):  
Alexander M Waldrop ◽  
John B Cheadle ◽  
Kira Bradford ◽  
Nathan T Braswell ◽  
Matt Watson ◽  
...  

As the number of public data resources continues to proliferate, identifying relevant datasets across heterogeneous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets that utilizes evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned. Developed through the National Heart, Lung, and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug can index more than 15,911 study variables from public datasets in just over 39 minutes. On a manually curated search dataset, Dug's mean recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's mean recall of 0.76. When synonyms or related concepts are used as search queries, Dug's mean recall (0.28) far outperforms that of Elasticsearch (0.1). Dug is freely available at https://github.com/helxplatform/dug, and an example Dug deployment is also available for use at https://helx.renci.org/ui.
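
As a rough illustration of the mean recall metric reported in the abstract above, the following Python sketch averages per-query recall over a small curated query set. It assumes the standard information-retrieval definition (relevant results retrieved divided by all relevant results for the query), which may differ from how the authors computed their figures. The function names, query strings, and variable identifiers are hypothetical and are not taken from Dug or Elasticsearch.

```python
from typing import Dict, List, Set


def recall(retrieved: List[str], relevant: Set[str]) -> float:
    """Fraction of the relevant items that appear among the retrieved results."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    return len(set(retrieved) & relevant) / len(relevant)


def mean_recall(results: Dict[str, List[str]], gold: Dict[str, Set[str]]) -> float:
    """Average per-query recall over every query in the curated gold standard."""
    scores = [recall(results.get(query, []), rel) for query, rel in gold.items()]
    return sum(scores) / len(scores)


# Hypothetical curated gold standard: query -> set of relevant variable IDs.
gold = {
    "asthma": {"v1", "v2", "v3", "v4"},
    "bmi": {"v5", "v6"},
}

# Hypothetical search output: query -> ranked list of returned variable IDs.
results = {
    "asthma": ["v1", "v2", "v9"],  # 2 of 4 relevant items found
    "bmi": ["v5", "v6"],           # 2 of 2 relevant items found
}

score = mean_recall(results, gold)
print(f"mean recall: {score:.2f}")
```

Averaging per query (rather than pooling all results) keeps queries with many relevant items from dominating the score; whether the study used this averaging scheme is not stated in the abstract.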


Author(s):  
Jay G. Ronquillo ◽  
William T. Lester

PURPOSE The rapid growth of biomedical data ecosystems has catalyzed research for oncology and precision medicine. We leverage federal cloud-based precision medicine databases and tools to better understand the current landscape of precision medicine and genomic testing for patients with cancer. METHODS Retrospective observational study of genomic testing for patients with cancer in the National Institutes of Health All of Us Research Program, with the cancer cohort defined as having at least two documented or reported cancer diagnoses. RESULTS There were 5,678 (1.8%) All of Us participants in the cancer cohort, with a significant difference between cancer status by age category, sex, race, and ethnicity (P < .001 for all). There were 295 (5.2%) patients with cancer who received genomic testing compared with 6,734 (2.2%) of noncancer patients, with 752 genomic tests commonly focused on gene mutations (primarily pharmacogenomics), molecular pathology, or clinical cytogenetic reports. CONCLUSION Although not yet ubiquitous, diverse clinical genomic analyses in oncology can set the stage to grow the practice of precision medicine by integrating research patient data repositories, cancer data ecosystems, and biomedical informatics.


2016 ◽  
Author(s):  
L Ohno-Machado ◽  
SA Sansone ◽  
G Alter ◽  
I Fore ◽  
J Grethe ◽  
...  

Abstract The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the NIH Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various datasets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports Findability and Accessibility of datasets. These characteristics, along with Interoperability and Reusability, compose the four FAIR principles to facilitate knowledge discovery in today’s big data-intensive science landscape.


2016 ◽  
Author(s):  
Fritz Lekschas ◽  
Nils Gehlenborg

Abstract The ever-increasing number of biomedical data sets provides tremendous opportunities for re-use, but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating data sets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find data sets of interest. We developed SATORI, an integrative search and visual exploration interface for the exploration of biomedical data repositories. The design is informed by a requirements analysis through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse, and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface. SATORI is an open-source web application, which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform.

