DataMed: Finding useful data across multiple biomedical data repositories

2016 ◽  
Author(s):  
L Ohno-Machado ◽  
SA Sansone ◽  
G Alter ◽  
I Fore ◽  
J Grethe ◽  
...  

Abstract The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the NIH Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various datasets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports Findability and Accessibility of datasets. These characteristics - along with Interoperability and Reusability - compose the four FAIR principles to facilitate knowledge discovery in today’s big data-intensive science landscape.
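The abstract's core idea, a search engine over metadata extracted from many repositories, can be sketched as a minimal inverted index. This is an illustration only, not DataMed's actual implementation; the dataset records, repository names, and field names below are invented for the example.

```python
from collections import defaultdict

# Hypothetical metadata records harvested from different repositories.
datasets = [
    {"id": "ds1", "repository": "GEO", "title": "RNA-seq of human liver tissue"},
    {"id": "ds2", "repository": "dbGaP", "title": "Genome-wide association study of liver disease"},
    {"id": "ds3", "repository": "PDB", "title": "Crystal structure of hemoglobin"},
]

def build_index(records):
    """Map each lowercase title token to the ids of datasets containing it."""
    index = defaultdict(set)
    for rec in records:
        for token in rec["title"].lower().split():
            index[token].add(rec["id"])
    return index

def search(index, *terms):
    """Return ids of datasets whose titles contain every query term."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*hits)) if hits else []

index = build_index(datasets)
print(search(index, "liver"))  # matches datasets from two different repositories
```

A single keyword query crossing repository boundaries is the Findability half of the FAIR story; the repository field on each hit is what lets a user follow through to Accessibility.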

2017 ◽  
Vol 25 (3) ◽  
pp. 267-274 ◽  
Author(s):  
Sean Peisert ◽  
Eli Dart ◽  
William Barnett ◽  
Edward Balas ◽  
James Cuff ◽  
...  

Abstract Objective We describe a detailed solution for maintaining high-capacity, data-intensive network flows (eg, 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations. Materials and Methods High-end networking, packet-filter firewalls, network intrusion-detection systems. Results We describe a “Medical Science DMZ” concept as an option for secure, high-volume transport of large, sensitive datasets between research institutions over national research networks, and give 3 detailed descriptions of implemented Medical Science DMZs. Discussion The exponentially increasing amounts of “omics” data, high-quality imaging, and other rapidly growing clinical datasets have resulted in the rise of biomedical research “Big Data.” The storage, analysis, and network resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large datasets. Maintaining data-intensive flows that comply with the Health Insurance Portability and Accountability Act (HIPAA) and other regulations presents a new challenge for biomedical research. We describe a strategy that marries performance and security by borrowing from and redefining the concept of a Science DMZ, a framework that is used in physical sciences and engineering research to manage high-capacity data flows. 
Conclusion By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements.
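One ingredient of the Science DMZ pattern the abstract borrows is a default-deny packet-filter policy that exposes a data transfer node only to known collaborator networks on the data-transfer service port. The sketch below is our own simplified illustration, not the authors' configuration; the subnets and port number are example values.

```python
import ipaddress

# Invented collaborator networks (RFC 5737 documentation ranges).
ALLOWED_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),     # example collaborator institution
    ipaddress.ip_network("198.51.100.0/24"),  # example national research network
]
DTN_PORT = 2811  # illustrative data-transfer service port

def permit(src_ip: str, dst_port: int) -> bool:
    """Default-deny ACL: permit only collaborator subnets on the DTN port."""
    addr = ipaddress.ip_address(src_ip)
    return dst_port == DTN_PORT and any(addr in net for net in ALLOWED_SUBNETS)

print(permit("192.0.2.10", 2811))   # collaborator network, correct port
print(permit("203.0.113.5", 2811))  # unknown network is denied
```

Keeping the rule set this small is the point: a dedicated enclave with a narrow, auditable policy can sustain multi-gigabit flows that a deep-inspection enterprise firewall would throttle.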


F1000Research ◽  
2020 ◽  
Vol 8 ◽  
pp. 1430 ◽  
Author(s):  
Vivek Navale ◽  
Michele Ji ◽  
Olga Vovk ◽  
Leonie Misquitta ◽  
Tsega Gebremichael ◽  
...  

The Biomedical Research Informatics Computing System (BRICS) was developed to support multiple disease-focused research programs. Seven service modules are integrated to provide a collaborative and extensible web-based environment. The modules—Data Dictionary, Account Management, Query Tool, Protocol and Form Research Management System, Meta Study, Data Repository and Globally Unique Identifier—facilitate the management of research protocols and the submission, processing, curation, access, and storage of clinical, imaging, and derived genomics data within the associated data repositories. Multiple instances of BRICS are deployed to support various biomedical research communities focused on accelerating discoveries for rare diseases, Traumatic Brain Injury, Parkinson’s Disease, inherited eye diseases and symptom science research. No Personally Identifiable Information is stored within the data repositories. Digital Object Identifiers are associated with the research studies. Reusability of biomedical data is enhanced by Common Data Elements (CDEs), which enable systematic collection, analysis and sharing of data. The use of CDEs with a service-oriented informatics architecture enabled the development of disease-specific repositories that support hypothesis-based biomedical research.
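The CDE mechanism the abstract credits for reusability can be pictured as a shared dictionary of element definitions against which every submission is validated. The sketch below is a hedged illustration: the element names, types, and permissible values are invented for the example and are not BRICS's actual dictionary.

```python
# A toy Common Data Element dictionary: each element fixes a type plus
# either a numeric range or a set of permissible values.
CDE_DICTIONARY = {
    "GCS_Total": {"type": int, "range": (3, 15)},  # Glasgow Coma Scale total
    "InjurySeverity": {"type": str,
                       "permissible": {"Mild", "Moderate", "Severe"}},
}

def validate(record):
    """Return a list of CDE violations for a submitted record (empty = valid)."""
    errors = []
    for name, value in record.items():
        cde = CDE_DICTIONARY.get(name)
        if cde is None:
            errors.append(f"{name}: not a defined CDE")
        elif not isinstance(value, cde["type"]):
            errors.append(f"{name}: expected {cde['type'].__name__}")
        elif "range" in cde and not cde["range"][0] <= value <= cde["range"][1]:
            errors.append(f"{name}: out of range {cde['range']}")
        elif "permissible" in cde and value not in cde["permissible"]:
            errors.append(f"{name}: not a permissible value")
    return errors

print(validate({"GCS_Total": 14, "InjurySeverity": "Mild"}))  # []
print(validate({"GCS_Total": 20}))  # one out-of-range violation
```

Because every deployed instance validates against the same dictionary, data collected by different studies can be pooled and compared, which is exactly the systematic collection and sharing the abstract describes.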


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Maulik R. Kamdar ◽  
Javier D. Fernández ◽  
Axel Polleres ◽  
Tania Tudorache ◽  
Mark A. Musen

Abstract The biomedical data landscape is fragmented with several isolated, heterogeneous data and knowledge sources, which use varying formats, syntaxes, schemas, and entity notations, existing on the Web. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We will discuss some of the major challenges that hinder the widespread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Eventually, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems.
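The integration payoff of Linked Data comes from shared entity identifiers: when two sources name the same entity the same way, merging their knowledge is a simple union of triples. The miniature sketch below illustrates that idea only; the triples, identifiers, and source names are invented stand-ins, not actual LSLOD content.

```python
# Two hypothetical LSLOD sources asserting subject-predicate-object triples
# about the same drug entity, using a shared identifier scheme ("ex:" prefix).
pharmgkb_like = [
    ("ex:imatinib", "targets", "ex:BCR-ABL1"),
]
drugbank_like = [
    ("ex:imatinib", "indication", "ex:CML"),
    ("ex:BCR-ABL1", "geneFusionOf", "ex:ABL1"),
]

def about(entity, *graphs):
    """Collect every (predicate, object) asserted about an entity, across sources."""
    return sorted((p, o) for g in graphs for s, p, o in g if s == entity)

# Shared identifiers make cross-source integration a plain union of facts.
print(about("ex:imatinib", pharmgkb_like, drugbank_like))
```

In practice this join runs over RDF graphs via SPARQL rather than Python lists, and the hard part, the challenge the paper dwells on, is that real sources often do not share entity notations, so mappings must be built first.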


2016 ◽  
Vol 23 (6) ◽  
pp. 1199-1201 ◽  
Author(s):  
Sean Peisert ◽  
William Barnett ◽  
Eli Dart ◽  
James Cuff ◽  
Robert L Grossman ◽  
...  

Abstract Objective We describe use cases and an institutional reference architecture for maintaining high-capacity, data-intensive network flows (e.g., 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations. Materials and Methods High-end networking, packet filter firewalls, network intrusion detection systems. Results We describe a “Medical Science DMZ” concept as an option for secure, high-volume transport of large, sensitive data sets between research institutions over national research networks. Discussion The exponentially increasing amounts of “omics” data, the rapid increase of high-quality imaging, and other rapidly growing clinical data sets have resulted in the rise of biomedical research “big data.” The storage, analysis, and network resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large data sets. Maintaining data-intensive flows that comply with HIPAA and other regulations presents a new challenge for biomedical research. Recognizing this, we describe a strategy that marries performance and security by borrowing from and redefining the concept of a “Science DMZ”—a framework that is used in physical sciences and engineering research to manage high-capacity data flows. 
Conclusion By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements.


2017 ◽  
Author(s):  
Gabriel Rosenfeld ◽  
Dawei Lin

Abstract While the impact of biomedical research has traditionally been measured using bibliographic metrics such as citation count or journal impact factor, the data itself is an output which can be directly measured to provide additional context about a publication’s impact. Data are a resource that can be repurposed and reused, providing dividends on the original investment used to support the primary work. Moreover, data are the cornerstone upon which a tested hypothesis is rejected or accepted and specific scientific conclusions are reached. Understanding how and where data are being produced enhances the transparency and reproducibility of the biomedical research enterprise. Most biomedical data are not directly deposited in data repositories and are instead found in the publication within figures or attachments, making them hard to measure. We attempted to address this challenge by using recent advances in word embedding to identify the technical and methodological features of terms used in the free text of articles’ methods sections. We created term usage signatures for five types of biomedical research data, which were used in univariate clustering to correctly identify a large fraction of positive control articles and a set of manually annotated articles where generation of data types could be validated. The approach was then used to estimate the fraction of PLOS articles generating each biomedical data type over time. Out of all PLOS articles analyzed (n = 129,918), ~7%, 19%, 12%, 18%, and 6% generated flow cytometry, immunoassay, genomic microarray, microscopy, and high-throughput sequencing data, respectively. 
The estimate portends a vast amount of biomedical data being produced: in 2016, if other publishers generated a similar amount of data, then roughly 40,000 NIH-funded research articles would produce ~56,000 datasets consisting of the five data types we analyzed. One Sentence Summary: Application of a word-embedding model trained on the methods sections of research articles allows for estimation of the production of diverse biomedical data types using text mining.
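The signature-matching step can be caricatured in a few lines: represent each data type by a set of indicative methods-section terms, score an article by the similarity between its term counts and that signature, and flag high-scoring articles. This is a rough sketch under our own assumptions; the term list is invented, and the paper's actual pipeline uses learned word embeddings rather than a hand-picked vocabulary.

```python
import math
from collections import Counter

# Hypothetical term-usage signature for one data type (flow cytometry).
FLOW_CYTOMETRY_TERMS = {"facs", "cytometer", "gating", "fluorochrome"}

def signature_score(methods_text, signature_terms):
    """Cosine similarity between an article's term counts and a binary signature."""
    counts = Counter(methods_text.lower().split())
    dot = sum(counts[t] for t in signature_terms)
    norm = math.sqrt(sum(v * v for v in counts.values())) * math.sqrt(len(signature_terms))
    return dot / norm if norm else 0.0

cyto = "cells were analyzed on a facs cytometer after gating on live cells"
pcr = "amplification was performed with standard pcr cycling conditions"
print(signature_score(cyto, FLOW_CYTOMETRY_TERMS) >
      signature_score(pcr, FLOW_CYTOMETRY_TERMS))  # True
```

Replacing the hand-picked terms with embedding-derived neighbors is what lets the real method scale across data types without manual curation.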


Author(s):  
Shaveta Bhatia

 The epoch of big data presents many opportunities for development across data science, biomedical research, cyber security, and cloud computing. Big data has gained wide popularity, but it also raises many challenges for security and privacy. Various threats and attacks, such as data leakage, unauthorized third-party access, viruses, and other vulnerabilities, stand against the security of big data. This paper discusses these security threats and approaches to addressing them in the fields of biomedical research, cyber security, and cloud computing.


2012 ◽  
Vol 16 (3) ◽  
Author(s):  
Laurie P Dringus

This essay is written to present a prospective stance on how learning analytics, as a core evaluative approach, must help instructors uncover the important trends and evidence of quality learner data in the online course. A critique is presented of strategic and tactical issues of learning analytics. The approach to the critique is taken through the lens of questioning the current status of applying learning analytics to online courses. The goal of the discussion is twofold: (1) to inform online learning practitioners (e.g., instructors and administrators) of the potential of learning analytics in online courses and (2) to broaden discussion in the research community about the advancement of learning analytics in online learning. In recognizing the full potential of formalizing big data in online courses, the community must address this issue also in the context of the potentially "harmful" application of learning analytics.


2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
I Mircheva ◽  
M Mirchev

Abstract Background Ownership of patient information in the context of Big Data is a relatively new problem, apparently not yet fully understood. There are not enough publications on the subject. Since the topic is interdisciplinary, incorporating legal, ethical, medical and aspects of information and communication technologies, a slightly more sophisticated analysis of the issue is needed. Aim To determine how the medical academic community perceives the issue of ownership of patient information in the context of Big Data. Methods Literature search for full text publications, indexed in PubMed, Springer, ScienceDirect and Scopus identified only 27 appropriate articles authored by academicians and corresponding to three focus areas: problem (ownership); area (healthcare); context (Big Data). Three major aspects were studied: scientific area of publications, aspects and academicians' perception of ownership in the context of Big Data. Results Publications are in the period 2014 - 2019, 37% published in health and medical informatics journals, 30% in medicine and public health, 19% in law and ethics; 78% authored by American and British academicians, highly cited. The majority (63%) are in the area of scientific research - clinical studies, access and use of patient data for medical research, secondary use of medical data, ethical challenges to Big data in healthcare. The majority (70%) of the publications discuss ownership in ethical and legal aspects and 67% see ownership as a challenge mostly to medical research, access control, ethics, politics and business. Conclusions Ownership of medical data is seen first and foremost as a challenge. Addressing this challenge requires the combined efforts of politicians, lawyers, ethicists, computer and medical professionals, as well as academicians, sharing these efforts, experiences and suggestions. However, this issue is neglected in the scientific literature. 
Publishing may help foster open debate and adequate policy solutions. Key messages: Ownership of patient information in the context of Big Data is a problem that should not be marginalized but needs a comprehensive attitude, consideration and combined efforts from all stakeholders. Overcoming the challenge of ownership may help improve healthcare services, medical and public health research, and the health of the population as a whole.


2020 ◽  
Vol 59 (04/05) ◽  
pp. 117-118
Author(s):  
Carlos Luis Parra-Calderón ◽  
Ferran Sanz ◽  
Leslie D. McIntosh

2017 ◽  
Vol 49 (6) ◽  
pp. 816-819 ◽  
Author(s):  
Lucila Ohno-Machado ◽  
Susanna-Assunta Sansone ◽  
George Alter ◽  
Ian Fore ◽  
Jeffrey Grethe ◽  
...  
