The Medical Science DMZ

2016 ◽  
Vol 23 (6) ◽  
pp. 1199-1201 ◽  
Author(s):  
Sean Peisert ◽  
William Barnett ◽  
Eli Dart ◽  
James Cuff ◽  
Robert L Grossman ◽  
...  

Abstract
Objective: We describe use cases and an institutional reference architecture for maintaining high-capacity, data-intensive network flows (e.g., 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations.
Materials and Methods: High-end networking, packet-filter firewalls, and network intrusion-detection systems.
Results: We describe a “Medical Science DMZ” concept as an option for secure, high-volume transport of large, sensitive data sets between research institutions over national research networks.
Discussion: The exponentially increasing amounts of “omics” data, the rapid increase of high-quality imaging, and other rapidly growing clinical data sets have resulted in the rise of biomedical research “big data.” The storage, analysis, and network resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large data sets. Maintaining data-intensive flows that comply with HIPAA and other regulations presents a new challenge for biomedical research. Recognizing this, we describe a strategy that marries performance and security by borrowing from and redefining the concept of a “Science DMZ,” a framework that is used in physical sciences and engineering research to manage high-capacity data flows.
Conclusion: By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements.
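
As a concrete illustration of the traffic separation such an architecture relies on, the following minimal Python sketch admits only pre-registered data-transfer-node (DTN) flows at the network edge and rejects everything else; the addresses, port range, and rule format are hypothetical, not taken from the paper. Real deployments express such rules in border-router ACLs at line rate, with the network intrusion-detection system watching the traffic out of band.

```python
# Minimal sketch of ACL-style stateless admission at a Science DMZ border.
# All addresses and ports below are hypothetical illustrations.
from ipaddress import ip_address, ip_network

# Hypothetical ACL: (collaborator network, local DTN address, allowed TCP ports)
ACL = [
    (ip_network("192.0.2.0/24"), ip_address("198.51.100.10"), range(50000, 51001)),
]

def admit(src: str, dst: str, dport: int) -> bool:
    """True if the flow matches a pre-approved bulk-transfer rule."""
    return any(
        ip_address(src) in net and ip_address(dst) == dtn and dport in ports
        for net, dtn, ports in ACL
    )

# A registered bulk transfer passes; an unsolicited connection does not.
assert admit("192.0.2.7", "198.51.100.10", 50400)
assert not admit("203.0.113.5", "198.51.100.10", 22)
```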

2017 ◽  
Vol 25 (3) ◽  
pp. 267-274 ◽  
Author(s):  
Sean Peisert ◽  
Eli Dart ◽  
William Barnett ◽  
Edward Balas ◽  
James Cuff ◽  
...  

Abstract
Objective: We describe a detailed solution for maintaining high-capacity, data-intensive network flows (eg, 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations.
Materials and Methods: High-end networking, packet-filter firewalls, and network intrusion-detection systems.
Results: We describe a “Medical Science DMZ” concept as an option for secure, high-volume transport of large, sensitive datasets between research institutions over national research networks, and give 3 detailed descriptions of implemented Medical Science DMZs.
Discussion: The exponentially increasing amounts of “omics” data, high-quality imaging, and other rapidly growing clinical datasets have resulted in the rise of biomedical research “Big Data.” The storage, analysis, and network resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large datasets. Maintaining data-intensive flows that comply with the Health Insurance Portability and Accountability Act (HIPAA) and other regulations presents a new challenge for biomedical research. We describe a strategy that marries performance and security by borrowing from and redefining the concept of a Science DMZ, a framework that is used in physical sciences and engineering research to manage high-capacity data flows.
Conclusion: By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements.


2016 ◽  
Author(s):  
L Ohno-Machado ◽  
SA Sansone ◽  
G Alter ◽  
I Fore ◽  
J Grethe ◽  
...  

Abstract
The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the NIH Big Data to Knowledge initiative, we work with an international community of researchers, service providers, and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various datasets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports Findability and Accessibility of datasets. These characteristics, along with Interoperability and Reusability, compose the four FAIR principles to facilitate knowledge discovery in today’s big-data-intensive science landscape.
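
In spirit, such a dataset index works like an inverted index over harvested metadata records. The tiny Python sketch below illustrates the idea; the records, fields, and query semantics are hypothetical stand-ins, not DataMed’s actual schema or ranking.

```python
# Minimal sketch: index harvested metadata records, answer keyword queries.
# The records and fields are hypothetical, not DataMed's real schema.
from collections import defaultdict

records = [
    {"id": "ds-001", "title": "RNA-seq of human liver", "repository": "GEO"},
    {"id": "ds-002", "title": "Liver MRI imaging cohort", "repository": "TCIA"},
]

index = defaultdict(set)
for rec in records:
    for token in rec["title"].lower().split():
        index[token].add(rec["id"])

def search(query: str) -> set[str]:
    """Return dataset ids whose metadata contain every query term."""
    terms = query.lower().split()
    return set.intersection(*(index[t] for t in terms)) if terms else set()

print(search("liver"))          # {'ds-001', 'ds-002'}
print(search("liver imaging"))  # {'ds-002'}
```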


Author(s):  
Kristin Vanderbilt ◽  
David Blankman

Science has become a data-intensive enterprise. Data sets are commonly being stored in public data repositories and are thus available for others to use in new, often unexpected ways. Such re-use of data sets can take the form of reproducing the original analysis, analyzing the data in new ways, or combining multiple data sets into new data sets that are analyzed still further. A scientist who re-uses a data set collected by another must be able to assess its trustworthiness. This chapter reviews the types of errors that are found in metadata referring to data collected manually, data collected by instruments (sensors), and data recovered from specimens in museum collections. It also summarizes methods used to screen these types of data for errors. It stresses the importance of ensuring that metadata associated with a data set thoroughly document the error prevention, detection, and correction methods applied to the data set prior to publication.
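
For instance, sensor-data screening of the kind the chapter surveys often combines a physical range check with a rate-of-change check. A minimal Python sketch of that pattern, with hypothetical thresholds and data rather than examples from the chapter:

```python
# Minimal QA-screening sketch: flag out-of-range values and implausible jumps.
# Thresholds and readings are hypothetical illustrations.
def screen(readings, lo, hi, max_step):
    """Yield (index, value, reason) for readings that fail a QA check."""
    prev = None
    for i, v in enumerate(readings):
        if not (lo <= v <= hi):
            yield i, v, "out of range"
        elif prev is not None and abs(v - prev) > max_step:
            yield i, v, "implausible jump"
        prev = v

# Hourly air temperatures (deg C): one spike and one impossible value.
temps = [14.2, 14.5, 14.4, 48.0, 14.6, -99.9]
for i, v, reason in screen(temps, lo=-40.0, hi=45.0, max_step=5.0):
    print(f"reading {i} = {v}: {reason}")
```

Documenting which such checks were applied, and how failures were handled, is exactly the metadata the chapter argues must accompany a published data set.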


Author(s):  
Shaveta Bhatia

The epoch of big data presents many opportunities for development in data science, biomedical research, cyber security, and cloud computing. Big data has gained popularity, but it also raises many challenges for security and privacy. Various threats and attacks, such as data leakage, unauthorized third-party access, viruses, and other vulnerabilities, stand against the security of big data. This paper discusses these security threats and approaches to addressing them in the fields of biomedical research, cyber security, and cloud computing.


Author(s):  
Parag A Pathade ◽  
Vinod A Bairagi ◽  
Yogesh S. Ahire ◽  
Neela M Bhatia

“Proteomics” is an emerging technology enabling high-throughput identification and understanding of proteins. Proteomics is the protein equivalent of genomics and has captured the imagination of biomolecular scientists worldwide. Because the proteome reveals more accurately the dynamic state of a cell, tissue, or organism, much is expected from proteomics in identifying better disease markers for diagnosis and therapy monitoring. Proteomics is expected to play a major role in biomedical research and to have a significant impact on the development of diagnostics and therapeutics for cancer, heart ailments, and infectious diseases. Proteomics research leads to the identification of new protein markers for diagnostic purposes and novel molecular targets for drug discovery. Though the potential is great, many challenges and issues remain to be solved, such as gene expression, peptide analysis, detection of low-abundance proteins, analytical tools, drug target discovery, and cost. A systematic and efficient analysis of vast genomic and proteomic data sets is a major challenge for researchers today. Nevertheless, proteomics is the groundwork for constructing and extracting useful knowledge for biomedical research. This review article covers some opportunities and challenges offered by proteomics.


2021 ◽  
pp. 1-11
Author(s):  
Kusan Biswas

In this paper, we propose a frequency-domain data-hiding method for JPEG-compressed images. The proposed method embeds data in the DCT coefficients of selected 8 × 8 blocks. According to theories of the Human Visual System (HVS), human vision is less sensitive to perturbation of pixel values in the uneven areas of an image. We therefore propose a Singular Value Decomposition-based image roughness measure (SVD-IRM), with which we select coarse 8 × 8 blocks as data-embedding destinations. Moreover, to make the embedded data more robust against re-compression attacks and errors due to transmission over noisy channels, we employ Turbo error-correcting codes. The actual data embedding is done using a proposed variant of matrix encoding that is capable of embedding three bits by modifying only one bit in a block of seven carrier features. We have carried out experiments to validate the performance, and the proposed method achieves better payload capacity and visual quality, and is more robust, than some recent state-of-the-art methods in the literature.
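
The “three bits per seven carrier features with at most one change” figure is the hallmark of (1, 7, 3) Hamming-syndrome matrix encoding, familiar from F5; the paper proposes a variant of it. Below is a minimal Python sketch of the classic scheme, assuming plain bit features (the paper’s actual variant and its Turbo-coding layer are not reproduced here):

```python
# Classic (1, 7, 3) matrix encoding: 3 message bits ride on the Hamming
# syndrome of 7 cover bits, so embedding flips at most one cover bit.
import numpy as np

# Parity-check matrix of the [7,4] Hamming code; column j encodes j+1 in binary.
H = np.array([[(j + 1) >> k & 1 for j in range(7)] for k in range(3)])

def embed(cover: np.ndarray, msg: np.ndarray) -> np.ndarray:
    """Flip at most one of 7 cover bits so their syndrome equals the 3-bit msg."""
    diff = (H @ cover % 2) ^ msg            # syndrome mismatch, LSB first
    e = int(diff @ np.array([1, 2, 4]))     # mismatch as a position in 0..7
    stego = cover.copy()
    if e:                                   # e == 0: syndrome already matches
        stego[e - 1] ^= 1                   # flip the single bit at position e
    return stego

def extract(stego: np.ndarray) -> np.ndarray:
    """Recover the 3 message bits as the syndrome of the stego bits."""
    return H @ stego % 2

cover = np.array([1, 0, 1, 1, 0, 0, 1])
msg = np.array([1, 0, 1])
stego = embed(cover, msg)
assert (extract(stego) == msg).all()
assert (stego != cover).sum() <= 1
```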


2021 ◽  
pp. 016555152199863
Author(s):  
Ismael Vázquez ◽  
María Novo-Lourés ◽  
Reyes Pavón ◽  
Rosalía Laza ◽  
José Ramón Méndez ◽  
...  

Current research has evolved in such a way that scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure the possibility of reproducing those results and comparing them with those obtained through other approaches. In this context, public data sets (sometimes shared through repositories) are among the most important elements for the development of experimental protocols and test benches. This study analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. In particular, we identify and discuss the following demanding functionalities for repositories: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques using dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository functionalities, and (4) providing protection mechanisms for licencing issues and user rights. To demonstrate the introduced functionality, we created the STRep (Spam Text Repository) web application, which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep at the URL https://rdata.4spam.group to facilitate understanding of this study.
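
Functionality (3) above, reproducing pre-processing outside the repository, can be met by publishing a declarative recipe of named steps alongside the data. A minimal Python sketch of that idea; the step names and corpus are hypothetical illustrations, not STRep’s actual API:

```python
# Minimal sketch: a data set ships with a declarative pre-processing recipe
# that anyone can replay without the repository. Step names are hypothetical.
import re

STEPS = {
    "lowercase": str.lower,
    "strip_urls": lambda t: re.sub(r"https?://\S+", "", t),
    "collapse_ws": lambda t: " ".join(t.split()),
}

def preprocess(text: str, recipe: list[str]) -> str:
    """Apply the recorded pre-processing steps in order."""
    for name in recipe:
        text = STEPS[name](text)
    return text

# A recipe published alongside the data set:
recipe = ["lowercase", "strip_urls", "collapse_ws"]
raw = "FREE  prize!!  Visit http://spam.example NOW"
print(preprocess(raw, recipe))  # "free prize!! visit now"
```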


Materials ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 471
Author(s):  
Constantino Grau Turuelo ◽  
Sebastian Pinnau ◽  
Cornelia Breitkopf

Modeling of thermodynamic properties, like heat capacities of stoichiometric solids, involves the treatment of data from different sources, which may be inconsistent and diverse. In this work, an approach based on the covariance matrix adaptation evolution strategy (CMA-ES) is proposed and described as an alternative method for data treatment and fitting, with support for data-source-dependent weight factors and physical constraints. It is applied to a Gibbs free energy stoichiometric model for different magnesium sulfate hydrates by means of the NASA9 polynomial. Its behavior is demonstrated by: (i) comparing the model to other standard methods for different heat-capacity data, yielding a more plausible curve at high temperature ranges; (ii) comparing the fitted heat-capacity values of MgSO4·7H2O against DSC measurements, resulting in a mean relative error of 0.7% and a normalized root-mean-square deviation of 1.1%; and (iii) comparing the Van’t Hoff and the proposed stoichiometric-model vapor-solid equilibrium curves to different literature data for MgSO4·7H2O, MgSO4·6H2O, and MgSO4·1H2O, resulting in similar equilibrium values, especially for MgSO4·7H2O and MgSO4·6H2O. The results show good agreement with the employed data and confirm this method as a viable alternative for fitting complex, physically constrained data sets, and as a potential approach for automatic fitting of substance data.
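
The fitting loop itself is straightforward to sketch: a weighted least-squares objective over the NASA9 heat-capacity polynomial, with a soft penalty for violating a physical constraint, minimized by CMA-ES (here via the pycma package). The data, weights, starting point, and constraint below are hypothetical placeholders, not the paper’s values for the magnesium sulfate hydrates:

```python
# Minimal sketch: weighted, physically constrained NASA9 cp fit with CMA-ES.
# Requires the pycma package (pip install cma). Data below are placeholders.
import numpy as np
import cma

R = 8.314462618  # J/(mol K)

def cp_nasa9(a, T):
    """NASA9 form: cp/R = a1/T^2 + a2/T + a3 + a4*T + a5*T^2 + a6*T^3 + a7*T^4."""
    return R * (a[0] / T**2 + a[1] / T + a[2] + a[3] * T
                + a[4] * T**2 + a[5] * T**3 + a[6] * T**4)

# Hypothetical measurements from two sources, with source-dependent weights.
T = np.array([280.0, 300.0, 320.0, 340.0, 360.0])
cp = np.array([372.0, 381.0, 390.0, 398.0, 405.0])   # J/(mol K), placeholder
w = np.array([1.0, 1.0, 0.5, 0.5, 0.5])              # trust source 1 more

def objective(a):
    model = cp_nasa9(a, T)
    resid = model - cp
    # Soft physical constraint (illustrative): cp should rise with T here.
    penalty = 1e6 * np.sum(np.maximum(0.0, -np.gradient(model, T)))
    return float(np.sum(w * resid**2) + penalty)

es = cma.CMAEvolutionStrategy(np.zeros(7), 1.0)
es.optimize(objective)
print(es.result.xbest)  # fitted NASA9 coefficients
```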

