The Science DMZ: A Network Design Pattern for Data-Intensive Science

Eli Dart; Lauren Rotman; Brian Tierney; Mary Hester; Jason Zurawski

doi:10.1155/2014/701405

Scientific Programming ◽

10.1155/2014/701405 ◽

2014 ◽

Vol 22 (2) ◽

pp. 173-185 ◽

Cited By ~ 17

Author(s):

Eli Dart ◽

Lauren Rotman ◽

Brian Tierney ◽

Mary Hester ◽

Jason Zurawski

Keyword(s):

Network Design ◽

Network Architecture ◽

Design Patterns ◽

Data Transfer ◽

Scientific Discovery ◽

High Capacity ◽

Scientific Progress ◽

Scientific Data ◽

Data Intensive ◽

And Performance

The ever-increasing scale of scientific data has become a significant challenge for researchers who rely on networks to interact with remote computing systems and transfer results to collaborators worldwide. Despite the availability of high-capacity connections, scientists struggle with inadequate cyberinfrastructure that cripples data transfer performance and impedes scientific progress. The Science DMZ paradigm comprises a proven set of network design patterns that collectively address these problems for scientists. We explain the Science DMZ model, including the network architecture, system configuration, cybersecurity, and performance tools that create an optimized network environment for science. We describe use cases from universities, supercomputing centers, and research laboratories, highlighting the effectiveness of the Science DMZ model in diverse operational settings. In all, the Science DMZ model is a solid platform that supports any science workflow and flexibly accommodates emerging network technologies. As a result, the Science DMZ vastly improves collaboration, accelerating scientific discovery.

Data Science: Trends, Perspectives, and Prospects

10.21203/rs.3.rs-1014621/v1 ◽

2021 ◽

Author(s):

Chaolemen Borjigin ◽

Chen Zhang

Keyword(s):

Big Data ◽

Data Science ◽

Scientific Discovery ◽

Big Data Analytics ◽

Theoretical Studies ◽

Data Intensive ◽

Data Ethics ◽

Big Data Visualization ◽

Data Products ◽

And Performance

Data Science is one of today’s most rapidly growing academic fields and has significant implications for all conventional scientific studies. However, most of the relevant studies so far have been limited to one or several facets of Data Science from a specific application domain perspective and fail to discuss its theoretical framework. Data Science is a novel science in that its research goals, perspectives, and body of knowledge are distinct from those of other sciences. The core theories of Data Science are the DIKW pyramid, data-intensive scientific discovery, the data science lifecycle, data wrangling or munging, big data analytics, data management and governance, data products development, and big data visualization. Six main trends characterize the recent theoretical studies on Data Science: growing significance of DataOps, the rise of citizen data scientists, enabling augmented data science, diversity of domain-specific data science, and implementing data stories as data products. The further development of Data Science should prioritize four ways of turning challenges into opportunities: accelerating theoretical studies of data science, the trade-off between explainability and performance, achieving data ethics, privacy and trust, and aligning academic curricula to industrial needs.

Research on Computer Network Data Transfer Methods

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.2807 ◽

2014 ◽

Vol 926-930 ◽

pp. 2807-2810

Author(s):

Li Jun Liu

Keyword(s):

Grid Computing ◽

Computer Network ◽

Data Transfer ◽

Task Assignment ◽

Weather Forecast ◽

Ease Of Use ◽

Data Intensive ◽

Dynamic Task ◽

Assignment Algorithm ◽

And Performance

The concepts of the grid and grid computing arose from the need to share computer resources spread across different locations and to make idle CPU and storage resources easy to use. Data-intensive scientific and engineering applications (such as seismic data, numerical simulation in physics, computational mechanics, and weather forecasting) need to transmit huge amounts of data quickly and safely across wide-area distributed computing environments. How to transfer massive files efficiently, reliably, and securely in a grid environment is therefore a key research issue in grid computing. This paper covers the design and realization of a dynamic task assignment algorithm and a performance experiment on the system.

Stork data scheduler: mitigating the data bottleneck in e-Science

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2011.0148 ◽

2011 ◽

Vol 369 (1949) ◽

pp. 3254-3267 ◽

Cited By ~ 14

Author(s):

Tevfik Kosar ◽

Mehmet Balman ◽

Esma Yildirim ◽

Sivakumar Kulasekaran ◽

Brandon Ross

Keyword(s):

Side Effect ◽

Data Transfer ◽

Scientific Discovery ◽

Data Access ◽

Data Placement ◽

Transfer Performance ◽

Data Intensive ◽

End To End ◽

Computational Resources

In this paper, we present the Stork data scheduler as a solution for mitigating the data bottleneck in e-Science and data-intensive scientific discovery. Stork focuses on planning, scheduling, monitoring and management of data placement tasks and application-level end-to-end optimization of networked inputs/outputs for petascale distributed e-Science applications. Unlike existing approaches, Stork treats data resources and the tasks related to data access and movement as first-class entities just like computational resources and compute tasks, and not simply the side-effect of computation. Stork provides unique features such as aggregation of data transfer jobs considering their source and destination addresses, and an application-level throughput estimation and optimization service. We describe how these two features are implemented in Stork and their effects on end-to-end data transfer performance.
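The aggregation idea mentioned in the abstract, grouping transfer jobs by their source and destination addresses, can be illustrated with a minimal sketch. This is not Stork's implementation or API: the function name `aggregate_jobs`, the job representation as `(source_url, destination_url)` pairs, and the example hosts are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def aggregate_jobs(jobs):
    """Group (source_url, destination_url) pairs by their endpoint hosts.

    Hypothetical helper: jobs that share the same source and destination
    host land in one batch, so a scheduler could move them together.
    """
    groups = defaultdict(list)
    for src, dst in jobs:
        key = (urlsplit(src).hostname, urlsplit(dst).hostname)
        groups[key].append((src, dst))
    return dict(groups)


# Example hosts are placeholders, not real endpoints.
jobs = [
    ("gsiftp://siteA.example.org/data/run1.dat", "gsiftp://siteB.example.org/in/run1.dat"),
    ("gsiftp://siteA.example.org/data/run2.dat", "gsiftp://siteB.example.org/in/run2.dat"),
    ("gsiftp://siteC.example.org/data/run3.dat", "gsiftp://siteB.example.org/in/run3.dat"),
]
batches = aggregate_jobs(jobs)
```

Here the two jobs between `siteA` and `siteB` fall into one batch, while the `siteC` job forms its own.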

Network architecture and performance evaluation of TCP/IP and ATM over satellite

10.2514/6.2000-1180 ◽

2000 ◽

Author(s):

Y. Chotikapong ◽

Z. Sun ◽

B. Evans ◽

T. Ors

Keyword(s):

Performance Evaluation ◽

Network Architecture ◽

And Performance

Text Encryption based on Huffman Coding and ElGamal Cryptosystem

Recent Patents on Engineering ◽

10.2174/1872212114999200917144000 ◽

2020 ◽

Vol 14 ◽

Author(s):

Khoirom Motilal Singh ◽

Laiphrakpam Dolendro Singh ◽

Themrichon Tuithung

Keyword(s):

Data Transfer ◽

Huffman Coding ◽

Large Integer ◽

Text Data ◽

Storage Devices ◽

Scientific World ◽

Elgamal Cryptosystem ◽

Encryption Schemes ◽

Transfer Operation ◽

And Performance

Background: Data in the form of text, audio, image and video are used everywhere in our modern scientific world. These data are stored in physical storage, cloud storage and other storage devices. Some of them are very sensitive and require efficient security both while stored and while being transmitted from the sender to the receiver. Objective: With the increase in data transfer operations, enough space is also required to store these data. Many researchers have been working to develop different encryption schemes, yet many limitations remain in their work. There is always a need for encryption schemes with smaller cipher data, faster execution time and low computation cost. Methods: A text encryption scheme based on Huffman coding and the ElGamal cryptosystem is proposed. Initially, the text data is converted to its corresponding binary bits using Huffman coding. Next, the binary bits are grouped and converted into large integer values, which are used as the input for the ElGamal cryptosystem. Results: Encryption and decryption are successfully performed, where the data size is reduced using Huffman coding and advanced security with a smaller key size is provided by the ElGamal cryptosystem. Conclusion: Simulation results and performance analysis indicate that our encryption algorithm is better than the existing algorithms under consideration.
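The pipeline described under Methods (Huffman-encode the text, group the bits, turn each group into a large integer, encrypt with ElGamal) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation. The prime `P = 2**127 - 1`, base `G = 3`, the 64-bit block size, the leading sentinel bit that preserves each block's length, and passing the Huffman table alongside the ciphertext in the clear are all assumptions, and the parameters are far too small for real security.

```python
import heapq
import secrets
from collections import Counter

P = 2**127 - 1   # toy prime modulus (assumption, not a secure choice)
G = 3            # toy base (assumption)
BLOCK_BITS = 64  # bits per plaintext block, so every block stays below P


def huffman_codes(text):
    """Return {symbol: bitstring} built from symbol frequencies (text non-empty)."""
    freq = Counter(text)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def huffman_decode(bits, codes):
    """Walk the bitstring and emit a symbol whenever a full code is matched."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


def keygen():
    """Return (private_key, public_key) for the toy ElGamal group."""
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)


def encrypt_text(text, public_key):
    """Huffman-encode, split into blocks, and ElGamal-encrypt each block."""
    codes = huffman_codes(text)
    bits = "".join(codes[ch] for ch in text)
    blocks = []
    for i in range(0, len(bits), BLOCK_BITS):
        # The leading "1" keeps leading zeros when the block becomes an integer.
        m = int("1" + bits[i:i + BLOCK_BITS], 2)
        k = secrets.randbelow(P - 2) + 1  # fresh ephemeral key per block
        blocks.append((pow(G, k, P), m * pow(public_key, k, P) % P))
    return codes, blocks


def decrypt_text(codes, blocks, private_key):
    """Recover each block integer, strip the sentinel bit, and Huffman-decode."""
    bits = ""
    for c1, c2 in blocks:
        m = c2 * pow(c1, P - 1 - private_key, P) % P  # c1^(-x) via Fermat
        bits += bin(m)[3:]  # drop "0b" and the sentinel "1"
    return huffman_decode(bits, codes)
```

A round trip would call `keygen()`, pass the public key to `encrypt_text`, and pass the private key to `decrypt_text`. A real design would also need a way to protect or agree on the Huffman table, which the abstract does not specify.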

Network design and performance for a massively parallel SIMD system

[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation ◽

10.1109/fmpc.1992.234889 ◽

2003 ◽

Author(s):

S. Darbha ◽

E.W. Davis

Keyword(s):

Network Design ◽

Massively Parallel ◽

And Performance

Performance Evaluations of Distributed File Systems for Scientific Big Data in FUSE Environment

Electronics ◽

10.3390/electronics10121471 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1471

Author(s):

Jun-Yeong Lee ◽

Moon-Hyun Kim ◽

Syed Asif Raza Shah ◽

Sang-Un Ahn ◽

Heejun Yoon ◽

...

Keyword(s):

Data Storage ◽

Scale Up ◽

File Systems ◽

Performance Evaluations ◽

Distributed File Systems ◽

Data Intensive Computing ◽

Data Intensive ◽

Tremendous Amount ◽

Computing Environments ◽

And Performance

Data are important and ever growing in data-intensive scientific environments. Such research data growth requires data storage systems that play pivotal roles in data management and analysis for scientific discoveries. Redundant Array of Independent Disks (RAID), a well-known storage technology combining multiple disks into a single large logical volume, has been widely used for the purpose of data redundancy and performance improvement. However, this requires RAID-capable hardware or software to build up a RAID-enabled disk array. In addition, it is difficult to scale up RAID-based storage. To mitigate this problem, many distributed file systems have been developed and are being actively used in various environments, especially in data-intensive computing facilities, where a tremendous amount of data has to be handled. In this study, we investigated and benchmarked various distributed file systems, such as Ceph, GlusterFS, Lustre and EOS, for data-intensive environments. In our experiment, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect the read and write performance depending on the features of the data, which have to be considered in data-intensive computing environments.

Final Session: The Fourth Paradigm: Data-Intensive Scientific Discovery: More than 10 years later.

10.26226/morressier.5fd757103d762219be34f389 ◽

2021 ◽

Author(s):

Irina Sens

Keyword(s):

Scientific Discovery ◽

Final Session ◽

Data Intensive

Secure XML Aware Network Design and Performance Analysis

Computational Science and Its Applications – ICCSA 2005 - Lecture Notes in Computer Science ◽

10.1007/11424758_33 ◽

2005 ◽

pp. 311-319 ◽

Cited By ~ 1

Author(s):

Eui-Nam Huh ◽

Jong-Youl Jeong ◽

Young-Shin Kim ◽

Ki-Young Mun

Keyword(s):

Performance Analysis ◽

Network Design ◽

And Performance

Online Intelligent Controllers for an Enzyme Recovery Plant: Design Methodology and Performance

Enzyme Research ◽

10.4061/2010/250843 ◽

2010 ◽

Vol 2010 ◽

pp. 1-13 ◽

Cited By ~ 1

Author(s):

M. S. Leite ◽

T. L. Fujiki ◽

F. V. Silva ◽

A. M. F. Fileti

Keyword(s):

Network Architecture ◽

Design Methodology ◽

Protein Denaturation ◽

Plant Design ◽

Inference System ◽

Enzyme Recovery ◽

And Performance ◽

Neural Predictive Control ◽

Rule Bases ◽

Intelligent Controllers

This paper focuses on the development of intelligent controllers for use in a process of enzyme recovery from pineapple rind. The proteolytic enzyme bromelain (EC 3.4.22.4) is precipitated with alcohol at low temperature in a fed-batch jacketed tank. Temperature control is crucial to avoid irreversible protein denaturation. Fuzzy or neural controllers offer a way of implementing solutions that cover dynamic and nonlinear processes. The design methodology and a comparative study on the performance of fuzzy-PI, neurofuzzy, and neural network intelligent controllers are presented. To tune the fuzzy PI Mamdani controller, various universes of discourse, rule bases, and membership function support sets were tested. A neurofuzzy inference system (ANFIS), based on Takagi-Sugeno rules, and a model predictive controller, based on neural modeling, were developed and tested as well. Using a Fieldbus network architecture, a coolant variable speed pump was driven by the controllers. The experimental results show the effectiveness of fuzzy controllers in comparison to the neural predictive control. The fuzzy PI controller exhibited a reduced error parameter (ITAE), lower power consumption, and better recovery of enzyme activity.
