The Science DMZ: A Network Design Pattern for Data-Intensive Science

2014 ◽  
Vol 22 (2) ◽  
pp. 173-185 ◽  
Author(s):  
Eli Dart ◽  
Lauren Rotman ◽  
Brian Tierney ◽  
Mary Hester ◽  
Jason Zurawski

The ever-increasing scale of scientific data has become a significant challenge for researchers who rely on networks to interact with remote computing systems and transfer results to collaborators worldwide. Despite the availability of high-capacity connections, scientists struggle with inadequate cyberinfrastructure that cripples data transfer performance and impedes scientific progress. The Science DMZ paradigm comprises a proven set of network design patterns that collectively address these problems for scientists. We explain the Science DMZ model, including network architecture, system configuration, cybersecurity, and performance tools, which creates an optimized network environment for science. We describe use cases from universities, supercomputing centers, and research laboratories, highlighting the effectiveness of the Science DMZ model in diverse operational settings. In all, the Science DMZ model is a solid platform that supports any science workflow and flexibly accommodates emerging network technologies. As a result, the Science DMZ vastly improves collaboration, accelerating scientific discovery.

2021 ◽  
Author(s):  
Chaolemen Borjigin ◽  
Chen Zhang

Data Science is one of today’s most rapidly growing academic fields and has significant implications for all conventional scientific studies. However, most of the relevant studies so far have been limited to one or several facets of Data Science from a specific application domain perspective and fail to discuss its theoretical framework. Data Science is a novel science in that its research goals, perspectives, and body of knowledge are distinct from those of other sciences. The core theories of Data Science are the DIKW pyramid, data-intensive scientific discovery, the data science lifecycle, data wrangling or munging, big data analytics, data management and governance, data products development, and big data visualization. Six main trends characterize the recent theoretical studies on Data Science: growing significance of DataOps, the rise of citizen data scientists, enabling augmented data science, diversity of domain-specific data science, and implementing data stories as data products. The further development of Data Science should prioritize four ways of turning challenges into opportunities: accelerating theoretical studies of data science, the trade-off between explainability and performance, achieving data ethics, privacy and trust, and aligning academic curricula to industrial needs.


2014 ◽  
Vol 926-930 ◽  
pp. 2807-2810
Author(s):  
Li Jun Liu

The concepts of the grid and grid computing arose from the need to share computer resources spread across different locations and to make idle CPU and storage resources easy to use. Data-intensive scientific and engineering applications (such as seismic data, numerical simulation in physics, computational mechanics, and weather forecasting) need to transmit huge amounts of data quickly and safely across wide-area distributed computing environments. How to transfer massive files efficiently, reliably, and securely in a grid environment is therefore a key research issue in grid computing. This paper covers the design and realization of a dynamic task assignment algorithm and a performance experiment on the system.


Author(s):  
Tevfik Kosar ◽  
Mehmet Balman ◽  
Esma Yildirim ◽  
Sivakumar Kulasekaran ◽  
Brandon Ross

In this paper, we present the Stork data scheduler as a solution for mitigating the data bottleneck in e-Science and data-intensive scientific discovery. Stork focuses on planning, scheduling, monitoring and management of data placement tasks and application-level end-to-end optimization of networked inputs/outputs for petascale distributed e-Science applications. Unlike existing approaches, Stork treats data resources and the tasks related to data access and movement as first-class entities just like computational resources and compute tasks, and not simply the side-effect of computation. Stork provides unique features such as aggregation of data transfer jobs considering their source and destination addresses, and an application-level throughput estimation and optimization service. We describe how these two features are implemented in Stork and their effects on end-to-end data transfer performance.
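The aggregation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not Stork's code or API: the function name `aggregate_by_endpoints`, the `(source_url, destination_url)` job representation, and the example hosts are all hypothetical, and the sketch only assumes that jobs sharing the same source and destination hosts can be batched together.

```python
from collections import defaultdict
from urllib.parse import urlparse


def aggregate_by_endpoints(jobs):
    """Group (source_url, destination_url) jobs by (source host, destination host).

    Hypothetical helper for illustration; not part of Stork.
    """
    batches = defaultdict(list)
    for src, dst in jobs:
        key = (urlparse(src).hostname, urlparse(dst).hostname)
        batches[key].append((src, dst))
    return dict(batches)


# Example job list with made-up hosts and paths.
jobs = [
    ("gsiftp://a.example.org/data/f1", "gsiftp://b.example.org/in/f1"),
    ("gsiftp://a.example.org/data/f2", "gsiftp://b.example.org/in/f2"),
    ("gsiftp://c.example.org/data/f3", "gsiftp://b.example.org/in/f3"),
]
batches = aggregate_by_endpoints(jobs)
```

Each batch could then be submitted as one transfer over a shared connection, which is the kind of per-job overhead reduction that aggregation is meant to provide.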


2020 ◽  
Vol 14 ◽  
Author(s):  
Khoirom Motilal Singh ◽  
Laiphrakpam Dolendro Singh ◽  
Themrichon Tuithung

Background: Data in the form of text, audio, image and video are used everywhere in our modern scientific world. These data are stored in physical storage, cloud storage and other storage devices. Some of them are very sensitive and require efficient security both in storage and in transmission from the sender to the receiver. Objective: With the increase in data transfer operations, enough space is also required to store these data. Many researchers have been working to develop different encryption schemes, yet their works still have many limitations. There is always a need for encryption schemes with smaller cipher data, faster execution time and low computation cost. Methods: A text encryption scheme based on Huffman coding and the ElGamal cryptosystem is proposed. Initially, the text data is converted to its corresponding binary bits using Huffman coding. Next, the binary bits are grouped and converted into large integer values, which are used as the input for the ElGamal cryptosystem. Results: Encryption and decryption are successfully performed, where the data size is reduced using Huffman coding and advanced security with a smaller key size is provided by the ElGamal cryptosystem. Conclusion: Simulation results and performance analysis indicate that our encryption algorithm is better than the existing algorithms under consideration.
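The pipeline described under Methods (Huffman-encode the text, group the bits into integers, encrypt each integer with ElGamal) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the prime `P`, base `G`, block size `BLOCK_BITS`, the leading sentinel bit, and all function names are assumptions, and the parameters are toy-sized and not secure. The sketch also assumes the Huffman code table is shared with the receiver.

```python
import heapq
import secrets
from collections import Counter

P = 2**127 - 1    # a Mersenne prime; toy-sized assumption, NOT a secure choice
G = 3             # assumed base, for illustration only
BLOCK_BITS = 120  # bits per block, so each block integer stays below P


def huffman_codes(text):
    """Build a prefix-code table {char: bitstring} from character frequencies."""
    if not text:
        raise ValueError("text must be non-empty")
    freq = Counter(text)
    if len(freq) == 1:  # only one distinct symbol: give it a 1-bit code
        return {next(iter(freq)): "0"}
    # Heap items: (weight, tie_breaker, partial code table for that subtree)
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def keygen():
    """Return (private_key, public_key) for textbook ElGamal modulo P."""
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)


def encrypt_text(text, public_key):
    """Huffman-encode text, split the bits into blocks, ElGamal-encrypt each block."""
    codes = huffman_codes(text)
    bits = "".join(codes[ch] for ch in text)
    cipher = []
    for i in range(0, len(bits), BLOCK_BITS):
        # A leading sentinel "1" preserves the block's leading zeros.
        m = int("1" + bits[i:i + BLOCK_BITS], 2)
        k = secrets.randbelow(P - 2) + 1  # fresh ephemeral key per block
        cipher.append((pow(G, k, P), (m * pow(public_key, k, P)) % P))
    return codes, cipher


def decrypt_text(codes, cipher, private_key):
    """Invert encrypt_text using the private key and the shared code table."""
    bits = ""
    for c1, c2 in cipher:
        s = pow(c1, private_key, P)
        m = (c2 * pow(s, -1, P)) % P  # modular inverse needs Python 3.8+
        bits += bin(m)[3:]            # drop "0b" and the sentinel bit
    decode = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in decode:
            out.append(decode[current])
            current = ""
    return "".join(out)
```

A round trip under these assumptions looks like `priv, pub = keygen()`, then `codes, cipher = encrypt_text("some text", pub)`, then `decrypt_text(codes, cipher, priv)`, which returns the original string.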


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1471
Author(s):  
Jun-Yeong Lee ◽  
Moon-Hyun Kim ◽  
Syed Asif Raza Shah ◽  
Sang-Un Ahn ◽  
Heejun Yoon ◽  
...  

Data are important and ever growing in data-intensive scientific environments. Such research data growth requires data storage systems that play pivotal roles in data management and analysis for scientific discoveries. Redundant Array of Independent Disks (RAID), a well-known storage technology combining multiple disks into a single large logical volume, has been widely used for the purpose of data redundancy and performance improvement. However, this requires RAID-capable hardware or software to build up a RAID-enabled disk array. In addition, it is difficult to scale up RAID-based storage. In order to mitigate these problems, many distributed file systems have been developed and are being actively used in various environments, especially in data-intensive computing facilities, where a tremendous amount of data has to be handled. In this study, we investigated and benchmarked various distributed file systems, such as Ceph, GlusterFS, Lustre and EOS, for data-intensive environments. In our experiment, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect read and write performance depending on the features of the data, which have to be considered in data-intensive computing environments.


2010 ◽  
Vol 2010 ◽  
pp. 1-13 ◽  
Author(s):  
M. S. Leite ◽  
T. L. Fujiki ◽  
F. V. Silva ◽  
A. M. F. Fileti

This paper focuses on the development of intelligent controllers for use in a process of enzyme recovery from pineapple rind. The proteolytic enzyme bromelain (EC 3.4.22.4) is precipitated with alcohol at low temperature in a fed-batch jacketed tank. Temperature control is crucial to avoid irreversible protein denaturation. Fuzzy or neural controllers offer a way of implementing solutions that cover dynamic and nonlinear processes. The design methodology and a comparative study on the performance of fuzzy-PI, neurofuzzy, and neural network intelligent controllers are presented. To tune the fuzzy PI Mamdani controller, various universes of discourse, rule bases, and membership function support sets were tested. A neurofuzzy inference system (ANFIS), based on Takagi-Sugeno rules, and a model predictive controller, based on neural modeling, were developed and tested as well. Using a Fieldbus network architecture, a coolant variable speed pump was driven by the controllers. The experimental results show the effectiveness of fuzzy controllers in comparison to the neural predictive control. The fuzzy PI controller exhibited a reduced error parameter (ITAE), lower power consumption, and better recovery of enzyme activity.

