The Modern Research Data Portal: a design pattern for networked, data-intensive science

2018 ◽  
Vol 4 ◽  
pp. e144 ◽  
Author(s):  
Kyle Chard ◽  
Eli Dart ◽  
Ian Foster ◽  
David Shifflett ◽  
Steven Tuecke ◽  
...  

We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance data enclaves and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
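The decoupling of control logic from data storage described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation and not the API of any real service: `TransferRequest`, `FakeTransferService`, `Portal`, the endpoint names, and the paths are all hypothetical. The point it shows is that the portal only looks up where a dataset lives and hands a transfer request to an external service; it never moves the bytes itself.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class TransferRequest:
    """Describes a transfer between two storage endpoints (hypothetical)."""
    source_endpoint: str
    source_path: str
    dest_endpoint: str
    dest_path: str


@dataclass
class FakeTransferService:
    """In-memory stand-in for an external managed transfer service."""
    submitted: Dict[str, TransferRequest] = field(default_factory=dict)

    def submit(self, request: TransferRequest) -> str:
        # A real service would move data between endpoints asynchronously;
        # here we only record the request and return a task identifier.
        task_id = str(uuid.uuid4())
        self.submitted[task_id] = request
        return task_id


class Portal:
    """Control logic only: catalog lookup plus delegation of data movement."""

    def __init__(self, catalog: Dict[str, str],
                 transfer_service: FakeTransferService,
                 data_endpoint: str) -> None:
        self.catalog = catalog                  # dataset id -> path on the data endpoint
        self.transfer_service = transfer_service
        self.data_endpoint = data_endpoint      # storage system separate from the portal

    def request_download(self, dataset_id: str,
                         user_endpoint: str, user_path: str) -> str:
        # Raises KeyError if the dataset is not in the catalog.
        source_path = self.catalog[dataset_id]
        request = TransferRequest(self.data_endpoint, source_path,
                                  user_endpoint, user_path)
        # The portal returns a task id; the file bytes never pass through it.
        return self.transfer_service.submit(request)


service = FakeTransferService()
portal = Portal({"climate-2017": "/datasets/climate/2017/"},
                service, "dtn.example.org")
task_id = portal.request_download("climate-2017", "laptop.example.org", "/home/me/data/")
```

Because the portal holds only a catalog and a reference to the transfer service, it could run on a small web server while the data endpoint sits on separate high-performance storage.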


2017 ◽  
Author(s):  
Kyle Chard ◽  
Eli Dart ◽  
Ian Foster ◽  
David Shifflett ◽  
Steven Tuecke ◽  
...  

We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance Science DMZs and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.


2012 ◽  
Vol 479-481 ◽  
pp. 65-70
Author(s):  
Xiao Hui Zhang ◽  
Liu Qing ◽  
Mu Li

Building on target detection with alignment templates, this paper designs a lane alignment template using a correlation matching method and combines it with a genetic algorithm for stochastic template matching and optimization to detect lanes. To address the real-time limitations of lane detection based on a genetic algorithm, the paper uses the high-performance multi-core DSP chip TMS320C6474 as the core, together with the high-speed data transmission technology of RapidIO, to run the lane detection algorithm in parallel in hardware. Over the RapidIO bus, data transmission between DSPs can reach 3.125 Gbps, which is essentially delay-free and solves the problem of moving large volumes of data between processors at high speed. The experimental results show that both the computed lane lines and the running time on the parallel C6474 platform are better than on a single DSP or a PC. In addition, the road detection is accurate, reliable, and robust.


2021 ◽  
pp. 60-70
Author(s):  
Piyush Kumar Shukla ◽  
Prashant Kumar Shukla ◽  

The interpretation of large data streams necessitates repeated high-performance transfers, which overload microprocessor Systems on Chip (SoCs). An effective direct memory access (DMA) controller performs bulk data transfers without the CPU's involvement. The DMA controller (DMAC) relieves this overload by facilitating bulk data transfer and execution. In this work, we created an intelligent DMAC (I-DMAC) for accessing video processing data without using CPUs. The model includes a bus selection module, user control signal, status register, DMA-supported address, and AXI-PCI subsystems for improved video frame analysis. These modules are experimentally verified on a Xilinx FPGA SoC architecture using VHDL code simulation, and the results are compared to the E-DMAC model.


Author(s):  
Ivan Mozghovyi ◽  
Anatoliy Sergiyenko ◽  
Roman Yershov

Growing requirements for data transfer and storage are a crucial concern today. There are several approaches to high-speed data transmission, but each meets only the limited requirements of its narrowly focused target. Data compression offers a solution to the problems of high-speed transfer and low-volume data storage. This paper is devoted to the compression of GIF images using a modified LZW algorithm with a tree-based dictionary. The approach decreases lookup time and increases compression speed, which in turn supports developing a method for constructing a hardware compression accelerator in future research.
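To make the idea of a tree-based dictionary concrete, the following is a minimal sketch of textbook LZW in which the dictionary is a trie, so extending the current match by one byte is a single child lookup rather than a search over whole strings. It is not the authors' modified algorithm: it assumes byte-oriented input, an unbounded dictionary, and plain integer codes, and it omits GIF specifics such as variable-width code packing and clear codes. The names `TrieNode`, `lzw_compress`, and `lzw_decompress` are illustrative.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One dictionary entry; children are indexed by the next byte value."""
    __slots__ = ("code", "children")

    def __init__(self, code: Optional[int]) -> None:
        self.code = code
        self.children: Dict[int, "TrieNode"] = {}


def lzw_compress(data: bytes) -> List[int]:
    root = TrieNode(None)
    for b in range(256):                 # seed with all single-byte strings
        root.children[b] = TrieNode(b)
    next_code = 256
    out: List[int] = []
    node = root
    for byte in data:
        child = node.children.get(byte)
        if child is not None:
            node = child                 # extend the current match by one byte
        else:
            out.append(node.code)        # emit code of the longest match
            node.children[byte] = TrieNode(next_code)  # add match + byte
            next_code += 1
            node = root.children[byte]   # restart matching at this byte
    if node is not root:
        out.append(node.code)            # flush the final match
    return out


def lzw_decompress(codes: List[int]) -> bytes:
    if not codes:
        return b""
    table = [bytes([i]) for i in range(256)]
    prev = table[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):         # code defined by the step being decoded
            entry = prev + prev[:1]
        else:
            raise ValueError("invalid LZW code")
        out += entry
        table.append(prev + entry[:1])
        prev = entry
    return bytes(out)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
restored = lzw_decompress(codes)
```

The decompressor uses a flat table because it only ever looks entries up by code; the trie matters on the compression side, where each input byte costs one dictionary lookup in the current node.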


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

The latest advances in network and distributed system technologies now allow integration of a vast variety of services with almost unlimited processing power, using large amounts of data. Sharing of resources is often viewed as the key goal for distributed systems, and in this context the sharing of stored data appears as the most important aspect of distributed resource sharing. Scientific applications are the first to take advantage of such environments, as the requirements of current and future high performance computing experiments are pressing, with ever higher volumes of generated data to be stored and managed. While these new environments reveal huge opportunities for large-scale distributed data storage and management, they also raise important technical challenges that need to be addressed. The ability to support persistent storage of data on behalf of users, the consistent distribution of up-to-date data, the reliable replication of fast-changing datasets, and the efficient management of large data transfers are just some of these new challenges. In this chapter we discuss how the existing distributed computing infrastructure is adequate for supporting the required data storage and management functionalities. We highlight the issues raised by storing data over large distributed environments and discuss the recent research efforts dealing with the challenges of data retrieval, replication, and fast data transfers. The interaction of data management with other data-sensitive, emerging technologies, such as workflow management, is also addressed.


2013 ◽  
Vol 3 (1) ◽  
pp. 13-26 ◽  
Author(s):  
Sanjay P. Ahuja ◽  
Sindhu Mani

High Performance Computing (HPC) applications are scientific applications that require significant CPU capabilities. They are also data-intensive applications requiring large data storage. While many researchers have examined the performance of Amazon’s EC2 platform across some HPC benchmarks, an extensive comparison of Amazon’s EC2 and Microsoft’s Windows Azure on metrics such as memory bandwidth, I/O performance, and communication and computational performance is largely missing. The purpose of this paper is to implement existing benchmarks to evaluate and analyze these metrics for EC2 and Windows Azure, which span both Infrastructure-as-a-Service and Platform-as-a-Service types. This was accomplished by running MPI versions of the STREAM, Interleaved or Random (IOR), and NAS Parallel (NPB) benchmarks on small and medium instance types. In addition, a new EC2 medium instance type (m1.medium) was included in the analysis. These benchmarks measure memory bandwidth, I/O performance, and communication and computational performance.


2018 ◽  
Vol 7 (4.6) ◽  
pp. 13
Author(s):  
Mekala Sandhya ◽  
Ashish Ladda ◽  
Dr. Uma N Dulhare ◽  
...

In the current Internet era, information and data grow continuously. Across various Internet services and applications, the amount of information is increasing rapidly: hundreds of billions, even trillions, of web indexes exist. Such large data brings people a mass of information and, at the same time, makes it harder to discover useful knowledge in it. Cloud computing can provide infrastructure for large data. It has two significant characteristics of distributed computing: scalability and high availability. Scalability means it can extend seamlessly to large-scale clusters. Availability means it can tolerate node errors, so node failures do not prevent a program from running correctly. Combined with data mining, cloud computing performs substantial data processing on high-performance machines. Mass data storage and distributed computing provide a new method for mass data mining and an effective solution for distributed storage and efficient computing in data mining.

