The Modern Research Data Portal: a design pattern for networked, data-intensive science

PeerJ Computer Science ◽

10.7717/peerj-cs.144 ◽

2018 ◽

Vol 4 ◽

pp. e144 ◽

Cited By ~ 13

Author(s):

Kyle Chard ◽

Eli Dart ◽

Ian Foster ◽

David Shifflett ◽

Steven Tuecke ◽

...

Keyword(s):

Best Practices ◽

Data Storage ◽

Design Pattern ◽

High Speed ◽

High Performance ◽

Data Transfer ◽

Large Data ◽

Research Data ◽

Control Logic ◽

Data Portal

We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance data enclaves and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
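
The abstract's central idea, decoupling the portal's control logic from data storage, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the class names, the in-memory access list, and the endpoint identifiers are assumptions for illustration, not the Globus APIs the paper describes. What it shows is that the portal authorizes a request and returns a transfer description, while the bytes would move directly between storage endpoints through a separate transfer service.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TransferRequest:
    """What to move and where; describes a transfer but carries no file bytes."""
    source_endpoint: str
    source_path: str
    destination_endpoint: str
    destination_path: str


@dataclass
class Portal:
    """Control logic only: decides who may move which data, never serves it."""
    data_endpoint: str  # hypothetical ID of a high-performance storage endpoint
    acl: dict[str, set[str]] = field(default_factory=dict)  # user -> readable paths

    def grant(self, user: str, path: str) -> None:
        self.acl.setdefault(user, set()).add(path)

    def request_download(self, user: str, path: str,
                         dest_endpoint: str, dest_path: str) -> TransferRequest:
        if path not in self.acl.get(user, set()):
            raise PermissionError(f"{user} may not read {path}")
        # Rather than streaming the file through the web server, return a request
        # that a separate transfer service (not shown) would run endpoint to endpoint.
        return TransferRequest(self.data_endpoint, path, dest_endpoint, dest_path)


# Usage with made-up endpoint names
portal = Portal(data_endpoint="dtn-example")
portal.grant("alice", "/datasets/run42.h5")
print(portal.request_download("alice", "/datasets/run42.h5",
                              "alice-laptop", "/home/alice/run42.h5"))
```

Because the portal only issues descriptions, its web server never sits on the data path, which is the property the pattern relies on to decouple control logic from data storage.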

The Modern Research Data Portal: A design pattern for networked, data-intensive science

10.7287/peerj.preprints.3194v2 ◽

2017 ◽

Cited By ~ 1

Author(s):

Kyle Chard ◽

Eli Dart ◽

Ian Foster ◽

David Shifflett ◽

Steven Tuecke ◽

...

Keyword(s):

Best Practices ◽

Data Storage ◽

Design Pattern ◽

High Speed ◽

High Performance ◽

Data Transfer ◽

Large Data ◽

Research Data ◽

Control Logic ◽

Data Portal

We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance Science DMZs and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.

The Modern Research Data Portal: A design pattern for networked, data-intensive science

10.7287/peerj.preprints.3194v1 ◽

2017 ◽

Author(s):

Kyle Chard ◽

Eli Dart ◽

Ian Foster ◽

David Shifflett ◽

Steven Tuecke ◽

...

Keyword(s):

Best Practices ◽

Data Storage ◽

Design Pattern ◽

High Speed ◽

High Performance ◽

Data Transfer ◽

Large Data ◽

Research Data ◽

Control Logic ◽

Data Portal

The Modern Research Data Portal: A design pattern for networked, data-intensive science

10.7287/peerj.preprints.3194 ◽

2017 ◽

Author(s):

Kyle Chard ◽

Eli Dart ◽

Ian Foster ◽

David Shifflett ◽

Steven Tuecke ◽

...

Keyword(s):

Best Practices ◽

Data Storage ◽

Design Pattern ◽

High Speed ◽

High Performance ◽

Data Transfer ◽

Large Data ◽

Research Data ◽

Control Logic ◽

Data Portal

Lane Detection Algorithm Based on Genetic Algorithm and its Parallel Computing Realization

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.479-481.65 ◽

2012 ◽

Vol 479-481 ◽

pp. 65-70

Author(s):

Xiao Hui Zhang ◽

Liu Qing ◽

Mu Li

Keyword(s):

Genetic Algorithm ◽

Data Transmission ◽

High Speed ◽

High Performance ◽

Large Data ◽

Detection Algorithm ◽

Lane Detection ◽

The Road ◽

Time Problem ◽

High Speed Data

Building on target detection with an alignment template, the paper designs a lane alignment template using a correlation matching method and combines it with a genetic algorithm for stochastic template matching and optimization to realize lane detection. To address the real-time performance problem of genetic-algorithm-based lane detection, the paper uses the high-performance multi-core DSP chip TMS320C6474 as the core and combines it with high-speed RapidIO data transmission technology to parallelize the lane detection algorithm in hardware. Over the RapidIO bus, the data transmission rate between DSPs reaches 3.125 Gbps, which achieves nearly delay-free transmission and thereby solves the problem of moving large volumes of data between processors at high speed. Experimental results show that the parallel C6474 platform outperforms both a single DSP and a PC in terms of both the computed lane lines and the running time. In addition, the road detection is accurate, reliable, and robust.
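
The genetic-algorithm search described above can be sketched in Python as follows. This is an illustrative stand-in, not the paper's method: it fits a straight lane line x = m*y + b to edge points and uses the number of edge points within a tolerance of the line as the fitness, in place of the paper's template correlation score. The population size, mutation rates, and parameter ranges are arbitrary assumptions.

```python
import random


def fitness(params, edge_points, tol=1.0):
    """Count edge points within `tol` pixels of the line x = m*y + b.
    Stand-in for the paper's template correlation score (assumption)."""
    m, b = params
    return sum(1 for x, y in edge_points if abs(x - (m * y + b)) <= tol)


def genetic_search(edge_points, pop_size=40, generations=60,
                   m_range=(-2.0, 2.0), b_range=(0.0, 100.0), seed=0):
    """Return (best (m, b), best fitness recorded at the start of each generation)."""
    rng = random.Random(seed)
    pop = [(rng.uniform(*m_range), rng.uniform(*b_range)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(pop, key=lambda p: fitness(p, edge_points), reverse=True)
        history.append(fitness(scored[0], edge_points))
        elite = scored[: pop_size // 4]           # selection: keep the best quarter
        children = list(elite)                    # elitism: best never gets worse
        while len(children) < pop_size:
            (m1, b1), (m2, b2) = rng.sample(elite, 2)
            a = rng.random()                      # arithmetic crossover
            m, b = a * m1 + (1 - a) * m2, a * b1 + (1 - a) * b2
            if rng.random() < 0.3:                # occasional Gaussian mutation
                m += rng.gauss(0, 0.05)
                b += rng.gauss(0, 2.0)
            children.append((m, b))
        pop = children
    return max(pop, key=lambda p: fitness(p, edge_points)), history


# Synthetic edge map: one lane line x = 0.5*y + 20 plus random clutter
rng = random.Random(1)
edges = [(0.5 * y + 20, y) for y in range(100)]
edges += [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(50)]
best, history = genetic_search(edges)
print(best, history[-1])
```

Every fitness evaluation within a generation is independent of the others, which is presumably the work a multi-DSP platform such as the C6474 can spread across cores; the sketch itself runs serially.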

I-DMAC: An Intelligent DMA Controller for Utilization-Aware Video Streaming used in AI Applications

10.54216/jcim.080203 ◽

2021 ◽

pp. 60-70

Author(s):

Piyush Kumar Shukla ◽

◽

Prashant Kumar Shukla ◽

Keyword(s):

Video Processing ◽

High Performance ◽

Data Transfer ◽

Direct Memory Access ◽

Large Data ◽

Video Frame ◽

Microprocessor System ◽

Bulk Data ◽

Xilinx Fpga ◽

Vhdl Code

The interpretation of large data streams requires high-performance repeated transfers, which overload microprocessor systems-on-chip (SoCs). An effective direct memory access (DMA) controller performs bulk data transfers without involving the CPU, so a DMA controller (DMAC) solves this problem by handling bulk data transfer and execution. In this work, we created an intelligent DMAC (I-DMAC) for accessing video processing data without using CPUs. The model includes a bus selection module, user control signals, a status register, DMA-supported addresses, and AXI-PCI subsystems for improved video frame analysis. These modules are experimentally verified on a Xilinx FPGA SoC architecture using VHDL code simulation, and the results are compared with those of the E-DMAC model.

GIF IMAGE HARDWARE COMPRESSORS

Information, Computing and Intelligent systems ◽

10.20535/2708-4930.2.2021.244189 ◽

2021 ◽

Author(s):

Ivan Mozghovyi ◽

Anatoliy Sergiyenko ◽

Roman Yershov

Keyword(s):

Data Compression ◽

Data Storage ◽

High Speed ◽

Data Transfer ◽

Future Research ◽

Volume Data ◽

High Speed Data ◽

Lzw Algorithm ◽

And Storage ◽

Speed Data Transmission

Meeting the growing requirements for data transfer and storage is one of the crucial problems today. There are several approaches to high-speed data transmission, but each satisfies only the limited requirements of its own narrowly focused target. Data compression addresses both high-speed transfer and low-volume data storage. This paper is devoted to the compression of GIF images using a modified LZW algorithm with a tree-based dictionary. The modification decreases lookup time and increases compression speed, and in turn makes it possible to develop a method for constructing a hardware compression accelerator in future research.
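
As an illustration of LZW with a tree-based dictionary, the Python sketch below stores the dictionary as a trie, so extending the current match by one byte is a single child lookup rather than a search over stored strings. It is an assumption that this resembles the paper's modification, and the sketch omits GIF specifics such as variable-width codes, the clear and end-of-information codes, and the 4096-entry table limit.

```python
def lzw_compress(data: bytes) -> list[int]:
    """LZW encoder whose dictionary is a trie: node = (code, {next_byte: child})."""
    root = {b: (b, {}) for b in range(256)}   # single-byte strings get codes 0..255
    next_code = 256
    out: list[int] = []
    if not data:
        return out
    code, children = root[data[0]]            # current match = first byte
    for byte in data[1:]:
        child = children.get(byte)
        if child is not None:                 # match + byte is known: one trie step
            code, children = child
        else:
            out.append(code)                  # emit code for the longest match
            children[byte] = (next_code, {})  # add match + byte as a new entry
            next_code += 1
            code, children = root[byte]       # restart matching from this byte
    out.append(code)
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Standard LZW decoder that rebuilds the same code table as the encoder."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):              # code defined by this very step
            entry = prev + prev[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[len(table)] = prev + entry[:1]
        prev = entry
    return b"".join(out)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
print(len(sample), "bytes ->", len(codes), "codes")
```

A trie keeps each dictionary lookup proportional to one byte step, which is also the kind of structure that maps naturally onto a hardware table indexed by (current code, next byte).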

Data Storage, Retrieval and Management

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Large-Scale Distributed Computing and Applications ◽

10.4018/978-1-61520-703-9.ch006 ◽

2010 ◽

pp. 111-140

Author(s):

Valentin Cristea ◽

Ciprian Dobre ◽

Corina Stratan ◽

Florin Pop

Keyword(s):

Data Storage ◽

Resource Sharing ◽

High Performance ◽

Large Scale ◽

Workflow Management ◽

Large Data ◽

Data Retrieval ◽

Distributed Data Storage ◽

Processing Power ◽

Data Transfers

The latest advances in network and distributed system technologies now allow the integration of a vast variety of services with almost unlimited processing power, using large amounts of data. Sharing of resources is often viewed as the key goal of distributed systems, and in this context the sharing of stored data appears to be the most important aspect of distributed resource sharing. Scientific applications are the first to take advantage of such environments, as the requirements of current and future high performance computing experiments are pressing in terms of ever higher volumes of produced data to be stored and managed. While these new environments reveal huge opportunities for large-scale distributed data storage and management, they also raise important technical challenges that need to be addressed. The ability to support persistent storage of data on behalf of users, the consistent distribution of up-to-date data, the reliable replication of fast-changing datasets, and the efficient management of large data transfers are just some of these new challenges. In this chapter we discuss how adequate the existing distributed computing infrastructure is for supporting the required data storage and management functionalities. We highlight the issues raised by storing data over large distributed environments and discuss recent research efforts dealing with the challenges of data retrieval, replication, and fast data transfers. The interaction of data management with other data-sensitive, emerging technologies such as workflow management is also addressed.

Empirical Performance Analysis of HPC Benchmarks Across Variations in Cloud Computing

International Journal of Cloud Applications and Computing ◽

10.4018/ijcac.2013010102 ◽

2013 ◽

Vol 3 (1) ◽

pp. 13-26 ◽

Cited By ~ 4

Author(s):

Sanjay P. Ahuja ◽

Sindhu Mani

Keyword(s):

Data Storage ◽

High Performance ◽

Large Data ◽

Extensive Study ◽

Memory Bandwidth ◽

Platform As A Service ◽

Data Intensive ◽

Computational Performance ◽

Empirical Performance ◽

Data Intensive Applications

High Performance Computing (HPC) applications are scientific applications that require significant CPU capabilities. They are also data-intensive applications requiring large data storage. While many researchers have examined the performance of Amazon’s EC2 platform on some HPC benchmarks, an extensive study comparing Amazon’s EC2 and Microsoft’s Windows Azure on metrics such as memory bandwidth, I/O performance, and communication and computational performance is largely missing. The purpose of this paper is to implement existing benchmarks to evaluate and analyze these metrics for EC2 and Windows Azure, spanning both Infrastructure-as-a-Service and Platform-as-a-Service types. This was accomplished by running MPI versions of the STREAM, Interleaved or Random (IOR), and NAS Parallel (NPB) benchmarks on small and medium instance types. In addition, a new EC2 medium instance type (m1.medium) was included in the analysis. These benchmarks measure memory bandwidth, I/O performance, and communication and computational performance.
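
To make the memory-bandwidth metric concrete, the sketch below shows how a STREAM-style bandwidth figure follows from element count and elapsed time. It follows STREAM's convention of counting 2 words moved per element for Copy and Scale and 3 for Add and Triad, with 8-byte doubles and MB = 10^6 bytes; the function and its name are illustrative and not part of the benchmark itself.

```python
# Words (array elements) read plus written per loop iteration in each STREAM kernel.
STREAM_WORDS_PER_ELEMENT = {
    "copy": 2,   # c[i] = a[i]
    "scale": 2,  # b[i] = q * c[i]
    "add": 3,    # c[i] = a[i] + b[i]
    "triad": 3,  # a[i] = b[i] + q * c[i]
}


def stream_bandwidth_mb_s(kernel: str, n_elements: int, seconds: float,
                          word_bytes: int = 8) -> float:
    """Bandwidth in MB/s (10**6 bytes per second) for one timed kernel pass."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    bytes_moved = STREAM_WORDS_PER_ELEMENT[kernel] * word_bytes * n_elements
    return bytes_moved / seconds / 1e6


# Example: a Triad pass over 10 million doubles that took 0.05 s
print(stream_bandwidth_mb_s("triad", 10_000_000, 0.05))
```

STREAM reports the best of several repetitions, and its arrays must be much larger than the caches for the figure to reflect main-memory rather than cache bandwidth.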

A Review: Map Reduce Framework for Cloud Computing

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.6.20224 ◽

2018 ◽

Vol 7 (4.6) ◽

pp. 13

Author(s):

Mekala Sandhya ◽

Ashish Ladda ◽

Dr. Uma N Dulhare ◽

. . ◽

. .

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Distributed Computing ◽

Data Storage ◽

High Performance ◽

Large Scale ◽

Distributed Storage ◽

Large Data ◽

Mass Data ◽

Internet Information

In this generation of the Internet, information and data are growing continuously. Through various Internet services and applications, the amount of information is increasing rapidly; hundreds of billions, even trillions, of web indexes exist. Such large data gives people a mass of information but at the same time makes it more difficult to discover useful knowledge in these huge amounts of data. Cloud computing can provide the infrastructure for large data. Cloud computing has two significant characteristics of distributed computing, i.e., scalability and high availability. Scalability means that it can extend seamlessly to large-scale clusters. Availability means that cloud computing can tolerate node errors: node failures do not prevent a program from running correctly. Combined with data mining, cloud computing performs large-scale data processing on high-performance machines. Mass data storage and distributed computing provide a new method for mass data mining and have become an effective solution for distributed storage and efficient computing in data mining.
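
The MapReduce model that the review covers can be sketched in a few lines of single-process Python using the canonical word-count example. The function names are illustrative; a real framework such as Hadoop runs map and reduce tasks on many nodes and re-executes the tasks of failed nodes, which is what provides the fault tolerance described above.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key (a network exchange in a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values observed for one key."""
    return key, sum(values)


def mapreduce(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


print(mapreduce(["cloud computing for large data", "mining large data"]))
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be distributed across nodes without coordination beyond the shuffle.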

Novel HDD-type SNDM ferroelectric data storage system aimed at high-speed data transfer with single probe operation

IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control ◽

10.1109/tuffc.2007.571 ◽

2007 ◽

Vol 54 (12) ◽

pp. 2523-2528 ◽

Cited By ~ 7

Author(s):

Yoshiomi Hiranaga ◽

Tomoya Uda ◽

Yuichi Kurihashi ◽

Kenkou Tanaka ◽

Yasuo Cho

Keyword(s):

Data Storage ◽

High Speed ◽

Data Transfer ◽

Storage System ◽

Single Probe ◽

Data Storage System ◽

High Speed Data
