Stochastic Approach for Secondary Storage Data Access Cost Estimation

Program Transferability - Data Access Representation for Secondary Storage

10.21236/ad0753400 ◽

1972 ◽

Author(s):

Stuart C. Schaffner ◽

David B. Loveman ◽

Robert E. Millstein

Keyword(s):

Data Access ◽

Secondary Storage


Cost- and Time-Based Data Deployment for Improving Scheduling Efficiency in Distributed Clouds

The Computer Journal ◽

10.1093/comjnl/bxaa121 ◽

2020 ◽

Author(s):

Chunlin Li ◽

Yihan Zhang ◽

Xiaomei Qu ◽

Youlong Luo

Keyword(s):

Task Scheduling ◽

Scheduling Algorithm ◽

Service Level Agreement ◽

Data Access ◽

Service Level ◽

Speculative Execution ◽

Improved Genetic Algorithm ◽

Real Time Processing ◽

Data Intensive ◽

Access Cost

Abstract In recent years, with the continuous development of Internet of Things and cloud computing technologies, data-intensive applications have received increasing attention. In a distributed cloud environment, access to massive data is often the performance bottleneck. A suitable data deployment algorithm is therefore important for improving both cloud server utilization and task scheduling efficiency. To reduce data access cost and data deployment time, this paper proposes an optimal data deployment algorithm: the data deployment problem is modeled and analyzed, then solved with an improved genetic algorithm. Once the data are deployed, a task-progress-aware scheduling algorithm is proposed to make the speculative execution mechanism more accurate and thereby improve task scheduling efficiency. First, thresholds for detecting slow tasks and fast nodes are set. Then, slow tasks and fast nodes are detected by calculating the remaining time of the tasks and the real-time processing ability of the nodes, respectively. Finally, backup copies of the slow tasks are executed on the fast nodes. Experimental results show that, while maintaining system load balance, the proposed algorithms clearly reduce data access cost, the service-level agreement (SLA) default rate and system execution time, and optimize data deployment for improving scheduling efficiency in distributed clouds.
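The three scheduling steps in this abstract (set thresholds, detect slow tasks and fast nodes, run backups on fast nodes) can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Task` and `Node` types, the linear progress-rate estimate of remaining time, the `throughput` field and the pairing rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    progress: float  # fraction of work completed, in [0, 1]
    elapsed: float   # seconds since the task started


@dataclass
class Node:
    name: str
    throughput: float  # hypothetical measure of real-time processing ability


def remaining_time(task):
    """Estimate remaining seconds from the observed progress rate (assumed linear)."""
    if task.progress <= 0 or task.elapsed <= 0:
        return float("inf")  # no progress information yet
    rate = task.progress / task.elapsed
    return max(0.0, 1.0 - task.progress) / rate


def plan_backups(tasks, nodes, slow_threshold, fast_threshold):
    """Pair slow tasks with fast nodes for backup (speculative) execution.

    A task is slow if its estimated remaining time exceeds slow_threshold;
    a node is fast if its throughput is at least fast_threshold. The slowest
    task is paired with the fastest node (an illustrative pairing rule).
    """
    slow = sorted(
        (t for t in tasks if remaining_time(t) > slow_threshold),
        key=remaining_time, reverse=True)
    fast = sorted(
        (n for n in nodes if n.throughput >= fast_threshold),
        key=lambda n: n.throughput, reverse=True)
    return [(t.name, n.name) for t, n in zip(slow, fast)]
```

With more slow tasks than fast nodes, `zip` leaves the least-delayed slow tasks without a backup, which caps the extra load placed on the cluster.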


Examining the impact of data-access cost on XML twig pattern matching

Information Sciences ◽

10.1016/j.ins.2012.03.011 ◽

2012 ◽

Vol 203 ◽

pp. 24-43 ◽

Cited By ~ 5

Author(s):

SangKeun Lee ◽

Byung-Gul Ryu ◽

Kun-Lung Wu

Keyword(s):

Pattern Matching ◽

Data Access ◽

Access Cost ◽

Twig Pattern ◽

Twig Pattern Matching ◽

The Impact


MINIMIZING LATENCY AND JITTER FOR LARGE-SCALE MULTIMEDIA REPOSITORIES THROUGH PREFIX CACHING

International Journal of Image and Graphics ◽

10.1142/s0219467803000932 ◽

2003 ◽

Vol 03 (01) ◽

pp. 95-117 ◽

Cited By ~ 1

Author(s):

SUNIL PRABHAKAR ◽

RAHUL CHARI

Keyword(s):

Large Scale ◽

Data Access ◽

Multimedia Data ◽

Secondary Storage ◽

Storage And Retrieval ◽

Multiple User ◽

Large Size ◽

Tertiary Storage ◽

Hierarchical Storage ◽

Access Patterns

Multimedia data poses challenges for efficient storage and retrieval because of its large size and playback timing requirements. For applications that store very large volumes of multimedia data, hierarchical storage offers a scalable and economical alternative to storing data on magnetic disks. In a hierarchical storage architecture, data is stored on a tape- or optical-disk-based tertiary storage layer, with the secondary storage disks serving as a cache or buffer. Because media must be swapped on drives, retrieving multimedia data from tertiary storage can result in large delays both before playback begins (startup latency) and during playback (jitter). In this paper we address the important problem of reducing startup latency and jitter for very large multimedia repositories. We propose that secondary storage should not be used as a cache in the traditional manner; instead, most of the secondary storage should be used to permanently store partial objects. Furthermore, replication is employed at the tertiary storage level to avoid expensive media switching. In particular, we show that by saving the initial segments of documents permanently on secondary storage, and replicating them on tertiary storage, startup latency can be significantly reduced. Since we are effectively reducing the amount of secondary storage available for buffering the data from tertiary storage, an increase in jitter may be expected. However, our results show that the technique also reduces jitter, contrary to the expected behavior. Our technique exploits the pattern of data access. Advance knowledge of the access pattern is helpful, but not essential. Lack of this information or changes in access patterns are handled through adaptive techniques. Our study addresses both single- and multiple-user scenarios. Our results show that startup latency can be reduced by as much as 75% and jitter practically eliminated through the use of these techniques.
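To see why a permanently stored prefix can hide the media-switch delay, consider a toy model that is not taken from the paper: playback starts immediately from the prefix on disk, the tertiary device starts delivering the rest after a fixed switch delay, and both rates are constant. The function name and all parameters below are assumptions for illustration only.

```python
def min_prefix_mb(size_mb, play_rate, tape_rate, switch_delay):
    """Smallest on-disk prefix (MB) that avoids any stall in a toy model.

    Assumptions (not from the paper): playback consumes play_rate MB/s from
    t = 0; tertiary storage delivers tape_rate MB/s starting at t = switch_delay,
    beginning right after the prefix; rates are constant.
    """
    if size_mb <= 0 or play_rate <= 0 or tape_rate <= 0 or switch_delay < 0:
        raise ValueError("sizes and rates must be positive, delay non-negative")
    duration = size_mb / play_rate
    if switch_delay >= duration:
        return size_mb  # playback would finish before tertiary data arrives
    if tape_rate >= play_rate:
        # Only the switch delay has to be covered; afterwards delivery keeps up.
        need = play_rate * switch_delay
    else:
        # Delivery falls behind steadily, so the tightest point is the end.
        need = size_mb - tape_rate * (duration - switch_delay)
    return min(size_mb, max(0.0, need))
```

In this model a 1000 MB object played at 2 MB/s with a 30 s media switch needs only a 60 MB prefix when the tertiary device delivers 5 MB/s, which is the intuition behind keeping initial segments on disk rather than whole objects.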


Access cost estimation for physical database design

Data & Knowledge Engineering ◽

10.1016/0169-023x(93)90002-7 ◽

1993 ◽

Vol 11 (2) ◽

pp. 125-150 ◽

Cited By ~ 4

Author(s):

Paolo Ciaccia ◽

Dario Maio

Keyword(s):

Cost Estimation ◽

Database Design ◽

Physical Database Design ◽

Access Cost


Predicting memory-access cost based on data-access patterns

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) ◽

10.1109/clustr.2004.1392630 ◽

2005 ◽

Cited By ~ 10

Author(s):

S. Byna ◽

Xian-He Sun ◽

W. Gropp ◽

R. Thakur

Keyword(s):

Data Access ◽

Memory Access ◽

Access Cost ◽

Data Access Patterns ◽

Access Patterns


The Study on Cloud Storage Data Management Method Based on Minimum Access Cost for Internet of Things

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.198-199.1657 ◽

2012 ◽

Vol 198-199 ◽

pp. 1657-1662 ◽

Cited By ~ 1

Author(s):

Xiao Hong Song

Keyword(s):

Internet Of Things ◽

Cloud Storage ◽

Low Cost ◽

Minimum Cost ◽

Data Access ◽

Distribution Method ◽

Access Cost ◽

Cost Distribution ◽

Fundamental Research ◽

Management Method

Based on the characteristics of data access in the Internet of Things, we propose a minimum-cost replica distribution method that considers the number and type of replicas in cloud storage. With this method, replicas are distributed from several data service points to the storage servers with minimum total cost. This work is fundamental research on low-cost replica distribution in cloud storage based on data access load.


Community-based replica management in distributed systems

International Journal of Web Information Systems ◽

10.1108/ijwis-01-2017-0006 ◽

2018 ◽

Vol 14 (1) ◽

pp. 41-61

Author(s):

Masoud Nosrati ◽

Mahmood Fazlali

Keyword(s):

Distributed Systems ◽

Graph Model ◽

Data Replication ◽

Data Access ◽

Time Data ◽

Community Based ◽

Content Type ◽

Communication Latency ◽

Redundant Data ◽

Access Cost

Purpose One of the techniques for improving the performance of distributed systems is data replication, wherein new replicas are created to provide better accessibility, fault tolerance and lower data access cost. In this paper, the authors propose a community-based solution for the management of data replication, based on a graph model of the communication latency between computing and storage nodes. Communities are clusters of nodes among which the communication latency is minimal. The purpose of this study is to use this method to minimize the latency and access cost of the data. Design/methodology/approach This paper used the Louvain algorithm to find the best communities. In the proposed algorithm, when a node requested a file located outside its own community, the cost of accessing that file was calculated and accumulated. When the accumulated cost exceeded a specified threshold, a new replica of the file was created in the requester's community. In addition, the number of replicas of each file was limited to prevent the system from creating useless and redundant data. Findings To evaluate the method, four metrics were introduced and measured: communication latency, response time, data access cost and data redundancy. The results indicated acceptable improvement in all of them. Originality/value According to the authors, this is the first research that aims at managing replicas via community detection algorithms. It opens many opportunities for further studies in this area.
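The replication rule described here (accumulate the cost of out-of-community accesses, create a replica once a threshold is exceeded, cap the number of replicas) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: communities are taken as already computed (for example by a Louvain implementation, which is not shown), and the class name, the `cost` callable and the choice to place the new replica on the requesting node are hypothetical.

```python
class CommunityReplicaManager:
    """Toy threshold-based replica manager (illustrative, not the paper's code)."""

    def __init__(self, community_of, cost, threshold, max_replicas):
        self.community_of = community_of  # node -> community id (precomputed)
        self.cost = cost                  # callable (node_a, node_b) -> access cost
        self.threshold = threshold        # accumulated cost that triggers replication
        self.max_replicas = max_replicas  # cap to avoid redundant data
        self.replicas = {}                # file name -> set of holder nodes
        self.accumulated = {}             # (file name, community) -> accumulated cost

    def add_file(self, name, node):
        self.replicas.setdefault(name, set()).add(node)

    def request(self, name, requester):
        """Serve a request; return (access cost, whether a replica was created)."""
        holders = self.replicas[name]
        community = self.community_of[requester]
        cost = min(self.cost(requester, h) for h in holders)
        if any(self.community_of[h] == community for h in holders):
            return cost, False  # served inside the community, nothing accumulates
        key = (name, community)
        total = self.accumulated.get(key, 0.0) + cost
        if total > self.threshold and len(holders) < self.max_replicas:
            holders.add(requester)  # assumption: replica lands on the requester
            self.accumulated[key] = 0.0
            return cost, True
        self.accumulated[key] = total
        return cost, False
```

Keeping one accumulator per (file, community) pair means a file is replicated only into communities that repeatedly pay for remote access, while `max_replicas` bounds the redundancy the abstract warns about.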


Access cost estimation for unified grid storage systems

Proceedings. First Latin American Web Congress ◽

10.1109/grid.2003.1261710 ◽

2004 ◽

Cited By ~ 2

Author(s):

K. Stockinger ◽

H. Stockinger ◽

L. Dutka ◽

R. Slota ◽

D. Nikolow ◽

...

Keyword(s):

Cost Estimation ◽

Storage Systems ◽

Access Cost ◽

Grid Storage


LargeGraph

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3477603 ◽

2021 ◽

Vol 18 (4) ◽

pp. 1-24

Author(s):

Yu Zhang ◽

Da Peng ◽

Xiaofei Liao ◽

Hai Jin ◽

Haikun Liu ◽

...

Keyword(s):

Large Scale ◽

Data Access ◽

Memory Systems ◽

Graph Processing ◽

Graph Data ◽

Iterative Processing ◽

Access Cost ◽

New States ◽

Long Time ◽

Small Set

Many out-of-GPU-memory systems have recently been designed to support iterative processing of large-scale graphs. However, these systems still take a long time to converge because active vertices' new states propagate inefficiently along graph paths. To efficiently support out-of-GPU-memory graph processing, this work designs a system called LargeGraph. Unlike existing out-of-GPU-memory systems, LargeGraph proposes a dependency-aware data-driven execution approach, which can significantly accelerate the propagation of active vertices' states along graph paths with low data access cost and high parallelism. Specifically, according to the dependencies between vertices, it loads and processes only the graph data associated with dependency chains originating from active vertices, which lowers access cost. Because of the power-law property, most active vertices frequently use a small, evolving set of paths to propagate their new states; this small set of paths is dynamically identified, maintained and handled efficiently on the GPU to accelerate most propagations for faster convergence, whereas the remaining graph data are handled on the CPU. For out-of-GPU-memory graph processing, LargeGraph outperforms four cutting-edge systems: Totem (5.19–11.62×), Graphie (3.02–9.41×), Garaph (2.75–8.36×), and Subway (2.45–4.15×).
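The idea of loading only the graph data on dependency chains that originate from active vertices can be illustrated with a plain breadth-first traversal. This is a simplified sketch, not LargeGraph's mechanism: it assumes a dependency chain is simply the set of edges reachable from an active vertex, and the function name and adjacency-dictionary representation are hypothetical.

```python
from collections import deque


def chains_from_active(adjacency, active):
    """Return (vertices, edges) reachable from the active vertices.

    adjacency maps a vertex to an iterable of its out-neighbours. Only the
    returned edges would need to be loaded in this simplified model; the rest
    of the graph is never touched.
    """
    seen = set()
    queue = deque()
    for v in active:  # de-duplicate the starting set
        if v not in seen:
            seen.add(v)
            queue.append(v)
    edges = []
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            edges.append((u, v))  # each edge is visited once, from its source
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen, edges
```

For a graph with edges 1→2, 2→3 and 4→5 and a single active vertex 1, only the edges (1, 2) and (2, 3) are selected, so the component containing 4 and 5 is never loaded.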
