Stochastic Approach for Secondary Storage Data Access Cost Estimation

Author(s):  
Lukasz Dutka ◽  
Jacek Kitowski

1972 ◽  
Author(s):  
Stuart C. Schaffner ◽  
David B. Loveman ◽  
Robert E. Millstein

2020 ◽  
Author(s):  
Chunlin Li ◽  
Yihan Zhang ◽  
Xiaomei Qu ◽  
Youlong Luo

Abstract In recent years, with the continuous development of Internet of Things and cloud computing technologies, data-intensive applications have attracted increasing attention. In a distributed cloud environment, access to massive data is often the performance bottleneck. A suitable data deployment algorithm is therefore important for improving the utilization of cloud servers and the efficiency of task scheduling. To reduce data access cost and data deployment time, this paper proposes an optimal data deployment algorithm: the data deployment problem is modeled and analyzed, then solved with an improved genetic algorithm. Once the data are deployed, a task-progress-aware scheduling algorithm is proposed to improve scheduling efficiency by making the speculative execution mechanism more accurate. First, thresholds for detecting slow tasks and fast nodes are set. Then, slow tasks and fast nodes are detected by calculating the remaining time of the tasks and the real-time processing ability of the nodes, respectively. Finally, backup copies of the slow tasks are executed on the fast nodes. The experimental results show that, while satisfying the load balancing of the system, the proposed algorithms reduce data access cost, service-level agreement (SLA) default rate and system execution time, and optimize data deployment to improve scheduling efficiency in distributed clouds.
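The three scheduling steps in the abstract above (set thresholds, detect slow tasks and fast nodes, back up slow tasks on fast nodes) can be illustrated with a minimal Python sketch. This is not the authors' algorithm. The function names, the constant-progress-rate estimate of remaining time, and the use of mean-based thresholds scaled by `slow_factor` and `fast_factor` are all assumptions made for illustration.

```python
from statistics import mean


def remaining_time(progress, elapsed):
    """Estimate remaining time, assuming the task keeps its average rate so far."""
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    return elapsed * (1.0 - progress) / progress


def plan_backups(tasks, nodes, slow_factor=1.5, fast_factor=1.2):
    """Pair slow tasks with fast nodes (hypothetical helper).

    tasks: task name -> (progress in (0, 1], elapsed seconds)
    nodes: node name -> current processing rate (any consistent unit)
    """
    remaining = {name: remaining_time(p, e) for name, (p, e) in tasks.items()}

    # Assumed thresholds: a multiple of the mean remaining time / mean node rate.
    slow_cutoff = slow_factor * mean(remaining.values())
    fast_cutoff = fast_factor * mean(nodes.values())

    # Slowest tasks first, fastest nodes first.
    slow = sorted((t for t, r in remaining.items() if r > slow_cutoff),
                  key=lambda t: remaining[t], reverse=True)
    fast = sorted((n for n, rate in nodes.items() if rate > fast_cutoff),
                  key=lambda n: nodes[n], reverse=True)

    # Each slow task gets at most one backup node; extras on either side are left unpaired.
    return dict(zip(slow, fast))
```

For example, calling `plan_backups` with tasks `{"t1": (0.9, 90), "t2": (0.8, 80), "t3": (0.1, 100)}` and nodes `{"n1": 1.0, "n2": 1.0, "n3": 4.0}` pairs the single straggler `t3` with the single fast node `n3`.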


2012 ◽  
Vol 203 ◽  
pp. 24-43 ◽  
Author(s):  
SangKeun Lee ◽  
Byung-Gul Ryu ◽  
Kun-Lung Wu

2003 ◽  
Vol 03 (01) ◽  
pp. 95-117 ◽  
Author(s):  
SUNIL PRABHAKAR ◽  
RAHUL CHARI

Multimedia data poses challenges for efficient storage and retrieval due to its large size and playback timing requirements. For applications that store very large volumes of multimedia data, hierarchical storage offers a scalable and economical alternative to storing data on magnetic disks. In a hierarchical storage architecture, data is stored on a tape or optical disk based tertiary storage layer, with the secondary storage disks serving as a cache or buffer. Due to the need for swapping media on drives, retrieving multimedia data from tertiary storage can potentially result in large delays before playback begins (startup latency) as well as during playback (jitter). In this paper we address the important problem of reducing startup latency and jitter for very large multimedia repositories. We propose that secondary storage should not be used as a cache in the traditional manner; instead, most of the secondary storage should be used to permanently store partial objects. Furthermore, replication is employed at the tertiary storage level to avoid expensive media switching. In particular, we show that by saving the initial segments of documents permanently on secondary storage, and replicating them on tertiary storage, startup latency can be significantly reduced. Since we are effectively reducing the amount of secondary storage available for buffering the data from tertiary storage, an increase in jitter may be expected. However, our results show that the technique also reduces jitter, in contrast to the expected behavior. Our technique exploits the pattern of data access. Advance knowledge of the access pattern is helpful, but not essential. Lack of this information or changes in access patterns are handled through adaptive techniques. Our study addresses both single- and multiple-user scenarios. Our results show that startup latency can be reduced by as much as 75% and jitter practically eliminated through the use of these techniques.


2012 ◽  
Vol 198-199 ◽  
pp. 1657-1662 ◽  
Author(s):  
Xiao Hong Song

According to the characteristics of data access in the Internet of Things, we put forward a minimum-cost replica distribution method that considers the number and type of replicas in cloud storage. With this method, replicas are distributed to storage servers from several data service points at minimum total cost. This work is foundational research on low-cost replica distribution in cloud storage based on data access load.


2018 ◽  
Vol 14 (1) ◽  
pp. 41-61
Author(s):  
Masoud Nosrati ◽  
Mahmood Fazlali

Purpose One of the techniques for improving the performance of distributed systems is data replication, wherein new replicas are created to provide greater accessibility, fault tolerance and lower data access cost. In this paper, the authors propose a community-based solution for the management of data replication, based on a graph model of communication latency between computing and storage nodes. Communities are clusters of nodes among which the communication latency is minimal. The purpose of this study is to use this method to minimize the latency and access cost of the data. Design/methodology/approach This paper used the Louvain algorithm for finding the best communities. In the proposed algorithm, whenever a node requested a file located outside its own community, the cost of accessing that file was calculated and accumulated. When the accumulated cost exceeded a specified threshold, a new replica of the file was created in the applicant’s community. In addition, the number of replicas of each file was limited to prevent the system from creating useless and redundant data. Findings To evaluate the method, four metrics were introduced and measured: communication latency, response time, data access cost and data redundancy. The results indicated acceptable improvement in all of them. Originality/value So far, this is the first research that aims at managing replicas via community detection algorithms. It opens many opportunities for further studies in this area.
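The replica-creation rule described in the abstract above (accumulate the cost of out-of-community accesses, replicate once a threshold is exceeded, cap the number of replicas) can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation. The class and method names are hypothetical, communities are assumed to have been computed already (for example by a Louvain implementation, which is not shown), and resetting the accumulated cost after replication is an assumption.

```python
from collections import defaultdict


class CommunityReplicaManager:
    """Toy threshold-based replica manager (all names are hypothetical)."""

    def __init__(self, threshold, max_replicas):
        self.threshold = threshold              # accumulated cost that triggers replication
        self.max_replicas = max_replicas        # cap on replicas per file
        self.replicas = defaultdict(set)        # file -> communities holding a replica
        self.accumulated = defaultdict(float)   # (file, community) -> remote cost so far

    def place(self, file, community):
        """Register an existing copy of `file` in `community`."""
        self.replicas[file].add(community)

    def request(self, file, community, remote_cost):
        """Record one request; return True if a new replica was created."""
        if community in self.replicas[file]:
            return False  # served inside the requester's own community

        key = (file, community)
        self.accumulated[key] += remote_cost

        over_threshold = self.accumulated[key] > self.threshold
        below_cap = len(self.replicas[file]) < self.max_replicas
        if over_threshold and below_cap:
            self.replicas[file].add(community)
            self.accumulated[key] = 0.0  # assumed: counter restarts after replication
            return True
        return False
```

With `threshold=10.0` and `max_replicas=2`, a file placed in community `A` and requested twice from community `B` at a cost of 6.0 each gains a replica in `B` on the second request; later requests from a third community are refused a replica because the cap has been reached.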


Author(s):  
K. Stockinger ◽  
H. Stockinger ◽  
L. Dutka ◽  
R. Slota ◽  
D. Nikolow ◽  
...  

2021 ◽  
Vol 18 (4) ◽  
pp. 1-24
Author(s):  
Yu Zhang ◽  
Da Peng ◽  
Xiaofei Liao ◽  
Hai Jin ◽  
Haikun Liu ◽  
...  

Many out-of-GPU-memory systems have recently been designed to support iterative processing of large-scale graphs. However, these systems still take a long time to converge because of inefficient propagation of active vertices’ new states along graph paths. To efficiently support out-of-GPU-memory graph processing, this work designs a system called LargeGraph. Unlike existing out-of-GPU-memory systems, LargeGraph proposes a dependency-aware data-driven execution approach, which can significantly accelerate active vertices’ state propagation along graph paths with low data access cost and high parallelism. Specifically, according to the dependencies between the vertices, it loads and processes only the graph data associated with dependency chains originating from active vertices, which lowers access cost. Because of the power-law property, most active vertices frequently use a small, evolving set of paths to propagate their new states. This small set of paths is dynamically identified and maintained, and is handled efficiently on the GPU to accelerate most propagations for faster convergence, whereas the remaining graph data are handled on the CPU. For out-of-GPU-memory graph processing, LargeGraph outperforms four cutting-edge systems: Totem (5.19–11.62×), Graphie (3.02–9.41×), Garaph (2.75–8.36×), and Subway (2.45–4.15×).

