Overcoming data locality: An in-memory runtime file system with symmetrical data distribution

Concentric Layout, a New Scientific Data Distribution Scheme in Hadoop File System

2010 IEEE Fifth International Conference on Networking, Architecture, and Storage ◽

10.1109/nas.2010.59 ◽

2010 ◽

Author(s):

Lu Cheng ◽

Pengju Shang ◽

Saba Sehrish ◽

Grant Mackey ◽

Jun Wang

Keyword(s):

File System ◽

Data Distribution ◽

Scientific Data ◽

Distribution Scheme

Download Full-text

A Transparent Runtime Data Distribution Engine for OpenMP

Scientific Programming ◽

10.1155/2000/417570 ◽

2000 ◽

Vol 8 (3) ◽

pp. 143-162 ◽

Cited By ~ 4

Author(s):

Dimitrios S. Nikolopoulos ◽

Theodore S. Papatheodorou ◽

Constantine D. Polychronopoulos ◽

Jesús Labarta ◽

Eduard Ayguadé

Keyword(s):

High Performance ◽

Programming Model ◽

Data Distribution ◽

Data Locality ◽

Remote Memory ◽

Main Body ◽

Performance Loss ◽

Page Migration ◽

Runtime Environment ◽

Memory Accesses

This paper makes two important contributions. First, the paper investigates the performance implications of data placement in OpenMP programs running on modern NUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that due to the low remote-to-local memory access latency ratio of contemporary NUMA architectures, reasonably balanced page placement schemes, such as round-robin or random distribution, incur modest performance losses. Second, the paper presents a transparent, user-level page migration engine with an ability to gain back any performance loss that stems from suboptimal placement of pages in iterative OpenMP programs. The main body of the paper describes how our OpenMP runtime environment uses page migration for implementing implicit data distribution and redistribution schemes without programmer intervention. Our experimental results verify the effectiveness of the proposed framework and provide a proof of concept that it is not necessary to introduce data distribution directives in OpenMP and warrant the simplicity or the portability of the programming model.

Download Full-text

POSTER: MemFS: An in-memory runtime file system with symmetrical data distribution

2014 IEEE International Conference on Cluster Computing (CLUSTER) ◽

10.1109/cluster.2014.6968773 ◽

2014 ◽

Cited By ~ 6

Author(s):

Alexandru Uta ◽

Andreea Sandu ◽

Thilo Kielmann

Keyword(s):

File System ◽

Data Distribution

Download Full-text

VFC: The Vienna Fortran Compiler

Scientific Programming ◽

10.1155/1999/304639 ◽

1999 ◽

Vol 7 (1) ◽

pp. 67-81 ◽

Cited By ~ 34

Author(s):

Siegfried Benkner

Keyword(s):

Message Passing ◽

High Performance ◽

Data Distribution ◽

Data Locality ◽

Performance Measurements ◽

Fortran Compiler ◽

Work Distribution ◽

Local Access ◽

High Level ◽

Access Patterns

High Performance Fortran (HPF) offers an attractive high‐level language interface for programming scalable parallel architectures providing the user with directives for the specification of data distribution and delegating to the compiler the task of generating an explicitly parallel program. Available HPF compilers can handle regular codes quite efficiently, but dramatic performance losses may be encountered for applications which are based on highly irregular, dynamically changing data structures and access patterns. In this paper we introduce the Vienna Fortran Compiler (VFC), a new source‐to‐source parallelization system for HPF+, an optimized version of HPF, which addresses the requirements of irregular applications. In addition to extended data distribution and work distribution mechanisms, HPF+ provides the user with language features for specifying certain information that decisively influence a program’s performance. This comprises data locality assertions, non‐local access specifications and the possibility of reusing runtime‐generated communication schedules of irregular loops. Performance measurements of kernels from advanced applications demonstrate that with a high‐level data parallel language such as HPF+ a performance close to hand‐written message‐passing programs can be achieved even for highly irregular codes.

Download Full-text

Replication Based on Data Locality for Hadoop Distributed File System

Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering ◽

10.18178/wcse.2019.06.098 ◽

2019 ◽

Keyword(s):

File System ◽

Data Locality ◽

Distributed File System ◽

Hadoop Distributed File System

Download Full-text

Automatic Data Distribution for Improving Data Locality on the Cell BE Architecture

Languages and Compilers for Parallel Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-642-13374-9_17 ◽

2010 ◽

pp. 247-262 ◽

Cited By ~ 1

Author(s):

Miao Wang ◽

François Bodin ◽

Sébastien Matz

Keyword(s):

Data Distribution ◽

Data Locality ◽

Automatic Data ◽

Automatic Data Distribution

Download Full-text

Software installation and condition data distribution via CernVM File System in ATLAS

Journal of Physics Conference Series ◽

10.1088/1742-6596/396/3/032030 ◽

2012 ◽

Vol 396 (3) ◽

pp. 032030 ◽

Cited By ~ 8

Author(s):

A De Salvo ◽

A De Silva ◽

D Benjamin ◽

J Blomer ◽

P Buncic ◽

...

Keyword(s):

File System ◽

Data Distribution

Download Full-text

HaRD: a heterogeneity-aware replica deletion for HDFS

Journal Of Big Data ◽

10.1186/s40537-019-0256-6 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Hilmi Egemen Ciritoglu ◽

John Murphy ◽

Christina Thorpe

Keyword(s):

Data Distribution ◽

Large Data ◽

Data Replication ◽

Data Locality ◽

Data Availability ◽

Unbalanced Data ◽

Data Sets ◽

Heterogeneous Cluster ◽

Replication Factor ◽

Hadoop Distributed File System

Abstract The Hadoop distributed file system (HDFS) is responsible for storing very large data-sets reliably on clusters of commodity machines. The HDFS takes advantage of replication to serve data requested by clients with high throughput. Data replication is a trade-off between better data availability and higher disk usage. Recent studies propose different data replication management frameworks that alter the replication factor of files dynamically in response to the popularity of the data, keeping more replicas for in-demand data to enhance the overall performance of the system. When data gets less popular, these schemes reduce the replication factor, which changes the data distribution and leads to unbalanced data distribution. Such an unbalanced data distribution causes hot spots, low data locality and excessive network usage in the cluster. In this work, we first confirm that reducing the replication factor causes unbalanced data distribution when using Hadoop’s default replica deletion scheme. Then, we show that even keeping a balanced data distribution using WBRD (data-distribution-aware replica deletion scheme) that we proposed in previous work performs sub-optimally on heterogeneous clusters. In order to overcome this issue, we propose a heterogeneity-aware replica deletion scheme (HaRD). HaRD considers the nodes’ processing capabilities when deleting replicas; hence it stores more replicas on the more powerful nodes. We implemented HaRD on top of HDFS and conducted a performance evaluation on a 23-node dedicated heterogeneous cluster. Our results show that HaRD reduced execution time by up to 60%, and 17% when compared to Hadoop and WBRD, respectively.

Download Full-text

An Adaptive Data Distribution Through Tree Rules in Frequent Pattern Mining

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit183894 ◽

2018 ◽

pp. 300-305

Keyword(s):

Information Sharing ◽

Pattern Mining ◽

Data Distribution ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

General Development ◽

Secure Information ◽

Evaluation Parameters ◽

Secure Information Sharing

Information sharing among the associations is a general development in a couple of zones like business headway and exhibiting. As bit of the touchy principles that ought to be kept private may be uncovered and such disclosure of delicate examples may impacts the advantages of the association that have the data. Subsequently the standards which are delicate must be secured before sharing the data. In this paper to give secure information sharing delicate guidelines are bothered first which was found by incessant example tree. Here touchy arrangement of principles are bothered by substitution. This kind of substitution diminishes the hazard and increment the utility of the dataset when contrasted with different techniques. Examination is done on certifiable dataset. Results shows that proposed work is better as appear differently in relation to various past strategies on the introduce of evaluation parameters.

Download Full-text

A Study On Ext4 file system meta-data management method for effective mobile commerce

The e-Business Studies ◽

10.15719/geba.15.5.201410.73 ◽

2014 ◽

Vol 15 (5) ◽

pp. 73-90

Author(s):

민연아

Keyword(s):

Data Management ◽

File System ◽

Mobile Commerce ◽

Meta Data ◽

Management Method

Download Full-text