G-Hadoop: MapReduce across distributed data centers for data-intensive computing

2013 ◽  
Vol 29 (3) ◽  
pp. 739-750 ◽  
Author(s):  
Lizhe Wang ◽  
Jie Tao ◽  
Rajiv Ranjan ◽  
Holger Marten ◽  
Achim Streit ◽  
...


2019 ◽  
Vol 8 (3) ◽  
pp. 7440-7446

In a distributed data-intensive computing environment, assigning certain tasks to specific machines in a secure way is a major challenge for the job scheduling problem. The complexity of this problem increases with the size of the job, making it difficult to solve effectively. Several metaheuristic algorithms, including particle swarm optimization (PSO) and variable neighborhood particle swarm optimization (VNPSO), have been used to tackle the job scheduling problem in distributed computing. To satisfy the security requirements and minimize the cost function when assigning tasks to machines, we propose a modified PSO with scout adjustment (MPSO-SA) algorithm, which uses a periodic mutation operator to obtain the best cost function value. The performance of the proposed MPSO-SA scheduling mechanism is compared with the genetic algorithm (GA), PSO, and VNPSO, and the experimental results show that the proposed method reduces the probability of risk under security constraints and has better convergence properties than existing techniques.
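The scheduling idea above can be sketched as a standard PSO loop with an added mutation step to escape local optima. This is a minimal, hypothetical illustration: the cost matrix, parameter values, and position-decoding scheme below are our own assumptions for a toy task-to-machine assignment, not the MPSO-SA formulation from the paper (which also incorporates security constraints).

```python
import random

random.seed(42)

N_TASKS, N_MACHINES = 8, 3
# Illustrative cost matrix: COST[t][m] = cost of running task t on machine m.
COST = [[random.randint(1, 20) for _ in range(N_MACHINES)] for _ in range(N_TASKS)]

def cost(assignment):
    return sum(COST[t][m] for t, m in enumerate(assignment))

def decode(position):
    # Map continuous particle coordinates to discrete machine indices.
    return [int(x) % N_MACHINES for x in position]

def pso(n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, mutation_rate=0.1):
    pos = [[random.uniform(0, N_MACHINES) for _ in range(N_TASKS)]
           for _ in range(n_particles)]
    vel = [[0.0] * N_TASKS for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(decode(p)) for p in pos]
    g = pbest_cost.index(min(pbest_cost))
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(N_TASKS):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
                # Mutation step: occasionally re-randomize a coordinate,
                # playing a role analogous to the paper's mutation operator.
                if random.random() < mutation_rate:
                    pos[i][d] = random.uniform(0, N_MACHINES)
            c = cost(decode(pos[i]))
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return decode(gbest), gbest_cost
```

With no machine-capacity or security constraints, the toy optimum is simply each task's cheapest machine, which gives a handy lower bound for sanity-checking the swarm's result.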


2021 ◽  
Vol 11 (12) ◽  
pp. 5731
Author(s):  
Jinsu Lee ◽  
Eunji Lee

A surge of interest in data-intensive computing has led to a drastic increase in the demand for data centers. Given this growing popularity, data centers are becoming a primary contributor to the increased consumption of energy worldwide. To mitigate this problem, this paper revisits DVFS (Dynamic Voltage and Frequency Scaling), a well-known technique to reduce the energy usage of processors, from the viewpoint of distributed systems. Distributed data systems typically adopt a replication facility to provide high availability and low latency. In this type of architecture, the replicas are maintained asynchronously, while the master serves user requests synchronously. Exploiting this relaxed constraint on replicas, we present a novel DVFS technique called Concerto, which intentionally scales down the frequency of the processors serving the replicas. This mechanism can achieve considerable energy savings without an increase in user-perceived latency. We implemented Concerto on Redis 6.0.1, a commercial-level distributed key-value store, and resolved all associated performance issues. To prevent a delay in read queries assigned to the replicas, we offload the independent part of the read operation to a fast-running thread. We also empirically demonstrate that the decreased performance of the replica does not increase the replication lag, because the inherent load imbalance between the master and replica hides the increased latency of the replica. Performance evaluations with micro- and real-world benchmarks show that Redis with Concerto saves 32% of energy on average, and up to 51%, under various workloads, with only minor performance losses in the replicas. Despite numerous studies of energy saving in data centers, to the best of our knowledge, Concerto is the first approach that considers clock-speed scaling at the aggregate level, exploiting heterogeneous performance constraints across data nodes.
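The intuition behind down-clocking replicas can be illustrated with a back-of-the-envelope power model. The cubic power-frequency relation (P ∝ f³, since under DVFS the supply voltage scales roughly with frequency and dynamic power is ~C·V²·f) is a textbook assumption, and the frequencies and node counts below are made up for illustration; none of these numbers come from the Concerto paper.

```python
# Toy model: estimate cluster-level savings from running replicas at a
# lower clock than the master. All parameters here are illustrative.

def dynamic_power(freq_ghz, base_freq_ghz=3.0, base_power_w=100.0):
    # Dynamic CPU power ~ C * V^2 * f; with V roughly proportional to f,
    # this scales as f^3 relative to the baseline frequency.
    return base_power_w * (freq_ghz / base_freq_ghz) ** 3

def cluster_power(master_ghz, replica_ghz, n_replicas):
    return dynamic_power(master_ghz) + n_replicas * dynamic_power(replica_ghz)

baseline = cluster_power(3.0, 3.0, n_replicas=2)  # every node at full speed
scaled = cluster_power(3.0, 2.0, n_replicas=2)    # replicas down-clocked
savings = 1 - scaled / baseline
print(f"estimated energy savings: {savings:.0%}")  # prints: estimated energy savings: 47%
```

Even this crude model shows why targeting only the replicas is attractive: the master keeps its full clock (so synchronous user-facing latency is untouched), while the cubic falloff makes the replicas' contribution shrink quickly.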


2021 ◽  
Author(s):  
Pankaj Singh ◽  
Sudhakar Singh ◽  
P K Mishra ◽  
Rakhi Garg

Abstract Frequent itemset mining (FIM) is a computation- and data-intensive task. Therefore, parallel and distributed FIM algorithms have been designed to process large volumes of data in a reduced time. Recently, a number of FIM algorithms have been designed on Hadoop MapReduce, a distributed big data processing framework. However, due to heavy disk I/O, MapReduce is inefficient for highly iterative FIM algorithms. Spark, a more efficient distributed data processing framework, has therefore been developed with in-memory computation and the resilient distributed dataset (RDD) abstraction to support iterative algorithms. Apriori- and FP-Growth-based FIM algorithms have been designed on the Spark RDD framework, but an Eclat-based algorithm has not yet been explored. In this paper, we propose RDD-Eclat, a parallel Eclat algorithm on the Spark RDD framework, along with five variants. The proposed algorithms are evaluated on various benchmark datasets, and the experimental results show that RDD-Eclat outperforms Spark-based Apriori by many times. The results also show the scalability of the proposed algorithms as the number of cores and the dataset size increase.
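For readers unfamiliar with Eclat, the core idea that RDD-Eclat parallelizes is the vertical layout: each item maps to its tidset (the set of transaction ids containing it), and the support of an itemset is the size of the intersection of its items' tidsets. The single-node sketch below is our own toy implementation of that idea, using a brute-force join between levels; it is not the paper's algorithm or its Spark partitioning strategy.

```python
from itertools import combinations

def eclat(transactions, min_support):
    # Build the vertical layout: item -> tidset.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    # Level 1: frequent single items.
    level = {frozenset([i]): t for i, t in tidsets.items()
             if len(t) >= min_support}
    result = dict(level)
    while level:
        k = len(next(iter(level)))  # size of itemsets at this level
        next_level = {}
        for (a, ta), (b, tb) in combinations(level.items(), 2):
            union = a | b
            if len(union) == k + 1 and union not in next_level:
                t = ta & tb  # support counting = tidset intersection
                if len(t) >= min_support:
                    next_level[union] = t
        result.update(next_level)
        level = next_level
    return {tuple(sorted(s)): len(t) for s, t in result.items()}
```

For example, on the four transactions `[["a","b","c"], ["a","b"], ["a","c"], ["b","c"]]` with `min_support=2`, every single item has support 3 and every pair has support 2, while `{a,b,c}` (support 1) is pruned. The intersection step is what maps naturally onto Spark transformations over an RDD of (item, tidset) pairs.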

