Adaptive Performance Modeling of Data-intensive Workloads for Resource Provisioning in Virtualized Environment

Hosein Mohammadi Makrani; Hossein Sayadi; Najmeh Nazari; Sai Manoj Pudukotai Dinakarrao; Avesta Sasan; Tinoosh Mohsenin; Setareh Rafatirad; Houman Homayoun

doi:10.1145/3442696

Adaptive Performance Modeling of Data-intensive Workloads for Resource Provisioning in Virtualized Environment

ACM Transactions on Modeling and Performance Evaluation of Computing Systems ◽

10.1145/3442696 ◽

2021 ◽

Vol 5 (4) ◽

pp. 1-24

Author(s):

Hosein Mohammadi Makrani ◽

Hossein Sayadi ◽

Najmeh Nazari ◽

Sai Manoj Pudukotai Dinakarrao ◽

Avesta Sasan ◽

...

Keyword(s):

Optimization Technique ◽

Big Data Analytics ◽

Resource Provisioning ◽

Optimal Configuration ◽

Adaptive Performance ◽

Data Intensive ◽

Resource Matching ◽

Cross Platform ◽

The Cost ◽

Data Intensive Applications

The processing of data-intensive workloads is a challenging and time-consuming task that often requires massive infrastructure to ensure fast data analysis. The cloud platform is the most popular and powerful scale-out infrastructure for big data analytics, and it eliminates the need to maintain expensive, high-end computing resources on the user side. The performance and cost of such infrastructure depend on the overall server configuration, such as processor, memory, network, and storage configurations. In addition to the cost of owning or maintaining the hardware, the heterogeneity in server configurations further expands the selection space, leading to non-convergence. The challenge is further exacerbated by the dependency of the application’s performance on the underlying hardware. Despite increasing interest in resource provisioning, little work has been done to develop accurate and practical models that proactively predict the performance of data-intensive applications for a given server configuration and provision a cost-optimal configuration online. In this work, through a comprehensive real-system empirical analysis of performance, we address these challenges by introducing ProMLB: a proactive machine-learning-based methodology for resource provisioning. We first characterize diverse types of data-intensive workloads across different types of server architectures. The characterization helps accurately capture applications’ behavior and train a model to predict their performance. Then, ProMLB builds a set of cross-platform performance models for each application. Based on the developed predictive model, ProMLB uses an optimization technique to identify a close-to-optimal configuration that minimizes the product of execution time and cost. Compared to the oracle scheduler, ProMLB achieves 91% accuracy in terms of application-resource matching. On average, ProMLB improves performance and resource utilization by 42.6% and 41.1%, respectively, compared to a baseline scheduler. Moreover, ProMLB improves the performance per cost by 2.5× on average.
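
The selection objective described in this abstract (minimizing the product of execution time and cost) can be illustrated with a minimal Python sketch. This is not ProMLB itself: the configuration names, predicted run times, and hourly prices are made-up assumptions, the predictions are taken as given rather than produced by a trained model, and the exhaustive search stands in for whatever optimization technique the paper uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                 # hypothetical configuration label
    predicted_seconds: float  # assumed output of some performance model
    dollars_per_hour: float   # assumed rental price of the configuration


def time_cost_product(c: Candidate) -> float:
    """Execution time multiplied by the money spent during that time."""
    cost = c.predicted_seconds / 3600.0 * c.dollars_per_hour
    return c.predicted_seconds * cost


def pick_configuration(candidates: list[Candidate]) -> Candidate:
    """Exhaustively pick the candidate with the smallest time-cost product."""
    if not candidates:
        raise ValueError("need at least one candidate configuration")
    return min(candidates, key=time_cost_product)


# Illustrative numbers only; they do not come from the paper.
candidates = [
    Candidate("small", 1200.0, 0.10),
    Candidate("medium", 600.0, 0.25),
    Candidate("large", 400.0, 0.80),
]
best = pick_configuration(candidates)
print(best.name, round(time_cost_product(best), 2))
```

With these numbers the fastest configuration is not selected, because its higher price outweighs the time saved. That trade-off is what a time-cost product objective is meant to capture.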

Resource provisioning for data-intensive applications with deadline constraints on hybrid clouds using Aneka

Future Generation Computer Systems ◽

10.1016/j.future.2017.05.042 ◽

2018 ◽

Vol 79 ◽

pp. 765-775 ◽

Cited By ~ 37

Author(s):

Adel Nadjaran Toosi ◽

Richard O. Sinnott ◽

Rajkumar Buyya

Keyword(s):

Resource Provisioning ◽

Hybrid Clouds ◽

Data Intensive ◽

Deadline Constraints ◽

Data Intensive Applications

Decoding Big Data Analytics for Emerging Business Through Data-Intensive Applications and Business Intelligence

Big Data Analytics for Sustainable Computing - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-9750-6.ch004 ◽

2020 ◽

pp. 66-80

Author(s):

Vinay Kellengere Shankarnarayan

Keyword(s):

Big Data ◽

Business Intelligence ◽

Business Models ◽

Big Data Analytics ◽

Research Area ◽

Future Perspective ◽

Massive Data ◽

Data Intensive ◽

Tools And Techniques ◽

Data Intensive Applications

In recent years, big data has gained massive popularity among researchers, decision analysts, and data architects in enterprises. Big data was once just another way of saying analytics. In today's world, a company's capital lies in its big data. Think of the world's largest companies. The value they offer comes from their data, which they analyze for proactive benefit. This chapter presents insights into big data and the tools and techniques companies have adopted to deal with data problems. The authors also focus on frameworks and methodologies for handling massive data in order to make more accurate and precise decisions. The chapter begins with the current organizational scenario and what is meant by big data. Next, it draws out various challenges faced by organizations. The authors also examine big data business models and the different frameworks available and how they have been categorized. Finally, the conclusion discusses the challenges and the future perspective of this research area.

Heuristic Data Placement for Data-Intensive Applications in Heterogeneous Cloud

Journal of Electrical and Computer Engineering ◽

10.1155/2016/3516358 ◽

2016 ◽

Vol 2016 ◽

pp. 1-8 ◽

Cited By ~ 4

Author(s):

Qing Zhao ◽

Congcong Xiong ◽

Peng Wang

Keyword(s):

Clustering Algorithm ◽

Recursive Partitioning ◽

Data Placement ◽

Data Intensive ◽

High Bandwidth ◽

Tree Data ◽

Placement Algorithm ◽

Heterogeneous Cloud ◽

The Cost ◽

Data Intensive Applications

Data placement is an important issue that aims at reducing the cost of internode data transfers in the cloud, especially for data-intensive applications, in order to improve the performance of the entire cloud system. This paper proposes an improved data placement algorithm for heterogeneous cloud environments. In the initialization phase, a data clustering algorithm based on data dependency clustering and recursive partitioning is presented, incorporating both data size and fixed position as factors. Then a heuristic tree-to-tree data placement strategy is proposed so that frequent data movements occur on high-bandwidth channels. Simulation results show that, compared with two classical strategies, this strategy can effectively reduce the amount of data transmission and its time consumption during execution.

Towards Data Intensive Many-Task Computing

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Data Intensive Distributed Computing ◽

10.4018/978-1-61520-971-2.ch002 ◽

2012 ◽

pp. 28-73 ◽

Cited By ~ 8

Author(s):

Ioan Raicu ◽

Ian Foster ◽

Yong Zhao ◽

Alex Szalay ◽

Philip Little ◽

...

Keyword(s):

High Performance ◽

File Systems ◽

Data Locality ◽

Resource Provisioning ◽

Parallel File Systems ◽

Data Intensive ◽

Dynamic Resource Provisioning ◽

Rate Of Increase ◽

Parallel File ◽

Data Intensive Applications

Many-task computing aims to bridge the gap between two computing paradigms, high throughput computing and high performance computing. Traditional techniques to support many-task computing commonly found in scientific computing (i.e., the reliance on parallel file systems with static configurations) do not scale to today’s largest systems for data-intensive applications, as the rate of increase in the number of processors per system is outgrowing the rate of performance increase of parallel file systems. In this chapter, the authors argue that in such circumstances, data locality is critical to the successful and efficient use of large distributed systems for data-intensive applications. They propose a “data diffusion” approach to enable data-intensive many-task computing. They define an abstract model for data diffusion, define and implement scheduling policies with heuristics that optimize real-world performance, and develop a competitive online caching eviction policy. They also offer many empirical experiments to explore the benefits of data diffusion, under both static and dynamic resource provisioning, demonstrating approaches that improve both performance and scalability.
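
The data-locality idea in this abstract can be illustrated with a small Python sketch. It is an assumption-laden toy, not the chapter's scheduler: the `Node` class, the `dispatch` function, the "prefer a node that already caches the file, otherwise the least-used node" rule, and the LRU eviction policy are all stand-ins chosen for brevity. The abstract does not specify its scheduling heuristics or its eviction policy.

```python
from collections import OrderedDict


class Node:
    """Hypothetical compute node with a small LRU file cache."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity
        self.cache = OrderedDict()  # file name -> True, least recently used first
        self.dispatched = 0         # number of tasks sent to this node so far

    def touch(self, file: str) -> bool:
        """Record an access to `file`; return True on a cache hit."""
        hit = file in self.cache
        if hit:
            self.cache.move_to_end(file)
        else:
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[file] = True
        return hit


def dispatch(nodes: list[Node], file: str) -> tuple[str, bool]:
    """Send a task to a node that caches `file`, else to the least-used node."""
    holders = [n for n in nodes if file in n.cache]
    pool = holders or nodes
    node = min(pool, key=lambda n: n.dispatched)
    hit = node.touch(file)
    node.dispatched += 1
    return node.name, hit


nodes = [Node("n0", capacity=2), Node("n1", capacity=2)]
trace = [dispatch(nodes, f) for f in ["a", "b", "a", "c", "a"]]
print(trace)
```

In this toy trace, repeated requests for file "a" keep landing on the node that already holds it, so only the first access has to fetch the data from shared storage.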

Adaptive performance prediction for distributed data-intensive applications

Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99 ◽

10.1145/331532.331568 ◽

1999 ◽

Cited By ~ 12

Author(s):

Marcio Faerman ◽

Alan Su ◽

Richard Wolski ◽

Francine Berman

Keyword(s):

Performance Prediction ◽

Distributed Data ◽

Adaptive Performance ◽

Data Intensive ◽

Data Intensive Applications

Data Intensive Cloud Computing

Big Data ◽

10.4018/978-1-4666-9840-6.ch029 ◽

2016 ◽

pp. 639-654

Author(s):

Jayalakshmi D. S. ◽

R. Srinivasan ◽

K. G. Srinivasa

Keyword(s):

Cloud Computing ◽

Big Data ◽

Cluster Computing ◽

Resource Provisioning ◽

Data Intensive ◽

Scientific Value ◽

Data Intensive Applications ◽

Cloud Applications ◽

Problem Data ◽

Huge Challenge

Processing Big Data is a huge challenge for today's technology. There is a need to find, apply and analyze new ways of computing to make use of Big Data so as to derive business and scientific value from it. Cloud computing, with its promise of seemingly infinite computing resources, is seen as the solution to this problem. Data-intensive computing on the cloud builds upon the already mature parallel and distributed computing technologies such as HPC, grid and cluster computing. However, handling Big Data in the cloud presents its own challenges. In this chapter, we analyze issues specific to data-intensive cloud computing and provide a study of available solutions in programming models, data distribution and replication, and resource provisioning and scheduling with reference to data-intensive applications in the cloud. Future directions for further research enabling data-intensive applications in cloud environments are identified.

Intelligent Secure Storage Mechanism for Big Data

Webology ◽

10.14704/web/v18si01/web18057 ◽

2021 ◽

Vol 18 (Special Issue 01) ◽

pp. 246-261

Author(s):

K.R. Remesh Babu ◽

K.P. Madhu

Keyword(s):

Big Data ◽

Data Storage ◽

Big Data Analytics ◽

Business Organizations ◽

Storage Mechanism ◽

Data Intensive ◽

Secure Storage ◽

Huge Data ◽

Efficient Data ◽

Data Intensive Applications

The management of big data has become more important due to the widespread adoption of the Internet of Things in various fields. Developments in technology, science, human habits, etc., generate massive amounts of data, so it is increasingly important to store these data and protect them from attacks. Big data analytics is now a hot topic. The data storage facilities provided by cloud computing have enabled business organizations to overcome the burden of storing and maintaining huge volumes of data. Also, several distributed cloud applications support them in analyzing this data to make appropriate decisions. The dynamic growth of data and data-intensive applications demands an efficient, intelligent storage mechanism for big data. The proposed system analyzes IP packets for vulnerabilities and classifies data nodes as reliable or unreliable for efficient data storage. The proposed Apriori-algorithm-based method automatically classifies the nodes, providing an intelligent secure storage mechanism for distributed big data storage.
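
For readers unfamiliar with the Apriori algorithm named in this abstract, the following Python sketch shows generic frequent-itemset mining. How the paper turns IP packet analysis into transactions, and how frequent itemsets lead to a reliable/unreliable label, is not stated in the abstract, so the transaction contents below (hypothetical per-node packet flags) and the support threshold are assumptions made purely for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset seen in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical flags raised while inspecting packets from five nodes.
observations = [
    {"syn_flood", "bad_checksum", "port_scan"},
    {"syn_flood", "port_scan"},
    {"bad_checksum"},
    {"syn_flood", "port_scan", "bad_checksum"},
    {"port_scan"},
]
result = apriori(observations, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

A downstream classifier could then, for example, treat nodes whose traffic matches a frequent suspicious itemset as unreliable. That final step is left out here because the abstract does not describe it.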

Data Intensive Cloud Computing

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Advanced Research on Cloud Computing Design and Applications ◽

10.4018/978-1-4666-8676-2.ch019 ◽

2015 ◽

pp. 305-320

Author(s):

Jayalakshmi D. S. ◽

R. Srinivasan ◽

K. G. Srinivasa

Keyword(s):

Cloud Computing ◽

Big Data ◽

Cluster Computing ◽

Resource Provisioning ◽

Data Intensive ◽

Scientific Value ◽

Data Intensive Applications ◽

Cloud Applications ◽

Problem Data ◽

Huge Challenge

AxCEM: Designing Approximate Comparator-Enabled Multipliers

Journal of Low Power Electronics and Applications ◽

10.3390/jlpea10010009 ◽

2020 ◽

Vol 10 (1) ◽

pp. 9

Author(s):

Samar Ghabraei ◽

Morteza Rezaalipour ◽

Masoud Dehyadegari ◽

Mahdi Nazm Bojnordi

Keyword(s):

Power Dissipation ◽

Floating Point ◽

Computational Error ◽

Approximate Computing ◽

Computing Systems ◽

Error Resilient ◽

Data Intensive ◽

Efficiency Performance ◽

The Cost ◽

Data Intensive Applications

Floating-point multipliers have been the key component of nearly all forms of modern computing systems. Most data-intensive applications, such as deep neural networks (DNNs), expend the majority of their resources and energy budget on floating-point multiplication. The error-resilient nature of these applications often suggests employing approximate computing to improve the energy efficiency, performance, and area of floating-point multipliers. Prior work has shown that employing hardware-oriented approximation for computing the mantissa product may result in significant system energy reduction at the cost of an acceptable computational error. This article examines the design of an approximate comparator used for performing mantissa products in floating-point multipliers. First, we illustrate the use of exact comparators for enhancing power, area, and delay of floating-point multipliers. Then, we explore the design space of approximate comparators for designing efficient approximate comparator-enabled multipliers (AxCEM). Our simulation results indicate that the proposed architecture can achieve a 66% reduction in power dissipation, another 66% reduction in die-area, and a 71% decrease in delay. As compared with the state-of-the-art approximate floating-point multipliers, the accuracy loss in DNN applications due to the proposed AxCEM is less than 0.06%.

Using Vehicles as Fog Infrastructures for Transportation Cyber-Physical Systems (T-CPS)

International Journal of Software Science and Computational Intelligence ◽

10.4018/ijssci.2019010104 ◽

2019 ◽

Vol 11 (1) ◽

pp. 47-69 ◽

Cited By ~ 10

Author(s):

Md Muzakkir Hussain ◽

M.M. S Beg

Keyword(s):

Intelligent Transportation System ◽

Fog Computing ◽

Big Data Analytics ◽

Future Research ◽

Connected Vehicles ◽

Location Awareness ◽

Data Intensive ◽

Service Oriented ◽

Computational Resources ◽

The Cost

The advent of intelligent vehicular applications and IoT technologies gives rise to data-intensive challenges across different architectural layers of an intelligent transportation system (ITS). Without powerful communication and computational infrastructure, various vehicular applications and services will remain in the concept phase and cannot be put into practice in daily life. The current cloud computing and cellular set-ups are far from perfect because they are highly dependent on, and bear the cost of, additional infrastructure deployment. Thus, the geo-distributed ITS components require a paradigm shift from centralized cloud-scale processing to edge-centered fog computing (FC) paradigms. FC extends computing facilities to the edge of a network, offering location awareness, latency-sensitive monitoring, and intelligent control. In this article, the authors identify the mission-critical computing needs of next-generation ITS applications and highlight the scope of FC-based solutions for addressing them. Then, the authors discuss scenarios in which the underutilized communication and computational resources available in connected vehicles can be brought in to perform the role of FC infrastructures. The authors then present a service-oriented software architecture (SOA) for FC-based Big Data Analytics in ITS applications. The authors also provide a detailed analysis of the potential challenges of using connected vehicles as FC infrastructures along with future research directions.
