A new data-intensive task scheduling in OptorSim, an open source grid simulator

Scheduling under Open stack – The Current State and Future Enhancements

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1481.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 375-382

Keyword(s):

Cloud Computing ◽

Open Source ◽

Task Scheduling ◽

Computing System ◽

Important Task ◽

Shared Resources ◽

Computing Systems ◽

Current State ◽

Cloud Computing System ◽

Research Findings

Cloud computing is heavily used to implement many kinds of applications. Many client applications are being migrated to the cloud for reasons of cost and elasticity. Cloud computing is generally implemented on distributed computing infrastructure, in which the physical servers, both hardware and software, are widely distributed and connected through the Internet. A cloud computing system has many physical servers, each containing many resources. These resources can be virtualized and shared among the many users who are tenants of the cloud computing system. Scheduling is one of the most important tasks of a cloud computing system; it covers task scheduling, resource scheduling, and the scheduling of virtual machine migration. It is important to understand scheduling within a cloud computing system in depth so that improvements can be investigated and implemented. Carrying out in-depth research requires an open source cloud computing system. OpenStack is one such system that can be used to experiment with research findings related to cloud computing. This paper presents an overview of how scheduling has been implemented within the OpenStack cloud computing system.

NoSQL Databases

Advances in Data Mining and Database Management - Handbook of Research on Cloud Infrastructures for Big Data Analytics ◽

10.4018/978-1-4666-5864-6.ch008 ◽

2014 ◽

pp. 186-215 ◽

Cited By ~ 2

Author(s):

Ganesh Chandra Deka

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Open Source ◽

Data Storage ◽

Big Data Processing ◽

Nosql Databases ◽

Data Intensive ◽

Huge Data ◽

Data Intensive Applications

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. NoSQL databases offer many advanced features in addition to the conventional RDBMS features; hence, they are popularly known as “Not only SQL” databases. A variety of NoSQL databases with different features for dealing with exponentially growing data-intensive applications are available under open source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in light of the CAP theorem.

A hybrid evolutionary algorithm for task scheduling and data assignment of data-intensive scientific workflows on clouds

Future Generation Computer Systems ◽

10.1016/j.future.2017.05.017 ◽

2017 ◽

Vol 76 ◽

pp. 1-17 ◽

Cited By ~ 18

Author(s):

Luan Teylo ◽

Ubiratam de Paula ◽

Yuri Frota ◽

Daniel de Oliveira ◽

Lúcia M.A. Drummond

Keyword(s):

Evolutionary Algorithm ◽

Task Scheduling ◽

Scientific Workflows ◽

Hybrid Evolutionary Algorithm ◽

Data Intensive

Approximation algorithms and heuristics for task scheduling in data-intensive distributed systems

International Transactions in Operational Research ◽

10.1111/itor.12527 ◽

2018 ◽

Vol 25 (5) ◽

pp. 1417-1441 ◽

Cited By ~ 1

Author(s):

Marcelo G. Póvoa ◽

Eduardo C. Xavier

Keyword(s):

Distributed Systems ◽

Approximation Algorithms ◽

Task Scheduling ◽

Data Intensive

Cost- and Time-Based Data Deployment for Improving Scheduling Efficiency in Distributed Clouds

The Computer Journal ◽

10.1093/comjnl/bxaa121 ◽

2020 ◽

Author(s):

Chunlin Li ◽

Yihan Zhang ◽

Xiaomei Qu ◽

Youlong Luo

Keyword(s):

Task Scheduling ◽

Scheduling Algorithm ◽

Service Level Agreement ◽

Data Access ◽

Service Level ◽

Speculative Execution ◽

Improved Genetic Algorithm ◽

Real Time Processing ◽

Data Intensive ◽

Access Cost

In recent years, with the continuous development of Internet of Things and cloud computing technologies, data-intensive applications have received more and more attention. In a distributed cloud environment, access to massive data is often the performance bottleneck. A suitable data deployment algorithm is therefore important for improving the utilization of cloud servers and the efficiency of task scheduling. To reduce data access cost and data deployment time, this paper proposes an optimal data deployment algorithm: the data deployment problem is modeled and analyzed, then solved using an improved genetic algorithm. After the data are deployed, a task-progress-aware scheduling algorithm is proposed to make the speculative execution mechanism more accurate and so improve the efficiency of task scheduling. First, the thresholds for detecting slow tasks and fast nodes are set. Then, slow tasks and fast nodes are detected by calculating the remaining time of the tasks and the real-time processing ability of the nodes, respectively. Finally, backup execution of the slow tasks is performed on the fast nodes. The experimental results show that, while satisfying the load balancing of the system, the proposed algorithms noticeably reduce data access cost, the service-level agreement (SLA) default rate, and the execution time of the system, and optimize data deployment to improve scheduling efficiency in distributed clouds.
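
The three speculative-execution steps in this abstract (set thresholds, detect slow tasks and fast nodes, back up slow tasks on fast nodes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the `Task` and `Node` types, the progress-rate estimate of remaining time, the threshold values, and the pairing rule are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    progress: float  # fraction of work completed, in [0, 1]
    elapsed: float   # seconds since the task started


@dataclass
class Node:
    name: str
    rate: float      # assumed measure of real-time processing ability


def remaining_time(task: Task) -> float:
    """Estimate remaining time from the observed progress rate (an assumption)."""
    if task.progress <= 0 or task.elapsed <= 0:
        return float("inf")  # no progress observed yet: treat as slowest
    speed = task.progress / task.elapsed
    return (1.0 - task.progress) / speed


def plan_backups(tasks, nodes, slow_threshold, fast_threshold):
    """Pair slow tasks with fast nodes for backup execution.

    A task is 'slow' if its estimated remaining time exceeds slow_threshold;
    a node is 'fast' if its rate is at least fast_threshold. The slowest task
    is paired with the fastest node (a hypothetical pairing rule).
    """
    slow = sorted(
        (t for t in tasks if remaining_time(t) > slow_threshold),
        key=remaining_time,
        reverse=True,
    )
    fast = sorted(
        (n for n in nodes if n.rate >= fast_threshold),
        key=lambda n: n.rate,
        reverse=True,
    )
    return [(t.name, n.name) for t, n in zip(slow, fast)]


tasks = [Task("a", 0.9, 90.0), Task("b", 0.1, 100.0), Task("c", 0.5, 200.0)]
nodes = [Node("n1", 1.0), Node("n2", 3.0), Node("n3", 2.0)]
plan = plan_backups(tasks, nodes, slow_threshold=100.0, fast_threshold=2.0)
print(plan)
```

If there are more slow tasks than fast nodes, `zip` leaves the least-slow tasks without a backup; a real scheduler would also need to account for load balancing, which this sketch ignores.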

Task Scheduling and File Replication for Data-Intensive Jobs with Batch-shared I/O

2006 15th IEEE International Conference on High Performance Distributed Computing ◽

10.1109/hpdc.2006.1652155 ◽

2006 ◽

Cited By ~ 6

Author(s):

G. Khanna ◽

N. Vydyanathan ◽

U. Catalyurek ◽

T. Kurc ◽

S. Krishnamoorthy ◽

...

Keyword(s):

Task Scheduling ◽

Data Intensive ◽

File Replication

Locality and Network-Aware Reduce Task Scheduling for Data-Intensive Applications

2014 5th International Workshop on Data-Intensive Computing in the Clouds ◽

10.1109/datacloud.2014.10 ◽

2014 ◽

Cited By ~ 8

Author(s):

Engin Arslan ◽

Mrigank Shekhar ◽

Tevfik Kosar

Keyword(s):

Task Scheduling ◽

Data Intensive ◽

Data Intensive Applications

Collaborative cache allocation and task scheduling for data-intensive applications in edge computing environment

Future Generation Computer Systems ◽

10.1016/j.future.2019.01.007 ◽

2019 ◽

Vol 95 ◽

pp. 249-264 ◽

Cited By ~ 27

Author(s):

Chunlin Li ◽

Jianhang Tang ◽

Hengliang Tang ◽

Youlong Luo

Keyword(s):

Task Scheduling ◽

Edge Computing ◽

Computing Environment ◽

Data Intensive ◽

Data Intensive Applications ◽

Cache Allocation

A Trust Model-Based Task Scheduling Algorithm for Data-Intensive Application

2011 Sixth Annual Chinagrid Conference ◽

10.1109/chinagrid.2011.16 ◽

2011 ◽

Cited By ~ 2

Author(s):

Yujiex Xu ◽

Wenyu Qu

Keyword(s):

Task Scheduling ◽

Scheduling Algorithm ◽

Trust Model ◽

Data Intensive ◽

Model Based ◽

Task Scheduling Algorithm ◽

Data Intensive Application

The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work

10.31235/osf.io/2fdhr ◽

2018 ◽

Author(s):

R. Stuart Geiger ◽

Nelle Varoquaux ◽

Charlotte Cabasse ◽

Christopher Holdgraf

Keyword(s):

Open Source ◽

Open Source Software ◽

Data Analytics ◽

Qualitative Interviews ◽

Data Intensive ◽

Software Documentation ◽

Software Libraries ◽

Changing Practices ◽

Collaborative Ethnography ◽

Documentation Work

Computational research and data analytics increasingly rely on complex ecosystems of open source software (OSS) "libraries": curated collections of reusable code that programmers import to perform a specific task. Software documentation for these libraries is crucial in helping programmers/analysts know what libraries are available and how to use them. Yet documentation for open source software libraries is widely considered low-quality. This article is a collaboration between CSCW researchers and contributors to data analytics OSS libraries, based on ethnographic fieldwork and qualitative interviews. We examine several issues concerning the formats, practices, and challenges of documentation in these largely volunteer-based projects. Many different kinds and formats of documentation exist around such libraries, and they play a variety of educational, promotional, and organizational roles. The work behind documentation is similarly multifaceted, including writing, reviewing, maintaining, and organizing documentation. Different aspects of documentation work require contributors to have different sets of skills and to overcome various social and technical barriers. Finally, most of our interviewees do not report high levels of intrinsic enjoyment in doing documentation work (compared to writing code). Their motivation is affected by personal and project-specific factors, such as the perceived level of credit for doing documentation work versus more 'technical' tasks like adding new features or fixing bugs. In studying documentation work for data analytics OSS libraries, we gain a new window into the changing practices of data-intensive research, and help practitioners better understand how to support this often invisible and infrastructural work in their projects.
