A new data-intensive task scheduling in OptorSim, an open source grid simulator

Author(s): Mahshid Helali Moghadam, Seyyed Morteza Babamir

Cloud computing is heavily used to implement many kinds of applications, and many client applications are being migrated to the cloud for reasons of cost and elasticity. Cloud computing is generally implemented on distributed computing, wherein the physical servers are widely distributed in both hardware and software and are connected through the Internet. Such cloud computing systems therefore have many physical servers containing many resources. These resources can be shared among the many users who are tenants of the cloud computing system, and they can be virtualized to provide shared resources to clients. Scheduling, which covers task scheduling, resource scheduling, and the scheduling of virtual machine migration, is one of the most important tasks of a cloud computing system. Scheduling within a cloud computing system must be understood in depth so that improvements to it can be investigated and implemented. Such in-depth research requires an open source cloud computing system, and OpenStack is one such system in which research findings related to cloud computing can be tested experimentally. This paper presents an overview of how scheduling itself is implemented within the OpenStack cloud computing system.


Author(s): Ganesh Chandra Deka

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. NoSQL databases have many advanced features in addition to the conventional RDBMS features; hence, they are popularly known as “Not only SQL” databases. A variety of NoSQL databases with different features for handling exponentially growing data-intensive applications are available as open source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in the light of the CAP theorem.


2020
Author(s): Chunlin Li, Yihan Zhang, Xiaomei Qu, Youlong Luo

In recent years, with the continuous development of Internet of Things and cloud computing technologies, data-intensive applications have attracted increasing attention. In a distributed cloud environment, access to massive data is often the performance bottleneck. It is therefore important to design a suitable data deployment algorithm that improves the utilization of cloud servers and the efficiency of task scheduling. To reduce data access cost and data deployment time, this paper proposes an optimal data deployment algorithm: the data deployment problem is modeled and analyzed, then solved using an improved genetic algorithm. Once the data are deployed, a task-progress-aware scheduling algorithm is proposed to improve task scheduling efficiency by making the speculative execution mechanism more accurate. First, thresholds for detecting slow tasks and fast nodes are set. Then, slow tasks and fast nodes are detected by calculating the remaining time of the tasks and the real-time processing ability of the nodes, respectively. Finally, backup copies of the slow tasks are executed on the fast nodes. The experimental results show that, while maintaining load balance across the system, the proposed algorithms clearly reduce data access cost, the service-level agreement (SLA) default rate, and the execution time of the system, and optimize data deployment to improve scheduling efficiency in distributed clouds.
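The three-step speculative execution scheme summarized above (set thresholds, detect slow tasks and fast nodes, run backups on fast nodes) can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give their formulas, so everything here is an assumption. Remaining time is extrapolated from the progress rate so far, "real-time processing ability" is taken as work units per second over a recent window, the thresholds are hypothetical multiples (`slow_factor`, `fast_factor`) of the mean, and each fast node receives at most one backup as a simple nod to load balancing. All names (`Task`, `Node`, `plan_backups`, and so on) are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Task:
    task_id: str
    node_id: str      # node currently running the original attempt
    progress: float   # fraction of work completed, in [0, 1]
    elapsed: float    # seconds since the task started


@dataclass
class Node:
    node_id: str
    processed: float  # work units finished in the last observation window
    window: float     # length of that window, in seconds


def remaining_time(task: Task) -> float:
    """Assumed estimate: extrapolate the progress rate observed so far."""
    if task.progress <= 0.0 or task.elapsed <= 0.0:
        return math.inf  # no progress information yet
    rate = task.progress / task.elapsed
    return (1.0 - task.progress) / rate


def processing_rate(node: Node) -> float:
    """Assumed 'real-time processing ability': work units per second."""
    return node.processed / node.window


def detect_slow_tasks(tasks, slow_factor=1.5):
    """Hypothetical threshold: remaining time > slow_factor * mean remaining time."""
    times = {t.task_id: remaining_time(t) for t in tasks}
    finite = [v for v in times.values() if math.isfinite(v)]
    if not finite:
        return []  # nothing to compare against yet
    threshold = slow_factor * mean(finite)
    slow = [t for t in tasks if times[t.task_id] > threshold]
    # Slowest first, so the worst stragglers are offered backups first.
    return sorted(slow, key=lambda t: times[t.task_id], reverse=True)


def detect_fast_nodes(nodes, fast_factor=1.2):
    """Hypothetical threshold: rate >= fast_factor * mean rate, fastest first."""
    if not nodes:
        return []
    rates = {n.node_id: processing_rate(n) for n in nodes}
    threshold = fast_factor * mean(list(rates.values()))
    fast = [n for n in nodes if rates[n.node_id] >= threshold]
    return sorted(fast, key=lambda n: rates[n.node_id], reverse=True)


def plan_backups(tasks, nodes, slow_factor=1.5, fast_factor=1.2):
    """Map each slow task to a distinct fast node other than its own node."""
    free = detect_fast_nodes(nodes, fast_factor)
    plan = {}
    for task in detect_slow_tasks(tasks, slow_factor):
        for node in free:
            if node.node_id != task.node_id:
                plan[task.task_id] = node.node_id
                free.remove(node)  # at most one backup per node
                break
    return plan


if __name__ == "__main__":
    tasks = [Task("t1", "n1", 0.5, 10.0),
             Task("t2", "n2", 0.5, 10.0),
             Task("t3", "n3", 0.1, 10.0)]
    nodes = [Node("n1", 100.0, 10.0),
             Node("n2", 50.0, 10.0),
             Node("n3", 30.0, 10.0)]
    print(plan_backups(tasks, nodes))
```

With these inputs, `t3` has an estimated 90 s remaining against a mean of about 37 s, so it is flagged as slow, and `n1` is the only node whose rate clears the fast threshold; the plan is therefore `{'t3': 'n1'}`. Sorting slow tasks by remaining time and fast nodes by rate is one simple greedy pairing; a real scheduler would also weigh data locality, which the data deployment step in the abstract is meant to address.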


2018
Author(s): R. Stuart Geiger, Nelle Varoquaux, Charlotte Cabasse, Christopher Holdgraf

Computational research and data analytics increasingly rely on complex ecosystems of open source software (OSS) "libraries": curated collections of reusable code that programmers import to perform a specific task. Software documentation for these libraries is crucial in helping programmers and analysts know what libraries are available and how to use them. Yet documentation for open source software libraries is widely considered low-quality. This article is a collaboration between CSCW researchers and contributors to data analytics OSS libraries, based on ethnographic fieldwork and qualitative interviews. We examine several issues concerning the formats, practices, and challenges of documentation in these largely volunteer-based projects. Many different kinds and formats of documentation exist around such libraries, and they play a variety of educational, promotional, and organizational roles. The work behind documentation is similarly multifaceted, including writing, reviewing, maintaining, and organizing documentation. Different aspects of documentation work require contributors to have different sets of skills and to overcome various social and technical barriers. Finally, most of our interviewees do not report high levels of intrinsic enjoyment from documentation work (compared to writing code). Their motivation is affected by personal and project-specific factors, such as the perceived level of credit for documentation work versus more 'technical' tasks like adding new features or fixing bugs. In studying documentation work for data analytics OSS libraries, we gain a new window into the changing practices of data-intensive research and help practitioners better understand how to support this often invisible and infrastructural work in their projects.