Phoenix: Making Data-Intensive Grid Applications Fault-Tolerant
Author(s): G. Kola, T. Kosar, M. Livny

2000 · Vol 16 (5) · pp. 473-481
Author(s): Brian Tierney, William Johnston, Jason Lee, Mary Thompson

2009
Author(s): Min Zhu, Shilin Xiao, Wei Guo, Anne Wei, Yaohui Jin, ...

Sensors · 2021 · Vol 21 (21) · pp. 7238
Author(s): Zulfiqar Ahmad, Ali Imran Jehangiri, Mohammed Alaa Ala’anzy, Mohamed Othman, Arif Iqbal Umar

Cloud computing is a mature, flexible computing paradigm that provides services to scientific and business applications in a subscription-based environment. Scientific applications such as Montage and CyberShake are organized as scientific workflows with data- and compute-intensive tasks, and they also have some special characteristics. In particular, the tasks of scientific workflows are executed in terms of integration, disintegration, pipeline, and parallelism, and thus require special attention to task management and data-oriented resource scheduling and management. Tasks executed in a pipeline are bottleneck executions: their failure renders the entire execution futile, which calls for fault-tolerance-aware execution. Tasks executed in parallel require similar instances of cloud resources, so cluster-based execution can improve system performance in terms of makespan and execution cost. Therefore, this research work presents a cluster-based, fault-tolerant and data-intensive (CFD) scheduling strategy for scientific applications in cloud environments. The CFD strategy addresses the data intensiveness of scientific-workflow tasks with cluster-based, fault-tolerant mechanisms. The Montage scientific workflow was used as the simulation scenario, and the results of the CFD strategy were compared with three well-known heuristic scheduling policies: (a) MCT, (b) Max-min, and (c) Min-min. The simulation results show that the CFD strategy reduces the makespan by 14.28%, 20.37%, and 11.77%, respectively, compared with these three policies. Similarly, CFD reduces the execution cost by 1.27%, 5.3%, and 2.21%, respectively. With the CFD strategy, the SLA is not violated with regard to time and cost constraints, whereas the existing policies violate it numerous times.
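The abstract does not include the CFD implementation itself. As a rough orientation only, the following Python sketch shows the Min-min baseline heuristic that CFD is compared against, plus a simple retry wrapper illustrating the kind of fault-tolerant re-execution described for pipeline (bottleneck) tasks. All function names, task times, and resource counts below are illustrative assumptions, not the authors' code.

def min_min_schedule(etc, ready=None):
    """Min-min heuristic: repeatedly pick the task whose best (minimum)
    completion time over all resources is smallest, and assign it there.
    etc[t][r] = estimated execution time of task t on resource r."""
    n_tasks, n_res = len(etc), len(etc[0])
    ready = list(ready or [0.0] * n_res)       # resource ready times
    unscheduled = set(range(n_tasks))
    assignment = {}                            # task -> (resource, finish time)
    while unscheduled:
        # best resource (minimum completion time) for each unscheduled task
        best = {t: min(range(n_res), key=lambda r: ready[r] + etc[t][r])
                for t in unscheduled}
        # task whose minimum completion time is smallest goes next
        t = min(unscheduled, key=lambda t: ready[best[t]] + etc[t][best[t]])
        r = best[t]
        ready[r] += etc[t][r]
        assignment[t] = (r, ready[r])
        unscheduled.remove(t)
    return assignment, max(ready)              # schedule and makespan

def run_with_retries(task, max_retries=3):
    """Illustrative fault-tolerant wrapper: re-execute a failed bottleneck
    task up to max_retries times instead of failing the whole workflow."""
    for attempt in range(1, max_retries + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == max_retries:
                raise

# Toy example: 4 tasks on 2 resources (times are made up).
etc = [[3, 5], [2, 4], [6, 3], [4, 4]]
assignment, makespan = min_min_schedule(etc)
print(assignment, makespan)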


In recent decades, with the emergence of Internet-connected appliances, data usage has increased dramatically, which has had a strong impact on storage and mining technologies. Scientific and research fields also produce data of varied structure, viz. structured, semi-structured, and unstructured data, and processing such data has become correspondingly demanding. Sustainable technologies exist to address these challenges and to deliver scalable services through effective physical infrastructure (in terms of mining), smart networking solutions, and useful software approaches. Indeed, cloud computing targets data-intensive computing by facilitating scalable processing of huge data sets. Still, the problem remains unaddressed at the largest scales, since data continues to grow exponentially. At this juncture, the recommendable approach is the well-known MapReduce model for processing huge and voluminous data. The basic model on its own, however, offers limited fault tolerance and reliability, which can be overcome by the Hadoop architecture. Hadoop, in contrast, is fault tolerant and provides high throughput, making it suitable for applications with huge data sets and file systems that require streaming access. The paper examines what efficient architectural/design changes are necessary to bring together the benefits of the Everest model, HBase, and the existing MR algorithms.
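The paragraph discusses MapReduce at a conceptual level only. Below is a minimal, self-contained Python sketch of the map-shuffle-reduce pattern using the canonical word-count example; it illustrates the programming model, not Hadoop's actual Java API, and the function names and toy input are assumptions.

from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs; here, (word, 1) for every word."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values; here, sum the counts."""
    return {key: sum(values) for key, values in groups.items()}

# Toy input standing in for lines of a large file split across workers.
records = ["big data needs scalable processing",
           "hadoop processes big data with mapreduce"]
print(reduce_phase(shuffle(map_phase(records))))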

