distributed computing infrastructures
Recently Published Documents


TOTAL DOCUMENTS

41
(FIVE YEARS 1)

H-INDEX

8
(FIVE YEARS 0)

2021 ◽  
Vol 36 (10) ◽  
pp. 2150070
Author(s):  
Maria Grigorieva ◽  
Dmitry Grin

Large-scale distributed computing infrastructures ensure the operation and maintenance of scientific experiments at the LHC: more than 160 computing centers all over the world execute tens of millions of computing jobs per day. ATLAS — the largest experiment at the LHC — creates an enormous flow of data which has to be recorded and analyzed by a complex heterogeneous and distributed computing environment. Statistically, about 10–12% of computing jobs end with a failure: network faults, service failures, authorization failures, and other error conditions trigger error messages which provide detailed information about the issue, which can be used for diagnosis and proactive fault handling. However, this analysis is complicated by the sheer scale of textual log data, and often exacerbated by the lack of a well-defined structure: human experts have to interpret the detected messages and create parsing rules manually, which is time-consuming and does not allow identifying previously unknown error conditions without further human intervention. This paper is dedicated to the description of a pipeline of methods for the unsupervised clustering of multi-source error messages. The pipeline is data-driven, based on machine learning algorithms, and executed fully automatically, allowing categorizing error messages according to textual patterns and meaning.


GigaScience ◽  
2020 ◽  
Vol 9 (11) ◽  
Author(s):  
Miroslav Kratochvíl ◽  
Oliver Hunewald ◽  
Laurent Heirendt ◽  
Vasco Verissimo ◽  
Jiří Vondrášek ◽  
...  

Abstract Background The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances allow the easy generation of data with hundreds of millions of single-cell data points with >40 parameters, originating from thousands of individual samples. The analysis of that amount of high-dimensional data becomes demanding in both hardware and software of high-performance computational resources. Current software tools often do not scale to the datasets of such size; users are thus forced to downsample the data to bearable sizes, in turn losing accuracy and ability to detect many underlying complex phenomena. Results We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level and high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study. Conclusions GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes, while producing results of the same quality as the current state-of-art software. Measurements indicate that the performance scales to much larger datasets. The example use on the data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies.


2017 ◽  
Vol 42 (4) ◽  
pp. 299-313 ◽  
Author(s):  
Marcin Adamski ◽  
Krzysztof Kurowski ◽  
Marek Mika ◽  
Wojciech Piątek ◽  
Jan Węglarz

Abstract In many distributed computing systems, aspects related to security are getting more and more relevant. Security is ubiquitous and could not be treated as a separated problem or a challenge. In our opinion it should be considered in the context of resource management in distributed computing environments like Grids and Clouds, e.g. scheduled computations can be much delayed because of cyber-attacks, inefficient infrastructure or users valuable and sensitive data can be stolen even in the process of correct computation. To prevent such cases there is a need to introduce new evaluation metrics for resource management that will represent the level of security of computing resources and more broadly distributed computing infrastructures. In our approach, we have introduced a new metric called reputation, which simply determines the level of reliability of computing resources from the security perspective and could be taken into account during scheduling procedures. The new reputation metric is based on various relevant parameters regarding cyber-attacks (also energy attacks), administrative activities such as security updates, bug fixes and security patches. Moreover, we have conducted various computational experiments within the Grid Scheduling Simulator environment (GSSIM) inspired by real application scenarios. Finally, our experimental studies of new resource management approaches taking into account critical security aspects are also discussed in this paper.


Author(s):  
Rosa Filguiera ◽  
Amrey Krause ◽  
Malcolm Atkinson ◽  
Iraklis Klampanos ◽  
Alexander Moreno

This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. These combine the familiarity of Python programming with the scalability of workflows. Data streaming is used to gain performance, rapid prototyping and applicability to live observations. dispel4py enables scientists to focus on their scientific goals, avoiding distracting details and retaining flexibility over the computing infrastructure they use. The implementation, therefore, has to map dispel4py abstract workflows optimally onto target platforms chosen dynamically. We present four dispel4py mappings: Apache Storm, message-passing interface (MPI), multi-threading and sequential, showing two major benefits: a) smooth transitions from local development on a laptop to scalable execution for production work, and b) scalable enactment on significantly different distributed computing infrastructures. Three application domains are reported and measurements on multiple infrastructures show the optimisations achieved; they have provided demanding real applications and helped us develop effective training. The dispel4py.org is an open-source project to which we invite participation. The effective mapping of dispel4py onto multiple target infrastructures demonstrates exploitation of data-intensive and high-performance computing (HPC) architectures and consistent scalability.


2016 ◽  
Vol 55 ◽  
pp. 428-443 ◽  
Author(s):  
Mircea Moca ◽  
Cristian Litan ◽  
Gheorghe Cosmin Silaghi ◽  
Gilles Fedak

Big Data ◽  
2016 ◽  
pp. 1555-1581
Author(s):  
Gueyoung Jung ◽  
Tridib Mukherjee

In the modern information era, the amount of data has exploded. Current trends further indicate exponential growth of data in the future. This prevalent humungous amount of data—referred to as big data—has given rise to the problem of finding the “needle in the haystack” (i.e., extracting meaningful information from big data). Many researchers and practitioners are focusing on big data analytics to address the problem. One of the major issues in this regard is the computation requirement of big data analytics. In recent years, the proliferation of many loosely coupled distributed computing infrastructures (e.g., modern public, private, and hybrid clouds, high performance computing clusters, and grids) have enabled high computing capability to be offered for large-scale computation. This has allowed the execution of the big data analytics to gather pace in recent years across organizations and enterprises. However, even with the high computing capability, it is a big challenge to efficiently extract valuable information from vast astronomical data. Hence, we require unforeseen scalability of performance to deal with the execution of big data analytics. A big question in this regard is how to maximally leverage the high computing capabilities from the aforementioned loosely coupled distributed infrastructure to ensure fast and accurate execution of big data analytics. In this regard, this chapter focuses on synchronous parallelization of big data analytics over a distributed system environment to optimize performance.


Author(s):  
Sergio Nesmachnow ◽  
Gabriel Usera ◽  
Francisco Brasileiro

This article describes the Digi-Clima Grid project, whose main goals are to design and implement semi-automatic techniques for digitalizing and recovering historical climate records applying parallel computing techniques over distributed computing infrastructures. The specific tool developed for image processing is described, and the implementation over grid and cloud infrastructures is reported. An experimental analysis over institutional and volunteer-based grid/cloud distributed systems demonstrate that the proposed approach is an efficient tool for recovering historical climate data. The parallel implementations allow to distribute the processing load, achieving accurate speedup values.


Sign in / Sign up

Export Citation Format

Share Document