distributed data analysis
Recently Published Documents

Total documents: 42 (five years: 8)
H-index: 4 (five years: 0)

Author(s): Abdullah-Al-Raihan Nayeem, Mohammed Elshambakey, Todd Dobbs, Huikyo Lee, Daniel Crichton, ...

2021
Author(s): Wolfgang Maier, Simon Bray, Marius van den Beek, Dave Bouvier, Nathaniel Coraor, ...

The COVID-19 pandemic is the first global health crisis to occur in the age of big genomic data. Although data generation capacity is well established and sufficiently standardized, analytical capacity is not. To establish analytical capacity, it is necessary to pool global computational resources and deliver the best open-source tools and analysis workflows within a ready-to-use, universally accessible resource. Such a resource should not be controlled by a single research group, institution, or country. Instead, it should be maintained by a community of users and developers who ensure that the system remains operational and populated with current tools. A community is also essential for facilitating the kinds of discourse needed to establish best analytical practices. Bringing together public computational research infrastructure from the USA, Europe, and Australia, we developed a distributed data analysis platform that accomplishes these goals. It is immediately accessible to anyone in the world and is designed for the analysis of rapidly growing collections of deep sequencing datasets. We demonstrate its utility by detecting allelic variants in high-quality existing SARS-CoV-2 sequencing datasets and by continuous reanalysis of COG-UK data. All workflows, data, and documentation are available at https://covid19.galaxyproject.org.


2021
Author(s): Philipp S. Sommer, Viktoria Wichert, Daniel Eggert, Tilman Dinter, Klaus Getzlaff, ...

<p>A common challenge for projects with multiple participating research institutes is well-defined, productive collaboration. All parties measure and analyze different aspects, depend on each other, share common methods, and exchange the latest results, findings, and data. Today this exchange is often impeded by a lack of ready access to shared computing and storage resources. In our talk, we present a new and innovative remote procedure call (RPC) framework. We focus on a distributed setup in which project partners do not necessarily work at the same institute and do not have access to each other's resources.</p><p>We present the prototype of an application programming interface (API), developed in Python, that enables scientists to collaboratively explore and analyze sets of distributed data. It offers the functionality to request remote data through a convenient interface, and to share and invoke single computational methods or even entire analytical workflows together with their results. The prototype enables researchers to make their methods accessible as a backend module running on their own infrastructure. Researchers from other institutes may then apply the available methods through a lightweight Python or JavaScript API. This API transforms standard Python calls into requests to the backend process on the remote server. As a result, the overhead for both the backend developer and the remote user is very low: the effort of implementing the necessary workflow and API usage is comparable to writing code in a non-distributed setup. Moreover, data do not have to be downloaded locally; the analysis can be executed "close to the data" on the institutional infrastructure where the eligible data set is stored.</p><p>With our prototype, we demonstrate distributed data access and analysis workflows across institutional borders to enable effective scientific collaboration, thus deepening our understanding of the Earth system.</p><p>This framework has been developed in a joint effort of the DataHub and Digital Earth initiatives within the research centers of the Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. (Helmholtz Association of German Research Centres, HGF).</p>
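The RPC pattern described in this abstract can be sketched in a few lines of Python. This is not the authors' actual framework; all class and method names below (`Backend`, `RemoteProxy`, `mean_temperature`) are hypothetical, and the "transport" is an in-process function standing in for the HTTP hop to a remote institute's server, so the example stays self-contained.

```python
import json

class Backend:
    """Registry of methods a research group exposes on its own infrastructure."""

    def __init__(self):
        self._methods = {}

    def register(self, func):
        # Decorator: make a local analysis function remotely callable by name.
        self._methods[func.__name__] = func
        return func

    def handle(self, raw_request: str) -> str:
        """Decode a JSON request, invoke the named method, encode the result."""
        request = json.loads(raw_request)
        func = self._methods[request["method"]]
        result = func(*request["args"], **request["kwargs"])
        return json.dumps({"result": result})


class RemoteProxy:
    """Client-side API: attribute access yields callables that issue requests."""

    def __init__(self, transport):
        self._transport = transport  # callable taking/returning JSON strings

    def __getattr__(self, name):
        def call(*args, **kwargs):
            # An ordinary-looking Python call becomes a serialized request.
            raw = json.dumps({"method": name, "args": args, "kwargs": kwargs})
            return json.loads(self._transport(raw))["result"]
        return call


backend = Backend()

@backend.register
def mean_temperature(values):
    # Example analysis method running "close to the data".
    return sum(values) / len(values)

# The remote user writes plain Python; the proxy handles the RPC plumbing.
api = RemoteProxy(backend.handle)
print(api.mean_temperature([2.0, 4.0, 6.0]))  # 4.0
```

In a real deployment the transport would POST the JSON payload to the partner institute's backend process, which is what keeps data movement to a minimum: only the request and the (typically small) result cross institutional borders.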


2021, Vol. 30 (1), pp. 499-510
Author(s): Ruiling Yu, Mohammad Asif Ikbal, Abdul Rahman

Abstract Data analysis has become one of the most widespread fields of research and has extended into almost every field of study. Given recent trends and developments in communication and information technology, there is scope for combining the monitoring of substation equipment with big data analysis technology, resulting in improved data analysis capability, information sharing, and utilization of monitoring data. In the proposed work, the authors introduce big data analysis and its application to the monitoring of substations. Basic concepts and procedures of typical data analysis for general problems are also discussed. As the main contribution of the paper, several distributed data analysis techniques are proposed, chief among them two relational online analysis engines, Hive and Impala, and one HBase-based multidimensional online analysis engine. These techniques are chosen with regard to analysis efficiency and storage performance, from the point of view of the substation's business development requirements. The results show that the proposed model has an advantage in storage overhead and roll-up performance compared with the traditional method, although its data loading speed is approximately 1.7–1.9 times that of the traditional model. Experiments are carried out to verify the validity of the model.
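The "roll-up" operation whose performance the abstract benchmarks can be illustrated with a minimal sketch: aggregating fine-grained, device-level monitoring records up a dimension hierarchy (device → substation). In the paper this runs on Hive, Impala, or HBase; the plain-Python version below, with hypothetical record values, shows only the operation itself, not the authors' storage model.

```python
from collections import defaultdict

# Hypothetical device-level substation monitoring records:
# (substation, device, voltage_kV)
records = [
    ("S1", "transformer-1", 110.0),
    ("S1", "transformer-2", 112.0),
    ("S2", "transformer-1", 220.0),
    ("S2", "breaker-1",     222.0),
]

def roll_up(rows):
    """Roll device-level readings up to per-substation average voltage."""
    sums = defaultdict(lambda: [0.0, 0])  # substation -> [total, count]
    for substation, _device, voltage in rows:
        acc = sums[substation]
        acc[0] += voltage
        acc[1] += 1
    return {s: total / count for s, (total, count) in sums.items()}

print(roll_up(records))  # {'S1': 111.0, 'S2': 221.0}
```

In a Hive or Impala deployment the same aggregation would be a `GROUP BY` over a partitioned table; the paper's comparison concerns how efficiently each storage engine serves exactly this kind of query over large volumes of monitoring data.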


Computer, 2020, Vol. 53 (3), pp. 16-25
Author(s): Munehiro Fukuda, Collin Gordon, Utku Mert, Matthew Sell
