distributed data analysis
Recently Published Documents

Total documents: 42 (five years: 8)
H-index: 4 (five years: 0)

Author(s): Abdullah-Al-Raihan Nayeem, Mohammed Elshambakey, Todd Dobbs, Huikyo Lee, Daniel Crichton, ...

2021
Author(s): Wolfgang Maier, Simon Bray, Marius van den Beek, Dave Bouvier, Nathaniel Coraor, ...

The COVID-19 pandemic is the first global health crisis to occur in the age of big genomic data. Although data generation capacity is well established and sufficiently standardized, analytical capacity is not. To establish analytical capacity, it is necessary to pool global computational resources and deliver the best open-source tools and analysis workflows within a ready-to-use, universally accessible resource. Such a resource should not be controlled by a single research group, institution, or country. Instead, it should be maintained by a community of users and developers who ensure that the system remains operational and populated with current tools. A community is also essential for facilitating the kinds of discourse needed to establish best analytical practices. Bringing together public computational research infrastructure from the USA, Europe, and Australia, we developed a distributed data analysis platform that accomplishes these goals. It is immediately accessible to anyone in the world and is designed for the analysis of rapidly growing collections of deep sequencing datasets. We demonstrate its utility by detecting allelic variants in high-quality existing SARS-CoV-2 sequencing datasets and by continuous reanalysis of COG-UK data. All workflows, data, and documentation are available at https://covid19.galaxyproject.org.


2021
Author(s): Philipp S. Sommer, Viktoria Wichert, Daniel Eggert, Tilman Dinter, Klaus Getzlaff, ...

<p>A common challenge for projects with multiple participating research institutes is well-defined, productive collaboration. All parties measure and analyze different aspects, depend on each other, share common methods, and exchange the latest results, findings, and data. Today this exchange is often impeded by a lack of ready access to shared computing and storage resources. In our talk, we present a new and innovative remote procedure call (RPC) framework. We focus on a distributed setup in which project partners do not necessarily work at the same institute and do not have access to each other's resources.</p><p>We present the prototype of an application programming interface (API), developed in Python, that enables scientists to collaboratively explore and analyze sets of distributed data. It offers the functionality to request remote data through a convenient interface, and to share and invoke single computational methods or even entire analytical workflows together with their results. The prototype enables researchers to make their methods accessible as a backend module running on their own infrastructure. Researchers from other institutes may then apply the available methods through a lightweight Python or JavaScript API. This API transforms standard Python calls into requests to the backend process on the remote server. As a result, the overhead for both the backend developer and the remote user is very low: the effort of implementing the necessary workflow and API usage is comparable to writing code in a non-distributed setup. Moreover, data do not have to be downloaded locally; the analysis can be executed "close to the data" on the institutional infrastructure where the eligible data set is stored.</p><p>With our prototype, we demonstrate distributed data access and analysis workflows across institutional borders to enable effective scientific collaboration, thus deepening our understanding of the Earth system.</p><p>This framework has been developed in a joint effort of the DataHub and Digital Earth initiatives within the research centers of the Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. (Helmholtz Association of German Research Centres, HGF).</p>
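The RPC pattern described in this abstract can be sketched in a few lines of Python. This is not the authors' actual framework; all class and method names below (`Backend`, `RemoteProxy`, `mean_temperature`) are hypothetical, and the "transport" is an in-process function standing in for the HTTP hop to a remote institute's server, so the example stays self-contained.

```python
import json

class Backend:
    """Registry of methods a research group exposes on its own infrastructure."""

    def __init__(self):
        self._methods = {}

    def register(self, func):
        # Decorator: make a local analysis function remotely callable by name.
        self._methods[func.__name__] = func
        return func

    def handle(self, raw_request: str) -> str:
        """Decode a JSON request, invoke the named method, encode the result."""
        request = json.loads(raw_request)
        func = self._methods[request["method"]]
        result = func(*request["args"], **request["kwargs"])
        return json.dumps({"result": result})


class RemoteProxy:
    """Client-side API: attribute access yields callables that issue requests."""

    def __init__(self, transport):
        self._transport = transport  # callable taking/returning JSON strings

    def __getattr__(self, name):
        def call(*args, **kwargs):
            # An ordinary-looking Python call becomes a serialized request.
            raw = json.dumps({"method": name, "args": args, "kwargs": kwargs})
            return json.loads(self._transport(raw))["result"]
        return call


backend = Backend()

@backend.register
def mean_temperature(values):
    # Example analysis method running "close to the data".
    return sum(values) / len(values)

# The remote user writes plain Python; the proxy handles the RPC plumbing.
api = RemoteProxy(backend.handle)
print(api.mean_temperature([2.0, 4.0, 6.0]))  # 4.0
```

In a real deployment the transport would POST the JSON payload to the partner institute's backend process, which is what keeps data movement to a minimum: only the request and the (typically small) result cross institutional borders.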


2021, Vol. 30 (1), pp. 499-510
Author(s): Ruiling Yu, Mohammad Asif Ikbal, Abdul Rahman

Abstract Data analysis has become one of the most widespread fields of research and has extended into almost every field of study. Given recent trends and developments in communication and information technology, there is scope for combining the monitoring of substation equipment with big data analysis technology, resulting in improved data analysis capability, information sharing, and utilization of monitoring data. In the proposed work, the authors introduce big data analysis and its application to the monitoring of substations. Basic concepts and procedures of typical data analysis for general problems are also discussed. As the main contribution of the paper, several distributed data analysis techniques are proposed, chief among them two relational online analysis engines, Hive and Impala, and one HBase-based multidimensional online analysis engine. These techniques are chosen with regard to analysis efficiency and storage performance, from the point of view of the substation's business development requirements. The results show that the proposed model has an advantage in storage overhead and roll-up performance compared with the traditional method, although its data loading speed is approximately 1.7–1.9 times that of the traditional model. Experiments are carried out to verify the validity of the model.
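The "roll-up" operation whose performance the abstract benchmarks can be illustrated with a minimal sketch: aggregating fine-grained, device-level monitoring records up a dimension hierarchy (device → substation). In the paper this runs on Hive, Impala, or HBase; the plain-Python version below, with hypothetical record values, shows only the operation itself, not the authors' storage model.

```python
from collections import defaultdict

# Hypothetical device-level substation monitoring records:
# (substation, device, voltage_kV)
records = [
    ("S1", "transformer-1", 110.0),
    ("S1", "transformer-2", 112.0),
    ("S2", "transformer-1", 220.0),
    ("S2", "breaker-1",     222.0),
]

def roll_up(rows):
    """Roll device-level readings up to per-substation average voltage."""
    sums = defaultdict(lambda: [0.0, 0])  # substation -> [total, count]
    for substation, _device, voltage in rows:
        acc = sums[substation]
        acc[0] += voltage
        acc[1] += 1
    return {s: total / count for s, (total, count) in sums.items()}

print(roll_up(records))  # {'S1': 111.0, 'S2': 221.0}
```

In a Hive or Impala deployment the same aggregation would be a `GROUP BY` over a partitioned table; the paper's comparison concerns how efficiently each storage engine serves exactly this kind of query over large volumes of monitoring data.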


Computer, 2020, Vol. 53 (3), pp. 16-25
Author(s): Munehiro Fukuda, Collin Gordon, Utku Mert, Matthew Sell
