An agenda for research in large-scale distributed data repositories

CHORD: Distributed Data-sharing via Hybrid ROS 1 and 2 for Multi-robot Exploration of Large-scale Complex Environments

IEEE Robotics and Automation Letters ◽

10.1109/lra.2021.3061393 ◽

2021 ◽

pp. 1-1

Author(s):

Muhammad Fadhil Ginting ◽

Kyohei Otsu ◽

Jeffrey Edlund ◽

Jay Gao ◽

Ali-akbar Agha-mohammadi

Keyword(s):

Data Sharing ◽

Large Scale ◽

Distributed Data ◽

Complex Environments ◽

Multi Robot

Distributed Data Processing for Large-Scale Simulations on Cloud

10.1109/emc/si/pi/emceurope52599.2021.9559316 ◽

2021 ◽

Author(s):

Tianjian Lu ◽

Stephan Hoyer ◽

Qing Wang ◽

Lily Hu ◽

Yi-Fan Chen

Keyword(s):

Data Processing ◽

Large Scale ◽

Distributed Data ◽

Distributed Data Processing ◽

Large Scale Simulations

The Management Strategy of Metadata in Large-scale Network Storage System

MATEC Web of Conferences ◽

10.1051/matecconf/201822801011 ◽

2018 ◽

Vol 228 ◽

pp. 01011

Author(s):

Haifeng Zhong ◽

Jianying Xiong

Keyword(s):

Large Scale ◽

Storage System ◽

Hash Table ◽

Distributed Hash Table ◽

Distributed Data ◽

Mass Storage ◽

Large Scale Network ◽

Network Storage System ◽

Mass Storage System ◽

Scale Network

Wide-area Internet storage systems based on a Distributed Hash Table (DHT) use fully distributed data and metadata management to build extensible, efficient mass storage for Internet-based applications. However, such systems operate in highly dynamic environments, where frequent node joins and departures incur large communication costs. This paper therefore proposes a new hierarchical metadata routing management mechanism based on DHT, which makes full use of the node stabilization point to reduce the maintenance overhead of the overlay. Analysis shows that the algorithm can effectively improve efficiency and enhance stability.
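
To illustrate why node churn matters in a DHT-based store, here is a minimal sketch of a flat consistent-hashing ring. It is not the hierarchical routing mechanism the paper proposes; the class `HashRing`, the helper `moved_keys`, the node names, and the `/meta/...` keys are all hypothetical and chosen only for illustration. The sketch shows the basic property such overlays rely on: when a node joins or leaves, only the keys in its arc of the ring change owner.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string onto the ring using SHA-1 (a 160-bit identifier space)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Toy consistent-hashing ring: a key belongs to its clockwise successor node."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions of live nodes
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.join(node)

    def join(self, node: str) -> None:
        pos = ring_position(node)
        if pos not in self._owners:
            bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def leave(self, node: str) -> None:
        pos = ring_position(node)
        if self._owners.get(pos) == node:
            del self._owners[pos]
            self._positions.remove(pos)

    def lookup(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._positions, ring_position(key))
        # Wrap around to the first node when the key hashes past the last one.
        return self._owners[self._positions[idx % len(self._positions)]]


def moved_keys(ring, keys, change):
    """Return the keys whose owner differs before and after `change(ring)`."""
    before = {k: ring.lookup(k) for k in keys}
    change(ring)
    return [k for k in keys if ring.lookup(k) != before[k]]


# Hypothetical usage: count how many metadata keys move when one node joins.
ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"/meta/file-{i}" for i in range(200)]
moved = moved_keys(ring, keys, lambda r: r.join("node-d"))
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

Every key that moves after the join ends up on the new node, so the cost of a membership change is bounded by that node's share of the ring. A real overlay must also repair routing tables on each change, which is the maintenance overhead the abstract refers to.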

A distributed data management system to support large-scale data analysis

Journal of Systems and Software ◽

10.1016/j.jss.2018.11.007 ◽

2019 ◽

Vol 148 ◽

pp. 105-115 ◽

Cited By ~ 6

Author(s):

Tamer Z. Emara ◽

Joshua Zhexue Huang

Keyword(s):

Data Analysis ◽

Data Management ◽

Management System ◽

Large Scale ◽

Data Management System ◽

Distributed Data ◽

Distributed Data Management ◽

Large Scale Data ◽

Scale Data

Conceptual Data Modeling Using Aggregates to Ensure Large-Scale Distributed Data Management Systems Security

Intelligent Distributed Computing XIII - Studies in Computational Intelligence ◽

10.1007/978-3-030-32258-8_5 ◽

2019 ◽

pp. 41-47

Author(s):

Maria A. Poltavtseva ◽

Maxim O. Kalinin

Keyword(s):

Data Management ◽

Large Scale ◽

Data Modeling ◽

Management Systems ◽

Distributed Data ◽

Data Management Systems ◽

Distributed Data Management ◽

Systems Security ◽

Conceptual Data

A framework for secure and decentralized sharing of medical imaging data via blockchain consensus

Health Informatics Journal ◽

10.1177/1460458218769699 ◽

2018 ◽

Vol 25 (4) ◽

pp. 1398-1411 ◽

Cited By ~ 58

Author(s):

Vishal Patel

Keyword(s):

Medical Imaging ◽

Large Scale ◽

Third Party ◽

Distributed Data ◽

Imaging Data ◽

Privacy And Security ◽

Image Sharing ◽

Security Models ◽

Number Of Factors ◽

Medical Imaging Data

The electronic sharing of medical imaging data is an important element of modern healthcare systems, but current infrastructure for cross-site image transfer depends on trust in third-party intermediaries. In this work, we examine the blockchain concept, which enables parties to establish consensus without relying on a central authority. We develop a framework for cross-domain image sharing that uses a blockchain as a distributed data store to establish a ledger of radiological studies and patient-defined access permissions. The blockchain framework is shown to eliminate third-party access to protected health information, satisfy many criteria of an interoperable health system, and readily generalize to domains beyond medical imaging. Relative drawbacks of the framework include the complexity of the privacy and security models and an unclear regulatory environment. Ultimately, the large-scale feasibility of such an approach remains to be demonstrated and will depend on a number of factors which we discuss in detail.
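
The idea of a ledger that records radiological studies and patient-defined access permissions can be sketched as a hash-chained, append-only log. This is a single-process illustration only: it has no network, no consensus protocol, and no cryptographic signatures, and it is not the framework described in the paper. The names `Ledger`, `block_hash`, and `can_access`, and the record fields (`type`, `study`, `grantee`), are assumptions made for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, payload):
    """Hash a block's contents deterministically (sorted keys, SHA-256)."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only list of blocks, each committing to the previous block's hash."""

    def __init__(self):
        self.blocks = []

    def append(self, payload):
        index = len(self.blocks)
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS_HASH
        block = {
            "index": index,
            "prev": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
        }
        self.blocks.append(block)
        return block

    def is_valid(self):
        """Recompute every hash; any edit to an earlier block breaks the chain."""
        prev_hash = GENESIS_HASH
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev"] != prev_hash:
                return False
            if block["hash"] != block_hash(i, prev_hash, block["payload"]):
                return False
            prev_hash = block["hash"]
        return True

    def can_access(self, study_id, requester):
        """Replay the ledger; the latest grant or revoke for this pair wins."""
        allowed = False
        for block in self.blocks:
            p = block["payload"]
            if p.get("study") == study_id and p.get("grantee") == requester:
                if p.get("type") == "grant":
                    allowed = True
                elif p.get("type") == "revoke":
                    allowed = False
        return allowed


# Hypothetical usage: register a study, then grant and revoke access to it.
ledger = Ledger()
ledger.append({"type": "study", "study": "study-001", "owner": "patient-42"})
ledger.append({"type": "grant", "study": "study-001", "grantee": "clinic-b"})
print(ledger.can_access("study-001", "clinic-b"))
ledger.append({"type": "revoke", "study": "study-001", "grantee": "clinic-b"})
print(ledger.can_access("study-001", "clinic-b"))
```

Because permissions are derived by replaying the log, any party holding a valid copy of the chain reaches the same answer without asking an intermediary, which is the property the abstract attributes to the blockchain approach.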

Analyzing large-scale Earth Observation data repositories made simple with OpenEO Platform

10.5194/egusphere-egu21-9602 ◽

2021 ◽

Author(s):

Edzer Pebesma ◽

Patrick Griffiths ◽

Christian Briese ◽

Alexander Jacob ◽

Anze Skerlevaj ◽

...

Keyword(s):

Large Scale ◽

Virtual Machines ◽

Google Earth ◽

Earth Observation ◽

Data Cube ◽

Observation Data ◽

Data Repositories ◽

Software Ecosystem ◽

Wide Range ◽

Earth Observation Data

The openEO API allows the analysis of large amounts of Earth Observation data using a high-level abstraction of data and processes. Rather than focusing on the management of virtual machines and millions of imagery files, it allows users to create jobs that take a spatio-temporal section of an image collection (such as Sentinel L2A) and treat it as a data cube. Processes iterate or aggregate over pixels, spatial areas, spectral bands, or time series, while working at arbitrary spatial resolution. This pattern, pioneered by Google Earth Engine™ (GEE), lets the user focus on the science rather than on data management.
The openEO H2020 project (2017-2020) has developed the API as well as an ecosystem of software around it, including clients (JavaScript, Python, R, QGIS, browser-based), back-ends that translate API calls into existing image analysis or GIS software or services (for Sentinel Hub, WCPS, Open Data Cube, GRASS GIS, GeoTrellis/GeoPySpark, and GEE), and a hub that allows querying and searching openEO providers for their capabilities and datasets. The project demonstrated this software in a number of use cases, where identical processing instructions were sent to different implementations, allowing comparison of returned results.
A follow-up, ESA-funded project, “openEO Platform”, realizes the API and progresses the software ecosystem into operational services and applications that are accessible to everyone, that involve federated deployment (using the clouds managed by EODC, Terrascope, CreoDIAS and EuroDataCube), that will provide payment models (“pay per compute job”) conceived and implemented following the needs of the user community, and that will use the EOSC (European Open Science Cloud) marketplace for dissemination and authentication.
A wide range of large-scale case studies will demonstrate the ability of the openEO Platform to scale to large data volumes. The case studies to be addressed include on-demand ARD generation for SAR and multi-spectral data, agricultural demonstrators such as crop type and condition monitoring, forestry services such as near-real-time forest damage assessment and canopy cover mapping, environmental hazard monitoring of floods and air pollution, and security applications in terms of vessel detection in the Mediterranean Sea.
While the landscape of cloud-based EO platforms and services has matured and diversified over the past decade, we believe there are strong advantages for scientists and government agencies to adopt the openEO approach. Beyond the absence of vendor/platform lock-in or EULAs, we mention the abilities to (i) run arbitrary user code (e.g. written in R or Python) close to the data, (ii) carry out scientific computations on an entirely open source software stack, (iii) integrate different platforms (e.g., different cloud providers offering different datasets), and (iv) help create and extend this software ecosystem. openEO uses the OpenAPI standard, aligns with modern OGC API standards, and uses the STAC (SpatioTemporal Asset Catalog) to describe image collections and image tiles.
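
The data cube abstraction described above (processes that aggregate over time series or combine spectral bands per pixel) can be sketched in plain Python. This does not use the openEO API or any of its clients; the nested-list layout `cube[t][band][y][x]`, the function names `reduce_time` and `normalized_difference`, and the pixel values are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical cube indexed as cube[t][band][y][x]:
# two time steps, two bands (0 = red, 1 = near-infrared), a 1 x 2 pixel grid.
CUBE = [
    [[[0.25, 0.5]], [[0.5, 1.0]]],  # t0: red, nir
    [[[0.75, 0.5]], [[1.0, 1.0]]],  # t1: red, nir
]


def reduce_time(cube, reducer=mean):
    """Collapse the time axis: result[band][y][x] = reducer(values over t)."""
    n_bands = len(cube[0])
    n_y = len(cube[0][0])
    n_x = len(cube[0][0][0])
    return [
        [
            [reducer([step[b][y][x] for step in cube]) for x in range(n_x)]
            for y in range(n_y)
        ]
        for b in range(n_bands)
    ]


def normalized_difference(a, b):
    """Per-pixel band math on two 2-D grids: (a - b) / (a + b), or 0.0 if a + b == 0."""
    return [
        [(pa - pb) / (pa + pb) if (pa + pb) else 0.0 for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


# Temporal mean composite, then a normalized difference of the two bands
# (with near-infrared and red, this is the NDVI vegetation index).
composite = reduce_time(CUBE)
ndvi = normalized_difference(composite[1], composite[0])
print(ndvi)
```

The user describes what to compute along each dimension, and the back-end decides how to read, tile, and distribute the underlying files, which is the separation of concerns the abstract credits to this pattern.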

Grid Data Handling

IT Policy and Ethics ◽

10.4018/978-1-4666-2919-6.ch014 ◽

2013 ◽

pp. 294-321

Author(s):

Alexandru Costan

Keyword(s):

Fault Tolerance ◽

Data Storage ◽

Large Scale ◽

File Systems ◽

Future Research ◽

Distributed Data ◽

Data Handling ◽

Grid Data ◽

Distributed Data Storage ◽

Grid Environments

To accommodate the needs of large-scale distributed systems, scalable data storage and management strategies are required, allowing applications to efficiently cope with continuously growing, highly distributed data. This chapter addresses the key issues of data handling in grid environments, focusing on storing, accessing, managing and processing data. We start by providing the background for the data storage issue in grid environments. We outline the main challenges addressed by distributed storage systems: high availability, which translates into high resilience and consistency, corruption handling regarding arbitrary faults, fault tolerance, asynchrony, fairness, access control and transparency. The core part of the chapter presents how existing solutions cope with these demanding requirements. The most important research results are organized along several themes: grid data storage, distributed file systems, data transfer and retrieval, and data management. Important characteristics such as performance, efficient use of resources, fault tolerance, security, and others are strongly determined by the adopted system architectures and the technologies behind them. For each topic, we briefly present previous work, describe the most recent achievements, highlight their advantages and limitations, and indicate future research trends in distributed data storage and management.

New Frontiers for E-Learning in Education

Optimizing Student Engagement in Online Learning Environments - Advances in Educational Technologies and Instructional Design ◽

10.4018/978-1-5225-3634-5.ch010 ◽

2018 ◽

pp. 220-240

Author(s):

Mohammad Zubair Khan ◽

Yasser M. Alginahi

Keyword(s):

Big Data ◽

Large Scale ◽

Data Repositories ◽

Useful Knowledge ◽

Leading Role ◽

Large Scale Data ◽

E Learning ◽

Base Management ◽

Wide Group ◽

Scale Data

Big Data research is playing a leading role in investigating a wide range of issues emerging primarily from Database, Data Warehousing, and Data Mining research. Analytics research is intended to develop complex procedures running over large-scale data repositories with the objective of extracting useful knowledge hidden in such repositories. One of the most significant application scenarios in which Big Data arises is, without doubt, scientific computing. Here, scientists and researchers produce huge amounts of data every day through experiments (e.g., in disciplines such as high-energy physics, space science, and bioinformatics). Nevertheless, extracting useful knowledge for decision-making purposes from these huge, large-scale data repositories is practically impossible for actual Data Base Management Systems (DBMS) and DBMS-inspired analysis tools.

Pattern Recognition for Large-Scale Data Processing

Strategic Data-Based Wisdom in the Big Data Era - Advances in Knowledge Acquisition, Transfer, and Management ◽

10.4018/978-1-4666-8122-4.ch011 ◽

2015 ◽

pp. 198-208 ◽

Cited By ~ 2

Author(s):

Amir Basirat ◽

Asad I. Khan ◽

Heinz W. Schmidt

Keyword(s):

Large Scale ◽

Distributed Processing ◽

Data Sets ◽

Distributed Data ◽

Time Data ◽

Deterministic Learning ◽

Large Scale Data ◽

Future Data ◽

Large Scale Data Processing ◽

Learning Schemes

One of the main challenges for large-scale computer clouds dealing with massive real-time data is coping with the rate at which unprocessed data accumulates. Transforming big data into valuable information requires a fundamental rethink of the way in which future data management models will need to be developed on the Internet. Unlike the existing relational schemes, pattern-matching approaches can analyze data in ways similar to how the brain links information. Such interactions, when implemented in voluminous data clouds, can assist in finding overarching relations in complex and highly distributed data sets. In this chapter, a different perspective on data recognition is considered. Rather than looking at conventional approaches, such as statistical computations and deterministic learning schemes, this chapter focuses on a distributed processing approach for scalable data recognition and processing.
