Tools for the Storage and Analysis of Spatial Big Data

Proceedings of 10th International Conference "Environmental Engineering" ◽

10.3846/enviro.2017.216 ◽

2017 ◽

Author(s):

Przemysław Lisowski ◽

Adam Piórkowski ◽

Andrzej Lesniak

Keyword(s):

Big Data ◽

Spatial Data ◽

File Systems ◽

Large Datasets ◽

Distributed File Systems ◽

Data Systems ◽

Data Production ◽

Spatial Big Data ◽

Big Data Systems ◽

Access To Data

Storing large amounts of spatial data in GIS systems is problematic. This problem is growing due to ever-increasing data production from a variety of data sources. The phenomenon of collecting huge amounts of data is called Big Data. Existing solutions are capable of processing and storing large volumes of spatial data. These solutions also show new approaches to data processing. Conventional techniques work with ordinary data but are not suitable for large datasets. They operate efficiently only when combined with distributed file systems and algorithms able to reduce tasks. This review focuses on the characteristics of large spatial data and discusses opportunities offered by spatial big data systems. The work also draws attention to the problems of indexing and access to data, and to proposed solutions in this area.
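As a rough illustration of the indexing problem the abstract mentions, the sketch below builds a uniform grid index over 2D points and answers a rectangular range query by visiting only the overlapping cells. It is not taken from the reviewed paper; the class name `GridIndex`, the `cell_size` parameter, and the sample points are all hypothetical, and a production system would more likely use an R-tree or a partitioned index on a distributed file system.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Toy uniform-grid index over 2D points (illustrative only)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        # Maps a (col, row) cell to the list of (x, y, payload) stored in it.
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Identify the grid cell that contains the point (x, y).
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, x, y, payload):
        self.cells[self._cell(x, y)].append((x, y, payload))

    def range_query(self, xmin, ymin, xmax, ymax):
        """Return payloads of points inside the closed query rectangle."""
        c0, r0 = self._cell(xmin, ymin)
        c1, r1 = self._cell(xmax, ymax)
        hits = []
        # Only cells overlapping the rectangle are scanned.
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                for x, y, payload in self.cells.get((col, row), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(payload)
        return hits


index = GridIndex(cell_size=10.0)
index.insert(3.0, 4.0, "a")
index.insert(12.5, 7.0, "b")
index.insert(95.0, 95.0, "c")
result = sorted(index.range_query(0.0, 0.0, 15.0, 10.0))
```

The query touches four cells rather than every stored point, which is the basic reason spatial indexes matter once datasets no longer fit a simple scan.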


Modeling of distributed file system in big data storage by Event-B

MATEC Web of Conferences ◽

10.1051/matecconf/201821004042 ◽

2018 ◽

Vol 210 ◽

pp. 04042

Author(s):

Ammar Alhaj Ali ◽

Pavel Varacha ◽

Said Krayem ◽

Roman Jasek ◽

Petr Zacek ◽

...

Keyword(s):

Big Data ◽

Data Storage ◽

High Performance ◽

File System ◽

Formal Method ◽

File Systems ◽

Distributed File System ◽

Distributed File Systems ◽

Data Systems ◽

Big Data Systems

Nowadays, a wide range of systems and applications, especially in high-performance computing, depend on distributed environments to process and analyze huge amounts of data. As the amount of data increases enormously, providing efficient, scalable, and reliable storage solutions has become one of the major issues for scientific computing. The storage solution used by big data systems is the distributed file system (DFS), which builds a hierarchical and unified view of multiple file servers and shares on the network. In this paper we present the Hadoop Distributed File System (HDFS) as the DFS in big data systems and Event-B as a formal method for modeling it. Event-B is a mature formal method that has been widely used in industry projects in a number of domains, such as automotive, transportation, space, business information, and medical devices. We also propose using Rodin as the modeling tool for Event-B: it integrates modeling and proving, and because the Rodin platform is open source, it supports a large number of plug-in tools.


QOS MANAGEMENT IN REAL-TIME SPATIAL BIG DATA USING FEEDBACK CONTROL SCHEDULING

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsannals-ii-3-w5-243-2015 ◽

2015 ◽

Vol II-3/W5 ◽

pp. 243-248 ◽

Cited By ~ 2

Author(s):

S. Hamdi ◽

E. Bouazizi ◽

S. Faiz

Keyword(s):

Big Data ◽

Feedback Control ◽

Real Time ◽

Spatial Data ◽

Large Scale ◽

The Real ◽

Remote Sensors ◽

Spatial Big Data ◽

Active Research ◽

Access To Data

A Geographic Information System (GIS) is a computer system designed to capture, store, manipulate, analyze, manage, and present all types of spatial data. Spatial data, whether captured through remote sensors or large-scale simulations, has always been big and heterogeneous. The issues of real-time constraints and heterogeneity are extremely important for effective decision making. Thus, heterogeneous real-time spatial data management has become a very active research domain. Existing research has principally focused on querying real-time spatial data and their updates. However, the unpredictability of access to data keeps the behavior of the real-time GIS unstable. In this paper, we propose the use of real-time Spatial Big Data and define a new architecture called FCSA-RTSBD (Feedback Control Scheduling Architecture for Real-Time Spatial Big Data). The main objectives of this architecture are the following: take into account the heterogeneity of data, guarantee data freshness, improve the deadline miss ratio even in the presence of conflicts and unpredictable workloads, and finally satisfy the requirements of users by improving the quality of service (QoS).
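The general idea of feedback control scheduling can be sketched as a loop that measures the deadline miss ratio and adjusts how much work is admitted. The snippet below is a minimal proportional controller written under that assumption; it is not the FCSA-RTSBD architecture itself, and the function name `adjust_admission`, the `gain` value, the target ratio, and the sample measurements are all hypothetical.

```python
def adjust_admission(admit_fraction, measured_miss_ratio, target_miss_ratio, gain=0.5):
    """One step of a proportional controller on the admission fraction.

    A miss ratio above the target lowers the fraction of incoming
    transactions that are admitted; a ratio below it raises the fraction.
    """
    error = target_miss_ratio - measured_miss_ratio
    new_fraction = admit_fraction + gain * error
    # Keep the fraction within the valid range [0, 1].
    return max(0.0, min(1.0, new_fraction))


# Hypothetical miss ratios observed over four sampling periods.
fraction = 1.0
history = []
for miss in [0.30, 0.20, 0.10, 0.05]:
    fraction = adjust_admission(fraction, miss, target_miss_ratio=0.05)
    history.append(round(fraction, 3))
```

With these sample values the admitted fraction falls while the system is overloaded and stops changing once the measured ratio reaches the target, which is the stabilizing behavior a feedback loop is meant to provide under unpredictable workloads.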


The Need to Consider Hardware Selection when Designing Big Data Applications Supported by Metadata

Big Data Management, Technologies, and Applications - Advances in Data Mining and Database Management ◽

10.4018/978-1-4666-4699-5.ch015 ◽

2013 ◽

pp. 381-396 ◽

Cited By ~ 2

Author(s):

Nathan Regola ◽

David A. Cieslak ◽

Nitesh V. Chawla

Keyword(s):

Cloud Computing ◽

Big Data ◽

Large Volume ◽

Virtual Machines ◽

Large Datasets ◽

Data Systems ◽

Big Data Applications ◽

Component Systems ◽

Big Data Systems ◽

Selection Of

The selection of hardware to support big data systems is complex. Even defining the term “big data” is difficult. “Big data” can mean a large volume of data in a database, a MapReduce cluster that processes data, analytics and reporting applications that must access large datasets to operate, algorithms that can effectively operate on large datasets, or even basic scripts that produce a needed result by leveraging data. Big data systems can be composed of many component systems. For these reasons, it appears difficult to create a universal, representative benchmark that approximates a “big data” workload. Along with the trend to utilize large datasets and sophisticated tools to analyze data, the trend of cloud computing has emerged as an effective method of leasing compute time. This chapter explores some of the issues at the intersection of virtualized computing (since cloud computing often uses virtual machines), metadata stores, and big data. Metadata is important because it enables many applications and users to access datasets and effectively use them without relying on extensive knowledge from humans about the data.
