Big Data access and infrastructure for modern biology: case studies in data repository utility

2016 ◽  
Vol 1387 (1) ◽  
pp. 112-123 ◽  
Author(s):  
Nathan C. Boles ◽  
Tyler Stone ◽  
Charles Bergeron ◽  
Thomas R. Kiehl
2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Mahdi Torabzadehkashi ◽  
Siavash Rezaei ◽  
Ali HeydariGorji ◽  
Hosein Bobarshad ◽  
Vladimir Alves ◽  
...  

Abstract. In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data originally reside in storage systems; to process them, application servers must fetch them from storage devices, which imposes a data-movement cost on the system. This cost is directly related to the distance between the processing engines and the data, which is the key motivation for distributed processing platforms such as Hadoop that move processing closer to the data. Computational storage devices (CSDs) push the “move processing to data” paradigm to its ultimate boundary by deploying embedded processing engines inside storage devices. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment for in-place data processing. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access to applications, so a vast spectrum of applications can be ported to run on Catalina CSDs. Thanks to these unique features, to the best of our knowledge, Catalina is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in place, without any modification to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a platform equipped with 16 Catalina CSDs, and run the Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and a 4.3× reduction in energy consumption for the Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms improve by up to 5.4× and 8.9×, respectively.
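The “move processing to data” idea described above can be illustrated with a toy MapReduce-style word count. This is a minimal sketch, not Catalina's actual API (which the abstract does not detail): each shard is counted where it resides, and only the small per-shard partial results travel to the host for merging.

```python
from collections import Counter
from functools import reduce

# Each storage node holds a local shard of the text corpus.
# In a CSD deployment, map_phase would run on the drive's
# embedded processor; here both phases run in one process.
shards = [
    "big data needs in place processing",
    "move processing to data not data to processing",
]

def map_phase(shard):
    """Run where the shard lives: count words locally, so raw
    data never has to be shipped to the host."""
    return Counter(shard.split())

def reduce_phase(partials):
    """Run on the host: merge the small per-shard counters."""
    return reduce(lambda a, b: a + b, partials, Counter())

counts = reduce_phase(map_phase(s) for s in shards)
```

Only the compact `Counter` objects cross the storage boundary; the shards themselves stay put, which is the cost the abstract attributes to data movement.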


2021 ◽  

The use of big data is becoming increasingly important across the tourism sector and its value chain. With this publication, UNWTO intends to provide baseline research on the use of big data by tourism and culture stakeholders, in order to improve the competitiveness of cultural tourism and reinforce its sustainability. The study lays the groundwork for connecting tourism, culture, and new technologies for mutual benefit, while calling for reflection on the ethical implications for policymakers, businesses, and end-users. The selected case studies, compiled during the research, illustrate the most frequent scenarios for the use of big data in cultural tourism within destinations. As these technologies face ever-evolving scenarios, the tourism sector will harness them in its endeavour to innovate and provide new cultural experiences.


2016 ◽  
Vol 2016 ◽  
pp. 1-10
Author(s):  
Ningyu Zhang ◽  
Huajun Chen ◽  
Xi Chen ◽  
Jiaoyan Chen

In recent years, the rapid progress of urban computing has generated big data, which creates both opportunities and challenges. The heterogeneity and sheer volume of the data, together with the large gap between the physical and virtual worlds, make it difficult to solve practical urban computing problems quickly. In this paper, we propose a general application framework based on ELM for urban computing. We present several real case studies of the framework, including smog-related health hazard prediction and optimal retail store placement. Experiments on urban data from China demonstrate the efficiency, accuracy, and flexibility of the proposed framework.
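Assuming ELM here refers to the Extreme Learning Machine (a single hidden layer with random, fixed weights, where only the output weights are solved by least squares), the core training step can be sketched as follows. The sine-regression target is a toy stand-in for prediction tasks such as smog-related hazard forecasting; the hidden-layer size and seed are illustrative choices.

```python
import numpy as np

def elm_train(X, y, n_hidden=64, seed=0):
    """Train a basic ELM regressor: random fixed hidden layer,
    output weights fit by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)                 # random biases (never trained)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy task: fit a noisy sine curve
X = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * np.random.default_rng(1).normal(size=200)
W, b, beta = elm_train(X, y)
pred = elm_predict(X, W, b, beta)
```

Because the hidden layer is never trained, fitting reduces to one linear least-squares solve, which is what gives ELM the speed and flexibility the abstract emphasizes.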


2018 ◽  
Vol 1 ◽  
pp. 1-5
Author(s):  
David Fairbairn

The use of maps and other geovisualisation methods has a long history in archaeology. Archaeologists employ advanced contemporary tools in their data collection, analysis, and presentation. Maps can be used to render the ‘big data’ commonly collected by archaeological prospection techniques, but they are also fundamental output instruments for the dissemination of archaeological interpretation and modelling. Through case studies, this paper addresses alternative methods of geovisualisation in archaeology and identifies the efficiencies of each.


Author(s):  
Vinay Kumar ◽  
Arpana Chaturvedi

With the advent of Social Networking Sites (SNS), large volumes of data are generated daily. Most of these data are multimedia and unstructured, and they grow exponentially. This exponential growth in the variety, volume, and complexity of structured and unstructured data leads to the concept of big data. Managing big data and harnessing its benefits is a real challenge. With increasing access to big data repositories for various applications, security and access control are further aspects that must be considered when managing big data. We discuss the application areas of big data, the opportunities it provides, and the challenges faced in managing such huge amounts of data for various applications. Issues related to securing big data against different threat perceptions are also discussed.


2020 ◽  
Vol 1 ◽  
pp. 1-23
Author(s):  
Majid Hojati ◽  
Colin Robertson

Abstract. With new forms of digital spatial data driving new applications for monitoring and understanding environmental change, there are growing demands on traditional GIS tools for spatial data storage, management, and processing. Discrete Global Grid Systems (DGGS) are methods for tessellating the globe into multiresolution grids; they represent a global spatial fabric capable of storing heterogeneous spatial data and offer improved performance in data access, retrieval, and analysis. While DGGS-based GIS may hold potential for next-generation big data GIS platforms, few studies have tried to implement them as a framework for operational spatial analysis. Cellular Automata (CA) is a classic dynamic modeling framework that has been used with the traditional raster data model for environmental modeling tasks such as wildfire and urban-expansion modeling. The main objectives of this paper are to (i) investigate the possibility of using a DGGS for running dynamic spatial analysis, (ii) evaluate CA as a generic data model for modeling dynamic phenomena within a DGGS data model, and (iii) evaluate an in-database approach to CA modelling. To this end, a case study in wildfire spread modelling is developed. The results demonstrate that a DGGS data model not only provides the ability to integrate different data sources but also provides a framework for spatial analysis without geometry-based operations. This yields a simplified architecture and a common spatial fabric to support the development of a wide array of spatial algorithms. While considerable work remains to be done, CA modelling within a DGGS-based GIS is a robust and flexible modelling framework for big-data GIS analysis in an environmental monitoring context.
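The CA wildfire case study can be illustrated with a minimal grid-based sketch. For brevity this uses a plain 2D raster rather than a DGGS tessellation, and a deterministic spread rule rather than the paper's actual model: each step, burning cells ignite their fuel-bearing 4-neighbours and then burn out.

```python
import numpy as np

# Cell states for the wildfire CA
EMPTY, FUEL, BURNING, BURNED = 0, 1, 2, 3

def spread_step(grid):
    """One synchronous CA update: every BURNING cell ignites its
    FUEL 4-neighbours, then transitions to BURNED."""
    new = grid.copy()
    for r, c in np.argwhere(grid == BURNING):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < grid.shape[0] and 0 <= cc < grid.shape[1]:
                if grid[rr, cc] == FUEL:
                    new[rr, cc] = BURNING
        new[r, c] = BURNED
    return new

grid = np.full((9, 9), FUEL, dtype=int)  # uniform fuel load
grid[4, 4] = BURNING                     # ignition point
for _ in range(4):                       # four synchronous updates
    grid = spread_step(grid)
```

With this rule the fire front advances one cell of Manhattan distance per step, producing a diamond-shaped burn scar; in a DGGS setting the same transition rule would be applied over cell neighbourhoods of the global grid instead of raster rows and columns.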

