Large-Scale Data Computing Performance Comparisons on SYCL Heterogeneous Parallel Processing Layer Implementations

2020 ◽  
Vol 10 (5) ◽  
pp. 1656
Author(s):  
Woosuk Shin ◽  
Kwan-Hee Yoo ◽  
Nakhoon Baek

Today, many big data applications require massively parallel tasks to compute complicated mathematical operations. To perform such tasks, platforms like CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are widely used and developed to enhance the throughput of massively parallel workloads. There is also a need for high-level abstractions and platform independence over these massively parallel computing platforms. Recently, the Khronos Group announced SYCL (C++ Single-source Heterogeneous Programming for OpenCL), a new cross-platform abstraction layer, to provide an efficient way of single-source heterogeneous computing with C++-template-level abstractions. However, since there is no official implementation of SYCL, several different implementations are currently available from various vendors. In this paper, we analyse the characteristics of those SYCL implementations. We also present performance measurements for those implementations, especially on well-known massively parallel tasks. We show that each implementation has its own strengths in computing different types of mathematical operations on different sizes of data. Our analysis provides fundamental measurements for the cost-effective, abstraction-level use of massively parallel computation, especially in big-data applications.
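The cross-implementation comparison described above reduces to timing the same kernel across backends and data sizes. A minimal, illustrative harness for that kind of measurement is sketched below; the `benchmark` helper and the element-wise-square "kernel" are hypothetical stand-ins (written in Python for brevity), not the paper's actual SYCL/C++ benchmarks.

```python
import time

def benchmark(kernel, sizes, repeats=3):
    """Time kernel(data) at several input sizes, keeping the best of `repeats` runs."""
    results = {}
    for n in sizes:
        data = list(range(n))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(data)
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results

# Example "kernel": an element-wise square, standing in for a device kernel.
timings = benchmark(lambda xs: [x * x for x in xs], sizes=[1_000, 10_000, 100_000])
for n in sorted(timings):
    print(f"n={n:>7}: {timings[n]:.6f} s")
```

Keeping the best of several repeats reduces noise from warm-up and scheduling jitter, which matters when comparing implementations whose differences shrink at small data sizes.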

Author(s):  
Joaquin Vanschoren ◽  
Ugo Vespier ◽  
Shengfa Miao ◽  
Marvin Meeng ◽  
Ricardo Cachucho ◽  
...  

Sensors are increasingly being used to monitor the world around us. They measure movements of structures such as bridges, windmills, and plane wings, as well as human vital signs, atmospheric conditions, and fluctuations in power and water networks. In many cases, this results in large networks with different types of sensors, generating impressive amounts of data. As the volume and complexity of the data increase, their effective use becomes more challenging, and novel solutions are needed on both technical and scientific levels. Grounded in several real-world applications, this chapter discusses the challenges involved in large-scale sensor data analysis and describes practical solutions to address them. Due to the sheer size of the data and the large amount of computation involved, these are clearly “Big Data” applications.


Web Services ◽  
2019 ◽  
pp. 953-978
Author(s):  
Krishnan Umachandran ◽  
Debra Sharon Ferdinand-James

Continued technological advancements of the 21st century afford massive data generation in sectors of our economy, including the domains of agriculture, manufacturing, and education. However, harnessing such large-scale data with modern technologies for effective decision-making is an evolving science that requires knowledge of Big Data management and analytics. Big data in agriculture, manufacturing, and education take varied forms, such as voluminous text, images, and graphs. Applying big data science techniques (e.g., functional algorithms) to extract intelligence affords decision-makers quick responses to productivity, market-resilience, and student-enrollment challenges in today's unpredictable markets. This chapter employs data science for potential solutions to Big Data applications in the sectors of agriculture, manufacturing, and, to a lesser extent, education, using modern technological tools such as Hadoop, Hive, Sqoop, and MongoDB.


2013 ◽  
Vol 10 (11) ◽  
pp. 14535-14555
Author(s):  
L. Chen ◽  
Y. Zhong ◽  
G. Wei ◽  
Z. Shen

Abstract. The identification of priority management areas (PMAs) is essential for the control of non-point source (NPS) pollution, especially in a large-scale watershed. However, previous studies have typically focused on small-scale catchments adjacent to specific assessment points; thus, the interactions between multiple river points remain poorly understood. In this study, a multiple-assessment-point PMA (MAP-PMA) framework was proposed by integrating the upstream sources and the downstream transport aspects of NPS pollution. Based on the results, integrating upstream input changes was vital for the final PMA map, especially for downstream areas. Contrary to conventional wisdom, this research recommends that NPS pollutants can best be controlled among the upstream high-level PMAs when protecting the water quality of the entire watershed. The MAP-PMA framework provides a more cost-effective tool for the establishment of conservation practices, especially in a large-scale watershed.


Author(s):  
Manujakshi B. C ◽  
K. B. Ramesh

With the increasing adoption of sensor-based applications, there is an exponential rise in sensory data that eventually takes the shape of big data. However, the practicality of executing high-end analytical operations over resource-constrained big data has never been studied closely. A review of existing approaches shows that there is no cost-effective scheme for big data analytics over large-scale sensory data processing that can be used directly as a service. Therefore, the proposed system introduces a holistic architecture in which knowledge extracted from streamed data can be offered in the form of services. Implemented in MATLAB, the proposed study uses a very simplistic approach, considering the energy constraints of the sensor nodes, and finds that the proposed system offers better accuracy, reduced mining duration (i.e., faster response time), and reduced memory dependencies, proving that it offers a cost-effective analytical solution in contrast to existing systems.


2020 ◽  
Vol 245 ◽  
pp. 03032
Author(s):  
Alexey Anisenkov ◽  
Julia Andreeva ◽  
Alessandro Di Girolamo ◽  
Panos Paparrigopoulos ◽  
Boris Vasilev

CRIC is a high-level information system which provides a flexible, reliable and complete topology and configuration description for a large-scale distributed heterogeneous computing infrastructure. CRIC aims to facilitate distributed computing operations for the LHC experiments and consolidate WLCG topology information. It aggregates information coming from various low-level information sources and complements the topology description with experiment-specific data structures and settings required by the LHC VOs in order to exploit computing resources. Being an experiment-oriented but still experiment-independent information middleware, CRIC offers a generic solution, in the form of a suitable framework with appropriate interfaces implemented, which can be applied at the global WLCG level or at the level of a particular LHC experiment; for example, there are CRIC instances for CMS[11] and ATLAS[10]. CRIC can even be used for special tasks: a dedicated CRIC instance has been built to support the transfer tests performed by the DOMA Third Party Copy working group. Moreover, the extensibility and flexibility of the system allow CRIC to follow technology evolution and easily implement concepts required to describe new types of computing and storage resources. This contribution describes the overall CRIC architecture and the plug-in-based implementation of the CRIC components, as well as recent developments and future plans.


Author(s):  
Bunjamin Memishi ◽  
Shadi Ibrahim ◽  
Maria S. Perez ◽  
Gabriel Antoniu

MapReduce has become a relevant framework for Big Data processing in the cloud. In large-scale clouds, failures do occur and may incur unwanted performance degradation for Big Data applications. As the reliability of MapReduce depends on how well failures are detected and handled, this book chapter investigates the problem of failure detection in the MapReduce framework. The case studies of this contribution reveal that the current static timeout value is not adequate and demonstrate significant variations in the application's response time under different timeout values. While arguing that comparatively little attention has been devoted to failure detection in the framework, the chapter presents design ideas for a new adaptive timeout.
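The adaptive-timeout idea can be illustrated with an estimator in the spirit of TCP's retransmission-timer smoothing (the Jacobson/Karels SRTT/RTTVAR rules). The class name, parameters, and defaults below are assumptions for the sake of a self-contained sketch, not the chapter's actual design:

```python
class AdaptiveTimeout:
    """Failure-detection timeout that adapts to observed heartbeat intervals,
    using exponentially weighted moving averages of the interval and its
    deviation (as in TCP's RTT estimator). Illustrative sketch only."""

    def __init__(self, initial=10.0, alpha=0.125, beta=0.25, k=4.0):
        self.srtt = initial          # smoothed heartbeat interval
        self.rttvar = initial / 2    # smoothed deviation
        self.alpha, self.beta, self.k = alpha, beta, k

    def observe(self, sample):
        """Fold one observed heartbeat interval (seconds) into the estimate."""
        self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - sample)
        self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        return self.timeout()

    def timeout(self):
        """Declare a worker suspect if no heartbeat arrives within this bound."""
        return self.srtt + self.k * self.rttvar

detector = AdaptiveTimeout(initial=3.0)
for s in [3.0, 3.2, 2.9, 6.0]:   # a slow heartbeat widens the timeout
    detector.observe(s)
print(round(detector.timeout(), 2))
```

Unlike a static timeout, the bound tightens when heartbeats are regular (fast detection) and widens under load spikes (fewer false suspicions), which is exactly the trade-off the chapter's case studies expose.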


Marine Drugs ◽  
2020 ◽  
Vol 18 (11) ◽  
pp. 523 ◽  
Author(s):  
Van Bon Nguyen ◽  
Dai Nam Nguyen ◽  
Anh Dzung Nguyen ◽  
Van Anh Ngo ◽  
That Quang Ton ◽  
...  

This study aimed to establish a culture process for the cost-effective production of prodigiosin (PG) via fermentation of demineralized crab shell powder (de-CSP), a fishery processing byproduct. Among the tested PG-producing strains, Serratia marcescens TNU02 was demonstrated to be the most active. Various protein/de-CSP ratios were used as the C/N sources for PG biosynthesis, and the PG yield was significantly enhanced when the casein/de-CSP ratio was controlled in the range of 3/7 to 4/6. TNU02 produced PG at a high yield (5100 mg/L) in a 15 L bioreactor system containing 4.5 L of a newly designed liquid medium containing 1.6% C/N source (protein/de-CSP ratio of 3/7), 0.02% (NH4)2SO4, and 0.1% K2HPO4, at an initial pH of 6.15 and 27 °C, for 8 h in dark conditions. The red pigment was purified from the culture broth and identified as PG by Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) and UV spectral analysis. The purified PG demonstrated moderate antioxidant activity and effective inhibition against four cancerous cell lines. Notably, this study was the first to report the use of crab wastes for PG bioproduction with high productivity (5100 mg/L) at a large scale (4.5 L per pilot batch) in a short fermentation time (8 h). The salt composition, including (NH4)2SO4 and K2HPO4, was also a novel finding for the enhancement of PG yield by S. marcescens in this report.


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Chia-Hui Huang ◽  
Keng-Chieh Yang ◽  
Han-Ying Kao

Big data is a current trend with significant impacts on information technologies. In big data applications, one of the main concerns is dealing with large-scale data sets, which often require computation resources provided by public cloud services; analyzing big data efficiently thus becomes a major challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. The SSVM was recently proposed as an alternative to the standard SVM and has been proved more efficient than the traditional SVM in processing large-scale data. In addition, a soft margin method is proposed to modify the excursion of the separation margin and to remain effective in the gray zone, where the distribution of the data becomes hard to describe and the separation margin between classes is unclear.
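The "smooth" in SSVM refers to replacing the non-differentiable plus function max(x, 0) in the SVM objective with the smooth approximation p(x, α) = x + (1/α)·ln(1 + e^(−αx)) (Lee and Mangasarian), so that fast Newton-type solvers apply. A small illustrative sketch of that smoothing, with hypothetical function names:

```python
import math

def plus(x):
    """The plus function (x)_+ = max(x, 0), non-differentiable at 0."""
    return max(x, 0.0)

def smooth_plus(x, alpha=5.0):
    """SSVM smoothing p(x, a) = x + (1/a)*ln(1 + exp(-a*x)).
    Smooth everywhere, and converges to max(x, 0) as alpha grows."""
    return x + math.log1p(math.exp(-alpha * x)) / alpha

for x in (-1.0, 0.0, 1.0):
    print(x, plus(x), round(smooth_plus(x, alpha=10.0), 4))
```

Note that `math.exp(-alpha * x)` overflows for strongly negative x at large alpha; a production implementation would branch on the sign of x. The same smoothing trick is what makes the SSVM tractable on the large-scale data sets the paper targets.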

