Large-Scale Data Computing Performance Comparisons on SYCL Heterogeneous Parallel Processing Layer Implementations

2020 ◽  
Vol 10 (5) ◽  
pp. 1656
Author(s):  
Woosuk Shin ◽  
Kwan-Hee Yoo ◽  
Nakhoon Baek

Today, many big data applications require massively parallel tasks to compute complicated mathematical operations. To perform such tasks, platforms like CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) are widely used and developed to enhance the throughput of massively parallel workloads. There is also a need for high-level abstractions and platform independence over these massively parallel computing platforms. Recently, the Khronos Group announced SYCL (C++ Single-source Heterogeneous Programming for OpenCL), a new cross-platform abstraction layer, to provide an efficient way of single-source heterogeneous computing with C++-template-level abstractions. However, since there is no official implementation of SYCL, several different implementations are currently available from various vendors. In this paper, we analyse the characteristics of those SYCL implementations. We also present performance measurements for those implementations, especially on well-known massively parallel tasks. We show that each implementation has its own strengths in computing different types of mathematical operations on different sizes of data. Our analysis provides fundamental measurements for the cost-effective, abstraction-level use of massively parallel computation, especially in big-data applications.
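The cross-implementation comparison described above reduces to timing the same kernel across backends and data sizes. A minimal, illustrative harness for that kind of measurement is sketched below; the `benchmark` helper and the element-wise-square "kernel" are hypothetical stand-ins (written in Python for brevity), not the paper's actual SYCL/C++ benchmarks.

```python
import time

def benchmark(kernel, sizes, repeats=3):
    """Time kernel(data) at several input sizes, keeping the best of `repeats` runs."""
    results = {}
    for n in sizes:
        data = list(range(n))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(data)
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results

# Example "kernel": an element-wise square, standing in for a device kernel.
timings = benchmark(lambda xs: [x * x for x in xs], sizes=[1_000, 10_000, 100_000])
for n in sorted(timings):
    print(f"n={n:>7}: {timings[n]:.6f} s")
```

Keeping the best of several repeats reduces noise from warm-up and scheduling jitter, which matters when comparing implementations whose differences shrink at small data sizes.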

Author(s):  
Joaquin Vanschoren ◽  
Ugo Vespier ◽  
Shengfa Miao ◽  
Marvin Meeng ◽  
Ricardo Cachucho ◽  
...  

Sensors are increasingly being used to monitor the world around us. They measure movements of structures such as bridges, windmills, and plane wings, as well as human vital signs, atmospheric conditions, and fluctuations in power and water networks. In many cases, this results in large networks with different types of sensors, generating impressive amounts of data. As the volume and complexity of the data increase, their effective use becomes more challenging, and novel solutions are needed on both technical and scientific levels. Grounded in several real-world applications, this chapter discusses the challenges involved in large-scale sensor data analysis and describes practical solutions to address them. Due to the sheer size of the data and the large amount of computation involved, these are clearly “Big Data” applications.


Web Services ◽  
2019 ◽  
pp. 953-978
Author(s):  
Krishnan Umachandran ◽  
Debra Sharon Ferdinand-James

Continued technological advancements of the 21st century afford massive data generation in sectors of our economy, including the domains of agriculture, manufacturing, and education. However, harnessing such large-scale data with modern technologies for effective decision-making is an evolving science that requires knowledge of Big Data management and analytics. Big data in agriculture, manufacturing, and education take varied forms, such as voluminous text, images, and graphs. Applying big data science techniques (e.g., functional algorithms) to extract intelligence affords decision-makers quick responses to productivity, market-resilience, and student-enrollment challenges in today's unpredictable markets. This chapter employs data science for potential solutions to Big Data applications in the sectors of agriculture, manufacturing, and, to a lesser extent, education, using modern technological tools such as Hadoop, Hive, Sqoop, and MongoDB.


2013 ◽  
Vol 10 (11) ◽  
pp. 14535-14555
Author(s):  
L. Chen ◽  
Y. Zhong ◽  
G. Wei ◽  
Z. Shen

Abstract. The identification of priority management areas (PMAs) is essential for the control of non-point source (NPS) pollution, especially in a large-scale watershed. However, previous studies have typically focused on small-scale catchments adjacent to specific assessment points; thus, the interactions between multiple river points remain poorly understood. In this study, a multiple-assessment-point PMA (MAP-PMA) framework was proposed by integrating the upstream sources and the downstream transport aspects of NPS pollution. Based on the results, integrating upstream input changes was vital for the final PMA map, especially for downstream areas. Contrary to conventional wisdom, this research recommends that NPS pollutants can best be controlled among the upstream high-level PMAs when protecting the water quality of the entire watershed. The MAP-PMA framework provides a more cost-effective tool for the establishment of conservation practices, especially in a large-scale watershed.


Author(s):  
Manujakshi B. C ◽  
K. B. Ramesh

With the increasing adoption of sensor-based applications, there is an exponential rise in sensory data that eventually takes the shape of big data. However, the practicality of executing high-end analytical operations over resource-constrained big data has never been studied closely. A review of existing approaches shows that there is no cost-effective scheme for big data analytics over large-scale sensory data processing that can be used directly as a service. Therefore, the proposed system introduces a holistic architecture in which knowledge extracted from streamed data can be offered in the form of services. Implemented in MATLAB, the proposed study uses a very simplistic approach, considering the energy constraints of the sensor nodes, and finds that the proposed system offers better accuracy, reduced mining duration (i.e., faster response time), and reduced memory dependencies, proving that it offers a cost-effective analytical solution in contrast to existing systems.


2020 ◽  
Vol 245 ◽  
pp. 03032
Author(s):  
Alexey Anisenkov ◽  
Julia Andreeva ◽  
Alessandro Di Girolamo ◽  
Panos Paparrigopoulos ◽  
Boris Vasilev

CRIC is a high-level information system which provides a flexible, reliable and complete topology and configuration description for a large-scale distributed heterogeneous computing infrastructure. CRIC aims to facilitate distributed computing operations for the LHC experiments and consolidate WLCG topology information. It aggregates information coming from various low-level information sources and complements the topology description with experiment-specific data structures and settings required by the LHC VOs in order to exploit computing resources. Being an experiment-oriented but still experiment-independent information middleware, CRIC offers a generic solution, in the form of a suitable framework with appropriate interfaces implemented, which can be applied at the global WLCG level or at the level of a particular LHC experiment; for example, there are CRIC instances for CMS[11] and ATLAS[10]. CRIC can even be used for special tasks: a dedicated CRIC instance has been built to support the transfer tests performed by the DOMA Third Party Copy working group. Moreover, the extensibility and flexibility of the system allow CRIC to follow technology evolution and easily implement concepts required to describe new types of computing and storage resources. This contribution describes the overall CRIC architecture and the plug-in-based implementation of the CRIC components, as well as recent developments and future plans.


Author(s):  
Bunjamin Memishi ◽  
Shadi Ibrahim ◽  
Maria S. Perez ◽  
Gabriel Antoniu

MapReduce has become a relevant framework for Big Data processing in the cloud. In large-scale clouds, failures do occur and may incur unwanted performance degradation for Big Data applications. As the reliability of MapReduce depends on how well failures are detected and handled, this book chapter investigates the problem of failure detection in the MapReduce framework. The case studies of this contribution reveal that the current static timeout value is not adequate and demonstrate significant variations in the application's response time under different timeout values. While arguing that comparatively little attention has been devoted to failure detection in the framework, the chapter presents design ideas for a new adaptive timeout.
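The adaptive-timeout idea can be illustrated with an estimator in the spirit of TCP's retransmission-timer smoothing (the Jacobson/Karels SRTT/RTTVAR rules). The class name, parameters, and defaults below are assumptions for the sake of a self-contained sketch, not the chapter's actual design:

```python
class AdaptiveTimeout:
    """Failure-detection timeout that adapts to observed heartbeat intervals,
    using exponentially weighted moving averages of the interval and its
    deviation (as in TCP's RTT estimator). Illustrative sketch only."""

    def __init__(self, initial=10.0, alpha=0.125, beta=0.25, k=4.0):
        self.srtt = initial          # smoothed heartbeat interval
        self.rttvar = initial / 2    # smoothed deviation
        self.alpha, self.beta, self.k = alpha, beta, k

    def observe(self, sample):
        """Fold one observed heartbeat interval (seconds) into the estimate."""
        self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - sample)
        self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        return self.timeout()

    def timeout(self):
        """Declare a worker suspect if no heartbeat arrives within this bound."""
        return self.srtt + self.k * self.rttvar

detector = AdaptiveTimeout(initial=3.0)
for s in [3.0, 3.2, 2.9, 6.0]:   # a slow heartbeat widens the timeout
    detector.observe(s)
print(round(detector.timeout(), 2))
```

Unlike a static timeout, the bound tightens when heartbeats are regular (fast detection) and widens under load spikes (fewer false suspicions), which is exactly the trade-off the chapter's case studies expose.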


Marine Drugs ◽  
2020 ◽  
Vol 18 (11) ◽  
pp. 523 ◽  
Author(s):  
Van Bon Nguyen ◽  
Dai Nam Nguyen ◽  
Anh Dzung Nguyen ◽  
Van Anh Ngo ◽  
That Quang Ton ◽  
...  

This study aimed to establish a culture process for the cost-effective production of prodigiosin (PG) via fermentation of demineralized crab shell powder (de-CSP), a fishery processing byproduct. Among the tested PG-producing strains, Serratia marcescens TNU02 was demonstrated to be the most active. Various protein/de-CSP ratios were used as the C/N sources for PG biosynthesis, and the PG yield was significantly enhanced when the casein/de-CSP ratio was controlled in the range of 3/7 to 4/6. TNU02 produced PG at a high yield (5100 mg/L) in a 15 L bioreactor system containing 4.5 L of a newly designed liquid medium containing 1.6% C/N source (protein/de-CSP ratio of 3/7), 0.02% (NH4)2SO4, and 0.1% K2HPO4, at an initial pH of 6.15 and 27 °C, for 8 h in dark conditions. The red pigment was purified from the culture broth and identified as PG by Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) and UV spectral analysis. The purified PG demonstrated moderate antioxidant activity and effective inhibition against four cancerous cell lines. Notably, this study was the first to report the use of crab wastes for PG bioproduction with high productivity (5100 mg/L) at a large scale (4.5 L per pilot batch) in a short fermentation time (8 h). The salt composition, including (NH4)2SO4 and K2HPO4, was also a novel finding for the enhancement of PG yield by S. marcescens in this report.


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Chia-Hui Huang ◽  
Keng-Chieh Yang ◽  
Han-Ying Kao

Big data is a current trend with significant impacts on information technologies. In big data applications, one of the main concerns is dealing with large-scale data sets, which often require computation resources provided by public cloud services; analyzing big data efficiently thus becomes a major challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. The SSVM was recently proposed as an alternative to the standard SVM and has been proved more efficient than the traditional SVM in processing large-scale data. In addition, a soft margin method is proposed to modify the excursion of the separation margin and to remain effective in the gray zone, where the distribution of the data becomes hard to describe and the separation margin between classes is unclear.
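The "smooth" in SSVM refers to replacing the non-differentiable plus function max(x, 0) in the SVM objective with the smooth approximation p(x, α) = x + (1/α)·ln(1 + e^(−αx)) (Lee and Mangasarian), so that fast Newton-type solvers apply. A small illustrative sketch of that smoothing, with hypothetical function names:

```python
import math

def plus(x):
    """The plus function (x)_+ = max(x, 0), non-differentiable at 0."""
    return max(x, 0.0)

def smooth_plus(x, alpha=5.0):
    """SSVM smoothing p(x, a) = x + (1/a)*ln(1 + exp(-a*x)).
    Smooth everywhere, and converges to max(x, 0) as alpha grows."""
    return x + math.log1p(math.exp(-alpha * x)) / alpha

for x in (-1.0, 0.0, 1.0):
    print(x, plus(x), round(smooth_plus(x, alpha=10.0), 4))
```

Note that `math.exp(-alpha * x)` overflows for strongly negative x at large alpha; a production implementation would branch on the sign of x. The same smoothing trick is what makes the SSVM tractable on the large-scale data sets the paper targets.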

