High Performance Big Data Graph Analytics Leveraging Near Memory System

Research Directions for Big Data Graph Analytics

2015 IEEE International Congress on Big Data ◽

10.1109/bigdatacongress.2015.132 ◽

2015 ◽

Cited By ~ 10

Author(s):

John A. Miller ◽

Lakshmish Ramaswamy ◽

Krys J. Kochut ◽

Arash Fard

Keyword(s):

Big Data ◽

Research Directions ◽

Graph Analytics ◽

Data Graph

Predicting Congestion States from Basic Safety Messages by Using Big-Data Graph Analytics

Transportation Research Record Journal of the Transportation Research Board ◽

10.3141/2500-07 ◽

2015 ◽

Vol 2500 (1) ◽

pp. 59-66 ◽

Cited By ~ 3

Author(s):

Meenakshy Vasudevan ◽

Daniel Negron ◽

Matthew Feltz ◽

Jennifer Mallette ◽

Karl Wunderlich

Keyword(s):

Big Data ◽

Real Time ◽

Communication Systems ◽

Full Potential ◽

Connected Vehicle ◽

Graph Analytics ◽

Pilot Model ◽

Connected Vehicle Technology ◽

Vehicle Technology ◽

Data Graph

In a connected-vehicle environment, wireless subsecond data exchange connects vehicles, the infrastructure, and travelers’ mobile devices. These data have the promise to transform the geographic scope, precision, and latency of transportation system control; fulfillment of that promise could result in significant safety, mobility, and environmental benefits. However, the new data influx also has the potential to overburden legacy computational and communication systems. Although connected-vehicle technology can facilitate ubiquitous system coverage, the existing prediction methods, computational platforms, and data management methods are insufficient to process the data within a reasonable time frame for real-time predictions. An investigation of the ways in which advanced (big-data) analytics might be applied to realize the full potential of connected-vehicle technology is particularly relevant now as this technology evolves from research to deployment. This paper presents an approach combining big-data graph analytics with high-performance computing to predict traffic congestion by analyzing nearly 4 billion basic safety messages generated by the safety pilot model deployment conducted in 2012–2013. This paper provides an alternative approach for predicting congestion in 30.5-m segments anywhere on the network at 1-min intervals 30 to 60 min before actual congestion over a time window of 1 h. Despite sparseness of data, the proposed framework predicted highly congested locations 40% of the time. Severity of congestion was predicted with an accuracy of 77%. This combination of rapid computation and predictive accuracy may provide significant value in future real-time decision support systems that leverage connected-vehicle data.

Directions for Big Data Graph Analytics Research

Services Transactions on Big Data ◽

10.29268/stbd.2015.2.1.2 ◽

2015 ◽

Vol 2 (1) ◽

pp. 15-27 ◽

Cited By ~ 2

Author(s):

John A. Miller ◽

Lakshmish Ramaswamy ◽

Krys J. Kochut ◽

Arash Fard ◽

...

Keyword(s):

Big Data ◽

Graph Analytics ◽

Data Graph

High Performance NAND Flash Memory System with a Data Buffer

IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences ◽

10.1587/transfun.e96.a.2645 ◽

2013 ◽

Vol E96.A (12) ◽

pp. 2645-2651 ◽

Cited By ~ 1

Author(s):

Jung-Hoon LEE ◽

Bo-Sung JUNG

Keyword(s):

Flash Memory ◽

High Performance ◽

Memory System ◽

Nand Flash ◽

Nand Flash Memory ◽

Data Buffer

Perspectives on High-Performance Computing in a Big Data World

Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '19 ◽

10.1145/3307681.3325410 ◽

2019 ◽

Author(s):

Geoffrey C. Fox

Keyword(s):

Big Data ◽

High Performance Computing ◽

High Performance ◽

Performance Computing

Computational storage: an efficient and scalable platform for big data and HPC applications

Journal Of Big Data ◽

10.1186/s40537-019-0265-5 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 2

Author(s):

Mahdi Torabzadehkashi ◽

Siavash Rezaei ◽

Ali HeydariGorji ◽

Hosein Bobarshad ◽

Vladimir Alves ◽

...

Keyword(s):

Big Data ◽

High Performance ◽

Distributed Processing ◽

Data Access ◽

Distributed Applications ◽

Process Data ◽

Storage Devices ◽

Hadoop Mapreduce ◽

Big Data Applications ◽

Application Processor

Abstract: In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data to the system. This cost has a direct relation with the distance of processing engines from the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move process closer to data. Computational storage devices (CSDs) push the “move process to data” paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform, that provides a seamless environment to process data in-place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for the applications. Thus, a vast spectrum of applications can be ported for running on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place without any modifications on the underlying distributed processing framework. For the proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in the distributed processing environments. The experimental results show up to 2.2× improvement in performance and 4.3× reduction in energy consumption, respectively, for running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved up to 5.4× and 8.9×, respectively.
