Java Virtual Machine performance analysis with Java instruction level parallelism and advanced folding scheme

Design and performance analysis of a distributed Java Virtual Machine

IEEE Transactions on Parallel and Distributed Systems ◽

10.1109/tpds.2002.1011415 ◽

2002 ◽

Vol 13 (6) ◽

pp. 611-627 ◽

Cited By ~ 10

Author(s):

M. Surdeanu ◽

D. Moldovan

Keyword(s):

Performance Analysis ◽

Virtual Machine ◽

Java Virtual Machine ◽

And Performance

Performance Analysis of Java Virtual Machine for Machine Learning Workloads using Apache Spark

Proceedings of the International Conference on Informatics and Analytics - ICIA-16 ◽

10.1145/2980258.2982117 ◽

2016 ◽

Author(s):

N. Hema ◽

K. G. Srinivasa ◽

Saravanan Chidambaram ◽

Sandeep Saraswat ◽

Sujoy Saraswati ◽

...

Keyword(s):

Machine Learning ◽

Performance Analysis ◽

Virtual Machine ◽

Apache Spark ◽

Java Virtual Machine

Performance analysis of languages working on Java Virtual Machine based on Java, Scala and Kotlin

Journal of Computer Sciences Institute ◽

10.35784/jcsi.1609 ◽

2020 ◽

Vol 15 ◽

pp. 189-195

Author(s):

Katarzyna Buszewicz

Keyword(s):

Performance Analysis ◽

Virtual Machine ◽

Performance Testing ◽

Java Virtual Machine ◽

Performance Tests ◽

Literature Study ◽

Runtime Environment

This article presents the results of a literature study related to the construction and operation of Java Virtual Machine, as well as performance tests of selected languages using the aforementioned runtime environment on the example of Java, Scala and Kotlin. Performance testing was carried out using two applications built using the Apache Maven archetype with the built-in Java Microbenchmark Harness library.

Virtual Machine Performance Analysis and Prediction

2020 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI) ◽

10.1109/ccci49893.2020.9256518 ◽

2020 ◽

Author(s):

Yuegang Li ◽

Dongyang Ou ◽

Congfeng Jiang ◽

Jing Shen ◽

Shuangshuang Guo ◽

...

Keyword(s):

Performance Analysis ◽

Virtual Machine ◽

Machine Performance

Fine-Grained Nested Virtual Machine Performance Analysis through First Level Hypervisor Tracing

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) ◽

10.1109/ccgrid.2017.20 ◽

2017 ◽

Cited By ~ 1

Author(s):

Hani Nemati ◽

Suchakrapani Datt Sharma ◽

Michel R. Dagenais

Keyword(s):

Performance Analysis ◽

Virtual Machine ◽

Fine Grained ◽

Machine Performance

Dalvik virtual machine performance improvement based on hybrid concurrent model

Journal of Computer Applications ◽

10.3724/sp.j.1087.2012.01727 ◽

2013 ◽

Vol 32 (6) ◽

pp. 1727-1729

Author(s):

Qian LI ◽

Ping XIAO

Keyword(s):

Performance Improvement ◽

Virtual Machine ◽

Machine Performance

Microarchitectural Characterization on a Mobile Workload

Applied Sciences ◽

10.3390/app11031225 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1225

Author(s):

Woohyong Lee ◽

Jiyoung Lee ◽

Bo Kyung Park ◽

R. Young Chul Kim

Keyword(s):

Performance Monitoring ◽

Performance Metrics ◽

Performance Comparison ◽

Instruction Level Parallelism ◽

Data Set ◽

Performance Events ◽

Hardware Performance Counters ◽

On Chip ◽

The Comparative Study ◽

Level Parallelism

Geekbench is one of the most referenced cross-platform benchmarks in the mobile world. Most of its workloads are synthetic but some of them aim to simulate real-world behavior. In the mobile world, its microarchitectural behavior has been reported rarely since the hardware profiling features are limited to the public. As a popular mobile performance workload, it is hard to find Geekbench’s microarchitecture characteristics in mobile devices. In this paper, a thorough experimental study of Geekbench performance characterization is reported with detailed performance metrics. This study also identifies mobile system on chip (SoC) microarchitecture impacts, such as the cache subsystem, instruction-level parallelism, and branch performance. After the study, we could understand the bottleneck of workloads, especially in the cache sub-system. This means that the change of data set size directly impacts performance score significantly in some systems and will ruin the fairness of the CPU benchmark. In the experiment, Samsung’s Exynos9820-based platform was used as the tested device with Android Native Development Kit (NDK) built binaries. The Exynos9820 is a superscalar processor capable of dual issuing some instructions. To help performance analysis, we enable the capability to collect performance events with performance monitoring unit (PMU) registers. The PMU is a set of hardware performance counters which are built into microprocessors to store the counts of hardware-related activities. Throughout the experiment, functional and microarchitectural performance profiles were fully studied. This paper describes the details of the mobile performance studies above. In our experiment, the ARM DS5 tool was used for collecting runtime PMU profiles including OS-level performance data. After the comparative study is completed, users will understand more about the mobile architecture behavior, and this will help to evaluate which benchmark is preferable for fair performance comparison.

UltraSynth: Insights of a CGRA Integration into a Control Engineering Environment

Journal of Signal Processing Systems ◽

10.1007/s11265-021-01641-7 ◽

2021 ◽

Author(s):

Dennis Wolf ◽

Andreas Engel ◽

Tajas Ruschke ◽

Andreas Koch ◽

Christian Hochberger

Keyword(s):

Computing System ◽

Coarse Grained ◽

Instruction Level Parallelism ◽

Control Engineering ◽

Processing Elements ◽

Actual Application ◽

Reconfigurable Arrays ◽

Engineering Environment ◽

On Chip ◽

Level Parallelism

Coarse Grained Reconfigurable Arrays (CGRAs) or Architectures are a concept for hardware accelerators based on the idea of distributing workload over Processing Elements. These processors exploit instruction level parallelism, while being energy efficient due to their simplistic internal structure. However, the incorporation into a complete computing system raises severe challenges at the hardware and software level. This article evaluates a CGRA integrated into a control engineering environment targeting a Xilinx Zynq System on Chip (SoC) in detail. Besides the actual application execution performance, the practicability of the configuration toolchain is validated. Challenges of the real-world integration are discussed and practical insights are highlighted.