Virtual Laboratories for Biodiversity Modelling: An Australian perspective

Author(s):  
Sarah Richmond ◽  
Chantal Huijbers

Recent technologies have enabled consistent and continuous collection of ecological data at high resolution across large spatial scales. The challenge remains, however, to bring these data together and expose them to the methods and tools needed to analyse the interaction between biodiversity and the environment. These challenges are mostly associated with the accessibility, visibility and interoperability of data, and with the technical computation needed to interpret the data. Australia has invested in digital research infrastructures through the National Collaborative Research Infrastructure Strategy (NCRIS). Here we present two platforms that provide easy access to global biodiversity, climate and environmental datasets, integrated with a suite of analytical tools and linked to high-performance cloud computing infrastructure. The Biodiversity and Climate Change Virtual Laboratory (BCCVL) is a point-and-click online platform for modelling species responses to environmental conditions, which provides an easy introduction to the scientific concepts of modelling without requiring the user to understand the underlying code. For ecologists who write their own modelling scripts, we have developed ecocloud: a new online environment that provides access to data connected with analysis tools such as RStudio and Jupyter Notebooks, as well as a virtual desktop environment, using Australia's national cloud computing infrastructure. ecocloud is built through collaborations among key facilities within the ecosciences domain, establishing a collective long-term vision of an ecosystem of infrastructure that enables reliable prediction of future environmental outcomes.
Underpinning these tools is an innovative training program, ecoEd, which provides cohesive training and skill development to enhance the translation of Australia's digital research infrastructures to the ecoscience community by educating and upskilling the next generation of environmental scientists and managers. Both platforms are built using a best-practice microservice model that allows for complete flexibility, scalability and stability in a cloud environment. Both the BCCVL and ecocloud are open-source developments and provide opportunities for interoperability with other platforms (e.g. the Atlas of Living Australia). In Australia, the same technical infrastructure is also used for a platform in the humanities and social science domain, indicating that the underlying technologies are not domain specific. We therefore welcome collaborations with other organisations to further develop these platforms for the wider bio- and ecoinformatics community. This presentation will showcase the tools, services, and underpinning infrastructure alongside our training and engagement framework as an exemplar in building platforms for next-generation biodiversity science.
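The point-and-click workflows in the BCCVL wrap species distribution modelling algorithms such as climate-envelope (BIOCLIM-style) methods. As a rough illustration of what such a model does under the hood (a minimal sketch on invented occurrence data, not the BCCVL's actual implementation), an envelope model simply learns the range of each environmental variable at known occurrences and predicts a site suitable when every variable falls inside those ranges:

```python
# Minimal climate-envelope ("BIOCLIM-style") species distribution model.
# Each occurrence record carries environmental values (here: annual mean
# temperature in degC, annual precipitation in mm); a site is predicted
# suitable when every variable lies within the range observed at the
# occurrences. All values below are invented for illustration.

def fit_envelope(occurrences):
    """Return per-variable (min, max) bounds from occurrence records."""
    n_vars = len(occurrences[0])
    return [(min(rec[i] for rec in occurrences),
             max(rec[i] for rec in occurrences)) for i in range(n_vars)]

def predict(envelope, site):
    """True if each environmental value falls inside the fitted bounds."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(site, envelope))

# (temperature, precipitation) at known occurrence sites
occ = [(18.2, 900.0), (21.5, 1100.0), (19.8, 950.0)]
env = fit_envelope(occ)

print(predict(env, (20.0, 1000.0)))  # inside the envelope -> True
print(predict(env, (25.0, 400.0)))   # too hot and too dry -> False
```

Real platforms add projection across gridded climate layers and more nuanced algorithms (GLMs, Maxent, and the like), but the fit/predict split shown here is the shape a point-and-click interface hides from the user.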

2015 ◽  
pp. 566-579
Author(s):  
Keyun Ruan

Cloud computing is a major transition, and it comes at a unique historical and strategic time for applying foundational design thinking to secure the next-generation computing infrastructure and enable waves of business and technological innovation. In this chapter, the researcher summarizes six key research and development areas for designing a forensic-enabling cloud ecosystem, including architecture and matrix, standardization and strategy, evidence segregation, security and forensic integration, legal framework, and privacy.


2014 ◽  
Vol 17 (1) ◽  
pp. 139-152 ◽  
Author(s):  
Raffaele Montella ◽  
Giulio Giunta ◽  
Giuliano Laccetti

2017 ◽  
Author(s):  
Wendy Sharples ◽  
Ilya Zhukov ◽  
Markus Geimer ◽  
Klaus Goergen ◽  
Stefan Kollet ◽  
...  

Abstract. Geoscientific modeling is constantly evolving, with next-generation geoscientific models and applications placing high demands on high-performance computing (HPC) resources. These demands are being met by new developments in HPC architectures, software libraries, and infrastructures. New HPC developments require new programming paradigms, leading to substantial investment in model porting, tuning, and refactoring of complicated legacy code in order to use these resources effectively. In addition to the challenge of new massively parallel HPC systems, reproducibility of simulation and analysis results is of great concern, as next-generation geoscientific models are based on complex model implementations and profiling, modeling, and data processing workflows. Thus, in order to reduce both the duration and the cost of code migration, and to aid in the development of new models or model components while ensuring reproducibility and sustainability over the complete data life cycle, a streamlined approach to profiling, porting, and provenance tracking is necessary. We propose a run control framework (RCF) integrated with a workflow engine which encompasses all stages of the modeling chain: (1) preprocessing of input, (2) compilation of code (including code instrumentation with performance analysis tools), (3) simulation run, and (4) postprocessing and analysis, to address these issues. Within this RCF, the workflow engine is used to create and manage benchmark or simulation parameter combinations and performs the documentation and data organization for reproducibility. This approach automates the process of porting and tuning, profiling, testing, and running a geoscientific model. We show that in using our run control framework, testing, benchmarking, profiling, and running models is less time consuming and more robust, resulting in more efficient use of HPC resources, more strategic code development, and enhanced data integrity and reproducibility.


2015 ◽  
Author(s):  
Pierre Carrier ◽  
Bill Long ◽  
Richard Walsh ◽  
Jef Dawson ◽  
Carlos P. Sosa ◽  
...  

High Performance Computing (HPC) best practice offers opportunities to implement lessons learned in areas such as computational chemistry and physics in genomics workflows, specifically Next-Generation Sequencing (NGS) workflows. In this study we briefly describe how distributed-memory parallelism can be an important enhancement to the performance and resource utilization of NGS workflows. We illustrate this point with results on the parallelization of the Inchworm module of the Trinity RNA-Seq pipeline for de novo transcriptome assembly. We show that these types of applications can scale to thousands of cores. Time scaling as well as memory scaling are discussed at length using two RNA-Seq datasets, targeting Mus musculus (mouse) and the axolotl (Mexican salamander). Details of the efficient MPI communication and its impact on performance are also shown. We hope to demonstrate that this type of parallelization approach can be extended to most types of bioinformatics workflows, with substantial benefits. The efficient, distributed-memory parallel implementation eliminates memory bottlenecks and dramatically accelerates NGS analysis. We further include a summary of programming paradigms available to the bioinformatics community, such as C++/MPI.
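The paper's parallel Inchworm is written in C++/MPI. The sketch below illustrates only the general owner-computes idea behind distributed k-mer counting: each rank scans the reads but stores only the k-mers whose hash maps to it, so the global table is partitioned across ranks with no duplication and no single-node memory bottleneck. Ranks are simulated sequentially in Python for clarity; the function names and reads are invented, and a real run would use MPI processes exchanging k-mers via collective communication.

```python
# Owner-computes partitioning for distributed k-mer counting (sketch).
# Rank r keeps only the k-mers with hash(kmer) % n_ranks == r, so the
# union of the per-rank tables equals the serial count with no overlap.

from collections import Counter

def kmers(read, k):
    return (read[i:i + k] for i in range(len(read) - k + 1))

def count_partitioned(reads, k, n_ranks):
    """Per-rank k-mer tables; rank r owns k-mers hashing to r."""
    tables = [Counter() for _ in range(n_ranks)]
    for read in reads:
        for km in kmers(read, k):
            tables[hash(km) % n_ranks][km] += 1
    return tables

reads = ["ACGTAC", "GTACGT"]
tables = count_partitioned(reads, k=3, n_ranks=4)

# Merging the per-rank tables reproduces the serial count exactly.
merged = Counter()
for t in tables:
    merged.update(t)
serial = Counter(km for r in reads for km in kmers(r, 3))
print(merged == serial)  # True
```

Because ownership is decided by a hash of the k-mer itself, every rank independently agrees on who owns what, which is what lets an MPI implementation route each k-mer to its owner without any central coordinator.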


Author(s):  
Wolfgang Gentzsch ◽  
Burak Yenier

The adoption of cloud computing for engineering and scientific applications still lags behind, although many cloud providers today offer powerful computing infrastructure as a service, and enterprises already make routine use of it. Reasons for this slow adoption are many: complex access to clouds, inflexible software licensing, time-consuming big data transfer, loss of control over assets, and service provider lock-in, to name a few. But recently, with the advent of the UberCloud's novel high-performance software container technology, many of these roadblocks are being removed. In this paper the authors describe the current status and landscape of clouds for engineers and scientists, the benefits and challenges, and how UberCloud provides an online solution platform and container technology which reduce or even remove many of the current roadblocks, and thus offer every engineer and scientist additional compute power on demand, in an easily accessible way.


2018 ◽  
Vol 11 (7) ◽  
pp. 2875-2895
Author(s):  
Wendy Sharples ◽  
Ilya Zhukov ◽  
Markus Geimer ◽  
Klaus Goergen ◽  
Sebastian Luehrs ◽  
...  

Abstract. Geoscientific modeling is constantly evolving, with next-generation geoscientific models and applications placing large demands on high-performance computing (HPC) resources. These demands are being met by new developments in HPC architectures, software libraries, and infrastructures. In addition to the challenge of new massively parallel HPC systems, reproducibility of simulation and analysis results is of great concern. This is due to the fact that next-generation geoscientific models are based on complex model implementations and profiling, modeling, and data processing workflows. Thus, in order to reduce both the duration and the cost of code migration, aid in the development of new models or model components, while ensuring reproducibility and sustainability over the complete data life cycle, an automated approach to profiling, porting, and provenance tracking is necessary. We propose a run control framework (RCF) integrated with a workflow engine as a best practice approach to automate profiling, porting, provenance tracking, and simulation runs. Our RCF encompasses all stages of the modeling chain: (1) preprocess input, (2) compilation of code (including code instrumentation with performance analysis tools), (3) simulation run, and (4) postprocessing and analysis, to address these issues. Within this RCF, the workflow engine is used to create and manage benchmark or simulation parameter combinations and performs the documentation and data organization for reproducibility. In this study, we outline this approach and highlight the subsequent developments scheduled for implementation born out of the extensive profiling of ParFlow. 
We show that in using our run control framework, testing, benchmarking, profiling, and running models is less time consuming and more robust than running geoscientific applications in an ad hoc fashion, resulting in more efficient use of HPC resources, more strategic code development, and enhanced data integrity and reproducibility.
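The four-stage chain described in the abstract can be pictured as a loop that threads data through the stages while hashing each stage's configuration and output into a provenance record; identical inputs then yield identical records, which is the reproducibility check such a record enables. The following is a toy stdlib-Python sketch with invented stage functions, not the authors' RCF or their workflow engine:

```python
# Toy run-control loop over the four-stage modeling chain:
# (1) preprocess, (2) compile/instrument, (3) simulate, (4) postprocess.
# Each stage's output is hashed into a provenance log for documentation
# and reproducibility checks. Stage bodies are placeholders.

import hashlib, json

def digest(obj):
    """Short, deterministic fingerprint of any JSON-serializable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()[:12]

def run_chain(params, stages):
    """Execute stages in order, threading data and logging provenance."""
    provenance = [("params", digest(params))]
    data = params
    for name, fn in stages:
        data = fn(data)
        provenance.append((name, digest(data)))
    return data, provenance

stages = [
    ("preprocess",  lambda p: {**p, "grid": p["nx"] * p["ny"]}),
    ("compile",     lambda d: {**d, "instrumented": True}),
    ("simulate",    lambda d: {**d, "result": d["grid"] * d["dt"]}),
    ("postprocess", lambda d: {"result": d["result"]}),
]

out, prov = run_chain({"nx": 10, "ny": 20, "dt": 0.5}, stages)
print(out)  # {'result': 100.0}

# Re-running the same parameters yields identical provenance hashes.
out2, prov2 = run_chain({"nx": 10, "ny": 20, "dt": 0.5}, stages)
print(prov == prov2)  # True
```

A production framework would additionally hash input files, record compiler flags and instrumentation options at stage (2), and persist the log alongside the simulation output; the fingerprint-per-stage structure stays the same.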


2013 ◽  
pp. 331-344 ◽  
Author(s):  
Keyun Ruan

Cloud computing is a major transition, and it comes at a unique historical and strategic time for applying foundational design thinking to secure the next-generation computing infrastructure and enable waves of business and technological innovation. In this chapter, the researcher summarizes six key research and development areas for designing a forensic-enabling cloud ecosystem, including architecture and matrix, standardization and strategy, evidence segregation, security and forensic integration, legal framework, and privacy.


Author(s):  
Giuliano Pelfer

This article describes how archaeological and historical research has grown into a multidisciplinary and interdisciplinary activity, driven by the availability of ever larger amounts of data for reconstructing historical and archaeological contexts at a global spatio-temporal scale. This growing body of information, increasingly integrated with data from the Earth sciences, has led to an exponential increase in complex datasets and in refined methods of analysis. To address these needs, the article discusses the ArchaeoGRID Science Gateway paradigm for accessing the ArchaeoGRID Cyberinfrastructure (CI), a Distributed Computing Infrastructure (DCI) that can supply the storage and computing resources needed to manage and analyse large amounts of archaeological and historical data. The ArchaeoGRID Science Gateway is emerging as a high-level web environment that gives non-specialist Virtual Research Communities (VRCs) of archaeologists and historians transparent access to DCIs such as local high-performance computing, Grids, and Clouds.

