Galaxy-Kubernetes integration: scaling bioinformatics workflows in the cloud

2018 ◽  
Author(s):  
Pablo Moreno ◽  
Luca Pireddu ◽  
Pierrick Roger ◽  
Nuwan Goonasekera ◽  
Enis Afgan ◽  
...  

Summary: Making reproducible, auditable and scalable data-processing analysis workflows is an important challenge in the field of bioinformatics. Recently, software containers and cloud computing introduced a novel solution to address these challenges. They simplify software installation, management and reproducibility by packaging tools and their dependencies. In this work we implemented a cloud-provider-agnostic and scalable container orchestration setup for the popular Galaxy workflow environment. This solution enables Galaxy to run on and offload jobs to most cloud providers (e.g. Amazon Web Services, Google Cloud or OpenStack, among others) through the Kubernetes container orchestrator. Availability: All code has been contributed to the Galaxy Project and is available (since Galaxy 17.05) at https://github.com/galaxyproject/ in the galaxy and galaxy-kubernetes repositories. https://public.phenomenal-h2020.eu/ is an example deployment. Suppl. Information: Supplementary Files are available. Contact: [email protected], European Molecular Biology Laboratory, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK, Tel: +44-1223-494267, Fax: +44-1223-484696.

2020 ◽  
Vol 10 (24) ◽  
pp. 9148
Author(s):  
Germán Moltó ◽  
Diana M. Naranjo ◽  
J. Damian Segrelles

Cloud computing instruction requires hands-on experience with a myriad of distributed computing services from a public cloud provider. Tracking the progress of the students, especially for online courses, requires one to automatically gather evidence and produce learning analytics in order to further determine the behavior and performance of students. With this aim, this paper describes the experience from an online course in cloud computing with Amazon Web Services on the creation of an open-source data processing tool to systematically obtain learning analytics related to the hands-on activities carried out throughout the course. These data, combined with the data obtained from the learning management system, have allowed a better characterization of the behavior of students in the course. Insights from a population of more than 420 online students through three academic years have been assessed, and the dataset has been released for increased reproducibility. The results corroborate that course length has an impact on online student dropout. In addition, a gender analysis pointed out that there are no statistically significant differences in the final marks between genders, but women show an increased degree of commitment to the activities planned in the course.


2020 ◽  
Vol 17 (8) ◽  
pp. 3581-3585
Author(s):  
M. S. Roobini ◽  
Selvasurya Sampathkumar ◽  
Shaik Khadar Basha ◽  
Anitha Ponraj

In the last decade, cloud computing transformed the way in which we build applications. The boom in cloud computing helped to develop new software designs and architectures, allowing developers to focus more on the business logic than on the infrastructure. The FaaS (function as a service) compute model lets developers concentrate only on the application code, while the remaining factors are taken care of by the cloud provider. Here we present a serverless architecture of a web application built using AWS services and provide a detailed analysis of the Lambda function and microservice software design implemented using these AWS services.
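As a rough illustration of the FaaS model described in this abstract, the sketch below shows what a single function might look like when written in the handler style AWS Lambda uses for Python. It is not taken from the paper: the function name `lambda_handler`, the `name` query parameter, and the greeting payload are all hypothetical, and the event shape assumes an API Gateway proxy-style request.

```python
import json


def lambda_handler(event, context):
    """Hypothetical handler: only application logic lives here.

    Provisioning, scaling and routing are left to the cloud provider.
    """
    # Assumption: the event follows the API Gateway proxy shape, where
    # query parameters sit under this key (and may be None when absent).
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    # Return an HTTP-style response with a JSON-encoded body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local invocation with a hand-made event; no AWS account is needed.
    sample_event = {"queryStringParameters": {"name": "Galaxy"}}
    print(lambda_handler(sample_event, None))
```

Because the handler is a plain function taking an event and a context, it can be exercised locally with a dictionary before being deployed, which is one reason this style suits small, independently deployable services.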


2019 ◽  
Author(s):  
Emily K.W. Lo ◽  
Remy M. Schwab ◽  
Zak Burke ◽  
Patrick Cahan

Summary: Accessibility and usability of compute-intensive bioinformatics tools can be increased with simplified web-based graphic user interfaces. However, deploying such tools as web applications presents additional barriers, including the complexity of developing a usable interface, network latency in transferring large datasets, and cost, which we encountered in developing a web-based version of our command-line tool CellNet. Learning and generalizing from this experience, we have devised a lightweight framework, Radiator, to facilitate deploying bioinformatics tools as web applications. To achieve reproducibility, usability, consistent accessibility, throughput, and cost-efficiency, Radiator is designed to be deployed on the cloud. Here, we describe the internals of Radiator and how to use it. Availability and Implementation: Code for Radiator and the CellNet Web Application are freely available at https://github.com/pcahan1 under the MIT license. The CellNet WebApp, Radiator, and Radiator-derived applications can be launched through public Amazon Machine Images from the cloud provider Amazon Web Services (AWS) (https://aws.amazon.com/).


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2433
Author(s):  
Christopher Kelly ◽  
Nikolaos Pitropakis ◽  
Alexios Mylonas ◽  
Sean McKeown ◽  
William J. Buchanan

In 2019, the majority of companies used at least one cloud computing service and it is expected that by the end of 2021, cloud data centres will process 94% of workloads. The financial and operational advantages of moving IT infrastructure to specialised cloud providers are clearly compelling. However, with such volumes of private and personal data being stored in cloud computing infrastructures, security concerns have risen. Motivated to monitor and analyze adversarial activities, we deploy multiple honeypots on the popular cloud providers, namely Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure, and operate them in multiple regions. Logs were collected over a period of three weeks in May 2020 and then comparatively analysed, evaluated and visualised. Our work revealed heterogeneous attackers’ activity on each cloud provider, both when one considers the volume and origin of attacks, as well as the targeted services and vulnerabilities. Our results highlight the attempt of threat actors to abuse popular services, which were widely used during the COVID-19 pandemic for remote working, such as remote desktop sharing. Furthermore, the attacks seem to originate not only from countries that are commonly found to be the source of attacks, such as China, Russia and the United States, but also from uncommon ones such as Vietnam, India and Venezuela. Our results provide insights on the adversarial activity during our experiments, which can be used to inform the Situational Awareness operations of an organisation.


2020 ◽  
Author(s):  
Fernando Mora-Márquez ◽  
José Luis Vázquez-Poletti ◽  
Unai López de Heredia

Abstract: NGScloud was a bioinformatic system developed to perform de novo RNAseq analysis of non-model species by exploiting the cloud computing capabilities of Amazon Web Services. The rapid changes undergone in the way this cloud computing service operates, along with the continuous release of novel bioinformatic applications to analyze next generation sequencing data, have made the software obsolete. NGScloud2 is an enhanced and expanded version of NGScloud that permits access to ad hoc cloud computing infrastructure, scaled according to the complexity of each experiment. NGScloud2 presents major technical improvements, such as the possibility of running spot instances and the most up-to-date AWS instance types, that can lead to significant cost savings. As compared to its initial implementation, this improved version updates and includes common applications for de novo RNAseq analysis, and incorporates tools to operate workflows of bioinformatic analysis of reference-based RNAseq, RADseq and functional annotation. NGScloud2 optimizes the access to Amazon’s large computing infrastructures to easily run popular bioinformatic software applications, otherwise inaccessible to non-specialized users lacking suitable hardware infrastructures. The correct performance of the pipelines for de novo RNAseq, reference-based RNAseq, RADseq and functional annotation was tested with real experimental data. NGScloud2 code and instructions for software installation and use are available at https://github.com/GGFHF/NGScloud2. NGScloud2 includes a companion package, NGShelper, which contains Python utilities to post-process the output of the pipelines for downstream analysis, at https://github.com/GGFHF/NGShelper.


2019 ◽  
Vol 15 ◽  
pp. 117693431988997
Author(s):  
Polyane Wercelens ◽  
Waldeyr da Silva ◽  
Fernanda Hondo ◽  
Klayton Castro ◽  
Maria Emília Walter ◽  
...  

Scientific workflows can be understood as arrangements of managed activities executed by different processing entities. Applying workflows to solve problems in Molecular Biology, notably those related to sequence analyses, is a regular Bioinformatics approach. Due to the nature of the raw data and the in silico environment of Molecular Biology experiments, apart from the research subject, 2 practical and closely related problems have been studied: reproducibility and computational environment. When aiming to enhance the reproducibility of Bioinformatics experiments, various aspects should be considered. The reproducibility requirements comprise the data provenance, which enables the acquisition of knowledge about the trajectory of data over a defined workflow, the settings of the programs, and the entire computational environment. Cloud computing is a booming alternative that can provide this computational environment, hiding technical details, and delivering a more affordable, accessible, and configurable on-demand environment for researchers. Considering this specific scenario, we proposed a solution to improve the reproducibility of Bioinformatics workflows in a cloud computing environment using both Infrastructure as a Service (IaaS) and Not only SQL (NoSQL) database systems. To meet the goal, we built 3 typical Bioinformatics workflows and ran them on 1 private and 2 public clouds, using different types of NoSQL database systems to persist the provenance data according to the Provenance Data Model (PROV-DM). We present here the results and a guide for the deployment of a cloud environment for Bioinformatics, exploring the characteristics of various NoSQL database systems to persist provenance data.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11237
Author(s):  
Fernando Mora-Márquez ◽  
José Luis Vázquez-Poletti ◽  
Unai López de Heredia

Background NGScloud was a bioinformatic system developed to perform de novo RNAseq analysis of non-model species by exploiting the cloud computing capabilities of Amazon Web Services. The rapid changes undergone in the way this cloud computing service operates, along with the continuous release of novel bioinformatic applications to analyze next generation sequencing data, have made the software obsolete. NGScloud2 is an enhanced and expanded version of NGScloud that permits access to ad hoc cloud computing infrastructure, scaled according to the complexity of each experiment. Methods NGScloud2 presents major technical improvements, such as the possibility of running spot instances and the most up-to-date AWS instance types, that can lead to significant cost savings. As compared to its initial implementation, this improved version updates and includes common applications for de novo RNAseq analysis, and incorporates tools to operate workflows of bioinformatic analysis of reference-based RNAseq, RADseq and functional annotation. NGScloud2 optimizes the access to Amazon’s large computing infrastructures to easily run popular bioinformatic software applications, otherwise inaccessible to non-specialized users lacking suitable hardware infrastructures. Results The correct performance of the pipelines for de novo RNAseq, reference-based RNAseq, RADseq and functional annotation was tested with real experimental data, providing workflow performance estimates and tips to make optimal use of NGScloud2. Further, we provide a qualitative comparison of NGScloud2 vs. the Galaxy framework. NGScloud2 code and instructions for software installation and use are available at https://github.com/GGFHF/NGScloud2. NGScloud2 includes a companion package, NGShelper, which contains Python utilities to post-process the output of the pipelines for downstream analysis, at https://github.com/GGFHF/NGShelper.

