NGScloud2: optimized bioinformatic analysis using Amazon Web Services

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11237
Author(s):  
Fernando Mora-Márquez ◽  
José Luis Vázquez-Poletti ◽  
Unai López de Heredia

Background: NGScloud was a bioinformatic system developed to perform de novo RNAseq analysis of non-model species by exploiting the cloud computing capabilities of Amazon Web Services. Rapid changes in the way this cloud computing service operates, along with the continuous release of novel bioinformatic applications to analyze next generation sequencing data, have made the software obsolete. NGScloud2 is an enhanced and expanded version of NGScloud that provides access to ad hoc cloud computing infrastructure, scaled according to the complexity of each experiment. Methods: NGScloud2 presents major technical improvements, such as the possibility of running spot instances and the most up-to-date AWS instance types, which can lead to significant cost savings. Compared with its initial implementation, this improved version updates and includes common applications for de novo RNAseq analysis, and incorporates tools to operate workflows for bioinformatic analysis of reference-based RNAseq, RADseq and functional annotation. NGScloud2 optimizes access to Amazon’s large computing infrastructures to easily run popular bioinformatic software applications that are otherwise inaccessible to non-specialized users lacking suitable hardware infrastructure. Results: The correct performance of the pipelines for de novo RNAseq, reference-based RNAseq, RADseq and functional annotation was tested with real experimental data, providing workflow performance estimates and tips to make optimal use of NGScloud2. Further, we provide a qualitative comparison of NGScloud2 vs. the Galaxy framework. NGScloud2 code and instructions for software installation and use are available at https://github.com/GGFHF/NGScloud2. NGScloud2 includes a companion package, NGShelper, which contains Python utilities to post-process the output of the pipelines for downstream analysis and is available at https://github.com/GGFHF/NGShelper.

2020 ◽  
Author(s):  
Fernando Mora-Márquez ◽  
José Luis Vázquez-Poletti ◽  
Unai López de Heredia

Abstract. NGScloud was a bioinformatic system developed to perform de novo RNAseq analysis of non-model species by exploiting the cloud computing capabilities of Amazon Web Services. Rapid changes in the way this cloud computing service operates, along with the continuous release of novel bioinformatic applications to analyze next generation sequencing data, have made the software obsolete. NGScloud2 is an enhanced and expanded version of NGScloud that provides access to ad hoc cloud computing infrastructure, scaled according to the complexity of each experiment. NGScloud2 presents major technical improvements, such as the possibility of running spot instances and the most up-to-date AWS instance types, which can lead to significant cost savings. Compared with its initial implementation, this improved version updates and includes common applications for de novo RNAseq analysis, and incorporates tools to operate workflows for bioinformatic analysis of reference-based RNAseq, RADseq and functional annotation. NGScloud2 optimizes access to Amazon’s large computing infrastructures to easily run popular bioinformatic software applications that are otherwise inaccessible to non-specialized users lacking suitable hardware infrastructure. The correct performance of the pipelines for de novo RNAseq, reference-based RNAseq, RADseq and functional annotation was tested with real experimental data. NGScloud2 code and instructions for software installation and use are available at https://github.com/GGFHF/NGScloud2. NGScloud2 includes a companion package, NGShelper, which contains Python utilities to post-process the output of the pipelines for downstream analysis and is available at https://github.com/GGFHF/NGShelper.


2021 ◽  
Vol 22 (S10) ◽  
Author(s):  
Zhenmiao Zhang ◽  
Lu Zhang

Abstract. Background: Due to the complexity of microbial communities, de novo assembly of next generation sequencing data is commonly unable to produce complete microbial genomes. Metagenome assembly binning has become an essential step that groups the fragmented contigs into clusters representing microbial genomes, based on the contigs’ nucleotide compositions and read depths. These features work well on long contigs, but are not stable for short ones. Contigs can be linked by sequence overlap (assembly graph) or by the paired-end reads aligned to them (PE graph), where linked contigs have a high chance of being derived from the same cluster. Results: We developed METAMVGL, a multi-view graph-based metagenomic contig binning algorithm that integrates both the assembly and PE graphs. It could strikingly rescue the short contigs and correct the binning errors from dead ends. METAMVGL learns the two graphs’ weights automatically and predicts the contig labels in a uniform multi-view label propagation framework. In experiments, we observed that METAMVGL made use of significantly more high-confidence edges from the combined graph and linked dead ends to the main graph. It also outperformed many state-of-the-art contig binning algorithms, including MaxBin2, MetaBAT2, MyCC, CONCOCT, SolidBin and GraphBin, on metagenomic sequencing data from simulation, two mock communities and Sharon infant fecal samples. Conclusions: Our findings demonstrate that METAMVGL substantially improves short contig binning and outperforms the other existing contig binning tools on metagenomic sequencing data from simulation, mock communities and infant fecal samples.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Lidong Guo ◽  
Mengyang Xu ◽  
Wenchao Wang ◽  
Shengqiang Gu ◽  
Xia Zhao ◽  
...  

Abstract. Background: Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Although several tools have been developed for each specific SLR technique, a robust standalone scaffolder with high efficiency is warranted for hybrid genome assembly. Results: In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, to link together contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard similarity to determine the order and orientation of contigs, and then locally improves the scaffolds with the aid of paired-end information. We also exploited a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single tube long fragment read sequencing dataset and increased the scaffold NG50 of its corresponding draft assembly 1349-fold. Moreover, benchmarking on different input contigs showed that this approach overall outperformed existing SLR scaffolders, providing longer contiguity and fewer misassemblies, especially for short contigs assembled from next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder. Conclusions: SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.


2021 ◽  
Author(s):  
Jet van der Spek ◽  
Joery den Hoed ◽  
Lot Snijders Blok ◽  
Alexander J. M. Dingemans ◽  
Dick Schijven ◽  
...  

Interpretation of next-generation sequencing data of individuals with an apparent sporadic neurodevelopmental disorder (NDD) often focusses on pathogenic variants in genes associated with NDD, assuming full clinical penetrance with limited variable expressivity. Consequently, inherited variants in genes associated with dominant disorders may be overlooked when the transmitting parent is clinically unaffected. While de novo variants explain a substantial proportion of cases with NDDs, a significant number remain undiagnosed, possibly explained by coding variants associated with reduced penetrance and variable expressivity. We characterized twenty families with inherited heterozygous missense or protein-truncating variants (PTVs) in CHD3, a gene in which de novo variants cause Snijders Blok-Campeau syndrome (SNIBCPS), characterized by intellectual disability, speech delay and recognizable facial features. Notably, the majority of the inherited CHD3 variants were maternally transmitted. Computational facial and human phenotype ontology-based comparisons demonstrated that the phenotypic features of probands with inherited CHD3 variants overlap with the phenotype previously associated with de novo variants in the gene, while carrier parents are mildly or not affected, suggesting variable expressivity. Additionally, similarly reduced expression levels of CHD3 protein in cells of an affected proband and of related healthy carriers with a CHD3 PTV suggested that compensation of expression from the wildtype allele is unlikely to be an underlying mechanism. Our results point to a significant role of inherited variation in SNIBCPS, a finding that is critical for correct variant interpretation and genetic counseling and warrants further investigation towards understanding the broader contributions of such variation to the landscape of human disease.


2011 ◽  
Vol 7 (8) ◽  
pp. e1002147 ◽  
Author(s):  
Vincent A. Fusaro ◽  
Prasad Patil ◽  
Erik Gafni ◽  
Dennis P. Wall ◽  
Peter J. Tonellato

Author(s):  
Rizik M. H. Al-Sayyed ◽  
Wadi’ A. Hijawi ◽  
Anwar M. Bashiti ◽  
Ibrahim AlJarah ◽  
Nadim Obeid ◽  
...  

Cloud computing is one of the paradigms that have undertaken to deliver the utility computing concept. It views computing as a utility similar to water and electricity. In this paper we investigate two highly efficacious Cloud platforms, Microsoft Azure (Azure) and Amazon Web Services (AWS), and highlight and compare their features in depth from the users’ perspective. The features on which we focus are (1) Pricing, (2) Availability, (3) Confidentiality, (4) Secrecy, (5) Tier Account and (6) Service Level Agreement (SLA). The study shows that Azure is more appropriate when considering Pricing and Availability (Error Rate), while AWS is more appropriate when considering Tier Account. Our user survey and its statistical analysis agreed with the arguments made for each of the six comparison factors.


2020 ◽  
Author(s):  
Diego A. Pérez Montes ◽  
Juan A. Añel ◽  
Javier Rodeiro

CONDE (Climate simulation ON DEmand) is the final result of our work and research on climate and meteorological simulations over an HPC as a Service (HPCaaS) model. On our architecture we run very large climate ensemble simulations using an adapted WRF version that is executed on demand, can be deployed over different Cloud Computing environments (such as Amazon Web Services, Microsoft Azure or Google Cloud), and uses BOINC as middleware for task execution and results gathering. Here, we also present some basic examples of applications and experiments to verify that the simulations run on our system are correct and show valid results.

