Enabling Large-Scale Bioinformatics Data Analysis with Cloud Computing

MS-PyCloud: An open-source, cloud computing-based pipeline for LC-MS/MS data analysis

10.1101/320887 ◽

2018 ◽

Cited By ~ 2

Author(s):

Li Chen ◽

Bai Zhang ◽

Michael Schnaubelt ◽

Punit Shah ◽

Paul Aiyetan ◽

...

Keyword(s):

Cloud Computing ◽

Data Analysis ◽

Open Source ◽

High Performance ◽

Large Scale ◽

Rapid Development ◽

Data File ◽

Data Sets ◽

Proteomics Data ◽

Amazon Web Services

Rapid development and wide adoption of mass spectrometry-based proteomics technologies have empowered scientists to study proteins and their modifications in complex samples on a large scale. This progress has also created unprecedented challenges for individual labs to store, manage and analyze proteomics data, both in the cost of proprietary software and high-performance computing and in the long processing times that discourage on-the-fly changes to data processing settings, which are required in exploratory and discovery analysis. We developed an open-source, cloud computing-based pipeline, MS-PyCloud, with graphical user interface (GUI) support, for LC-MS/MS data analysis. The major components of this pipeline include data file integrity validation, MS/MS database search for spectral assignment, false discovery rate estimation, protein inference, determination of protein post-translational modifications, and quantitation of specific (modified) peptides and proteins. To ensure the transparency and reproducibility of data analysis, MS-PyCloud includes open-source software tools with comprehensive testing and versioning for spectrum assignments. Leveraging public cloud computing infrastructure via Amazon Web Services (AWS), MS-PyCloud scales seamlessly with analysis demand to achieve fast and efficient performance. Application of the pipeline to the analysis of large-scale iTRAQ/TMT LC-MS/MS data sets demonstrated the effectiveness and high performance of MS-PyCloud. The software can be downloaded at: https://bitbucket.org/mschnau/ms-pycloud/downloads/
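The abstract lists false discovery rate (FDR) estimation as a pipeline stage but does not say how it is computed. As an illustration only, the sketch below implements the widely used target-decoy approach: spectra are searched against real (target) and decoy sequences, and the FDR among matches scoring at or above a threshold is estimated as decoys / targets. The function names `target_decoy_qvalues` and `accepted_targets` and the `(score, is_decoy)` input format are hypothetical and are not MS-PyCloud's API.

```python
from typing import List, Tuple

# Hypothetical input format: one (score, is_decoy) pair per peptide-spectrum
# match (PSM); higher scores mean better matches. Not MS-PyCloud's API.
PSM = Tuple[float, bool]


def target_decoy_qvalues(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) triples sorted by descending score.

    FDR at each rank is estimated as (#decoys) / (#targets) among PSMs at or
    above that rank; the q-value is the smallest FDR at which the PSM would
    still be accepted. Tied scores are not grouped in this simple sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # avoid division by zero

    # Convert FDRs to q-values: running minimum from the lowest score upward.
    qvalues = [0.0] * len(fdrs)
    best = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        best = min(best, fdrs[i])
        qvalues[i] = best
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_targets(psms: List[PSM], fdr_threshold: float = 0.01) -> List[float]:
    """Scores of target (non-decoy) PSMs whose q-value is at or below the threshold."""
    return [s for s, d, q in target_decoy_qvalues(psms) if not d and q <= fdr_threshold]


if __name__ == "__main__":
    example = [(95.0, False), (90.0, False), (85.0, True),
               (80.0, False), (70.0, True), (60.0, False)]
    for score, is_decoy, q in target_decoy_qvalues(example):
        print(f"{score:6.1f}  {'decoy ' if is_decoy else 'target'}  q={q:.3f}")
    print(accepted_targets(example))  # → [95.0, 90.0]
```

Converting FDRs to q-values with a running minimum makes acceptance monotone in score. Other estimators, such as 2 × decoys / (targets + decoys), are also in common use; which one MS-PyCloud applies is not stated in the abstract.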

Cloud Computing for BioLabs

Cloud Technology ◽

10.4018/978-1-4666-6539-2.ch058 ◽

2015 ◽

pp. 1272-1293

Author(s):

Abraham Pouliakis ◽

Aris Spathis ◽

Christine Kottaridi ◽

Antonia Mourtzikou ◽

Marilena Stamouli ◽

...

Keyword(s):

Cloud Computing ◽

Data Analysis ◽

Drug Design ◽

Large Scale ◽

Future Research ◽

New Paradigm ◽

Computing Power ◽

The Everyday ◽

Potential Applications ◽

Computational Resources

Cloud computing has quickly emerged as an exciting new paradigm providing models of computing and services. Via cloud computing technology, bioinformatics tools can be made available as services to anyone, anywhere, and via any device. Large bio-datasets, highly complex algorithms, computationally demanding analysis methods, and sudden needs for hardware and computational resources make large-scale bio-data analysis an ideal application for cloud computing. Cloud computing is already applied in biology and biochemistry through numerous paradigms that provide novel ideas and stimulate future research. The concept of BioCloud has rapidly emerged, with applications related to genomics, drug design, biology tools on the cloud, bio-databases, cloud bio-computing, and numerous other areas of biology and biochemistry. In this chapter, the authors present research results related to biology laboratories (BioLabs) as well as potential applications for the everyday clinical routine.

Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing

BMC Genomics ◽

10.1186/1471-2164-14-425 ◽

2013 ◽

Vol 14 (1) ◽

pp. 425 ◽

Cited By ~ 32

Author(s):

Shanrong Zhao ◽

Kurt Prenger ◽

Lance Smith ◽

Thomas Messina ◽

Hongtao Fan ◽

...

Keyword(s):

Cloud Computing ◽

Data Analysis ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Data Analysis

A Review of Cloud Computing Bioinformatics Solutions for Next-Gen Sequencing Data Analysis and Research

Methods in Next Generation Sequencing ◽

10.1515/mngs-2015-0003 ◽

2015 ◽

Vol 2 (1) ◽

Cited By ~ 1

Author(s):

Konstantinos Krampis ◽

Claudia Wultsch

Keyword(s):

Cloud Computing ◽

Data Analysis ◽

Large Scale ◽

Storage Capacity ◽

Biological Research ◽

Sequencing Data ◽

Computing Services ◽

Informatics Infrastructure ◽

And Storage ◽

Sequencing Data Analysis

Research in biology has entered a digital era, in which next-generation sequencing instruments generate multiple terabytes of data but are equipped with minimal computational and storage capacity, insufficient for large-scale post-sequencing data analysis. Therefore, investment in a sequencing instrument cannot yield scientific value unless it is combined with a significant expense for informatics infrastructure. An alternative for laboratories is to outsource their informatics infrastructure by leasing computational cycles and storage capacity from cloud computing services. Development of cloud-based bioinformatics tool suites can provide users with access to pre-configured software and on-demand computing resources for genomic data analysis, while at the same time lowering the barrier to working with sequencing datasets, leading to broader adoption of genomic technologies for basic biological research. We conclude that, along with the democratization of genome sequencing through the availability of low-cost, bench-top sequencers, cloud computing can in turn democratize access to the computational capacity and informatics infrastructure required for sequencing data analysis.

SAKU: A distributed system for data analysis in large-scale dataset based on cloud computing

2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD) ◽

10.1109/fskd.2011.6019711 ◽

2011 ◽

Cited By ~ 2

Author(s):

Lei Qin ◽

Bin Wu ◽

Qing Ke ◽

Yuxiao Dong

Keyword(s):

Cloud Computing ◽

Data Analysis ◽

Distributed System ◽

Large Scale ◽

Large Scale Dataset

Using Exploratory Data Analysis and Big Data Analytics for Detecting Anomalies in Cloud Computing

Journal of Natural Sciences and Engineering ◽

10.14706/jonsae2021320 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Ibrahim Muzaferija ◽

Zerina Mašetić ◽

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Analysis ◽

Anomaly Detection ◽

Data Analytics ◽

Large Scale ◽

Exploratory Data Analysis ◽

Big Data Analytics ◽

Detection Methods ◽

Exploratory Data

While leveraging cloud computing for large-scale distributed applications allows seamless scaling, many companies struggle to keep up with the amount of data generated, both in processing it efficiently and in detecting anomalies, which is a necessary part of managing modern applications. As records of user behavior, weblogs have become a natural subject of anomaly detection research. Many anomaly detection methods based on automated log analysis have been proposed, but not in the context of big data applications, where anomalous behavior needs to be detected during the understanding phases, before a system is modeled for such use. Big Data Analytics often ignores anomalous points because of the high volume of data. To address this problem, we propose a complementary methodology for Big Data Analytics, Exploratory Data Analysis, which helps to gain insight into data relationships without classical hypothesis modeling. In this way, we can better understand the patterns in the data and spot anomalies. Results show that Exploratory Data Analysis facilitates anomaly detection and the CRISP-DM Business Understanding phase, making it one of the key steps in the Data Understanding phase.
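The abstract does not name specific EDA techniques. As an assumed illustration of how an exploratory summary can surface anomalous points that aggregate analytics might average away, the sketch below flags hours whose request counts fall outside Tukey's interquartile-range (IQR) fences, a standard EDA rule. The function `iqr_outliers`, the fence multiplier `k = 1.5`, and the hourly counts are hypothetical and are not taken from the paper.

```python
import statistics
from typing import Dict, List


def iqr_outliers(counts: Dict[str, int], k: float = 1.5) -> List[str]:
    """Return keys whose value falls outside Tukey's fences.

    Fences are [Q1 - k*IQR, Q3 + k*IQR], where Q1 and Q3 are the first and
    third quartiles and IQR = Q3 - Q1. Requires at least two values.
    """
    # statistics.quantiles (Python 3.8+) returns the three quartile cut points;
    # the default method is 'exclusive'.
    q1, _, q3 = statistics.quantiles(counts.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [key for key, v in counts.items() if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical requests-per-hour counts aggregated from a web server log.
    hourly_requests = {
        "00:00": 120, "01:00": 115, "02:00": 130, "03:00": 125,
        "04:00": 118, "05:00": 122, "06:00": 950, "07:00": 128,
    }
    print(iqr_outliers(hourly_requests))  # → ['06:00']
```

Quartiles are barely affected by a single extreme value, so the IQR fences stay tight around typical traffic. A mean ± 3σ rule, by contrast, lets the spike itself inflate the standard deviation and can mask it.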

Cloud Computing for BioLabs

Cloud Computing Applications for Quality Health Care Delivery - Advances in Healthcare Information Systems and Administration ◽

10.4018/978-1-4666-6118-9.ch012 ◽

2014 ◽

pp. 228-249 ◽

Cited By ~ 14

Author(s):

Abraham Pouliakis ◽

Aris Spathis ◽

Christine Kottaridi ◽

Antonia Mourtzikou ◽

Marilena Stamouli ◽

...

Keyword(s):

Cloud Computing ◽

Data Analysis ◽

Drug Design ◽

Large Scale ◽

Future Research ◽

New Paradigm ◽

Computing Power ◽

The Everyday ◽

Potential Applications ◽

Computational Resources

Cloud computing has quickly emerged as an exciting new paradigm providing models of computing and services. Via cloud computing technology, bioinformatics tools can be made available as services to anyone, anywhere, and via any device. Large bio-datasets, highly complex algorithms, computationally demanding analysis methods, and sudden needs for hardware and computational resources make large-scale bio-data analysis an ideal application for cloud computing. Cloud computing is already applied in biology and biochemistry through numerous paradigms that provide novel ideas and stimulate future research. The concept of BioCloud has rapidly emerged, with applications related to genomics, drug design, biology tools on the cloud, bio-databases, cloud bio-computing, and numerous other areas of biology and biochemistry. In this chapter, the authors present research results related to biology laboratories (BioLabs) as well as potential applications for the everyday clinical routine.

Public cloud computing for seismological research: Calculating large-scale noise cross-correlations using ALIYUN

Earthquake Science ◽

10.29382/eqs-2018-0227-2 ◽

2018 ◽

Vol 31 (5-6) ◽

pp. 227-233

Author(s):

Weitao Wang ◽

◽

Baoshan Wang ◽

Xiufen Zheng ◽

Keyword(s):

Cloud Computing ◽

Large Scale ◽

Public Cloud ◽

Cross Correlations

A Novel Topology Optimization Theory and Parallel Data Analysis Model based Resource Scheduling Algorithm for Cloud Computing

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2352096511666180213111403 ◽

2018 ◽

Vol 11 (4) ◽

pp. 449-456

Author(s):

Yucheng Zhang ◽

Wenzhun Huang ◽

Ting Zhang ◽

Tuo Zhang

Keyword(s):

Cloud Computing ◽

Topology Optimization ◽

Data Analysis ◽

Scheduling Algorithm ◽

Resource Scheduling ◽

Optimization Theory ◽

Analysis Model ◽

Model Based ◽

Parallel Data

Integrative Data Analysis from a Unifying Research Synthesis Perspective

10.1093/oso/9780190676001.003.0020 ◽

2018 ◽

Author(s):

Eun-Young Mun ◽

Anne E. Ray

Keyword(s):

Data Analysis ◽

Large Scale ◽

Research Synthesis ◽

Alcohol Intervention ◽

Data Set ◽

Integrative Data Analysis ◽

Level Data ◽

Model Complex ◽

Wide Range ◽

Individual Participant

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples, together with systematic study-level missing data, poses significant barriers to IDA and, more broadly, to large-scale research synthesis. Drawing on the authors’ experience with the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, the chapter also recognizes that IDA investigations require a wide range of expertise and considerable resources, and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.
