Enabling Large-Scale Bioinformatics Data Analysis with Cloud Computing

Author(s):  
J. Karlsson ◽  
O. Torreno ◽  
Daniel Ramet ◽  
Gunter Klambauer ◽  
M. Cano ◽  
...  
2018 ◽  
Author(s):  
Li Chen ◽  
Bai Zhang ◽  
Michael Schnaubelt ◽  
Punit Shah ◽  
Paul Aiyetan ◽  
...  

ABSTRACT: Rapid development and wide adoption of mass spectrometry-based proteomics technologies have empowered scientists to study proteins and their modifications in complex samples on a large scale. This progress has also created unprecedented challenges for individual labs to store, manage and analyze proteomics data, both in the cost of proprietary software and high-performance computing and in the long processing time that discourages on-the-fly changes of data processing settings required in explorative and discovery analysis. We developed an open-source, cloud computing-based pipeline, MS-PyCloud, with graphical user interface (GUI) support, for LC-MS/MS data analysis. The major components of this pipeline include data file integrity validation, MS/MS database search for spectral assignment, false discovery rate estimation, protein inference, determination of protein post-translational modifications, and quantitation of specific (modified) peptides and proteins. To ensure the transparency and reproducibility of data analysis, MS-PyCloud includes open-source software tools with comprehensive testing and versioning for spectrum assignments. Leveraging public cloud computing infrastructure via Amazon Web Services (AWS), MS-PyCloud scales seamlessly based on analysis demand to achieve fast and efficient performance. Application of the pipeline to the analysis of large-scale iTRAQ/TMT LC-MS/MS data sets demonstrated the effectiveness and high performance of MS-PyCloud. The software can be downloaded at: https://bitbucket.org/mschnau/ms-pycloud/downloads/


2015 ◽  
pp. 1272-1293
Author(s):  
Abraham Pouliakis ◽  
Aris Spathis ◽  
Christine Kottaridi ◽  
Antonia Mourtzikou ◽  
Marilena Stamouli ◽  
...  

Cloud computing has quickly emerged as an exciting new paradigm providing models of computing and services. Via cloud computing technology, bioinformatics tools can be made available as services to anyone, anywhere, and via any device. Large bio-datasets, highly complex algorithms, computationally demanding analysis methods, and sudden needs for hardware and computational resources make large-scale bio-data analysis an ideal use case for cloud computing. Cloud computing is already applied in the fields of biology and biochemistry, via numerous paradigms providing novel ideas that stimulate future research. The concept of BioCloud has rapidly emerged with applications related to genomics, drug design, biology tools on the cloud, bio-databases, cloud bio-computing, and numerous other areas of biology and biochemistry. In this chapter, the authors present research results related to biology-related laboratories (BioLabs) as well as potential applications for the everyday clinical routine.


BMC Genomics ◽  
2013 ◽  
Vol 14 (1) ◽  
pp. 425 ◽  
Author(s):  
Shanrong Zhao ◽  
Kurt Prenger ◽  
Lance Smith ◽  
Thomas Messina ◽  
Hongtao Fan ◽  
...  

Author(s):  
Konstantinos Krampis ◽  
Claudia Wultsch

Abstract: Research in biology has entered a digital era, where next-generation sequencing instruments generate multiple terabytes of data but are equipped with minimal computational and storage capacity that is not sufficient for large-scale, post-sequencing data analysis. Therefore, scientific value cannot be obtained from investment in a sequencing instrument unless it is also combined with a significant expense for informatics infrastructure. An alternative option for laboratories is to outsource their informatics infrastructure by leasing computational cycles and storage capacity from cloud computing services. Development of cloud-based bioinformatics tool suites can provide users with access to pre-configured software and on-demand computing resources for genomic data analysis, while at the same time lowering the barrier for working with sequencing datasets, leading to broader adoption of genomic technologies for basic biological research. We conclude that along with the democratization of genome sequencing through the availability of low-cost, bench-top sequencers, cloud computing can in turn democratize access to the computational capacity and informatics infrastructure required for sequencing data analysis.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Ibrahim Muzaferija ◽  
Zerina Mašetić ◽  

While leveraging cloud computing for large-scale distributed applications allows seamless scaling, many companies struggle to keep up with the volume of data generated, both in processing it efficiently and in detecting anomalies, which is a necessary part of managing modern applications. As a record of user behavior, web logs are a natural subject for anomaly detection research. Many anomaly detection methods based on automated log analysis have been proposed, but not in the context of big data applications, where anomalous behavior needs to be detected in the understanding phases, before a system is modeled for such use. Big Data Analytics often ignores anomalous points because of the high volume of data. To address this problem, we propose a complementary methodology for Big Data Analytics, Exploratory Data Analysis, which helps reveal relationships in the data without classical hypothesis modeling. In that way, we can gain a better understanding of the patterns and spot anomalies. Results show that Exploratory Data Analysis facilitates anomaly detection and the CRISP-DM Business Understanding phase, making it one of the key steps in the Data Understanding phase.
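As a rough illustration of how an exploratory pass over log data can surface anomalous points, the sketch below applies the standard interquartile-range (box-plot) rule to per-minute request counts. This is not the authors' method, which the abstract does not specify; the function name `iqr_outliers` and the sample counts are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical requests-per-minute counts extracted from a web log.
requests_per_minute = [52, 48, 50, 47, 51, 49, 53, 50, 400, 46]
print(iqr_outliers(requests_per_minute))  # prints [400]
```

The multiplier `k=1.5` is the conventional box-plot threshold; a larger value flags fewer points.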



2018 ◽  
Vol 31 (5-6) ◽  
pp. 227-233
Author(s):  
Weitao Wang ◽  
Baoshan Wang ◽  
Xiufen Zheng ◽  

Author(s):  
Eun-Young Mun ◽  
Anne E. Ray

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.

