BioMiner: Paving the Way for Personalized Medicine

Cancer Informatics

10.4137/cin.s20910

2015

Vol 14

pp. CIN.S20910

Cited By ~ 5

Author(s):

Chris Bauer

Karol Stec

Alexander Glintschert

Kristina Gruden

Christian Schichor

...

Keyword(s):

Personalized Medicine

High Throughput

Human Biology

Supplementary File

Data Sets

Omics Data

The Novel

Web Based

High Throughput Data

Interdisciplinary Project

Personalized medicine promises a revolution for medicine and human biology in the 21st century. The scientific foundation for this revolution is laid by analyzing biological high-throughput data sets from genomics, transcriptomics, proteomics, and metabolomics. Until now, access to these data has been limited either to rather simple Web-based tools, which offer little insight, or to analysis by trained specialists without firsthand involvement of the physician. Here, we present the novel Web-based tool “BioMiner,” which was developed within the scope of an international and interdisciplinary project (SYSTHER†) and gives access to a variety of high-throughput data sets. It provides the user with convenient tools to analyze complex cross-omics data sets and offers enhanced visualization capabilities. BioMiner incorporates transcriptomic and cross-omics high-throughput data sets, with a focus on cancer. A public instance of BioMiner along with the database is available at http://systherDB.microdiscovery.de/ (login and password: “systher”); a tutorial detailing the usage of BioMiner can be found in the Supplementary File.

OmixAnalyzer – A Web-Based System for Management and Analysis of High-Throughput Omics Data Sets

Lecture Notes in Computer Science - Data Integration in the Life Sciences

10.1007/978-3-642-39437-9_4

2013

pp. 46-53

Cited By ~ 1

Author(s):

Thomas Stoltmann

Karin Zimmermann

André Koschmieder

Ulf Leser

Keyword(s):

High Throughput

Data Sets

Omics Data

Web Based

Web Based System

multiSLIDE: a web server for exploring connected elements of biological pathways in multi-omics data

10.1101/812271

2019

Author(s):

Soumita Ghosh

Abhik Datta

Hyungwon Choi

Keyword(s):

Keyword Search

Data Sets

Omics Data

Web Based

Molecular Features

External Data

Cluster Data

Wide Range

Time Basis

Gene Ontologies

Abstract

Emerging multi-omics experiments pose new challenges for the exploration of quantitative data sets. We present multiSLIDE, a web-based interactive tool for simultaneous heatmap visualization of interconnected molecular features in multi-omics data sets. multiSLIDE operates by keyword search to visualize biologically connected molecular features, such as genes in pathways and Gene Ontologies, offering convenient functionalities to rearrange, filter, and cluster data sets in a web browser in real time. Various built-in querying mechanisms make it adaptable to diverse omics types, and visualizations are fully customizable. We demonstrate the versatility of the tool through three example studies, which showcase its applicability to a wide range of multi-omics data sets, its ability to visualize links between molecules at different granularities of measurement units, and its interface for incorporating inter-molecular relationships from external data sources into the visualization. Online and standalone versions of multiSLIDE are available at https://github.com/soumitag/multiSLIDE.

Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework

BMC Genomics

10.1186/s12864-020-07013-y

2020

Vol 21 (S10)

Author(s):

Tanveer Ahmad

Nauman Ahmed

Zaid Al-Ars

H. Peter Hofstee

Keyword(s):

Data Processing

Shared Memory

High Throughput

Data Representation

Data Sets

Computing Systems

Disk Storage

High Throughput Data

Data Framework

Development Framework

Abstract

Background: Immense improvements in sequencing technologies enable the production of large amounts of high-throughput, cost-effective next-generation sequencing (NGS) data. These data need to be processed efficiently for further downstream analyses. Computing systems need such large amounts of data close to the processor (with low latency) for fast and efficient processing. However, existing workflows depend heavily on disk storage and access, and processing the data this way incurs huge disk I/O overheads. Previously, due to the cost, volatility and other physical constraints of DRAM, it was not feasible to keep large working data sets in memory. However, recent developments in storage-class memory and non-volatile memory technologies have enabled computing systems to place huge amounts of data in memory and process them directly from memory, avoiding disk I/O bottlenecks. Exploiting such memory systems efficiently requires properly formatted data placement in memory and high-throughput access that avoids (de)serialization and copy overheads between processes. For this purpose, we use the newly developed Apache Arrow, a cross-language development framework that provides a language-independent columnar in-memory data format for efficient in-memory big data analytics. This allows genomics applications developed in different programming languages to communicate in memory without accessing disk storage and without (de)serialization and copy overheads.

Implementation: We integrate an Apache Arrow in-memory Sequence Alignment/Map (SAM) format and its shared memory object store library into widely used genomics high-throughput data processing applications such as BWA-MEM, Picard and GATK to allow in-memory communication between these applications. This also allows us to exploit the cache locality of tabular data and parallel processing capabilities through shared memory objects.

Results: Our implementation shows that adopting an in-memory SAM representation in genomics high-throughput data processing applications results in better system resource utilization, fewer memory accesses due to high cache locality, and parallel scalability due to shared memory objects. Our implementation focuses on the GATK best-practices workflows for germline analysis on whole genome sequencing (WGS) and whole exome sequencing (WES) data sets. We compare existing in-memory data placement and sharing techniques such as ramDisk and Unix pipes to show how the columnar in-memory data representation outperforms both. We achieve speedups of 4.85x and 4.76x for WGS and WES data, respectively, in the overall execution time of variant calling workflows. Compared to the second fastest workflow, speedups of 1.45x and 1.27x are achieved for these data sets, respectively. In some individual tools, particularly sorting, duplicate removal and base quality score recalibration, the speedup is even larger.

Availability: The code and scripts used in our experiments are available in both container and repository form at: https://github.com/abs-tudelft/ArrowSAM.
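The key idea in this abstract, keeping alignment records in a columnar (struct-of-arrays) layout instead of as individual row objects, can be illustrated with a small stdlib-only Python sketch. It does not use Apache Arrow, pyarrow, or the ArrowSAM code; the field subset and the names `SamRecord`, `SamColumns`, `coordinate_order` and `min_mapq` are hypothetical, and the cross-process shared memory object store is not modeled. It only shows that sorting and filtering touch one or two contiguous columns, and that a typed column can be exposed as a buffer without copying.

```python
# Hypothetical sketch of a columnar SAM layout; not the ArrowSAM or Apache Arrow API.
from array import array
from dataclasses import dataclass


@dataclass
class SamRecord:
    """Row-oriented view of a few mandatory SAM fields (illustrative subset)."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int


@dataclass
class SamColumns:
    """Column-oriented (struct-of-arrays) layout of the same fields.

    Numeric fields live in contiguous typed buffers, so a scan over one field
    reads adjacent memory instead of hopping between per-record objects.
    """
    qname: list
    rname: list
    flag: array
    pos: array
    mapq: array

    @classmethod
    def from_records(cls, records):
        return cls(
            qname=[r.qname for r in records],
            rname=[r.rname for r in records],
            flag=array("H", [r.flag for r in records]),  # unsigned 16-bit
            pos=array("q", [r.pos for r in records]),    # signed 64-bit
            mapq=array("B", [r.mapq for r in records]),  # unsigned 8-bit
        )

    def coordinate_order(self):
        # Sorting reads only the rname and pos columns. Real SAM coordinate
        # sorting follows @SQ header order; plain string order is used here.
        return sorted(range(len(self.pos)), key=lambda i: (self.rname[i], self.pos[i]))

    def min_mapq(self, threshold):
        # Filtering scans a single one-byte-per-record column.
        return [i for i, q in enumerate(self.mapq) if q >= threshold]


records = [
    SamRecord("r1", 0, "chr2", 500, 60),
    SamRecord("r2", 16, "chr1", 1200, 30),
    SamRecord("r3", 0, "chr1", 300, 5),
]
cols = SamColumns.from_records(records)
print(cols.coordinate_order())  # [2, 1, 0]
print(cols.min_mapq(20))        # [0, 1]

# memoryview exposes the pos buffer without copying or re-parsing it.
pos_view = memoryview(cols.pos)
print(pos_view.format, pos_view[0])  # q 500
```

A real Arrow-based pipeline would hold such columns in Arrow arrays and share them through Arrow's own IPC or object-store mechanisms; this sketch only mimics the memory layout that makes those mechanisms cheap.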

Protein Complex-Based Analysis Framework for High-Throughput Data Sets

Science Signaling

10.1126/scisignal.2003629

2013

Vol 6 (264)

pp. rs5-rs5

Cited By ~ 66

Author(s):

A. Vinayagam

Y. Hu

M. Kulkarni

C. Roesel

R. Sopko

...

Keyword(s):

High Throughput

Protein Complex

Data Sets

Analysis Framework

High Throughput Data

Introduction to the Development and Validation of Predictive Biomarker Models from High-Throughput Data Sets

Methods in Molecular Biology - Statistical Methods in Molecular Biology

10.1007/978-1-60761-580-4_15

2009

pp. 435-470

Cited By ~ 6

Author(s):

Xutao Deng

Fabien Campagne

Keyword(s):

High Throughput

Predictive Biomarker

Data Sets

High Throughput Data

Development And Validation

Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application

Briefings in Bioinformatics

10.1093/bib/bby051

2019

Vol 20 (5)

pp. 1795-1811

Cited By ~ 19

Author(s):

Gaye Lightbody

Valeriia Haberland

Fiona Browne

Laura Taggart

Huiru Zheng

...

Keyword(s):

Personalized Medicine

High Throughput

High Performance

High Throughput Sequencing

Technology Selection

Patient Treatment

Full Potential

Omics Data

Full Genome Sequencing

Data Governance

Abstract There has been exponential growth in the performance and output of sequencing technologies (omics data), with full genome sequencing now producing gigabases of reads on a daily basis. These data may hold the promise of personalized medicine, leading to routinely available sequencing tests that can guide patient treatment decisions. In the era of high-throughput sequencing (HTS), computational considerations, data governance and clinical translation are the greatest rate-limiting steps. To ensure that the analysis, management and interpretation of such extensive omics data are exploited to their full potential, key factors, including sample sourcing, technology selection, and computational expertise and resources, need to be considered, leading to an integrated set of high-performance tools and systems. This article provides an up-to-date overview of the evolution of HTS and of the accompanying tools, infrastructure and data management approaches that are emerging in this space, which, if used within a multidisciplinary context, may ultimately facilitate the development of personalized medicine.

multiSLIDE is a web server for exploring connected elements of biological pathways in multi-omics data

Nature Communications

10.1038/s41467-021-22650-x

2021

Vol 12 (1)

Author(s):

Soumita Ghosh

Abhik Datta

Hyungwon Choi

Keyword(s):

Keyword Search

Data Sets

Omics Data

Web Browser

Web Based

Molecular Features

Cluster Data

Wide Range

Or Genes

Simultaneous Visualization

Abstract

Quantitative multi-omics data are difficult to interpret and visualize due to the large volume of data, the complexity among data features, and the heterogeneity of information represented by different omics platforms. Here, we present multiSLIDE, a web-based interactive tool for the simultaneous visualization of interconnected molecular features in heatmaps of multi-omics data sets. multiSLIDE visualizes biologically connected molecular features by keyword search of pathways or genes, offering convenient functionalities to query, rearrange, filter, and cluster data in a web browser in real time. Various querying mechanisms make it adaptable to diverse omics types, and visualizations are customizable. We demonstrate the versatility of multiSLIDE through three examples, showcasing its applicability to a wide range of multi-omics data sets by allowing users to visualize established links between molecules from different omics data, as well as to incorporate custom inter-molecular relationship information into the visualization. Online and stand-alone versions of multiSLIDE are available at https://github.com/soumitag/multiSLIDE.
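The keyword-search-then-heatmap workflow described above can be sketched in Python, assuming a pathway-to-genes mapping and one gene-by-sample matrix per omics layer. This is not multiSLIDE's implementation: the function names (`keyword_search`, `cluster_order`, `heatmap_panels`), the example pathways and values, the Euclidean distance, and the average-linkage agglomerative clustering used to order heatmap rows are all illustrative choices.

```python
# Hypothetical sketch of keyword-driven, row-clustered heatmap panels; not multiSLIDE code.
import math


def keyword_search(pathways, query):
    """Collect genes from every pathway whose name contains the query (case-insensitive)."""
    q = query.lower()
    genes = set()
    for name, members in pathways.items():
        if q in name.lower():
            genes.update(members)
    return sorted(genes)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_order(rows):
    """Naive average-linkage agglomerative clustering; returns the tree's leaf order."""
    clusters = [[name] for name in rows]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(rows[a], rows[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


def heatmap_panels(omics, pathways, query):
    """One row-ordered panel per omics layer, restricted to genes matching the query."""
    genes = keyword_search(pathways, query)
    panels = {}
    for layer, matrix in omics.items():
        rows = {g: matrix[g] for g in genes if g in matrix}
        panels[layer] = [(g, rows[g]) for g in cluster_order(rows)] if rows else []
    return panels


pathways = {
    "MAPK signaling": ["MAPK1", "MAPK3", "RAF1"],
    "p53 signaling": ["TP53", "MDM2"],
}
omics = {  # gene -> values across three samples (made-up numbers)
    "mRNA": {"MAPK1": [2.0, 2.1, 0.1], "MAPK3": [1.9, 2.2, 0.0],
             "RAF1": [-1.0, -0.9, 1.5], "TP53": [0.5, 0.4, 0.6]},
    "protein": {"MAPK1": [1.0, 1.2, 0.2], "RAF1": [-0.5, -0.4, 0.9]},
}
panels = heatmap_panels(omics, pathways, "mapk")
for layer, panel in panels.items():
    print(layer, [gene for gene, _ in panel])
# mRNA ['RAF1', 'MAPK1', 'MAPK3']
# protein ['MAPK1', 'RAF1']
```

Each omics layer keeps only the genes it actually measures, so panels for different layers can have different row sets; the cubic-time clustering loop is fine for a handful of pathway genes but would be replaced by an optimized library for larger selections.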

Enabling Data Analysis on High-Throughput Data in Large Data Depository Using Web-Based Analysis Platform - A Case Study on Integrating QUEST with GenePattern in Epigenetics Research

2009 IEEE International Conference on Bioinformatics and Biomedicine

10.1109/bibm.2009.84

2009

Author(s):

Terry Camerlengo

Hatice Gulcin Ozer

Pearlly Yan

Jeffrey Parvin

Tim Huang

...

Keyword(s):

Data Analysis

High Throughput

Large Data

Web Based

High Throughput Data

Analysis Platform

Using Similarity Metrics to Quantify Differences in High-Throughput Data Sets: Application to X-ray Diffraction Patterns

ACS Combinatorial Science

10.1021/acscombsci.6b00142

2016

Vol 19 (1)

pp. 25-36

Cited By ~ 6

Author(s):

Efraín Hernández-Rivera

Shawn P. Coleman

Mark A. Tschopp

Keyword(s):

High Throughput

Similarity Metrics

Data Sets

X Ray Diffraction

X Ray

High Throughput Data

Diffraction Patterns

Integrating functional genomics data

Biochemical Society Transactions

10.1042/bst0311484

2003

Vol 31 (6)

pp. 1484-1487

Cited By ~ 9

Author(s):

P. Kemmeren

F.C.P. Holstege

Keyword(s):

Data Quality

Functional Genomics

High Throughput

Functional Annotation

Data Sets

Functional Annotations

High Throughput Data

Functional annotation of fully sequenced genomes is still a major issue. High-throughput data sets could be used to provide more and better functional annotations. However, differences in data quality need to be taken into account. For this purpose, these high-throughput data sets need to be integrated so that data quality can be assessed, hypotheses can be prioritized, and existing annotations can be improved and extended.
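One way to make this kind of integration concrete is to weight each data set by how well it agrees with existing annotations and then rank new hypotheses by their combined, quality-weighted support. The sketch below is a hypothetical scheme, not the method from this paper: it assumes functional links are represented as gene pairs, that reference sets of known-positive and known-negative pairs are available, and it uses Laplace-smoothed precision converted to log-odds as the quality weight. The helper names (`link`, `smoothed_precision`, `prioritize`), data set names and gene pairs are all invented for illustration.

```python
# Hypothetical quality-weighted integration scheme; not the authors' method.
import math


def link(a, b):
    """Canonical, order-independent representation of a functional link."""
    return tuple(sorted((a, b)))


def smoothed_precision(predicted, positives, negatives):
    # Only pairs with a known status count; +1/+2 keeps precision away from 0 and 1.
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    return (tp + 1) / (tp + fp + 2)


def prioritize(datasets, positives, negatives):
    """Weight data sets by log-odds of their precision, then rank unannotated pairs."""
    weights = {}
    for name, pairs in datasets.items():
        p = smoothed_precision(pairs, positives, negatives)
        weights[name] = math.log(p / (1 - p))  # negative when p < 0.5
    scores = {}
    for name, pairs in datasets.items():
        for pair in pairs - positives - negatives:  # new hypotheses only
            scores[pair] = scores.get(pair, 0.0) + weights[name]
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return weights, ranked


positives = {link("A", "B"), link("C", "D")}
negatives = {link("A", "C"), link("B", "D")}
datasets = {
    "two_hybrid": {link("A", "C"), link("B", "D"), link("A", "B"), link("E", "F")},
    "coexpression": {link("A", "B"), link("C", "D"), link("E", "F"), link("G", "H")},
}
weights, ranked = prioritize(datasets, positives, negatives)
print([pair for pair, _ in ranked])  # [('G', 'H'), ('E', 'F')]
```

With these toy inputs the pair supported only by the higher-precision data set ranks above the pair that the low-precision data set also reports, because that data set's weight is negative; the same reference sets thus serve both to assess data quality and to prioritize hypotheses.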