BioMiner: Paving the Way for Personalized Medicine

2015 ◽  
Vol 14 ◽  
pp. CIN.S20910 ◽  
Author(s):  
Chris Bauer ◽  
Karol Stec ◽  
Alexander Glintschert ◽  
Kristina Gruden ◽  
Christian Schichor ◽  
...  

Personalized medicine promises a revolution in medicine and human biology in the 21st century. The scientific foundation for this revolution is laid by analyzing biological high-throughput data sets from genomics, transcriptomics, proteomics, and metabolomics. Currently, access to these data has been limited either to rather simple Web-based tools, which offer little insight, or to analysis by trained specialists without firsthand involvement of the physician. Here, we present the novel Web-based tool “BioMiner,” which was developed within the scope of an international and interdisciplinary project (SYSTHER†) and gives access to a variety of high-throughput data sets. It provides the user with convenient tools to analyze complex cross-omics data sets and offers enhanced visualization capabilities. BioMiner incorporates transcriptomic and cross-omics high-throughput data sets, with a focus on cancer. A public instance of BioMiner, along with the database, is available at http://systherDB.microdiscovery.de/ (login and password: “systher”); a tutorial detailing the usage of BioMiner can be found in the Supplementary File.

2019 ◽  
Author(s):  
Soumita Ghosh ◽  
Abhik Datta ◽  
Hyungwon Choi

Abstract Emerging multi-omics experiments pose new challenges for the exploration of quantitative data sets. We present multiSLIDE, a web-based interactive tool for simultaneous heatmap visualization of interconnected molecular features in multi-omics data sets. multiSLIDE uses keyword search to visualize biologically connected molecular features, such as genes in pathways and Gene Ontology terms. It offers convenient functionality to rearrange, filter, and cluster data sets in a web browser in real time. Various built-in querying mechanisms make it adaptable to diverse omics types, and visualizations are fully customizable. We demonstrate the versatility of the tool through three example studies. Together they showcase its applicability to a wide range of multi-omics data sets, its ability to visualize links between molecules at different granularities of measurement units, and its interface for incorporating inter-molecular relationships from external data sources into the visualization. Online and standalone versions of multiSLIDE are available at https://github.com/soumitag/multiSLIDE.
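Here is a minimal Python sketch of the keyword-search-then-cluster workflow. It assumes a toy in-memory data model: a hypothetical `PATHWAYS` table that maps pathway names to gene sets, and a hypothetical `OMICS` table of per-layer expression profiles. It is not multiSLIDE's API or implementation. A keyword selects every gene in the matching pathways. Each omics layer's selected rows are then ordered by average-linkage hierarchical clustering on 1 - Pearson r, one common choice for heatmap row ordering.

```python
from math import sqrt

# Hypothetical toy inputs; not multiSLIDE's data model or API.
PATHWAYS = {
    "Glycolysis / Gluconeogenesis": {"HK1", "PFKM", "PKM"},
    "TCA cycle": {"CS", "IDH1"},
}
OMICS = {  # layer -> feature -> values across samples
    "mRNA": {"HK1": [1.0, 2.0, 3.0], "PFKM": [1.1, 2.1, 2.9],
             "PKM": [3.0, 2.0, 1.0], "CS": [0.5, 0.4, 0.6]},
    "protein": {"HK1": [0.9, 1.8, 3.2], "PKM": [2.8, 2.1, 0.9]},
}


def search_features(keyword):
    """Union of genes from every pathway whose name contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    genes = set()
    for name, members in PATHWAYS.items():
        if kw in name.lower():
            genes |= members
    return genes


def pearson(x, y):
    """Pearson correlation; constant rows are treated as uncorrelated (r = 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cluster_order(rows):
    """Leaf order from average-linkage agglomerative clustering on 1 - r."""
    clusters = [[name] for name in sorted(rows)]

    def dist(a, b):
        return sum(1 - pearson(rows[x], rows[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # merge the closest pair; concatenating members keeps a valid dendrogram leaf order
        i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0] if clusters else []


def heatmap_rows(keyword):
    """For each omics layer, the keyword-selected features in clustered row order."""
    genes = search_features(keyword)
    return {layer: cluster_order({g: v for g, v in table.items() if g in genes})
            for layer, table in OMICS.items()}
```

For example, `heatmap_rows("glycolysis")["mRNA"]` gives `["PKM", "HK1", "PFKM"]`. The strongly co-varying HK1 and PFKM rows end up adjacent. This naive clustering loop scales poorly and is only meant for small keyword selections.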


BMC Genomics ◽  
2020 ◽  
Vol 21 (S10) ◽  
Author(s):  
Tanveer Ahmad ◽  
Nauman Ahmed ◽  
Zaid Al-Ars ◽  
H. Peter Hofstee

Abstract Background Immense improvements in sequencing technologies enable the production of large amounts of high-throughput, cost-effective next-generation sequencing (NGS) data. These data need to be processed efficiently for further downstream analyses. For fast and efficient processing, computing systems need these large amounts of data close to the processor (with low latency). However, existing workflows depend heavily on disk storage and access, so processing these data incurs large disk I/O overheads. Previously, the cost, volatility and other physical constraints of DRAM made it infeasible to hold large working data sets in memory. Recent developments in storage-class memory and non-volatile memory technologies, however, enable computing systems to hold large data sets in memory and process them there directly, avoiding disk I/O bottlenecks. To exploit such memory systems efficiently, data must be placed in memory in a suitable format and accessed with high throughput, without (de)serialization and copy overheads between processes. For this purpose, we use the newly developed Apache Arrow, a cross-language development framework that provides a language-independent columnar in-memory data format for efficient in-memory big data analytics. This allows genomics applications written in different programming languages to communicate in memory, without accessing disk storage and without (de)serialization and copy overheads.
Implementation We integrate an Apache Arrow-based in-memory Sequence Alignment/Map (SAM) format, together with Arrow's shared-memory object store library, into widely used high-throughput genomics applications such as BWA-MEM, Picard and GATK, allowing in-memory communication between these applications. This also allows us to exploit the cache locality of tabular data and the parallel processing capabilities of shared memory objects.
Results Our implementation shows that adopting an in-memory SAM representation in high-throughput genomics applications yields three benefits: better system resource utilization, fewer memory accesses owing to better cache locality, and parallel scalability through shared memory objects. Our implementation focuses on the GATK Best Practices workflows for germline analysis of whole genome sequencing (WGS) and whole exome sequencing (WES) data sets. We compare our approach with existing in-memory data placement and sharing techniques, such as ramDisk and Unix pipes, and show that the columnar in-memory data representation outperforms both. We achieve speedups of 4.85x and 4.76x for WGS and WES data, respectively, in the overall execution time of variant-calling workflows. Compared with the second-fastest workflow, the speedups are 1.45x and 1.27x for these data sets, respectively. For some individual tools the speedup is even larger, particularly for sorting, duplicate removal and base quality score recalibration.
Availability The code and scripts used in our experiments are available in both container and repository form at: https://github.com/abs-tudelft/ArrowSAM.
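The core idea of a columnar SAM representation can be shown without Arrow itself. The Python sketch below is an assumption-laden stand-in, not ArrowSAM's schema or code. It stores the 11 mandatory SAM fields as one plain list per column, where ArrowSAM uses Apache Arrow arrays in shared memory. It then performs a coordinate sort that reads only the RNAME and POS columns to compute a permutation, and applies that permutation to every column.

```python
# The 11 mandatory SAM columns, in specification order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def to_columns(sam_lines):
    """Row-oriented SAM text -> one list per field (struct-of-arrays layout).

    Plain Python lists stand in for Arrow arrays; optional tags are dropped.
    Assumes well-formed records with all 11 mandatory fields.
    """
    cols = {name: [] for name in SAM_FIELDS}
    for line in sam_lines:
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        values = line.rstrip("\n").split("\t")[:len(SAM_FIELDS)]
        for name, value in zip(SAM_FIELDS, values):
            cols[name].append(int(value) if name in INT_FIELDS else value)
    return cols


def coordinate_sort(cols):
    """Sort records by (RNAME, POS), reading only those two columns for comparisons.

    Lexicographic RNAME order is a simplification: coordinate-sorted SAM/BAM
    files order references by their @SQ header order instead.
    """
    rname, pos = cols["RNAME"], cols["POS"]
    order = sorted(range(len(pos)), key=lambda i: (rname[i], pos[i]))
    # apply the same permutation to every column
    return {name: [col[i] for i in order] for name, col in cols.items()}


SAM = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr2\t500\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t300\t60\t4M\t*\t0\t0\tTTGA\tIIII",
    "r3\t0\tchr1\t100\t60\t4M\t*\t0\t0\tGGCA\tIIII\tNM:i:0",
]
columns = to_columns(SAM)
sorted_columns = coordinate_sort(columns)
```

After sorting, `sorted_columns["QNAME"]` is `["r3", "r2", "r1"]`. The sort key touches only two contiguous columns, which is the cache-locality argument made above in miniature.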


2013 ◽  
Vol 6 (264) ◽  
pp. rs5-rs5 ◽  
Author(s):  
A. Vinayagam ◽  
Y. Hu ◽  
M. Kulkarni ◽  
C. Roesel ◽  
R. Sopko ◽  
...  

2019 ◽  
Vol 20 (5) ◽  
pp. 1795-1811 ◽  
Author(s):  
Gaye Lightbody ◽  
Valeriia Haberland ◽  
Fiona Browne ◽  
Laura Taggart ◽  
Huiru Zheng ◽  
...  

Abstract There has been exponential growth in the performance and output of sequencing technologies (omics data), with whole-genome sequencing now producing gigabases of reads on a daily basis. These data may hold the promise of personalized medicine, leading to routinely available sequencing tests that can guide patient treatment decisions. In the era of high-throughput sequencing (HTS), computational considerations, data governance and clinical translation are the greatest rate-limiting steps. Such extensive omics data can only be analyzed, managed and interpreted to their full potential if key factors are considered, including sample sourcing, technology selection, and computational expertise and resources. Doing so leads to an integrated set of high-performance tools and systems. This article provides an up-to-date overview of the evolution of HTS and of the accompanying tools, infrastructure and data management approaches emerging in this space. If used within a multidisciplinary context, these may ultimately facilitate the development of personalized medicine.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Soumita Ghosh ◽  
Abhik Datta ◽  
Hyungwon Choi

Abstract Quantitative multi-omics data are difficult to interpret and visualize because of the large volume of data, the complexity among data features, and the heterogeneity of information represented by different omics platforms. Here, we present multiSLIDE, a web-based interactive tool for the simultaneous visualization of interconnected molecular features in heatmaps of multi-omics data sets. multiSLIDE visualizes biologically connected molecular features by keyword search of pathways or genes, offering convenient functionality to query, rearrange, filter, and cluster data in a web browser in real time. Various querying mechanisms make it adaptable to diverse omics types, and visualizations are customizable. We demonstrate the versatility of multiSLIDE through three examples that showcase its applicability to a wide range of multi-omics data sets. They show users visualizing established links between molecules from different omics data and incorporating custom inter-molecular relationship information into the visualization. Online and stand-alone versions of multiSLIDE are available at https://github.com/soumitag/multiSLIDE.
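Incorporating custom inter-molecular relationships amounts to joining rows of one omics layer to rows of another. The Python sketch below is not multiSLIDE's own algorithm. It assumes a hypothetical user-supplied table of `(layer_a_feature, layer_b_feature)` links, such as gene to phosphosite, and an already ordered layer A. It orders layer-B rows to follow the rows they are linked to, and returns the row-index pairs between which connecting lines could be drawn across adjacent heatmaps. All feature names are illustrative toy data.

```python
def align_layers(order_a, features_b, links):
    """Order layer-B rows to follow their linked layer-A rows.

    order_a:    row order already chosen for layer A (e.g. genes)
    features_b: row names present in layer B (e.g. phosphosites)
    links:      (a, b) pairs from a hypothetical user-supplied relationship table
    Returns (order_b, drawn) where drawn holds (row_a, row_b) index pairs.
    """
    rank_a = {name: i for i, name in enumerate(order_a)}
    present_b = set(features_b)
    unlinked = len(order_a)  # sentinel rank: after every linked row
    # earliest linked layer-A position for each layer-B feature
    anchor = {}
    for a, b in links:
        if a in rank_a and b in present_b:
            anchor[b] = min(anchor.get(b, unlinked), rank_a[a])
    # linked rows follow their anchors; unlinked rows go last, alphabetically
    order_b = sorted(features_b, key=lambda b: (anchor.get(b, unlinked), b))
    rank_b = {name: i for i, name in enumerate(order_b)}
    drawn = sorted((rank_a[a], rank_b[b]) for a, b in links
                   if a in rank_a and b in rank_b)
    return order_b, drawn


ORDER_A = ["TP53", "MDM2", "CDKN1A"]
FEATURES_B = ["MDM2_S166", "TP53_S15", "TP53_S20", "EGFR_Y1068"]
LINKS = [("TP53", "TP53_S15"), ("TP53", "TP53_S20"), ("MDM2", "MDM2_S166")]
order_b, drawn = align_layers(ORDER_A, FEATURES_B, LINKS)
```

Here `order_b` is `["TP53_S15", "TP53_S20", "MDM2_S166", "EGFR_Y1068"]`. The unlinked `EGFR_Y1068` is placed after all linked rows.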


2003 ◽  
Vol 31 (6) ◽  
pp. 1484-1487 ◽  
Author(s):  
P. Kemmeren ◽  
F.C.P. Holstege

Functional annotation of fully sequenced genomes is still a major issue. High-throughput data sets could be used to provide more, and better, functional annotations. However, differences in data quality need to be taken into account. For this purpose, these high-throughput data sets need to be integrated so that data quality can be assessed, hypotheses can be prioritized, and existing annotations can be improved and extended.
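One generic way to assess data quality and prioritize hypotheses is sketched below in Python. It is an illustration only, not the method of this paper. Each data set is scored by its precision against a reference of known positive and negative gene/function annotations. The data sets supporting each hypothesis are then combined with a noisy-OR, which assumes that the data sets err independently. All data set names and gene/term pairs are hypothetical.

```python
def precision(pairs, positives, negatives):
    """Share of a data set's reference-covered predictions that are correct."""
    tp = len(pairs & positives)
    fp = len(pairs & negatives)
    return tp / (tp + fp) if tp + fp else 0.0


def integrate(datasets, positives, negatives):
    """Weight each data set by its precision, then score hypotheses by noisy-OR."""
    weights = {name: precision(pairs, positives, negatives)
               for name, pairs in datasets.items()}
    scores = {}
    for name, pairs in datasets.items():
        for pair in pairs:
            # noisy-OR: P(at least one supporting data set is right),
            # under the simplifying assumption of independent errors
            scores[pair] = 1 - (1 - scores.get(pair, 0.0)) * (1 - weights[name])
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return weights, ranked


# Hypothetical (gene, function) reference annotations and data sets.
POSITIVES = {("g1", "ribosome"), ("g2", "proteasome")}
NEGATIVES = {("g1", "proteasome"), ("g2", "spliceosome")}
DATASETS = {
    "coexpression": {("g1", "ribosome"), ("g1", "proteasome"), ("g9", "ribosome")},
    "interactions": {("g1", "ribosome"), ("g2", "proteasome"),
                     ("g2", "spliceosome"), ("g9", "ribosome")},
}
weights, ranked = integrate(DATASETS, POSITIVES, NEGATIVES)
```

The new hypothesis `("g9", "ribosome")` is absent from the reference but supported by both data sets. It ties with the confirmed annotation `("g1", "ribosome")` at the top of the ranking, which is the kind of prioritization described above.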

