Integrating functional genomics data

P. Kemmeren; F.C.P. Holstege

doi:10.1042/bst0311484

Integrating functional genomics data

Biochemical Society Transactions ◽

10.1042/bst0311484 ◽

2003 ◽

Vol 31 (6) ◽

pp. 1484-1487 ◽

Cited By ~ 9

Author(s):

P. Kemmeren ◽

F.C.P. Holstege

Keyword(s):

Data Quality ◽

Functional Genomics ◽

High Throughput ◽

Functional Annotation ◽

Data Sets ◽

Functional Annotations ◽

High Throughput Data

Functional annotation of fully sequenced genomes is still a major issue. High-throughput data sets could be used to provide more and better functional annotations. However, differences in data quality need to be taken into account. For this purpose, these high-throughput data sets need to be integrated so that data quality can be assessed, hypotheses can be prioritized, and existing annotations can be improved and extended.

Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework

BMC Genomics ◽

10.1186/s12864-020-07013-y ◽

2020 ◽

Vol 21 (S10) ◽

Author(s):

Tanveer Ahmad ◽

Nauman Ahmed ◽

Zaid Al-Ars ◽

H. Peter Hofstee

Keyword(s):

Data Processing ◽

Shared Memory ◽

High Throughput ◽

Data Representation ◽

Data Sets ◽

Computing Systems ◽

Disk Storage ◽

High Throughput Data ◽

Data Framework ◽

Development Framework

Background: Immense improvements in sequencing technologies enable the production of large amounts of high-throughput, cost-effective next-generation sequencing (NGS) data. This data needs to be processed efficiently for downstream analyses. Computing systems need these large amounts of data close to the processor (with low latency) for fast and efficient processing. However, existing workflows depend heavily on disk storage and access, so processing this data incurs huge disk I/O overheads. Previously, due to the cost, volatility and other physical constraints of DRAM, it was not feasible to place large working data sets in memory. However, recent developments in storage-class memory and non-volatile memory technologies have enabled computing systems to place huge amounts of data in memory and process it there directly, avoiding disk I/O bottlenecks. To exploit the benefits of such memory systems efficiently, data must be placed in memory in a suitable format and accessed at high throughput, avoiding (de)serialization and copy overheads between processes. For this purpose, we use the newly developed Apache Arrow, a cross-language development framework that provides a language-independent columnar in-memory data format for efficient in-memory big data analytics. This allows genomics applications developed in different programming languages to communicate in memory without accessing disk storage, avoiding (de)serialization and copy overheads.

Implementation: We integrate an Apache Arrow in-memory based Sequence Alignment/Map (SAM) format and its shared memory objects store library into widely used genomics high-throughput data processing applications such as BWA-MEM, Picard and GATK to allow in-memory communication between these applications. In addition, this allows us to exploit the cache locality of tabular data and parallel processing capabilities through shared memory objects.

Results: Our implementation shows that adopting an in-memory SAM representation in genomics high-throughput data processing applications results in better system resource utilization, fewer memory accesses due to high cache locality exploitation, and parallel scalability due to shared memory objects. Our implementation focuses on the GATK best practices recommended workflows for germline analysis on whole genome sequencing (WGS) and whole exome sequencing (WES) data sets. We compare a number of existing in-memory data placing and sharing techniques, such as ramDisk and Unix pipes, to show how columnar in-memory data representation outperforms both. We achieve a speedup of 4.85x and 4.76x for WGS and WES data, respectively, in overall execution time of variant calling workflows. Similarly, a speedup of 1.45x and 1.27x for these data sets, respectively, is achieved compared to the second fastest workflow. In some individual tools, particularly in sorting, duplicates removal and base quality score recalibration, the speedup is even more promising.

Availability: The code and scripts used in our experiments are available in both container and repository form at: https://github.com/abs-tudelft/ArrowSAM.

Protein Complex-Based Analysis Framework for High-Throughput Data Sets

Science Signaling ◽

10.1126/scisignal.2003629 ◽

2013 ◽

Vol 6 (264) ◽

pp. rs5-rs5 ◽

Cited By ~ 66

Author(s):

A. Vinayagam ◽

Y. Hu ◽

M. Kulkarni ◽

C. Roesel ◽

R. Sopko ◽

...

Keyword(s):

High Throughput ◽

Protein Complex ◽

Data Sets ◽

Analysis Framework ◽

High Throughput Data

Introduction to the Development and Validation of Predictive Biomarker Models from High-Throughput Data Sets

Methods in Molecular Biology - Statistical Methods in Molecular Biology ◽

10.1007/978-1-60761-580-4_15 ◽

2009 ◽

pp. 435-470 ◽

Cited By ~ 6

Author(s):

Xutao Deng ◽

Fabien Campagne

Keyword(s):

High Throughput ◽

Predictive Biomarker ◽

Data Sets ◽

High Throughput Data ◽

Development And Validation

Using Similarity Metrics to Quantify Differences in High-Throughput Data Sets: Application to X-ray Diffraction Patterns

ACS Combinatorial Science ◽

10.1021/acscombsci.6b00142 ◽

2016 ◽

Vol 19 (1) ◽

pp. 25-36 ◽

Cited By ~ 6

Author(s):

Efraín Hernández-Rivera ◽

Shawn P. Coleman ◽

Mark A. Tschopp

Keyword(s):

High Throughput ◽

Similarity Metrics ◽

Data Sets ◽

X Ray Diffraction ◽

X Ray ◽

High Throughput Data ◽

Diffraction Patterns

BioMiner: Paving the Way for Personalized Medicine

Cancer Informatics ◽

10.4137/cin.s20910 ◽

2015 ◽

Vol 14 ◽

pp. CIN.S20910 ◽

Cited By ~ 5

Author(s):

Chris Bauer ◽

Karol Stec ◽

Alexander Glintschert ◽

Kristina Gruden ◽

Christian Schichor ◽

...

Keyword(s):

Personalized Medicine ◽

High Throughput ◽

Human Biology ◽

Supplementary File ◽

Data Sets ◽

Omics Data ◽

The Novel ◽

Web Based ◽

High Throughput Data ◽

Interdisciplinary Project

Personalized medicine is promising a revolution for medicine and human biology in the 21st century. The scientific foundation for this revolution is laid by analyzing biological high-throughput data sets from genomics, transcriptomics, proteomics, and metabolomics. Currently, access to these data has been limited either to rather simple Web-based tools, which do not grant much insight, or to analysis by trained specialists without firsthand involvement of the physician. Here, we present the novel Web-based tool “BioMiner,” which was developed within the scope of an international and interdisciplinary project (SYSTHER) and gives access to a variety of high-throughput data sets. It provides the user with convenient tools to analyze complex cross-omics data sets and grants enhanced visualization abilities. BioMiner incorporates transcriptomic and cross-omics high-throughput data sets, with a focus on cancer. A public instance of BioMiner, along with the database, is available at http://systherDB.microdiscovery.de/ (login and password: “systher”); a tutorial detailing the usage of BioMiner can be found in the Supplementary File.

High throughput phenotype screening pipeline for functional genomics in Magnaporthe oryzae

Protocol Exchange ◽

10.1038/nprot.2007.168 ◽

2007 ◽

Cited By ~ 1

Author(s):

Sook-Young Park ◽

Myoung-Hwan Chi ◽

Junhyun Jeon ◽

Yong-Hwan Lee

Keyword(s):

Functional Genomics ◽

High Throughput ◽

Magnaporthe Oryzae ◽

Phenotype Screening

Faculty Opinions recommendation of Gene knockdown by large circular antisense for high-throughput functional genomics.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1025996.322881 ◽

2005 ◽

Author(s):

Jonathan Chernoff

Keyword(s):

Functional Genomics ◽

High Throughput ◽

Gene Knockdown

Faculty Opinions recommendation of Finding disease genes: a fast and flexible approach for analyzing high-throughput data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13277014.14635129 ◽

2011 ◽

Author(s):

Alejandro Schaffer

Keyword(s):

High Throughput ◽

Disease Genes ◽

Flexible Approach ◽

High Throughput Data

Robust and Efficient Parametric Spectral Density Estimation for High-Throughput Data

Technometrics ◽

10.1080/00401706.2021.1884134 ◽

2021 ◽

pp. 1-22

Author(s):

Martin Lysy ◽

Feiyu Zhu ◽

Bryan Yates ◽

Aleksander Labuda

Keyword(s):

Spectral Density ◽

Density Estimation ◽

High Throughput ◽

Spectral Density Estimation ◽

High Throughput Data

Recent advances in CRISPR/Cas9 and applications for wheat functional genomics and breeding

aBIOTECH ◽

10.1007/s42994-021-00042-5 ◽

2021 ◽

Author(s):

Jun Li ◽

Yan Li ◽

Ligeng Ma

Keyword(s):

Functional Genomics ◽

Genome Editing ◽

Functional Annotation ◽

Genome Engineering ◽

Agronomic Traits ◽

Wheat Breeding ◽

Breeding Programs ◽

Base Editing ◽

The World ◽

World Food Security

Common wheat (Triticum aestivum L.) is one of the three major food crops in the world; thus, wheat breeding programs are important for world food security. Characterizing the genes that control important agronomic traits and finding new ways to alter them are necessary to improve wheat breeding. Functional genomics and breeding in polyploid wheat have been greatly accelerated by the advent of several powerful tools, especially CRISPR/Cas9 genome editing technology, which allows multiplex genome engineering. Here, we describe the development of CRISPR/Cas9, which has revolutionized the field of genome editing. In addition, we emphasize technological breakthroughs (e.g., base editing and prime editing) based on CRISPR/Cas9. We also summarize recent applications and advances in the functional annotation and breeding of wheat, and we introduce the production of CRISPR-edited DNA-free wheat. Combined with other achievements, CRISPR and CRISPR-based genome editing will speed progress in wheat biology and promote sustainable agriculture.
