RNAactDrug: a comprehensive database of RNAs associated with drug sensitivity from multi-omics data

Qun Dong; Feng Li; Yanjun Xu; Jing Xiao; Yingqi Xu; Desi Shang; Chunlong Zhang; Haixiu Yang; Zihan Tian; Kai Mi; Xia Li; Yunpeng Zhang

doi:10.1093/bib/bbz142

RNAactDrug: a comprehensive database of RNAs associated with drug sensitivity from multi-omics data

Briefings in Bioinformatics ◽

10.1093/bib/bbz142 ◽

2019 ◽

Vol 21 (6) ◽

pp. 2167-2174 ◽

Cited By ~ 1

Author(s):

Qun Dong ◽

Feng Li ◽

Yanjun Xu ◽

Jing Xiao ◽

Yingqi Xu ◽

...

Keyword(s):

Large Scale ◽

Drug Sensitivity ◽

Integrated Analysis ◽

Omics Data ◽

Next Generation Sequencing Technology ◽

RNA Molecules ◽

Comprehensive Database ◽

Number Variation ◽

Association Data ◽

Search Facility

Abstract Drug sensitivity has always been at the core of individualized cancer chemotherapy. However, we have been overwhelmed by large-scale pharmacogenomic data in the era of next-generation sequencing technology, which makes it increasingly challenging for researchers, especially those without bioinformatic experience, to perform data integration, exploration and analysis. To bridge this gap, we developed RNAactDrug, a comprehensive database of RNAs associated with drug sensitivity from multi-omics data, which allows users to explore drug sensitivity and RNA molecule associations directly. It provides association data between drug sensitivity and RNA molecules including mRNAs, long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) at four molecular levels (expression, copy number variation, mutation and methylation) from integrated analysis of three large-scale pharmacogenomic databases (GDSC, CellMiner and CCLE). RNAactDrug currently stores more than 4 924 200 associations of RNA molecules and drug sensitivity at four molecular levels covering more than 19 770 mRNAs, 11 119 lncRNAs, 438 miRNAs and 4155 drugs. A user-friendly interface with various browsing sections and an advanced search facility is offered for querying the database and retrieving results. RNAactDrug provides a comprehensive resource for RNA molecules acting in drug sensitivity, and it could be used to prioritize drug sensitivity–related RNA molecules, further promoting the identification of clinically actionable biomarkers in drug sensitivity and more cost-efficient drug development by making this knowledge accessible to both basic researchers and clinical practitioners. Database URL: http://bio-bigdata.hrbmu.edu.cn/RNAactDrug.
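
As a rough illustration of what an "association" between an RNA feature and drug sensitivity can look like, the sketch below computes a Pearson correlation between one RNA's expression and one drug's sensitivity values across cell lines. The abstract does not state which statistic RNAactDrug uses, so the choice of Pearson correlation, the function name `pearson_r` and the toy numbers are all assumptions made for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences.

    Hypothetical helper: x could hold one RNA's expression across cell
    lines and y the matching drug sensitivity values (e.g. IC50).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy data (invented): expression rises while the sensitivity value falls.
expression = [1.0, 2.0, 3.0, 4.0]
sensitivity = [8.0, 6.0, 4.0, 2.0]
r = pearson_r(expression, sensitivity)
```

A strongly negative `r` on such toy data would indicate that higher expression goes with lower sensitivity values; a real analysis would also need a significance test and multiple-testing correction.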

Integrated Analysis of Large-Scale Omics Data Revealed Relationship Between Tissue Specificity and Evolutionary Dynamics of Small RNAs in Maize (Zea mays)

Frontiers in Genetics ◽

10.3389/fgene.2020.00051 ◽

2020 ◽

Vol 11 ◽

Cited By ~ 1

Author(s):

Yu Xu ◽

Ting Zhang ◽

Yuchen Li ◽

Zhenyan Miao

Keyword(s):

Zea Mays ◽

Small RNAs ◽

Large Scale ◽

Tissue Specificity ◽

Evolutionary Dynamics ◽

Integrated Analysis ◽

Omics Data

Integration of enzyme constraints in a genome-scale metabolic model of Aspergillus niger improves phenotype predictions

Microbial Cell Factories ◽

10.1186/s12934-021-01614-2 ◽

2021 ◽

Vol 20 (1) ◽

Author(s):

Jingru Zhou ◽

Yingping Zhuang ◽

Jianye Xia

Keyword(s):

Aspergillus Niger ◽

Large Scale ◽

Measurement Techniques ◽

Metabolic Model ◽

System Level ◽

Metabolic Phenotype ◽

Omics Data ◽

Prediction Ability ◽

Phenotype Prediction ◽

Genome Scale

Abstract Background: The genome-scale metabolic model (GSMM) is a powerful tool for the study of cellular metabolic characteristics. With the development of multi-omics measurement techniques in recent years, new methods that integrate multi-omics data into the GSMM show promising effects on the predicted results. Such integration not only improves the accuracy of phenotype prediction but also enhances the reliability of the model for simulating complex biochemical phenomena, which can promote theoretical breakthroughs for specific gene target identification or a better understanding of cell metabolism on the system level. Results: Based on the basic GSMM model iHL1210 of Aspergillus niger, we integrated large-scale enzyme kinetics and proteomics data to establish a GSMM based on enzyme constraints, termed a GEM with Enzymatic Constraints using Kinetic and Omics data (GECKO). The results show that enzyme constraints effectively improve the model’s phenotype prediction ability and extend the model’s potential to guide target gene identification by predicting metabolic phenotype changes of A. niger through simulated gene knockouts. In addition, enzyme constraints significantly reduced the solution space of the model, i.e., the flux variability of over 40.10% of metabolic reactions was significantly reduced. The new model also showed versatility in other aspects, such as estimating large-scale $k_{cat}$ values and predicting the differential expression of enzymes under different growth conditions. Conclusions: This study shows that incorporating enzyme abundance information into a GSMM is very effective for improving model performance with A. niger. The enzyme-constrained model can be used as a powerful tool for predicting the metabolic phenotype of A. niger by incorporating proteome data. In the foreseeable future, with the fast development of measurement techniques and more precise and rich quantitative proteomics data being obtained for A. niger, the enzyme-constrained GSMM will show greater application space on the system level.

An integrated life cycle and water footprint assessment of nonfood crops based bioenergy production

Scientific Reports ◽

10.1038/s41598-021-83061-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Jun Li ◽

Fengyin Xiong ◽

Zhuo Chen

Keyword(s):

Life Cycle ◽

Environmental Sustainability ◽

Large Scale ◽

Water Footprint ◽

Arable Land ◽

Integrated Analysis ◽

Actual Performance ◽

Bioenergy Production ◽

Trade Offs ◽

Hybrid Pennisetum

Abstract Biomass gasification, especially distribution to power generation, is considered a promising way to tackle global energy and environmental challenges. However, previous research on integrated analysis of the greenhouse gas (GHG) abatement potentials associated with biomass electrification is sparse, and few studies have taken freshwater utilization into account within a coherent framework, though both energy and water scarcity are central concerns in China’s environmental policy. This study employs a life cycle assessment (LCA) model to analyse the actual performance, combined with water footprint (WF) assessment methods. The inextricable trade-offs between three representative energy-producing technologies are explored based on three categories of non-food crops (maize, sorghum and hybrid pennisetum) cultivated in marginal arable land. WF results demonstrate that the Hybrid pennisetum system has the largest impact on water resources, whereas the other two technology options exhibit the characteristics of environmental sustainability. The LCA results reflect large variances in the contribution ratios of the four sub-processes to total impacts. The Anaerobic Digestion process is found to be the main contributor, whereas the Digestate management process is shown to be able to effectively mitigate the negative environmental impacts with an absolute share. Sensitivity analysis is implemented to detect the impacts of variation in loss ratios, such as silage mass and methane, on the final results. The methane loss has the largest influence on the Hybrid pennisetum system, followed by the Maize system. Above all, the Sorghum system demonstrates the best performance amongst the considered assessment categories. Our study builds a pilot reference for further driving large-scale projects of bioenergy production and conversion. The synergy of the combined WF-LCA method allows us to conduct a comprehensive assessment and to provide insights into environmental and resource management.

Findings from the Section on Bioinformatics and Translational Informatics

Yearbook of Medical Informatics ◽

10.15265/iy-2016-050 ◽

2017 ◽

Vol 26 (01) ◽

pp. 188-192 ◽

Cited By ~ 1

Author(s):

H. Dauchel ◽

T. Lecroq

Keyword(s):

Intelligent Systems ◽

Large Scale ◽

Cancer Genomics ◽

Clinical Care ◽

Evaluation Process ◽

Omics Data ◽

Health Domain ◽

Medical Genomics ◽

Research Activities ◽

Translational Informatics

Summary Objective: To summarize excellent current research and propose a selection of best papers published in 2016 in the field of Bioinformatics and Translational Informatics with applications in the health domain and clinical care. Methods: We provide a synopsis of the articles selected for the IMIA Yearbook 2017, from which we attempt to derive a synthetic overview of current and future activities in the field. As in 2016, a first step of selection was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section coverage. Each section editor separately evaluated the set of 951 articles returned, and the evaluation results were merged to retain 15 candidate best papers for peer review. Results: The selection and evaluation process of papers published in the Bioinformatics and Translational Informatics field yielded four excellent articles focusing this year on the secondary use and massive integration of multi-omics data for cancer genomics and non-cancer complex diseases. The papers present methods to study the functional impact of genetic variations, either at the level of transcription or at the levels of pathway and network. Conclusions: Current research activities in Bioinformatics and Translational Informatics with applications in the health domain continue to explore new algorithms and statistical models to manage, integrate, and interpret large-scale genomic datasets. As addressed by some of the selected papers, future trends would include the question of the international collaborative sharing of clinical and omics data, and the implementation of intelligent systems to enhance routine medical genomics.

Rapid evolution at the Drosophila telomere: transposable element dynamics at an intrinsically unstable locus

Genetics ◽

10.1093/genetics/iyaa027 ◽

2020 ◽

Vol 217 (2) ◽

Author(s):

Michael P McGurk ◽

Anne-Marie Dion-Côté ◽

Daniel A Barbash

Keyword(s):

Copy Number ◽

Large Scale ◽

Insertion Site ◽

Rapid Evolution ◽

Interspecific Variation ◽

High Rate ◽

Evolutionary Strategy ◽

Control Mechanisms ◽

Genetic Conflict ◽

Number Variation

Abstract Drosophila telomeres have been maintained by three families of active transposable elements (TEs), HeT-A, TAHRE, and TART, collectively referred to as HTTs, for tens of millions of years, which contrasts with an unusually high degree of HTT interspecific variation. While the impacts of conflict and domestication are often invoked to explain HTT variation, the telomeres are unstable structures such that neutral mutational processes and evolutionary tradeoffs may also drive HTT evolution. We leveraged population genomic data to analyze nearly 10,000 HTT insertions in 85 Drosophila melanogaster genomes and compared their variation to other more typical TE families. We observe that occasional large-scale copy number expansions of both HTTs and other TE families occur, highlighting that the HTTs are, like their feral cousins, typically repressed but primed to take over given the opportunity. However, large expansions of HTTs are not caused by the runaway activity of any particular HTT subfamilies or even associated with telomere-specific TE activity, as might be expected if HTTs are in strong genetic conflict with their hosts. Rather than conflict, we instead suggest that distinctive aspects of HTT copy number variation and sequence diversity largely reflect telomere instability, with HTT insertions being lost at much higher rates than other TEs elsewhere in the genome. We extend previous observations that telomere deletions occur at a high rate, and surprisingly discover that more than one-third do not appear to have been healed with an HTT insertion. We also report that some HTT families may be preferentially activated by the erosion of whole telomeres, implying the existence of HTT-specific host control mechanisms. We further suggest that the persistent telomere localization of HTTs may reflect a highly successful evolutionary strategy that trades away a stable insertion site in order to have reduced impact on the host genome. We propose that HTT evolution is driven by multiple processes, with niche specialization and telomere instability being previously underappreciated and likely predominant.

Integrated Analysis of Drug Sensitivity and Selectivity to Predict Synergistic Drug Combinations and Target Coaddictions in Cancer

Methods in Molecular Biology - Systems Chemical Biology ◽

10.1007/978-1-4939-8891-4_12 ◽

2018 ◽

pp. 205-217 ◽

Cited By ~ 3

Author(s):

Alok Jaiswal ◽

Bhagwan Yadav ◽

Krister Wennerberg ◽

Tero Aittokallio

Keyword(s):

Drug Sensitivity ◽

Drug Combinations ◽

Integrated Analysis ◽

Sensitivity And Selectivity ◽

Synergistic Drug Combinations

Learning Cancer Drug Sensitivities in Large-Scale Screens from Multi-omics Data with Local Low-Rank Structure

Computational Intelligence Methods for Bioinformatics and Biostatistics - Lecture Notes in Computer Science ◽

10.1007/978-3-030-63061-4_7 ◽

2020 ◽

pp. 67-79

Author(s):

The Tien Mai ◽

Leiv Rønneberg ◽

Zhi Zhao ◽

Manuela Zucknick ◽

Jukka Corander

Keyword(s):

Large Scale ◽

Low Rank ◽

Cancer Drug ◽

Omics Data

Unsupervised integration of single-cell multi-omics datasets with disparities in cell-type representation

10.1101/2021.11.09.467903 ◽

2021 ◽

Author(s):

Pinar Demetci ◽

Rebecca Santorella ◽

Bjorn Sandstede ◽

Ritambhara Singh

Keyword(s):

Single Cell ◽

Optimal Transport ◽

Integrated Analysis ◽

Omics Data ◽

Cell Alignment ◽

Cell Type ◽

Type Representation ◽

Cellular Processes ◽

Self Tuning ◽

Unsupervised Algorithms

Integrated analysis of multi-omics data allows the study of how different molecular views in the genome interact to regulate cellular processes; however, with a few exceptions, applying multiple sequencing assays on the same single cell is not possible. While recent unsupervised algorithms align single-cell multi-omic datasets, these methods have been primarily benchmarked on co-assay experiments rather than the more common single-cell experiments taken from separately sampled cell populations. Therefore, most existing methods perform subpar alignments on such datasets. Here, we improve our previous work Single Cell alignment using Optimal Transport (SCOT) by using unbalanced optimal transport to handle disproportionate cell-type representation and differing sample sizes across single-cell measurements. We show that our proposed method, SCOTv2, consistently yields quality alignments on five real-world single-cell datasets with varying cell-type proportions and is computationally tractable. Additionally, we extend SCOTv2 to integrate multiple ($M\geq2$) single-cell measurements and present a self-tuning heuristic process to select hyperparameters in the absence of any orthogonal correspondence information.

Single-cell RNA counting at allele- and isoform-resolution using Smart-seq3

10.1101/817924 ◽

2019 ◽

Cited By ~ 6

Author(s):

Michael Hagemann-Jensen ◽

Christoph Ziegenhain ◽

Ping Chen ◽

Daniel Ramsköld ◽

Gert-Jan Hendriks ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Cell Types ◽

Mouse Strains ◽

RNA Molecules ◽

Counting Strategy ◽

Long Read ◽

Sequencing Strategy ◽

Transcriptome Coverage ◽

Scale Characterization

Abstract Large-scale sequencing of RNAs from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states [1]. However, current single-cell RNA-sequencing (scRNA-seq) methods have limited ability to count RNAs at allele and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells [2,3]. Here, we introduce Smart-seq3 that combines full-length transcriptome coverage with a 5’ unique molecular identifier (UMI) RNA counting strategy that enabled in silico reconstruction of thousands of RNA molecules per cell. Importantly, a large portion of counted and reconstructed RNA molecules could be directly assigned to specific isoforms and allelic origin, and we identified significant transcript isoform regulation in mouse strains and human cell types. Moreover, Smart-seq3 showed a dramatic increase in sensitivity and typically detected thousands more genes per cell than Smart-seq2. Altogether, we developed a short-read sequencing strategy for single-cell RNA counting at isoform and allele resolution applicable to large-scale characterization of cell types and states across tissues and organisms.
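
The general idea behind UMI-based RNA counting can be sketched as follows: reads that share the same cell, gene and UMI are treated as copies of one original molecule, so the molecule count is the number of distinct UMIs. This is a minimal sketch of that general principle, not the Smart-seq3 pipeline; the function name `count_molecules`, the tuple layout and the toy reads are hypothetical, and real pipelines typically also correct sequencing errors in UMIs, which is omitted here.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (cell, gene) pair.

    reads: iterable of (cell_barcode, gene, umi) tuples (hypothetical layout).
    Returns a dict mapping (cell_barcode, gene) to a molecule count.
    """
    umis_seen = defaultdict(set)
    for cell, gene, umi in reads:
        # Duplicate reads of the same molecule collapse into one set entry.
        umis_seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


# Toy reads (invented): two reads share a UMI and count as one molecule.
toy_reads = [
    ("cellA", "Actb", "AACGT"),
    ("cellA", "Actb", "AACGT"),
    ("cellA", "Actb", "GGTCA"),
    ("cellB", "Actb", "AACGT"),
]
counts = count_molecules(toy_reads)
```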

Accuracy and efficiency of germline variant calling pipelines for human genome data

10.1101/2020.03.27.011767 ◽

2020 ◽

Cited By ~ 1

Author(s):

Sen Zhao ◽

Oleg Agafonov ◽

Abdulrahman Azab ◽

Tomasz Stokowy ◽

Eivind Hovig

Keyword(s):

Large Scale ◽

Variant Calling ◽

Performance Comparison ◽

Next Generation Sequencing Technology ◽

Genome Data ◽

Germline Variant ◽

Causal Variants ◽

Variant Detection ◽

Variant Analysis ◽

Human Genome Data

Abstract Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and have provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data; however, there is a need for a systematic performance comparison of these pipelines to provide guidance on the application of WGS-based scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN™ and DeepVariant) using Genome in a Bottle Consortium, “synthetic-diploid” and simulated WGS datasets. DRAGEN™ and DeepVariant show better accuracy in SNP and indel calling, with no significant differences in their F1-scores. The DRAGEN™ platform offers accuracy, flexibility and a highly efficient running speed, and is therefore at an advantage in the analysis of WGS data on a large scale. The combination of DRAGEN™ and DeepVariant also provides a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical application.
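
For readers unfamiliar with the F1-score mentioned above, the sketch below shows the standard definition as the harmonic mean of precision and recall, computed from counts of true-positive, false-positive and false-negative variant calls. The function name `f1_score` and the example counts are illustrative assumptions and are not taken from the study, which may rely on dedicated benchmarking tools.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall from call counts.

    Returns 0.0 when there are no calls or no truth variants,
    since precision or recall is then undefined.
    """
    called = true_positives + false_positives
    in_truth = true_positives + false_negatives
    if called == 0 or in_truth == 0:
        return 0.0
    precision = true_positives / called    # fraction of calls that are correct
    recall = true_positives / in_truth     # fraction of truth variants recovered
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented counts: 90 correct calls, 10 spurious calls, 10 missed variants.
example_f1 = f1_score(90, 10, 10)
```

With equal precision and recall, as in the invented counts, the F1-score equals that shared value.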
