Who is this gene and what does it do? A toolkit for munging transcriptomics data in python

Mapping Intimacies ◽

10.1101/299107 ◽

2018 ◽

Cited By ~ 2

Author(s):

Charles K. Fisher ◽

Aaron M. Smith ◽

Jonathan R. Walsh

Keyword(s):

Gene Expression ◽

Gene Ontology ◽

Transcriptional Regulation ◽

Prior Knowledge ◽

Gene Expression Data ◽

Healthy Tissue ◽

Expression Data ◽

Batch Effects ◽

Expression Levels ◽

Transcriptomics Data

Abstract: Transcriptional regulation is extremely complicated. Unfortunately, so is working with transcriptional data. Genes can be referred to using a multitude of different identifiers and are assigned to an ever-increasing number of categories. Gene expression data may be available in a variety of units (e.g., counts, RPKMs, TPMs). Batch effects dominate signal, but metadata may not be available. Most of the tools are written in R. Here, we introduce a library, genemunge, that makes it easier to work with transcriptional data in Python. This includes translating between various types of gene names, accessing Gene Ontology (GO) information, obtaining expression levels of genes in healthy tissue, correcting for batch effects, and using prior knowledge to select sets of genes for further analysis. Code for genemunge is freely available on GitHub.
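
The unit problem mentioned in the abstract (counts versus TPMs) can be illustrated with a minimal sketch of the standard counts-to-TPM conversion. This is not genemunge's API; the function name `counts_to_tpm`, its arguments, and the sample values are hypothetical and chosen only for illustration.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts to TPM, given gene lengths in kilobases.

    Hypothetical helper for illustration; not part of genemunge.
    """
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    # Reads per kilobase: normalize each gene's count by its length.
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rpk)
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    # Rescale so the values within one sample sum to one million.
    return [r / total * 1e6 for r in rpk]


# Made-up sample: three genes whose counts are proportional to their lengths,
# so all three end up with the same TPM.
tpm = counts_to_tpm([100, 200, 300], [1.0, 2.0, 3.0])
```

Because TPM values always sum to one million within a sample, they are comparable across samples in a way raw counts are not, which is one reason a conversion step is usually needed before pooling data sets.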

A gene ontology-based microarray gene expression data analysis for diagnosing pseudomyxoma peritonei

2014 IEEE 7th International Workshop on Computational Intelligence and Applications (IWCIA) ◽

10.1109/iwcia.2014.6988083 ◽

2014 ◽

Cited By ~ 1

Author(s):

Hideki Katagiri ◽

Takeshi Uno ◽

Kosuke Kato ◽

Hiroaki Kuwano ◽

Kohei Sasaki ◽

...

Keyword(s):

Gene Expression ◽

Gene Ontology ◽

Data Analysis ◽

Gene Expression Data ◽

Pseudomyxoma Peritonei ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Gene Expression Data Analysis ◽

Microarray Gene Expression ◽

Microarray Gene

Uncovering Potential Therapeutic Targets in Colorectal Cancer by Deciphering Mutational Status and Expression of Druggable Oncogenes

Cancers ◽

10.3390/cancers11070983 ◽

2019 ◽

Vol 11 (7) ◽

pp. 983 ◽

Cited By ~ 3

Author(s):

Otília Menyhart ◽

Tatsuhiko Kakisaka ◽

Lőrinc Sándor Pongor ◽

Hiroyuki Uetake ◽

Ajay Goel ◽

...

Keyword(s):

Gene Expression ◽

Colorectal Cancer ◽

Gene Expression Data ◽

Drug Targets ◽

Therapeutic Targets ◽

Independent Set ◽

Driver Mutations ◽

Expression Data ◽

Expression Levels ◽

Mutational Status

Background: Numerous driver mutations have been identified in colorectal cancer (CRC), but their relevance to the development of targeted therapies remains elusive. The secondary effects of pathogenic driver mutations on downstream signaling pathways offer a potential approach for the identification of therapeutic targets. We aimed to identify differentially expressed genes as potential drug targets linked to driver mutations. Methods: Somatic mutations and the gene expression data of 582 CRC patients were utilized, incorporating the mutational status of 39,916 and the expression levels of 20,500 genes. To uncover candidate targets, the expression levels of various genes in wild-type and mutant cases for the most frequent disruptive mutations were compared with a Mann–Whitney test. A survival analysis was performed in 2100 patients with transcriptomic gene expression data. Up-regulated genes associated with worse survival were filtered for potentially actionable targets. The most significant hits were validated in an independent set of 171 CRC patients. Results: Altogether, 426 disruptive mutation-associated upregulated genes were identified. Among these, 95 were linked to worse recurrence-free survival (RFS). Based on the druggability filter, 37 potentially actionable targets were revealed. We selected seven genes and validated their expression in 171 patient specimens. The best independently validated combinations were DUSP4 (p = 2.6 × 10⁻¹²) in ACVR2A mutated (7.7%) patients; BMP4 (p = 1.6 × 10⁻⁴) in SOX9 mutated (8.1%) patients; TRIB2 (p = 1.35 × 10⁻¹⁴) in ACVR2A mutated patients; VSIG4 (p = 2.6 × 10⁻⁵) in ANK3 mutated (7.6%) patients, and DUSP4 (p = 7.1 × 10⁻⁴) in AMER1 mutated (8.2%) patients. Conclusions: The results uncovered potentially druggable genes in colorectal cancer. The identified mutations could enable future patient stratification for targeted therapy.
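
The wild-type versus mutant comparison described in the Methods can be sketched as a Mann–Whitney U test. The version below is a simplified illustration, not the authors' pipeline: it uses the normal approximation without tie or continuity correction, and the expression values are invented. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation).

    Simplified sketch: no tie correction and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    # U counts the (x, y) pairs in which the x value is larger; ties count 0.5.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    # Mean and standard deviation of U under the null hypothesis.
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


# Invented expression levels of one gene in the two groups.
wild_type = [1.0, 2.0, 3.0, 4.0]
mutant = [5.0, 6.0, 7.0, 8.0]
u_stat, p_value = mann_whitney_u(mutant, wild_type)
```

A rank-based test of this kind makes no normality assumption about expression levels, which is a common reason to choose it over a t-test for skewed expression data.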

Gene Ontology Analysis of 3D Microarray Gene Expression Data using Hybrid PSO Optimization

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1261.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 3890-3896

Keyword(s):

Gene Expression ◽

Gene Ontology ◽

Gene Expression Data ◽

Biological Significance ◽

High Volume ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Data Mining Technique ◽

Microarray Gene Expression ◽

3D Gene

At present, triclustering is a well-known data mining technique for the analysis of 3D gene expression data (GST). Triclustering is the simultaneous clustering of a subset of genes (G) and a subset of samples (S) over a subset of time points (T). The triclustering approach identifies a coherent pattern in the 3D gene expression data using the Mean Correlation Value (MCV). In this chapter, a hybrid PSO-based algorithm is developed for triclustering of 3D gene expression data. This algorithm can effectively find a coherent pattern with a high tricluster volume. The experimental study is conducted on the yeast cycle dataset to study the biological significance of the coherent tricluster using a gene ontology tool.

Inferring time-lagged causality using the derivative of single-cell expression

10.1101/2021.02.03.429525 ◽

2021 ◽

Author(s):

Huan-Huan Wei ◽

Hui Lu ◽

Hongyu Zhao

Keyword(s):

Gene Expression ◽

Causal Inference ◽

Single Cell ◽

Causal Relationship ◽

Gene Expression Data ◽

Expression Data ◽

Causal Relationships ◽

Expression Levels ◽

Gene Pairs ◽

Time Lagged

Abstract: Many computational methods have been developed for inferring causality among genes using cross-sectional gene expression data, such as single-cell RNA sequencing (scRNA-seq) data. However, due to the limitations of scRNA-seq technologies, time-lagged causal relationships may be missed by existing methods. In this work, we propose a method, called causal inference with time-lagged information (CITL), to infer time-lagged causal relationships from scRNA-seq data by assessing conditional independence between the changing and current expression levels of genes. CITL estimates the changing expression levels of genes by “RNA velocity”. We demonstrate the accuracy and stability of CITL for inferring time-lagged causality on simulation data against other leading approaches. We have applied CITL to real scRNA-seq data and inferred 878 pairs of time-lagged causal relationships, with many of these inferred results supported by the literature.

Author summary: Computational causal inference is a promising way to survey causal relationships between genes efficiently. Though many causal inference methods have been applied to gene expression data, none considers time-lagged causal relationships, in which a gene takes some time to affect its target genes through several reactions. If relationships between genes are time-lagged, the assumptions of the existing methods are violated and the relationships become challenging to recognize. We demonstrate that this is indeed the case through simulation. Therefore, we develop a method for inferring time-lagged causal relationships from single-cell gene expression data. We assume that a time-lagged causal relationship should present a strong association between the cause and the change in the effect. To measure this association, we first estimate the derivative of gene expression using the information from unspliced transcripts. Then, we use conditional independence tests to search for gene pairs satisfying our assumption. Our results suggest that we could accurately infer time-lagged causal gene pairs validated by published literature. This method may complement gene regulatory analysis and provide candidate gene pairs for further controlled experiments.

Incorporating Gene Ontology Information in Gene Expression Data Clustering Using Multiobjective Evolutionary Optimization: Application in Yeast Cell Cycle Data

Multi-Objective Optimization ◽

10.1007/978-981-13-1471-1_3 ◽

2018 ◽

pp. 55-78

Author(s):

Anirban Mukhopadhyay

Keyword(s):

Gene Expression ◽

Cell Cycle ◽

Gene Ontology ◽

Yeast Cell ◽

Gene Expression Data ◽

Data Clustering ◽

Yeast Cell Cycle ◽

Expression Data ◽

Gene Expression Data Clustering ◽

Gene Ontology Information

Putative biomarkers for predicting tumor sample purity based on gene expression data

BMC Genomics ◽

10.1186/s12864-019-6412-8 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Yuanyuan Li ◽

David M. Umbach ◽

Adrienna Bingham ◽

Qi-Jing Li ◽

Yuan Zhuang ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Supervised Machine Learning ◽

Tumor Type ◽

Expression Data ◽

Expression Levels ◽

Gene Set ◽

Tumor Purity ◽

Tumor Types ◽

Cancerous Cells

Abstract. Background: Tumor purity is the percent of cancer cells present in a sample of tumor tissue. The non-cancerous cells (immune cells, fibroblasts, etc.) have an important role in tumor biology. The ability to determine tumor purity is important to understand the roles of cancerous and non-cancerous cells in a tumor. Methods: We applied a supervised machine learning method, XGBoost, to data from 33 TCGA tumor types to predict tumor purity using RNA-seq gene expression data. Results: Across the 33 tumor types, the median correlation between observed and predicted tumor purity ranged from 0.75 to 0.87 with small root mean square errors, suggesting that tumor purity can be accurately predicted using expression data. We further confirmed that expression levels of a ten-gene set (CSF2RB, RHOH, C1S, CCDC69, CCL22, CYTIP, POU2AF1, FGR, CCL21, and IL7R) were predictive of tumor purity regardless of tumor type. We tested whether our set of ten genes could accurately predict tumor purity of a TCGA-independent data set. We showed that expression levels from our set of ten genes were highly correlated (ρ = 0.88) with the actual observed tumor purity. Conclusions: Our analyses suggested that the ten-gene set may serve as a biomarker for tumor purity prediction using gene expression data.
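
The two evaluation measures reported in the Results, correlation and root mean square error between observed and predicted purity, can be sketched as follows. This is only an illustration of the metrics, not the authors' code: the abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the purity values are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rmse(xs, ys):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Invented purity fractions for four samples (observed vs. model prediction).
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
r = pearson_r(observed, predicted)
error = rmse(observed, predicted)
```

Reporting both measures is useful because a predictor can be highly correlated with the truth while still being systematically offset, which the RMSE would reveal and the correlation would not.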

Integrating Gene Ontology Based Grouping and Ranking into the Machine Learning Algorithm for Gene Expression Data Analysis

10.1007/978-3-030-87101-7_20 ◽

2021 ◽

pp. 205-214

Author(s):

Malik Yousef ◽

Ahmet Sayıcı ◽

Burcu Bakir-Gungor

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Ontology ◽

Data Analysis ◽

Gene Expression Data ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Expression Data ◽

Gene Expression Data Analysis

TNorm: An Unsupervised Batch Effects Correction Method for Gene Expression Data Classification

Neural Information Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-26532-2_45 ◽

2015 ◽

pp. 411-420

Author(s):

Praisan Padungweang ◽

Worrawat Engchuan ◽

Jonathan H. Chan

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Data Classification ◽

Correction Method ◽

Expression Data ◽

Batch Effects

ExAtlas: An interactive online tool for meta-analysis of gene expression data

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720015500195 ◽

2015 ◽

Vol 13 (06) ◽

pp. 1550019 ◽

Cited By ~ 37

Author(s):

Alexei A. Sharov ◽

David Schlessinger ◽

Minoru S. H. Ko

Keyword(s):

Gene Expression ◽

Gene Ontology ◽

Gene Expression Data ◽

Fixed Effects ◽

Expression Profiles ◽

Meta Analysis ◽

Data Sets ◽

Expression Data ◽

Gene Set ◽

Public Data

We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users’ own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher’s methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein–protein interaction) are pre-loaded and can be used for functional annotations.
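
Of the meta-analysis methods listed above, Fisher's method is the simplest to show in a few lines. The sketch below is a generic textbook implementation, not ExAtlas code; the function name `fisher_combined_p` is hypothetical, and the method assumes the input p-values come from independent tests.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # For even degrees of freedom 2k the chi-square survival function has
    # the closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Two made-up studies, each only borderline significant on its own.
combined = fisher_combined_p([0.05, 0.05])
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form survival function.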

Detecting Essential Proteins Based on Network Topology, Gene Expression Data, and Gene Ontology Information

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2016.2615931 ◽

2018 ◽

Vol 15 (1) ◽

pp. 109-116 ◽

Cited By ~ 14

Author(s):

Wei Zhang ◽

Jia Xu ◽

Yuanyuan Li ◽

Xiufen Zou

Keyword(s):

Gene Expression ◽

Gene Ontology ◽

Gene Expression Data ◽

Network Topology ◽

Expression Data ◽

Essential Proteins ◽

Gene Ontology Information
