A simple analytical formula to compute the residual Mutual Information between pairs of data vectors

Mapping Intimacies ◽

10.1101/041988 ◽

2016 ◽

Author(s):

Jens Kleinjung ◽

Ton C.C. Coolen

Keyword(s):

Mutual Information ◽

Expression Profiles ◽

Analytical Formula ◽

Gene Expression Profiles ◽

Quantitative Measure ◽

Random Number Generation ◽

Input Alignment ◽

Supplementary Material ◽

Or Gene ◽

Simple Analytical Formula

Summary: The Mutual Information of pairs of data vectors, for example sequence alignment positions or gene expression profiles, is a quantitative measure of the interdependence between the data. However, data vectors based on a finite number of samples retain non-zero Mutual Information values even for completely random data, which is referred to as background or residual Mutual Information. Estimates of the residual Mutual Information have so far been obtained through heuristic or numerical approximations. Here we introduce a simple analytical formula for the computation of the residual Mutual Information that yields precise values and does not require the joint probabilities between the vector elements as input.

Availability and Implementation: A C program, arMI, is available at http://mathbio.crick.ac.uk/wiki/Software#arMI. Using an input alignment in FASTA format, or alternatively an internally created random alignment of specified length and depth, the program computes three types of Mutual Information: (i) Shannon's Mutual Information between all pairs of alignment columns; (ii) the numerical residual Mutual Information, obtained by applying the same formula to the randomised (shuffled) data; (iii) the analytical residual Mutual Information introduced here. The package depends on the GNU Scientific Library, which is used for vector and matrix operations, factorial expressions and random number generation (Galassi et al., 2009). Reference alignments and result data are included in the program package in the folder 'tests'. The R environment was used for statistics and plotting (R Core Team, 2014).

Supplementary Material: A detailed derivation of the analytical formula is given in the Supplementary Material.
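The abstract distinguishes two quantities. The first is Shannon's Mutual Information between alignment columns (i). The second is a numerical residual obtained by recomputing it on shuffled data (ii).

A minimal Python sketch of both follows. It assumes the plug-in (frequency-count) estimator in bits, and a shuffle that independently permutes one column to destroy any dependence. The function names are hypothetical and are not arMI's interface. The analytical formula (iii) is not reproduced, because its derivation is only given in the Supplementary Material.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in Shannon Mutual Information (bits) between two equal-length symbol columns."""
    if len(x) != len(y) or not x:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def shuffled_residual_mi(x, y, n_shuffles=100, seed=0):
    """Numerical residual MI: mean MI after randomly permuting column y.

    Assumption: one column is permuted independently per repeat; the symbol
    counts of both columns are preserved, only their pairing is randomised.
    """
    rng = random.Random(seed)
    y_perm = list(y)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(y_perm)
        total += mutual_information(x, y_perm)
    return total / n_shuffles
```

For two perfectly co-varying columns such as "AAAACCCC" and "GGGGTTTT", `mutual_information` returns 1.0 bit. The shuffled baseline for columns this short averages above zero even though the shuffled pairs are independent by construction, which is the finite-sample effect described above.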

ANISEED 2019: 4D exploration of genetic data for an extended range of tunicates

Nucleic Acids Research ◽

10.1093/nar/gkz955 ◽

2019 ◽

Cited By ~ 3

Author(s):

Justine Dardaillon ◽

Delphine Dauga ◽

Paul Simion ◽

Emmanuel Faure ◽

Takeshi A Onuma ◽

...

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Model Organism ◽

Gene Expression Profiles ◽

Sister Group ◽

Model Organism Database ◽

Oikopleura Dioica ◽

Genetic Features ◽

Taxonomic Range ◽

Or Gene

ANISEED (https://www.aniseed.cnrs.fr) is the main model organism database for the worldwide community of scientists working on tunicates, the sister group of vertebrates. Information provided for each species includes functionally annotated gene and transcript models with orthology relationships within tunicates, and with echinoderms, cephalochordates and vertebrates. Beyond genes, the system describes other genetic elements, including repeated elements and cis-regulatory modules. Gene expression profiles for several thousand genes are formalized in both wild-type and experimentally manipulated conditions, using formal anatomical ontologies. These data can be explored through three complementary types of browsers, each offering a different viewpoint. A developmental browser summarizes the information in a gene- or territory-centric manner. Advanced genomic browsers integrate the genetic features surrounding genes or gene sets within a species. A Genomicus synteny browser explores the conservation of local gene order across deuterostomes. This new release covers an extended taxonomic range of 14 species, including for the first time a non-ascidian species, the appendicularian Oikopleura dioica. Functional annotations, provided for each species, were enhanced through a combination of manual curation of gene models and the development of an improved orthology detection pipeline. Finally, gene expression profiles and anatomical territories can be explored in 4D online through the newly developed MorphoNet morphogenetic browser.

Comparison between Pearson Correlation Coefficient and Mutual Information as a Similarity Measure of Gene Expression Profiles

Japanese Journal of Biometrics ◽

10.5691/jjb.33.125 ◽

2013 ◽

Vol 33 (2) ◽

pp. 125-143 ◽

Cited By ~ 3

Author(s):

Daisuke Horyu ◽

Takeshi Hayashi

Keyword(s):

Gene Expression ◽

Mutual Information ◽

Correlation Coefficient ◽

Similarity Measure ◽

Expression Profiles ◽

Pearson Correlation ◽

Gene Expression Profiles ◽

Pearson Correlation Coefficient

Comparison Analysis of Gene Expression Profiles Proximity Metrics

Symmetry ◽

10.3390/sym13101812 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1812

Author(s):

Sergii Babichev ◽

Lyudmyla Yasinska-Damri ◽

Igor Liakh ◽

Bohdan Durnyak

Keyword(s):

Gene Expression ◽

Mutual Information ◽

Shannon Entropy ◽

Desirability Function ◽

Expression Profiles ◽

Cluster Structure ◽

Gene Expression Profiles ◽

Quality Criterion ◽

Information Maximization ◽

Mutual Information Maximization

Reconstruction of gene regulatory networks (GRNs) and the creation of effective disease diagnostic systems based on gene expression data are among the current directions of modern bioinformatics. In this manuscript, we present the results of research evaluating the effectiveness of the most widely used metrics for estimating the proximity of gene expression profiles. These metrics can be used to extract groups of informative gene expression profiles while taking into account the states of the investigated samples. Symmetry is very important in the field of gene and/or protein interaction, since it underlies essentially all interactions between molecular components in the GRN, as well as the extraction of gene expression profiles. This extraction allows us to identify how the investigated biological objects (disease, patient state, etc.) contribute to further GRN reconstruction, both in terms of symmetry and in terms of understanding the mechanism of interaction between molecular elements in a biological organism. Within the framework of our research, we investigated the following metrics: mutual information maximization (MIM) using various methods of Shannon entropy calculation, Pearson's χ² test and correlation distance. The classification accuracy of the investigated samples was used as the main quality criterion to evaluate the effectiveness of each metric. A random forest (RF) classifier was used during the simulation. The results showed that the various methods of Shannon entropy calculation within the MIM metric disagree with each other. We therefore proposed a modified mutual information maximization (MMIM) proximity metric based on the joint use of various methods of Shannon entropy calculation and the Harrington desirability function.

The simulation results also showed that the correlation proximity metric is less effective than both the MMIM metric and Pearson's χ² test. Finally, we propose a hybrid proximity metric (HPM) that combines the MMIM metric and Pearson's χ² test. The proposed metric was investigated within the framework of evaluating the effectiveness of a one-cluster structure. In our view, the main benefit of the proposed HPM is that it makes the extraction of mutually similar gene expression profiles more objective. This comes from the joint use of several effective proximity metrics that can contradict each other when used alone.
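The abstract compares three kinds of proximity metric for pairs of expression profiles: mutual information, Pearson's χ² test and correlation distance. A minimal Python sketch of simple textbook versions of each follows.

The sketch rests on these assumptions:

- Continuous expression values are discretized by equal-width binning.
- Entropies use the plug-in (frequency-count) estimator, with mutual information written as H(X) + H(Y) − H(X,Y).

The authors evaluated several entropy estimators, and neither their MMIM nor their HPM construction is reproduced here. All function names are hypothetical.

```python
import math
from collections import Counter


def correlation_distance(u, v):
    """Correlation distance 1 - r, with r the Pearson correlation of two profiles."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (su * sv)


def discretize(u, n_bins):
    """Equal-width binning into levels 0..n_bins-1 (an assumed scheme)."""
    lo, hi = min(u), max(u)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant profiles
    return [min(int((a - lo) / width), n_bins - 1) for a in u]


def shannon_entropy(symbols):
    """Plug-in Shannon entropy in bits."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(x, y):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y) for two discretized profiles."""
    return shannon_entropy(x) + shannon_entropy(y) - shannon_entropy(list(zip(x, y)))


def chi2_statistic(x, y):
    """Pearson's chi-squared statistic of the contingency table of x and y."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    stat = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            stat += (cxy.get((a, b), 0) - expected) ** 2 / expected
    return stat
```

For example, `discretize([1, 2, 3, 4], 2)` gives `[0, 0, 1, 1]`. Comparing that profile with itself yields a χ² statistic of 4.0 and a mutual information of 1.0 bit, while `correlation_distance([1, 2, 3, 4], [8, 6, 4, 2])` is 2.0 for perfectly anti-correlated profiles.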

Comparing migraine with and without aura to healthy controls using RNA sequencing

Cephalalgia ◽

10.1177/0333102419851812 ◽

2019 ◽

Vol 39 (11) ◽

pp. 1435-1444 ◽

Cited By ~ 2

Author(s):

Lisette JA Kogelman ◽

Katrine Falkenberg ◽

Gisli H Halldorsson ◽

Lau U Poulsen ◽

Jacob Worm ◽

...

Keyword(s):

Gene Expression ◽

Migraine With Aura ◽

Migraine Without Aura ◽

Expression Profiles ◽

Venous Blood ◽

Gene Expression Profiles ◽

Differentially Expressed ◽

Healthy Controls ◽

The Difference ◽

Or Gene

Background: Migraine mechanisms are only partly known. Some studies have previously described genes differentially expressed between blood from migraineurs and controls. The objective of this study was to describe gene expression in subtypes of migraine outside of attacks and in healthy controls.

Methods: We extensively phenotyped 17 female patients with migraine without aura, nine female patients with migraine with aura, and 20 age-matched female controls. Cubital venous blood was RNA sequenced. A case-control design was used to identify genes differentially expressed between migraineurs (migraine without aura and migraine with aura) and controls. The same design was used to identify genes differentially expressed between migraine without aura and migraine with aura. A co-expression network was constructed to investigate the difference between migraineurs and healthy controls at the network level.

Results: We found two differentially expressed genes, NMNAT2 and RETN. Both were differentially expressed between migraine with aura and controls, but they could not be replicated in an independent cohort. Co-expression network analysis identified one cluster of highly interconnected genes that was nominally significantly associated with migraine; however, no pathways or gene ontology terms were detected.

Conclusions: We found no clear difference in the gene expression profiles of peripheral blood between migraineurs and controls, and we were not able to replicate findings from previous studies. A larger sample size may be needed to detect minor differences.

An atlas of the tissue and blood metagenome in cancer reveals novel links between bacteria, viruses and cancer

Microbiome ◽

10.1186/s40168-021-01039-4 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Sven Borchmann

Keyword(s):

Massively Parallel Sequencing ◽

Current Knowledge ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Tumor Stage ◽

Genome Project ◽

Host Genome ◽

Cancer Genes ◽

Large Patient ◽

Or Gene

Background: Host tissue infections by bacteria and viruses can cause cancer. Known viral carcinogenic mechanisms are disruption of the host genome via genomic integration and expression of oncogenic viral proteins. An important bacterial carcinogenic mechanism is chronic inflammation. Massively parallel sequencing now routinely generates datasets large enough to contain detectable traces of bacterial and viral nucleic acids, from taxa that either colonize the examined tissue or are integrated into the host genome. However, this hidden resource has not been comprehensively studied in large patient cohorts.

Methods: In the present study, 3025 whole genome sequencing datasets and, where available, corresponding RNA-seq datasets are leveraged to gain insight into novel links between viruses, bacteria, and cancer. Datasets were obtained from multiple International Cancer Genome Consortium studies, with additional controls added from the 1000 Genomes Project. A customized pipeline based on KRAKEN was developed and validated to identify bacterial and viral sequences in the datasets. Raw results were stringently filtered to reduce false positives and remove likely contaminants.

Results: The resulting map confirms known links and expands current knowledge by identifying novel associations. Moreover, the detection of certain bacteria or viruses is associated with profound differences in patient and tumor phenotypes, such as patient age, tumor stage, survival, and somatic mutations in cancer genes or gene expression profiles.

Conclusions: Overall, these results provide a detailed, unprecedented map of links between viruses, bacteria, and cancer that can serve as a reference for future studies and further experimental validation.

An atlas of the tissue and blood metagenome in cancer reveals novel links between bacteria, viruses and cancer

10.1101/773200 ◽

2019 ◽

Author(s):

Sven Borchmann

Keyword(s):

Massively Parallel Sequencing ◽

Current Knowledge ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Tumor Stage ◽

Cancer Genes ◽

Large Patient ◽

Or Gene ◽

Novel Associations ◽

Insight Into

Host tissue infections by bacteria and viruses can cause cancer. Massively parallel sequencing now routinely generates datasets large enough to contain detectable traces of bacterial and viral nucleic acids of taxa that colonize the examined tissue or are integrated into the host genome. However, this hidden resource has not been comprehensively studied in large patient cohorts. In the present study, 3000 whole genome sequencing datasets are leveraged to gain insight into novel links between viruses, bacteria and cancer. The resulting map confirms known links and expands current knowledge by identifying novel associations. Moreover, the detection of certain bacteria or viruses is associated with profound differences in patient and tumor phenotypes, such as patient age, tumor stage, survival, somatic mutations in cancer genes or gene expression profiles. Overall, these results provide a detailed, unprecedented map of links between viruses, bacteria and cancer that can serve as a reference for future studies.

1327: Gene Expression Profiles in Benign Prostatic Hyperplasia

The Journal of Urology ◽

10.1016/s0022-5347(18)38552-5 ◽

2004 ◽

Vol 171 (4S) ◽

pp. 349-350

Author(s):

Gaelle Fromont ◽

Michel Vidaud ◽

Alain Latil ◽

Guy Vallancien ◽

Pierre Validire ◽

...

Keyword(s):

Gene Expression ◽

Benign Prostatic Hyperplasia ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Prostatic Hyperplasia

cDNA microarray analysis of gene expression profiles in human placenta: up-regulation of the transcript encoding muscle subunit of glycogen phosphorylase in preeclampsia

Journal of the Society for Gynecologic Investigation ◽

10.1016/s1071-5576(03)00154-0 ◽

2003 ◽

Vol 10 (8) ◽

pp. 496-502 ◽

Cited By ~ 21

Author(s):

S Tsoi

Keyword(s):

Gene Expression ◽

Microarray Analysis ◽

cDNA Microarray ◽

Human Placenta ◽

Glycogen Phosphorylase ◽

Expression Profiles ◽

Gene Expression Profiles ◽

cDNA Microarray Analysis

Intrinsic Gene Expression Profiles of Gliomas Are a Better Predictor of Survival than Histology

Yearbook of Neurology and Neurosurgery ◽

10.1016/s0513-5117(10)79306-6 ◽

2010 ◽

Vol 2010 ◽

pp. 113-114

Author(s):

J. Uhm

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Intrinsic Gene

Stromal Cells Derived from Non-Small Cell Lung Cancer and Normal Lung Tissue Display Mesenchymal Stem Cell Characteristics and Differ in Their Gene Expression Profiles and Functional Behaviour

Pneumologie ◽

10.1055/s-0029-1213954 ◽

2009 ◽

Vol 63 (S 01) ◽

Author(s):

S Gottschling ◽

A Jauch ◽

M Granzow ◽

R Kuner ◽

T Muley ◽

...

Keyword(s):

Gene Expression ◽

Lung Cancer ◽

Stem Cell ◽

Mesenchymal Stem Cell ◽

Stromal Cells ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Small Cell ◽

Normal Lung ◽

Small Cell Lung