Linking Phenotypes and Genotypes with Matrix Factorizations (Preprint)
Background: A phenotype is the composite of an organism's observable characteristics or traits, such as eye color, behavior, and disease symptoms in humans. A genotype is the genetic makeup of a cell, an organism, or an individual, usually with reference to a specific characteristic under consideration. A phenotype can thus be regarded as the macroscopic description of an organism, while its genotype is the microscopic counterpart.
Objective: Identifying phenotype-genotype associations is a primary step toward explaining the pathogenesis of complex human diseases. It is also of key importance for the development of genomic medicine, sometimes called personalized medicine, which customizes medical care to an individual's unique genetic makeup.
Methods: In this paper, we propose a unified computational framework, called PheGe, to bridge phenotypes and genotypes. PheGe uses a phenotype similarity network, a genotype similarity network, and known phenotype-genotype associations to infer potential associations among phenotypes and genotypes that are not yet linked.
Results: As a by-product, PheGe also discovers phenotype groups and genotype groups, such that the phenotypes or genotypes within the same group are highly correlated with each other. We validate the effectiveness of PheGe on a real-world data set, where we discover some interesting phenotype-genotype associations and phenotype/genotype groups.
Conclusions: Our method can reveal potential phenotype clusters, genotype clusters, and their previously unknown associations by combining a variety of phenotype similarities and genotype similarities with known phenotype-genotype associations.
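The abstract does not state PheGe's objective function or update rules, so the following is only an illustrative sketch of how the three inputs named in the Methods could be combined in a matrix factorization. It assumes a generic graph-regularized nonnegative matrix tri-factorization, R ≈ U S Vᵀ, fitted with standard multiplicative updates; this is a swapped-in, well-known technique and not necessarily the authors' formulation. The function name `factorize`, the parameters `k_p`, `k_g`, and `lam`, and the toy matrices are all hypothetical.

```python
import numpy as np

def objective(R, A_p, A_g, U, S, V, lam):
    """Reconstruction error plus graph-smoothness penalties (assumed form)."""
    L_p = np.diag(A_p.sum(axis=1)) - A_p  # phenotype graph Laplacian
    L_g = np.diag(A_g.sum(axis=1)) - A_g  # genotype graph Laplacian
    fit = np.linalg.norm(R - U @ S @ V.T) ** 2
    smooth = np.trace(U.T @ L_p @ U) + np.trace(V.T @ L_g @ V)
    return fit + lam * smooth

def factorize(R, A_p, A_g, k_p=2, k_g=2, lam=0.1, n_iter=200, seed=0):
    """Hypothetical sketch: R ~ U S V^T with U, S, V >= 0.

    R   : known phenotype-genotype associations (n_p x n_g)
    A_p : phenotype similarity network (n_p x n_p, symmetric, >= 0)
    A_g : genotype similarity network (n_g x n_g, symmetric, >= 0)
    """
    rng = np.random.default_rng(seed)
    n_p, n_g = R.shape
    U = rng.random((n_p, k_p)) + 0.1  # phenotype group memberships
    S = rng.random((k_p, k_g)) + 0.1  # group-to-group association strengths
    V = rng.random((n_g, k_g)) + 0.1  # genotype group memberships
    D_p = np.diag(A_p.sum(axis=1))
    D_g = np.diag(A_g.sum(axis=1))
    eps = 1e-9  # guards against division by zero
    history = [objective(R, A_p, A_g, U, S, V, lam)]
    for _ in range(n_iter):
        # Multiplicative updates keep every factor nonnegative.
        U *= (R @ V @ S.T + lam * A_p @ U) / (U @ S @ V.T @ V @ S.T + lam * D_p @ U + eps)
        V *= (R.T @ U @ S + lam * A_g @ V) / (V @ S.T @ U.T @ U @ S + lam * D_g @ V + eps)
        S *= (U.T @ R @ V) / (U.T @ U @ S @ V.T @ V + eps)
        history.append(objective(R, A_p, A_g, U, S, V, lam))
    return U, S, V, history

# Toy data (made up for illustration): 4 phenotypes, 5 genotypes.
R = np.array([[1, 1, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, 1]], dtype=float)
A_p = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
A_g = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 0, 0],
                [0, 0, 0, 0, 1],
                [0, 0, 0, 1, 0]], dtype=float)

U, S, V, history = factorize(R, A_p, A_g)
scores = U @ S @ V.T                  # predicted association strengths
phenotype_groups = U.argmax(axis=1)   # group label per phenotype
genotype_groups = V.argmax(axis=1)    # group label per genotype
```

In this sketch, large entries of `scores` at positions where `R` is zero would be read as candidate associations, and the row-wise maxima of `U` and `V` play the role of the phenotype and genotype groups mentioned in the Results. The number of groups and the weight `lam` are free choices that would need tuning on real data.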