Scalable Phylogenetic Profiling using MinHash Uncovers Likely Eukaryotic Sexual Reproduction Genes

2019
Author(s):  
David Moi ◽  
Laurent Kilchoer ◽  
Pablo S. Aguilar ◽  
Christophe Dessimoz

Abstract: Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families which tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been more widely used with prokaryotes than eukaryotes, because the method is thought to require many diverse genomes. There are now many eukaryotic genomes available, but these are considerably larger, and typical phylogenetic profiling methods require quadratic time or worse in the number of genes. We introduce a fast, scalable phylogenetic profiling approach entitled HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles. We show that the approach outperforms Enhanced Phylogenetic Tree, a phylogeny-based method, and use the tool to reconstruct networks and query for interactors of the kinetochore complex as well as conserved proteins involved in sexual reproduction: Hap2, Spo11 and Gex1. HogProf enables large-scale phylogenetic profiling across the three domains of life, and will be useful to predict biological pathways among the hundreds of thousands of eukaryotic species that will become available in the coming few years. HogProf is available at https://github.com/DessimozLab/HogProf.
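The idea of comparing profiles through MinHash signatures can be illustrated with a minimal sketch. This is not HogProf's implementation: the profile contents, the family names, the `minhash_signature` and `estimate_jaccard` helpers, and the choice of 128 seeded BLAKE2b hashes are all assumptions made for illustration. Each profile is modelled simply as the set of taxa in which a gene family is present, and the fraction of matching signature positions approximates the Jaccard similarity of two profiles.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(profile, num_perm=128):
    """Return the MinHash signature of a non-empty set of string tokens."""
    if not profile:
        raise ValueError("profile must contain at least one token")
    # One minimum per seeded hash function stands in for one permutation.
    return [min(_hash(token, seed) for token in profile) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of equal signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


# Hypothetical presence profiles: taxa in which each family is retained.
profiles = {
    "family_A": {"human", "mouse", "yeast", "arabidopsis", "plasmodium"},
    "family_B": {"human", "mouse", "yeast", "arabidopsis"},
    "family_C": {"e_coli", "b_subtilis"},
}
signatures = {name: minhash_signature(taxa) for name, taxa in profiles.items()}

if __name__ == "__main__":
    for other in ("family_B", "family_C"):
        est = estimate_jaccard(signatures["family_A"], signatures[other])
        print("family_A vs", other, "estimated Jaccard:", est)
```

Because signatures have a fixed length regardless of how many taxa a profile covers, they can be indexed by locality-sensitive hashing so that a query only touches profiles likely to be similar, which avoids the all-against-all comparison the abstract describes as quadratic.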

2020
Vol 63 (6)
pp. 537-540
Author(s):  
Fee O.H. Smulders ◽  
Kelcie L. Chiquillo ◽  
Demian A. Willette ◽  
Paul H. Barber ◽  
Marjolijn J.A. Christianen

Abstract: The dioecious seagrass species Halophila stipulacea reproduces mainly through fast clonal growth, underlying its invasive behavior. Here, we provide morphological evidence to show that the first findings of fruits in the Caribbean were misidentified. Consequently, H. stipulacea reproduction is likely still only asexual in the Caribbean. Therefore, we introduce an identification key of H. stipulacea reproductive structures to encourage careful identification and quantification throughout its invasive range. Until large-scale seed production in invaded habitats is reported, the apparent low rate of sexual reproduction needs to be considered in current studies investigating the invasion capacity of this species.


2007
Vol 283 (3)
pp. 1229-1233
Author(s):  
Claudia Ben-Dov ◽  
Britta Hartmann ◽  
Josefin Lundgren ◽  
Juan Valcárcel

Alternative splicing of mRNA precursors allows the synthesis of multiple mRNAs from a single primary transcript, significantly expanding the information content and regulatory possibilities of higher eukaryotic genomes. High-throughput enabling technologies, particularly large-scale sequencing and splicing-sensitive microarrays, are providing unprecedented opportunities to address key questions in this field. The picture emerging from these pioneering studies is that alternative splicing affects most human genes and a significant fraction of the genes in other multicellular organisms, with the potential to greatly influence the evolution of complex genomes. A combinatorial code of regulatory signals and factors can deploy physiologically coherent programs of alternative splicing that are distinct from those regulated at other steps of gene expression. Pre-mRNA splicing and its regulation play important roles in human pathologies, and genome-wide analyses in this area are paving the way for improved diagnostic tools and for the identification of novel and more specific pharmaceutical targets.


2008
Vol 105 (46)
pp. 17700-17705
Author(s):  
Richard Llewellyn ◽  
David S. Eisenberg

As genome sequencing outstrips the rate of high-quality, low-throughput biochemical and genetic experimentation, accurate annotation of protein function becomes a bottleneck in the progress of the biomolecular sciences. Most gene products are now annotated by homology, in which an experimentally determined function is applied to a similar sequence. This procedure becomes error-prone between more divergent sequences and can contaminate biomolecular databases. Here, we propose a computational method of assignment of function, termed Generalized Functional Linkages (GFL), that combines nonhomology-based methods with other types of data. Functional linkages describe pairwise relationships between proteins that work together to perform a biological task. GFL provides a Bayesian framework that improves annotation by arbitrating a competition among biological process annotations to best describe the target protein. GFL addresses the unequal strengths of functional linkages among proteins, the quality of existing annotations, and the similarity among them while incorporating available knowledge about the cellular location or individual molecular function of the target protein. We demonstrate GFL with functional linkages defined by an algorithm known as zorch that quantifies connectivity in protein–protein interaction networks. Even when using proteins linked only by indirect or high-throughput interactions, GFL predicts the biological processes of many proteins in Saccharomyces cerevisiae, improving the accuracy of annotation by 20% over majority voting.


2018
Author(s):  
Valerie Wood ◽  
Antonia Lock ◽  
Midori A. Harris ◽  
Kim Rutherford ◽  
Jürg Bähler ◽  
...  

Abstract: The first decade of genome sequencing stimulated an explosion in the characterization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model organisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes. To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on Gene Ontology (GO) biological process terms to define characterized and uncharacterized proteins for human, budding yeast, and fission yeast. We then identify a set of conserved but unstudied proteins in S. pombe, and classify them based on a combination of orthogonal attributes determined by large-scale experimental and comparative methods. Finally, we explore possible reasons why these proteins remain neglected, and propose courses of action to raise their profile and thereby reap the benefits of completing the catalog of proteins’ biological roles.


EcoSal Plus
2021
Author(s):  
Nicholas Backes ◽  
Gregory J. Phillips

Over the last decade, the study of CRISPR-Cas systems has progressed from a newly discovered bacterial defense mechanism to a diverse suite of genetic tools that have been applied across all domains of life. While the initial applications of CRISPR-Cas technology fulfilled a need to more precisely edit eukaryotic genomes, creative “repurposing” of this adaptive immune system has led to new approaches for genetic analysis of microorganisms, including improved gene editing, conditional gene regulation, plasmid curing and manipulation, and other novel uses.


2021
pp. jmedgenet-2021-108193
Author(s):  
Ido Shalev ◽  
Judith Somekh ◽  
Alal Eran

Background: Loss of tectonin β-propeller repeat-containing 2 (TECPR2) function has been implicated in an array of neurodegenerative disorders, yet its physiological function remains largely unknown. Understanding TECPR2 function is essential for developing much needed precision therapeutics for TECPR2-related diseases. Methods: We leveraged considerable amounts of functional data to obtain a comprehensive perspective of the role of TECPR2 in health and disease. We integrated expression patterns, population variation, phylogenetic profiling, protein-protein interactions and regulatory network data for a minimally biased multimodal functional analysis. Genes and proteins linked to TECPR2 via multiple lines of evidence were subject to functional enrichment analyses to identify molecular mechanisms involving TECPR2. Results: TECPR2 was found to be part of a tight neurodevelopmental gene expression programme that includes KIF1A, ATXN1, TOM1L2 and FA2H, all implicated in neurological diseases. Functional enrichment analyses of TECPR2-related genes converged on a role in late autophagy and ribosomal processes. Large-scale population variation data demonstrated that this role is non-redundant. Conclusions: TECPR2 might serve as an indicator for the energy balance between protein synthesis and autophagy, and a marker for diseases associated with their imbalance, such as Alzheimer’s disease and Huntington’s disease. Specifically, we speculate that TECPR2 plays an important role as a proteostasis regulator during synaptogenesis, highlighting its importance in developing neurons. By advancing our understanding of TECPR2 function, this work provides an essential stepping stone towards the development of precision diagnostics and targeted treatment options for TECPR2-related disorders.


Genetics
1992
Vol 132 (4)
pp. 1195-1198
Author(s):  
D B Goldstein

Abstract: The life cycle of eukaryotic, sexual species is divided into haploid and diploid phases. In multicellular animals and seed plants, the diploid phase is dominant, and the haploid phase is reduced to one, or a very few cells, which are dependent on the diploid form. In other eukaryotic species, however, the haploid phase may dominate or the phases may be equally developed. Even though an alternation between haploid and diploid forms is fundamental to sexual reproduction in eukaryotes, relatively little is known about the evolutionary forces that influence the dominance of haploidy or diploidy. An obvious genetic factor that might result in selection for a dominant diploid phase is heterozygote advantage, since only the diploid phase can be heterozygous. In this paper, I analyze a model designed to determine whether heterozygote advantage could lead to the evolution of a dominant diploid phase. The main result is that heterozygote advantage can lead to an increase in the dominance of the diploid phase, but only if the diploid phase is already sufficiently dominant. Because the diploid phase is unlikely to be increased in organisms that are primarily haploid, I conclude that heterozygote advantage is not a sufficient explanation of the dominance of the diploid phase in higher plants and animals.


2020
Author(s):  
Ido Shalev ◽  
Judith Somekh ◽  
Alal Eran

Background: Loss of tectonin β-propeller repeat-containing 2 (TECPR2) function has been implicated in an array of neurodegenerative disorders, yet its physiological function remains largely unknown. Understanding TECPR2 function is essential for developing much needed precision therapeutics for TECPR2-related diseases. Methods: We leveraged the considerable amounts of functional data to obtain a comprehensive perspective of the role of TECPR2 in health and disease. We integrated expression patterns, population variation, phylogenetic profiling, protein-protein interactions, and regulatory network data for a minimally biased multimodal functional analysis. Genes and proteins linked to TECPR2 via multiple lines of evidence were subject to functional enrichment analyses to identify molecular mechanisms involving TECPR2. Results: TECPR2 was found to be part of a tight neurodevelopmental gene expression program that includes KIF1A, ATXN1, TOM1L2, and FA2H, all implicated in neurological diseases. Functional enrichment analyses of TECPR2-related genes converged on a role in late autophagy and ribosomal processes. Large-scale population variation data demonstrated that this role is nonredundant. Conclusions: TECPR2 might serve as an indicator for the energy balance between protein synthesis and autophagy, and a marker for diseases associated with their imbalance, such as Alzheimer’s disease, Huntington’s disease, and various cancers. Our work further suggests that TECPR2 plays a role as a synaptic proteostasis regulator during synaptogenesis, highlighting its importance in developing neurons. By advancing our understanding of TECPR2 function, this work provides an essential stepping stone towards the development of precision diagnostics and targeted treatment options for TECPR2-related disorders.


Author(s):  
Bao Bing-Kun ◽  
Yan Shuicheng

Graph-based learning provides a useful approach for modeling data in image annotation problems. In this chapter, the authors show how to construct a region-based graph to annotate large-scale multi-label images. It is well recognized that analysis at the semantic region level can greatly improve image annotation performance compared with analysis at the whole-image level. However, the region-level approach increases the data scale by several orders of magnitude and poses new challenges for most existing algorithms. To address this, each image is first encoded as a Bag-of-Regions based on multiple image segmentations. Then, all image regions are assembled into a large k-nearest-neighbor graph using an efficient Locality Sensitive Hashing (LSH) method. Finally, a sparse and region-aware image-based graph is fed into the multi-label extension of the Entropic graph regularized semi-supervised learning algorithm (Subramanya & Bilmes, 2009). In combination, these steps can handle large-scale datasets. Extensive experiments on the NUS-WIDE (260k images) and COREL-5k datasets validate the effectiveness and efficiency of the framework for region-aware and scalable multi-label propagation.
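The k-nearest-neighbor graph construction step can be sketched as follows. The chapter abstract does not say which LSH family is used, so random-hyperplane hashing is an assumption here, as are the function name `build_lsh_knn_graph`, its parameters, and the toy two-dimensional feature vectors standing in for region descriptors. The semi-supervised label propagation step is not shown.

```python
import math
import random
from collections import defaultdict


def build_lsh_knn_graph(vectors, k=2, num_tables=8, bits_per_table=4, seed=0):
    """Approximate k-NN graph: candidates come from shared LSH buckets,
    then each candidate list is ranked by exact Euclidean distance."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    candidates = [set() for _ in vectors]

    for _ in range(num_tables):
        # Each table uses its own random hyperplanes through the origin.
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(bits_per_table)]
        buckets = defaultdict(list)
        for idx, vec in enumerate(vectors):
            # The bucket key records on which side of each hyperplane vec lies.
            key = tuple(sum(p * x for p, x in zip(plane, vec)) >= 0.0
                        for plane in planes)
            buckets[key].append(idx)
        for members in buckets.values():
            for i in members:
                candidates[i].update(j for j in members if j != i)

    graph = {}
    for i, cand in enumerate(candidates):
        ranked = sorted(cand, key=lambda j: math.dist(vectors[i], vectors[j]))
        graph[i] = ranked[:k]  # may hold fewer than k neighbors
    return graph


# Hypothetical region features; indices 0 and 1 are deliberate duplicates.
features = [(1.0, 0.1), (1.0, 0.1), (0.9, 0.2),
            (-1.0, -0.1), (-0.9, -0.2), (-1.0, 0.0)]
graph = build_lsh_knn_graph(features, k=2)

if __name__ == "__main__":
    for node, neighbors in graph.items():
        print(node, "->", neighbors)
```

Only points that share a bucket in at least one table are compared exactly, so the number of distance computations depends on bucket sizes rather than on all pairs. The trade-off is that a true neighbor can be missed if it never collides with the query point, which is why several tables are used.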

