Evidence of evolutionary selection for co-translational folding

10.1101/121871 ◽

2017 ◽

Author(s):

William M. Jacobs ◽

Eugene I. Shakhnovich

Keyword(s):

Self Assembly ◽

E Coli ◽

Evolutionary Selection ◽

Fitness Effects ◽

Open Questions ◽

Genome Wide ◽

A Genome ◽

Evolutionarily Conserved ◽

Domain Boundaries ◽

Selection For

Recent experiments and simulations have demonstrated that proteins can fold on the ribosome. However, the extent and generality of fitness effects resulting from co-translational folding remain open questions. Here we report a genome-wide analysis that uncovers evidence of evolutionary selection for co-translational folding. We describe a robust statistical approach to identify loci within genes that are both significantly enriched in slowly translated codons and evolutionarily conserved. Surprisingly, we find that domain boundaries can explain only a small fraction of these conserved loci. Instead, we propose that regions enriched in slowly translated codons are associated with co-translational folding intermediates, which may be smaller than a single domain. We show that the intermediates predicted by a native-centric model of co-translational folding account for the majority of these loci across more than 500 E. coli proteins. By making a direct connection to protein folding, this analysis provides strong evidence that many synonymous substitutions have been selected to optimize translation rates at specific locations within genes. More generally, our results indicate that kinetics, and not just thermodynamics, can significantly alter the efficiency of self-assembly in a biological context.

Evidence of evolutionary selection for cotranslational folding

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1705772114 ◽

2017 ◽

Vol 114 (43) ◽

pp. 11434-11439 ◽

Cited By ~ 48

Author(s):

William M. Jacobs ◽

Eugene I. Shakhnovich

Keyword(s):

Self Assembly ◽

Evolutionary Selection ◽

Fitness Effects ◽

Open Questions ◽

Genome Wide ◽

A Genome ◽

Evolutionarily Conserved ◽

Domain Boundaries ◽

Cotranslational Folding ◽

Selection For

Recent experiments and simulations have demonstrated that proteins can fold on the ribosome. However, the extent and generality of fitness effects resulting from cotranslational folding remain open questions. Here we report a genome-wide analysis that uncovers evidence of evolutionary selection for cotranslational folding. We describe a robust statistical approach to identify loci within genes that are both significantly enriched in slowly translated codons and evolutionarily conserved. Surprisingly, we find that domain boundaries can explain only a small fraction of these conserved loci. Instead, we propose that regions enriched in slowly translated codons are associated with cotranslational folding intermediates, which may be smaller than a single domain. We show that the intermediates predicted by a native-centric model of cotranslational folding account for the majority of these loci across more than 500 Escherichia coli proteins. By making a direct connection to protein folding, this analysis provides strong evidence that many synonymous substitutions have been selected to optimize translation rates at specific locations within genes. More generally, our results indicate that kinetics, and not just thermodynamics, can significantly alter the efficiency of self-assembly in a biological context.

Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph

Algorithms for Molecular Biology ◽

10.1186/s13015-021-00182-9 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Kingshuk Mukherjee ◽

Massimiliano Rossi ◽

Leena Salmela ◽

Christina Boucher

Keyword(s):

Single Molecule ◽

De Bruijn Graph ◽

Anabas Testudineus ◽

E Coli ◽

Genome Wide ◽

A Genome ◽

De Bruijn ◽

Optical Maps ◽

Definition Of ◽

Numeric Representation

Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. There exists only one publicly available non-proprietary method for assembly and one proprietary software package that is available via an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes. The method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) ran successfully only on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory, and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper.

A Genome-Wide Scan for Evidence of Selection in a Maize Population Under Long-Term Artificial Selection for Ear Number

Genetics ◽

10.1534/genetics.113.160655 ◽

2013 ◽

Vol 196 (3) ◽

pp. 829-840 ◽

Cited By ~ 47

Author(s):

Timothy M. Beissinger ◽

Candice N. Hirsch ◽

Brieanne Vaillancourt ◽

Shweta Deshpande ◽

Kerrie Barry ◽

...

Keyword(s):

Artificial Selection ◽

Maize Population ◽

Genome Wide ◽

A Genome ◽

Selection For ◽

Ear Number ◽

Genome Wide Scan

Elucidating acetate tolerance in E. coli using a genome-wide approach

Metabolic Engineering ◽

10.1016/j.ymben.2010.12.001 ◽

2011 ◽

Vol 13 (2) ◽

pp. 214-224 ◽

Cited By ~ 49

Author(s):

Nicholas R. Sandoval ◽

Tirzah Y. Mills ◽

Min Zhang ◽

Ryan T. Gill

Keyword(s):

E Coli ◽

Genome Wide ◽

A Genome

Chromatin architectural proteins regulate flowering time by precluding gene looping

Science Advances ◽

10.1126/sciadv.abg3097 ◽

2021 ◽

Vol 7 (24) ◽

pp. eabg3097

Author(s):

Bo Zhao ◽

Yanpeng Xi ◽

Junghyun Kim ◽

Sibum Sung

Keyword(s):

Chromatin Structure ◽

Cellular Processes ◽

Genome Wide ◽

A Genome ◽

Evolutionarily Conserved ◽

Architectural Proteins ◽

Floral Repressor ◽

Flanking Regions ◽

Genome Wide Study

Chromatin structure is critical for gene expression and many other cellular processes. In Arabidopsis thaliana, the floral repressor FLC adopts a self-loop chromatin structure via bridging of its flanking regions. This local gene loop is necessary for active FLC expression. However, the molecular mechanism underlying the formation of this class of gene loops is unknown. Here, we report the characterization of a group of linker histone-like proteins, named the GH1-HMGA family in Arabidopsis, which act as chromatin architecture modulators. We demonstrate that these family members redundantly promote the floral transition through the repression of FLC. A genome-wide study revealed that this family preferentially binds to the 5′ and 3′ ends of gene bodies. The loss of this binding increases FLC expression by stabilizing the FLC 5′ to 3′ gene looping. Our study provides mechanistic insights into how a family of evolutionarily conserved proteins regulates the formation of local gene loops.

sPepFinder expedites genome-wide identification of small proteins in bacteria

10.1101/2020.05.05.079178 ◽

2020 ◽

Author(s):

Lei Li ◽

Yanjie Chao

Keyword(s):

De Novo ◽

Bacterial Species ◽

Computational Prediction ◽

Ribosome Profiling ◽

Support Vector ◽

Initiation Rate ◽

E Coli ◽

Small Proteins ◽

Genome Wide ◽

A Genome

Small proteins shorter than 50 amino acids have long been overlooked. A number of small proteins have been identified in several model bacteria using experimental approaches and assigned important functions in diverse cellular processes. The recent development of ribosome profiling technologies has allowed genome-wide identification of small proteins and small ORFs (smORFs), but our incomplete understanding of small proteins hinders de novo computational prediction of smORFs in non-model bacterial species. Here, we have identified several sequence features for smORFs by a systematic analysis of all the known small proteins in E. coli, among which the translation initiation rate is the strongest determinant. By integrating these features into a support vector machine learning model, we have developed a novel sPepFinder algorithm that can predict conserved smORFs in bacterial genomes with a high accuracy of 92.8%. De novo prediction in E. coli has revealed several novel smORFs with evidence of translation supported by ribosome profiling. Further application of sPepFinder in 549 bacterial species has led to the identification of more than 100,000 novel smORFs, many of which are conserved at the amino acid and nucleotide levels under purifying selection. Overall, we have established sPepFinder as a valuable tool to identify novel smORFs in both model and non-model bacterial organisms, and provided a large resource of small proteins for functional characterization.

Each of 3,323 metabolic innovations in the evolution of E. coli arose through the horizontal transfer of a single DNA segment

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1718997115 ◽

2018 ◽

Vol 116 (1) ◽

pp. 187-192 ◽

Cited By ~ 11

Author(s):

Tin Yau Pang ◽

Martin J. Lercher

Keyword(s):

E Coli ◽

New Genes ◽

Metabolic Adaptations ◽

Genome Wide ◽

A Genome ◽

Individual Strain ◽

History Of ◽

Phenotype Space ◽

Genome Content ◽

Metabolic Models

Even closely related prokaryotes often show an astounding diversity in their ability to grow in different nutritional environments. It has been hypothesized that complex metabolic adaptations (those requiring the independent acquisition of multiple new genes) can evolve via selectively neutral intermediates. However, it is unclear whether this neutral exploration of phenotype space occurs in nature, or what fraction of metabolic adaptations is indeed complex. Here, we reconstruct metabolic models for the ancestors of a phylogeny of 53 Escherichia coli strains, linking genotypes to phenotypes on a genome-wide, macroevolutionary scale. Based on the ancestral and extant metabolic models, we identify 3,323 phenotypic innovations in the history of the E. coli clade that arose through changes in accessory genome content. Of these innovations, 1,998 allow growth in previously inaccessible environments, while 1,325 increase biomass yield. Strikingly, every observed innovation arose through the horizontal acquisition of a single DNA segment less than 30 kb long. Although we found no evidence for the contribution of selectively neutral processes, 10.6% of metabolic innovations were facilitated by horizontal gene transfers on earlier phylogenetic branches, consistent with a stepwise adaptation to successive environments. Ninety-eight percent of metabolic phenotypes accessible to the combined E. coli pangenome can be bestowed on any individual strain by transferring a single DNA segment from one of the extant strains. These results demonstrate an amazing ability of the E. coli lineage to adapt to novel environments through single horizontal gene transfers (followed by regulatory adaptations), an ability likely mirrored in other clades of generalist bacteria.

Dbf4-Dependent Kinase (DDK)-Mediated Proteolysis of CENP-A Prevents Mislocalization of CENP-A in Saccharomyces cerevisiae

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401131 ◽

2020 ◽

Vol 10 (6) ◽

pp. 2057-2068 ◽

Cited By ~ 3

Author(s):

Jessica R. Eisenstatt ◽

Lars Boeckmann ◽

Wei-Chun Au ◽

Valerie Garcia ◽

Levi Bursch ◽

...

Keyword(s):

Dna Replication ◽

Replication Initiation ◽

Centromeric Chromatin ◽

Dna Replication Initiation ◽

Link Type ◽

Genome Wide ◽

A Genome ◽

Evolutionarily Conserved ◽

Histone H3 Variant

The evolutionarily conserved centromeric histone H3 variant (Cse4 in budding yeast, CENP-A in humans) is essential for faithful chromosome segregation. Mislocalization of CENP-A to non-centromeric chromatin contributes to chromosomal instability (CIN) in yeast, fly, and human cells, and CENP-A is highly expressed and mislocalized in cancers. Defining mechanisms that prevent mislocalization of CENP-A is an area of active investigation. Ubiquitin-mediated proteolysis of overexpressed Cse4 (GALCSE4) by E3 ubiquitin ligases such as Psh1 prevents mislocalization of Cse4, and psh1Δ strains display synthetic dosage lethality (SDL) with GALCSE4. We previously performed a genome-wide screen and identified five alleles of CDC7 and DBF4, which encode the Dbf4-dependent kinase (DDK) complex that regulates DNA replication initiation, among the top twelve hits that displayed SDL with GALCSE4. We determined that cdc7-7 strains exhibit defects in ubiquitin-mediated proteolysis of Cse4 and show mislocalization of Cse4. Mutation of MCM5 (mcm5-bob1) bypasses the requirement of Cdc7 for replication initiation and rescues replication defects in a cdc7-7 strain. We determined that mcm5-bob1 does not rescue the SDL and defects in proteolysis of GALCSE4 in a cdc7-7 strain, suggesting a DNA replication-independent role for Cdc7 in Cse4 proteolysis. The SDL phenotype, defects in ubiquitin-mediated proteolysis, and the mislocalization pattern of Cse4 in a cdc7-7 psh1Δ strain were similar to those of cdc7-7 and psh1Δ strains, suggesting that Cdc7 regulates Cse4 in a pathway that overlaps with Psh1. Our results define a DNA replication initiation-independent role of DDK as a regulator of Psh1-mediated proteolysis of Cse4 to prevent mislocalization of Cse4.

Evolutionary switches between two serine codon sets are driven by selection

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1615832113 ◽

2016 ◽

Vol 113 (46) ◽

pp. 13109-13113 ◽

Cited By ~ 16

Author(s):

Igor B. Rogozin ◽

Frida Belinky ◽

Vladimir Pavlenko ◽

Svetlana A. Shabalina ◽

David M. Kristensen ◽

...

Keyword(s):

Amino Acid ◽

Great Majority ◽

Purifying Selection ◽

Nucleotide Substitutions ◽

Closely Related Species ◽

Published Evidence ◽

Genome Wide ◽

A Genome ◽

Evolutionarily Conserved ◽

Frequent Reversal

Serine is the only amino acid that is encoded by two disjoint codon sets so that a tandem substitution of two nucleotides is required to switch between the two sets. Previously published evidence suggests that, for the most evolutionarily conserved serines, the codon set switch occurs by simultaneous substitution of two nucleotides. Here we report a genome-wide reconstruction of the evolution of serine codons in triplets of closely related species from diverse prokaryotes and eukaryotes. The results indicate that the great majority of codon set switches proceed by two consecutive nucleotide substitutions, via a threonine or cysteine intermediate, and are driven by selection. These findings imply a strong pressure of purifying selection in protein evolution, which in the case of serine codon set switches occurs via an initial deleterious substitution quickly followed by a second, compensatory substitution. The result is frequent reversal of amino acid replacements and, at short evolutionary distances, pervasive homoplasy.
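
The codon arithmetic behind this abstract can be checked directly from the standard genetic code. The Python sketch below is an illustration added to this record, not the authors' analysis: the names CODON_TABLE, hamming, and intermediates are hypothetical helpers, and codons are written as DNA (T rather than U). It confirms that the two serine codon sets (TCN and AGY) are at least two nucleotide substitutions apart, and that the single-substitution intermediates on the shortest paths encode only threonine or cysteine.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def intermediates(a, b):
    """Amino acids encoded by codons one substitution away from both a and b."""
    return {
        CODON_TABLE[c]
        for c in CODON_TABLE
        if hamming(a, c) == 1 and hamming(c, b) == 1
    }


# Split the six serine codons into the two sets named in the abstract.
serine = sorted(c for c, aa in CODON_TABLE.items() if aa == "S")
tcn = [c for c in serine if c.startswith("TC")]
agy = [c for c in serine if c.startswith("AG")]

# Smallest number of substitutions needed to cross between the sets.
min_dist = min(hamming(a, b) for a in tcn for b in agy)

# Collect the amino acids seen at the midpoint of every shortest path.
between = set()
for a in tcn:
    for b in agy:
        if hamming(a, b) == min_dist:
            between |= intermediates(a, b)

print(min_dist)         # 2
print(sorted(between))  # ['C', 'T'], i.e. cysteine and threonine
```

Because no single substitution connects the two sets, any stepwise switch must pass through a non-serine codon, which is why the abstract describes the first substitution as deleterious and the second as compensatory.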

e-Membranome: a Database for Genome-Wide Analysis of Escherichia coli Outer Membrane Proteins

Current Pharmaceutical Biotechnology ◽

10.2174/1389201021666200610105549 ◽

2020 ◽

Vol 21 ◽

Author(s):

Kang Mo Lee ◽

Seung-Hak Cho ◽

Cheorl-Ho Kim ◽

Jong Hyun Kim ◽

Sung Soon Kim

Keyword(s):

Escherichia Coli ◽

Membrane Proteins ◽

Outer Membrane ◽

Outer Membrane Proteins ◽

3D Structure ◽

Glycan Array ◽

Epitope Region ◽

E Coli ◽

Genome Wide ◽

A Genome

Objectives: Lectin-like adhesins of enteric bacterial pathogens such as Escherichia coli (E. coli) are an attractive target for vaccine or drug development. Here, we have developed e-Membranome as a database of genome-wide putative adhesins in E. coli. Methods: The outer membrane adhesins were predicted from the annotated genes of E. coli strains using the PSORTb program. Further analysis was performed using InterProScan and the STRING database. The candidate proteins can be investigated for homology modeling of the three-dimensional (3D) structure (I-TASSER version 5.1), epitope region (ABCpred), and the glycan array. Results: e-Membranome is implemented using the Django (version 2.2.5) framework. The web application server Apache Tomcat 6.0 is integrated in the platform on Ubuntu Linux (version 16.04). MySQL (version 5.7) is used as the database engine. Information on the homology model of the 3D structure, the epitope region, and affinity from the glycan array will be stored in the e-Membranome database. As a case study, we performed a genome-wide screening of outer membrane-embedded proteins from the annotated genes of E. coli using the e-Membranome pipeline. Conclusion: This platform is expected to be a valuable resource for advancing research on outer membrane proteins and for constructing a lectin-glycan interaction network of E. coli. In addition, the e-Membranome pipeline can be extended to other similar biological systems that need to address host-pathogen interactions.
