Protein-coding changes preceded cis-regulatory gains in a newly evolved transcription circuit

Science ◽  
2020 ◽  
Vol 367 (6473) ◽  
pp. 96-100 ◽  
Author(s):  
Candace S. Britton ◽  
Trevor R. Sorrells ◽  
Alexander D. Johnson

Changes in both the coding sequence of transcriptional regulators and in the cis-regulatory sequences recognized by these regulators have been implicated in the evolution of transcriptional circuits. However, little is known about how they evolved in concert. We describe an evolutionary pathway in fungi where a new transcriptional circuit (a-specific gene repression by the homeodomain protein Matα2) evolved by coding changes in this ancient regulator, followed millions of years later by cis-regulatory sequence changes in the genes of its future regulon. By analyzing a group of species that has acquired the coding changes but not the cis-regulatory sites, we show that the coding changes became necessary for the regulator’s deeply conserved function, thereby poising the regulator to jump-start formation of the new circuit.

2019 ◽  
Author(s):  
Candace S. Britton ◽  
Trevor R. Sorrells ◽  
Alexander D. Johnson

While changes in both the coding sequence of transcriptional regulators and in the cis-regulatory sequences recognized by them have been implicated in the evolution of transcriptional circuits, little is known of how they evolve in concert. We describe an evolutionary pathway in fungi where a new transcriptional circuit (a-specific gene repression by Matα2) evolved by coding changes in an ancient master regulator, followed millions of years later by cis-regulatory sequence changes in the genes of its future regulon. We discerned this order of events by analyzing a group of species in which the coding changes in the regulator are present, but the cis-regulatory changes in the target genes are not. In this group we show that the coding changes became necessary for the regulator’s deeply conserved function and were therefore preserved. We propose that the changes first arose without altering the overall function of the regulator (although changing the details of its mechanism) and were later co-opted to “jump start” the formation of the new circuit.


1985 ◽  
Vol 5 (6) ◽  
pp. 1370-1378
Author(s):  
G K Lamppa ◽  
G Morelli ◽  
N H Chua

A genomic clone for a major chlorophyll a/b-binding polypeptide of the light-harvesting complex has been sequenced from wheat. This gene, whAB1.6, encodes a 70-nucleotide 5'-nontranslated spacer, a 34-amino-acid NH2-terminal extension, i.e., the transit peptide, and a mature coding protein of 232 amino acid residues. The exact molecular weight of the precursor polypeptide is 28,560. The transit peptide is basic and is rich in serines. No intervening sequences are found in this gene. The transcription start site of the whAB1.6 gene occurs at AAAC as determined by S1 nuclease analysis. Putative regulatory sequences occur upstream of the gene at -25 (TTTAAATA) and at -72 (CCAACCA). Northern blots show a single RNA species estimated to be 1,100 nucleotides. Heterogeneity of the RNA population is demonstrated in S1 nuclease analyses with a 5'-end-labeled fragment that extends 191 nucleotides into the mature protein coding sequence. At least seven different transcripts can be recognized. The highest levels of RNA transcribed from the whAB1.6 gene are found in the basal segments of the wheat leaf, whereas other chlorophyll a/b-binding transcripts in the cell show a different pattern of abundance. As a control, we show that roots do not contain chlorophyll a/b-binding RNA. The most abundant RNA species shows an interrupted homology with the whAB1.6 gene at the start of the mature protein coding sequence; another species shows homology beginning at the start of the transit peptide and does not include the nontranslated region. Chlorophyll a/b-binding polypeptides accumulate toward the tip of the leaf as shown by Western blot analysis of total thylakoid proteins.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Zain M. Patel ◽  
Timothy R. Hughes

Background: Mammalian genomes contain millions of putative regulatory sequences, which are delineated by binding of multiple transcription factors. The degree to which spacing and orientation constraints among transcription factor binding sites contribute to the recognition and identity of regulatory sequence is an unresolved but important question that impacts our understanding of genome function and evolution. Global mechanisms that underlie phenomena including the size of regulatory sequences, their uniqueness, and their evolutionary turnover remain poorly described. Results: Here, we ask whether models incorporating different degrees of spacing and orientation constraints among transcription factor binding sites are broadly consistent with several global properties of regulatory sequence. These properties include length, sequence diversity, turnover rate, and dominance of specific TFs in regulatory site identity and cell type specification. Models with and without spacing and orientation constraints are generally consistent with all observed properties of regulatory sequence, and with regulatory sequences being fundamentally small (~1 nucleosome). Uniqueness of regulatory regions and their rapid evolutionary turnover are expected under all models examined. An intriguing issue we identify is that the complexity of eukaryotic regulatory sites must scale with the number of active transcription factors, in order to accomplish observed specificity.
Conclusions: Models of transcription factor binding with or without spacing and orientation constraints predict that regulatory sequences should be fundamentally short, unique, and turn over rapidly. We posit that the existence of master regulators may be, in part, a consequence of evolutionary pressure to limit the complexity and increase evolvability of regulatory sites.


2018 ◽  
Author(s):  
Koh Onimaru ◽  
Osamu Nishimura ◽  
Shigehiro Kuraku

Genotype-phenotype mapping is one of the fundamental challenges in biology. The difficulties stem in part from the large amount of sequence information and the puzzling genomic code, particularly of non-protein-coding regions such as gene regulatory sequences. Recently, however, deep learning–based methods were shown to be able to decipher the gene regulatory code of genomes. Still, prediction accuracy needs improvement. Here, we report the design of convolution layers that efficiently process genomic sequence information and the development of a software package, DeepGMAP, to train and compare different deep learning–based models (https://github.com/koonimaru/DeepGMAP). First, we demonstrate that our convolution layers, termed forward- and reverse-sequence scan (FRSS) layers, enhance the power to predict gene regulatory sequences. Second, we assess previous studies and identify problems associated with data structures that caused overfitting. Finally, we introduce several visualization methods that provide insights into the syntax of gene regulatory sequences.


2020 ◽  
Vol 89 (1) ◽  
pp. 213-234 ◽  
Author(s):  
Andrew Field ◽  
Karen Adelman

Cell-type- and condition-specific profiles of gene expression require coordination between protein-coding gene promoters and cis-regulatory sequences called enhancers. Enhancers can stimulate gene activity at great genomic distances from their targets, raising questions about how enhancers communicate with specific gene promoters and what molecular mechanisms underlie enhancer function. Characterization of enhancer loci has identified the molecular features of active enhancers that accompany the binding of transcription factors and local opening of chromatin. These characteristics include coactivator recruitment, histone modifications, and noncoding RNA transcription. However, it remains unclear which of these features functionally contribute to enhancer activity. Here, we discuss what is known about how enhancers regulate their target genes and how enhancers and promoters communicate. Further, we describe recent data demonstrating many similarities between enhancers and the gene promoters they control, and we highlight unanswered questions in the field, such as the potential roles of transcription at enhancers.


2014 ◽  
Author(s):  
Olgert Denas ◽  
Richard Sandstrom ◽  
Yong Cheng ◽  
Kathryn Beal ◽  
Javier Herrero ◽  
...  

Background: Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, the relationships among sequence, conservation, and function are still poorly understood. Results: We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence in the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity in the other species for human and mouse, respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of TFos not showing conservation of occupancy, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest that a substantial amount of functional regulatory sequence is exapted from other biochemically active genomic material. Despite substantial repurposing of TFos, we did not find substantial changes in their predicted target genes, suggesting that CRMs buffer evolutionary events, allowing little or no change in the TF–target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions.
Conclusion: We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across the TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed in the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Min Lu ◽  
Xinglei Qin ◽  
Yajun Zhou ◽  
Gang Li ◽  
Zhaoyang Liu ◽  
...  

Gemcitabine is the first-line chemotherapy drug for cholangiocarcinoma (CCA), but acquired resistance has been frequently observed in CCA patients. To search for potential long noncoding RNAs (lncRNAs) involved in gemcitabine resistance, two gemcitabine-resistant CCA cell lines were established and dysregulated lncRNAs were identified by lncRNA microarray. Long intergenic non-protein coding RNA 665 (LINC00665) was found to rank among the top 10 upregulated lncRNAs in our study, and high LINC00665 expression was closely associated with poor prognosis and chemoresistance of CCA patients. Silencing LINC00665 in gemcitabine-resistant CCA cells impaired gemcitabine tolerance, while enforced LINC00665 expression increased gemcitabine resistance of sensitive CCA cells. The gemcitabine-resistant CCA cells showed increased EMT and stemness properties, and silencing LINC00665 suppressed sphere formation, migration, invasion and expression of EMT and stemness markers. In addition, Wnt/β-Catenin signaling was activated in gemcitabine-resistant CCA cells, but LINC00665 knockdown suppressed Wnt/β-Catenin activation. B-cell CLL/lymphoma 9-like (BCL9L), a nuclear transcriptional regulator of Wnt/β-Catenin signaling, plays a key role in the nuclear translocation of β-Catenin and promotes β-Catenin-dependent transcription. In our study, we found that LINC00665 regulated BCL9L expression by acting as a molecular sponge for miR-424-5p. Moreover, silencing BCL9L or overexpressing miR-424-5p suppressed gemcitabine resistance, EMT, stemness and Wnt/β-Catenin activation in resistant CCA cells. In conclusion, our results disclosed the important role of LINC00665 in gemcitabine resistance of CCA cells, and provided a new biomarker or therapeutic target for CCA treatment.


Genetics ◽  
1997 ◽  
Vol 145 (2) ◽  
pp. 297-309 ◽  
Author(s):  
Stuart J Newfeld ◽  
Richard W Padgett ◽  
Seth D Findley ◽  
Brent G Richter ◽  
Michele Sanicola ◽  
...  

Using an elaborate set of cis-regulatory sequences, the decapentaplegic (dpp) gene displays a dynamic pattern of gene expression during development. The C-terminal portion of the DPP protein is processed to generate a secreted signaling molecule belonging to the transforming growth factor-β (TGF-β) family. This signal, the DPP ligand, is able to influence the developmental fates of responsive cells in a concentration-dependent fashion. Here we examine the sequence level organization of a significant portion of the dpp locus in Drosophila melanogaster and use interspecific comparisons with D. simulans, D. pseudoobscura and D. virilis to explore the molecular evolution of the gene. Our interspecific analysis identified significant selective constraint on both the nucleotide and amino acid sequences. As expected, interspecific comparison of protein coding sequences shows that the C-terminal ligand region is highly conserved. However, the central portion of the protein is also conserved, while the N-terminal third is quite variable. Comparison of noncoding regions reveals significant stretches of nucleotide identity in the 3′ untranslated portion of exon 3 and in the intron between exons 2 and 3. An examination of cDNA sequences representing five classes of dpp transcripts indicates that these transcripts encode the same polypeptide.

