Protein-coding changes preceded cis-regulatory gains in a newly evolved transcription circuit

While changes in both the coding sequence of transcriptional regulators and the cis-regulatory sequences recognized by them have been implicated in the evolution of transcriptional circuits, little is known about how they evolve in concert. We describe an evolutionary pathway in fungi where a new transcriptional circuit (a-specific gene repression by Matα2) evolved by coding changes in an ancient master regulator, followed millions of years later by cis-regulatory sequence changes in the genes of its future regulon. We discerned this order of events by analyzing a group of species in which the coding changes in the regulator are present, but the cis-regulatory changes in the target genes are not. In this group we show that the coding changes became necessary for the regulator’s deeply conserved function and were therefore preserved. We propose that the changes first arose without altering the overall function of the regulator (although changing the details of its mechanism) and were later co-opted to “jump start” the formation of the new circuit.

Structure and developmental regulation of a wheat gene encoding the major chlorophyll a/b-binding polypeptide

Molecular and Cellular Biology ◽

10.1128/mcb.5.6.1370-1378.1985 ◽

1985 ◽

Vol 5 (6) ◽

pp. 1370-1378

Author(s):

G K Lamppa ◽

G Morelli ◽

N H Chua

Keyword(s):

Amino Acid ◽

Chlorophyll A ◽

Developmental Regulation ◽

Transit Peptide ◽

Regulatory Sequences ◽

Mature Protein ◽

S1 Nuclease ◽

Protein Coding ◽

Coding Sequence ◽

Nontranslated Region

A genomic clone for a major chlorophyll a/b-binding polypeptide of the light-harvesting complex has been sequenced from wheat. This gene, whAB1.6, encodes a 70-nucleotide 5'-nontranslated spacer, a 34-amino-acid NH2-terminal extension, i.e., the transit peptide, and a mature coding protein of 232 amino acid residues. The exact molecular weight of the precursor polypeptide is 28,560. The transit peptide is basic and is rich in serines. No intervening sequences are found in this gene. The transcription start site of the whAB1.6 gene occurs at AAAC as determined by S1 nuclease analysis. Putative regulatory sequences occur upstream of the gene at -25 (TTTAAATA) and at -72 (CCAACCA). Northern blots show a single RNA species estimated to be 1,100 nucleotides. Heterogeneity of the RNA population is demonstrated in S1 nuclease analyses with a 5'-end-labeled fragment that extends 191 nucleotides into the mature protein coding sequence. At least seven different transcripts can be recognized. The highest levels of RNA transcribed from the whAB1.6 gene are found in the basal segments of the wheat leaf, whereas other chlorophyll a/b-binding transcripts in the cell show a different pattern of abundance. As a control, we show that roots do not contain chlorophyll a/b-binding RNA. The most abundant RNA species shows an interrupted homology with the whAB1.6 gene at the start of the mature protein coding sequence; another species shows homology beginning at the start of the transit peptide and does not include the nontranslated region. Chlorophyll a/b-binding polypeptides accumulate toward the tip of the leaf as shown by Western blot analysis of total thylakoid proteins.
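
The lengths and molecular weight quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming the precursor consists of exactly the transit peptide plus the mature protein (the abstract does not spell this out); the function and constant names are illustrative only.

```python
# Figures quoted in the abstract; everything else here is an illustrative assumption.
TRANSIT_PEPTIDE_AA = 34    # NH2-terminal extension (transit peptide), in residues
MATURE_PROTEIN_AA = 232    # mature protein, in residues
PRECURSOR_MW_DA = 28_560   # stated molecular weight of the precursor polypeptide


def precursor_length(transit_aa: int, mature_aa: int) -> int:
    """Length of the precursor, assuming it is transit peptide + mature protein."""
    return transit_aa + mature_aa


def mean_residue_mass(mw_da: float, n_residues: int) -> float:
    """Average mass per residue implied by a molecular weight and a length."""
    return mw_da / n_residues


length = precursor_length(TRANSIT_PEPTIDE_AA, MATURE_PROTEIN_AA)
print(length)                                                 # 266
print(round(mean_residue_mass(PRECURSOR_MW_DA, length), 1))   # 107.4
```

Under that assumption the precursor has 266 residues, which implies an average of roughly 107 Da per residue.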

Global properties of regulatory sequences are predicted by transcription factor recognition mechanisms

Genome Biology ◽

10.1186/s13059-021-02503-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Zain M. Patel ◽

Timothy R. Hughes

Keyword(s):

Transcription Factor ◽

Transcription Factors ◽

Binding Sites ◽

Transcription Factor Binding Sites ◽

Transcription Factor Binding ◽

Regulatory Sequence ◽

Regulatory Sequences ◽

Factor Binding ◽

Global Properties ◽

Regulatory Sites

Background: Mammalian genomes contain millions of putative regulatory sequences, which are delineated by binding of multiple transcription factors. The degree to which spacing and orientation constraints among transcription factor binding sites contribute to the recognition and identity of regulatory sequence is an unresolved but important question that impacts our understanding of genome function and evolution. Global mechanisms that underlie phenomena including the size of regulatory sequences, their uniqueness, and their evolutionary turnover remain poorly described. Results: Here, we ask whether models incorporating different degrees of spacing and orientation constraints among transcription factor binding sites are broadly consistent with several global properties of regulatory sequence. These properties include length, sequence diversity, turnover rate, and dominance of specific TFs in regulatory site identity and cell type specification. Models with and without spacing and orientation constraints are generally consistent with all observed properties of regulatory sequence, and with regulatory sequences being fundamentally small (~ 1 nucleosome). Uniqueness of regulatory regions and their rapid evolutionary turnover are expected under all models examined. An intriguing issue we identify is that the complexity of eukaryotic regulatory sites must scale with the number of active transcription factors, in order to accomplish observed specificity. Conclusions: Models of transcription factor binding with or without spacing and orientation constraints predict that regulatory sequences should be fundamentally short, unique, and turn over rapidly. We posit that the existence of master regulators may be, in part, a consequence of evolutionary pressure to limit the complexity and increase evolvability of regulatory sites.
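
To give a feel for why site complexity bears on specificity, here is a back-of-the-envelope sketch. It is not the authors' model: it assumes a uniform random genome (equal base frequencies, independent positions), fully specified motif positions, and a round illustrative genome size of 3,000,000,000 bp. All names are hypothetical.

```python
def expected_chance_matches(genome_bp: int, motif_len: int, both_strands: bool = True) -> float:
    """Expected number of chance matches to a fully specified motif of
    motif_len bases in a uniform random sequence of genome_bp bases.
    Ignores edge effects and palindromic motifs (simplifying assumptions)."""
    strands = 2 if both_strands else 1
    return strands * genome_bp / 4 ** motif_len


def min_specified_bases(genome_bp: int, both_strands: bool = True) -> int:
    """Smallest number of fully specified bases for which fewer than one
    chance match is expected under the same assumptions."""
    k = 0
    while expected_chance_matches(genome_bp, k, both_strands) >= 1:
        k += 1
    return k


ILLUSTRATIVE_GENOME_BP = 3_000_000_000  # assumed round figure, not taken from the abstract

print(min_specified_bases(ILLUSTRATIVE_GENOME_BP))  # 17
```

Under these assumptions a single short motif is expected to occur many times by chance, so a unique site has to combine on the order of 17 specified bases, for example across several binding sites.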

A regulatory-sequence classifier with a neural network for genomic information processing

10.1101/355974 ◽

2018 ◽

Cited By ~ 1

Author(s):

Koh Onimaru ◽

Osamu Nishimura ◽

Shigehiro Kuraku

Keyword(s):

Deep Learning ◽

Genomic Sequence ◽

Regulatory Sequence ◽

Sequence Information ◽

Regulatory Sequences ◽

Genomic Information ◽

Protein Coding ◽

Coding Regions ◽

Gene Regulatory ◽

Genomic Sequence Information

Genotype-phenotype mapping is one of the fundamental challenges in biology. The difficulties stem in part from the large amount of sequence information and the puzzling genomic code, particularly of non-protein-coding regions such as gene regulatory sequences. Recently, however, deep learning-based methods have been shown to decipher the gene regulatory code of genomes. Still, prediction accuracy needs improvement. Here, we report the design of convolution layers that efficiently process genomic sequence information and the development of a software package, DeepGMAP, to train and compare different deep learning-based models (https://github.com/koonimaru/DeepGMAP). First, we demonstrate that our convolution layers, termed forward- and reverse-sequence scan (FRSS) layers, enhance the power to predict gene regulatory sequences. Second, we assessed previous studies and identified problems associated with data structures that caused overfitting. Finally, we introduce several visualization methods that provide insights into the syntax of gene regulatory sequences.
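
The general idea of scanning a sequence in both orientations can be illustrated with a small sketch. This is not DeepGMAP's implementation; the abstract does not say how FRSS layers combine the two strands, so the per-window maximum used here is an assumption, and the filter and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def scan(seq: str, weights: list) -> list:
    """Slide a filter along seq. weights holds one dict per filter position,
    mapping each base to its score; the result has one score per window."""
    width = len(weights)
    return [
        sum(weights[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    ]


def forward_reverse_scan(seq: str, weights: list) -> list:
    """Score every window on both strands and keep the larger score.
    Taking the max is an assumption made for this sketch."""
    forward = scan(seq, weights)
    # Windows on the reverse complement come out in reversed order,
    # so flip the list to line them up with the forward windows.
    reverse = scan(reverse_complement(seq), weights)[::-1]
    return [max(f, r) for f, r in zip(forward, reverse)]


# Hypothetical two-base filter that rewards "GA".
GA_FILTER = [
    {"A": 0, "C": 0, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 0},
]

print(forward_reverse_scan("TTCA", GA_FILTER))  # [1, 2, 1]
```

The middle window of "TTCA" is "TC", whose reverse complement is "GA", so it receives the full score even though the forward strand does not match.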

Evaluating Enhancer Function and Transcription

Annual Review of Biochemistry ◽

10.1146/annurev-biochem-011420-095916 ◽

2020 ◽

Vol 89 (1) ◽

pp. 213-234 ◽

Cited By ~ 7

Author(s):

Andrew Field ◽

Karen Adelman

Keyword(s):

Noncoding Rna ◽

Molecular Mechanisms ◽

Target Genes ◽

Specific Gene ◽

Regulatory Sequences ◽

Gene Promoters ◽

Enhancer Activity ◽

Protein Coding ◽

Molecular Features ◽

Enhancer Function

Cell-type- and condition-specific profiles of gene expression require coordination between protein-coding gene promoters and cis-regulatory sequences called enhancers. Enhancers can stimulate gene activity at great genomic distances from their targets, raising questions about how enhancers communicate with specific gene promoters and what molecular mechanisms underlie enhancer function. Characterization of enhancer loci has identified the molecular features of active enhancers that accompany the binding of transcription factors and local opening of chromatin. These characteristics include coactivator recruitment, histone modifications, and noncoding RNA transcription. However, it remains unclear which of these features functionally contribute to enhancer activity. Here, we discuss what is known about how enhancers regulate their target genes and how enhancers and promoters communicate. Further, we describe recent data demonstrating many similarities between enhancers and the gene promoters they control, and we highlight unanswered questions in the field, such as the potential roles of transcription at enhancers.

Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution

10.1101/010926 ◽

2014 ◽

Cited By ~ 3

Author(s):

Olgert Denas ◽

Richard Sandstrom ◽

Yong Cheng ◽

Kathryn Beal ◽

Javier Herrero ◽

...

Keyword(s):

Comparative Analysis ◽

Cell Types ◽

Regulatory Elements ◽

The Other ◽

Specific Gene ◽

Regulatory Sequence ◽

Regulatory Sequences ◽

Species Specific ◽

And Function ◽

Human And Mouse

Background: Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, the relationships among sequence, conservation, and function are still poorly understood. Results: We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence on the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity on the other species for human and mouse respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of TFos not showing conservation of occupancy, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest that a substantial amount of functional regulatory sequences is exapted from other biochemically active genomic material. Despite substantial repurposing of TFos, we did not find substantial changes in their predicted target genes, suggesting that CRMs buffer evolutionary events allowing little or no change in the TF – target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions. 
Conclusion: We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed in the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.

Long non-coding RNA LINC00665 promotes gemcitabine resistance of Cholangiocarcinoma cells via regulating EMT and stemness properties through miR-424-5p/BCL9L axis

Cell Death and Disease ◽

10.1038/s41419-020-03346-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Min Lu ◽

Xinglei Qin ◽

Yajun Zhou ◽

Gang Li ◽

Zhaoyang Liu ◽

...

Keyword(s):

Acquired Resistance ◽

Transcriptional Regulators ◽

First Line ◽

Protein Coding ◽

Gemcitabine Resistance ◽

Non Coding Rna ◽

Sphere Formation ◽

Long Non Coding Rna ◽

Line Chemotherapy

Gemcitabine is the first-line chemotherapy drug for cholangiocarcinoma (CCA), but acquired resistance has been frequently observed in CCA patients. To search for potential long noncoding RNAs (lncRNAs) involved in gemcitabine resistance, two gemcitabine-resistant CCA cell lines were established and dysregulated lncRNAs were identified by lncRNA microarray. Long intergenic non-protein coding RNA 665 (LINC00665) was found to rank among the top 10 upregulated lncRNAs in our study, and high LINC00665 expression was closely associated with poor prognosis and chemoresistance of CCA patients. Silencing LINC00665 in gemcitabine-resistant CCA cells impaired gemcitabine tolerance, while enforced LINC00665 expression increased gemcitabine resistance of sensitive CCA cells. The gemcitabine-resistant CCA cells showed increased EMT and stemness properties, and silencing LINC00665 suppressed sphere formation, migration, invasion and expression of EMT and stemness markers. In addition, Wnt/β-Catenin signaling was activated in gemcitabine-resistant CCA cells, but LINC00665 knockdown suppressed Wnt/β-Catenin activation. B-cell CLL/lymphoma 9-like (BCL9L), a nuclear transcriptional regulator of Wnt/β-Catenin signaling, plays a key role in the nuclear translocation of β-Catenin and promotes β-Catenin-dependent transcription. In our study, we found that LINC00665 regulated BCL9L expression by acting as a molecular sponge for miR-424-5p. Moreover, silencing BCL9L or overexpressing miR-424-5p suppressed gemcitabine resistance, EMT, stemness and Wnt/β-Catenin activation in resistant CCA cells. In conclusion, our results disclosed the important role of LINC00665 in gemcitabine resistance of CCA cells, and provided a new biomarker or therapeutic target for CCA treatment.

Molecular Evolution at the decapentaplegic Locus in Drosophila

Genetics ◽

10.1093/genetics/145.2.297 ◽

1997 ◽

Vol 145 (2) ◽

pp. 297-309 ◽

Cited By ~ 3

Author(s):

Stuart J Newfeld ◽

Richard W Padgett ◽

Seth D Findley ◽

Brent G Richter ◽

Michele Sanicola ◽

...

Keyword(s):

Molecular Evolution ◽

Transforming Growth Factor ◽

Transforming Growth Factor Β ◽

Selective Constraint ◽

Amino Acid Sequences ◽

Regulatory Sequences ◽

Protein Coding ◽

Terminal Ligand ◽

Cdna Sequences ◽

Interspecific Comparisons

Using an elaborate set of cis-regulatory sequences, the decapentaplegic (dpp) gene displays a dynamic pattern of gene expression during development. The C-terminal portion of the DPP protein is processed to generate a secreted signaling molecule belonging to the transforming growth factor-β (TGF-β) family. This signal, the DPP ligand, is able to influence the developmental fates of responsive cells in a concentration-dependent fashion. Here we examine the sequence-level organization of a significant portion of the dpp locus in Drosophila melanogaster and use interspecific comparisons with D. simulans, D. pseudoobscura and D. virilis to explore the molecular evolution of the gene. Our interspecific analysis identified significant selective constraint on both the nucleotide and amino acid sequences. As expected, interspecific comparison of protein coding sequences shows that the C-terminal ligand region is highly conserved. However, the central portion of the protein is also conserved, while the N-terminal third is quite variable. Comparison of noncoding regions reveals significant stretches of nucleotide identity in the 3′ untranslated portion of exon 3 and in the intron between exons 2 and 3. An examination of cDNA sequences representing five classes of dpp transcripts indicates that these transcripts encode the same polypeptide.

Characterization of a mammalian smooth muscle myosin heavy-chain gene: complete nucleotide and protein coding sequence and analysis of the 5' end of the gene.

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.88.23.10676 ◽

1991 ◽

Vol 88 (23) ◽

pp. 10676-10680 ◽

Cited By ~ 80

Author(s):

P. Babij ◽

C. Kelly ◽

M. Periasamy

Keyword(s):

Smooth Muscle ◽

Myosin Heavy Chain ◽

Heavy Chain ◽

Chain Gene ◽

Heavy Chain Gene ◽

Myosin Heavy Chain Gene ◽

Protein Coding ◽

Coding Sequence ◽

Muscle Myosin
