Genetic sequences are two-dimensional

Mapping Intimacies

10.1101/299867

2018

Cited By ~ 1

Author(s):

Albert J Erives

Keyword(s):

Tandem Repeats

Methodological Approach

Regulatory Sequences

Uniform Size

Multiple Sequence

Replication Slippage

Protein Coding

Gapped Alignment

Genetic Sequences

Reading Frames

Abstract: In attempting to align divergent homologs of a conserved developmental enhancer, we discovered a flaw in the homology concept embedded in gapped alignment (GA). To correct this flaw, we developed a methodological approach called maximal homology alignment (MHA). The goal of MHA is to rescue the internal microparalogy of biological sequences. GA instead inserts a pattern of gaps (null characters) that transforms homologous sequences into strings of uniform size (1-dimensional lengths). The core operation in MHA is the “cinch”, whereby inferred tandem microparalogy is represented in multiple rows across the same span of alignment columns. MHAs therefore have a second (vertical) paralogy dimension, which re-categorizes most indel mutations as replication slippage and attenuates the indel problem. Furthermore, internally cinched, inferred microparalogy in a self-MHA can later be relaxed to restore uniformity to 2-dimensional widths in a multiple sequence alignment. This de-cinching operation is used as a first resort, before artificial null characters are introduced. We implement MHA in a program called maximal, which is composed of a series of modules for cinching and cyclelizing divergent tandem repeats. In conclusion, we find that the MHA approach is of higher utility than GA in non-protein-coding regulatory sequences, which are unconstrained by codon-based reading frames and are enriched in dense microparalogical content.
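The cinch idea can be illustrated with a minimal sketch. It assumes only exact (non-divergent) tandem repeats, so it is not the algorithm implemented in maximal, which works on divergent repeats. The sketch finds the longest exact tandem run and stacks each extra copy as its own row over the same alignment columns, instead of padding the sequence with gaps. The names `find_tandem_run` and `cinch` and the `max_period` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

def find_tandem_run(seq: str, max_period: int = 6) -> Optional[Tuple[int, int, int]]:
    """Return (start, period, copies) for the longest exact tandem run with
    at least two copies, or None. Ties keep the shorter period."""
    best = None
    best_len = 0
    n = len(seq)
    for period in range(1, max_period + 1):
        for start in range(0, n - 2 * period + 1):
            unit = seq[start:start + period]
            copies = 1
            # Extend while the next block of `period` characters repeats the unit.
            while seq[start + copies * period:start + (copies + 1) * period] == unit:
                copies += 1
            span = copies * period
            if copies >= 2 and span > best_len:
                best = (start, period, copies)
                best_len = span
    return best

def cinch(seq: str, max_period: int = 6) -> List[str]:
    """Hypothetical cinch: keep one copy of the repeat unit in the first row
    and stack the remaining copies as extra rows over the same columns."""
    hit = find_tandem_run(seq, max_period)
    if hit is None:
        return [seq]
    start, period, copies = hit
    prefix = seq[:start]
    suffix = seq[start + copies * period:]
    units = [seq[start + i * period:start + (i + 1) * period] for i in range(copies)]
    pad = " " * len(prefix)  # align extra rows under the first copy
    return [prefix + units[0] + suffix] + [pad + u for u in units[1:]]

for row in cinch("GATCACACAT"):
    print(repr(row))
```

Here the three CA copies occupy a single two-column span: the first row reads GATCAT and two further rows carry CA. Concatenating the prefix, all stacked units and the suffix recovers the original 1-dimensional string. This mirrors the de-cinching step described in the abstract.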


Tandem repeats ubiquitously flank and select translation initiation sites.

10.21203/rs.3.rs-832312/v1

2021

Author(s):

Ali Maddi

Kaveh Kavousi

Masoud Arabfard

Hamid Ohadi

Mina Ohadi

Keyword(s):

Translation Initiation

Tandem Repeats

Evolutionary Divergence

Regulatory Sequences

Significant Excess

Stem Loop

Protein Coding

RNA Sequence

Protein Coding Genes

Human Specific

Abstract: Findings in yeast and human suggest that evolutionary divergence in cis-regulatory sequences impacts translation initiation sites (TISs). Here we employed the TIS homology concept to study a possible link between all categories of tandem repeats (TRs) and TIS selection. Human and 83 other species were selected. Data were extracted on all protein-coding genes (n = 1,611,368) and transcripts (n = 2,730,515) annotated for those species in Ensembl 102. On average, every transcript was flanked by 1.19 TRs of various categories in its 120 bp upstream RNA sequence. We detected a statistically significant excess of non-homologous TISs co-occurring with human-specific TRs, and vice versa. We conclude that TRs are abundant cis elements in the upstream sequences of TISs across species, and that there is a link between all categories of TRs and TIS selection. TR-induced symmetric and stem-loop structures may function as genetic marks for TIS selection.
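The per-transcript count of TRs in the 120 bp upstream window can be illustrated with a minimal sketch. The abstract does not name the TR caller or its thresholds. This sketch therefore assumes perfect repeats only, unit lengths of 1 to 6 nt, and at least three copies, whereas the study covered "all categories" of TRs. The helper names `upstream_window` and `find_strs` are hypothetical.

```python
from typing import List, Tuple

def upstream_window(transcript: str, tis: int, width: int = 120) -> str:
    """Return up to `width` nt immediately 5' of the TIS (0-based index of the ATG)."""
    return transcript[max(0, tis - width):tis]

def find_strs(seq: str, min_unit: int = 1, max_unit: int = 6,
              min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Scan left to right for perfect tandem repeats; return (start, unit, copies).
    Hits do not overlap, and the shortest qualifying unit wins at each position."""
    hits = []
    i = 0
    while i < len(seq):
        found = False
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:  # not enough sequence left for this unit length
                break
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * k  # skip past the repeat just reported
                found = True
                break
        if not found:
            i += 1
    return hits

transcript = "TTT" + "CAG" * 4 + "GT" + "ATGGCC"  # toy sequence, ATG at index 17
print(find_strs(upstream_window(transcript, 17)))
```

Averaging `len(find_strs(...))` over all transcripts would give a per-transcript mean comparable in kind to the reported 1.19. The absolute value depends entirely on the assumed thresholds.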


Tandem repeats ubiquitously flank and select translation initiation sites

10.21203/rs.3.rs-832312/v2

2021

Author(s):

Ali M.A. Maddi

Kaveh Kavousi

Masoud Arabfard

Hamid Ohadi

Mina Ohadi

Keyword(s):

Translation Initiation

Tandem Repeats

Evolutionary Divergence

Regulatory Sequences

Upstream Sequence

Significant Excess

Protein Coding

Protein Coding Genes

In Cis

Human Specific

Abstract: Evolutionary divergence in cis-regulatory sequences impacts translation initiation sites (TISs). The role of tandem repeats (TRs) in TIS selection remains largely elusive. Here we employed the TIS homology concept to study a possible link between all categories of TRs and TIS selection. Human and 83 other species were selected. Data were extracted on all protein-coding genes (n = 1,611,368) and transcripts (n = 2,730,515) annotated for those species in Ensembl 102. Two different weighting vectors were employed to assign TIS homology, and the results were assessed with 10-fold validation. On average, every TIS was flanked by 1.19 TRs of various categories within the 120 bp upstream sequence. We detected a statistically significant excess of non-homologous TISs co-occurring with human-specific TRs, and vice versa. We conclude that TRs are abundant cis elements in the upstream sequences of TISs across species, and that there is a link between all categories of TRs and TIS selection.


Long-read cDNA sequencing identifies functional pseudogenes in the human transcriptome

Genome Biology

10.1186/s13059-021-02369-0

2021

Vol 22 (1)

Author(s):

Robin-Lee Troskie

Yohaann Jafrani

Tim R. Mercer

Adam D. Ewing

Geoffrey J. Faulkner

...

Keyword(s):

Cultured Cells

Open Reading Frames

cDNA Sequencing

Protein Coding

Dynamic Component

Gene Copies

Long Read

Normal Human

Reading Frames

Transcriptional Landscape

Abstract: Pseudogenes are gene copies presumed to be mainly functionless relics of evolution due to acquired deleterious mutations or transcriptional silencing. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, we identify hundreds of novel transcribed pseudogenes expressed in tissue-specific patterns. Some pseudogene transcripts have intact open reading frames and are translated in cultured cells, representing unannotated protein-coding genes. To assess the biological impact of noncoding pseudogenes, we use CRISPR-Cas9 to delete the nucleus-enriched pseudogene PDCL3P4 and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the human transcriptional landscape.


Molecular Evolution at the decapentaplegic Locus in Drosophila

Genetics

10.1093/genetics/145.2.297

1997

Vol 145 (2)

pp. 297-309

Cited By ~ 3

Author(s):

Stuart J Newfeld

Richard W Padgett

Seth D Findley

Brent G Richter

Michele Sanicola

...

Keyword(s):

Molecular Evolution

Transforming Growth Factor

Transforming Growth Factor β

Selective Constraint

Amino Acid Sequences

Regulatory Sequences

Protein Coding

Terminal Ligand

cDNA Sequences

Interspecific Comparisons

Using an elaborate set of cis-regulatory sequences, the decapentaplegic (dpp) gene displays a dynamic pattern of gene expression during development. The C-terminal portion of the DPP protein is processed to generate a secreted signaling molecule belonging to the transforming growth factor-β (TGF-β) family. This signal, the DPP ligand, is able to influence the developmental fates of responsive cells in a concentration-dependent fashion. Here we examine the sequence-level organization of a significant portion of the dpp locus in Drosophila melanogaster. We use interspecific comparisons with D. simulans, D. pseudoobscura and D. virilis to explore the molecular evolution of the gene. Our interspecific analysis identified significant selective constraint on both the nucleotide and amino acid sequences. As expected, interspecific comparison of protein-coding sequences shows that the C-terminal ligand region is highly conserved. However, the central portion of the protein is also conserved, while the N-terminal third is quite variable. Comparison of noncoding regions reveals significant stretches of nucleotide identity in the 3′ untranslated portion of exon 3 and in the intron between exons 2 and 3. An examination of cDNA sequences representing five classes of dpp transcripts indicates that these transcripts encode the same polypeptide.


Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs

Nature Communications

10.1038/s41467-021-24378-0

2021

Vol 12 (1)

Author(s):

Tsung-Yu Lu

Katherine M. Munson

Alexandra P. Lewis

Qihui Zhu

Luke J. Tallon

...

Keyword(s):

Tandem Repeats

Traditional Approach

Variable Number Tandem Repeat

Variable Number

Population Diversity

Protein Coding

Short Reads

Repeat Structure

Continental Population

Develop Software

Abstract: Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. Some include protein-coding sequences, and some are associated with clinical disorders. It has been difficult to incorporate VNTR analysis into disease studies that use short-read sequencing, because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG). An RPGG is a data structure that encodes both the population diversity and the repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build an RPGG and use the RPGG to estimate VNTR composition from short reads. With this approach we discover VNTRs whose length is stratified by continental population, as well as VNTRs that act as expression quantitative trait loci. These results indicate that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.


Disrupting upstream translation in mRNAs is associated with human disease

Nature Communications

10.1038/s41467-021-21812-1

2021

Vol 12 (1)

Author(s):

David S. M. Lee

Joseph Park

Andrew Kromer

Aris Baras

Daniel J. Rader

...

Keyword(s):

Protein Expression

Biological Significance

Ribosome Profiling

Open Reading Frames

Protein Coding

Stop Codons

Human Genes

Strong Negative Selection

Disease Associations

Reading Frames

Abstract: Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5′ UTRs. We show that uORF variants that introduce new stop codons, or strengthen existing stop codons, are under strong negative selection comparable to that acting on protein-coding missense variants. Using these variants, we map and validate gene–disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms that relate uORF variation to reduced protein expression. They also demonstrate that translation at uORFs is genetically constrained in 50% of human genes.
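The notion of a variant that "introduces a new stop codon" in a uORF can be made concrete with a minimal sketch. The sketch assumes that the main CDS start is known and considers only canonical ATG starts in the 5′ UTR. The abstract does not describe the study's uORF annotation pipeline, and the names `find_uorfs`, `STOPS` and `cds_start` are hypothetical.

```python
from typing import List, Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}

def find_uorfs(mrna: str, cds_start: int) -> List[Tuple[int, Optional[int]]]:
    """Return (start, end) for every ATG-initiated ORF that starts in the 5' UTR,
    i.e. before `cds_start`. `end` is the index just past the first in-frame
    stop codon, or None if the frame runs off the end of the transcript."""
    seq = mrna.upper()
    orfs = []
    for start in range(cds_start):
        if seq[start:start + 3] != "ATG":
            continue
        end = None
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOPS:
                end = pos + 3
                break
        orfs.append((start, end))
    return orfs

mrna = "GGATGAAATAGCCATGCCCTAA"         # toy transcript; main CDS ATG at index 13
mutant = mrna[:5] + "T" + mrna[6:]      # hypothetical SNV A>T at index 5
print(find_uorfs(mrna, 13), find_uorfs(mutant, 13))
```

In this toy case the uORF starting at index 2 ends after a TAG at index 8 in the reference. The SNV creates an earlier in-frame TAA at index 5 and shortens the uORF, which is the kind of stop-introducing variant the abstract analyzes.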


Meiotic expression of human ornithine transcarbamylase in the testes of transgenic mice

Molecular and Cellular Biology

10.1128/mcb.8.4.1821-1825.1988

1988

Vol 8 (4)

pp. 1821-1825

Author(s):

K A Kelley

J W Chamberlain

J A Nolan

A L Horwich

F Kalousek

...

Keyword(s):

Transgenic Mice

Fusion Gene

Regulatory Region

Transgenic Animals

Ornithine Transcarbamylase

Regulatory Sequences

Base Pairs

Protein Coding

Wide Range

Flanking Sequences

In an attempt to use mouse metallothionein-I (mMT-I) regulatory sequences to direct expression of human ornithine transcarbamylase in the liver of transgenic animals, fusion genes joining either 1.6 kilobases or 185 base pairs of the mMT-I regulatory region to the human ornithine transcarbamylase protein-coding sequence were used to produce transgenic mice. In mice carrying the fusion gene with 1.6 kilobases of the mMT-I 5'-flanking sequences, transgene expression was observed in a wide range of tissues, but, unexpectedly, expression in liver was never observed. Surprisingly, in mice carrying the fusion gene regulated by only 185 base pairs of the mMT-I 5'-flanking sequences, the transgene was expressed exclusively in male germ cells during the tetraploid, pachytene stage of meiosis.


Link Between Short Tandem Repeats and Translation Initiation Site Selection

10.1101/316950

2018

Author(s):

M Arabfard

K Kavousi

A Delbari

M Ohadi

Keyword(s):

Amino Acids

Translation Initiation

Short Tandem Repeats

Tandem Repeats

Initiation Site

Translation Initiation Site

Vertebrate Species

Protein Coding

Human Specific

Short Tandem

Abstract: Recent work in yeast and humans suggests that evolutionary divergence in cis-regulatory sequences impacts translation initiation sites (TISs). Cis-elements can also affect the efficacy and amount of protein synthesis. Despite their broad biological implications, the landscape and relevance of short tandem repeats (STRs)/microsatellites with respect to human protein-coding gene TISs remain largely unknown. Here we characterized the STR distribution in the 120 bp cDNA sequence upstream of all annotated human protein-coding gene TISs, based on the Ensembl database. We also performed a comparative genomics study of all annotated orthologous TIS-flanking sequences across 47 vertebrate species (755,956 transcripts) to identify human-specific STRs in this interval. We further hypothesized that STRs may act as genetic codes for the initiation of translation. For each human TIS flanked by STRs, the first five amino acids (excluding the initial methionine) were BLASTed against the first five orthologous amino acids in other vertebrate species (2,025,817 pairwise TIS comparisons). We then compared the number of events in which human-specific and non-specific STRs occurred with homologous and non-homologous TISs (i.e., ≥50% and <50% similarity of the five amino acids). We characterized human-specific STRs and found a bias of this compartment in comparison to the overall (human-specific and non-specific) distribution of STRs (Mann–Whitney p = 1.4 × 10⁻¹¹). We also found significant enrichment of non-homologous TISs flanked by human-specific STRs (p < 0.00001). In conclusion, our data indicate a link between STRs and TIS selection, which is supported by differential evolution of the human-specific STRs in the TIS upstream flanking sequence.

Abbreviations: cDNA, complementary DNA; CDS, coding DNA sequence; STR, short tandem repeat; TIS, translation initiation site; TSS, transcription start site.
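The homology criterion in this abstract compares the first five amino acids after the initiator methionine and calls a TIS homologous at ≥50% similarity. A minimal sketch of that classification follows. The study used BLAST; as a simplifying assumption, this sketch substitutes plain position-by-position identity. Its scores will therefore not match BLAST similarity, which also credits conservative substitutions. The function names are hypothetical.

```python
def first_residues(protein: str, n: int = 5) -> str:
    """Return the n residues that follow the initial methionine."""
    if not protein.startswith("M"):
        raise ValueError("expected the protein sequence to start with M")
    return protein[1:1 + n]

def identity(a: str, b: str) -> float:
    """Fraction of aligned positions with identical residues (ungapped),
    normalized by the longer of the two strings."""
    length = max(len(a), len(b))
    if length == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / length

def tis_homologous(human: str, ortholog: str, threshold: float = 0.5) -> bool:
    """Assumed stand-in for the >=50% similarity rule on the first five residues."""
    return identity(first_residues(human), first_residues(ortholog)) >= threshold

print(tis_homologous("MAEGKLS", "MAEPRLS"))  # AEGKL vs AEPRL: 3 of 5 identical
print(tis_homologous("MAEGKLS", "MSTPRLS"))  # AEGKL vs STPRL: 1 of 5 identical
```

Each pairwise comparison yields one homologous or non-homologous call. Cross-tabulating these calls against whether the flanking STR is human-specific gives the counts that an enrichment test would use. The abstract does not say which test produced p < 0.00001.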


Genome Size Estimation and Full-Length Transcriptome of Sphingonotus tsinlingensis: Genetic Background of a Drought-Adapted Grasshopper

Frontiers in Genetics

10.3389/fgene.2021.678625

2021

Vol 12

Author(s):

Lu Zhao

Hang Wang

Ping Li

Kuo Sun

De-Long Guan

...

Keyword(s):

Genome Size

Genetic Background

Tandem Repeats

Full Length

Arid Environments

Size Estimation

Sequencing Data

Protein Coding

Grasshopper Species

HSP Genes

Sphingonotus Fieber, 1852 (Orthoptera: Acrididae), is a grasshopper genus comprising approximately 170 species, all of which prefer dry environments such as deserts, steppes, and stony benchlands. In this study, we aimed to examine the adaptation of grasshopper species to arid environments. The genome size of Sphingonotus tsinlingensis was estimated using flow cytometry, and the first high-quality full-length transcriptome of this species was produced. The genome size of S. tsinlingensis is approximately 12.8 Gb. Based on 146.98 Gb of PacBio sequencing data, 221.47 Mb of full-length transcripts were assembled. Among these, 88,693 non-redundant isoforms were identified, with an N50 of 2,726 bp. This is markedly longer than in previous grasshopper transcriptome assemblies. In total, 48,502 protein-coding sequences were identified, and 37,569 were annotated using public gene function databases. Moreover, 36,488 simple tandem repeats, 12,765 long non-coding RNAs, and 414 transcription factors were identified. Based on gene function, 61 cytochrome P450 (CYP450) genes and 66 heat shock protein (HSP) genes were identified that may be associated with drought adaptation in S. tsinlingensis. We compared the transcriptomes of S. tsinlingensis and two less drought-tolerant grasshopper species, Mongolotettix japonicus and Gomphocerus licenti. CYP450 and HSP genes were more highly expressed in S. tsinlingensis. We produced the first full-length transcriptome of a Sphingonotus species, which has an ultra-large genome. The assembly characteristics were better than those of all known grasshopper transcriptomes. This full-length transcriptome may thus be used to understand the genetic background and evolution of grasshoppers.
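The isoform N50 of 2,726 bp follows the standard N50 definition. N50 is the length L such that sequences of length ≥ L together account for at least half of the total assembled length. A minimal sketch of that computation is below. The abstract does not provide the isoform length distribution, so the inputs here are toy values.

```python
from typing import Iterable

def n50(lengths: Iterable[int]) -> int:
    """Standard N50: walk lengths from longest to shortest and return the
    length at which the running total first reaches half of the total."""
    values = sorted(lengths, reverse=True)
    if not values:
        raise ValueError("n50 of an empty collection is undefined")
    total = sum(values)
    running = 0
    for length in values:
        running += length
        if 2 * running >= total:  # integer-safe check for running >= total / 2
            return length
    return values[-1]  # unreachable for positive lengths; kept for safety

print(n50([2, 3, 4, 5, 6]))   # total 20; 6 + 5 = 11 >= 10, so N50 = 5
print(n50([1, 1, 1, 1, 10]))  # one long sequence dominates, so N50 = 10
```

Because N50 is weighted by length, a single long isoform can dominate it, as the second example shows. N50 is therefore usually reported alongside counts such as the 88,693 non-redundant isoforms.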


Overexpression-based detection of translatable circular RNAs is vulnerable to coexistent linear RNA byproducts

10.1101/2021.03.23.433163

2021

Author(s):

Yanyi Jiang

Xiaofan Chen

Wei Zhang

Keyword(s):

Open Reading Frames

Systematic Evaluation

Circular RNAs

Protein Coding

Rolling Circle

Functional Investigation

Overexpression System

Translation Signals

Coding Potential

Reading Frames

Abstract: In the RNA field, the recent discovery of occasionally translated circular RNAs (circRNAs) has blurred the demarcation between coding and non-coding RNA. Although they lack a 5′ cap structure, circRNAs can be translated cap-independently. Complementary intron-mediated overexpression is one of the most widely used methods in circRNA research. It has nonetheless attracted skepticism because of its poorly defined mechanism and latent coexisting side products. In this study, we leveraged such a circRNA overexpression system to interrogate the protein-coding potential of 30 human circRNAs containing infinite open reading frames in HEK293T cells. Surprisingly, immunoblotting detected pervasive translation signals. However, intensive mutagenesis revealed that numerous translation signals are generated independently of circRNA synthesis. We developed a dual-tag strategy to isolate translation noise and directly demonstrate that the fallacious translation signals originate from cryptically spliced linear transcripts. The concomitant linear RNA byproducts, presumably concatemers, can be translated to produce pseudo rolling-circle translation signals. They can also include the backsplicing junction (BSJ), which disqualifies BSJ-based evidence for circRNA translation. We also find that non-AUG start codons may engage in the translation initiation of circRNAs. Taken together, our systematic evaluation sheds light on the heterogeneous translational outputs of circRNA overexpression vectors. It comes with the caveat that ectopic overexpression requires extremely rigorous controls in studies of circRNA translation and function.
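An "infinite open reading frame" can be illustrated with a minimal sketch. The sketch assumes an iORF is an ATG-initiated frame in which a ribosome circling the circRNA never meets an in-frame stop codon. After 3 × n nucleotides on a circle of length n, the ribosome returns to its start position in the original frame, so scanning n codons is enough to decide. Only ATG starts are checked, although the abstract reports that non-AUG starts may also initiate translation. Note that such a sequence check cannot distinguish genuine circRNA translation from the linear-byproduct artifacts the abstract warns about. The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}

def codon_at(circ: str, pos: int) -> str:
    """Read three nucleotides starting at `pos`, wrapping around the circle."""
    n = len(circ)
    return "".join(circ[(pos + i) % n] for i in range(3))

def has_infinite_orf(circ: str) -> bool:
    """True if some ATG on the circular sequence starts a frame with no stop codon.
    Scanning n codons (3 * n nt) covers every frame the ribosome can visit."""
    seq = circ.upper()
    n = len(seq)
    for start in range(n):
        if codon_at(seq, start) != "ATG":
            continue
        if all(codon_at(seq, start + 3 * k) not in STOPS for k in range(n)):
            return True
    return False

print(has_infinite_orf("ATGGCC"))  # length divisible by 3; ATG GCC repeats forever
print(has_infinite_orf("ATGTAA"))  # TAA follows the ATG, so translation terminates
print(has_infinite_orf("ATGGC"))   # frame shifts each lap, but no stop in any frame
```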
