The sequence of a large L1Md element reveals a tandemly repeated 5' end and several features found in retrotransposons.

1986 ◽  
Vol 6 (1) ◽  
pp. 168-182 ◽  
Author(s):  
D D Loeb ◽  
R W Padgett ◽  
S C Hardies ◽  
W R Shehee ◽  
M B Comer ◽  
...  

The complete nucleotide sequence of a 6,851-base pair (bp) member of the L1Md repetitive family from a selected random isolate of the BALB/c mouse genome is reported here. Five kilobases of the element contains two overlapping reading frames of 1,137 and 3,900 bp. The entire 3,900-bp frame and the 3' 600 bp of the 1,137-bp frame, when compared with a composite consensus primate L1 sequence, show a ratio of replacement to silent site differences characteristic of protein coding sequences. This more closely defines the protein coding capacity of this repetitive family, which was previously shown to possess a large open reading frame of undetermined extent. The relative organization of the 1,137- and 3,900-bp reading frames, which overlap by 14 bp, bears resemblance to protein-coding, mobile genetic elements. Homology can be found between the amino acid sequence of the 3,900-bp frame and selected domains of several reverse transcriptases. The 5' ends of the two L1Md elements described in this report have multiple copies, 4 2/3 copies and 1 2/3 copy, of a 208-bp direct tandem repeat. The sequence of this 208-bp element differs from the sequence of a previously defined 5' end for an L1Md element, indicating that there are at least two different 5' end motifs for L1Md.
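The reading-frame arithmetic in this abstract can be checked with a short sketch. The coordinates below are hypothetical: the 1,137-bp frame is arbitrarily placed at position 0, and the 3,900-bp frame is positioned so that the two overlap by the reported 14 bp. The names `Frame`, `overlap_bp`, and `frame_offset` are illustrative and do not come from the paper.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

    @property
    def length(self) -> int:
        return self.end - self.start


def overlap_bp(a: Frame, b: Frame) -> int:
    """Number of nucleotides shared by two frames on the same strand."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def frame_offset(a: Frame, b: Frame) -> int:
    """Reading-frame shift of b relative to a (0, 1, or 2)."""
    return (b.start - a.start) % 3


# Hypothetical placement: first frame at 0, second frame starts 14 bp before its end.
orf_a = Frame(0, 1137)
orf_b = Frame(1137 - 14, 1137 - 14 + 3900)

print(overlap_bp(orf_a, orf_b))    # shared nucleotides
print(frame_offset(orf_a, orf_b))  # relative reading frame
print(orf_b.end - orf_a.start)     # combined span in bp
```

Under these assumptions the combined span is 1,137 + 3,900 - 14 = 5,023 bp, consistent with the "five kilobases" stated above. Because both frame lengths are multiples of 3 and 14 is not, the two frames cannot share the same reading phase.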


2021 ◽  
Author(s):  
Laura Munoz-Baena ◽  
Art Poon

Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated reading frames in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. However, the longest overlaps involve no shift in reading frame (+0), increasing the selective burden of the same nucleotide positions within codons, instead of exposing additional sites to purifying selection. Next, we develop a new graph-based representation of the distribution of OvRFs among the reading frames of genomes in a given virus family. In the absence of an unambiguous partition of reading frames by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent reading frames are adjacent in one or more genomes, and (2) that the reading frames overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
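As a rough illustration of the adjacency-graph idea described above, the following sketch counts two kinds of directed edges between clusters of reading frames. It is not the authors' implementation: the input format, the function name `build_adjacency_graph`, and the simplification that only consecutive reading frames are compared are all assumptions made for this example, and cluster labels are taken as given rather than derived from k-mer similarity.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical record: (cluster label, start, end), 0-based with end exclusive.
ReadingFrame = Tuple[str, int, int]


def build_adjacency_graph(
    genomes: Iterable[List[ReadingFrame]],
) -> Tuple[Counter, Counter]:
    """Count directed cluster-to-cluster edges across genomes.

    Returns two counters keyed by (upstream cluster, downstream cluster):
    one for adjacency and one for overlap. Only consecutive reading
    frames (ordered by start) are compared, which is a simplification.
    """
    adjacent: Counter = Counter()
    overlapping: Counter = Counter()
    for frames in genomes:
        ordered = sorted(frames, key=lambda f: f[1])
        for (c1, _s1, e1), (c2, s2, _e2) in zip(ordered, ordered[1:]):
            adjacent[(c1, c2)] += 1
            if s2 < e1:  # downstream frame starts before upstream frame ends
                overlapping[(c1, c2)] += 1
    return adjacent, overlapping


# Toy input: two genomes whose frames fall into clusters A, B, and C.
toy_genomes = [
    [("A", 0, 100), ("B", 90, 200), ("C", 250, 300)],
    [("A", 0, 120), ("B", 130, 220)],
]
adjacent, overlapping = build_adjacency_graph(toy_genomes)
print(dict(adjacent))
print(dict(overlapping))
```

Keeping the two edge types in separate counters makes it straightforward to compute, for each pair of clusters, the fraction of adjacent occurrences that also overlap.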


1999 ◽  
Vol 10 (04) ◽  
pp. 635-643 ◽  
Author(s):  
AGNIESZKA GIERLIK ◽  
PAWEŁ MACKIEWICZ ◽  
MARIA KOWALCZUK ◽  
STANISŁAW CEBRAT ◽  
MIROSŁAW R. DUDEK

Coding sequences of DNA generate Open Reading Frames (ORFs) inside them with much higher frequency than random DNA sequences do, especially in the antisense strand. This is a specific feature of the genetic code. Since coding sequences are selected for their length, the generated ORFs are indirect results of this selection and their length is also influenced by selection. That is why ORFs found in any genome, even those much longer than ORFs spontaneously generated in random DNA sequences, should be considered as two different sets of ORFs: the first coding for proteins, the second generated by the coding ORFs. Even intergenic sequences possess greater capacity for generating ORFs than random DNA sequences of the same nucleotide composition, which suggests that intergenic sequences were generated from coding sequences by recombinational mechanisms.
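A comparison of this kind could be sketched as follows. The example assumes that an ORF runs from an ATG to the next in-frame stop codon (the abstract does not state the definition used), scans both strands, and builds a composition-matched control by shuffling the sequence. All function names are illustrative.

```python
import random
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_lengths(seq: str) -> List[int]:
    """Lengths (in codons, stop excluded) of ATG-to-stop ORFs in three frames."""
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                lengths.append((i - start) // 3)
                start = None
    return lengths


def orf_lengths_both_strands(seq: str) -> List[int]:
    """ORF lengths over all six reading frames."""
    return orf_lengths(seq) + orf_lengths(reverse_complement(seq))


def shuffled(seq: str, rng: random.Random) -> str:
    """Random sequence with the same nucleotide composition as seq."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


example = "ATGAAATAA"
print(orf_lengths_both_strands(example))
control = shuffled(example, random.Random(0))
print(orf_lengths_both_strands(control))
```

Running `orf_lengths_both_strands` on a real coding sequence and on many shuffled controls would give the two length distributions that the abstract contrasts.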


2017 ◽  
Author(s):  
Sondos Samandi ◽  
Annie V. Roy ◽  
Vivian Delcourt ◽  
Jean-François Lucier ◽  
Jules Gagnon ◽  
...  

Recent studies in eukaryotes have demonstrated the translation of alternative open reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by altORFs. The putative alternative proteins translated from altORFs have orthologs in many species, and evolutionary patterns indicate that altORFs are particularly constrained in CDSs that evolve slowly. Thousands of predicted alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. Protein domains and co-conservation analyses suggest a potential functional relationship between small and large proteins encoded in the same genes. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many coding genes code for more than one protein, and that these proteins are often functionally related.


2018 ◽  
Author(s):  
Anica Scholz ◽  
Florian Eggenhofer ◽  
Rick Gelhausen ◽  
Björn Grüning ◽  
Kathi Zarnack ◽  
...  

Ribosome profiling (ribo-seq) provides a means to analyze active translation by determining ribosome occupancy in a transcriptome-wide manner. The vast majority of ribosome protected fragments (RPFs) reside within the protein-coding sequence of mRNAs. However, reads are also commonly found within the transcript leader sequence (TLS) (also known as the 5' untranslated region) preceding the main open reading frame (ORF), indicating the translation of regulatory upstream ORFs (uORFs). Here, we present a workflow for the identification of translation-regulatory uORFs. Specifically, uORF-Tools identifies uORFs within a given dataset and generates a uORF annotation file. In addition, a comprehensive human uORF annotation file, based on 35 ribo-seq files, is provided, which can serve as an alternative input file for the workflow. To assess the translation-regulatory activity of the uORFs, stimulus-induced changes in the ratio of the RPFs residing in the main ORFs relative to those found in the associated uORFs are determined. The resulting output file allows for the easy identification of candidate uORFs that have translation-inhibitory effects on their associated main ORFs. uORF-Tools is available as a free and open Snakemake workflow at https://github.com/Biochemistry1-FFM/uORF-Tools. It is easily installed, and all necessary tools are provided in a version-controlled manner, which also ensures lasting usability. uORF-Tools is designed for intuitive use and requires only limited computing time and resources.
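The ratio comparison described above can be illustrated with a minimal sketch. This is not the uORF-Tools implementation: the function names, the use of raw read counts, the pseudocount, and the log2 scale are all assumptions chosen for the example.

```python
import math
from typing import Tuple

# Hypothetical input: (RPF reads in main ORF, RPF reads in associated uORF).
Counts = Tuple[float, float]


def orf_ratio(main_reads: float, uorf_reads: float, pseudocount: float = 1.0) -> float:
    """Ratio of main-ORF reads to uORF reads, with a pseudocount to avoid division by zero."""
    return (main_reads + pseudocount) / (uorf_reads + pseudocount)


def log2_ratio_change(control: Counts, treated: Counts, pseudocount: float = 1.0) -> float:
    """log2 change in the main-ORF/uORF ratio between treated and control samples."""
    treated_ratio = orf_ratio(treated[0], treated[1], pseudocount)
    control_ratio = orf_ratio(control[0], control[1], pseudocount)
    return math.log2(treated_ratio / control_ratio)


# Toy counts: main-ORF occupancy doubles relative to the uORF after the stimulus.
print(log2_ratio_change(control=(99, 9), treated=(199, 9)))
```

In this convention, a positive value means that main-ORF occupancy rose relative to the uORF after the stimulus, and a negative value means it fell. A real analysis would also need normalization for library size and replicate handling, which this sketch omits.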


Genes ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 982
Author(s):  
Maksim Makarenko ◽  
Alexander Usatov ◽  
Tatiana Tatarinova ◽  
Kirill Azarin ◽  
Alexey Kovalevich ◽  
...  

The genus Helianthus is a diverse taxonomic group with approximately 50 species. Most sunflower genomic investigations are devoted to economically valuable species, e.g., H. annuus, while other Helianthus species, especially perennial ones, remain largely unstudied. In the current study, we have assembled the complete mitogenomes of two perennial species: H. grosseserratus (273,543 bp) and H. strumosus (281,055 bp). We analyzed their sequences and gene profiles in comparison to the available complete mitogenomes of H. annuus. Except for sdh4 and trnA-UGC, both perennial sunflower species had the same gene content and almost identical protein-coding sequences when compared with each other and with annual sunflowers (H. annuus). Common mitochondrial open reading frames (ORFs) (orf117, orf139, and orf334) in sunflowers and unique ORFs for H. grosseserratus (orf633) and H. strumosus (orf126, orf184, orf207) were identified. The maintenance of plastid-derived coding sequences in the mitogenomes of both annual and perennial sunflowers and the low frequency of nonsynonymous mutations point to an extremely low variability of mitochondrial DNA (mtDNA) coding sequences in the Helianthus genus.


1987 ◽  
Vol 7 (7) ◽  
pp. 2435-2443
Author(s):  
I L Andrulis ◽  
J Chen ◽  
P N Ray

Asparagine synthetase cDNAs containing the complete coding region were isolated from a human fibroblast cDNA library. DNA sequence analysis of the clones showed that the message contained one open reading frame encoding a protein of 64,400 Mr, 184 nucleotides of 5' untranslated region, and 120 nucleotides of 3' noncoding sequence. Plasmids containing the asparagine synthetase cDNAs were used in DNA-mediated transfer of genes into asparagine-requiring Jensen rat sarcoma cells. The cDNAs containing the entire protein-coding sequence expressed asparagine synthetase activity and were capable of conferring asparagine prototrophy on the Jensen rat sarcoma cells. However, cDNAs which lacked sequence for as few as 20 amino acids at the amino terminal could not rescue the cells from auxotrophy. The transferant cell lines contained multiple copies of the human asparagine synthetase cDNAs and produced human asparagine synthetase mRNA and asparagine synthetase protein. Several transferants with numerous copies of the cDNAs exhibited only basal levels of enzyme activity. Treatment of these transferant cell lines with 5-azacytidine greatly increased the expression of asparagine synthetase mRNA, protein, and activity.



