A Relationship Between GC Content and Coding-Sequence Length

José L. Oliver; Antonio Marín

doi:10.1007/pl00006080

A relationship between GC content and coding-sequence length

Journal of Molecular Evolution ◽

10.1007/bf02338829 ◽

1996 ◽

Vol 43 (3) ◽

pp. 216-223 ◽

Cited By ~ 72

Author(s):

José L. Oliver ◽

Antonio Marín

Keyword(s):

Gc Content ◽

Sequence Length ◽

Coding Sequence

First Complete Genome Sequence of Brucella abortus 2308 isolated from an abortion storm in a dairy farm in India

10.21203/rs.3.rs-420448/v1 ◽

2021 ◽

Author(s):

Amit Kumar ◽

Malyaj R Prajapati ◽

Surendra Upadhyay ◽

Anamika Bhordia ◽

Vinod Kumar Singh ◽

...

Keyword(s):

Genome Sequence ◽

Dna Sequences ◽

Brucella Abortus ◽

Complete Genome Sequence ◽

Complete Genome ◽

Messenger Rna ◽

Gc Content ◽

Dairy Farm ◽

Rrna Genes ◽

Sequence Length

Abstract The present report communicates the first complete genome sequence of the Brucella abortus 2308 strain isolated from an abortion storm in a dairy farm located at Kanpur, Uttar Pradesh, India. The strain caused last-trimester abortions in 32 of 100 cows in the dairy over a period of 60 days. The bacteria were isolated in pure culture from the placenta of aborted cows. The genome sequence of the isolated bacteria is 3,285,606 bp long, with a GC content of 57.25%, an N50 value of 296,426, and an L50 value of 4; it contains 3,119 coding DNA sequences (CDSs), 49 tRNAs, 1 transfer-messenger RNA (tmRNA), and 3 rRNA genes. It is the first report of Brucella abortus 2308 isolation and complete genome sequence from the Indian subcontinent.
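The assembly statistics quoted in this abstract (GC content, N50, L50) follow standard definitions, which the following minimal sketch illustrates. It is not the authors' pipeline: the function names `gc_content` and `n50_l50` and the toy inputs are assumptions made here for illustration only.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


def n50_l50(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total, taken over
    contigs sorted from longest to shortest, first reaches half of the
    assembly size. L50 is the number of contigs needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contig lengths given")


# Hypothetical toy data, not taken from the reported assembly.
toy_contigs = [50, 40, 30, 20, 10]
print(gc_content("ATGC"))    # 50.0
print(n50_l50(toy_contigs))  # (40, 2)
```

Under these definitions, an L50 of 4 would mean that the four longest contigs together cover at least half of the assembled length.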

Genetic removal of p70 S6K1 corrects coding sequence length-dependent alterations in mRNA translation in fragile X syndrome mice

10.1101/2020.04.26.062281 ◽

2020 ◽

Author(s):

Sameer Aryal ◽

Francesco Longo ◽

Eric Klann

Keyword(s):

Protein Synthesis ◽

Fragile X Syndrome ◽

Fragile X ◽

Mrna Translation ◽

Ribosome Profiling ◽

Sequence Length ◽

Rna Seq ◽

Double Knockout ◽

Coding Sequence ◽

Mental Retardation Protein

Abstract Loss of the fragile X mental retardation protein (FMRP) causes fragile X syndrome (FXS). FMRP is widely thought to repress protein synthesis, but its translational targets and modes of control remain in dispute. We previously showed that genetic removal of p70 S6 kinase 1 (S6K1) corrects altered protein synthesis as well as synaptic and behavioral phenotypes in FXS mice. In this study, we examined the gene-specificity of altered mRNA translation in FXS and the mechanism of rescue with genetic reduction of S6K1 by carrying out ribosome profiling and RNA-Seq on cortical lysates from wild-type, FXS, S6K1 knockout, and double knockout mice. We observed reduced ribosome footprint abundance in the majority of differentially translated genes in the cortices of FXS mice. We used molecular assays to discover evidence that the reduction in ribosome footprint abundance reflects an increased rate of ribosome translocation, which is captured as a decrease in the number of translating ribosomes at steady state, and is normalized by inhibition of S6K1. We also found that genetic removal of S6K1 prevented a positive-to-negative gradation of alterations in translation efficiencies (RF/mRNA) with coding sequence length across mRNAs in FXS mouse cortices. Our findings reveal the identities of dysregulated mRNAs and a molecular mechanism by which reduction of S6K1 prevents altered translation in FXS.

Marked intra-genomic variation and pseudogenes in the ITS1-5.8S-ITS2 rDNA of Symphurus plagiusa (Pleuronectiformes: Cynoglossidae)

Animal Biology ◽

10.1163/15707563-17000134 ◽

2018 ◽

Vol 68 (4) ◽

pp. 353-365 ◽

Cited By ~ 1

Author(s):

Li Gong ◽

Wei Shi ◽

Min Yang ◽

Xiaoyu Kong

Keyword(s):

Gc Content ◽

Rdna Sequence ◽

Genomic Variation ◽

Internal Transcribed Spacers ◽

Sequence Length ◽

Its2 Rdna ◽

Rdna Cluster ◽

Specific Level ◽

Its2 Rdna Sequence ◽

Symphurus Plagiusa

Abstract The eukaryotic ribosomal DNA (rDNA) cluster consists of multiple copies of three genes (18S, 5.8S, and 28S rDNA) and two internal transcribed spacers (ITS1 and ITS2). In recent years, an increasing number of rDNA sequence polymorphisms have been identified in numerous species. In the present study, we provide 33 complete ITS (ITS1-5.8S-ITS2) sequences from two Symphurus plagiusa individuals. To the best of our knowledge, these sequences are the first detailed information on ITS sequences in Pleuronectiformes. Here, two divergent types (Type A and B) of the ITS1-5.8S-ITS2 rDNA sequence were found, which mainly differ in sequence length, GC content, nucleotide diversity (π), secondary structure and minimum free energy. The ITS1-5.8S-ITS2 rDNA sequence of Type B was speculated to be a putative pseudogene according to pseudogene identification criteria. Cluster analysis showed that sequences from the same type clustered into one group and two major groups were formed. The high degree of ITS1-5.8S-ITS2 sequence polymorphism at the intra-specific level indicated that the S. plagiusa genome has evolved in a non-concerted evolutionary manner. These results not only provide useful data for ribosomal pseudogene identification, but also further contribute to the study of rDNA evolution in teleostean genomes.

A novel framework for evaluating the performance of codon usage bias metrics

Journal of The Royal Society Interface ◽

10.1098/rsif.2017.0667 ◽

2018 ◽

Vol 15 (138) ◽

pp. 20170667 ◽

Cited By ~ 3

Author(s):

Sophia S. Liu ◽

Adam J. Hockenberry ◽

Michael C. Jewett ◽

Luís A. N. Amaral

Keyword(s):

Codon Usage ◽

Dna Sequences ◽

Codon Usage Bias ◽

False Negative ◽

Gc Content ◽

Sequence Length ◽

Protein Coding ◽

Cellular Processes ◽

Negative Findings ◽

Measured Effect

The unequal utilization of synonymous codons affects numerous cellular processes including translation rates, protein folding and mRNA degradation. In order to understand the biological impact of variable codon usage bias (CUB) between genes and genomes, it is crucial to be able to accurately measure CUB for a given sequence. A large number of metrics have been developed for this purpose, but there is currently no way of systematically testing the accuracy of individual metrics or knowing whether metrics provide consistent results. This lack of standardization can result in false-positive and false-negative findings if underpowered or inaccurate metrics are applied as tools for discovery. Here, we show that the choice of CUB metric impacts both the significance and measured effect sizes in numerous empirical datasets, raising questions about the generality of findings in published research. To bring about standardization, we developed a novel method to create synthetic protein-coding DNA sequences according to different models of codon usage. We use these benchmark sequences to identify the most accurate and robust metrics with regard to sequence length, GC content and amino acid heterogeneity. Finally, we show how our benchmark can aid the development of new metrics by providing feedback on its performance compared to the state of the art.

An Evolution Model for Sequence Length Based on Residue Insertion–Deletion Independent of Substitution: An Application to the GC Content in Bacterial Genomes

Bulletin of Mathematical Biology ◽

10.1007/s11538-012-9735-z ◽

2012 ◽

Vol 74 (8) ◽

pp. 1764-1788 ◽

Cited By ~ 3

Author(s):

Sophie Lèbre ◽

Christian J. Michel

Keyword(s):

Gc Content ◽

Evolution Model ◽

Sequence Length ◽

Bacterial Genomes

Determination and Structural Analysis of the Whole-Genome Sequence of Fusarium equiseti D25-1

10.21203/rs.2.24663/v1 ◽

2020 ◽

Author(s):

Xueping LI ◽

Jianhong Li ◽

Yonghong Qi ◽

Yonggang Liu ◽

Minquan Li

Keyword(s):

Molecular Mechanisms ◽

Gc Content ◽

Whole Genome Sequence ◽

Effective Control ◽

Sequence Length ◽

Comparative Genomic ◽

Whole Genome ◽

Illumina Hiseq ◽

Fusarium Equiseti ◽

Wide Range

Abstract Background: Fusarium equiseti is a plant pathogen with a wide range of hosts and diverse effects, including probiotic activity. However, the underlying molecular mechanisms remain unclear, hindering its effective control and utilization. In this study, the Illumina HiSeq 4000 and PacBio platforms were used to sequence and assemble the whole genome of Fusarium equiseti D25-1. Results: The assembly included 16 fragments with a GC content of 48.01%, gap number of zero, and size of 40,776,005 bp. There were 40,110 exons and 26,281 introns having a total size of 19,787,286 bp and 2,290,434 bp, respectively. The genome had an average copy number of 333, 71, 69, 31, and 108 for tRNAs, rRNAs, sRNAs, snRNAs, and miRNAs, respectively. The total repetitive sequence length was 1,713,918 bp, accounting for 4.2033% of the genome. In total, 13,134 functional genes were annotated, accounting for 94.97% of the total gene number. Toxin-related genes, including two related to zearalenone and 23 related to trichothecene, were identified. A comparative genomic analysis supported the high quality of the F. equiseti assembly, exhibiting good collinearity with the reference strains, 3,483 species-specific genes, and 1,805 core genes. A gene family analysis revealed more than 2,500 single-copy orthologs. F. equiseti was most closely related to Fusarium pseudograminearum based on a phylogenetic analysis at the whole-genome level. Conclusions: Our comprehensive analysis of the whole genome of F. equiseti provides basic data for studies of gene expression, regulatory and functional mechanisms, evolutionary processes, as well as disease prevention and control.

A Nascent Peptide Code for Translational Control of mRNA Stability in Human Cells

10.1101/2021.12.01.470782 ◽

2021 ◽

Author(s):

Phillip C. Burke ◽

Heungwon Park ◽

Arvind Rasi Subramaniam

Keyword(s):

Amino Acids ◽

Mrna Stability ◽

Translational Control ◽

Gc Content ◽

Human Cells ◽

Sequence Motifs ◽

Coding Sequence ◽

Ribosome Stalling ◽

Nascent Peptide

Abstract Stability of eukaryotic mRNAs is associated with their codon, amino acid, and GC content. Yet, coding sequence motifs that predictably alter mRNA stability in human cells remain poorly defined. Here, we develop a massively parallel assay to measure mRNA effects of thousands of synthetic and endogenous coding sequence motifs in human cells. We identify several families of simple dipeptide repeats whose translation triggers acute mRNA instability. Rather than individual amino acids, specific combinations of bulky and positively charged amino acids are critical for the destabilizing effects of dipeptide repeats. Remarkably, dipeptide sequences that form extended β strands in silico and in vitro drive ribosome stalling and mRNA instability in vivo. The resulting nascent peptide code underlies ribosome stalling and mRNA-destabilizing effects of hundreds of endogenous peptide sequences in the human proteome. Our work reveals an intrinsic role for the ribosome as a selectivity filter against the synthesis of bulky and aggregation-prone peptides.

Holo-Transcriptome Sequences from the Tropical Marine Sponge Cinachyrella alloclada

Journal of Heredity ◽

10.1093/jhered/esab075 ◽

2021 ◽

Author(s):

Yvain Desplat ◽

Jacob F Warner ◽

Jose V Lopez

Keyword(s):

Marine Sponge ◽

High Throughput Sequencing ◽

Gc Content ◽

Model Organism ◽

Sequence Length ◽

Microbial Abundance ◽

Average Sequence Length ◽

The Common ◽

Transcriptome Sequences ◽

Average Sequence

Abstract Marine sponge transcriptomes are underrepresented in current databases. Furthermore, only two sponge genomes are available for comparative studies. Here we present the assembled and annotated holo-transcriptome of the common Florida reef sponge from the species Cinachyrella alloclada. After Illumina high throughput sequencing, the data assembled using Trinity v2.5 confirmed a highly symbiotic organism, with the complexity of high microbial abundance (HMA) sponges. This dataset is enriched in poly-A selected eukaryotic, rather than microbial transcripts. Overall, 39,813 transcripts with verified sponge sequence homology coded for 8,496 unique proteins. The average sequence length was found to be 946 bp with an N50 sequence length of 1290 bp. Overall, the sponge assembly resulted in a GC content of 51.04%, which is within the range of GC bases in a eukaryotic transcriptome. BUSCO scored completeness analysis revealed a completeness of 60.3% and 60.1% based on the Eukaryota and Metazoa databases, respectively. Overall, this study points to an overarching goal of developing the Cinachyrella alloclada sponge as a useful new experimental model organism.

The effects of sequence length and composition of random sequence peptides on the growth of E. coli cells

10.1101/2021.11.22.469569 ◽

2021 ◽

Author(s):

Johana R. C. Fajardo ◽

Diethard Tautz

Keyword(s):

Cell Growth ◽

De Novo ◽

Random Sequence ◽

Large Fraction ◽

Gc Content ◽

Sequence Length ◽

E Coli ◽

Aggregation Propensity ◽

Evolutionary Innovation ◽

Frequency Changes

We study the potential for the de novo evolution of genes from random nucleotide sequences using libraries of E. coli expressing random sequence peptides. We assess the effects of such peptides on cell growth by monitoring frequency changes of individual clones in a complex library through four serial passages. Using a new analysis pipeline that allows tracing peptides of all lengths, we find that over half of the peptides have consistent effects on cell growth. Across nine different experiments, around 16% of clones increase in frequency and 36% decrease, with some variation between individual experiments. Shorter peptides (8-20 residues) are more likely to increase in frequency, whereas longer ones are more likely to decrease. GC content, amino acid composition, intrinsic disorder and aggregation propensity show slightly different patterns between peptide groups. Sequences that increase in frequency tend to be more disordered with lower aggregation propensity. This coincides with the observation that young genes with more disordered structures are better tolerated in genomes. Our data indicate that random sequences can be a source of evolutionary innovation, since a large fraction of them are well tolerated by the cells or can provide a growth advantage.
