An overview on the DNA nucleotide compositions across kingdoms

Mapping Intimacies ◽

10.1101/087569 ◽

2016 ◽

Author(s):

Yabin Guo

Keyword(s):

Codon Usage ◽

GC Content ◽

Phylogenetic Groups ◽

Interaction Dynamics ◽

Non-Coding RNA ◽

Purine Content ◽

DNA Strand ◽

RNA Genes

The DNA nucleotide compositions vary among species. This fascinating phenomenon has been studied for decades, yet some interesting questions remain unclear. In recent years, thousands of genomes have been sequenced, but a general evaluation of nucleotide composition across different phylogenetic groups is still lacking. In this letter, I analyzed 371 genomes from different kingdoms and provided an overview of DNA nucleotide composition. A number of important topics are discussed, including GC content, DNA strand symmetry, CDS purine content, codon usage, thermophilicity in prokaryotes and non-coding RNA genes. I also offer explanations for two long-debated questions: 1) both genome GC content and CDS purine content are correlated with thermophilicity in archaea, but not in bacteria; 2) the purine-rich pattern of CDS in most species is mainly a consequence of coding requirements, not of mRNA interaction dynamics. This study provides valuable information and ideas for future investigations.
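The abstract above names several composition measures. As a hedged illustration (not the author's code), the sketch below computes GC content, purine (A+G) content and single-strand AT/GC skews for one sequence. Using skews as the measure of strand symmetry (Chargaff's second parity rule) is an assumption, and all function names are hypothetical.

```python
"""Composition metrics for a single DNA sequence (hypothetical helper names)."""


def base_counts(seq: str) -> dict[str, int]:
    """Count A, C, G and T; ambiguous bases such as N are ignored."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ACGT"}


def gc_content(seq: str) -> float:
    """(G + C) / (A + C + G + T)."""
    c = base_counts(seq)
    total = sum(c.values())
    return (c["G"] + c["C"]) / total if total else 0.0


def purine_content(seq: str) -> float:
    """(A + G) / (A + C + G + T) on the given strand, e.g. the sense strand of a CDS."""
    c = base_counts(seq)
    total = sum(c.values())
    return (c["A"] + c["G"]) / total if total else 0.0


def strand_skews(seq: str) -> tuple[float, float]:
    """AT skew (A - T) / (A + T) and GC skew (G - C) / (G + C) on one strand.

    Both are 0 when the strand follows Chargaff's second parity rule (A = T, G = C).
    """
    c = base_counts(seq)
    at, gc = c["A"] + c["T"], c["G"] + c["C"]
    at_skew = (c["A"] - c["T"]) / at if at else 0.0
    gc_skew = (c["G"] - c["C"]) / gc if gc else 0.0
    return at_skew, gc_skew
```

For the hypothetical sequence `ATGGCGAAAGGTTAA` (6 A, 3 T, 5 G, 1 C), GC content is 0.4 and purine content is 11/15, i.e. the strand is purine-rich.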

Developmental stage related patterns of codon usage and genomic GC content: searching for evolutionary fingerprints with models of stem cell differentiation

Genome Biology ◽

10.1186/gb-2007-8-3-r35 ◽

2007 ◽

Vol 8 (3) ◽

pp. R35 ◽

Cited By ~ 19

Author(s):

Lichen Ren ◽

Ge Gao ◽

Dongxin Zhao ◽

Mingxiao Ding ◽

Jingchu Luo ◽

...

Keyword(s):

Stem Cell ◽

Cell Differentiation ◽

Codon Usage ◽

Developmental Stage ◽

GC Content ◽

Stem Cell Differentiation ◽

Genomic GC Content

Analysis of Codon Usage Patterns in Giardia duodenalis Based on Transcriptome Data from GiardiaDB

Genes ◽

10.3390/genes12081169 ◽

2021 ◽

Vol 12 (8) ◽

pp. 1169

Author(s):

Xin Li ◽

Xiaocen Wang ◽

Pengtao Gong ◽

Nan Zhang ◽

Xichen Zhang ◽

...

Keyword(s):

Codon Usage ◽

Genetic Manipulation ◽

Molecular Genetic ◽

GC Content ◽

Giardia Duodenalis ◽

Codon Usage Pattern ◽

Protein Size ◽

New Genes ◽

Optimal Codons ◽

Usage Patterns

Giardia duodenalis, a flagellated parasitic protozoan, is the most common cause of parasite-induced diarrheal disease worldwide. Codon usage bias (CUB) is an important evolutionary characteristic of most species; however, the CUB of G. duodenalis remains unclear. This study therefore analyzes codon usage patterns to assess the factors that shape G. duodenalis CUB. The neutrality analysis indicates that G. duodenalis has a wide GC3 distribution, which correlates significantly with GC12. The ENC-plot suggests that most genes lie close to the expected curve, with only a few outlying points. Together, these results indicate that mutational pressure and natural selection both played important roles in the development of CUB. The Parity Rule 2 (PR2) plot shows that the usage of GC and AT was out of proportion. Interestingly, we identified 26 optimal codons in the G. duodenalis genome, all ending with G or C. In addition, GC content, gene expression, and protein size also influence the formation of G. duodenalis CUB. This study systematically analyzes the G. duodenalis codon usage pattern and clarifies the mechanisms underlying its CUB. These results will be useful for identifying new genes, for molecular genetic manipulation, and for studying G. duodenalis evolution.
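The neutrality, ENC-plot and PR2 analyses in this abstract reduce to a few simple quantities. A minimal sketch, assuming GC3 is computed over all third codon positions (published analyses usually restrict this to synonymous sites, GC3s, excluding Met, Trp and stop codons); the function names are hypothetical and this is not the authors' pipeline.

```python
"""Sketch of neutrality, ENC-plot and PR2 diagnostics (hypothetical helper names)."""
import math


def codons_of(cds: str) -> list[str]:
    """Split an in-frame CDS into complete codons (trailing partial codon dropped)."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def positional_gc(cds: str) -> tuple[float, float, float]:
    """GC fraction at codon positions 1, 2 and 3 (all codons counted, a simplification)."""
    codons = codons_of(cds)
    if not codons:
        raise ValueError("CDS contains no complete codon")
    return tuple(sum(c[p] in "GC" for c in codons) / len(codons) for p in range(3))


def neutrality_slope(cds_list: list[str]) -> float:
    """Least-squares slope of GC12 regressed on GC3 across genes (the neutrality plot)."""
    xs, ys = [], []
    for cds in cds_list:
        gc1, gc2, gc3 = positional_gc(cds)
        xs.append(gc3)
        ys.append((gc1 + gc2) / 2)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("GC3 does not vary across genes")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def expected_enc(gc3s: float) -> float:
    """Wright's expected ENC under mutational pressure alone: 2 + s + 29 / (s^2 + (1 - s)^2)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def pr2_coordinates(cds: str) -> tuple[float, float]:
    """PR2 point (G3 / (G3 + C3), A3 / (A3 + T3)); (0.5, 0.5) means no bias."""
    third = [codon[2] for codon in codons_of(cds)]
    g, c, a, t = (third.count(b) for b in "GCAT")
    x = g / (g + c) if g + c else math.nan
    y = a / (a + t) if a + t else math.nan
    return x, y
```

In an ENC-plot, genes lying on the `expected_enc` curve are consistent with mutational pressure alone, while genes well below it suggest additional selection. A neutrality slope near 1 is usually read as mutation dominating, near 0 as selection dominating.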

Screening and survival analysis of melanoma immunodrug response-related genes and the function of magnetic nanoparticles in gene extraction

Materials Express ◽

10.1166/mex.2021.2037 ◽

2021 ◽

Vol 11 (8) ◽

pp. 1306-1312

Author(s):

Li Song ◽

Ningchao Du ◽

Haitao Luo ◽

Furong Li

Keyword(s):

Survival Analysis ◽

Magnetic Nanoparticles ◽

Drug Response ◽

High Throughput Sequencing ◽

Cox Proportional Hazards ◽

Sequencing Data ◽

Protein Coding ◽

Non-Coding RNA ◽

Long Non-Coding RNA ◽

RNA Genes

This study aimed to identify associations between protein-coding and long non-coding RNA genes and immunotherapy response in melanoma. Based on RNA sequencing data from melanoma specimens, the expression levels of protein-coding and long non-coding RNA genes were quantified with the Kallisto RNA-seq quantification method, and differentially expressed genes were detected using DESeq2. Cox proportional hazards regression was used to evaluate the effect of gene expression on survival. Using clinical data from 14 patients with a drug response and 11 patients without one, 18 protein-coding genes and 14 long non-coding RNAs were found to be differentially expressed (fold change > 2 and corrected P < 0.01), and the differentially expressed coding genes were significantly enriched in the cell adhesion process (P < 0.01). Survival analysis showed that 18 coding genes and 14 long non-coding RNA genes had significant effects on patient survival (P < 0.01). In this study, magnetic nanoparticles were used to extract genomic DNA and total RNA, owing to their paramagnetism and biocompatibility, after which transcriptome high-throughput sequencing was performed. The method has the advantages of avoiding hazardous reagents such as phenol and chloroform, replacing inorganic coatings such as silica with organic oil, and shortening reaction time. Protein-coding and long non-coding RNA genes, as well as magnetic nanoparticles, may serve as potential cancer immune biomarker targets for developing future oncological treatments.
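The selection criteria quoted above (fold change > 2, corrected P < 0.01) can be sketched as a simple filter. The correction method is not stated in the abstract; this sketch assumes Benjamini-Hochberg, which is DESeq2's default adjustment, and assumes fold changes are supplied as positive linear ratios. Function names are hypothetical, and the Kallisto, DESeq2 and Cox regression steps themselves are not reproduced.

```python
"""Fold-change plus FDR filter for differential expression (hypothetical helper names)."""
import math


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = min(1.0, running_min)
    return adjusted


def select_de_genes(genes, fold_changes, pvals, min_fc=2.0, alpha=0.01):
    """Keep genes changed more than min_fc-fold in either direction with adjusted p < alpha."""
    padj = benjamini_hochberg(pvals)
    return [
        gene
        for gene, fc, q in zip(genes, fold_changes, padj)
        if abs(math.log2(fc)) > math.log2(min_fc) and q < alpha
    ]
```

Working on log2 fold changes makes the threshold symmetric, so a 10-fold decrease passes just as a 4-fold increase does.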

Comparative analysis of codon usage patterns in SARS-CoV-2, its mutants and other respiratory viruses

10.1101/2021.03.03.433699 ◽

2021 ◽

Author(s):

Neetu Tyagi ◽

Rahila Sardar ◽

Dinesh Gupta

Keyword(s):

Codon Usage ◽

Codon Usage Bias ◽

GC Content ◽

Respiratory Illness ◽

Respiratory Viruses ◽

Nucleotide Composition ◽

Health Crisis ◽

Study Results ◽

Usage Patterns ◽

The Difference

The coronavirus disease 2019 (COVID-19) outbreak caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses a worldwide human health crisis, causing respiratory illness with a high mortality rate. To investigate the factors governing codon usage bias (CUB) in respiratory viruses, including SARS-CoV-2 isolates from different geographical locations (~62K) and two recently emerged strains, VUI202012/01 from the United Kingdom (UK) and 501.Y.V2 from South Africa (SA), we performed a CUB analysis. The analysis includes RSCU analysis, GC content calculation, ENC analysis, dinucleotide frequency and neutrality plot analysis. The study had two primary aims: first, to identify differences in codon usage bias among all SARS-CoV-2 genomes and, second, to compare their CUB properties with those of other respiratory viruses. A biased nucleotide composition was found, as most of the highly preferred codons were A/U-ending in all the respiratory viruses studied here. Compared with the human host, RSCU analysis identified 11 over-represented and 9 under-represented codons in SARS-CoV-2 genomes. Correlation analysis of ENC and GC3s revealed that mutational pressure is the leading force determining CUB. These results yield a better understanding of codon usage preferences in SARS-CoV-2 genomes and point to possible evolutionary determinants of the biases found among respiratory viruses, thus revealing a unique feature of SARS-CoV-2 evolution and adaptation. To the best of our knowledge, this is the first comparative CUB analysis of worldwide SARS-CoV-2 genomes, including newly emerged strains, and other respiratory viruses.
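RSCU, the first analysis listed above, compares each codon's observed count with the count expected if all synonymous codons were used equally: RSCU equals the observed count times the number of synonymous codons, divided by the total count for that amino acid. A minimal sketch over the standard genetic code (NCBI table 1), assuming RNA input may use U and that stop codons are excluded; the function name is hypothetical.

```python
"""Relative synonymous codon usage over the standard genetic code (hypothetical helper)."""
from collections import Counter
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1), codons enumerated in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(sequences):
    """Map each sense codon to its RSCU; amino acids never observed are omitted."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    # Group sense codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # RSCU is undefined for an absent amino acid
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values
```

Values above 1 mean a codon is used more often than expected under uniform synonymous usage, values below 1 less often; single-codon amino acids (Met, Trp) always score 1.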

GENT-49. SYSTEMATIC IDENTIFICATION OF ESSENTIAL LONG NON-CODING RNA GENES IN GLIOBLASTOMA

Neuro-Oncology ◽

10.1093/neuonc/now212.354 ◽

2016 ◽

Vol 18 (suppl_6) ◽

pp. vi84-vi85

Author(s):

Siyuan Liu ◽

Max Horlbeck ◽

Seung Woo Cho ◽

Harjus Birk ◽

Martina Malatesta ◽

...

Keyword(s):

Non-Coding RNA ◽

Systematic Identification ◽

Long Non-Coding RNA ◽

RNA Genes

Coupling Between Protein Level Selection and Codon Usage Optimization in the Evolution of Bacteria and Archaea

mBio ◽

10.1128/mbio.00956-14 ◽

2014 ◽

Vol 5 (2) ◽

Cited By ~ 25

Author(s):

Wenqi Ran ◽

David M. Kristensen ◽

Eugene V. Koonin

Keyword(s):

Codon Usage ◽

Protein Level ◽

Codon Usage Bias ◽

Protein Sequence ◽

GC Content ◽

Protein Sequences ◽

Microbial Evolution ◽

Fine Tuning ◽

Selection For ◽

Genomic GC Content

ABSTRACT The relationship between the selection affecting codon usage and selection on protein sequences of orthologous genes in diverse groups of bacteria and archaea was examined by using the Alignable Tight Genome Clusters database of prokaryote genomes. The codon usage bias is generally low, with 57.5% of the gene-specific optimal codon frequencies (Fopt) being below 0.55. This apparent weak selection on codon usage contrasts with the strong purifying selection on amino acid sequences, with 65.8% of the gene-specific dN/dS ratios being below 0.1. For most of the genomes compared, a limited but statistically significant negative correlation between Fopt and dN/dS was observed, which is indicative of a link between selection on protein sequence and selection on codon usage. The strength of the coupling between the protein level selection and codon usage bias showed a strong positive correlation with the genomic GC content. Combined with previous observations on the selection for GC-rich codons in bacteria and archaea with GC-rich genomes, these findings suggest that selection for translational fine-tuning could be an important factor in microbial evolution that drives the evolution of genome GC content away from mutational equilibrium. This type of selection is particularly pronounced in slowly evolving, “high-status” genes. A significantly stronger link between the two aspects of selection is observed in free-living bacteria than in parasitic bacteria, and in genes encoding metabolic enzymes and transporters than in informational genes. These differences might reflect the special importance of translational fine-tuning for the adaptability of gene expression to environmental changes. The results of this work establish the coupling between protein level selection and selection for translational optimization as a distinct and potentially important factor in microbial evolution.
IMPORTANCE Selection affects the evolution of microbial genomes at many levels, including both the structure of proteins and the regulation of their production. Here we demonstrate the coupling between the selection on protein sequences and the optimization of codon usage in a broad range of bacteria and archaea. The strength of this coupling varies over a wide range and strongly and positively correlates with the genomic GC content. The cause(s) of the evolution of high GC content is a long-standing open question, given the universal mutational bias toward AT. We propose that optimization of codon usage could be one of the key factors that determine the evolution of GC-rich genomes. This work establishes the coupling between selection at the level of protein sequence and at the level of codon choice optimization as a distinct aspect of genome evolution.
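Fopt, used above as the measure of codon usage bias, is the fraction of optimal codons among the codons of amino acids for which an optimal codon has been designated. A minimal sketch, assuming the optimal-codon set is supplied by the caller; the set in the example is hypothetical and is not taken from the study, and the function name is an assumption.

```python
"""Frequency of optimal codons (Fopt) for one CDS (hypothetical helper)."""
import math
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def fop(cds, optimal_codons):
    """Optimal codons / all codons of amino acids that have a designated optimal codon."""
    optimal = {c.upper().replace("U", "T") for c in optimal_codons}
    covered_aas = {CODON_TABLE[c] for c in optimal}
    cds = cds.upper().replace("U", "T")
    n_opt = n_total = 0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if CODON_TABLE.get(codon) in covered_aas:
            n_total += 1
            n_opt += codon in optimal  # True counts as 1
    return n_opt / n_total if n_total else math.nan
```

For the hypothetical CDS `ATGAAGAAAGAAGAGTAA` with optimal set `{"AAG", "GAA"}`, two of the four Lys/Glu codons are optimal, so Fopt is 0.5; Met and the stop codon are not counted.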

PSoL: a positive sample only learning algorithm for finding non-coding RNA genes

Bioinformatics ◽

10.1093/bioinformatics/btl441 ◽

2006 ◽

Vol 22 (21) ◽

pp. 2590-2596 ◽

Cited By ~ 56

Author(s):

C. Wang ◽

C. Ding ◽

R. F. Meraz ◽

S. R. Holbrook

Keyword(s):

Learning Algorithm ◽

Positive Sample ◽

Non-Coding RNA ◽

RNA Genes

Apoptotic endonuclease EndoG regulates alternative splicing of human telomerase catalytic subunit hTERT

Biomeditsinskaya Khimiya ◽

10.18097/pbmc20166205544 ◽

2016 ◽

Vol 62 (5) ◽

pp. 544-554 ◽

Cited By ~ 6

Author(s):

D.D. Zhdanov ◽

D.A. Vasina ◽

E.V. Orlova ◽

V.S. Orlova ◽

M.V. Pokrovskaya ◽

...

Keyword(s):

Alternative Splicing ◽

Telomerase Activity ◽

Catalytic Subunit ◽

Over Expression ◽

hTERT Gene ◽

Template Strand ◽

Non-Coding RNA ◽

DNA Strand ◽

Human Telomerase ◽

Long Non-Coding RNA

The human telomerase catalytic subunit hTERT is subject to alternative splicing, which results in loss of its function and a decrease in telomerase activity. However, very little is known about the mechanism of hTERT pre-mRNA alternative splicing. The apoptotic endonuclease EndoG is known to participate in this process. The aim of this study was to determine the role of EndoG in regulating hTERT alternative splicing. Increased expression of the β-deletion splice variant was detected during EndoG over-expression in the CaCo-2 cell line, after EndoG treatment of cell cytoplasm and nuclei, and after incubation of nuclei with EndoG-digested cellular RNA. hTERT alternative splicing was induced by a 47-mer RNA oligonucleotide both in naked nuclei and in cells after transfection. A long non-coding RNA of 1754 nucleotides was identified as the precursor of the 47-mer RNA oligonucleotide. Based on these results, the following mechanism was proposed: hTERT pre-mRNA is transcribed from the coding DNA strand, while the long non-coding RNA is transcribed from the template strand of the hTERT gene. EndoG digests the long non-coding RNA and produces the 47-mer RNA oligonucleotide, which is complementary to the exon 8/intron 8 junction of hTERT pre-mRNA. Interaction of the 47-mer RNA oligonucleotide with hTERT pre-mRNA causes alternative splicing.
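The proposed mechanism relies on the 47-mer oligonucleotide being antiparallel and complementary to the hTERT pre-mRNA at the exon 8/intron 8 junction. The check below sketches what "complementary" means computationally. It is an assumption-laden illustration: only Watson-Crick pairs are considered (no G-U wobble), the function names are hypothetical, and the example sequences are made up rather than the actual hTERT or 47-mer sequences.

```python
"""Antiparallel complementarity check for RNA (hypothetical helper names)."""

# Watson-Crick complements for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA (or DNA, converted to RNA) sequence."""
    return seq.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def is_complementary(oligo: str, target: str) -> bool:
    """True if the whole oligo can pair antiparallel with a contiguous stretch of target."""
    return reverse_complement_rna(oligo) in target.upper().replace("T", "U")
```

Because pairing is antiparallel, the oligo is compared with the target by reverse-complementing it first; a real analysis would also allow mismatches and G-U pairs.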

A Machine Learning Approach to Unmask Novel Gene Signatures and Prediction of Alzheimer’s Disease Within Different Brain Regions

10.1101/2021.03.03.433689 ◽

2021 ◽

Author(s):

Abhibhav Sharma ◽

Pinki Dey

Keyword(s):

Machine Learning ◽

Alzheimer's Disease ◽

Neurodegenerative Disorder ◽

Brain Regions ◽

Middle Temporal Gyrus ◽

Non-Coding RNA ◽

Machine Learning Approach ◽

Microarray Datasets ◽

RNA Genes

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder whose aetiology is currently unknown. Although numerous studies have attempted to identify the genetic risk factor(s) of AD, the interpretability and/or prediction accuracy achieved by these studies has remained unsatisfactory, reducing their clinical significance. Here, we apply an ensemble of random-forest and regularized regression (LASSO) models to AD-associated microarray datasets from four brain regions (prefrontal cortex, middle temporal gyrus, hippocampus, and entorhinal cortex) to discover novel genetic biomarkers through a machine learning-based feature-selection and classification scheme. The proposed scheme revealed the optimal and biologically significant classifiers within each brain region, which achieved by far the highest prediction accuracy for AD in 5-fold cross-validation (99% on average). Interestingly, along with novel and prominent biomarkers including CORO1C, SLC25A46, RAE1, ANKIB1, CRLF3 and PDYN, numerous non-coding RNA genes were also observed as discriminators, of which AK057435 and BC037880 are uncharacterized long non-coding RNA genes.
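The 99% figure above comes from 5-fold cross-validation. The sketch below shows only that evaluation loop: samples are shuffled and split into five folds, each fold is held out once, and the test accuracies are averaged. The random-forest/LASSO ensemble is not reproduced; `majority_class` is a hypothetical placeholder classifier, and all function names are assumptions.

```python
"""5-fold cross-validation loop with a placeholder classifier (hypothetical names)."""
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each sample lands in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(X, y, fit_predict, k=5, seed=0):
    """Mean test accuracy; fit_predict(X_train, y_train, X_test) returns predictions."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_class(X_train, y_train, X_test):
    """Placeholder model: predict the most frequent training label for every test sample."""
    most_common = max(set(y_train), key=y_train.count)
    return [most_common] * len(X_test)
```

Any model can be plugged in through `fit_predict`; running feature selection inside each training fold keeps the held-out fold unseen by the model.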

Formation of human long intergenic non-coding RNA genes, pseudogenes, and protein genes: Ancestral sequences are key players

PLoS ONE ◽

10.1371/journal.pone.0230236 ◽

2020 ◽

Vol 15 (3) ◽

pp. e0230236 ◽

Cited By ~ 2

Author(s):

Nicholas Delihas

Keyword(s):

Ancestral Sequences ◽

Non-Coding RNA ◽

Key Players ◽

RNA Genes
