Structural-Statistical Properties of the Flavivirus Genomes

Математическая биология и биоинформатика ◽

10.17537/2017.12.343 ◽

2017 ◽

Vol 12 (2) ◽

pp. 343-353 ◽

Cited By ~ 1

Author(s):

Ж.С. Тюлько ◽

Zh.S. Tyulko

Keyword(s):

Mutation Rate ◽

Statistical Approach ◽

Simple Structure ◽

Statistical Properties ◽

Structural Proteins ◽

Genome Sequences ◽

Coding Sequences ◽

Coding Sequence ◽

Coding Regions ◽

Eukaryotic Genomes

Essential structural-statistical properties of coding regions in flavivirus genomes are investigated on the basis of the spectral-statistical approach. Both full-length polyprotein coding sequences and their separate structural segments are analyzed. On the whole, the structural-statistical properties of the flavivirus genome sequences are shown to be similar to the properties of 3-regularity and latent triplet profile periodicity revealed earlier in the coding regions of prokaryotic and eukaryotic genomes. However, the two-level organization of coding does not occur in the discrete segments coding for structural proteins in the flavivirus genomes, and the property of sequence homogeneity is manifested in a significant part of such segments. These particularities of the coding sequences are explained by the simple structure and high mutation rate of the flavivirus genomes.

Structural-Statistical Properties of DNA Coding Regions

Математическая биология и биоинформатика ◽

10.17537/2015.10.387 ◽

2015 ◽

Vol 10 (2) ◽

pp. 387-397 ◽

Cited By ~ 1

Author(s):

В.А. Кутыркин ◽

V.A. Kutyrkin

Keyword(s):

Human Genome ◽

DNA Sequences ◽

Statistical Approach ◽

Statistical Properties ◽

Statistical Characteristics ◽

DNA Coding ◽

Coding Regions ◽

Unknown Type ◽

Triplet Periodicity ◽

Special Meaning

Structural-statistical characteristics of the coding DNA sequences (CDSs) from the human genome are investigated within the framework of the spectral-statistical approach (the 2S-approach). The properties of 3-regularity and latent profile periodicity are among these characteristics. The special meaning and intrinsic existence of these properties are confirmed by studying binary-recoded CDSs. Only one kind of singular recoding, the one that identifies complementary nucleotides, preserves the characteristics of the original CDSs. The use of nonsingular binary recoding proves the statement that latent triplet periodicity in the CDSs of the human genome belongs to a previously unknown type called profile periodicity.

Mutation analysis of structural and non-structural proteins indicated that the death rate of COVID-19 pandemic may significantly reduce by the end of 2020

10.31219/osf.io/gubys ◽

2020 ◽

Author(s):

Bhaskar Bhadra ◽

Saakshi Jalali ◽

Santanu Dasgupta

Keyword(s):

Mutation Rate ◽

Death Rate ◽

Global Economy ◽

Surface Glycoprotein ◽

World Health ◽

Structural Proteins ◽

Mutation Rates ◽

Genome Sequences ◽

Specific Proteins ◽

Health Organization

The outbreak of the infectious and rapidly expanding coronavirus disease 19 (COVID-19) caused by the SARS-CoV-2 virus has had a devastating effect on public health and the global economy. The daily country-wise updates from the World Health Organization on the number of infected cases and death rates show diverse statistics. In this study, we performed a comparative analysis between the COVID-19 death rate and the mutation rate for selected structural and non-structural proteins. A total of 7200 genome sequences of the SARS-CoV-2 virus from 49 countries were investigated. The mutation rate of specific proteins of SARS-CoV-2 over the last four months (April – July, 2020) was correlated with the death rate across various countries. From our findings, we suggest a significant correlation between the mutation rates of NSP6 and Surface glycoprotein and the death rate. Additional analysis of the cumulative mutation rate of these two proteins against the death rate of three major clusters helped us to hypothesize that mutations of these two proteins will grow consistently while the death rate would drop below 0.5% by the end of 2020 in cluster I countries. Hence, we propose that with the current mutation rate trend, the COVID-19 death rate would significantly weaken by the end of this year.

Unravelling the hidden inter and intra-varietal diversity of durum wheat commercial varieties used in Portugal

Plant Genetic Resources ◽

10.1017/s1479262119000133 ◽

2019 ◽

Vol 17 (04) ◽

pp. 386-389

Author(s):

Miguel Bento ◽

Sónia Gomes Pereira ◽

Wanda Viegas ◽

Manuela Silva

Keyword(s):

Durum Wheat ◽

Repetitive Sequences ◽

Inter Simple Sequence Repeat ◽

Wheat Breeding ◽

Genomic Diversity ◽

Varietal Diversity ◽

Wheat Varieties ◽

Coding Sequences ◽

Coding Regions ◽

High Level

Assessing durum wheat genomic diversity is crucial in a changing environment, particularly in the Mediterranean region, where it is largely used to produce pasta. Durum wheat varieties cultivated in Portugal and previously assessed for thermotolerance were screened for the variability of coding sequences associated with technological traits and of repetitive sequences. As expected, reduced variability was observed in low molecular weight glutenin subunits (LMW-GS), but a specific LMW-GS allelic form associated with improved pasta-making characteristics was absent in one variety. In contrast, molecular markers targeting repetitive elements like microsatellites and retrotransposons – Inter Simple Sequence Repeat (ISSR) and Inter Retrotransposons Amplified Polymorphism (IRAP) – disclosed significant inter and intra-varietal diversity. This high level of polymorphism was revealed by the 20 distinct ISSR/IRAP concatenated profiles observed among the 23 individuals analysed. Interestingly, median joining networks and PCoA analysis grouped individuals of the same variety and clustered varieties according to geographical origin. Globally, this work demonstrates that durum wheat breeding strategies induced selection pressure for some relevant coding sequences while maintaining high levels of genomic variability in non-coding regions enriched in repetitive sequences.

A Comprehensive, Flexible Collection of SARS-CoV-2 Coding Regions

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401554 ◽

2020 ◽

Vol 10 (9) ◽

pp. 3399-3402 ◽

Cited By ~ 5

Author(s):

Dae-Kyum Kim ◽

Jennifer J Knapp ◽

Da Kuang ◽

Aditya Chawla ◽

Patricia Cassonnet ◽

...

Keyword(s):

Life Cycle ◽

Viral Life Cycle ◽

Coding Sequences ◽

Coding Regions ◽

Global Pandemic ◽

The World ◽

Molecular Studies ◽

Widespread Availability ◽

Rapid Transfer

The world is facing a global pandemic of COVID-19 caused by the SARS-CoV-2 coronavirus. Here we describe a collection of codon-optimized coding sequences for SARS-CoV-2 cloned into Gateway-compatible entry vectors, which enable rapid transfer into a variety of expression and tagging vectors. The collection is freely available. We hope that widespread availability of this SARS-CoV-2 resource will enable many subsequent molecular studies to better understand the viral life cycle and how to block it.

Draft Genome Sequences of 10 Clinical K2-Type Klebsiella pneumoniae Strains Isolated in Russia

Microbiology Resource Announcements ◽

10.1128/mra.01023-18 ◽

2018 ◽

Vol 7 (14) ◽

Cited By ~ 1

Author(s):

Nikolay V. Volozhantsev ◽

Angelina A. Kislichkina ◽

Anastasia I. Lev ◽

Ekaterina V. Solovieva ◽

Vera P. Myakinina ◽

...

Keyword(s):

Intensive Care Unit ◽

Klebsiella Pneumoniae ◽

Draft Genome ◽

Capsular Type ◽

Genome Sequences ◽

Protein Coding ◽

Coding Sequences ◽

Content Type ◽

Neurosurgical Intensive Care ◽

Neurosurgical Intensive Care Unit

We report here the genome sequences of 10 Klebsiella pneumoniae strains of capsular type K2 isolated in Russia from patients in an infectious clinical hospital and neurosurgical intensive care unit. The draft genome sizes range from 5.34 to 5.87 Mb and include 5,448 to 6,137 protein-coding sequences.

Genome Sequence of Rheinheimera salexigens sp. nov. Isolated from a Fishing Hook off O‘ahu, Hawai‘i

Genome Announcements ◽

10.1128/genomea.01390-16 ◽

2016 ◽

Vol 4 (6) ◽

Cited By ~ 1

Author(s):

Xuehua Wan ◽

Shaobin Hou ◽

Kazukuni Hayashi ◽

James Anderson ◽

Stuart P. Donachie

Keyword(s):

Genome Sequence ◽

Draft Genome ◽

Draft Genome Sequence ◽

Protein Coding ◽

Coding Sequences ◽

Coding Regions ◽

Roche 454

Rheinheimera salexigens KH87 T is an obligately halophilic gammaproteobacterium. The strain’s draft genome sequence, generated by the Roche 454 GS FLX+ platform, comprises two scaffolds of ~3.4 Mbp and ~3 kbp, with 3,030 protein-coding sequences and 58 tRNA coding regions. The G+C content is 42 mol%.

A Database of Potential Reading Frame Shifts in Coding Sequences from Different Eukaryotic Genomes

BIOPHYSICS ◽

10.1134/s0006350919030217 ◽

2019 ◽

Vol 64 (3) ◽

pp. 339-348

Author(s):

Yu. M. Suvorova ◽

V. M. Pugacheva ◽

E. V. Korotkov

Keyword(s):

Coding Sequences ◽

Reading Frame ◽

Potential Reading ◽

Eukaryotic Genomes

PATACSDB - The database of polyA translational attenuators in coding sequences

10.7287/peerj.preprints.1557v1 ◽

2015 ◽

Author(s):

Malgorzata Habich ◽

Sergej Djuranovic ◽

Pawel Szczesny

Keyword(s):

Gene Dosage ◽

Interesting Case ◽

Minimal Length ◽

Regulatory Mechanisms ◽

Coding Sequences ◽

Translation Apparatus ◽

Recent Addition ◽

Dosage Balance ◽

The Stability ◽

Eukaryotic Genomes

A recent addition to the repertoire of gene expression regulatory mechanisms is polyadenylate (polyA) tracks encoding poly-lysine runs in protein sequences. Such tracks stall the translation apparatus and induce frameshifting independently of the effects of the charged nascent poly-lysine sequence on the ribosome exit channel. As such, they substantially influence the stability of mRNA and the amount of protein produced from a given transcript. Single base changes in these regions are enough to exert a measurable response on both protein and mRNA abundance, and make each of these sequences a potentially interesting case study for the effects of synonymous mutation, gene dosage balance and natural frameshifting. Here we present PATACSDB, a resource that contains a comprehensive list of polyA tracks from over 250 eukaryotic genomes. Our data is based on the Ensembl genomic database of coding sequences and filtered with the 12A-1 algorithm, which selects sequences of polyA tracks with a minimal length of 12 A's allowing for one mismatched base. The PATACSDB database is accessible at: http://sysbio.ibb.waw.pl/patacsdb. Source code is available for download from the GitHub repository at http://github.com/habich/PATACSDB, including the scripts to recreate the database from scratch on the user's own computer.
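The 12A-1 filter mentioned in this abstract can be illustrated with a short Python sketch. This is not the PATACSDB code. It assumes one reading of the rule: a window of 12 consecutive bases qualifies when at most one base is not A. The function name `find_polya_tracks` and its parameters are hypothetical, chosen only for this illustration.

```python
def find_polya_tracks(seq, window=12, max_mismatches=1):
    """Return merged (start, end) half-open intervals of candidate polyA tracks.

    Assumption (not taken from the PATACSDB source): a window of `window`
    bases is a hit when it holds at most `max_mismatches` non-A bases.
    Overlapping hit windows are merged into a single interval.
    """
    seq = seq.upper()
    n = len(seq)
    if n < window:
        return []

    hits = []
    # Number of non-A bases in the first window.
    mismatches = sum(1 for base in seq[:window] if base != "A")

    for start in range(n - window + 1):
        if start > 0:
            # Slide the window: drop the base that left, add the base that entered.
            mismatches -= seq[start - 1] != "A"
            mismatches += seq[start + window - 1] != "A"
        if mismatches <= max_mismatches:
            end = start + window
            if hits and start < hits[-1][1]:
                # Overlaps the previous hit, so extend that interval.
                hits[-1] = (hits[-1][0], end)
            else:
                hits.append((start, end))
    return hits
```

Under this reading, a merged interval can include a flanking non-A base and, when several windows overlap, more than one mismatch in total. A stricter filter would trim the interval ends or report each window separately.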

The unique coding sequence of pmoCAB operon from type Ia methanotrophs simultaneously optimizes transcription and translation

10.1101/543546 ◽

2019 ◽

Author(s):

Juan C. Villada ◽

Maria F. Duran ◽

Patrick K. H. Lee

Keyword(s):

Methane Oxidation ◽

Meta Analysis ◽

Nucleotide Composition ◽

Global Scale ◽

Translation Efficiency ◽

Coding Sequences ◽

Coding Sequence ◽

Conserved Sequence ◽

Type Ia ◽

Optimal Resource

Understanding the interplay between genotype and phenotype is a fundamental goal of functional genomics. Methane oxidation is a microbial phenotype with global-scale significance as part of the carbon biogeochemical cycle, and is a sink for greenhouse gas. Microorganisms that oxidize methane (methanotrophs) are taxonomically diverse and widespread around the globe. Recent reports have suggested that type Ia methanotrophs are the most prevalent methane-oxidizing bacteria in different environments. In methanotrophic bacteria, complete methane oxidation is encoded in four operons (pmoCAB, mmoXYZBCD, mxaFI, and xoxF), but how evolution has shaped these genes to execute methane oxidation remains poorly understood. Here, we used a genomic meta-analysis to investigate the coding sequences that encode methane oxidation. By analyzing isolate and metagenome-assembled genomes from phylogenetically and geographically diverse sources, we detected an anomalous nucleotide composition bias in the coding sequences of particulate methane monooxygenase genes (pmoCAB) from type Ia methanotrophs around the globe. We found that this was a highly conserved sequence that optimizes codon usage in order to maximize translation efficiency and accuracy, while minimizing the synthesis cost of transcripts and proteins. We show that among the seven types of methanotrophs, only type Ia methanotrophs possess a unique coding sequence of the pmoCAB operon that is under positive selection for optimal resource allocation and efficient synthesis of transcripts and proteins in environmental counter gradients with high oxygen and low methane concentrations. This adaptive trait possibly enables type Ia methanotrophs to respond robustly to fluctuating methane availability and explains their global prevalence.

Distinctive functional regime of endogenous lncRNAs in dark regions of human genome

10.1101/2020.12.06.413880 ◽

2020 ◽

Author(s):

Anyou Wang ◽

Rong Hai

Keyword(s):

Human Genome ◽

RNA Processing ◽

Self Regulation ◽

Post Translational Modification ◽

Protein Coding ◽

Noncoding Regions ◽

Coding Regions ◽

RNAseq Data ◽

Response To Stress ◽

Eukaryotic Genomes

Eukaryotic genomes gradually gain noncoding regions as evolution advances, and the human genome actively transcribes >90% of its noncoding regions [1], suggesting their criticality in the evolution of the human genome. Yet <1% of them have been functionally characterized [2], leaving most of the human genome in the dark. Here we systematically decode endogenous lncRNAs located in unannotated regions of the human genome and decipher a distinctive functional regime of lncRNAs hidden in massive RNAseq data. LncRNAs are distributed divergently across chromosomes, independent of protein-coding regions. Their transcription rarely initiates on promoters through polymerase II, but mostly on enhancers. Yet conventional enhancer activators (e.g. H3K4me1) account for only a small proportion of lncRNA activation, suggesting that alternative, unknown mechanisms initiate the majority of lncRNAs. Meanwhile, lncRNA self-regulation also contributes notably to lncRNA activation. LncRNAs trans-regulate broad bioprocesses, including transcription and RNA processing, cell cycle, respiration, response to stress, chromatin organization, post-translational modification, and development. Overall, lncRNAs govern their own regime, distinct from that of proteins.
