Structural-Statistical Properties of DNA Coding Regions

Математическая биология и биоинформатика ◽

10.17537/2015.10.387 ◽

2015 ◽

Vol 10 (2) ◽

pp. 387-397 ◽

Cited By ~ 1

Author(s):

В.А. Кутыркин ◽

V.A. Kutyrkin

Keyword(s):

Human Genome ◽

DNA Sequences ◽

Statistical Approach ◽

Statistical Properties ◽

Statistical Characteristics ◽

DNA Coding ◽

Coding Regions ◽

Unknown Type ◽

Triplet Periodicity ◽

Special Meaning

Structural-statistical characteristics of coding DNA sequences (CDSs) from the human genome are investigated within the framework of the spectral-statistical approach (the 2S-approach). These characteristics include the properties of 3-regularity and latent profile periodicity. The special meaning and intrinsic nature of these properties are confirmed by studying binary-recoded CDSs. Only one kind of singular recoding, the one that identifies complementary nucleotides, preserves the characteristics of the original CDSs. The use of nonsingular binary recoding supports the claim that the latent triplet periodicity in the CDSs of the human genome belongs to a previously unknown type called profile periodicity.
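The binary recoding mentioned in this abstract can be illustrated with a small sketch. It is not the 2S-approach itself, whose statistics are not given in the abstract. It assumes one plausible reading of "recoding that identifies complementary nucleotides" (A and T share one symbol, G and C the other), and it uses a simple per-phase frequency profile as a stand-in for a periodicity measure. The names `COMPLEMENT_RECODING`, `recode_binary`, and `phase_profile` are hypothetical.

```python
# Assumed mapping: complementary nucleotides are identified (A~T -> 0, G~C -> 1).
# This is an illustrative choice, not taken from the paper.
COMPLEMENT_RECODING = {"A": 0, "T": 0, "G": 1, "C": 1}


def recode_binary(seq, mapping=COMPLEMENT_RECODING):
    """Recode a nucleotide string into a list of 0/1 symbols."""
    return [mapping[base] for base in seq.upper()]


def phase_profile(bits, period=3):
    """Return the frequency of symbol 1 at each phase (position modulo period)."""
    totals = [0] * period
    ones = [0] * period
    for i, bit in enumerate(bits):
        totals[i % period] += 1
        ones[i % period] += bit
    return [o / t if t else 0.0 for o, t in zip(ones, totals)]


# Toy sequence (not real data): three codons repeated four times.
toy = "GATGCAGTT" * 4
profile = phase_profile(recode_binary(toy))
print(profile)
```

A profile whose three entries differ markedly, as in the toy input, indicates that the recoded sequence still carries a position-dependent (period-3) signal; a flat profile would indicate that the recoding destroyed it.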


New statistical approach to discriminate between protein coding and non-coding regions in DNA sequences and its evaluation

Journal of Theoretical Biology ◽

10.1016/s0022-5193(86)80176-x ◽

1986 ◽

Vol 120 (2) ◽

pp. 223-236 ◽

Cited By ~ 23

Author(s):

Christian J. Michel

Keyword(s):

DNA Sequences ◽

Statistical Approach ◽

Protein Coding ◽

Coding Regions


Structural-Statistical Properties of the Flavivirus Genomes

Математическая биология и биоинформатика ◽

10.17537/2017.12.343 ◽

2017 ◽

Vol 12 (2) ◽

pp. 343-353 ◽

Cited By ~ 1

Author(s):

Ж.С. Тюлько ◽

Zh.S. Tyulko

Keyword(s):

Mutation Rate ◽

Statistical Approach ◽

Simple Structure ◽

Statistical Properties ◽

Structural Proteins ◽

Genome Sequences ◽

Coding Sequences ◽

Coding Sequence ◽

Coding Regions ◽

Eukaryotic Genomes

Essential structural-statistical properties of coding regions in flavivirus genomes are investigated on the basis of the spectral-statistical approach. Both full-length polyprotein coding sequences and their separate structural segments are analyzed. On the whole, the structural-statistical properties of the flavivirus genome sequences are shown to be similar to the properties of 3-regularity and latent triplet profile periodicity revealed earlier in the coding regions of prokaryotic and eukaryotic genomes. However, the two-level organization of coding does not occur in the discrete segments coding for structural proteins in the flavivirus genomes, and a significant part of these segments exhibits sequence homogeneity. These particularities of the coding sequences are explained by the simple structure and high mutation rate of the flavivirus genomes.


DNA methylation in satellite repeats disorders

Essays in Biochemistry ◽

10.1042/ebc20190028 ◽

2019 ◽

Vol 63 (6) ◽

pp. 757-771 ◽

Cited By ~ 4

Author(s):

Claire Francastel ◽

Frédérique Magdinier

Keyword(s):

DNA Methylation ◽

Human Genome ◽

Repetitive DNA ◽

DNA Sequences ◽

Satellite Repeats ◽

Tremendous Progress ◽

Genes Encoding ◽

DNA Elements ◽

Near Future

Despite the tremendous progress made in recent years in assembling the human genome, tandemly repeated DNA elements remain poorly characterized. These sequences account for the vast majority of methylated sites in the human genome and their methylated state is necessary for this repetitive DNA to function properly and to maintain genome integrity. Furthermore, recent advances highlight the emerging role of these sequences in regulating the functions of the human genome and its variability during evolution, among individuals, or in disease susceptibility. In addition, a number of inherited rare diseases are directly linked to the alteration of some of these repetitive DNA sequences, either through changes in the organization or size of the tandem repeat arrays or through mutations in genes encoding chromatin modifiers involved in the epigenetic regulation of these elements. Although largely overlooked so far in the functional annotation of the human genome, satellite elements play key roles in its architectural and topological organization. This includes functions as boundary elements delimiting functional domains or in the assembly of repressive nuclear compartments, with local or distal impact on gene expression. Thus, the consideration of satellite repeat organization and the associated epigenetic landmarks, including DNA methylation (DNAme), will become unavoidable in the near future to fully decipher human phenotypes and associated diseases.


Isolation and transcriptional characterization of three genes which function at start, the controlling event of the Saccharomyces cerevisiae cell division cycle: CDC36, CDC37, and CDC39

Molecular and Cellular Biology ◽

10.1128/mcb.3.5.881-891.1983 ◽

1983 ◽

Vol 3 (5) ◽

pp. 881-891

Author(s):

H J Breter ◽

J Ferguson ◽

T A Peterson ◽

S I Reed

Keyword(s):

Saccharomyces Cerevisiae ◽

Cell Division ◽

DNA Sequences ◽

Shuttle Vector ◽

Control Process ◽

Haploid Cell ◽

Loop Analysis ◽

mRNA Species ◽

Coding Regions ◽

Plasmid Library

The genes CDC36, CDC37, and CDC39, thought to function in the cell division control process in Saccharomyces cerevisiae, were isolated from a recombinant plasmid library prepared by partial digestion of S. cerevisiae genomic DNA with Sau3A and insertion into the S. cerevisiae-Escherichia coli shuttle vector YRp7. In each case, S. cerevisiae DNA sequences were identified which could complement mutant alleles of the gene in question and which could direct integration of a plasmid at the chromosomal location known to correspond to that gene. Complementing DNA segments were subcloned to remove extraneous coding regions. The coding regions corresponding to CDC36, CDC37, and CDC39 were then identified and localized by R-loop analysis. The estimated sizes of the three coding regions were 615, 1,400, and 2,700 base pairs, respectively. Transcriptional orientation of the coding regions was established by using M13 vectors to prepare strand-specific probes followed by hybridization to blots of electrophoresed S. cerevisiae mRNA. The intracellular steady-state abundance of the mRNA species corresponding to the genes was estimated by comparing hybridization signals on RNA blots to that of a previously determined standard, the cell cycle start gene CDC28. The quantities calculated for the three mRNA species were low, ranging from 1.5 +/- 1 copies per haploid cell for the CDC36 mRNA to 3.1 +/- 1.5 and 4.6 +/- 2 copies per haploid cell for the CDC37 and CDC39 mRNAs, respectively. The CDC28 mRNA had been previously estimated at 7.0 +/- 2 copies per cell.


ABySS 2.0: Resource-Efficient Assembly of Large Genomes using a Bloom Filter

10.1101/068338 ◽

2016 ◽

Cited By ~ 4

Author(s):

Shaun D Jackman ◽

Benjamin P Vandervalk ◽

Hamid Mohamadi ◽

Justin Chu ◽

Sarah Yeo ◽

...

Keyword(s):

Human Genome ◽

DNA Sequences ◽

Message Passing ◽

Large Scale ◽

De Novo ◽

Bloom Filter ◽

Genomic Variation ◽

De Bruijn Graph ◽

Single Individual ◽

Probabilistic Data Structure

The assembly of DNA sequences de novo is fundamental to genomics research. It is the first of many steps towards elucidating and characterizing whole genomes. Downstream applications, including analysis of genomic variation between species and between or within individuals, critically depend on robustly assembled sequences. In the span of a single decade, the sequence throughput of leading DNA sequencing instruments has increased drastically, and coupled with established and planned large-scale, personalized medicine initiatives to sequence genomes in the thousands and even millions, the development of efficient, scalable and accurate bioinformatics tools for producing high-quality reference draft genomes is timely. With ABySS 1.0, we originally showed that assembling the human genome using short 50 bp sequencing reads was possible by aggregating the half terabyte of compute memory needed over several computers using a standardized message-passing system (MPI). We present here its re-design, which departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements. We present assembly benchmarks of human Genome in a Bottle 250 bp Illumina paired-end and 6 kbp mate-pair libraries from a single individual, yielding a NG50 (NGA50) scaffold contiguity of 3.5 (3.0) Mbp using less than 35 GB of RAM, a modest memory requirement by today's standards that is often available on a single computer. We also investigate the use of BioNano Genomics and 10x Genomics' Chromium data to further improve the scaffold contiguity of this assembly to 42 (15) Mbp.
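The idea of representing a de Bruijn graph with a Bloom filter can be sketched in a few lines. This is a minimal illustration of the general technique, not ABySS 2.0's implementation: the hash scheme (salted SHA-256), the filter size, and the names `BloomFilter`, `kmers`, and `successors` are all assumptions made for the example. The k-mers are stored only as set bits, and graph edges are recovered implicitly by querying the four possible one-base extensions of a k-mer.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield all overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def successors(kmer, bloom):
    """Implicit de Bruijn edges: extensions of kmer that the filter reports present."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


# Toy read (not real data), k = 4.
read = "ACGTACGGA"
bloom = BloomFilter()
for km in kmers(read, 4):
    bloom.add(km)

print(successors("ACGT", bloom))
```

Because a Bloom filter can return false positives, a traversal built this way may occasionally follow an edge to a k-mer that was never inserted; a practical assembler needs some additional step to detect and discard such spurious branches.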


Distinctive functional regime of endogenous lncRNAs in dark regions of human genome

10.1101/2020.12.06.413880 ◽

2020 ◽

Author(s):

Anyou Wang ◽

Rong Hai

Keyword(s):

Human Genome ◽

RNA Processing ◽

Self Regulation ◽

Post Translational Modification ◽

Protein Coding ◽

Noncoding Regions ◽

Coding Regions ◽

RNAseq Data ◽

Response To Stress ◽

Eukaryotic Genomes

Eukaryotic genomes gradually gain noncoding regions as evolution advances, and the human genome actively transcribes >90% of its noncoding regions [1], suggesting their importance in the evolution of the human genome. Yet <1% of them have been functionally characterized [2], leaving most of the human genome in the dark. Here we systematically decode endogenous lncRNAs located in unannotated regions of the human genome and decipher a distinctive functional regime of lncRNAs hidden in massive RNAseq data. LncRNAs are distributed divergently across chromosomes, independent of protein-coding regions. Their transcription rarely initiates at promoters through polymerase II, but mostly at enhancers. Yet conventional enhancer activators (e.g. H3K4me1) account for only a small proportion of lncRNA activation, suggesting that alternative, unknown mechanisms initiate the majority of lncRNAs. Meanwhile, lncRNA self-regulation also contributes notably to lncRNA activation. LncRNAs trans-regulate broad bioprocesses, including transcription and RNA processing, cell cycle, respiration, response to stress, chromatin organization, post-translational modification, and development. Overall, lncRNAs govern their own regime, distinct from that of proteins.


Genes for intermediate filament proteins and the draft sequence of the human genome

Journal of Cell Science ◽

10.1242/jcs.114.14.2569 ◽

2001 ◽

Vol 114 (14) ◽

pp. 2569-2575 ◽

Cited By ~ 1

Author(s):

Michael Hesse ◽

Thomas M. Magin ◽

Klaus Weber

Keyword(s):

Human Genome ◽

DNA Sequences ◽

Intermediate Filament ◽

Muscle Protein ◽

Gene Families ◽

Type I ◽

Type II ◽

Draft Sequence ◽

Intermediate Filament Proteins ◽

Processed Pseudogenes

We screened the draft sequence of the human genome for genes that encode intermediate filament (IF) proteins in general, and keratins in particular. The draft covers nearly all previously established IF genes including the recent cDNA and gene additions, such as pancreatic keratin 23, synemin and the novel muscle protein syncoilin. In the draft, seven novel type II keratins were identified, presumably expressed in the hair follicle/epidermal appendages. In summary, 65 IF genes were detected, placing IF among the 100 largest gene families in humans. All functional keratin genes map to the two known keratin clusters on chromosomes 12 (type II plus keratin 18) and 17 (type I), whereas other IF genes are not clustered. Of the 208 keratin-related DNA sequences, only 49 reflect true keratin genes, whereas the majority describe inactive gene fragments and processed pseudogenes. Surprisingly, nearly 90% of these inactive genes relate specifically to the genes of keratins 8 and 18. Other keratin genes, as well as those that encode non-keratin IF proteins, lack either gene fragments/pseudogenes or have only a few derivatives. As parasitic derivatives of mature mRNAs, the processed pseudogenes of keratins 8 and 18 have invaded most chromosomes, often at several positions. We describe the limits of our analysis and discuss the striking unevenness of pseudogene derivation in the IF multigene family. Finally, we propose to extend the nomenclature of Moll and colleagues to any novel keratin.


Fast Time Delay Neural Networks for Detecting DNA Coding Regions

Knowledge-Based and Intelligent Information and Engineering Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-642-04595-0_41 ◽

2009 ◽

pp. 333-341

Author(s):

Hazem M. El-Bakry ◽

Mohamed Hamada

Keyword(s):

Neural Networks ◽

Time Delay ◽

Fast Time ◽

DNA Coding ◽

Coding Regions


Coding Regions (of the Human Genome)

10.1007/springerreference_34716 ◽

2011 ◽

Keyword(s):

Human Genome ◽

Coding Regions


Statistical properties of nonlinear one-dimensional wave fields

Nonlinear Processes in Geophysics ◽

10.5194/npg-12-671-2005 ◽

2005 ◽

Vol 12 (5) ◽

pp. 671-689 ◽

Cited By ~ 23

Author(s):

D. Chalikov

Keyword(s):

Surface Waves ◽

Nonlinear Wave ◽

High Accuracy ◽

Statistical Properties ◽

Statistical Characteristics ◽

Small Scale ◽

Sea Waves ◽

Wave Fields ◽

Linear Waves ◽

Stokes Waves

A numerical model for long-term simulation of gravity surface waves is described. The model is designed as a component of a coupled Wave Boundary Layer/Sea Waves model, for investigation of small-scale dynamic and thermodynamic interactions between the ocean and atmosphere. Statistical properties of nonlinear wave fields are investigated on the basis of direct hydrodynamical modeling of 1-D potential periodic surface waves. The method is based on a nonstationary conformal surface-following coordinate transformation; this approach reduces the principal equations of potential waves to two simple evolutionary equations for the elevation and the velocity potential on the surface. The numerical scheme is based on a Fourier transform method. High accuracy was confirmed by validation of the nonstationary model against known solutions, and by comparison between the results obtained with different horizontal resolutions. The scheme allows reproduction of the propagation of steep Stokes waves for thousands of periods with very high accuracy. The method developed here is applied to simulation of the evolution of wave fields with a large number of modes for many periods of the dominant waves. The statistical characteristics of nonlinear wave fields for waves of different steepness were investigated: spectra, kurtosis and skewness, dispersion relation, and lifetime. The main result is that the representation of a wave field as a superposition of linear waves is valid only for small amplitudes. It is also shown that nonlinear wave fields are better described as a superposition of Stokes waves than of linear waves. Keywords: potential flow, free surface, conformal mapping, numerical modeling of waves, gravity waves, Stokes waves, breaking waves, freak waves, wind-wave interaction.
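Two of the statistical characteristics named in this abstract, skewness and kurtosis of the surface elevation, can be illustrated on synthetic profiles. The sketch below does not reproduce the paper's conformal-mapping model. It compares a single linear (cosine) wave with the standard second-order deep-water Stokes expansion, eta = a cos(theta) + (1/2) k a^2 cos(2 theta); the function names and the parameter values (a = 1, k = 0.3) are assumptions chosen for the example.

```python
import math


def skewness_and_kurtosis(samples):
    """Return (skewness, kurtosis) from central moments of the samples."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def linear_wave(n_points, amplitude=1.0):
    """One period of a linear wave: eta = a*cos(theta)."""
    return [
        amplitude * math.cos(2 * math.pi * i / n_points) for i in range(n_points)
    ]


def stokes2_wave(n_points, amplitude=1.0, wavenumber=0.3):
    """One period of a second-order Stokes wave (deep water)."""
    second = 0.5 * wavenumber * amplitude ** 2
    return [
        amplitude * math.cos(2 * math.pi * i / n_points)
        + second * math.cos(4 * math.pi * i / n_points)
        for i in range(n_points)
    ]


lin_skew, lin_kurt = skewness_and_kurtosis(linear_wave(1000))
stokes_skew, stokes_kurt = skewness_and_kurtosis(stokes2_wave(1000))
print(lin_skew, lin_kurt)
print(stokes_skew, stokes_kurt)
```

The linear profile is symmetric about the mean level, so its skewness vanishes, while the Stokes correction sharpens crests and flattens troughs, which gives a positive skewness. This is the kind of departure from linear-wave statistics that such diagnostics are meant to detect.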
