Analysis of Similarity/Dissimilarity of DNA Sequences Based on Chaos Game Representation

2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Wei Deng ◽  
Yihui Luan

The Chaos Game is an algorithm that allows one to produce pictures of fractal structures. Considering that the four bases A, G, C, and T of DNA sequences can be divided into three classes according to their chemical structure, we propose different kinds of CGR-walk sequences. Based on the CGR coordinates of random sequences, we introduce some invariants for DNA primary sequences. As an application, we examine the similarity/dissimilarity among the first exon of the β-globin gene of different species. The results indicate that our method is efficient and can extract more biological information.
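The chaos game step behind CGR can be sketched in a few lines. This is a minimal illustration, not the authors' code: the corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0) is one common convention assumed here, and `binary_cgr_walk` is a hypothetical one-dimensional walk over a two-class split of the bases (purine/pyrimidine, amino/keto, or weak/strong), offered as one plausible reading of a "CGR-walk sequence" rather than the paper's exact definition.

```python
# Assumed corner convention: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

# The three chemical two-class splits; each set lists the bases of the first class.
CLASSES = {
    "purine/pyrimidine": {"A", "G"},  # R = {A, G}, Y = {C, T}
    "amino/keto": {"A", "C"},         # M = {A, C}, K = {G, T}
    "weak/strong": {"A", "T"},        # W = {A, T}, S = {C, G}
}


def cgr_points(seq):
    """Return the CGR point after each base, starting from the centre (0.5, 0.5)."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the corner of this base.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def binary_cgr_walk(seq, first_class):
    """Hypothetical 1-D CGR walk: target 0 for bases in first_class, 1 otherwise."""
    z = 0.5
    walk = []
    for base in seq.upper():
        target = 0.0 if base in first_class else 1.0
        z = (z + target) / 2
        walk.append(z)
    return walk


if __name__ == "__main__":
    print(cgr_points("ACGT"))
    print(binary_cgr_walk("ACGT", CLASSES["purine/pyrimidine"]))
```

With this corner convention the x coordinate of the CGR equals the amino/keto walk and the y coordinate equals the weak/strong walk, so the 2-D plot already encodes two of the three chemical classifications; the purine/pyrimidine split runs along the diagonal.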

Fractals ◽  
2006 ◽  
Vol 14 (01) ◽  
pp. 27-35 ◽  
Author(s):  
TOMOYA SUZUKI ◽  
TOHRU IKEGUCHI ◽  
MASUO SUZUKI

Iterated function systems are often used for investigating fractal structures. The method is also referred to as Chaos Game Representation (CGR) and is applied to represent the characteristic structures of DNA sequences visually. In this paper, we propose an original way of plotting CGR that makes it easy to confirm the temporal evolution of a time series. We also show the existence of spurious characteristic structures if CGR is carelessly applied to real time series. We reveal that the source of this spurious identification is non-uniformity of the frequency histograms of the time series, which is often the case when analyzing real time series. We also show how to avoid such spurious identification by applying the method of surrogate data and introducing conditional probabilities of the time series.
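The check described above can be sketched as follows, under stated assumptions: the real-valued series is quantized into four symbols with three caller-supplied thresholds, the surrogate is a simple random shuffle (which keeps the symbol histogram but destroys temporal order), and conditional probabilities are estimated from transition counts. The function names and the threshold quantization are illustrative choices, not the paper's exact procedure.

```python
import bisect
import random
from collections import Counter

# Assumed corner assignment for symbols 0..3 (any fixed assignment works).
CORNERS = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]


def symbolize(series, thresholds):
    """Map each value to a symbol 0..3 using three sorted thresholds."""
    return [bisect.bisect_right(thresholds, v) for v in series]


def cgr(symbols):
    """Chaos game points: move halfway towards the corner of each symbol."""
    x, y = 0.5, 0.5
    points = []
    for s in symbols:
        cx, cy = CORNERS[s]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def shuffle_surrogate(symbols, seed=0):
    """Random-shuffle surrogate: same symbol histogram, temporal order destroyed."""
    rng = random.Random(seed)
    surrogate = list(symbols)
    rng.shuffle(surrogate)
    return surrogate


def conditional_probabilities(symbols):
    """Estimate P(s_t = b | s_{t-1} = a) from observed transition counts."""
    pairs = Counter(zip(symbols, symbols[1:]))
    previous = Counter(symbols[:-1])
    return {(a, b): n / previous[a] for (a, b), n in pairs.items()}
```

Structure that appears equally in the CGR of the original symbols and in the CGRs of its shuffled surrogates can be explained by the symbol histogram alone; only differences between them point to genuine temporal structure, which the conditional probabilities make explicit.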


2007 ◽  
Vol 15 (03) ◽  
pp. 287-297 ◽  
Author(s):  
JIE SONG

A new 3D graphical representation of DNA sequences based on the chemical structures of the bases is proposed. It reflects the distribution of bases with different chemical structures, preserves information on the sequential adjacency of bases, and avoids the loss of information that accompanies alternative 3D representations in which the curve representing the DNA overlaps and intersects itself. Based on this representation, a numerical characterization approach is presented by constructing a six-component vector whose components are the normalized leading eigenvalues of the L/L matrices associated with the DNA sequences. The examination of similarities among the coding sequences of the first exon of the β-globin gene of different species illustrates the utility of the approach.
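A leading eigenvalue of an L/L matrix can be computed as sketched below. Entry (i, j) of the L/L matrix is the Euclidean distance between curve vertices i and j divided by the path length along the curve between them. The 3D curve here is a hypothetical stand-in, not the representation proposed above: each base takes a unit-coordinate step whose x, y, z signs encode purine/pyrimidine, amino/keto, and weak/strong. Normalizing by the number of vertices is also an assumption, and only one of the six vector components is shown.

```python
import math

# Hypothetical step vectors: x = purine(+1)/pyrimidine(-1),
# y = amino(+1)/keto(-1), z = weak(+1)/strong(-1).
STEPS = {"A": (1, 1, 1), "G": (1, -1, -1), "C": (-1, 1, -1), "T": (-1, -1, 1)}


def curve(seq):
    """Vertices of the 3-D walk after each base."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        dx, dy, dz = STEPS[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def ll_matrix(points):
    """L/L matrix: Euclidean distance over along-curve path length."""
    n = len(points)
    arc = [0.0]  # arc[i] = path length from vertex 0 to vertex i
    for p, q in zip(points, points[1:]):
        arc.append(arc[-1] + math.dist(p, q))
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = math.dist(points[i], points[j]) / (arc[j] - arc[i])
    return m


def leading_eigenvalue(m, iters=1000, tol=1e-12):
    """Power iteration on m + I (the shift keeps the Perron root dominant)."""
    n = len(m)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
        # Rayleigh quotient w^T m w estimates the eigenvalue of m itself.
        mw = [sum(m[i][j] * w[j] for j in range(n)) for i in range(n)]
        new_lam = sum(a * b for a, b in zip(w, mw))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
        v = w
    return lam


def normalized_leading_eigenvalue(seq):
    points = curve(seq)
    return leading_eigenvalue(ll_matrix(points)) / len(points)
```

A straight curve has every L/L entry equal to 1, giving the largest possible value; folding curves give smaller entries and therefore a smaller normalized eigenvalue, which is what makes the quantity usable as a sequence descriptor.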


2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
Bhagwan N. Rekadwad ◽  
Juan M. Gonzalez ◽  
Chandrahasya N. Khobragade

A total of five highly related strains of an unidentified marine bacterium were analyzed through their short genome sequences (AM260709–AM260713). The Genome-to-Genome Distance Calculator (GGDC) showed high similarity to Pseudoalteromonas haloplanktis (X67024). The generated unique Quick Response (QR) codes indicated no identity to other microbial species or gene sequences. Chaos Game Representation (CGR) showed the number of bases concentrated in each area of the plot. Guanine residues were the most numerous, followed by cytosine. Frequency Chaos Game Representation (FCGR) indicated that CC and GG blocks occur at higher frequency in the sequences from the evaluated marine bacterium strains. The maximum GC content of the marine bacterium strains ranged from 53% to 54%. The use of QR codes, CGR, FCGR, and GC datasets helped in identifying and interpreting short genome sequences from specific isolates. A phylogenetic tree was constructed with the bootstrap test (1000 replicates) using MEGA6 software. Principal Component Analysis (PCA) was carried out using the EMBL-EBI MUSCLE program. The generated genomic data are thus of great assistance for hierarchical classification in bacterial systematics, which, combined with phenotypic features, represents a basic procedure for a polyphasic approach to unambiguous taxonomic classification of bacterial isolates.
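FCGR and GC content can be computed as sketched below. FCGR counts how many CGR points fall into each cell of a 2^k × 2^k grid, which amounts to counting k-mers, since the last k bases fix the cell. The corner convention A=(0,0), C=(0,1), G=(1,1), T=(1,0) and the function names are assumptions for illustration. The example does not reproduce the tools or data used in the study.

```python
from collections import Counter

# Assumed corner convention: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Frequency CGR: k-mer counts placed in a 2^k x 2^k grid (grid[row][col])."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        col = row = 0
        # Each base of the k-mer contributes one bit; the last base is most
        # significant, matching floor(2^k * coordinate) of the CGR point.
        for bit, base in enumerate(seq[i:i + k]):
            cx, cy = CORNERS[base]
            col += cx << bit
            row += cy << bit
        grid[row][col] += 1
    return grid


def gc_content(seq):
    """Percentage of G + C among the A/C/G/T bases of seq."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    return 100 * (counts["G"] + counts["C"]) / total
```

For k = 2 the CC count lands in `grid[3][0]` and the GG count in `grid[3][3]` under this convention, so high-frequency dinucleotide blocks show up as bright cells in the top row of the grid.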


2020 ◽  
Vol 20 (2) ◽  
pp. e11 ◽
Author(s):  
Vicente Enrique Machaca Arceda

Viral subtyping classification is highly relevant for the appropriate diagnosis and treatment of illnesses. The most widely used tools are based on alignment-based methods; however, these are becoming too slow as genomic data grow. For that reason, alignment-free methods have emerged as an alternative. In this work, we analyzed four alignment-free algorithms: two methods use k-mer frequencies (Kameris and Castor-KRFE), the third uses a frequency chaos game representation of DNA with CNNs, and the last processes DNA sequences as a digital signal (ML-DSP). In the comparison, Kameris and Castor-KRFE outperformed the rest, followed by the CNN-based method.
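The k-mer frequency idea behind alignment-free comparison can be sketched as follows. Each sequence becomes a normalized vector of counts over all 4^k words, and a query is assigned to the label of the nearest reference vector. The 1-nearest-neighbour rule with Euclidean distance is a hypothetical stand-in chosen for brevity. It is not the classifier used by Kameris or Castor-KRFE.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(seq, k):
    """Normalized frequencies of all 4^k A/C/G/T words, in lexicographic order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words) or 1  # skip words with other symbols
    return [counts[w] / total for w in words]


def classify(query, references, k=3):
    """Hypothetical 1-NN: label of the reference whose k-mer vector is closest."""
    qv = kmer_vector(query, k)
    return min(
        references,
        key=lambda label: math.dist(qv, kmer_vector(references[label], k)),
    )
```

Because the vectors have a fixed length of 4^k regardless of sequence length, no alignment is needed, which is the main reason such methods scale to large genomic datasets.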

