Analysis of Similarity/Dissimilarity of DNA Sequences Based on Chaos Game Representation

Abstract and Applied Analysis ◽

10.1155/2013/926519 ◽

2013 ◽

Vol 2013 ◽

pp. 1-6 ◽

Cited By ~ 5

Author(s):

Wei Deng ◽

Yihui Luan

Keyword(s):

Dna Sequences ◽

Chemical Structure ◽

Globin Gene ◽

Biological Information ◽

Fractal Structures ◽

Chaos Game Representation ◽

Random Sequences ◽

Analysis Of Similarity ◽

Chaos Game ◽

Game Representation

The Chaos Game is an algorithm for producing pictures of fractal structures. Because the four bases A, G, C, and T of DNA sequences can be divided into three classes according to their chemical structure, we propose different kinds of CGR-walk sequences. Based on the CGR coordinates of random sequences, we introduce some invariants for DNA primary sequences. As an application, we examine the similarity/dissimilarity among the first exons of the β-globin gene of different species. The results indicate that our method is efficient and can extract more biological information.
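
The CGR coordinates mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used corner layout on the unit square (A, C, G, T at the four corners) and a start point at the centre; the function name `cgr_points` and the `CORNERS` table are illustrative choices, not taken from the paper, and the paper's CGR-walk variants are not reproduced here.

```python
# Assumed corner layout on the unit square (a common CGR convention,
# not confirmed by the abstract above).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return the CGR point for every unambiguous base in seq.

    Each new point is the midpoint between the previous point and the
    corner assigned to the current base.
    """
    x, y = start
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points
```

For example, `cgr_points("AG")` first moves halfway from the centre toward the A corner and then halfway toward the G corner.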

APPLICATION OF CHAOS GAME REPRESENTATION TO NONLINEAR TIME SERIES ANALYSIS

Fractals ◽

10.1142/s0218348x06003064 ◽

2006 ◽

Vol 14 (01) ◽

pp. 27-35 ◽

Cited By ~ 3

Author(s):

TOMOYA SUZUKI ◽

TOHRU IKEGUCHI ◽

MASUO SUZUKI

Keyword(s):

Time Series ◽

Real Time ◽

Dna Sequences ◽

Surrogate Data ◽

Nonlinear Time Series Analysis ◽

Conditional Probabilities ◽

Fractal Structures ◽

Chaos Game Representation ◽

Chaos Game ◽

Game Representation

Iterated function systems are often used for investigating fractal structures. The method is also referred to as Chaos Game Representation (CGR) and is applied to represent characteristic structures of DNA sequences visually. In this paper, we propose an original way of plotting CGR to easily confirm the property of the temporal evaluation of a time series. We also show that spurious characteristic structures can appear if the CGR is applied carelessly to real time series. The source of this spurious identification is the non-uniformity of the frequency histograms of the time series, which is often the case when analyzing real time series. Finally, we show how to avoid such spurious identification by applying the method of surrogate data and introducing conditional probabilities of the time series.
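
One simple kind of surrogate data can be sketched as follows. This is a random-shuffle surrogate, which keeps the frequency histogram of the series and destroys its temporal order; the abstract does not say which surrogate algorithm the authors used, so both the choice of technique and the name `shuffle_surrogate` are assumptions made for illustration.

```python
import random


def shuffle_surrogate(series, seed=None):
    """Return a randomly shuffled copy of series.

    The surrogate has exactly the same values (and therefore the same
    histogram) as the input, but any temporal structure is destroyed.
    """
    rng = random.Random(seed)  # local generator; global state is untouched
    surrogate = list(series)
    rng.shuffle(surrogate)
    return surrogate
```

If a structure seen in the CGR of the original series also shows up in the CGR of such surrogates, it is more plausibly an effect of the histogram than of the dynamics.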

Evaluation of Chaos Game Representation for Comparison of DNA Sequences

Lecture Notes in Computer Science - Combinatorial Image Analysis ◽

10.1007/978-3-030-05288-1_14 ◽

2018 ◽

pp. 179-188

Author(s):

André R. S. Marcal

Keyword(s):

Dna Sequences ◽

Chaos Game Representation ◽

Chaos Game ◽

Game Representation

ANALYSIS OF SIMILARITY/DISSIMILARITY OF DNA SEQUENCES BY A NEW 3D GRAPHICAL REPRESENTATION

Journal of Biological Systems ◽

10.1142/s0218339007002234 ◽

2007 ◽

Vol 15 (03) ◽

pp. 287-297 ◽

Cited By ~ 4

Author(s):

JIE SONG

Keyword(s):

Dna Sequences ◽

Chemical Structure ◽

Graphical Representation ◽

Globin Gene ◽

Coding Sequences ◽

Chemical Structures ◽

Numerical Characterization ◽

Analysis Of Similarity ◽

3D Graphical Representation ◽

3D Representations

A new 3D graphical representation of DNA sequences based on the chemical structures of the bases is proposed. It reflects the distribution of bases with different chemical structures, preserves information on the sequential adjacency of bases, and avoids the loss of information that accompanies alternative 3D representations in which the curve standing for DNA overlaps and intersects itself. Based on this representation, a numerical characterization approach is presented by constructing a six-component vector whose components are the normalized leading eigenvalues of the L/L matrices associated with the DNA sequences. The examination of similarities among the coding sequences of the first exon of the β-globin gene of different species illustrates the utility of the approach.

Multifractal analysis of DNA sequences using a novel chaos-game representation

Physica A Statistical Mechanics and its Applications ◽

10.1016/s0378-4371(01)00333-8 ◽

2001 ◽

Vol 300 (1-2) ◽

pp. 271-284 ◽

Cited By ~ 30

Author(s):

J.M. Gutiérrez ◽

M.A. Rodríguez ◽

G. Abramson

Keyword(s):

Dna Sequences ◽

Multifractal Analysis ◽

Chaos Game Representation ◽

Chaos Game ◽

Game Representation

Wavelet-based multifractal analysis of DNA sequences by using chaos-game representation

Chinese Physics B ◽

10.1088/1674-1056/19/1/010205 ◽

2010 ◽

Vol 19 (1) ◽

pp. 010205-8 ◽

Cited By ~ 11

Author(s):

Han Jia-Jing ◽

Fu Wei-Juan

Keyword(s):

Dna Sequences ◽

Multifractal Analysis ◽

Chaos Game Representation ◽

Chaos Game ◽

Game Representation

Chaos game representation (CGR)-walk model for DNA sequences

Chinese Physics B ◽

10.1088/1674-1056/18/1/060 ◽

2009 ◽

Vol 18 (1) ◽

pp. 370-376 ◽

Cited By ~ 8

Author(s):

Gao Jie ◽

Xu Zhen-Yuan

Keyword(s):

Dna Sequences ◽

Chaos Game Representation ◽

Chaos Game ◽

Game Representation

Similarities Of DNA sequences based on 3D chaos game representation

2010 3rd International Conference on Biomedical Engineering and Informatics ◽

10.1109/bmei.2010.5639720 ◽

2010 ◽

Cited By ~ 1

Author(s):

Hailan Huang ◽

Long Shi

Keyword(s):

Dna Sequences ◽

Chaos Game Representation ◽

Chaos Game ◽

Game Representation

Genomic Analysis of a Marine Bacterium: Bioinformatics for Comparison, Evaluation, and Interpretation of DNA Sequences

BioMed Research International ◽

10.1155/2016/7215379 ◽

2016 ◽

Vol 2016 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

Bhagwan N. Rekadwad ◽

Juan M. Gonzalez ◽

Chandrahasya N. Khobragade

Keyword(s):

Dna Sequences ◽

Marine Bacterium ◽

Gc Content ◽

Genomic Analysis ◽

Principal Component ◽

Genome Sequences ◽

Qr Codes ◽

Chaos Game Representation ◽

Chaos Game ◽

Game Representation

A total of five highly related strains of an unidentified marine bacterium were analyzed through their short genome sequences (AM260709–AM260713). The Genome-to-Genome Distance Calculator (GGDC) showed high similarity to Pseudoalteromonas haloplanktis (X67024). The unique Quick Response (QR) codes generated for the sequences indicated no identity to other microbial species or gene sequences. Chaos Game Representation (CGR) showed the number of bases concentrated in the area. Guanine residues were the most numerous, followed by cytosine. Frequency of Chaos Game Representation (FCGR) indicated that CC and GG blocks have a higher frequency in the sequences from the evaluated marine bacterium strains. The maximum GC content of the marine bacterium strains ranged from 53% to 54%. The use of QR codes, CGR, FCGR, and the GC dataset helped in identifying and interpreting short genome sequences from specific isolates. A phylogenetic tree was constructed with the bootstrap test (1000 replicates) using MEGA6 software. Principal Component Analysis (PCA) was carried out using the EMBL-EBI MUSCLE program. Thus, the generated genomic data are of great assistance for hierarchical classification in bacterial systematics, which, combined with phenotypic features, represents a basic procedure in a polyphasic approach to the unambiguous taxonomic classification of bacterial isolates.
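
The GC content figure quoted in this abstract is a simple ratio, sketched below. The function name `gc_content` is illustrative, and ignoring ambiguous symbols is an assumption of this sketch rather than a statement about the tools the authors used.

```python
def gc_content(seq):
    """Return the fraction of G and C among the unambiguous bases of seq.

    Symbols other than A, C, G, T are ignored; an empty or fully
    ambiguous sequence yields 0.0.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases)
```

Multiplying the result by 100 gives a percentage comparable to the range reported above.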

An analysis of k-mer frequency features with SVM and CNN for viral subtyping classification

Journal of Computer Science and Technology ◽

10.24215/16666038.20.e11 ◽

2020 ◽

Vol 20 (2) ◽

pp. e11

Author(s):

Vicente Enrique Machaca Arceda

Keyword(s):

Dna Sequences ◽

Genomic Data ◽

Digital Signal ◽

Diagnosis And Treatment ◽

Chaos Game Representation ◽

Alignment Free ◽

Chaos Game ◽

Frequency Features ◽

Game Representation

Viral subtyping classification is very relevant for the appropriate diagnosis and treatment of illnesses. The most widely used tools are alignment-based; however, they are becoming too slow as the volume of genomic data grows. For that reason, alignment-free methods have emerged as an alternative. In this work, we analyzed four alignment-free algorithms: two methods use k-mer frequencies (Kameris and Castor-KRFE); the third uses a frequency chaos game representation of DNA with CNNs; and the last one processes DNA sequences as a digital signal (ML-DSP). In the comparison, Kameris and Castor-KRFE outperformed the rest, followed by the method based on CNNs.
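
The k-mer frequency features referred to in this abstract can be sketched as a fixed-length vector of normalized counts. This is a generic illustration, not the feature pipeline of Kameris or Castor-KRFE; the name `kmer_frequencies` and the default `k=3` are assumptions.

```python
from itertools import product


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    The vector has 4**k entries. Windows that contain symbols other
    than A, C, G, T are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[w] / total for w in kmers]
```

A vector of this kind can be passed to a classifier such as an SVM, which is the general alignment-free setting the abstract describes.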

Encoding and Decoding DNA Sequences by Integer Chaos Game Representation

Journal of Computational Biology ◽

10.1089/cmb.2018.0173 ◽

2019 ◽

Vol 26 (2) ◽

pp. 143-151 ◽

Cited By ~ 3

Author(s):

Changchuan Yin

Keyword(s):

Dna Sequences ◽

Chaos Game Representation ◽

Chaos Game ◽

Game Representation
