Exploring Repetitive DNA Landscapes Using REPCLASS, a Tool That Automates the Classification of Transposable Elements in Eukaryotic Genomes

2009 ◽  
Vol 1 ◽  
pp. 205-220 ◽  
Author(s):  
Cédric Feschotte ◽  
Umeshkumar Keswani ◽  
Nirmal Ranganathan ◽  
Marcel L. Guibotsy ◽  
David Levine
Genes ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 1014 ◽  
Author(s):  
Ana Paço ◽  
Renata Freitas ◽  
Ana Vieira-da-Silva

Eukaryotic genomes are rich in repetitive DNA sequences, which are grouped into two classes according to their genomic organization: tandem repeats and dispersed repeats. In tandem repeats, copies of a short DNA sequence are positioned one after another within the genome, while in dispersed repeats, these copies are randomly distributed. In this review we provide evidence that both tandem and dispersed repeats can have a similar organization, which leads us to suggest an update to their classification based on sequence features, specifically the presence or absence of retrotransposon- or transposon-specific domains. In addition, we analyze several studies showing that a repetitive element can be remodeled into repetitive non-coding or coding sequences, suggesting (1) an evolutionary relationship among DNA sequences, and (2) that the evolution of genomes involved frequent reshuffling of repetitive sequences, a process that we have designated a “DNA remodeling mechanism”. The alternative classification of repetitive DNA sequences proposed here will provide a novel theoretical framework that recognizes the importance of DNA remodeling for the evolution and plasticity of eukaryotic genomes.
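To make the tandem-repeat definition above concrete, here is a minimal Python sketch that scans a sequence for back-to-back copies of a unit of fixed length. It is a toy illustration only, not a method from the review; the function name `find_tandem_repeats` and its parameters are hypothetical.

```python
def find_tandem_repeats(seq, unit_len, min_copies=2):
    """Return (start, unit, copies) for runs where a unit repeats back to back.

    Illustrative only: exact matching, fixed unit length, no mismatches allowed.
    """
    hits = []
    i = 0
    n = len(seq)
    while i + unit_len <= n:
        unit = seq[i:i + unit_len]
        copies = 1
        j = i + unit_len
        # Extend the run while the next window equals the unit.
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i = j  # continue after the whole run
        else:
            i += 1
    return hits


if __name__ == "__main__":
    print(find_tandem_repeats("GGACACACTT", unit_len=2))
```

A dispersed repeat, by contrast, would show up as the same unit recurring at non-adjacent positions, which this scan deliberately does not report.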


2019 ◽  
Author(s):  
Ren-Gang Zhang ◽  
Zhao-Xuan Wang ◽  
Shujun Ou ◽  
Guang-Yuan Li

Abstract. Summary: Transposable elements (TEs) constitute an important part of eukaryotic genomes, but their classification, especially at the lineage or clade level, is still challenging. For this purpose, we propose TEsorter, which is based on conserved protein domains of TEs. It is easy to use, fast with multiprocessing, and sensitive and precise in classifying TEs, especially LTR retrotransposons (LTR-RTs). Its results can also directly reflect phylogenetic relationships and diversities of the classified LTR-RTs. Availability: The code in Python is freely available at https://github.com/zhangrengang/TEsorter.


2012 ◽  
Vol 34 (8) ◽  
pp. 1009-1019
Author(s):  
Hong-En XU ◽  
Hua-Hao ZHANG ◽  
Min-Jin HAN ◽  
Yi-Hong SHEN ◽  
Xian-Zhi HUANG ◽  
...  

Author(s):  
Murilo Horacio Pereira da Cruz ◽  
Douglas Silva Domingues ◽  
Priscila Tiemi Maeda Saito ◽  
Alexandre Rossi Paschoal ◽  
Pedro Henrique Bugatti

Abstract Transposable elements (TEs) are the most represented sequences occurring in eukaryotic genomes. Few methods classify these sequences into deeper levels, such as the superfamily level, which could provide useful and detailed information about them. Most methods that classify TE sequences use handcrafted features such as k-mers and homology-based search, which could be inefficient for classifying non-homologous sequences. Here we propose an approach, called transposable elements representation learner (TERL), that preprocesses and transforms one-dimensional sequences into two-dimensional space data (i.e., image-like data of the sequences) and applies deep convolutional neural networks to them. This classification method tries to learn the best representation of the input data to classify it correctly. We have conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracies and F1-scores of 96.4% and 85.8% for superfamilies and 95.7% and 91.5% for order-level sequences from RepBase, respectively. We have also obtained macro mean accuracies and F1-scores of 95.0% and 70.6% for sequences from seven databases at the superfamily level and 89.3% and 73.9% at the order level, respectively. We surpassed the accuracy, recall and specificity obtained by other methods in the experiment classifying order-level sequences from seven databases, and TERL was by far the fastest method in all experiments. Therefore, TERL can learn how to predict any hierarchical level of the TE classification system and is about 20 times and three orders of magnitude faster than TEclass and PASTEC, respectively (https://github.com/muriloHoracio/TERL). Contact: [email protected]
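The abstract does not spell out how sequences become image-like data. One common choice, assumed here purely for illustration, is a one-hot encoding in which each row is a nucleotide and each column a sequence position, zero-padded to a fixed width. The names `ALPHABET` and `one_hot_2d` are hypothetical and are not taken from the TERL code base.

```python
ALPHABET = "ACGT"  # assumed row order; one row per nucleotide


def one_hot_2d(seq, width):
    """Encode a DNA string as a len(ALPHABET) x width matrix of 0/1 values.

    Sequences longer than `width` are truncated, shorter ones are zero-padded.
    Symbols outside ALPHABET (for example N) leave their column all zeros.
    """
    matrix = [[0] * width for _ in ALPHABET]
    for col, base in enumerate(seq.upper()[:width]):
        row = ALPHABET.find(base)
        if row >= 0:
            matrix[row][col] = 1
    return matrix


if __name__ == "__main__":
    for row in one_hot_2d("ACGTN", width=6):
        print(row)
```

A fixed-size matrix of this kind can be fed to a convolutional network in the same way as a single-channel image, which is the general idea the abstract describes.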


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e8311 ◽  
Author(s):  
Simon Orozco-Arias ◽  
Gustavo Isaza ◽  
Romain Guyot ◽  
Reinel Tabares-Soto

Background: Transposable elements (TEs) constitute the most common repeated sequences in eukaryotic genomes. Recent studies demonstrated their deep impact on species diversity, adaptation to the environment and diseases. Although there are many conventional bioinformatics algorithms for detecting and classifying TEs, none have achieved reliable results on different types of TEs. Machine learning (ML) techniques can automatically extract hidden patterns and novel information from labeled or non-labeled data and have been applied to solving several scientific problems. Methodology: We followed the Systematic Literature Review (SLR) process, applying the six stages of its review protocol, but added a preliminary stage that aims to detect the need for a review. Search equations were then formulated and executed in several literature databases. Relevant publications were scanned and used to extract evidence to answer the research questions. Results: Several ML approaches have already been tested on other bioinformatics problems with promising results, yet few algorithms and architectures available in the literature focus specifically on TEs, even though TEs represent the majority of the nuclear DNA of many organisms. Only 35 articles were found and categorized as relevant in TE or related fields. Conclusions: ML is a powerful tool that can be used to address many problems. Although ML techniques have been used widely in other biological tasks, their utilization in TE analyses is still limited. Following the SLR, we found that the use of ML for TE analyses (detection and classification) is an open problem, and that interest in this new field of research is growing.


2021 ◽  
Vol 22 (2) ◽  
pp. 602
Author(s):  
Elisa Carotti ◽  
Federica Carducci ◽  
Adriana Canapa ◽  
Marco Barucca ◽  
Samuele Greco ◽  
...  

Transposable elements (TEs) represent a considerable fraction of eukaryotic genomes, thereby contributing to genome size, chromosomal rearrangements, and the generation of new coding genes or regulatory elements. An increasing number of works have reported a link between the genomic abundance of TEs and adaptation to specific environmental conditions. Diadromy is a fascinating feature of fish that migrate between marine and freshwater environments for reproduction. In this work, we investigated the genomes of 24 fish species, including 15 teleosts with migratory behaviour. The expected higher relative abundance of DNA transposons in ray-finned fish compared with the other fish groups was not confirmed by the analysis of the dataset considered. The relative contribution of different TE types in migratory ray-finned species did not show clear differences between oceanodromous and potamodromous fish. By contrast, a remarkable relationship between migratory behaviour and the quantitative difference reported for short interspersed nuclear (retro)elements (SINEs) emerged from the comparison between anadromous and catadromous species, independently of their phylogenetic position. This is likely due to the substantial environmental changes faced by diadromous species along their migratory routes.


2021 ◽  
Author(s):  
Matias Rodriguez ◽  
Wojciech Makałowski

Abstract Transposable elements (TEs) are major genomic components in most eukaryotic genomes and play an important role in genome evolution. However, despite their relevance, identifying TEs is not an easy task, and a number of tools have been developed to tackle this problem. To better understand how they perform, we tested several widely used tools for de novo TE detection and compared their performance on both simulated data and well-curated genomic sequences. The results will be helpful for identifying common issues associated with TE annotation and for evaluating how comparable the results obtained with different tools are.
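The abstract does not state which metrics were used in the comparison. As a sketch of how a tool's TE annotation could be scored against a curated reference, the following assumes base-level sensitivity and precision over half-open `(start, end)` intervals on a single sequence. The function names and the choice of metrics are assumptions for illustration, not the authors' procedure.

```python
def covered_positions(intervals):
    """Expand half-open (start, end) intervals into a set of base positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def base_level_scores(predicted, reference):
    """Return (sensitivity, precision) measured per base.

    sensitivity = correctly predicted bases / reference bases
    precision   = correctly predicted bases / predicted bases
    Either value is 0.0 when its denominator is empty.
    """
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    true_positive = len(pred & ref)
    sensitivity = true_positive / len(ref) if ref else 0.0
    precision = true_positive / len(pred) if pred else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    predicted = [(0, 10), (20, 30)]  # hypothetical tool output
    reference = [(5, 15), (20, 25)]  # hypothetical curated annotation
    print(base_level_scores(predicted, reference))
```

Expanding intervals into sets is simple but memory-hungry; for whole genomes an interval-sweep implementation would be a more realistic choice.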

