Superior ab initio Identification, Annotation and Characterisation of TEs and Segmental Duplications from Genome Assemblies

Mapping Intimacies ◽

10.1101/190694 ◽

2017 ◽

Cited By ~ 2

Author(s):

Lu Zeng ◽

R. Daniel Kortschak ◽

Joy M. Raison ◽

Terry Bertozzi ◽

David L. Adelson

Keyword(s):

Transposable Elements ◽

Ab Initio ◽

Dna Sequences ◽

Repetitive Elements ◽

Segmental Duplications ◽

Evolutionary Analysis ◽

Mobile Dna ◽

Consensus Sequences ◽

New Genes ◽

Genome Assemblies

Abstract: Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning their consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies (1 unpublished) to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.

Author summary: Transposable elements (TEs) are interspersed repetitive DNA sequences, also known as ‘jumping genes’ because of their ability to replicate into new genomic locations. TEs account for a significant proportion of all eukaryotic genomes. Previous studies have found that TE insertions have contributed to new genes, coding sequences and regulatory regions. They also play an important role in genome evolution. Therefore, we developed a novel, ab initio approach for identifying and annotating repetitive elements. The idea is simple: define a “repeat” as any sequence that occurs at least twice in the genome. Our ab initio method identifies species-specific repeats, including both TEs and segmental duplications, with high sensitivity and accuracy. Because of the high degree of sequence identity used in our method, the TEs we find are less diverged and may still be active. We also retain all the information that links identified repeat consensus sequences to their genome intervals, permitting direct evolutionary analysis of the TE families we identify.
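The single linkage clustering step described in the abstract can be illustrated with a small union-find sketch in Python. This is not CARP's or krishna's actual code: the function name `single_linkage_families` and the interval labels are hypothetical, and the input is assumed to be a list of pairs of genomic intervals that a pairwise aligner reported as matching. Under single linkage, two intervals end up in the same family whenever a chain of alignments connects them.

```python
from collections import defaultdict


def single_linkage_families(pairs):
    """Group interval IDs into families.

    Any two IDs connected by a chain of pairwise alignments share a family
    (single linkage). `pairs` is an iterable of (id_a, id_b) tuples.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root,
        # shortening the path as we go (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families = defaultdict(set)
    for x in list(parent):
        families[find(x)].add(x)
    # Deterministic output: sorted members, families ordered by first member.
    return sorted((sorted(m) for m in families.values()), key=lambda m: m[0])


# Hypothetical alignment hits between genomic intervals (illustration only).
hits = [
    ("chr1:100-400", "chr2:900-1200"),
    ("chr2:900-1200", "chr5:50-350"),
    ("chr3:10-5000", "chr7:20-5010"),
]
families = single_linkage_families(hits)
```

Because each family keeps the original interval labels rather than only a consensus, the members can be mapped straight back to the genome, which is the property the abstract highlights as its first advantage.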


Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies

PLoS ONE ◽

10.1371/journal.pone.0193588 ◽

2018 ◽

Vol 13 (3) ◽

pp. e0193588 ◽

Cited By ~ 8

Author(s):

Lu Zeng ◽

R. Daniel Kortschak ◽

Joy M. Raison ◽

Terry Bertozzi ◽

David L. Adelson

Keyword(s):

Ab Initio ◽

Segmental Duplications ◽

Genome Assemblies


Comparative Genomics of Copia and Gypsy Retroelements in Three Banana Genomes: A, B, and S Genomes

Pertanika Journal of Tropical Agricultural Science ◽

10.47836/pjtas.44.4.01 ◽

2021 ◽

Vol 44 (4) ◽

Author(s):

Sigit Nur Pratama ◽

Fenny Martha Dwivany ◽

Husna Nugrahapraja

Keyword(s):

Phylogenetic Analysis ◽

Transposable Elements ◽

Evolutionary Relationship ◽

Repetitive Elements ◽

B Genome ◽

Copy Numbers ◽

Multiple Alignments ◽

A Genome ◽

Genome Assemblies ◽

Relationship Of

In plants, the proportion of transposable elements (TEs) is generally dominated by long terminal repeat (LTR) retroelements, namely Copia and Gypsy, which significantly impact genome expansion and genetic and phenotypic variation. Despite this contribution, TE characterisation in an important crop such as banana [Musa balbisiana (B genome), Musa acuminata (A genome), and Musa schizocarpa (S genome)] remains poorly understood. This study aimed to compare the B, A, and S genomes based on repetitive element proportions and copy numbers and to determine the evolutionary relationship of LTR retroelements using phylogenetic analysis of the reverse transcriptase (RT) domain. Genome assemblies were acquired from the Banana Genome Hub (banana-genome-hub.southgreen.fr). Repetitive elements were masked by RepeatMasker 4.0.9 before Perl parsing. Phylograms were constructed according to domain analysis using DANTE (Domain-based ANnotation of Transposable Elements), alignments were made using MAFFT 7 (multiple alignments using fast Fourier transform), and trees were inferred using FastTree 2. The trees were inspected using SeaView 4 and visualised with FigTree 1.4.4. We report that repetitive elements make up 19.38%, 20.78%, and 25.96% of the B, A, and S genomes, respectively. The dominant elements identified in the genomes are LTR retroelements, among which Copia is more abundant than Gypsy. Based on RT phylogenetic analysis, LTR elements cluster into 13 ancient lineages, of which Sire (Copia) and Reina (Gypsy) are the most abundant LTR lineages in bananas.


A Field Guide to Eukaryotic Transposable Elements

Annual Review of Genetics ◽

10.1146/annurev-genet-040620-022145 ◽

2020 ◽

Vol 54 (1) ◽

pp. 539-561 ◽

Cited By ~ 1

Author(s):

Jonathan N. Wells ◽

Cédric Feschotte

Keyword(s):

Genetic Variation ◽

Transposable Elements ◽

Dna Sequences ◽

Genetic Factors ◽

Mobile Dna ◽

Substantial Fraction ◽

Evolutionary Origins ◽

Unique Biology ◽

Dramatic Variation ◽

Eukaryotic Genomes

Transposable elements (TEs) are mobile DNA sequences that propagate within genomes. Through diverse invasion strategies, TEs have come to occupy a substantial fraction of nearly all eukaryotic genomes, and they represent a major source of genetic variation and novelty. Here we review the defining features of each major group of eukaryotic TEs and explore their evolutionary origins and relationships. We discuss how the unique biology of different TEs influences their propagation and distribution within and across genomes. Environmental and genetic factors acting at the level of the host species further modulate the activity, diversification, and fate of TEs, producing the dramatic variation in TE content observed across eukaryotes. We argue that cataloging TE diversity and dissecting the idiosyncratic behavior of individual elements are crucial to expanding our comprehension of their impact on the biology of genomes and the evolution of species.


Transposable elements in individual genotypes of Drosophila simulans

10.1101/781419 ◽

2019 ◽

Author(s):

Sarah Signor

Keyword(s):

Transposable Elements ◽

Transposable Element ◽

Dna Sequences ◽

Copy Number ◽

Natural Populations ◽

Population Level ◽

Drosophila Simulans ◽

Mobile Dna ◽

Open Question ◽

Number Of Individuals

Abstract: Transposable elements are mobile DNA sequences that are able to copy themselves within a host’s genome. Within insects they often make up a substantial proportion of the genome. While they are the subject of intense research, copy number is often estimated only at the population level, or in a limited number of individuals within a population. However, an important aspect of transposable element spread is the variance in activity between individuals. Do transposable elements accumulate at different rates in different genetic backgrounds? Using two populations of Drosophila simulans from California and Africa, I estimated transposable element copy number in individual genotypes. Some active transposable elements seem to be a property of the species, while others seem to be a property of the populations. I find that, in addition to population level differences in transposable element load, certain genotypes accumulate transposable elements at a much higher rate than others. Most likely, active transposable elements are fairly rare and were inherited only by specific genotypes that were used to create the inbred lines. Whether or not this reflects dynamics in natural populations, where transposable elements may accumulate in specific genotypes and maintain themselves in the population rather than being active at low levels population wide, is an open question.


Jump around: transposons in and out of the laboratory

F1000Research ◽

10.12688/f1000research.21018.1 ◽

2020 ◽

Vol 9 ◽

pp. 135

Author(s):

Anuj Kumar

Keyword(s):

Transposable Elements ◽

Dna Sequences ◽

Major Constituent ◽

Mobile Dna ◽

Mutagenic Potential ◽

Complex Interactions ◽

Wide Range ◽

Transposon Insertions ◽

The Impact

Since Barbara McClintock’s groundbreaking discovery of mobile DNA sequences some 70 years ago, transposable elements have come to be recognized as important mutagenic agents impacting genome composition, genome evolution, and human health. Transposable elements are a major constituent of prokaryotic and eukaryotic genomes, and the transposition mechanisms enabling transposon proliferation over evolutionary time remain engaging topics for study, suggesting complex interactions with the host, both antagonistic and mutualistic. The impact of transposition is profound, as over 100 human heritable diseases have been attributed to transposon insertions. Transposition can be highly mutagenic, perturbing genome integrity and gene expression in a wide range of organisms. This mutagenic potential has been exploited in the laboratory, where transposons have long been utilized for phenotypic screening and the generation of defined mutant libraries. More recently, barcoding applications and methods for RNA-directed transposition are being used towards new phenotypic screens and studies relevant for gene therapy. Thus, transposable elements are significant in affecting biology both in vivo and in the laboratory, and this review will survey advances in understanding the biological role of transposons and relevant laboratory applications of these powerful molecular tools.


Why did the Tc1-like elements of mollusks acquired the spliceosomal introns?

10.1101/656579 ◽

2019 ◽

Author(s):

M.V. Puzakov ◽

L.V. Puzakova ◽

S.V. Cheresiz

Keyword(s):

Transposable Elements ◽

Dna Sequences ◽

Inverted Repeats ◽

Considerable Influence ◽

Dna Transposons ◽

Spliceosomal Introns ◽

New Genes ◽

Eukaryotic Genes ◽

Low Copy Number ◽

Diverse Groups

Abstract: Transposable elements are DNA sequences capable of transposition within the genome, thus exerting a considerable influence on genome function and structure and providing a source of new genes. Transposable elements are classified into retrotransposons and DNA transposons. The IS630/Tc1/mariner superfamily of DNA transposons is one of the most diverse groups and is broadly represented among eukaryotes. We identified a new group of Tc1-like elements in mollusks, which we named TLEWI. These DNA transposons are characterized by low copy number, the lack of terminal inverted repeats, and the presence of a DD36E signature and spliceosomal introns in the transposase sequence. Their prevalence among mollusks is limited to the subclass Pteriomorpha (Bivalvia). Since TLEWI possess the features of domesticated TEs and a structure similar to eukaryotic genes, which is not typical for DNA transposons, we consider the hypothesis of co-optation of the TLEWI gene by bivalves.


satDNA Analyzer 1.2 as a Valuable Computing Tool for Evolutionary Analysis of Satellite-DNA Families: Revisiting Y-Linked Satellite-DNA Sequences of Rumex (Polygonaceae)

Bioinformatics Research and Development - Lecture Notes in Computer Science ◽

10.1007/978-3-540-71233-6_11 ◽

2007 ◽

pp. 131-139

Author(s):

Rafael Navajas-Pérez ◽

Manuel Ruiz Rejón ◽

Manuel Garrido-Ramos ◽

José Luis Aznarte ◽

Cristina Rubio-Escudero

Keyword(s):

Dna Sequences ◽

Satellite Dna ◽

Evolutionary Analysis


Genome-wide identification and evolutionary analysis of RLKs involved in the response to aluminium stress in peanut

BMC Plant Biology ◽

10.1186/s12870-021-03031-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Xin Wang ◽

Ming-Hua Wu ◽

Dong Xiao ◽

Ruo-Lan Huang ◽

Jie Zhan ◽

...

Keyword(s):

Stress Responses ◽

Segmental Duplication ◽

Tandem Duplication ◽

Expression Patterns ◽

Purifying Selection ◽

Segmental Duplications ◽

Evolutionary Analysis ◽

Gene Pairs ◽

Al Stress ◽

Peanut Genome

Abstract

Background: As an important cash crop, the yield of peanut is influenced by soil acidification and pathogen infection. Receptor-like protein kinases (RLKs) play important roles in plant growth, development and stress responses. However, little is known about the number, location, structure, molecular phylogeny, and expression of RLKs in peanut, and no comprehensive analysis of RLKs in the Al stress response in peanut has been reported.

Results: A total of 1311 AhRLKs were identified from the peanut genome. The AhLRR-RLKs and AhLecRLKs were further divided into 24 and 35 subfamilies, respectively. The AhRLKs were randomly distributed across all 20 chromosomes in the peanut. Among these AhRLKs, 9.53% and 61.78% originated from tandem duplications and segmental duplications, respectively. The ka/ks ratios of 96.97% (96/99) of tandem duplication gene pairs and 98.78% (646/654) of segmental duplication gene pairs were less than 1. Among the tested tandem duplication clusters, there were 28 gene conversion events. Moreover, a total of 90 Al-responsive AhRLKs were identified by mining transcriptome data, and they were divided into 7 groups. Most of the Al-responsive AhRLKs that clustered together had similar motifs and evolutionarily conserved structures. The expression patterns of these genes in different tissues were further analysed, and tissue-specifically expressed genes, including 14 root-specific Al-responsive AhRLKs, were found. In addition, all 90 Al-responsive AhRLKs, which were distributed unevenly in the subfamilies of AhRLKs, showed different expression patterns between the two peanut varieties (Al-sensitive and Al-tolerant) under Al stress.

Conclusions: In this study, we analysed the RLK gene family in the peanut genome. Segmental duplication events were the main driving force for AhRLK evolution, and most AhRLKs were subject to purifying selection. A total of 90 genes were identified as Al-responsive AhRLKs, and the classification, conserved motifs, structures, tissue expression patterns and predicted functions of Al-responsive AhRLKs were further analysed and discussed, revealing their putative roles. This study provides a better understanding of the structures and functions of AhRLKs and Al-responsive AhRLKs.
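The Ka/Ks percentages quoted in the results can be checked directly from the reported counts. The short Python sketch below only redoes that arithmetic; the helper name `percent_below_one` is hypothetical, and the reading that Ka/Ks < 1 indicates purifying selection is the conventional interpretation that the abstract's conclusion relies on.

```python
def percent_below_one(n_below, n_total):
    """Percentage of duplicated gene pairs whose Ka/Ks ratio is below 1."""
    return round(100 * n_below / n_total, 2)


# Counts taken from the abstract: pairs with Ka/Ks < 1 out of all pairs.
tandem = percent_below_one(96, 99)        # tandem duplication gene pairs
segmental = percent_below_one(646, 654)   # segmental duplication gene pairs

print(f"tandem: {tandem}%, segmental: {segmental}%")
```

Both values match the figures in the abstract (96.97% and 98.78%), so nearly all duplicated pairs of either type fall in the range usually associated with purifying selection.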


TBP and SNAP50 transcription factors bind specifically to the Pr77 promoter sequence from trypanosomatid non-LTR retrotransposons

Parasites & Vectors ◽

10.1186/s13071-021-04803-5 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Francisco Macías ◽

Raquel Afonso-Lehmann ◽

Patricia E. Carreira ◽

M. Carmen Thomas

Keyword(s):

Transcription Factors ◽

Protein Interactions ◽

Dna Sequences ◽

Direct Interaction ◽

Promoter Sequence ◽

Nuclear Proteins ◽

Hill Coefficient ◽

Small Nuclear Rna ◽

Mobile Dna ◽

Mobility Shift

Abstract

Background: Trypanosomatid genomes are colonized by active and inactive mobile DNA elements, such as LINE, SINE-like, SIDER and DIRE retrotransposons. These elements all share a 77-nucleotide-long sequence at their 5′ ends, known as Pr77, which activates transcription, thereby generating abundant unspliced and translatable transcripts. However, the transcription factors that mediate this process have not yet been reported.

Methods: TATA-binding protein (TBP) and small nuclear RNA-activating protein 50 kDa (SNAP50) recombinant proteins and specific antibodies raised against them were generated. Protein capture assays, electrophoretic mobility-shift assays (EMSA) and EMSA competition assays, carried out using these proteins and nuclear proteins of the parasite together with specific DNA sequences used as probes, allowed detection of the direct interaction of these transcription factors with the Pr77 sequence.

Results: This study identified TBP and SNAP50 as part of the DNA-protein complex formed by the Pr77 promoter sequence and nuclear proteins of Trypanosoma cruzi. TBP establishes direct and specific contact with the Pr77 sequence, where the DPE and DPE downstream regions are docking sites with preferential binding. TBP binds cooperatively (Hill coefficient = 1.67) to Pr77 and to both strands of the Pr77 sequence, while the conformation of this highly structured sequence is not involved in TBP binding. Direct binding of SNAP50 to the Pr77 sequence is weak and may be mediated by protein–protein interactions through other trypanosomatid nuclear proteins.

Conclusions: Identification of the transcription factors that mediate Pr77 transcription may help to elucidate how these retrotransposons are mobilized within the trypanosomatid genomes and their roles in gene regulation processes in this human parasite.
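To make the reported Hill coefficient concrete, the sketch below evaluates the standard Hill equation, fraction bound = c^n / (K^n + c^n), in Python. Only n = 1.67 comes from the abstract; the function name `hill_fraction_bound`, the half-saturation constant `k_half` and the concentrations are hypothetical values in arbitrary units, chosen purely for illustration and not taken from the study.

```python
def hill_fraction_bound(conc, k_half, n):
    """Fraction of binding sites occupied under the Hill equation.

    conc   : free ligand (here, protein) concentration
    k_half : concentration giving half-maximal occupancy
    n      : Hill coefficient (n > 1 indicates positive cooperativity)
    """
    return conc**n / (k_half**n + conc**n)


N_REPORTED = 1.67   # Hill coefficient reported for TBP binding to Pr77
K_HALF = 1.0        # hypothetical half-saturation constant (arbitrary units)

# Compare cooperative binding (n = 1.67) with non-cooperative binding (n = 1)
# below, at, and above the half-saturation concentration.
for conc in (0.5, 1.0, 2.0):
    coop = hill_fraction_bound(conc, K_HALF, N_REPORTED)
    noncoop = hill_fraction_bound(conc, K_HALF, 1.0)
    print(f"conc={conc}: n=1.67 -> {coop:.3f}, n=1 -> {noncoop:.3f}")
```

With n > 1 the occupancy curve is steeper around the half-saturation point: binding is lower than the non-cooperative case below `k_half` and higher above it, which is what a cooperative binding mode implies.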


Insights into functional and evolutionary analysis of carbaryl metabolic pathway from Pseudomonas sp. strain C5pp

Scientific Reports ◽

10.1038/srep38430 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 16

Author(s):

Vikas D. Trivedi ◽

Pramod Kumar Jangir ◽

Rakesh Sharma ◽

Prashant S. Phale

Keyword(s):

Draft Genome ◽

Degradation Pathway ◽

Evolutionary Analysis ◽

Extradiol Dioxygenase ◽

Transfer Event ◽

New Family ◽

New Genes ◽

Catabolic Genes ◽

Soil Isolate ◽

Genes Encoding

Abstract: Carbaryl (1-naphthyl N-methylcarbamate) is one of the most widely used carbamate pesticides in agriculture. The soil isolate Pseudomonas sp. strain C5pp mineralizes carbaryl via 1-naphthol, salicylate and gentisate; however, the genetic organization and the evolutionary events of acquisition and assembly of the pathway have not yet been studied. The draft genome analysis of strain C5pp reveals that the carbaryl catabolic genes are organized into three putative operons, ‘upper’, ‘middle’ and ‘lower’. The sequence and functional analysis led to the identification of new genes encoding: i) a hitherto unidentified 1-naphthol 2-hydroxylase, sharing a common ancestry with 2,4-dichlorophenol monooxygenase; ii) carbaryl hydrolase, a member of a new family of esterases; and iii) 1,2-dihydroxy naphthalene dioxygenase, an uncharacterized type-II extradiol dioxygenase. The ‘upper’ pathway genes were present as part of an integron, while the ‘middle’ and ‘lower’ pathway genes were present as two distinct class-I composite transposons. These findings suggest a role for horizontal gene transfer event(s) in the acquisition and evolution of the carbaryl degradation pathway in strain C5pp. The study presents an example of the assembly of a degradation pathway for carbaryl.
