paralogous sequence
Recently Published Documents


2020 ◽  
Vol 48 (19) ◽  
pp. e114-e114
Author(s):  
Timofey Prodanov ◽  
Vikas Bansal

Abstract The ability to characterize repetitive regions of the human genome is limited by the read lengths of short-read sequencing technologies. Although long-read sequencing technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies can potentially overcome this limitation, long segmental duplications with high sequence identity pose challenges for long-read mapping. We describe a probabilistic method, DuploMap, designed to improve the accuracy of long-read mapping in segmental duplications. It analyzes reads mapped to segmental duplications using existing long-read aligners and leverages paralogous sequence variants (PSVs)—sequence differences between paralogous sequences—to distinguish between multiple alignment locations. On simulated datasets, DuploMap increased the percentage of correctly mapped reads with high confidence for multiple long-read aligners including Minimap2 (74.3–90.6%) and BLASR (82.9–90.7%) while maintaining high precision. Across multiple whole-genome long-read datasets, DuploMap aligned an additional 8–21% of the reads in segmental duplications with high confidence relative to Minimap2. Using DuploMap-aligned PacBio circular consensus sequencing reads, an additional 8.9 Mb of DNA sequence was mappable, variant calling achieved a higher F1 score and 14 713 additional variants supported by linked-read data were identified. Finally, we demonstrate that a significant fraction of PSVs in segmental duplications overlaps with variants and adversely impacts short-read variant calling.


Author(s):  
Timofey Prodanov ◽  
Vikas Bansal

Abstract The ability to characterize repetitive regions of the human genome is limited by the read lengths of short-read sequencing technologies. Although long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore can potentially overcome this limitation, long segmental duplications with high sequence identity pose challenges for long-read mapping. We describe a probabilistic method, DuploMap, designed to improve the accuracy of long-read mapping in segmental duplications. It analyzes reads mapped to segmental duplications using existing long-read aligners and leverages paralogous sequence variants (PSVs) – sequence differences between paralogous sequences – to distinguish between multiple alignment locations. On simulated datasets, DuploMap increased the percentage of correctly mapped reads with high confidence for multiple long-read aligners including Minimap2 (74.3% to 90.6%) and BLASR (82.9% to 90.7%) while maintaining high precision. Across multiple whole-genome long-read datasets, DuploMap aligned an additional 8-21% of the reads in segmental duplications with high confidence relative to Minimap2. Using DuploMap-aligned PacBio CCS reads, an additional 8.9 Mbp of DNA sequence was mappable, variant calling achieved a higher F1-score and 14,713 additional variants supported by linked-read data were identified. Finally, we demonstrate that a significant fraction of PSVs in segmental duplications overlap with variants and adversely impact short-read variant calling.


2019 ◽  
Vol 158 (2) ◽  
pp. 106-113 ◽  
Author(s):  
Josiane B. Traldi ◽  
Kaline Ziemniczak ◽  
Juliana de Fátima Martinez ◽  
Daniel R. Blanco ◽  
Roberto L. Lui ◽  
...  

The karyotypes of the family Parodontidae consist of 2n = 54 chromosomes. The main chromosomal evolutionary changes of its species are attributed to chromosome rearrangements in repetitive DNA regions in their genomes. Physical mapping of the H1 and H4 histones was performed in 7 Parodontidae species to analyze the chromosome rearrangements involved in karyotype diversification in the group. In parallel, the observation of a partial sequence of an endogenous retrovirus (ERV) retrotransposon in the H1 histone sequence was evaluated to verify molecular co-option of the transposable elements (TEs) and to assess paralogous sequence dispersion in the karyotypes. Six of the studied species had an interstitial histone gene cluster in the short arm of the autosomal pair 13. Besides this interstitial cluster, in Apareiodon davisi, a probable further site was detected in the terminal region of the long arm in the same chromosome pair. The H1/H4 clusters in Parodon cf. pongoensis were located in the smallest chromosomes (pair 20). In addition, scattered H1 signals were observed on the chromosomes in all species. The H1 sequence showed an ERV in the open reading frame (ORF), and the scattered H1 signals on the chromosomes were attributed to the ERV's location. The H4 sequence had no similarity to the TEs and displayed no dispersed signals. Furthermore, the degeneration of the inner ERV in the H1 sequence (which overlapped a stretch of the H1 ORF) was discussed regarding the likelihood of molecular co-option of this retroelement in histone gene function in Parodontidae.


2008 ◽  
Vol 280 (4) ◽  
pp. 293-304 ◽  
Author(s):  
Melanie L. Hand ◽  
Rebecca C. Ponting ◽  
Michelle C. Drayton ◽  
Kahlil A. Lawless ◽  
Noel O. I. Cogan ◽  
...  
