Error Correcting Optical Mapping Data

2018 ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Darshan Washimkar ◽  
Martin D. Muggli ◽  
Leena Salmela ◽  
Christina Boucher

Abstract Optical mapping is a unique system that is capable of producing high-resolution, high-throughput genomic map data that gives information about the structure of a genome [21]. Recently it has been used for scaffolding contigs and assembly validation for large-scale sequencing projects, including the maize [32], goat [6], and amborella [4] genomes. However, a major impediment in the use of this data is the variety and quantity of errors in the raw optical mapping data, which are called Rmaps. The challenges associated with using Rmap data are analogous to dealing with insertions and deletions in the alignment of long reads. Moreover, they are arguably harder to tackle since the data is numerical and susceptible to inaccuracy. We develop cOMet to error correct Rmap data, which to the best of our knowledge is the only optical mapping error correction method. Our experimental results demonstrate that cOMet has high precision and corrects 82.49% of insertion errors and 77.38% of deletion errors in Rmap data generated from the E. coli K-12 reference genome. Out of the deletion errors corrected, 98.26% are true errors. Similarly, out of the insertion errors corrected, 82.19% are true errors. It also successfully scales to large genomes, improving the quality of 78% and 99% of the Rmaps in the plum and goat genomes, respectively. Lastly, we show the utility of error correction by demonstrating how it improves the assembly of Rmap data. Error corrected Rmap data results in an assembly that is more contiguous and covers a larger fraction of the genome.

2019 ◽  
Vol 36 (3) ◽  
pp. 682-689
Author(s):  
Leena Salmela ◽  
Kingshuk Mukherjee ◽  
Simon J Puglisi ◽  
Martin D Muggli ◽  
Christina Boucher

Abstract Motivation Optical mapping data is used in many core genomics applications, including structural variation detection, scaffolding assembled contigs and mis-assembly detection. However, the pervasiveness of spurious and deleted cut sites in the raw data, which are called Rmaps, makes their assembly and alignment challenging. Although there exists another method to error correct Rmap data, named cOMet, it is unable to scale to even moderately large genomes. The challenge faced in error correction is in determining pairs of Rmaps that originate from the same region of the same genome. Results We create an efficient method for determining pairs of Rmaps that contain significant overlaps between them. Our method relies on the novel and nontrivial adaptation and application of spaced seeds in the context of optical mapping, which allows for spurious and deleted cut sites to be accounted for. We apply our method to detecting and correcting these errors. The resulting error correction method, referred to as Elmeri, improves upon the results of state-of-the-art correction methods and does so in a fraction of the time. More specifically, cOMet required 9.9 CPU days to error correct Rmap data generated from the human genome, whereas Elmeri required less than 15 CPU hours and improved the quality of the Rmaps by more than four times compared to cOMet. Availability and implementation Elmeri is publicly available under GNU Affero General Public License at https://github.com/LeenaSalmela/Elmeri. Supplementary information Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Yuxuan Yuan ◽  
Zbyněk Milec ◽  
Philipp E. Bayer ◽  
Jan Vrána ◽  
Jaroslav Doležel ◽  
...  

Abstract Whole genome sequencing has been widely used to detect structural variations (SVs). However, the limited single molecule size makes it difficult to characterize large-scale SVs in a genome because single molecules cannot fully cover such vast and complex regions. Recently, optical mapping in nanochannels has provided novel resolution to detect large-scale SVs by comparing the physical location of the nickase recognition sequence in genomes. Other than in humans, SVs discovered in plants by optical mapping have not been validated. To assess the accuracy of SV calling in plants by optical mapping, we selected two genetically diverse subspecies of the Trifolium model species, subterranean clover cvs. Daliak and Yarloop. The SVs discovered by BioNano optical mapping (BOM) were validated using Illumina short reads. In the analysis, BOM identified 12 large-scale regions containing deletions and 19 containing insertions in Yarloop. The 12 large-scale regions contained 71 small deletions when validated by Illumina short reads. The results suggest that BOM could detect the total size of deletions and insertions, but it could not precisely report the location and actual quantity of SVs in the genome. Nucleotide-level validation is crucial to confirm and characterize SVs reported by optical mapping. The accuracy of SV detection by BOM is highly dependent on the quality of reference genomes and the density of selected nickases.


2021 ◽  
Author(s):  
Aurélie Canaguier ◽  
Romane Guilbaud ◽  
Erwan Denis ◽  
Ghislaine Magdelenat ◽  
Caroline Belser ◽  
...  

Abstract Background Structural Variations (SVs) are very diverse genomic rearrangements. In the past, their detection was restricted to cytological approaches, then to NGS read size and partitioned assemblies. Due to the current capabilities of technologies such as long read sequencing and optical mapping, detection of larger SVs is becoming more and more accessible. This study proposes a comparison in SVs detection and characterization from long-read sequencing obtained with the MinION device developed by Oxford Nanopore Technologies and from optical mapping produced by the Saphyr device commercialized by Bionano Genomics. The genomes of the two Arabidopsis thaliana ecotypes Columbia-0 (Col-0) and Landsberg erecta 1 (Ler-1) were chosen to guide the use of one or the other technology. Results We described the SVs detected from the alignment of the best ONT assembly and DLE-1 optical maps of A. thaliana Ler-1 on the public reference Col-0 TAIR10.1. After filtering, 1,184 and 591 Ler-1 SVs were retained from ONT and Bionano technologies respectively. A total of 948 Ler-1 ONT SVs (80.1%) corresponded to 563 Bionano SVs (95.3%) leading to 563 common locations in both technologies. The specific locations were scrutinized to assess improvement in SV detection by either technology. The ONT SVs were mostly detected near TE and gene features, and resistance genes seemed particularly impacted. Conclusions Structural variations linked to ONT sequencing error were removed and false positives limited, with high quality Bionano SVs being conserved. When compared with the Col-0 TAIR10.1 reference, most of the detected SVs were found in the same locations. The ONT assembly sequence leads to more specific SVs than the Bionano one, the latter being more efficient at characterizing large SVs. Even though the two technologies are obviously complementary approaches, ONT data appear better suited to large-scale population studies, while Bionano performs better in improving assembly and describing the specificity of a genome compared to a reference.


2009 ◽  
pp. 27-53
Author(s):  
A. Yu. Kudryavtsev

Diversity of plant communities in the nature reserve “Privolzhskaya Forest-Steppe”, Ostrovtsovsky area, is analyzed on the basis of the large-scale vegetation mapping data from 2000. The plant community classification based on the Russian ecologic-phytocoenotic approach is carried out. Twelve plant formations and 21 associations are distinguished according to dominant species and a combination of ecologic-phytocoenotic groups of species. A list of vegetation classification units as well as the characteristics of the shrub and woody communities are given in this paper.


Genetics ◽  
2001 ◽  
Vol 159 (4) ◽  
pp. 1765-1778
Author(s):  
Gregory J Budziszewski ◽  
Sharon Potter Lewis ◽  
Lyn Wegrich Glover ◽  
Jennifer Reineke ◽  
Gary Jones ◽  
...  

Abstract We have undertaken a large-scale genetic screen to identify genes with a seedling-lethal mutant phenotype. From screening ~38,000 insertional mutant lines, we identified >500 seedling-lethal mutants, completed cosegregation analysis of the insertion and the lethal phenotype for >200 mutants, molecularly characterized 54 mutants, and provided a detailed description for 22 of them. Most of the seedling-lethal mutants seem to affect chloroplast function because they display altered pigmentation and affect genes encoding proteins predicted to have chloroplast localization. Although a high level of functional redundancy in Arabidopsis might be expected because 65% of genes are members of gene families, we found that 41% of the essential genes found in this study are members of Arabidopsis gene families. In addition, we isolated several interesting classes of mutants and genes. We found three mutants in the recently discovered nonmevalonate isoprenoid biosynthetic pathway and mutants disrupting genes similar to Tic40 and tatC, which are likely to be involved in chloroplast protein translocation. Finally, we directly compared T-DNA and Ac/Ds transposon mutagenesis methods in Arabidopsis on a genome scale. In each population, we found only about one-third of the insertion mutations cosegregated with a mutant phenotype.


2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Martin Johnsson ◽  
Andrew Whalen ◽  
Roger Ros-Freixedes ◽  
Gregor Gorjanc ◽  
Ching-Yi Chen ◽  
...  

Abstract Background Meiotic recombination results in the exchange of genetic material between homologous chromosomes. Recombination rate varies between different parts of the genome, between individuals, and is influenced by genetics. In this paper, we assessed the genetic variation in recombination rate along the genome and between individuals in the pig using multilocus iterative peeling on 150,000 individuals across nine genotyped pedigrees. We used these data to estimate the heritability of recombination and perform a genome-wide association study of recombination in the pig. Results Our results confirmed known features of the recombination landscape of the pig genome, including differences in genetic length of chromosomes and marked sex differences. The recombination landscape was repeatable between lines, but at the same time, there were differences in average autosome-wide recombination rate between lines. The heritability of autosome-wide recombination rate was low but not zero (on average 0.07 for females and 0.05 for males). We found six genomic regions that are associated with recombination rate, among which five harbour known candidate genes involved in recombination: RNF212, SHOC1, SYCP2, MSH4 and HFM1. Conclusions Our results on the variation in recombination rate in the pig genome agree with those reported for other vertebrates, with a low but nonzero heritability, and the identification of a major quantitative trait locus for recombination rate that is homologous to that detected in several other species. This work also highlights the utility of using large-scale livestock data to understand biological processes.

