Single Nucleotide Polymorphisms Caused by Assembly Errors

2010 ◽  
Vol 3 ◽  
pp. GEI.S3653
Author(s):  
Jürgen Kleffe ◽  
Robert Weißmann ◽  
Florian F. Schmitzberger

We compare the results of three different assembler programs, Celera, Phrap and Mira2, on the same set of about one hundred thousand Sanger reads derived from an unknown bacterial genome. In contrast to previous assembly comparisons, we do not focus on computation speed or on the number of assembled contigs, but on how far the different assemblies agree in content. Genome regions assembled consistently by all three programs are identified in order to estimate a lower bound on the number of erroneous single nucleotide polymorphisms (SNPs) caused solely by the mathematical process of sequence assembly. We identified 509 sequence triplets common to all three de novo assemblies; together they span only 34% (3.3 Mb) of the bacterial genome, and 175 of them (~1.5 Mb) contain erroneous SNPs and insertions/deletions. Within these triplets, this amounts on average to one error per 7,155 base pairs. When Mira2 is replaced by its most recent version, Mira3, the latter figure even drops to one error per 5,923 base pairs. Our results therefore suggest that a considerable number of erroneous SNPs may be present in current sequence data, and that mathematicians should urgently take up research on the numerical stability of sequence assembly algorithms. Furthermore, even the latest versions of currently used assemblers produce erroneous SNPs that depend on the order in which reads are supplied as input. Such errors will severely hamper molecular diagnostics as well as efforts to relate genome variation to disease. This issue needs to be addressed urgently, as the field is moving fast into clinical applications.
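The abstract does not say how disagreements between the three assemblies were counted. As a rough illustration of an "one error per N base pairs" figure, the sketch below assumes each triplet has already been aligned into three equal-length strings with '-' marking gaps, and counts every alignment column where the assemblies disagree as one error (a mismatch column as a SNP, a column containing a gap as an indel). The functions `compare_triplet` and `bases_per_error` and this counting rule are hypothetical, not the authors' method.

```python
from typing import Iterable, Tuple

Triplet = Tuple[str, str, str]


def compare_triplet(a: str, b: str, c: str) -> Tuple[int, int]:
    """Count disagreeing columns in three pre-aligned assembly sequences.

    Hypothetical scheme: gaps are written as '-', and every disagreeing
    column counts as one error, so a 3-bp indel counts as 3 errors.
    """
    if not len(a) == len(b) == len(c):
        raise ValueError("the three sequences must be aligned to equal length")
    snps = indels = 0
    for column in zip(a, b, c):
        symbols = set(column)
        if len(symbols) == 1:
            continue  # all three assemblers agree at this position
        if "-" in symbols:
            indels += 1  # at least one assembly has a gap here
        else:
            snps += 1  # all three have a base, but not the same one
    return snps, indels


def bases_per_error(triplets: Iterable[Triplet]) -> float:
    """Average number of alignment columns per disagreeing column."""
    total_columns = 0
    errors = 0
    for a, b, c in triplets:
        snps, indels = compare_triplet(a, b, c)
        total_columns += len(a)  # alignment length used as a proxy for base pairs
        errors += snps + indels
    return total_columns / errors if errors else float("inf")


if __name__ == "__main__":
    triplets = [
        ("ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC"),  # one mismatch
        ("ACGT-ACGT", "ACGTTACGT", "ACGT-ACGT"),      # one single-base indel
    ]
    print(bases_per_error(triplets))  # 19 columns / 2 errors = 9.5
```

Note that a three-way comparison only shows where the assemblies disagree; it does not by itself say which assembler is wrong at a given column.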

2021 ◽  
Author(s):  
Tofazzal Islam ◽  
Nadia Afroz ◽  
ChuShin Koh ◽  
M. Nazmul Haque ◽  
Md. Jillur Rahman ◽  
...  

Background: Jackfruit (Artocarpus heterophyllus Lam.) is a tropical and subtropical fruit tree distributed in Asia, Africa, and South America. It is the national fruit of Bangladesh and normally bears fruit only in the summer season. However, BARI Kanthal-3, a year-round jackfruit variety developed by the Bangladesh Agricultural Research Institute (BARI), provides fruit from September to June. This study aimed to evaluate the agronomic performance of BARI Kanthal-3 and to generate a draft whole-genome sequence to obtain molecular insights into this important and unique variety.

Results: The number of fruits, average fruit weight, fruit yield per plant, edible portion of the fruit and β-carotene content of BARI Kanthal-3 (n = 5) were 422/plant/year, 5.60 kg, 236.32 kg/year, 53.5% and 3614 mg/100 g, respectively. De novo assembly scaffolded 817.7 Mb of the BARI Kanthal-3 genome, whereas reference-guided assembly scaffolded almost 843 Mb. In the BUSCO assessment, 97.2% of the core genes were represented in the assembly, while 1.3% were fragmented and 1.5% were missing. Comparing the single-copy orthologues (SCOs) of BARI Kanthal-3 with those of three closely related species and one distantly related species identified 706 SCOs shared across all five genomes. Phylogenetic analysis of the shared SCOs showed that A. heterophyllus is the species closest to BARI Kanthal-3. The estimated genome size of BARI Kanthal-3 was 1.04 gigabase pairs (Gbp), with a heterozygosity rate of 1.62% and an estimated GC content of 34.10%. Variant analysis revealed 5.7 M (35%) simple and 10.4 M (65%) heterozygous single nucleotide polymorphisms (SNPs) in BARI Kanthal-3, about 90% of which are located in intergenic regions.

Conclusion: The whole-genome sequence of A. heterophyllus cv. BARI Kanthal-3 reveals an extremely high number of SNPs in intergenic regions. These findings will help improve understanding of the evolution, domestication, phylogenetic relationships and year-round fruiting of this highly nutritious fruit crop, and support the development of markers for its molecular breeding.
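The abstract does not state how the genome size was estimated. A common approach for read data is k-mer counting, where genome size is approximated as the total number of k-mers divided by the modal k-mer coverage. The sketch below illustrates only that calculation on simulated error-free reads; it ignores reverse complements, heterozygosity and repeats, which dedicated tools such as GenomeScope model explicitly, and the function names `kmer_histogram` and `estimate_genome_size` are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def kmer_histogram(reads: List[str], k: int) -> Dict[int, int]:
    """Map each k-mer coverage value to the number of distinct k-mers with that coverage."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1  # forward strand only (simplification)
    return dict(Counter(counts.values()))


def estimate_genome_size(reads: List[str], k: int = 21, min_cov: int = 3) -> float:
    """Genome size ~ total k-mers / modal k-mer coverage (simplified k-mer method)."""
    hist = kmer_histogram(reads, k)
    # Drop the low-coverage tail, which in real data is dominated by sequencing errors.
    solid = {cov: n for cov, n in hist.items() if cov >= min_cov}
    if not solid:
        raise ValueError("no k-mers at or above min_cov")
    # Tallest peak, taken as the coverage of one genome copy
    # (a simplification for heterozygous genomes).
    peak = max(solid, key=solid.get)
    total = sum(cov * n for cov, n in solid.items())
    return total / peak


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))  # simulated genome
    read_len = 100
    reads = []
    for _ in range(600):  # about 30x read coverage, no sequencing errors
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(genome[start:start + read_len])
    print(round(estimate_genome_size(reads)))  # should land roughly near 2000
```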


2015 ◽  
Vol 5 (4) ◽  
pp. 121-126
Author(s):  
Shruti Singh ◽  
Kiran Singh ◽  
Manisha Sachan

DNMT3A and DNMT3B are de novo methyltransferases responsible for establishing methylation patterns on unmethylated DNA. Two single nucleotide polymorphisms (SNPs) in these genes, -448A>G in DNMT3A and C46359T in DNMT3B, have been reported to contribute to genetic susceptibility to breast cancer. In the present study, we analyzed the genotype frequencies of the DNMT3A -448A>G and DNMT3B C46359T polymorphisms in breast cancer patients and healthy control subjects to explore the association of these SNPs with susceptibility to breast carcinoma. Genotyping was performed by PCR-RFLP. For the DNMT3A (-448A>G) SNP, 74 patients and 76 controls were genotyped, whereas 72 patients and 107 controls were screened for the DNMT3B (C46359T) polymorphism. Our results suggest that, compared with GG carriers, DNMT3A -448AA homozygotes had a 2.92-fold risk of developing breast carcinoma, whereas for the DNMT3B (C46359T) polymorphism, carriers of the CT and CT+CC genotypes showed 1.32-fold and 1.23-fold risks, respectively. In conclusion, the DNMT3A SNP -448A>G contributes to genetic susceptibility to breast carcinoma, whereas the DNMT3B SNP C46359T was not found to be associated with the pathogenesis of breast cancer in the north Indian population.
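In a case-control design such as this one, "fold risk" figures are usually odds ratios computed from genotype counts. The sketch below computes an odds ratio with a Woolf (log-scale) 95% confidence interval from a 2x2 table of risk-genotype versus reference-genotype carriers. The abstract does not report the per-genotype counts, so the numbers in the usage example are hypothetical and are not the study's data; the function name `odds_ratio` is likewise illustrative.

```python
import math
from typing import Tuple


def odds_ratio(case_exp: int, case_ref: int, ctrl_exp: int, ctrl_ref: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf confidence interval for a 2x2 genotype table.

    case_exp / ctrl_exp: cases / controls carrying the genotype of interest (e.g. AA)
    case_ref / ctrl_ref: cases / controls carrying the reference genotype (e.g. GG)
    """
    cells = (case_exp, case_ref, ctrl_exp, ctrl_ref)
    if min(cells) <= 0:
        raise ValueError("zero cell: apply a continuity correction first")
    or_ = (case_exp * ctrl_ref) / (case_ref * ctrl_exp)
    # Standard error of log(OR) by Woolf's method
    se = math.sqrt(sum(1 / n for n in cells))
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study
    or_, lo, hi = odds_ratio(case_exp=30, case_ref=10, ctrl_exp=20, ctrl_ref=20)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # OR = 3.00 (95% CI 1.16-7.73)
```

If the interval includes 1, the data are compatible with no association at the chosen confidence level.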


BMC Genomics ◽  
2009 ◽  
Vol 10 (1) ◽  
pp. 4 ◽  
Author(s):  
Hindrik HD Kerstens ◽  
Sonja Kollers ◽  
Arun Kommadath ◽  
Marisol del Rosario ◽  
Bert Dibbits ◽  
...  
